Scheduler and polling quit after some minutes, hours, or days

@April, can you kindly push the engineering team to look at what is happening with the ST or the ST refreshing apps like Weather Underground? I did open a ticket for this. The Weather app does not refresh unless I and a lot of people on ST force refresh it via the Refresh button in the ST app “Things” tab. This is a pain. I even asked @Tyler for any help he can provide. And before you say it, I really do understand the burden of being in the IT support field. But ST is falling short.

P.S. Are you going to be starting a pre-order for Hub 2.0 soon?

I once had an engineering professor who told what is probably an apocryphal story of a mechanical engineering professor back in the 1950s who kept an aquarium with goldfish in the lab. To pass the course, you had to build one device that added value to the aquarium (light, heat, pump, feeder, whatever) and leave it running for one week unattended. There was only one requirement to pass: “Don’t kill the fish.”

Like I said, probably not a true story, but it made the point. We used to run DKTF tests on new systems/devices in beta…could it run hands off for a week with no catastrophic failures.

I suggest giving a plastic toy fish to everyone who decides how the engineering budget gets spent, and then to all the engineers, as a reminder that “reliable” doesn’t mean the same thing as “will work OK again if you reinstall.”

The first priority of any home automation system should be the simplest:
Don’t Kill the Fish.

@JDRoberts. Love the story. I heard something similar. I am still here, so that means that my house didn’t kill me. That said, it’s the minor but still useful parts of the ST platform failing that drive me up the wall.

Thanks for this. I’ve now got a scheduler monitoring app named “Don’t Kill The Fish” running to give support some feedback. This way I can keep restarting my important apps but still have something that can be an acceptable SSDS casualty. =)

It displays your image if everything is ok and switches to this if it’s not:

Seems to be some cross-posting going on…

Had another random fail overnight. Visible in the “Weather Station” device - did not update after a while:

Weather station does this:

runPeriodically(3600, poll)

And you can see it just stop last night (I kicked it this afternoon when the wife called and said lights were turning on for no reason - because the lights thought it was dark out!)

I’ve added a subscription to sunrise/sunset which resets the runPeriodically() call.

I’ve got an open ticket with ST support regarding an issue I’m having with my weather station app not updating without manually refreshing it. What’s interesting is the support person didn’t think this scheduler issue is related. He also suggested I reinstall the weather station app to get it working again. Although I wouldn’t be surprised if reinstalling got it working, I find that solution to be sub-optimal…sigh. :disappointed:

RIP, The Fish.

That said, I’ve been talking to Aaron and was able to give him some statistics on when and how often my SmartApps are getting struck by SSDS as well as a couple of cases of apps currently in a dead state so they can do a post-mortem.

Here’s hoping that information is useful and helps them track this down.

That is hilarious. I like your style.

The Hue fish is dead at my house. :frowning:

Failed for a few days (listed as “inactive” by the hub, Big Switch failed). Refreshed Hue Connect. Ran for about 24 hours, then failed again.

But the Hues work great with IFTTT, a couple of third party apps, Beecon+, and the native Hue app. It’s only ST that can’t maintain the connection.

And it’s apparently contagious.

After months of consistent though not perfect performance, my Hues controlled via SmartThings now fail more often than not.

And they work fine via Hue App and Echo.

Look… I’m perfectly willing to hunker down for a month or 2 or 3 and figure this out. But not for free. And not for Hot Pockets either.

I’m definitely not implying that ST’s engineers can’t fix things… They certainly could do it faster than myself… But we have agreed that ST has higher priorities. That’s fine. So… outsource it?

Sure, politeness is a virtue, but truth be told, I don’t believe they can. I created Pollster almost a year ago primarily to poll my custom WiFi Thermostat device handler because native device polling never worked reliably. Pollster had been working like a charm until recently, when it too began failing at least once every week, requiring a restart. I’m looking at my thermostat device now and see that it has not been updated since Tuesday, 7:15 PM, more than two days ago.

This ‘thing’ is now totally useless. Not only are these clowns unable to fix the old bugs, they just keep piling the new ones on top. And to top it off, they’re now telling me that Pollster does not meet their app submission requirements because “polling devices is not recommended”. What a joke! What is recommended then? Sitting with your thumb stuck in you know where? Oh, and phleeesee, keep sending those bug reports to support because, you know, we’re so busy hiring, doing hackathons and what not, we cannot prioritize our bugs without your help.

This has nothing to do with April. Platform stability should be priority number one for any cloud-based service. Period. They’re well aware of the problems, there’s no doubt about that. So all this talk about “we need more customer complaints to prioritize this issue” is just nonsense. A while ago the main excuse was “we’re understaffed, but please bear with us because we’re hiring”. Six months later, it’s “we’re so busy hiring, we don’t have time to get things done.” It’s ridiculous, if you ask me. I wonder what it’s going to be six months from now when they run out of excuses.

You’re right. Platform stability is the number one priority. Hiring and understaffing are the same, and you’re right. It’s something we are facing. With growth, too, though, communication also gets lost. Thus Bool is now evaluating as true/false instead of a string, and I wasn’t aware of the enum change until later. In fact, I learned it at the Dev call.

These are things we now have a process to communicate because of this. You’re right to call out these excuses. They are the state of the business, and these are the challenges we’re facing.

With transparency, we share that being understaffed and hiring is an issue. Without transparency, and putting our heads down to fix the issues, we are called out for not being on the forums and for not communicating.

We’re glad you’re here. We’re glad you’re calling us out to do better. We’re working on it. There’s no excuse that your experience is not ideal, nor do we strive for that experience.

Based on this thread, so far, does this seem like the best approach for maximum scheduling reliability?

Just trying to distill the discussion into a “Conclusion”.

My solution is giving me near 100% scheduler uptime. I tried once a day and it generally left my apps dead longer than they were alive.

Here’s what I settled on. If anyone wants the code for the SmartApp side, I can throw together a gist.

  • Scheduled SmartApps update a timestamp within their main scheduled method.
  • Add two OAuth endpoints to each SmartApp. One to check the timestamp, one to reschedule the appropriate functions.
  • Script runs every few minutes to poll the first OAuth endpoint. If the scheduler isn’t running, it hits the second endpoint to restart it.
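
For anyone wanting to try the external-script half of this, here is a minimal sketch in Python of what such a watchdog might look like. It is not the poster's actual script. The URLs, the `lastRun` field name, the epoch-seconds format, and the two-hour threshold are all assumptions made for illustration; the real endpoints and response shape depend on how the SmartApp's OAuth endpoints are written.

```python
import json
import time
import urllib.request

# Hypothetical placeholders: real URLs come from the SmartApp's OAuth setup.
STATUS_URL = "https://example.invalid/smartapp/status"
RESTART_URL = "https://example.invalid/smartapp/reschedule"
MAX_AGE_S = 2 * 3600  # assumed threshold: an hourly job missing two runs


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def is_stale(last_run_s, now_s, max_age_s=MAX_AGE_S):
    """True when the scheduled method has not stamped itself recently."""
    return (now_s - last_run_s) > max_age_s


def check_and_restart(fetch=http_get, now_s=None):
    """Poll the status endpoint; hit the restart endpoint if the job is stale.

    Assumes the status endpoint returns JSON like {"lastRun": <epoch seconds>}.
    Returns True if a restart was requested, False otherwise.
    """
    now_s = time.time() if now_s is None else now_s
    status = json.loads(fetch(STATUS_URL))
    if is_stale(status["lastRun"], now_s):
        fetch(RESTART_URL)  # ask the SmartApp to reschedule its functions
        return True
    return False
```

Calling `check_and_restart()` from cron (or any local task scheduler) every few minutes would match the third bullet. The `fetch` parameter is injectable only so the logic can be exercised without touching the network.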

Your solution sounds thorough … I wonder whether 99% can be established without some portion of it, but the external script seems essential.

If the code is free of use restrictions, I’d love to see it, thanks!

Here you go! Just insert a call to updateState() at the top of whatever scheduled method you want to monitor. Sticking a call to logURLs() in your updated() method will spit out some easy-to-copy URLs in your logs.

As an idea, have IFTTT periodically ping an endpoint and restart schedules.

This way you don’t need to run any servers or scripts locally.

Alternatively, have a virtual “reset” switch attached to an app. Whenever it’s activated, reset the schedules.

Just some ideas.

Reset schedules

  • on sunrise/sunset
  • on location (mode change)
  • on REST endpoint

etc.

Sigh. All THOSE are going to put more load on the system, which (apparently) is already struggling under load.

It would be so much better if it just worked - most importantly if it never just silently dropped scheduled callbacks - better late than never would be a great start.

(My lighting on/off based on motion + lux works about 60% of the time. When it fails, a refresh of the weather tile (which should have refreshed based on both sunset AND periodic internal schedules) magically fixes it, because the weather tile is providing my lux and the scheduler has died. It’s really like every 2nd or 3rd day now. Not great.)
