Scheduler and Polling quits after some minutes, hours, or days

JDRoberts · June 14, 2015, 3:34pm

Hmmm… my Hue bridge has shown no activity for several days, and my Hue bulbs never get their status updated if they are controlled by anything outside ST. This breaks the Big Switch I’ve been using.

@sidjohn1 suggests this might be another example of Sudden Scheduler Death Syndrome, as the Hue Connect smartapp runs a scheduled poll of the Hue Bridge which has apparently stopped at my house.

brbeaird · June 14, 2015, 5:10pm

SSDS affects us all. Tragic.

Both of my schedules died last night around that same time. Must have been a rough patch over in the ST servers. Otherwise, it’s been pretty solid having my monitor job restarting them. In case anyone’s curious, here’s my version of the RainMachine that runs a monitor schedule for the polling: GitHub - brbeaird/SmartThings_RainMachine at schedulerFix

I’ll probably try to add some of copyninja’s latest MyQ smartapp modifications, which have it subscribing to things like sunrise/sunset as well as multisensors as events that can restart the refresh polling scheduler.
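
The monitor idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the RainMachine or MyQ code and not the SmartThings API: `PollWatchdog`, `record_poll`, and `on_event` are hypothetical names. The assumption is that any unrelated event (sunrise, a sensor report) can be used as a chance to check whether the poll has gone stale and restart it.

```python
import time


class PollWatchdog:
    """Restarts a polling schedule when the last successful poll is too old."""

    def __init__(self, restart, max_age_seconds, clock=time.time):
        self.restart = restart          # callback that re-creates the schedule
        self.max_age = max_age_seconds  # how stale a poll may get before we act
        self.clock = clock              # injectable so the logic can be tested
        self.last_poll = clock()
        self.restarts = 0

    def record_poll(self):
        """Call at the end of every successful poll."""
        self.last_poll = self.clock()

    def on_event(self, _event=None):
        """Call from any unrelated event handler (sunrise, motion, ...)."""
        if self.clock() - self.last_poll > self.max_age:
            self.restart()
            self.restarts += 1
            self.last_poll = self.clock()  # give the new schedule a fresh start
            return True
        return False
```

The check costs almost nothing per event, so it can hang off as many event subscriptions as are available; the more often events arrive, the sooner a dead schedule is noticed.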

tgauchat · June 14, 2015, 7:28pm

Is this an industry wide affliction?

Calendar service providers never seem to fail to issue a scheduled event alert on time, every time.

JDRoberts · June 14, 2015, 7:49pm

No. My security system, my medical alert system, IFTTT, Amazon Echo, several Philips Hue control apps, iBeacon+, my thermostat, my DVR, various calendar and reminder apps, and my pool monitoring system all have regular scheduled events, and while an individual event may occasionally, although very rarely, be missed, it has never killed the schedule altogether.

tgauchat · June 14, 2015, 9:00pm

Maybe I should build and sell an external scheduling service…

schettj · June 14, 2015, 10:29pm

Just be sure it scales

tgauchat · June 14, 2015, 10:48pm

Chronos?

github.com

Metaswitch/chronos/blob/dev/readme.md

# Chronos

Chronos is a distributed, redundant, reliable timer service.  It is designed to be generic to allow it to be used as part of any service infrastructure.

Chronos is designed to scale out horizontally to handle large loads on the system and also supports elastic, lossless scaling up and down of the cluster to handle extra load on the service.  See [here](doc/technical.md) for a more detailed discussion of how Chronos works and [here](doc/scaling.md) for a more detailed discussion on how Chronos resynchronizes its timers during scaling. 

The HTTP API is described [here](doc/api.md), and the procedure for clustering a group of Chronos nodes together is described [here](doc/clustering.md).

copyninja · June 15, 2015, 2:32am

Pretty much. Worst of all, it’s a “fire and forget” style. Once stopped, it’s unlikely to continue on to the next scheduled time unless reinitiated. ST needs to change it to “better late than never”.

ashutosh1982 · June 15, 2015, 4:09am

Is it just me, or has anyone else observed issues related to scheduled apps and Hello Home actions lately? It seems to have been happening quite a bit over the last couple of weeks.

schettj · June 14, 2015, 10:29pm

Scheduler, yep. Discussing it here:

tgauchat · June 14, 2015, 10:41pm

You beat me to pasting the same link!

Perhaps this new Topic is redundant.

But while I’m here, my guess as to the Top SmartThings ongoing issues:

1. Scheduler (including runIn and dashboard solution SmartApps, etc.); jobs aren’t just delayed, they go to the abyss; jobs can’t reschedule themselves if they don’t run.
2. Presence (mobile more unpredictable than keyfob, but fob still insufficiently reliable).
3. Periodic long event latency (e.g. delay between motion sensor trigger and switch on).
4. Incomplete Hello Home Action mode changes.
5. Mobile App slowness, crashes, and usability.
6. Slow updates and other failures from Philips Hue.
7. Periodic IDE problems.
…

smart · June 14, 2015, 11:01pm

Add Sonos issues too, Terry. Except for #2, I’m facing all of those. The worst are Hue and #5.

bravenel · June 14, 2015, 11:31pm

If only ST actually cared about this… My Sunset apps failed last night. They’ve been solid for some time. All you can figure is that ST is Meh. Fixing it is not a priority, it would seem.

btk · June 15, 2015, 12:47pm

If I had to bet, I’d guess that they’re banking on “Hub v2” offloading enough of the work from the cloud that it gets back to some sort of stable state rather than pulling people off of that to work on improving the scheduler.

beckje01 · June 15, 2015, 2:58pm

In general using chained runIns is very brittle. If you can create a schedule with the cron syntax, it uses a slightly different mechanism under the covers which is more robust since a single miss won’t break the chain.
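
To illustrate the difference being described, here is a toy simulation in Python (not SmartThings code; `chained_runs`, `recurring_runs`, and the `dropped` set are hypothetical). It assumes the platform occasionally drops a single execution, and models only the claim made in this post, not the platform's real internals.

```python
def chained_runs(ticks, dropped):
    """Each run schedules the next one, so a dropped run ends the chain."""
    fired = []
    next_due = 0
    while next_due is not None and next_due < ticks:
        if next_due in dropped:
            next_due = None  # the handler never ran, so nothing rescheduled it
        else:
            fired.append(next_due)
            next_due += 1    # the handler's last act: schedule the next run
    return fired


def recurring_runs(ticks, dropped):
    """The schedule itself recurs, so a dropped run only skips that tick."""
    return [t for t in range(ticks) if t not in dropped]
```

With six ticks and tick 2 dropped, the chained version fires only at ticks 0 and 1, while the recurring version fires at every tick except 2.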

625alex · June 15, 2015, 3:43pm

First, they need to sell enough V2 hubs to make any difference. I bet lots of people are holding off their ST hub purchase until V2, but it may take a while for existing users to transition. Many existing customers would be hesitant to get on the V2 wagon with no obvious migration strategy.

brbeaird · June 15, 2015, 3:49pm

This is not what we’ve observed. Apparently something occasionally kills runs that are scheduled via cron, and it kills all future runs as well.

ashutosh1982 · June 15, 2015, 4:03pm

Out of these… #1 and #4 are the most prevalent for me. I have seen issues on the other things on and off, but not something that is as prominent as #1 and #4. I wonder if ST is going through some back end changes to get ready for Hub V2. If that’s the case, then it would help if all of us are kept informed.

pstuart · June 15, 2015, 6:39pm

Interesting topic. I’ve seen missing scheduled events too.

It raises the question: if the cloud is going to miss (or has missed) a scheduled event, what should the platform do?

Let’s take the types of scheduling that ST can do:

-RunIn
-RunOnce
-Schedule with Cron like functions

and these built-in ones:
runEvery5Minutes(handlerMethod)
runEvery10Minutes(handlerMethod)
runEvery15Minutes(handlerMethod)
runEvery30Minutes(handlerMethod)
runEvery1Hour(handlerMethod)
runEvery3Hours(handlerMethod)

None of these methods accumulate, meaning a missed event will not queue up and fire later once the miss is noticed.

Also, there is about a 20-second window in which a scheduled event will try to run; after that, it is purposely dropped.

The issue here is that in all instances, there is no “failed” notification.

We know it didn’t fire, but we don’t know what to do if that occurs.

Essentially all scheduling just dies and doesn’t restart.

App initialization and Update functions don’t fire in this condition either, so there is no way to jumpstart the schedule… It really boils down to the user having to go into the app and force it.

The challenge is to figure out how to recover from a missed schedule.

The best way is to always use cron-like scheduling as mentioned above, because this seems to recover better than any of the other methods described.

All other schedule commands, including the RunIn commands, log a DateTime. The system then just checks whether the current time is within a 20-second window after that time and, if so, runs the function. Either way, the entry then removes itself, regardless of whether the event handler actually fired.
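
As a rough model of the behavior just described, consider the Python sketch below. It is an illustration only: `OneShot` and `process` are hypothetical names, and the 20-second figure is taken from this post rather than from any platform documentation.

```python
from dataclasses import dataclass
from typing import Callable

WINDOW_SECONDS = 20  # figure quoted in the post; assumed, not verified


@dataclass
class OneShot:
    due: float                    # time (in seconds) the job should run
    handler: Callable[[], None]   # function to call when it runs


def process(jobs, now, window=WINDOW_SECONDS):
    """Run jobs that are due and within the window; drop late ones silently."""
    remaining, ran, dropped = [], 0, 0
    for job in jobs:
        if now < job.due:
            remaining.append(job)   # not due yet, keep it
        elif now - job.due <= window:
            job.handler()           # due and still inside the window
            ran += 1
        else:
            dropped += 1            # too late: removed without notifying anyone
    return remaining, ran, dropped
```

The `dropped` counter exists only so the sketch can be inspected; in the behavior described above, nothing tells the app that a job was discarded, which is exactly why a missed run cannot reschedule itself.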

Hopefully ST is working on a much more robust scheduling function to exist outside of a SmartApp that can help track down the source of these issues.

Moreover, someone needs to develop a smarter scheduler that can queue up events within reason, or use LIFO or FIFO queuing models to determine state and what needs to fire when the next process cycle occurs to parse scheduled events.
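
One possible shape for such a scheduler, sketched in Python under stated assumptions: `CatchUpScheduler`, `max_lateness`, and `tick` are hypothetical names, jobs are assumed to be added in due-time order, and "within reason" is interpreted as a maximum allowed lateness. It is a FIFO catch-up queue, not a description of anything ST has built.

```python
from collections import deque


class CatchUpScheduler:
    """FIFO queue that fires late jobs if they are not too stale."""

    def __init__(self, max_lateness):
        self.max_lateness = max_lateness
        self.queue = deque()  # (due, name) pairs, assumed added in due order

    def add(self, due, name):
        self.queue.append((due, name))

    def tick(self, now):
        """Return (names to fire now, names that expired), oldest first."""
        fire, expired = [], []
        while self.queue and self.queue[0][0] <= now:
            due, name = self.queue.popleft()
            if now - due <= self.max_lateness:
                fire.append(name)      # late, but still worth running
            else:
                expired.append(name)   # too stale: report it instead of hiding it
        return fire, expired
```

Returning the expired jobs to the caller is a deliberate choice here: it gives the app the “failed” notification that is missing today, so it can decide whether to reschedule, alert the user, or ignore the miss.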

Ultimately, the sandboxing of SmartApps having to all run their own schedules is at fault. Instead, each hub / location instance should have its own scheduling ability and then create schedules within this location “smartapp scheduler” to fire off SmartApp functions.

Anyway, it’s a fundamental problem. It is plaguing ST, making it an unreliable system, and I think the reality is that it has to be fixed beyond just Hub v2; it can be addressed at the cloud level and as a programming issue.

Hope that helps…

beckje01 · June 15, 2015, 7:18pm

If all future runs are being killed, please open a ticket with the instance so we can dig into the issue.
