I have a SmartApp that brightens lights in a zone when there is motion or a door opens. It dims the lights after a given timeout period.
I have 6 instances of the app on one hub (6 different lighting zones) and 8 on another hub in another building.
When the app initializes, it starts the scheduler to run a scheduleCheck() function every minute, which turns the lights off when appropriate.
It was working for months. Then it stopped dimming reliably.
I have determined that the scheduleCheck() function stops running as scheduled.
I have also noticed that it now usually schedules at the top of the minute, i.e., the seconds are zero. That was not the case before. And the time between scheduled calls can be 1, 2, or 3 minutes.
I have just installed a workaround, in which I re-initialize the scheduler every time motion starts or stops, or a door opens or closes. I think that the worst case will be if the scheduler quits before the lights are dimmed. Then they will stay bright until the next time motion is detected, which would be better than staying bright until I manually reset the app.
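The workaround amounts to something like the following (a rough Groovy sketch, not my exact code; the handler and device names are illustrative):

```groovy
// Every motion/contact event re-creates the schedule, so a dead scheduler
// is revived the next time something moves or a door opens.
def initialize() {
    subscribe(motionSensors, "motion", eventHandler)
    subscribe(contactSensors, "contact", eventHandler)
    restartScheduler()
}

def eventHandler(evt) {
    // ...existing brighten/dim logic...
    restartScheduler()  // worst case: lights stay bright only until the next event
}

def restartScheduler() {
    unschedule(scheduleCheck)           // clear any stale job first
    schedule("0 * * * * ?", scheduleCheck)  // fire scheduleCheck() every minute
}
```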
I've been having a similar problem with logging to Grovestreams (based on Jason Steele's code).
My smart app has been running with very few problems for a couple of months, but now the scheduler just seems to quit. Watching live logging, the event subscription still works, as temperatures are appended to the queue, but the post to Grovestreams that is run by the scheduler just stops.
I have to reinitialize the smart app to get it working again.
Over the past few days, it's stayed running for less than 24 hours at a time.
Initially, the scheduler ran every minute. I changed it to every 2 minutes yesterday hoping that would help, but so far no luck; it still quit after about 8 hours.
It was running at :00 seconds. I've just changed it again to run at :40 seconds every two minutes to see if that helps. Maybe it's contention with too many other schedulers running at the same time?
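Since SmartThings' schedule() takes a Quartz-style cron expression (fields are second, minute, hour, day, month, day-of-week), the offset experiment looks roughly like this (sketch; the handler name is made up):

```groovy
// Before: fire at second :00 of every minute
// schedule("0 * * * * ?", processQueue)

// After: fire at second :40 of every 2nd minute,
// to dodge the top-of-the-minute rush
schedule("40 0/2 * * * ?", processQueue)
```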
I've been seeing this recently as well. The ticket I've had open for like 4 months now is still being looked at. Karl said he'd get back to me on Tuesday.
This just happened to me overnight, for 4 different apps across 2 locations for apps that use schedule() or runEveryXXMinutes(). The scheduled routine just stops being called, although in one app a separate scheduled routine (once daily) ran as scheduled, long after the other (more frequent) routine stopped being called.
This spells #FAIL for devices that have to be polled regularly and frequently to get meaningful status updates (like weather stations, thermostats, garage door openers, and the like).
I'm having this same problem with the MyQ garage app. It relies on API calls out to Liftmaster to get the door status. It has a scheduler that polls the API to keep the status refreshed. But for whatever reason, the scheduler has been silently dying at random. Sometimes it will go a few days, other times it will only last a few hours. I've tried switching to a chained runIn call once per minute, which re-arms the scheduler on each run, but even that died after a while. I'm at a loss right now. Because it dies so randomly, I can't even debug it.
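For anyone unfamiliar with the chained-runIn pattern: instead of a recurring cron job, each run schedules the next one. A minimal Groovy sketch (pollMyQ is a stand-in for the app's actual poll method):

```groovy
// Kick off the chain once, e.g. from initialize()
def startPolling() {
    runIn(60, pollChain)
}

def pollChain() {
    pollMyQ()             // hit the Liftmaster API for door status
    runIn(60, pollChain)  // re-arm for the next minute;
                          // if this line never executes, the chain dies too
}
```

Note the failure mode: the chain only survives as long as every run actually fires, which is why it is still vulnerable to the same platform problem.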
I finally got an error on my logging to Grovestreams after the scheduler stopped.
Sensor readings were still trying to append to the queue, but ProcessQueue had stopped.
Going through the mobile app to uncheck and recheck one of the devices reset everything and it started logging again.
So this time it ran for about 16 hours before the issue occurred.
Every response I've gotten from support on this for the last several months is "Yeah, we had a problem but it's fixed now…", so I've rigged up what I consider a pretty good workaround.
I've added two web endpoints to each of my smartapps whose schedulers are important to me.
The first checks the time the last scheduled job ran (updated in state at each run) against the current time. If it's been long enough to declare the scheduler dead, it returns "FAIL"; otherwise "FIRING", along with the name of the app and the last time the scheduler ran.
The second simply calls my method that creates the schedules.
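The rough shape of the two endpoints, as a hedged Groovy sketch (the paths, the 5-minute threshold, and the initialize() method name are illustrative, not my exact code):

```groovy
mappings {
    path("/stamp")      { action: [GET: "checkStamp"] }
    path("/reschedule") { action: [GET: "restartSchedules"] }
}

def checkStamp() {
    // state.lastRun is updated by the scheduled job each time it fires
    def ageMs  = now() - (state.lastRun ?: 0)
    def status = (ageMs > 5 * 60 * 1000) ? "FAIL" : "FIRING"
    render contentType: "text/plain",
           data: "${status} ${app.label} last ran ${new Date(state.lastRun ?: 0)}"
}

def restartSchedules() {
    unschedule()
    initialize()  // whatever method creates the schedules
    render contentType: "text/plain", data: "RESCHEDULED"
}
```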
This alone makes it easy enough to just bookmark the URLs with all of the auth tokens, so it's a one-click operation to check on the scheduler and another click to restart it via my browser. But we're talking about home automation, and that's not very automatic.
This is the part of the post where I go overboard, so feel free to tune out at this point. =)
I'm a network engineer by trade, and that involves monitoring things. My current free weapon of choice is Icinga. Using this, I can have it watch each of my "stamp" URLs. If it sees "FIRING" on the page, it does nothing. If it sees "FAIL", it hits the reschedule URL. Automation!
Now I just need something to monitor my monitoring…
This… is fantastic. I was actually planning on writing a very similar setup to monitor my app's problematic cron job. Mostly I was hoping it might shed some light on what exactly happens when the scheduler dies. It just feels wrong because, as you say, it's hardly automation.
I hadn't heard of Icinga. I have a probe set up at montastic that watches for a URL on my network, so that's been good enough for me just to make sure a particular service is running at home.
I'm experimenting with something like that. In my smartapp, I created a second scheduled job whose sole purpose is to make sure the first is running and, if it died, to recreate it. The second "monitoring" job checks the last poll timestamp every 5 minutes, and if enough time has passed to confirm the first scheduler is dead, it brings it back to life. It's working well so far - it has successfully restarted things for me about 3 times in the last 24 hours. My hope is that because the monitoring job's task is so light, it's less likely to succumb to SSDS (Sudden Scheduler Death Syndrome).
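The watchdog pattern looks something like this in Groovy (a sketch under assumed names; pollJob, watchdog, and the 3-minute threshold are all illustrative):

```groovy
def initialize() {
    schedule("0 * * * * ?", pollJob)  // worker: every minute, on the minute
    runEvery5Minutes(watchdog)        // monitor: lightweight, every 5 minutes
}

def pollJob() {
    state.lastPoll = now()  // heartbeat the watchdog checks
    // ...actual work...
}

def watchdog() {
    // Allow a couple of missed runs before declaring the worker dead
    if (now() - (state.lastPoll ?: now()) > 3 * 60 * 1000) {
        unschedule(pollJob)
        schedule("0 * * * * ?", pollJob)  // bring it back to life
    }
}
```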
Maybe. I didn't do it that way because to do it cleanly you need to unschedule and then reschedule everything. Doing it blindly every so often would cause the next scheduled run to be deleted and effectively pushed back by the reschedule.
Still no useful word from support, incidentally. @Tyler escalated my ticket to Mager a while back, who then closed it saying that they had "platform problems" one night. Now I've got a new ticket open with L1, and the responses I'm told to expect just never come. When I ask what's up, they say they need a few more days. Frustrating to no end.
And what happens when your second scheduled job dies too? ST's been having problems with their scheduler from day one, and despite repeated assurances that "this time it's really fixed", it still doesn't work. It's pathetic.
Yes, that's what I'm afraid of. So far, that hasn't happened. I'm hoping the second one is less likely to die because it does so little in terms of activity. I read somewhere on here that the schedulers die because they sometimes hit a 20-second running limit while doing their job, sometimes due to the ST system just being slow. I figured if the monitoring scheduler does as little as possible, it won't hit that limit and won't just die.