Scheduler and Polling quits after some minutes, hours, or days

I’d support that! It might make ST staff extra cautious here regarding any hint of acknowledgement or resolution. I would hope not.

They can easily decide how to respond.


Where do we see the things that the Community Advocacy Taskforce is working on?


BTW, this issue has been acting up again; a couple of users in the past week have reported that the scheduled timers quit working after a few days (in this case it’s the RunEvery5Minutes API).


Hi. I have issues with STv2 not sending my scheduled time codes to my lock. It works for perhaps a couple of days, and then it will just stop. I have to resend my codes to my lock via my ST app, and then it works for a day or so. It seems to stop working overnight, though it does remember to delete the code at the time I have set it to expire.

Is there a fix coming?


Interestingly, this seems to be impacting v2 hub users more than v1. @slagle, is there a difference in the way the cloud is set up for v2 vs. v1 (assuming right now v2 apps are still running in the cloud and not locally)?

I have been seeing this issue again also. I mentioned it on my “Frustrated with the State of Things” thread, but figured I should support this thread since it is more specific about this issue.

Scheduled events are random again after working really well for a few months.

Polling has never worked.

Both issues reported to support many times.

Ron,

I’ve had some luck with scheduled things, and I’ve seen some patterns to failures. Usually, when I have a scheduled event fail, it’s been an on-the-hour event. When I moved those to 6 minutes before, or 7 minutes after, they became rock solid and don’t fail. Take that with a grain of salt (since it’s ST and who knows?), but it has helped me. I don’t have problems with sunrise and sunset, perhaps because of where I live (Arizona).
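A minimal sketch of that off-the-hour trick, assuming a hypothetical `hourlyHandler` method; the minutes field of the cron expression shifts the trigger to :07 instead of the busy :00 mark:

```groovy
// Hedged sketch: shift an hourly job off the top of the hour.
// "hourlyHandler" is a hypothetical handler name, not from the thread.
def updated() {
    unschedule()
    // Quartz cron fields: seconds minutes hours day-of-month month day-of-week
    // This fires at 7 minutes past every hour, avoiding the :00 rush.
    schedule("0 7 * * * ?", hourlyHandler)
}

def hourlyHandler() {
    log.debug "off-peak hourly run at ${new Date()}"
    // ... do the real work here ...
}
```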


Interesting. I bet it’s because most people schedule things on the hour, so the server gets overloaded. Of course, your recommendation to move 6 or 7 minutes before or after will result in more people using your technique and may break things for you :slight_smile: It does make sense that it will work for a while, though, so I will give it a shot.


That doesn’t explain why other timers die, like RunEvery5Minutes dying after a few days. Sometimes even a one-minute runIn dies. There’s no boundary condition for this. Something more fundamental isn’t right here.


I agree but I suspect the problem is they have junk code everywhere and this is one of the issues.


This is a different, although related problem. The issue here is that these app executions are chained together by a timer. One failure of that timer, and it’s all over. The failure could come from coinciding with a peak of traffic to the cloud in your area, or from some other cloud problem. I’ve had a process running that allows me to see cloud delays, and they can happen at any time, and sometimes there are “storms” of delays that go on for several minutes.

When I first ran this process using chained timers, it would not last a single day, being fired once a minute. One failure out of 1440 kills it. That might not be such a horrible failure rate, but for the chained timers it’s enough to make them fail.

Avoid chained timers…
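To make “chained timers” concrete, here is a minimal sketch of the fragile pattern being warned against (`tick` and `doWork` are hypothetical names):

```groovy
// Fragile pattern: each execution arms the next one.
def installed() {
    runIn(60, tick)   // first tick in 60 seconds
}

def tick() {
    doWork()
    // If THIS execution is ever dropped by the cloud, the line
    // below never runs, and the whole chain dies silently.
    runIn(60, tick)
}
```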


I am curious how you know this? What does “chained together by a timer” mean?
Why, when there is “one failure of a timer,” is it “all over”? Can you imagine if unix cron were coded so that it would stop processing all future jobs once one failed? There is no reason why ST can’t fix this. It should also be pretty easy to test whether what you say is true: just define a scheduled task that is hard-coded to fail.


One way this is done in some SmartApps is by using runIn(), which schedules an execution of a handler in the app some number of seconds in the future. By chained timer, I mean a SmartApp where the handler for the runIn() re-schedules itself. This type of app is susceptible to a single failure, because a single failure means the next execution is never scheduled, and the chain breaks.

The other way, which should be more reliable, is to use cron. That is not susceptible to the broken chain failure. It all depends on the SmartApp and how it handles things. When you have a “scheduled” event at some time each day, that shouldn’t fail from this problem, although it could fail for other reasons (peak overload).
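The cron alternative described above might look like this sketch; the platform owns the schedule, so one dropped execution doesn’t cancel the next (`tick` and `doWork` are hypothetical names):

```groovy
// More robust: let the platform re-fire the schedule itself.
def installed() {
    runEvery1Minute(tick)   // or: schedule("0 * * * * ?", tick)
}

def tick() {
    doWork()   // no self-rescheduling, so no chain to break
}
```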


Actually, I think you are a little incorrect about this, I am sorry to say; I wish it were true. Pollster uses only the cron style of scheduling, and it just stops working all the time. The only way to fix it is to reconnect. Some folks are now writing apps that trigger events based on polling events of other devices that self-poll; this is really the only reliable way to schedule events. Actually, that gives me an idea. I bet you could configure an outside process to notify ST to perform some scheduled events. Something like IFTTT, or frankly I may just write something for my Raspberry Pi that is sitting around doing nothing. Hmmm… ideas brewing…

I remember a thread a long time ago where a ST engineer stated that if a job fails, it is not rescheduled, because they didn’t want a bunch of failing jobs running all the time. Given the nature of Z-Wave signals, etc., that is ridiculous. If a job fails, I want it to continue; perhaps they could keep track of the number of times it fails and, if it fails say 60 times in a row, never reschedule it. But their policy of “one failure and you are done” is foolish.
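One hedged workaround sketch for the chained-timer case, assuming a hypothetical `tick` handler: re-arm the timer before doing any work, so a crash inside the work itself can’t break the chain (this does not help if the platform drops the execution entirely):

```groovy
def tick() {
    runIn(300, tick)   // re-arm FIRST, before any work runs
    try {
        doWork()       // hypothetical work method
    } catch (e) {
        log.error "tick failed: ${e}"
        // the next run is already scheduled, so the chain survives
    }
}
```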


This is so ridiculous. It would help if there were something we could actually do to our SmartApps to keep the scheduled jobs from failing, but ST isn’t even telling us why they fail in the first place. No error message or anything. As far as we can tell, something is blowing up on their backend, and the lazy move is to just cancel everything that missed because of that failure.


That sounds right. And of course the one failure can be a 20 second overrun due to a cloud delay. So they kill off “failing” apps, which failed because the cloud didn’t keep up with the event load, and then it’s bye bye.

I have had the chained timer type of failure, in fact, it’s reproducible with a pretty high failure rate.

I get the impression that their load management design is flawed. Something about x number of hubs on a “node”, where the node is handling the event traffic to/from those hubs. Then if there is a surge in event rates beyond some threshold, things just back up in some queue, and some die from timeouts. That’s just an impression, based on looking closely at numerous fails. They don’t seem to have any graceful load handling ability, but I don’t know.


Here’s the code I’m using:

I think you need Pollster for Pollster.

Check out the IFTTT Maker Channel. You could use it as a pacemaker service to kick-start your scheduled jobs via an endpoint.
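A sketch of what that pacemaker endpoint might look like on the SmartApp side (requires OAuth to be enabled for the app; the `/kick` path and handler names are hypothetical):

```groovy
mappings {
    path("/kick") {
        action: [GET: "kick"]
    }
}

def kick() {
    unschedule()             // clear any dead timers
    runEvery5Minutes(tick)   // re-arm the schedule
    [status: "rescheduled"]  // returned to the caller
}
```

An external cron job, a Raspberry Pi, or the Maker Channel would then just hit that URL with the app’s access token on whatever heartbeat you choose.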


Pollster was designed with its own watchdog. I forget how it works.

This vein of the conversation is why I like to use events instead of times or timers. One failure is only one failure. It can’t cause others.

I also still use IFTTT to back up the two things that use time in my setup, sunrise and sunset. It has not failed once on me.
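For contrast, the event-driven style mentioned above might look like this sketch (`motionSensor` is a hypothetical device input, not from the thread):

```groovy
def installed() {
    // React to device events instead of wall-clock timers.
    subscribe(motionSensor, "motion.active", motionHandler)
}

def motionHandler(evt) {
    log.debug "motion event at ${evt.date}"
    // A missed event affects only that one occurrence; there is
    // no self-scheduling chain that a single failure can break.
}
```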


This is a neat solution to manually restart the timers, but it’s not really practical if most of your apps depend on timers (each app would have its own token, hence multiple bookmarks). However, could this be integrated with IFTTT, and somehow have IFTTT automatically poll ST every 5 minutes?