Scheduler and Polling quits after some minutes, hours, or days

I’d support that! It might make ST staff extra cautious here regarding any hint of acknowledgement or resolution. I would hope not.

They can easily decide how to respond.


Where do we see the things that the Community Advocacy Taskforce is working on?


BTW, this issue has been acting up again; a couple of users in the past week have reported that the scheduled timers quit working after a few days (in this case it’s the RunEvery5Minutes API).


Hi. I have issues with STv2 not sending my scheduled time codes to my lock. It works for perhaps a couple of days, and then it will just stop. I have to resend my codes to my lock via my ST app, and then it works for a day or so. It seems to stop working overnight, though it does remember to delete the code at the time I have set it to expire.

Is there a fix coming?


Interestingly, this seems to be impacting v2 hub users more than v1. @slagle, is there a difference in the way the cloud is set up for v2 vs. v1 (assuming right now v2 apps are still running in the cloud and not locally)?

I have been seeing this issue again also. I mentioned it on my “Frustrated with the State of Things” thread, but figured I should support this thread since it is more specific about this issue.

Scheduled events are random again after working really well for a few months.

Polling has never worked.

Both issues reported to support many times.

Ron,

I’ve had some luck with scheduled things, and I’ve seen some patterns to failures. Usually, when I have a scheduled event fail, it’s been an on-the-hour event. When I moved those to 6 minutes before, or 7 minutes after, they became rock solid and don’t fail. Take that with a grain of salt (since it’s ST and who knows?), but it has helped me. I don’t have problems with sunrise and sunset, perhaps because of where I live (Arizona).
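A minimal sketch of that off-the-hour trick, assuming a hypothetical `hourlyHandler` method; the minutes field of the cron expression shifts the trigger to :07 instead of the busy :00 mark:

```groovy
// Hedged sketch: shift an hourly job off the top of the hour.
// "hourlyHandler" is a hypothetical handler name, not from the thread.
def updated() {
    unschedule()
    // Quartz cron fields: seconds minutes hours day-of-month month day-of-week
    // This fires at 7 minutes past every hour, avoiding the :00 rush.
    schedule("0 7 * * * ?", hourlyHandler)
}

def hourlyHandler() {
    log.debug "off-peak hourly run at ${new Date()}"
    // ... do the real work here ...
}
```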


Interesting. I bet it’s because most people schedule things on the hour, so the server gets overloaded. Of course, your recommendation to move 6 or 7 minutes before or after will result in more people using your technique and may break things for you :slight_smile: It does make sense that it will work for a while, though, so I will give it a shot.


That doesn’t explain why other timers die, like RunEvery5Minutes dying after a few days. Sometimes even a one-minute runIn dies. There’s no boundary condition for this. Something more fundamental isn’t right here.


I agree but I suspect the problem is they have junk code everywhere and this is one of the issues.


This is a different, although related problem. The issue here is that these app executions are chained together by a timer. One failure of that timer, and it’s all over. The failure could come from coinciding with a peak of traffic to the cloud in your area, or from some other cloud problem. I’ve had a process running that allows me to see cloud delays, and they can happen at any time, and sometimes there are “storms” of delays that go on for several minutes.

When I first ran this process using chained timers, it would not last a single day, being fired once a minute. One failure out of 1440 kills it. That might not be such a horrible failure rate, but for the chained timers it’s enough to make them fail.

Avoid chained timers…
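To make “chained timers” concrete, here is a minimal sketch of the fragile pattern being warned against (`tick` and `doWork` are hypothetical names):

```groovy
// Fragile pattern: each execution arms the next one.
def installed() {
    runIn(60, tick)   // first tick in 60 seconds
}

def tick() {
    doWork()
    // If THIS execution is ever dropped by the cloud, the line
    // below never runs, and the whole chain dies silently.
    runIn(60, tick)
}
```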


I am curious how you know this? What does “chained together by a timer” mean?
Why, when there is “one failure of a timer,” is it “all over”? Can you imagine if unix cron were coded so that it would stop processing all future jobs once one failed? There is no reason why ST can’t fix this. It should also be pretty easy to test whether what you say is true: just define a scheduled task that is hard-coded to fail.


One way this is done in some SmartApps is by using runIn(), which schedules an execution of a handler in the app some number of seconds in the future. By chained timer, I mean a SmartApp where the handler for the runIn() re-schedules itself. This type of app is susceptible to a single failure, because a single failure means the next execution is never scheduled, and the chain breaks.

The other way, which should be more reliable, is to use cron. That is not susceptible to the broken chain failure. It all depends on the SmartApp and how it handles things. When you have a “scheduled” event at some time each day, that shouldn’t fail from this problem, although it could fail for other reasons (peak overload).
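The cron alternative described above might look like this sketch; the platform owns the schedule, so one dropped execution doesn’t cancel the next (`tick` and `doWork` are hypothetical names):

```groovy
// More robust: let the platform re-fire the schedule itself.
def installed() {
    runEvery1Minute(tick)   // or: schedule("0 * * * * ?", tick)
}

def tick() {
    doWork()   // no self-rescheduling, so no chain to break
}
```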


Actually, I think you are a little incorrect about this, I am sorry to say; I wish it were true. Pollster uses only the cron style of scheduling, and it just stops working all the time. The only way to fix it is to reconnect. Some folks are now writing apps that trigger events based on polling events of other devices that self-poll; this is really the only reliable way to schedule events. Actually, that gives me an idea. I bet you could configure an outside process to notify ST to perform some scheduled events. Something like IFTTT, or frankly I may just write something for my Raspberry Pi that is sitting around doing nothing. Hmmm… ideas brewing…

I remember a thread a long time ago where a ST engineer stated that if a job fails, it is not rescheduled, because they didn’t want a bunch of failing jobs running all the time. Given the nature of Z-Wave signals, etc., that is ridiculous. If a job fails, I want it to continue; perhaps they could keep track of the number of times it fails and, if it fails say 60 times in a row, never reschedule it. But their policy of “one failure and you are done” is foolish.
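One hedged workaround sketch for the chained-timer case, assuming a hypothetical `tick` handler: re-arm the timer before doing any work, so a crash inside the work itself can’t break the chain (this does not help if the platform drops the execution entirely):

```groovy
def tick() {
    runIn(300, tick)   // re-arm FIRST, before any work runs
    try {
        doWork()       // hypothetical work method
    } catch (e) {
        log.error "tick failed: ${e}"
        // the next run is already scheduled, so the chain survives
    }
}
```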


This is so ridiculous. It would help if there were something we could actually do to our SmartApps to keep the scheduled jobs from failing, but ST isn’t even telling us why they fail in the first place. No error message or anything. As far as we can tell, something is blowing up on their backend, and the lazy move is to just cancel everything that missed because of that failure.


That sounds right. And of course the one failure can be a 20 second overrun due to a cloud delay. So they kill off “failing” apps, which failed because the cloud didn’t keep up with the event load, and then it’s bye bye.

I have had the chained timer type of failure, in fact, it’s reproducible with a pretty high failure rate.

I get the impression that their load management design is flawed. Something about x number of hubs on a “node”, where the node is handling the event traffic to/from those hubs. Then if there is a surge in event rates beyond some threshold, things just back up in some queue, and some die from timeouts. That’s just an impression, based on looking closely at numerous fails. They don’t seem to have any graceful load handling ability, but I don’t know.


Here’s the code I’m using:

I think you need Pollster for Pollster.

Check out the IFTTT Maker Channel. You could use it as a pacemaker service to kick-start your scheduled jobs via an endpoint.
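A sketch of what that pacemaker endpoint might look like on the SmartApp side (requires OAuth to be enabled for the app; the `/kick` path and handler names are hypothetical):

```groovy
mappings {
    path("/kick") {
        action: [GET: "kick"]
    }
}

def kick() {
    unschedule()             // clear any dead timers
    runEvery5Minutes(tick)   // re-arm the schedule
    [status: "rescheduled"]  // returned to the caller
}
```

An external cron job, a Raspberry Pi, or the Maker Channel would then just hit that URL with the app’s access token on whatever heartbeat you choose.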


Pollster was designed with its own watchdog. I forget how it works.

This vein of the conversation is why I like to use events instead of times or timers. One failure is only one failure. It can’t cause others.

I also still use IFTTT to back up the two things that use time in my setup, sunrise and sunset. It has not failed once on me.
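For contrast, the event-driven style mentioned above might look like this sketch (`motionSensor` is a hypothetical device input, not from the thread):

```groovy
def installed() {
    // React to device events instead of wall-clock timers.
    subscribe(motionSensor, "motion.active", motionHandler)
}

def motionHandler(evt) {
    log.debug "motion event at ${evt.date}"
    // A missed event affects only that one occurrence; there is
    // no self-scheduling chain that a single failure can break.
}
```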


This is a neat solution to manually restart the timers, but it’s not really practical if most of your apps depend on timers (each app would have its own token, hence multiple bookmarks). However, could this be integrated with IFTTT, and somehow have IFTTT automatically poll ST every 5 minutes?