Yea - one of the issues was "fixed" by replacing 3 nodes in our events cluster for na01. These are read timeouts from that cluster. Reducing those should have helped with the UI, IDE, smartapp executions, etc.
The other issue that we're seeing is timeouts for saving events:
It doesn't look like the replacement fixed the issue here, as the timeouts are still elevated and follow a cyclical pattern. (Can see other Cassandra metrics trending upwards again.) This mainly affects execution, and the chance of it occurring increases in proportion to the rate of created events per app execution. So while many smartapps/devices are firing fine, certain ones are hit more often (the ones that create more events). We have a change that can go out to make event creation async, which would help with executions, but that could have a number of unintended side effects and would change the behavior of a core part of the system. It would be much safer to figure out what happened in the last couple of days to cause these spikes.
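For context, a minimal sketch of what "making event creation async" would look like, assuming a blocking event store call today; `EventStore`, `Event`, and `AsyncEventWriter` are stand-in names, not the actual classes in our codebase:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch only: decoupling event persistence from the execution path.
public class AsyncEventWriter {

    interface EventStore {
        void save(Event event); // blocking Cassandra write today
    }

    record Event(String appId, String payload) {}

    private final EventStore store;
    private final ExecutorService writer = Executors.newFixedThreadPool(4);

    public AsyncEventWriter(EventStore store) {
        this.store = store;
    }

    // Current behavior: the app execution blocks until the write (and any
    // timeout/retry) completes, so event-heavy apps feel the spikes the most.
    public void saveSync(Event event) {
        store.save(event);
    }

    // Proposed behavior: hand the write off to a background pool and return
    // immediately. The execution no longer waits on Cassandra, but we lose the
    // guarantee that the event is durable before the execution continues --
    // one of the side effects that would need to be weighed.
    public void saveAsync(Event event) {
        writer.submit(() -> store.save(event));
    }
}
```

This is why the change helps executions but shifts risk elsewhere: failures and backpressure on the write path become invisible to the executing app unless we add monitoring or bounded queues around that pool.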