Scheduled jobs failing (again) (again 😄) (Ongoing Known Issue)

Please message me with any details for your failures.

I have had two scheduling failures in the last two days. My alarm is supposed to turn off at 7:20 from CoRE and normally works flawlessly, but for the last two days I have had to “kick” the piston once it was 15 minutes past due.

Also worth noting: last night things seemed very slow - lots of red banner errors and login slowness. I also saw random app timeouts in CoRE while trying to play with the new simple pistons.

Something's definitely up. Red banners all over the Android app since last night. Routines not fully executing. I manually executed this morning to get things going and it took 4 tries to fully complete.

Same here, good morning routine didn’t set home mode, or turn off the alarm this morning.

Same, either my mode did not change or SHM did not disarm / have sync status between hub cloud or all of the above.

Super slow last night at about 10 PDT. Had to execute the goodnight routine three times before it completed fully.

I’ve noticed the app has been REALLY slow the past few days when pulling up devices. I also had an event scheduled via Rule Machine which didn’t run last night as scheduled.

Last night and this morning I’ve had a ton of red banners, unexpected errors, etc. My hub also went offline last night around 1:27am and came back online around 2am, though I didn’t lose internet at that time. I’ve noticed presence has been wonky the past couple of days. It fires but takes a few minutes; I even set off my intrusion detection Wednesday.

Something is definitely up. I’m going to be opening a ticket later on today to have it investigated.

Thanks

Thread added to the bug reports list in the community-created wiki for October:

http://thingsthataresmart.wiki/index.php?title=Bug:_First_Reports

Please continue to report individual account issues to support. The Wiki list is just for community information, it is not monitored by SmartThings staff.

Same here, my mode is still in night mode. @vlad seems like a flare-up of hemorrhoids that doesn’t want to go away.

Looking now - glancing at the monitors things look healthy… Any info to support with:

  1. Shard
  2. Expected exec time
  3. SmartApp name
  4. What happened (Complete miss, timeout, partial execution)
    Would be very helpful.

NA01
7:21AM EST
Core - Disarm_Alarm_Weekday piston
complete miss - had to kick piston to get it to run

That has happened last two mornings.

Also yesterday when my wife got home around 6:45pm the alarm did not disarm but it did know she was home as I got a text so that isn’t scheduling related but maybe a lot state in core or load?

Guessing NA01, when I go to log in I stay on graph.api,
Sunrise
Good Morning
Complete miss, house still in night mode. Set night mode by hand as always. At sunrise stock routine good morning is supposed to set home mode.

Ticket 265509 opened for it.

For me, I can’t look into the hub or anything because I still get server 500s from the webpage. Looking at the weather station, it did say it executed sunrise. Though Good Morning says lastTime is 10-13, so yesterday.

Not sure if the web errors are actually an indication
Oh No! Something Went Wrong!
Error
500: Internal Server Error
URI
/hub/show/xxxxxxxxxxxxxxxxxxxxxxxxxxxx (just in case that’s unique :wink: )
Reference Id
2d9b64c9-7584-4daf-a065-6799ec0d4fc6
Date
Fri Oct 14 14:16:27 UTC 2016

The first few we’ve spot-checked are database-related event save failures - in CoRE’s case it’s happening when it calls (either in timeHandler or recoveryHandler):

sendLocationEvent

Don’t think the failures are limited to scheduler related at this point.

It may not be limited to the scheduler part of the platform, but from the customer’s point of view it’s the same end result: a routine/smartapp that was scheduled to run, didn’t. :disappointed_relieved:

So how should people report these problems?

“Related to” was a bad choice of words - changed the wording to “not limited to scheduler”. You should still contact support when you see a failure.

@vlad get ready for a long list of stuff that happened:

This morning, at 7:00 am, my Good Morning routine was supposed to happen. My notifications show “Good morning” but none of the light changes that are supposed to take place with it happened. It did not set my alarm to “unarmed” and it did not change my mode to “Home” from “Night.”

at 7:35am~ish, kitchen door opens, sets off alarm since alarm still set.

at 7:40 I manually clear the alarm.

at 7:40 also, I manually click routine “Good Morning.” Nothing happens. No lights change. Cannot change alarm mode.

at 8:55 kitchen door intrusion detected since alarm still set. Cannot disable alarm or dismiss the alert because the page doesn’t even load on my Android SmartThings app.

at 9:55 I get a reminder that there’s been an intrusion. Still can’t clear it, page won’t load.

at 10:55 I get a reminder that there’s been an intrusion. Still can’t clear it, page won’t load.

at 11:55 I get a reminder that there’s been an intrusion. Still can’t clear it, page won’t load.

at 12:55 I get a reminder that there’s been an intrusion. Page loads, get an error trying to clear the alarm. Trying to refresh the page to see if alarm is cleared, aaaaand page won’t load again.

At this point, it’s unreliable enough that my friends who I talked into getting SmartThings are doubting the usefulness of this product.

It somewhat amazes me that we find new ways to break scheduling and that things like this aren’t monitored. It’s not always just the health of the machines/OS running but the details in the DB.

I would have thought a growing number of failures would instantly set off some red flags.

At what point does this become proactive vs reactive, when people are already annoyed and complaining?

Yesterday (October 13) a SmartApp which had been running for 2 years broke down.
I tried to execute it from the IDE, but got obscure error messages (2 different).
I contacted support, with a strong suspicion of some SmartThings cloud overload, but once they saw “custom SmartApp”, they declared it was not their problem!
I posted my problem in this other thread: Cassandra timeout during read query?, since one of the 2 error codes I got was a “Cassandra timeout”.
Interestingly (?!!), up to now, this periodic SmartApp would often fail to schedule (it runs every 5 days), but when it did, it executed properly. Now it schedules… and aborts!
Not sure I would label that as “progress”… :weary:

Now my system didn’t go to ‘home’ when I arrived home. Left, it went to away, and armed itself with SHM. Got back, opened garage door, boom, intrusion detected. Thanks ST… basic geolocation and firing of routines not working. Had been working basically forever…