SmartThings Community

Scheduled jobs failing (again) (again 😥) (Ongoing Known Issue)

Super slow last night around 10 PDT. Had to execute the Good Night routine three times before it completed fully.

I’ve noticed the app has been REALLY slow the past few days when pulling up devices. I also had an event scheduled via Rule Machine which didn’t run last night as scheduled.

Last night and this morning I’ve had a ton of red banners, unexpected errors, etc. My hub also went offline last night around 1:27am and came back online around 2am, though I didn’t lose internet at that time. I’ve noticed presence has been wonky the past couple of days. It fires but takes a few minutes; I even set off my intrusion detection Wednesday.

Something is definitely up. I’m going to be opening a ticket later on today to have it investigated.

Thanks

Thread added to the bug reports list in the community-created wiki for October:

http://thingsthataresmart.wiki/index.php?title=Bug:_First_Reports

Please continue to report individual account issues to support. The wiki list is just for community information; it is not monitored by SmartThings staff.


Same here, my mode is still in night mode. @vlad seems like a flare-up of hemorrhoids that doesn’t want to go away.


Looking now - glancing at the monitors, things look healthy… Sending the following info to support would be very helpful:

  1. Shard
  2. Expected exec time
  3. SmartApp name
  4. What happened (complete miss, timeout, partial execution)

NA01
7:21AM EST
CoRE - Disarm_Alarm_Weekday piston
complete miss - had to kick piston to get it to run

That has happened the last two mornings.

Also, yesterday when my wife got home around 6:45pm the alarm did not disarm, but it did know she was home since I got a text. So that isn’t scheduling related, but maybe a lost state in CoRE, or load?


Guessing NA01; when I go to log in I stay on graph.api.
Sunrise
Good Morning
Complete miss, house still in night mode. I set night mode by hand as always. At sunrise the stock Good Morning routine is supposed to set Home mode.

Ticket 265509 opened for it.

I can’t look into the hub or anything because I still get server 500s from the web page. Looking at the weather station, it did say it executed sunrise, though Good Morning says lastTime is 10-13, so yesterday.

Not sure if the web errors are actually an indication:
Oh No! Something Went Wrong!
Error
500: Internal Server Error
URI
/hub/show/xxxxxxxxxxxxxxxxxxxxxxxxxxxx (just in case that’s unique :wink: )
Reference Id
2d9b64c9-7584-4daf-a065-6799ec0d4fc6
Date
Fri Oct 14 14:16:27 UTC 2016


The first few we’ve spot-checked are database-related event save failures. In CoRE’s case it’s happening when it calls (either in timeHandler or recoveryHandler):

sendLocationEvent

Don’t think the failures are limited to scheduler related at this point.
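
To illustrate why an event save failure can look like a missed schedule from the user's side, here is a minimal sketch in Python. It is not CoRE or platform code: `send_location_event`, `time_handler`, `run_scheduled_job`, and the `db_healthy` flag are hypothetical stand-ins, and the ordering of the event call before the actions is an assumption made for the example.

```python
class EventSaveError(Exception):
    """Hypothetical stand-in for a database failure while persisting an event."""


def send_location_event(name, value, db_healthy):
    # Hypothetical stand-in for the platform call; raises when the event store is unhealthy.
    if not db_healthy:
        raise EventSaveError(f"could not save event {name}={value}")
    return {"name": name, "value": value}


def time_handler(db_healthy):
    """Scheduled handler: publishes an event first, then performs its actions (assumed order)."""
    actions = []
    send_location_event("piston", "run", db_healthy)  # raises here on a save failure
    actions.append("disarm alarm")
    actions.append("set mode Home")
    return actions


def run_scheduled_job(db_healthy):
    # The job is triggered on time in both cases; only the outcome differs.
    try:
        return ("completed", time_handler(db_healthy))
    except EventSaveError as err:
        return ("failed", str(err))
```

In this sketch the job starts on schedule either way, but when the event save fails none of the later actions run, which would be indistinguishable from a complete miss for the person waiting on the routine.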

It may not be limited to the scheduler part of the platform, but from the customer’s point of view it’s the same end result: a routine/smartapp that was scheduled to run, didn’t. :disappointed_relieved:

So how should people report these problems?

“Related to” was a bad choice of words - changed the wording to “not limited to scheduler”. You should still contact support when you see a failure.


@vlad get ready for a long list of stuff that happened:

This morning, at 7:00 am, my Good Morning routine was supposed to run. My notifications show “Good Morning” but none of the light changes that are supposed to take place with it happened. It did not set my alarm to “unarmed” and it did not change my mode from “Night” to “Home.”

At 7:35am-ish, kitchen door opens and sets off the alarm, since the alarm is still set.

At 7:40 I manually clear the alarm.

Also at 7:40, I manually tap the “Good Morning” routine. Nothing happens. No lights change. Cannot change alarm mode.

At 8:55, kitchen door intrusion detected, since the alarm is still set. Cannot disable the alarm or dismiss the alert because the page doesn’t even load in my Android SmartThings app.

At 9:55 I get a reminder that there’s been an intrusion. Still can’t clear it, page won’t load.

At 10:55 I get a reminder that there’s been an intrusion. Still can’t clear it, page won’t load.

At 11:55 I get a reminder that there’s been an intrusion. Still can’t clear it, page won’t load.

At 12:55 I get a reminder that there’s been an intrusion. Page loads, but I get an error trying to clear the alarm. I try to refresh the page to see if the alarm is cleared, aaaaand the page won’t load again.

At this point, it’s unreliable enough that my friends who I talked into getting SmartThings are doubting the usefulness of this product.


It somewhat amazes me that we find new ways to break scheduling and that things like this aren’t monitored. It’s not always just the health of the machines/OS running but the details in the DB.

I would have thought a growing number of failures would instantly set off some red flags.

At what point does this become proactive vs. reactive, when people are already annoyed and complaining?


Yesterday (October 13) a SmartApp which had been running for 2 years broke down.
I tried to execute it from the IDE, but got obscure error messages (2 different ones).
I contacted support, with a strong suspicion of some SmartThings cloud overload, but once they saw “custom SmartApp”, they declared it was not their problem!
I posted my problem in this other thread: “Cassandra timeout during read query?”, since one of the 2 error codes I got was a “Cassandra timeout”.
Interestingly (?!!), up to now, this periodic SmartApp would often fail to schedule (it runs every 5 days), but when it did schedule, it executed properly. Now it schedules… and aborts!
Not sure I would label that as “progress”… :weary:

Now my system didn’t go to “Home” when I arrived home. I left, it went to Away and armed itself with SHM. Got back, opened the garage door, boom, intrusion detected. Thanks ST… basic geolocation and firing of routines not working. Had been working basically forever…

Is this the acknowledgment of this issue? From the status page:

Investigating - Some North American users may be experiencing issues with loading resources in the mobile app and web UI, arming/disarming Smart Home Monitor, and the execution of SmartApps. The engineering team is working on the issue and we will provide an update shortly.
Oct 14, 14:19 EDT

If so, can someone articulate what is causing this?

Really annoying… this morning…

  1. Mode and/or SHM status did not change when the CoRE piston executed. Re-ran the piston several times and that did not fix it.
  2. Got a notification that the hub went offline. If it did, it was the ST hub or the cloud, not my internet. Amazingly, the app showed the hub was online (live status) and turning on a light with the app worked.
  3. Lights took a very, very, very long time to turn on in response to motion (Smart Lighting), so I manually rebooted the hub.

Sadly, this ended a good 75 day run I had with basically no issues.

This establishes again that ST can’t be relied upon for important functions. A huge reminder: it’s entirely possible the system will not inform me when an intrusion occurs even if every other required support system is up and running (power, internet, etc.), and perhaps even worse, it could set off a false intrusion and siren and disturb or scare the family.

ELI5?
:blush:


Same, @JH1. I have seemingly been very stable; even when other people were complaining, my setup had been very reliable since finding the Cree/Osram bulb bug. Switching most of my bulbs to the Hue Bridge had made my home dang near 100% for months now, easily 3 months. The last few days things have been super slow, aka Hue delay. And then starting last night, tons of failures. Routines not firing, stock basic automations not working or partially working. Yet the status page is “investigating”. I love the spin on it: don’t say we have a problem… Hell, the IDE has been unusable for me… Can’t view the hub or events, and other parts of the page just give 500s. But hey, let’s investigate.


While I am both curious and of course annoyed by additional outages, to put it in perspective, if all I endure is what happened this morning I will survive. Relatively speaking, that is, compared to some of the outages and degradation I have been through.

Investigating is good; acknowledging there is an issue and seeking to understand the cause is good. Let’s see if the discovered cause is communicated clearly and if the issue remains on the status page until truly resolved…

Litmus test…

@bamarayne @bridaus

Report Status!


About a week ago my mode didn’t change once… and I believe it was ST. That’s the only problem I’ve had in weeks (months maybe?). I have a thermostat that was acting wonky but I’m pretty sure it wasn’t ST because my other two were fine.


@bridaus
Try going to the IDE and clicking on Hubs… I get:
Oh No! Something Went Wrong!
Error
500: Internal Server Error

I also have issues going to a Thing in the mobile app and viewing the device logs in the Recently tab… takes forever, fails, etc…

Check it out, let us know results…