Scheduled jobs failing (again) (again 😄) (Ongoing Known Issue)

My time-triggered morning routine also did not run today. Home remained in Night mode until I manually ran the Good Morning routine.

My scheduled good morning did not run either. I hopped into a COLD shower which was a huge surprise. I kicked off good morning via the app and everything seems fine now.

Has it already been about 45 days since the last failure? I am beginning to sense a pattern? Hmmmmmm. #notsosmartthings

1 Like

Same here this morning

Glad it wasn’t just me. Part of my Good Morning routine did not fire and also my Pollster app is not polling one of my devices as it should every 10 minutes.

Given that this user had an issue 4 hours ago, and my Good Morning routine, which was supposed to run at sunrise 2 hours ago, also failed, this suggests there is 100% a pattern. Did you guys figure out why it was not working for at least 2 hours? Did your systems at least alert you that they failed to run jobs on time?

1 Like

Yep my routines didn’t fire and lights didn’t turn on. I started another thread which got merged into this one. Just dropping this here for good measure: I did open a support ticket, so it’s reported.

What didn’t work:

Good morning did not run at 5:59am
- Morning Wake-up did not kick off (Lamp dims up in brightness)
- Hallway lights did not turn off at 6:30am
- Livingroom lamp did not turn on; Had to turn on manually (and off)
- Manually turned on bedroom lamp (and off)
- Sonos turned on when door opened (as it should’ve) then turned off 1 minute later (should’ve stayed on)

What did work:
Presence.

1 Like

No “Good Morning” this morning here either.

Mine failed as well this morning. But everything on the Status page shows a nice green color (Operational) except for Device Control (Degraded Performance).

I’ve been experiencing the IFTTT Maker fails since Wednesday. Once in a while they will go through but mostly not.

I noticed that also. Specifically the routine to turn off security didn’t run two days in a row.

And I fired off an email to support and have received no reply back.

A rundown of what is going on -

Yesterday around 12:00 CDT we started seeing crashes in the Scheduler cluster. They seem to be happening at a fairly regular interval of about every 2 hours. It looks like the errors are caused by the servers running out of memory. The JVMs on these servers are tuned to have additional memory available (as standard), but something is causing the JVM to attempt to allocate additional memory when none is available on the system (this should never happen). The current theory is that this is off-heap memory. This leads to a fatal crash of the JVM with a malloc error.

We haven’t encountered these types of errors before and had to create some new monitoring rules to react before the servers crash. These rules weren’t configured correctly, which resulted in the scheduler cluster being severely under-provisioned from 01:00 to 06:00 CDT, which is why many scheduled executions failed to run. (Yes, we have other metrics available that would have signaled a problem, but we don’t normally use them for alerts.)
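For anyone wondering what an alert on missed scheduled executions could look like, here is a minimal sketch in Python. It is purely illustrative and not SmartThings code: `ScheduledJob`, `find_missed_jobs`, and the 5-minute grace period are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ScheduledJob:
    """Hypothetical record of one scheduled execution."""
    name: str
    due_at: datetime                    # when the job was supposed to run
    ran_at: Optional[datetime] = None   # when it actually ran, if it did


def find_missed_jobs(
    jobs: List[ScheduledJob],
    now: datetime,
    grace: timedelta = timedelta(minutes=5),  # assumed tolerance, not a real setting
) -> List[ScheduledJob]:
    """Return jobs that never ran, or ran later than due_at + grace."""
    missed = []
    for job in jobs:
        deadline = job.due_at + grace
        if job.ran_at is None:
            # Still pending: only count it as missed once the deadline has passed.
            if now > deadline:
                missed.append(job)
        elif job.ran_at > deadline:
            # Ran, but too late to count as on time.
            missed.append(job)
    return missed
```

A check like this could run every few minutes and raise an alert when the number of missed jobs crosses a threshold, independently of whether the scheduler servers themselves look healthy.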

I’ll provide an update when I get more info. The dumps we’re looking at are ~15 GB in size, so identifying the root cause will take some time. Schedules should be performing better than they did overnight, but expect some rockiness as servers are constantly being replaced to keep up with the crashes. I will also ask to get the status page updated for last night and for continued degraded performance until we figure out what is causing these crashes.

19 Likes

Was up until 4am working on SmartTiles V6 this “morning” and then noticed that the garden lights had not turned off at the usual scheduled time of 11:25pm (Smart Lighting, Hub V1).

Thought I might have accidentally clicked them on, but I have a second set of lights on similar schedule which were also on.

Scheduler obviously broke overnight.

I’m not inclined to write Support for what is obviously a known and widespread issue.

3 Likes

Had a routine fail to work this morning also.

Good Night routine did not execute the past two nights and this morning, Good Morning did not execute either. Good Night is time driven, Good Morning is sunrise driven. I changed times for Good Night but that did not resolve it. I don’t have the patience or stamina to deal with the busy work that Support requires and at the end hearing “too bad, deal with it, we have more important issues”. Of course, they say it much nicer but that’s the gist of it so I give up and just hope it gets fixed.

There are two topics on this subject now.

I had a routine and a piston fail so far today.

I am hoping to start a trend. I edited the title of this thread to [KNOWN ISSUE]. When it’s fixed, someone should change it to [RESOLVED]

5 Likes

Thanks for the update regarding this situation. Hope it gets sorted quickly.

From where I sit, things didn’t “just break.” I have the sense scheduled stuff began failing more frequently over the last week or two and this is on my two geographically diverse hubs. These are the same issues I’ve had for many months: a twice-per-day fan automation at my second home and smart lighting automations here at the primary residence both randomly failing. Support and @Aaron have looked at them on numerous occasions, with no definitive improvement. Gotta give them both kudos for corresponding with me and I hope I was able to contribute something meaningful to their troubleshooting efforts.


@vlad kindly chimed in, above, and his response serves to calm some of us “propeller heads” who try to guess what is going on. Thanks, Vlad. The information you shared is appreciated.

As one of those old Grey Beards who deployed a lot of embedded software over the years for a number of mission-critical applications, both military and commercial, I have only the following to offer:

  1. There is a major problem in the ST architecture: cloud, primarily, but also the hub firmware. I am hopeful the reason we’re not seeing all kinds of whiz-bang features being added and support for every device y’all are clamoring for is that the bulk of the engineering team is feverishly working on “fixing” the underpinnings of this product to bring about dependable, reliable operation. There were (probably) a bunch of very painful lessons learned about reliability, scalability, and defensive programming we will never hear of. There is nothing official I’ve heard to support this, just my hunch.

  2. There is a cultural shift going on at ST. Not unusual following an acquisition. New management, new & departing employees, and new objectives from the Mother Ship. The sudden absence of @alex from the forums, despite his promise of weekly reports, has gone on so long that I’m convinced he is “on special assignment” or “spending more time with family” or something like that. Samsung is probably just not ready to announce it yet.

  3. This stuff is still cool. I’m anxious to get on with further additions to my two locations! :grin:

5 Likes

Well... At least he has “The Smartest Home in America”:

( http://video-api.wsj.com/api-video/player/iframe.html?guid=7A2BC378-F7BC-4A42-BCBB-1C830F82174B )

3 Likes