Loss of State affecting some users / some SmartApps (was Death knell for RM today?)

The silence is indeed deafening.

Also, has Alex given up on us all now?

The reality is, effective leadership would not have let it get to this point.

I sincerely want ST to succeed, but it will never happen under current leadership, no matter how highly one may think of ST and no matter how stable one's own system has been. This is objectively bad leadership and, in my opinion, the reason ST cannot realize its potential.

Even putting aside the failure to respond meaningfully to the community about recent outages and lingering concerns, there is simply the fact that the C-suite consistently fails to fulfill its explicit commitments. This is, of course, magnified by the latter: there is no worse time to be absent than when there is palpable fear, uncertainty and doubt, driven by recent failures, in the community upon which your entire ecosystem depends.

Leaders… well, they lead. They step out in front, especially when there are issues that their staff are taking heat on.

Leadership has left their staff hanging out to dry, time after time.

4 Likes

Leadership? What leadership?

1 Like

How would it help you to know that they have made no progress since his last post and are still struggling to push the upgrade he announced last time?!

1 Like

Good question.

I think humility and transparency would help, for one, the fine folks who put their blood, sweat, and tears into making ST happen.

2 Likes

We were promised transparency. I interpreted that as sharing information, not as ST leadership being transparent. :grin:

5 Likes

Things are getting so bad that even using an event to trigger recovery from a timed event failure is now failing. I can no longer keep a timed event going for more than two hours without a failure.
I'm detecting several dozen timed event failures per day. I'd put it on par with the springtime fiasco, when SmartThings took many weeks and eventually expanded their capacity. It's just like last time, when many of us started losing our rules, followed by massive timed event failures.

Not looking good folks, at least for me.

2 Likes

Seeing the same problems. I'm fixing some things in CoRE's recovery… at the very least, making it kick all pistons when the play button is clicked or when a recovery stage is reached. Other device events will only kick pistons that have subscribed for kicking (i.e., when a piston is executed, it tells CoRE that it ran and when it next expects to run, if at all, giving CoRE a chance to check whether any pending piston hasn't run at the right time). CoRE should then kick those pistons, making their recovery faster than the 1/2/3-hour first-stage recovery kicks. At least in theory…
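To make the "fast recovery" idea above concrete, here is a minimal sketch in Python (CoRE itself is a Groovy SmartApp, and this is not its actual code). It assumes each piston reports its next expected run time after executing, and that any later event triggers a check for pistons whose due time has passed. The class and method names (`PistonRegistry`, `report_run`, `on_event`) and the 60-second grace period are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PistonRegistry:
    """Hypothetical bookkeeping for the "fast recovery" idea (not CoRE's real code).

    Each piston reports when it ran and when it next expects to run; any later
    event gives the registry a chance to kick pistons that are overdue.
    """
    grace_seconds: float = 60.0  # assumed tolerance before a run counts as missed
    next_due: Dict[str, Optional[float]] = field(default_factory=dict)

    def report_run(self, piston_id: str, next_expected: Optional[float]) -> None:
        # Called by a piston after it executes; None means no timed run is pending.
        self.next_due[piston_id] = next_expected

    def overdue(self, now: float) -> List[str]:
        # Pistons whose expected run time (plus grace) has passed without a new report.
        return sorted(
            pid for pid, due in self.next_due.items()
            if due is not None and now > due + self.grace_seconds
        )

    def on_event(self, now: float, kick: Callable[[str], None]) -> List[str]:
        # Any incoming event piggybacks a check and kicks the stragglers.
        # A kicked piston is expected to call report_run again once it runs.
        stale = self.overdue(now)
        for pid in stale:
            kick(pid)
        return stale


if __name__ == "__main__":
    registry = PistonRegistry()
    registry.report_run("porch_lights", next_expected=1000.0)
    registry.report_run("heartbeat", next_expected=5000.0)
    kicked: List[str] = []
    registry.on_event(now=1100.0, kick=kicked.append)
    print(kicked)  # prints ['porch_lights']
```

The point of piggybacking on ordinary events is that a missed schedule gets noticed as soon as anything else happens in the house, rather than waiting for the slower hourly recovery stages.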

3 Likes

Please send a note to support and DM me the ticket number. We are flagging these types of failures for engineering to evaluate individually (I am trying to catch them before they hit the front-line agents).

I think my problem was with the Weather Underground service; it times out once in a while. I also see large delays, 2-3 seconds per command, when sending commands to Hue Bridge-connected devices. Still looking into it; I'm not sure these are pure "time schedule failures".

1 Like

I also experience a problem with Weather Underground, so to eliminate that variable I created a piston that does nothing but turn a switch on every 5 minutes and off 3 minutes later. A very simple task, and yet I am unable to keep it running for more than a few hours. Last week the same piston would run for a day. Whatever it is, it's in the platform and it's going downhill.
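A heartbeat piston like the one described above makes failures easy to spot in the logs. As a rough sketch, assuming you can export the times (in minutes) at which the switch turned on and off, the Python below flags gaps where an "on" was skipped and "on" events with no matching "off" about 3 minutes later. The function name, the log format and the 1-minute tolerance are all hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Problem = Union[Tuple[str, float, float], Tuple[str, float]]


def check_heartbeat(on_times: Sequence[float], off_times: Sequence[float],
                    period: float = 5.0, off_delay: float = 3.0,
                    tolerance: float = 1.0) -> List[Problem]:
    """Hypothetical checker for an on-every-5-min / off-3-min-later test piston.

    Returns ("missed_on", prev, next) for gaps longer than period + tolerance,
    and ("missed_off", on_time) for "on" events with no "off" near on + off_delay.
    """
    problems: List[Problem] = []
    ons = sorted(on_times)
    offs = sorted(off_times)
    # Consecutive "on" events should be roughly `period` minutes apart.
    for prev, nxt in zip(ons, ons[1:]):
        if nxt - prev > period + tolerance:
            problems.append(("missed_on", prev, nxt))
    # Each "on" should be followed by an "off" about `off_delay` minutes later.
    for t in ons:
        expected = t + off_delay
        if not any(abs(o - expected) <= tolerance for o in offs):
            problems.append(("missed_off", t))
    return problems


if __name__ == "__main__":
    # The run at 15 never happened, and the "off" after 10 was skipped.
    print(check_heartbeat([0, 5, 10, 20], [3, 8, 23]))
```

Counting these per day gives a simple, repeatable number to include when reporting timed event failures to support.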

Can you please update to the latest CoRE? I fixed some problems with the recovery mechanism; it should kick-start the piston once in a while, so at least it resumes at some point…

I contacted support with a brief outline of what has been going on.

I will do that. I had disabled the recovery option in the main CoRE app; I assume I should turn it back on?

The “fast recovery” is always-on. The two stage recovery is optional. Try it either way. I explained what the “fast recovery” does in an earlier post.

Yeah, watching the logs a bit today, I see that the longer-running apps are getting killed. Initialstate, Nest Manager, and Weather are all having occasional failures.

The real issue here is recovery after a failure. The platform is sometimes unable to recover after a timed event does not run. CoRE is the shining star in this regard; some other apps recover well (Nest Manager, Initialstate), but the stock apps (SHM, Routines) seem to be the worst at this.

1 Like

Revive…

Since Wednesday, 2 Nov 2016, I have seen:

  • at least 5 of my CoRE pistons losing state information
  • multiple Smart Lighting app failures
  • slow or no response from Zigbee-triggered devices

At least two other community members reporting the same thing this morning. :disappointed_relieved:

@aaron @slagle

1 Like

8 Likes

Please email support with the account information, name of the failed/missing SmartApps, and when you noticed this occur. If you DM the ticket numbers, I’ll flag them for investigation.

Addendum: I am not seeing or hearing of trending reports on this. We will look into the reports right away.

2 Likes