Outage Affecting STHM and some automations (19 Jan 2021)

That worked for me. Thanks.

I guess I want to know why? It will probably make me feel better to understand this. The Virtual switch type was working before and now it doesn’t. What got changed? Are we going to deal with this again when something prevents Simulated switches from working with STHM?

I can tell you what I had in mind when I suggested in another thread that I could see a failure mechanism that Virtual Switches have that Simulated Switches wouldn’t have. It’ll be a bit wordy as I want to cater for a broad audience.

Legacy Groovy device handlers (which the stock Virtual Switches and Simulated Switches are) set device attributes by sending an event. These events then get propagated to apps that have subscribed to them. The apps can choose to subscribe to certain values, e.g. on or off in the case of switches, but it is arguably more common to take what comes. Apps are often particularly interested in changes of state, so for a switch they want to know about changes from on to off or off to on. The legacy SmartThings platform agrees and by default only propagates the state changes, so apps only see on, off, on, off etc. That is how the Simulated Switch works.

Sometimes it is useful for apps to see every event regardless of whether it is a change or not. For example you might have a button used to turn a switch on and you might want to see every press on it. So you want to see on, on, on etc. This can be achieved with the isStateChange: true flag when sending events. This means that the event should be considered as a change of state regardless of whether the value has changed. Apps can see off, on, on, off, on, off, off, on etc. For whatever reason, this is how the Virtual Switch was written.
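
The difference between the two handlers described above can be sketched as a toy Python model. This is not SmartThings code: `ToySwitch`, `force_state_change` and `record_events` are hypothetical names made up for illustration, and the only behaviour modelled is "propagate changes only" versus "treat every event as a change".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToySwitch:
    """Toy stand-in for a switch device handler (hypothetical, not the real DTH)."""
    force_state_change: bool = False  # True mimics sending isStateChange: true
    value: str = "off"
    subscribers: List[Callable[[str], None]] = field(default_factory=list)

    def send_event(self, new_value: str) -> None:
        changed = new_value != self.value
        self.value = new_value
        # Propagate real changes only, unless the handler forces the flag.
        if changed or self.force_state_change:
            for callback in self.subscribers:
                callback(new_value)


def record_events(switch: ToySwitch, commands: List[str]) -> List[str]:
    """Subscribe a simple recorder and return the events it was shown."""
    seen: List[str] = []
    switch.subscribers.append(seen.append)
    for command in commands:
        switch.send_event(command)
    return seen


commands = ["on", "on", "off", "off", "on"]
simulated_seen = record_events(ToySwitch(force_state_change=False), commands)
virtual_seen = record_events(ToySwitch(force_state_change=True), commands)
print(simulated_seen)  # ['on', 'off', 'on']
print(virtual_seen)    # ['on', 'on', 'off', 'off', 'on']
```

The "Simulated" style switch only shows subscribers the alternating on, off, on, while the "Virtual" style switch passes on every command, repeats included.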

Now let us consider the particular example of using six automations and three virtual switches to expose the STHM status (Security Mode) to third party apps. It basically distills down to three pairs of Automations that say, for example:

if
  Security Mode is disarmed
then
  turn Disarmed switch on
  turn Armed (Stay) switch off
  turn Armed (Away) switch off

and

if
  Disarmed switch is on
then
  Set Security Mode to disarmed

Bearing in mind that automations (however implemented) typically get triggered by changes of attribute state, what you have there is a potential infinite loop. Turning on a switch changes the mode, which turns on a switch, which changes the mode etc.

The user is entirely dependent on SmartThings stopping that infinite loop happening and we know it can do it as it has been doing so for a long time. So how does it do it?

Well if the user chose to use the Simulated Switch, or other custom handlers that behave the same, they have a result because if, for example, you set the Disarmed switch to on when it was already on you aren’t going to have the event propagated and it won’t start another Automation.

However lots of people use the Virtual Switch handler and that will happily pass on continuous on events. So now we are relying on SmartThings helping us out in other ways. For example, maybe you can’t repeatedly set the Security Mode to disarmed. Well anyone who has seen repeated notifications from the app knows that you can so that can’t be it.
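
As a rough illustration of that loop, here is a toy Python simulation of one pair of Automations. It rests on stated assumptions rather than on knowledge of the platform: setting the Security Mode to the value it already has is assumed to raise an event every time, and `run_sthm_fudge`, `force_state_change` and `max_events` are hypothetical names (the cap exists only so the sketch terminates).

```python
def run_sthm_fudge(force_state_change: bool, max_events: int = 20) -> int:
    """Count events handled before things settle or the safety cap is hit."""
    switch_value = "off"           # the Disarmed switch
    queue = ["mode:disarmed"]      # the user sets Security Mode to disarmed
    handled = 0
    while queue and handled < max_events:
        event = queue.pop(0)
        handled += 1
        if event == "mode:disarmed":
            # Automation 1: Security Mode is disarmed -> turn Disarmed switch on.
            changed = switch_value != "on"
            switch_value = "on"
            if changed or force_state_change:
                queue.append("switch:on")
        elif event == "switch:on":
            # Automation 2: Disarmed switch is on -> set Security Mode to disarmed.
            # Assumption: repeating the same mode still produces an event.
            queue.append("mode:disarmed")
    return handled


simulated_count = run_sthm_fudge(force_state_change=False)
virtual_count = run_sthm_fudge(force_state_change=True)
print(simulated_count)  # 3: the second "on" is swallowed, so the loop stops
print(virtual_count)    # 20: only the safety cap stops it
```

In this model the change-only switch settles after three events, whereas the switch that forces every event to count as a change keeps going until the cap.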

So what is it? Well to be honest I don’t know for sure. However the Automations aren’t legacy apps, and new integrations are working with a different API. I seem to remember seeing the state change flag on subscriptions. So maybe they can choose to only subscribe to state changes. That would explain why things worked OK.
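
If the filtering did happen on the subscription side, it might look something like the following toy Python sketch. This is purely a guess at the shape of such a mechanism: `ToySubscription` and `state_change_only` are hypothetical and do not correspond to any documented SmartThings API.

```python
from typing import List, Optional


class ToySubscription:
    """Hypothetical subscriber that can ask for state changes only."""

    def __init__(self, state_change_only: bool) -> None:
        self.state_change_only = state_change_only
        self.last_value: Optional[str] = None
        self.delivered: List[str] = []

    def offer(self, value: str) -> None:
        # The device sends everything; the subscription decides what to keep.
        is_change = value != self.last_value
        self.last_value = value
        if is_change or not self.state_change_only:
            self.delivered.append(value)


events = ["on", "on", "on", "off"]
filtered = ToySubscription(state_change_only=True)
unfiltered = ToySubscription(state_change_only=False)
for value in events:
    filtered.offer(value)
    unfiltered.offer(value)
print(filtered.delivered)    # ['on', 'off']
print(unfiltered.delivered)  # ['on', 'on', 'on', 'off']
```

With the filter on, even a device that reports every event cannot retrigger an Automation with a repeated value; with it off, every repeat gets through.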

So what has changed? Well apparently Automations and Scenes have been reengineered as front ends for rules in the Rules API. So is the new implementation of Automations also only subscribing to state changes, or is it seeing everything? I don’t know, but if it has changed it would explain some apparent infinite looping issues with Virtual Switches.

Update: I had it in my head that the Automations and Scenes would still be entities in their own right that were effectively a front end for the Rules API. Things make more sense if they ARE rules and what we see in the apps is derived back from the rules. Indeed ST have already pretty much said that is what they are, I just didn’t completely grasp it.

All I know for sure is that a mechanism like I describe above could explain certain reported issues. It could be correct, it could be broadly along the right lines, or it could be complete nonsense. It was where I was coming from though.

Thanks, Graham.

Very clear explanation of a possible mechanism for the recent changes in behavior, thank you! :sunglasses:

@jody.albritton

Thanks Graham for a very good explanation.

My problem persists with either though. I’ve tried to change them from Virtual to Simulated but none of them work anymore.
And the location based arm/disarm doesn’t work either.
I’ve stopped the loops as mentioned above as all automations got altered and pre-conditions got removed.

Yes, there seem to be at least two issues:

  • Those using the six Automation / three Virtual Switch STHM fudge suffering from a loop turning on switches and the same Security Mode. Simulated Switches seem to help there.

  • I believe others are getting a loop between two Security Modes because their Automations that set the modes are dependent on a check of the Location Mode being a precondition to prevent those loops and the precondition setting got turned off. My Automations aren’t vulnerable to this.

Update: I’ve also now encountered a loop between Disarmed and Armed (Stay) that apparently is fixed by switching from Virtual Switch to Simulated Switch. I can’t envisage the failure mode there.

Further update: I am reliably informed that the Location Mode precondition was investigated and found to be purely a display issue. Have to say that is completely the opposite of the conclusion I came to based on a number of third party reports, but there you go. Lots going on that can’t be seen by the users.

I had to switch (pun intended) all my virtual switches to simulated switches then all switches started working as they should. With just one Virtual switch left none of the Simulated ones worked. Tested back and forth and could reproduce it… :astonished:

How do you set the security modes without pre-conditions @orangebucket?

Awesome. Thank you

As many do, I set my Security Modes based on the Location Mode changes and I do use checks on the current Location Mode in my Automations, all of which are preconditions.

Home → Night and Night → Home looks to have looping potential but I use fixed time conditions. So even if they are activated when I don’t want them to be they aren’t going to do anything and no loop will happen.

Similarly Home → Away and Away → Home are based on mutually exclusive presence conditions. They might do something I don’t particularly want them to once, but they aren’t going to loop.

However I haven’t covered all bases. If I wanted Night → Home based on someone arriving then I probably would be dependent on the precondition working. I currently leave that as an edge condition requiring manual control though.

This may help:

Yes, I thought this would fix it also. So frustrating.

Will switching them to simulated switches also fix it on my wall-mounted tablet? I’m not seeing a change when looking at it. I use an Amazon tablet with Fully Kiosk Browser along with SharpTools. Thank you for any info.

Last night I had an alert from Device Monitor that I had 10 devices not reporting any events, which should only happen for devices that don’t report any events for 24 hours. When I checked them in the Classic app, they showed offline, but all were actively showing events within half an hour of the notification. In the new app, several of them weren’t showing any history and briefly flashed offline, even though I was able to actively control them. Is this part of this existing outage or a new one?

Good information in this thread. I switched over to simulated switches from virtual switches for my 6 STHM automations AND my other 12 virtual switches (I changed all of them). Once all were changed over (and only then) they started to show as available for automations and scenes again.

Thanks

I also had the ‘No Security Sensor’ message in STHM, although the automations were still working. I changed all the virtual switches to simulated switches and the STHM is now working correctly again.
Thanks for all the good information in this thread

Curious. I wonder what was going on there. One thing that does concern me about the Virtual Switch is that it doesn’t have the Health Check capability. Once upon a time that might have seemed like ‘a good thing’ given the chaos device health seemed to cause, but now it feels rather like painting a target on it, or perhaps the reverse - giving it an invisibility cloak.

At first last week I had the looping thing going on a little bit, but I just did a few minor tweaks to those few Automations/Scenes and all of my Automations/Scenes with virtual switches are all working fine using the virtual switch device handler, including the Automations with the virtual switch that controls the STHM. I had encountered the looping/stateless issue before when I first migrated over to the new SmartThings app over a year ago, so my Automations were already pretty much loop/stateless virtual switch resistant.

Tried changing my switches to a virtual switch. Tried editing my automations. No go. My ActionTiles screens will not update the STHM status.

If you’re looking for the workaround to fix what broke a few weeks ago, you need to change them to the “simulated switch” DTH and not use the “virtual switch” DTH. The first has state and the second is stateless, which seems to be triggering a new issue.

Can you post a screenshot of the automation you were trying to use and indicate which if any of the devices are virtual and if they are virtual, which DTHs you are using? Then we might be able to help.

Switch status will not update on ActionTiles.