Critical error: status randomly flips back and forth on EVERY device!

For the last few days, devices are no longer properly reflecting/reporting their status. E.g., if I turn a switch off, the device will properly turn off, but when I look at the device, it will report as either on or off (seemingly randomly), with no relation to the actual status of the device.

This is the case both in the mobile App and through the web portal. For the latter, it happens when I refresh the page; see the three screenshots attached, taken within a minute of each other, showing the same device flipping between on, off, & on again - the physical device was just on the whole time.

This problem is affecting Zigbee, Matter, Z-Wave, LAN, and even virtual devices. I’ve tried physically restarting my hubs twice now, with no effect.

Needless to say, this is a VERY SERIOUS problem. I’d really appreciate any help or guidance anyone can provide.

Thanks!



Do you have Alexa?

Some more details would be helpful.
Which SmartThings hub do you have?
Is this affecting every one of your devices or only selected ones?
Are all these devices directly connected to your SmartThings hub? (I don’t know much about Matter devices.)
What are the specific device models?
I presume you are seeing this behaviour on the SmartThings App; can you confirm the behaviour on the advanced web app?
Just some thoughts.

Thanks for sparing a thought! To answer your questions, I have 2 Aeotec V3 hubs and the issue is affecting devices connected to both of them. It’s also affecting third-party WiFi devices that aren’t running through either of them.

And yeah I’m seeing the same behavior through the advanced web portal. That’s where those screenshots are from.

Have you noticed the timestamps though? The first and third are 10:57 but the middle one is 10:16.

So the API is not always returning the latest information for you.
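
For anyone who wants to check for this on their own devices, here is a minimal sketch of the comparison being made above: collect several status reads and flag any whose timestamp is older than one already seen. The sample data is hypothetical (the date is made up; the times mirror the three screenshots), and the `value`/`timestamp` field names are an assumption about how you store each read, not a confirmed API schema.

```python
from datetime import datetime


def find_stale_reads(reads):
    """Return the indexes of reads whose timestamp is older than one already seen."""
    stale = []
    newest = None
    for i, read in enumerate(reads):
        ts = datetime.fromisoformat(read["timestamp"])
        if newest is not None and ts < newest:
            # A later read carrying an earlier timestamp is out-of-date data.
            stale.append(i)
        else:
            newest = ts
    return stale


# Hypothetical samples mirroring the three screenshots (the date is invented).
reads = [
    {"value": "on", "timestamp": "2024-01-01T10:57:00"},
    {"value": "off", "timestamp": "2024-01-01T10:16:00"},
    {"value": "on", "timestamp": "2024-01-01T10:57:00"},
]

print(find_stale_reads(reads))  # [1]
```

Only the middle read is flagged, which matches the screenshots: the "off" state is an old value being served again, not a new event.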

Yeah, I noticed that too, but I have no idea what to do about it!

A screenshot from the Events section would be helpful.

And I have to ask again: do you have Alexa (connected)? I’m asking, because you’d not be the first with an issue like that…

I do not have Alexa.

And the event section just shows the actual events. I.e., the flipping back and forth isn’t reflected in it.

Hi, @jghoffer

It is strange that you’re seeing this behavior in Hub and Cloud connected devices.

Since we can see the driver logs for Hub Connected devices, which give us a better reference for how those events originate, please help me collect the following info:

  • Enable support access to your account:
    1. Confirm the email account registered in the forum is the same one you use for SmartThings. If not, please share it with me over DM
    2. Enable support access to your account:
       1. Go to the SmartThings Web (my.smartthings.com)
       2. Log in to your Samsung Account
       3. Select Menu (⋮) and choose Settings
       4. Toggle on Account Data Access
       5. Select the time period and confirm. In this step, please select “Until turned off”; once the team finishes, we’ll let you know so you can disable it again.

See more information about this access here: https://support.smartthings.com/hc/en-us/articles/36170233944852-Enabling-Account-Data-Access-for-Support

  • Choose a device to test the issue
  • Then, set up the SmartThings CLI (here are the instructions), then use the command below to start listening to the driver logs:
smartthings edge:drivers:logcat 
  • Start recording the ST app on your mobile device and send a command to the device so we can see how the status changes back and forth.
  • In the CLI logs, we’ll see if events are received and/or emitted comparing it to what you see in the app.
  • After the issue passes, submit the Hub logs:
    1. In the Advanced Users app, enter the “Hubs” section
    2. Enter the corresponding Hub and click on “Dump Hub logs”
    3. Confirm the process by clicking on “Dump Hub logs” again in the pop-up.
    4. You’ll get a green box at the top confirming the Hub logs were requested.

1. Confirmed.
2. Done.
3. Done.
4. Sent.

Please let me know if there’s anything else you need. I’m highly motivated to get this fixed…

Thanks!

My understanding is that the Aeotec hub is internally identical to a SmartThings V3 hub (I am not aware of an Aeotec 2nd gen model).
Are your two Aeotec hubs joined in a ‘hub group’?

I do not understand this. Are these WiFi devices connected to a non-SmartThings/Aeotec hub which is cloud connected to SmartThings (and thus showing in the SmartThings app)?
If they are completely independent of SmartThings, and are showing the same symptoms as SmartThings devices, then I suspect you may have Ethernet/WiFi networking issues (do the hub status lights ever turn blue?) as the SmartThings app and advanced web application rely on cloud connectivity for both control and status.
Just my thoughts. I am open to correction.

I don’t think there is a lot you can do other than flag it. The mobile apps and the web apps will be getting their info from the API, probably using the client API servers rather than the public ones. Either way, at any one time there will be six to eight servers handling the API calls in each of the Amazon points of presence, and which six to eight it is will vary over time. You only need one of the servers that are currently handling your calls to be out of bonk in some way and you may have a problem that none of the rest of us will notice.
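
To illustrate why one misbehaving server out of several produces intermittent rather than constant errors, here is a toy simulation. Everything in it is hypothetical: the server names, the pool size of seven, and the round-robin selection are assumptions for illustration only, since how calls are actually distributed is not known.

```python
from itertools import cycle


def simulate_reads(servers, n_reads):
    """Spread n_reads across the servers in turn; each returns its own cached value."""
    picker = cycle(servers)
    return [next(picker)["switch"] for _ in range(n_reads)]


# Hypothetical pool: seven servers, one of them holding an outdated "off".
servers = [{"name": f"api-{i}", "switch": "on"} for i in range(7)]
servers[3]["switch"] = "off"

results = simulate_reads(servers, 14)
print(results.count("off"), "of", len(results), "reads were stale")  # 2 of 14 reads were stale
```

With a real load balancer the stale answers would not arrive at regular intervals, which would make the flipping look random from the client side.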

Yeah, sorry that’s on me. I meant V3, got my numbers mixed up! Corrected the original post; thanks for pointing that out.

That makes two of us! :laughing:

They’re cloud-connected, yes. For example, one device is an ecobee thermostat; another is a LIFX light bulb; another is a SwitchBot thermometer; etc.

All of them are working correctly and (more to the point) all of them are correctly reporting their status in their respective native apps. But in SmartThings, they’re unable to change states without flipping back and forth for ~10 minutes.

And yes, I’m fairly certain that it’s not a local network issue…

Yeah, I think you’re probably right.

I’m effectively at the complete mercy of someone on the back-end getting around to turning the bonky server off and on again.

Thanks for replying, though. I appreciate the cogent explanation, even if the takeaway is that there’s nothing I can do at this point but keep on waiting and enduring the insanity!

To keep my life simple I only use Zigbee and Z-Wave devices that work natively on SmartThings. I don’t want any other hubs and am even giving Matter a wide berth for now.
In the UK I did struggle to find thermostats (at a reasonable price).
I settled on older ‘Secure’ Z-Wave thermostats. There was a community Groovy handler, but I had to write my own Edge drivers for them. I have written a handful of other Z-Wave Edge drivers (TRVs and the like) but never got my head around Zigbee Edge drivers.

Hi, @jghoffer

I checked the information you sent.

You mentioned that the lines you shared are all the logs that were shown while the status was flipping back and forth.
What I observe is the following:

  1. No events are being emitted to change the “switch” value, which is why you don’t see anything in the device’s history. There’s only the response to turn the switch on, which is the initial command you sent.
  2. I can see that the device only has a single switch capability, which means the status shown in the dashboard view belongs to the same capability and that value doesn’t match the status change in the detail view. It always shows “On” which is correct.
  3. The value change in the Advanced Users app is something that still doesn’t match the situation.

    Note: Have you checked the behavior while controlling the device through my.smartthings.com where you get a similar view to the one in the app? I’m curious to see what happens there

  4. Could you please share with us the version of the SmartThings app you are using? Currently, iOS is on 1.7.33. If you have that one, I’ll need new evidence to report this to the plugin team since that seems to be the issue:

First:

  1. Enable the creation of additional logs in the app:
     1. In the ST app, go to “menu” > “settings”.
     2. At the bottom of that page, you’ll find a section called “Troubleshooting”. Please, enable the option that says “Create Additional SmartThings Log”.
     3. Restart the app

Then:

  1. Replicate the issue while recording the screen again. The one you sent is perfect, but we need the logs that match the timestamp for the team’s analysis.
  2. In the app menu, go to “contact us”
  3. Then, tap on “Error reports”
  4. On the opened page, tap 10 times on the title/label “What is the error about?” or until the prompt to create a log appears.
  5. Click on “ok” and wait for the process to finish. Then, save the generated file in the place you prefer and share it with us in the same email you sent to build@smartthings.com

Thanks so much for your quick responses!

Log and new recording sent - and yes, I do have the latest app version installed.

I also want to strongly emphasize that what I happen to see in the app / portal is the least of my worries. I’m not especially bothered if the app wants to visually glitch out. What’s bothering me, and what has largely rendered my entire set-up nonfunctional, is that this glitch breaks automations.

If you can get automations working again (i.e., make it so that routines respond to a device’s actual status and not a randomly flipping ghost status), I will be 100% satisfied, no matter what’s happening aesthetically in the app or anywhere else!

So oddly enough, the basic my.SmartThings portal does appear to be behaving properly - even though the “advanced” one resolutely isn’t…

That being said, while I do get that the interface / view / manual control aspect of this can be diagnostically valuable here, please let me repeat just one more time that I genuinely don’t care how devices visually report — the only thing I care about here is unbreaking automations.

Thanks again for your help so far; at this point you’re pretty much my only hope to ever have a working home again!

OK, I think you didn’t mention this before.
So, I just want to clarify: does it happen for every single device? I see that the device you tested isn’t included in any manual or automatic routines. Can you choose a different device where the routines are executed and let me know their names, in case there is more than one, please?
Since you’ll send a video, I’ll see the timestamp of when the routine was executed but I need to know your timezone so the engineering team can track the execution.

Indeed, I was worried that I hadn’t made that at all clear. And yes, it happens for every single device.

Ok, so to make this as simple as possible, I’ve created a new simple routine, consisting of the following:

Screen recording sent via email.

Obviously, what’s supposed to happen is that if “Fan?” is on, turning “DR1” on won’t trigger the routine (b/c “Fan?” being off is a precondition). But, as you can see in the recording, what is actually happening is that turning “DR1” on sometimes triggers the routine to fire, and sometimes doesn’t, presumably based on what status “Fan?” happens to be flipped to at that precise moment.

Important background info is that the actual physical “Fan?” device was on throughout the recording.
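
The logic can be sketched as follows. This is only a model of the routine as described above (trigger: “DR1” turns on; precondition: “Fan?” is off), not how the platform evaluates routines, and the sequence of reported statuses is hypothetical.

```python
def routine_fires(dr1_turned_on, fan_reported):
    """Model of the routine: fires when DR1 turns on while Fan? is reported off."""
    return dr1_turned_on and fan_reported == "off"


actual_fan = "on"  # the physical device stayed on throughout the recording

# Hypothetical sequence of statuses reported for "Fan?" at each trigger.
reported = ["on", "off", "on", "off"]

for status in reported:
    print(f"Fan? reported {status}: routine fires = {routine_fires(True, status)}")

# Evaluated against the real status, the routine would never fire.
print(routine_fires(True, actual_fan))  # False
```

Whether the routine fires depends entirely on which status happens to be reported at the moment of the trigger, which is consistent with what the recording shows.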

Other info: 1) the relevant routine is named “DR1 Turn Off”; 2) relevant devices for this demonstration are “Fan?” (which is a Z-Wave switch connected to “Main Hub”) and “DR1” (which is a LAN WiFi bulb); 3) my time zone is CDT (UTC-5).
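
For matching times in the recording against server-side logs, a local CDT time can be converted to UTC like this. It is a small sketch using a fixed UTC-5 offset as stated above; the date and time in the example are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for CDT (UTC-5), as given in the post.
CDT = timezone(timedelta(hours=-5), "CDT")


def to_utc(local_iso):
    """Interpret a naive ISO timestamp as CDT and convert it to UTC."""
    return datetime.fromisoformat(local_iso).replace(tzinfo=CDT).astimezone(timezone.utc)


# Hypothetical execution time taken from a recording.
print(to_utc("2024-06-01T10:57:00").isoformat())  # 2024-06-01T15:57:00+00:00
```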

Also, just in case it’s somehow pertinent, ~10 minutes ago, one of my hubs inexplicably declared itself offline, but came back after maybe 90 seconds; and then ~5 minutes ago, 50+ devices (of all types) inexplicably declared themselves offline for a couple of minutes, but seem to be back on now… no idea.

I actually saw this. I was in your account when it happened, so I checked your hub, and you’re in the “softlimit”. Generally, restarts happen when the hub is in the “hardlimit”. I collected the logs after the restart just in case.

Let me trace the events of the routine to see why it executed and get back to you.

Thanks!

Am I? Are you referring to the “driverMemoryLimitStatus” / “driverCountLimitStatus” characteristics (b/c I see both showing “OK”) or something else?

Definitely not the priority right now, but just curious!