When in doubt, REBOOT (power-cycle) YOUR ENTIRE HOUSE

Edited 03/20/2024 to refine the LIMITED Power-Cycle and COMPLETE Power-Cycle procedures.

We all know the benefits of a therapeutic power-cycle of a SmartThings Hub, Computer, Router/Modem, SmartTV, SmartPhone, etc.

Just wanted to share a solution for an odd SmartThings behavior we experienced.

Your results may vary, but I would strongly recommend following the Step-By-Step below if you’re experiencing communication issues that aren’t resolved by the traditional power-cycling/reboots.

Odd Behavior:
We have a house with 8 Schlage Smart Deadbolts and various other Z-Wave and Wi-Fi devices.

All 8 Schlage Locks went offline at once and would NOT come back online. Upon power-cycling the Locks, they would come online, but wouldn’t respond consistently to commands and were very sluggish (at best).

We tried the traditional fixes, in this order:

  1. Pull Batteries from Schlage Locks
  2. Power-Cycle Wi-Fi Modem/Router
  3. Power-cycle SmartThings Hub (including removing the v2 Hub’s internal batteries)
  4. Z-Wave Network Repair

Z-Wave Light Switches seemed to work OK, but were a bit sluggish.

It occurred to me that even though we were power-cycling the various key components, there was still an extensive Z-Wave Mesh Network which remained active throughout the home across 42± Light Switches which were never reset.

With that in mind, we proceeded to perform a Whole-House Power-Cycle (Reboot)! We turned Off the Main Electric Circuit Breaker (killing power to ALL AC-Powered Z-Wave Devices that weren’t on battery backup), unplugged the Z-Wave Devices that were on Battery Backups, waited 2 minutes, and restored power. BAM! Almost instantly, all 8 Schlage Locks came back online and were VERY responsive!

First: if you only have a handful of constant-power Z-Wave devices, or you know which specific circuits ALL of your constant-power Z-Wave devices are on, there is no need to do a whole-house reboot! But it’s important that you know, or can identify, every Z-Wave device in your house. If you only have a handful, that’s pretty straightforward. If you have 50+, it’s easy to forget this light switch, that sensor, the Z-Wave Repeater you plugged in 2 years ago behind the couch :wink:, the water shutoff, etc. It could be that “one” Z-Wave device that’s creating the bottleneck.

Here’s the LIMITED Step-By-Step Reboot in Cases Where You HAVE Identified EVERY AC-Powered Z-Wave Device in Your Home:

  1. Start by powering down your SmartThings Hub(s).
  2. If you have any electronic devices on those circuits (A/V Equipment, Computers, Appliances, etc.), be sure to gracefully power those down first!
  3. Then slowly turn off only those circuits with Z-Wave devices on them, one at a time.
  4. Wait about 3 minutes.
  5. Power on your Hub(s) and wait for them to fully initialize.
  6. Then slowly power on the circuit breakers one at a time.
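Because the limited procedure only works if every AC-powered Z-Wave device is accounted for, it can help to keep an inventory and derive the breaker list from it. A minimal sketch, assuming a hand-maintained inventory; the `ZWaveDevice` record and `plan_limited_reboot` helper are hypothetical, not part of any SmartThings API:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ZWaveDevice:
    name: str
    ac_powered: bool          # mains-powered devices are the ones that typically repeat
    circuit: Optional[str]    # breaker label, or None if you are not sure


def plan_limited_reboot(devices: List[ZWaveDevice]) -> Tuple[List[str], List[str]]:
    """Return (breakers_to_cycle, ac_devices_with_unknown_circuit)."""
    breakers = set()
    unknown = []
    for dev in devices:
        if not dev.ac_powered:
            continue  # battery devices are unaffected by breakers; handle them separately
        if dev.circuit is None:
            unknown.append(dev.name)
        else:
            breakers.add(dev.circuit)
    return sorted(breakers), unknown


# Hypothetical inventory for illustration only.
inventory = [
    ZWaveDevice("Kitchen dimmer", True, "7"),
    ZWaveDevice("Hall dimmer", True, "7"),
    ZWaveDevice("Repeater behind couch", True, None),
    ZWaveDevice("Front door lock", False, None),
]
breakers, unknown = plan_limited_reboot(inventory)
print("Cycle breakers:", breakers)
print("Unknown circuit:", unknown)
```

If the unknown list comes back non-empty, that is exactly the “forgotten repeater behind the couch” case, and the COMPLETE procedure is the safer choice.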

Here’s the COMPLETE Step-By-Step for a Truly Therapeutic Whole-House SmartThings Reboot:

  1. Unplug Power from SmartThings Hub (note: v2 Hubs have internal batteries, which also need to be removed)
  2. Unplug Power from Wi-Fi/Modem/Routers
  3. Unplug any SmartDevices from Battery Backups
  4. “Gracefully” Power Down any Computers, Printers, Appliances, or Other Electrical Devices that you have ready access to and/or feel may be susceptible/sensitive to Power Outages.
  5. Note: Battery-Powered Z-Wave Devices do NOT act as repeaters, so they are not part of the Mesh Network per se; we did NOT remove batteries from our 24± Battery-Powered Z-Wave Devices. But as @JDRoberts points out in this thread, a Battery-Powered device could also create Network Issues. If you have a lot of Battery-Powered Devices, perhaps try this step-by-step first by power-cycling all of the AC-Powered devices to see if it solves your problem. Then consider a Round #2 by also pulling batteries from all of your Battery-Powered devices. Of course, if you only have a handful of Battery-Powered devices, go ahead and pull them in Round #1. We just weren’t keen on going around pulling 24+ batteries, and in our case, the AC-Power Cycle did the trick :grin: .
  6. At the Circuit Panel, begin by slowly turning OFF each of the individual breakers, alternating between a Left Side Breaker, then a Right Side Breaker, then a Left Side Breaker, etc. until they are ALL in the OFF position.
  7. Turn OFF Main Breaker at Circuit Panel
  8. Wait 3 Minutes (up to 5 minutes perhaps)
  9. Slowly begin turning ON the individual breakers one at a time, alternating from a Left Side Breaker to a Right Side Breaker, then a Left Side Breaker, etc.
  10. Turn ON Main Breaker at Circuit Panel
  11. Power On Wi-Fi/Modem/Routers
  12. Power On SmartThings Hub
  13. Test your Z-Wave Devices
  14. If they are all working, power on your various Computers, Appliances, etc. that you had previously turned off.
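Steps 6 and 9 both walk the panel alternating between left-side and right-side breakers. A small sketch that produces that order from two lists of breaker labels; the labels are hypothetical (on many panels odd numbers run down the left side and even numbers down the right, but check your own panel):

```python
from itertools import zip_longest
from typing import List


def alternating_order(left: List[str], right: List[str]) -> List[str]:
    """Interleave breakers: left[0], right[0], left[1], right[1], ...

    Leftover breakers on the longer side are appended at the end.
    """
    order = []
    for l_brk, r_brk in zip_longest(left, right):
        if l_brk is not None:
            order.append(l_brk)
        if r_brk is not None:
            order.append(r_brk)
    return order


# Hypothetical panel: odd-numbered breakers on the left, even on the right.
left_side = ["1", "3", "5", "7"]
right_side = ["2", "4", "6"]
sequence = alternating_order(left_side, right_side)
print("Step 6 (OFF) and Step 9 (ON) order:", sequence)
```

Writing the sequence down before you start makes it easier to go slowly and not skip a breaker in the dark.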

Interesting. I’m glad you found something that helped.

One small point: as a network engineer, I would have to profoundly disagree with the statement in your post that battery-powered Z-Wave devices are not part of the mesh network:

These devices are most definitely part of the Z-Wave network. It’s just that they are typically non-repeating endpoints. They do send their own messages out over the network; that’s how they report things like a door opening (for a contact sensor) or motion detected (for a motion sensor). In fact, in many homes, the battery-powered devices are responsible for initiating the majority of the network traffic.

But it’s unlikely that they have anything to do with your locks, either sending or receiving.

So I understand why they weren’t necessary when addressing your specific issues. But a runaway sensor might cause Z-Wave network problems for other people just by flooding the network so that other messages can’t get through.

In your case, fortunately, you didn’t need to address all the individual battery-powered devices. But if you truly want, or need, to reboot the entire network, then you would need to include the battery-powered devices in that process.

Just sayin’…:wink:

Agreed. Often a battery-powered device will spam the network with things like “my battery is low,” or constantly oscillate between active and inactive states as the battery is on the edge of dying but not yet dead enough to kill power to the radio. Sometimes it just sends garbage, which still takes airtime away from the rest of the devices. These can be hard to nail down, as they often aren’t logged by the hub either.
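If you can get an event history from somewhere (as noted, the hub often doesn’t log this, so a Zniffer capture or another tool may be needed), a flapping device can be spotted by counting state changes inside a sliding time window. A minimal sketch; the `(timestamp, device, state)` tuple format and both thresholds are assumptions for illustration:

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, device name, reported state)


def find_flapping(events: Iterable[Event], window_s: float = 600,
                  max_changes: int = 6) -> Set[str]:
    """Return devices whose state changed more than max_changes times
    within any window of window_s seconds."""
    by_device = defaultdict(list)
    for ts, dev, state in sorted(events, key=lambda e: e[0]):
        by_device[dev].append((ts, state))

    flagged = set()
    for dev, samples in by_device.items():
        # Timestamps at which the reported state actually changed.
        changes = [ts for (ts, state), (_, prev) in zip(samples[1:], samples)
                   if state != prev]
        start = 0
        for end, ts in enumerate(changes):
            # Move the window start forward until it spans at most window_s.
            while ts - changes[start] > window_s:
                start += 1
            if end - start + 1 > max_changes:
                flagged.add(dev)
                break
    return flagged


# Made-up log: a motion sensor toggling every 30 s, and a normal door contact.
log = [(i * 30, "Hall motion", "active" if i % 2 == 0 else "inactive") for i in range(20)]
log += [(0, "Front door", "closed"), (100, "Front door", "open"), (200, "Front door", "closed")]
print(find_flapping(log))
```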

Hey @JDRoberts, point well taken. I’ve edited Step #4 to consider pulling batteries from battery-operated devices as perhaps a Round #2 if Power-Cycling all of the AC-Powered Devices doesn’t work.

All that being said (and rebooting the house works for lots of folks), people like me with two homes in widely separated geographic locations simply cannot do things like “reboot the entire house!” That solution is unacceptable for that use case.

One can only hope the entire SmartThings environment becomes more stable as time goes on.

In the meantime, I try to have additional means to reboot critical parts of my infrastructure at each location. Alas, not all pieces are covered: at present one of two humidity/temperature sensors at our Florida condo is marked “Offline.” It is critical for monitoring the environment when away. (There is also another plug marked “Offline” that is used only when resident there. Fortunately, it isn’t used in Away mode.)

Frustrating for everyone with these kinds of issues, for sure…

I have over 100 devices across 2 different locations on multiple hubs, using Zwave, zigbee and cloud/WiFi devices. I’ve never had to reboot my whole house under ST.

I’m just curious: what were the results of the zwave repair you ran prior to starting the full house reboot?

I was a field tech and had worked with Z-Wave before ever setting up my own SmartThings account. I’ve heard of people having to throw a circuit breaker in order to reset some of the fourth-generation GE Z-Wave switches, but otherwise, problems of the kind you experienced were resolved by correcting the errors on the Z-Wave repair report. A bad runaway device would almost always show up there one way or another, although sometimes only as a high transmission error rate on a different device. But then, we had much better diagnostic tools than SmartThings provides. :hammer_and_wrench:

Anyway, I don’t doubt that what you did worked for you; I just haven’t heard of it as a common practice. :man_shrugging: The standard Z-Wave repair utility is in some ways the equivalent of a full reboot for a Z-Wave network, since it polls every device on the network and gets a response, which should resync the radios. But the SmartThings hybrid hub/cloud architecture may introduce some additional idiosyncrasies; it wouldn’t surprise me. :thinking:

Morning @JDRoberts,

We ran the Z-Wave Repair on 2 separate days and it did not fix the problem: once from the SmartThings phone app and once from the SmartThings Advanced Web App. We ran both overnight to ensure enough time had passed. I didn’t see any results per se. Is there a log somewhere I can grab for you?

Hmmm… as it turns out, the light switches in this location are UltraPro Z-Wave Smart Light Dimmers purchased in March 2023. It is my understanding that these are made by JASCO for both GE and UltraPro, so perhaps that’s the common denominator :thinking:

Morning @csstup & @JDRoberts

My experience has been the same as yours since 2017… 4 locations, hundreds of Z-Wave, Zigbee, and WiFi devices… I’ve never had to reboot a whole house… until now :wink:

They say there’s a first for everything. Without question, this solved the problem for us, so I wanted to document & share with the community as something to consider when they are stumped and things are NOT working as they should after the traditional methods outlined in my original post.

There should definitely be an error report somewhere which would identify specific issues with an individual device, but to be honest I don’t know where it shows up in the new architecture. @jkp might know. :thinking:

Normal procedure for any zwave system:

  1. run zwave repair utility.

  2. review the results to identify any individual devices that are throwing errors.

  3. do whatever is needed to fix those errors if possible, or potentially even remove a problematic device.

  4. review transmission counts for devices showing an unusually high failure rate OR an unusually high message count. I’m not sure where this data is in the new architecture, though. It used to be a line in the old IDE. @Mariano_Colmenarejo might know.

  5. do whatever is needed to address any problems from 4).

  6. rerun the zwave repair utility. Hopefully some of the problems from the first time will have been resolved. Ideally repeat this whole process until you get a clean report.
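Step 4 can be approximated once you have per-device counts from whatever tool exposes them (the old IDE did; the new app may not surface them at all). A sketch, assuming a hypothetical `{device: (messages, failures)}` table; the thresholds are illustrative, not values from SmartThings or the Z-Wave specification:

```python
from statistics import median
from typing import Dict, List, Tuple


def flag_suspects(stats: Dict[str, Tuple[int, int]],
                  max_failure_rate: float = 0.05,
                  count_factor: float = 5.0) -> Dict[str, List[str]]:
    """stats maps device name -> (messages sent, failed transmissions).

    A device is flagged if its failure rate exceeds max_failure_rate, or if
    it sent more than count_factor times the median message count.
    """
    typical = median(count for count, _ in stats.values())
    suspects = {}
    for name, (count, failed) in stats.items():
        reasons = []
        if count and failed / count > max_failure_rate:
            reasons.append(f"failure rate {failed / count:.0%}")
        if typical and count > count_factor * typical:
            reasons.append(f"{count} messages vs median {typical:g}")
        if reasons:
            suspects[name] = reasons
    return suspects


# Made-up counts for illustration.
counts = {
    "Dimmer 1": (120, 1),
    "Dimmer 2": (110, 0),
    "Lock 3": (100, 20),
    "Sensor 4": (2000, 10),
    "Dimmer 5": (130, 2),
}
for device, why in flag_suspects(counts).items():
    print(device, "->", "; ".join(why))
```

Comparing against the median rather than the mean keeps one runaway device from hiding itself by inflating the baseline.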

I don’t know of anything similar in the new architecture.

The only thing I have found written, in a comment in the firmware’s Lua libraries, is that the online and offline status for Zigbee and Z-Wave devices will be established by the radio messages received (or not received) from the devices.

For Z-Wave, I have not been able to establish a relationship between the messages received (as seen in logcat) and the device being marked offline.

In Z-Wave, devices do not send periodic reports; only battery-powered devices check in, at their wakeup interval.

In the logs of my Z-Wave devices (I only have 5 in use), I have not been able to find anything indicating when a device is about to be marked offline.

The only thing I have been able to verify, with a Fibaro keyfob, is that if something gets misconfigured for some reason (a change of driver, …), it starts to show offline after 20 to 25 minutes. If you press a button, it emits an event and goes online, but it goes back offline another 20 to 25 minutes later.

If I exclude it and re-include it, everything works perfectly again. Why? I don’t know.
In the stock driver there is no configuration, nor wakeup, for this device.

For other devices (one siren and 2 switches), network problems caused persistent offline status, but I was able to resolve them with several Z-Wave repairs and a lot of patience. I haven’t had these problems for more than 6 months.

The new platform either has no useful tool to diagnose these problems, or I don’t know of one.

For Zigbee, I have spent many hours experimenting with different types of devices and observing the CLI logs.
Here I have verified a clear relationship between the messages you see in the logs and a later offline state.

As of today (everything can change without prior notice), I have verified that:

  • For devices that use the OnOff cluster and report every 300 s, the device is shown offline after about 22 to 25 minutes without receiving any device messages.

  • For devices that use IASZone (contact, motion, smoke, …), the device is marked offline if no device messages are received for approximately 2 hours 30 minutes.

Before a device goes offline, the Lua libraries run a health check that tries to read the monitored attribute. Battery-powered devices do not respond because they are asleep; for the others, if they respond, the process is reset.
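The behavior described above can be modelled as a silence timer with a health-check read sent shortly before the deadline. This is only a sketch of the observations in this post, not SmartThings’ implementation; `offline_after_s` reflects the observed ~22 to 25 minutes for OnOff devices reporting every 300 s, while `check_lead_s` (how early the read is sent) is an assumed value:

```python
class OfflineMonitor:
    """Silence timer with a health-check read shortly before the deadline.

    A model of the behavior observed in this thread, not SmartThings code.
    """

    def __init__(self, offline_after_s: float, check_lead_s: float):
        self.offline_after_s = offline_after_s  # silence before "offline" (observed)
        self.check_lead_s = check_lead_s        # assumed: how early the read is sent
        self.last_heard = 0.0
        self.online = True

    def on_message(self, now: float) -> None:
        """Any message from the device (including a read response) resets the timer."""
        self.last_heard = now
        self.online = True

    def tick(self, now: float, answers_read: bool) -> bool:
        """Advance the clock; answers_read says whether the device would
        respond to the health-check read (sleepy battery devices won't)."""
        silence = now - self.last_heard
        if silence >= self.offline_after_s - self.check_lead_s and answers_read:
            self.on_message(now)
        elif silence >= self.offline_after_s:
            self.online = False
        return self.online


# OnOff-style device: ~25 min of silence before offline, read sent 5 min earlier (assumed).
sensor = OfflineMonitor(offline_after_s=25 * 60, check_lead_s=5 * 60)
sensor.on_message(0)
for minute in (10, 21, 25):
    state = "online" if sensor.tick(minute * 60, answers_read=False) else "offline"
    print(minute, "min:", state)
```

A sleepy device that never answers the read simply runs out the timer, which matches the observation that battery devices go offline after the full interval.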

A few days ago I checked, and Nayely confirmed, that just before marking a device offline the Hub attempts a final read of the attribute. This read command is not seen in the logs, but the response, if there is one, is. That is why I asked SmartThings. SmartThings replied:

The hub FW will occasionally communicate with the device to determine if it is still online, this interval is not related to device configuration like the default Zigbee health check mechanism in the Zigbee Lua libraries

Therefore, it may be that for Z-Wave the hub also communicates with the devices outside of the Lua libraries to determine their online/offline status, and this is not seen in the logs.

I don’t understand why SmartThings is so opaque about this online/offline issue, and I don’t think they have definitively resolved it either.

Yeah, that’s a mess. :scream:

Going back to before the Samsung acquisition, SmartThings’ top management (but not the engineers) has always argued that they wanted the individual device protocol to be largely invisible to their clients, who should just be working with the SmartThings overlay that is the same for all devices. They believed that would be less confusing to their mass-market consumers. So the official features don’t provide any easy way of setting up Z-Wave central scenes or Zigbee groupcasts, because they didn’t want to show any features as being limited to a specific set of devices.

I personally disagree with that, but I’m just another customer, and obviously nobody’s listening to me. LOL! :laughing:

Anyway, if you did want meaningful diagnostics for Z-Wave and use the North American frequency, you can still buy the third-party Z-Wave Toolbox. The cost has come down significantly since the time when it was limited to members of the Z-Wave Alliance, but it’s still almost $250 (although often on sale for about half that), and I don’t think most people will want to spend that. Some people do, though, and people trying to manage multiple properties might find it worthwhile, although they would have to physically move it to each location to run the reports. It was designed as a tool for field technicians, so you don’t have to know programming to use it.

The point being that the information is all available from the hub; SmartThings just doesn’t present it to their users. :man_shrugging:

I have a few zwave devices and they don’t give me many problems now.

The problem is that for Z-Wave the CLI logs contain less information for analyzing whether or not a problem is network- or signal-related.

In the app and on my.smartthings.com there is zero useful information for either Zigbee or Z-Wave problems; only the Z-Wave repair can help, in certain cases.

My theory on Z-Wave and the offline status of devices with the WAKEUP CC (i.e., battery-powered sleepy devices):

  • I believe the framework monitors the values the device returns in its GET WAKEUP INTERVAL responses and bases a “device is offline if no message is received within XX seconds of the last message” rule on that. Trying to query a sleepy Z-Wave device is pointless, so I wouldn’t expect any radio traffic trying to get an update or ping the device in any way. This logic would be similar to how the Groovy-based drivers tended to use the health-status capabilities under the old system. I believe devices that support WAKEUP are queried during inclusion to get an initial value, even if that is later reset by the driver itself. I’ve not actually watched the traffic during inclusion using a Zniffer, but if I were them, that is how I’d do it.
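Under this theory no traffic is sent to the sleepy device at all; the offline deadline is computed purely from the last message heard and the reported wakeup interval. A sketch in which the undocumented “XX seconds” is modelled as a number of missed wakeups plus a grace period (both are guesses, and the function names are hypothetical):

```python
from typing import Optional


def offline_deadline(last_message_s: float, wakeup_interval_s: float,
                     missed_wakeups: int = 2, grace_s: float = 60) -> Optional[float]:
    """Time after which a sleepy device would be treated as offline.

    Nothing is sent to the device; the deadline depends only on when it was
    last heard from and the wakeup interval it reported. missed_wakeups and
    grace_s are guesses standing in for the undocumented "XX seconds".
    """
    if wakeup_interval_s <= 0:
        # Assumed: treat 0 as "no timed wakeups", so this rule cannot apply.
        return None
    return last_message_s + missed_wakeups * wakeup_interval_s + grace_s


def is_offline(now_s: float, last_message_s: float, wakeup_interval_s: float) -> bool:
    deadline = offline_deadline(last_message_s, wakeup_interval_s)
    return deadline is not None and now_s >= deadline


# Hypothetical device reporting a 1-hour wakeup interval, last heard at t = 0.
print(offline_deadline(0, 3600))
print(is_offline(7000, 0, 3600), is_offline(7260, 0, 3600))
```

Allowing more than one missed wakeup keeps a single lost Wake Up Notification from flipping the device offline.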

For Zigbee, sleepy devices are supposed to check in with their parent routing device on their long poll interval (~7 seconds) to see whether it has flagged any data/messages from the coordinator as waiting for them. Most sleepy devices don’t do this, in an effort to extend battery life, so using the attribute reporting values with the triggered Lua framework health check seems pointless as well: the device isn’t likely to be listening, and probably isn’t going to check in with its parent router within the ~7 second window during which the parent caches the flag anyway.
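Whether a parent’s buffered message ever reaches a sleepy end device comes down to simple timing: does the device poll before the parent gives up? A sketch using the ~7 second hold time from the post above as the default; the function name and parameters are illustrative, and it ignores retries and radio loss:

```python
import math


def poll_catches_message(queued_at_s: float, first_poll_s: float,
                         poll_interval_s: float, hold_s: float = 7.0) -> bool:
    """True if the end device polls its parent before the parent drops the
    buffered message.

    The device polls at first_poll_s, first_poll_s + poll_interval_s, ...
    The parent holds the message for hold_s seconds after queued_at_s.
    """
    if queued_at_s <= first_poll_s:
        next_poll = first_poll_s
    else:
        polls_elapsed = math.ceil((queued_at_s - first_poll_s) / poll_interval_s)
        next_poll = first_poll_s + polls_elapsed * poll_interval_s
    return next_poll <= queued_at_s + hold_s


# Share of queue times (one per second) that get delivered, for a few poll intervals.
for interval in (5, 60, 3600):
    hits = sum(poll_catches_message(t, 0, interval) for t in range(interval))
    print(f"poll every {interval:>4} s: {hits / interval:.0%} delivered")
```

Once the poll interval is much longer than the hold time, almost every queued read (such as a health-check read) is dropped before the device asks for it.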
