I’ve been doing a lot of construction lately, adding Zwave devices (mostly GE with @philh30 's Edge driver) and unplugging some devices (e.g. a dimmer using SmartThings-smartthings-Z-Wave_Dimmer_Switch_Generic). I created some routines with the button-press scene capability of the GEs and noticed some routines were just not running right, they were very delayed, and the switches themselves became non-responsive for a period. During that time Zigbee devices were just fine. My hub is running 0.47.11 firmware.
I traced the issue to the offline devices. Here’s what I found:
If there is a Zwave offline device in a Routine, the routine will lock up for 20-30 seconds before other online devices will receive commands in that routine.
Even so much as changing on/off state or dim state of an offline device will cause this issue (i.e., it’s not just when an offline device is in a Routine) - this is easy to reproduce, just find an offline device, try to turn it on, and watch all your Zwave devices go off the rails.
For a period of about 2 minutes from initiating a Routine that contains an offline device or trying to operate an offline device, all Zwave devices and Routines will experience a period of poor response time, no matter if they are Edge or Groovy, and no matter if it’s done from the app or a Routine from a button/scene.
I solved the issue by removing offline devices from active Routines.
Almost seems like the Zwave stack is single-threaded. At best it’s hanging on trying to talk to another device that won’t respond because it’s offline, and stopping up the works on the entire Zwave stack.
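To make that guess concrete, here is a minimal sketch of head-of-line blocking in a strictly serial sender. It is a toy model, not the hub's actual Z-Wave stack: the `simulate_serial_queue` function, the device names, and both timing constants are assumptions chosen to resemble the delays described above.

```python
from collections import deque

ACK_TIME_S = 0.1   # assumed round trip for a device that answers
TIMEOUT_S = 20.0   # assumed wait before giving up on a silent device


def simulate_serial_queue(commands, offline):
    """Return (device, completion_time) pairs for a strictly serial sender.

    Each command goes out only after the previous one was acknowledged
    or timed out, so one silent device delays everything queued behind it.
    """
    clock = 0.0
    results = []
    queue = deque(commands)
    while queue:
        device = queue.popleft()
        # A silent device costs the full timeout; others cost one round trip.
        clock += TIMEOUT_S if device in offline else ACK_TIME_S
        results.append((device, round(clock, 1)))
    return results


if __name__ == "__main__":
    routine = ["offline-dimmer", "kitchen-switch", "hall-switch"]
    for device, done_at in simulate_serial_queue(routine, {"offline-dimmer"}):
        print(f"{device}: finished at {done_at:.1f}s")
```

In this model the two online switches only finish after the 20-second timeout on the unplugged dimmer, which is the pattern I'm seeing. It says nothing about why the slowdown would linger for a couple of minutes afterward.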
Anyone else experience this? I’m willing to bet a lot of people’s vexing performance issues can be traced to offline Zwave devices, probably being hit in Routines. I can’t say this is recent or not with the latest firmware, because I just started noticing this as I recently was adding and shifting a number of devices.
I’ll let @JDRoberts discuss all the ins and outs of Z-Wave repeaters and what happens when a device goes offline.
One thing I have noticed during the move to Edge is that S2 security plays a bigger role in affecting individual Z-Wave device and network performance. If you have devices that previously weren’t S2 authenticated and then get moved to an Edge driver, there have been reports (and I have experienced this myself) of non-responsive devices and devices preventing proper communication with the hub. Another thing I noticed is that S2 security devices with a large number of settings can flood the Z-Wave queue and cause other devices’ Z-Wave commands to be delayed or dropped.
I would suggest checking each of your Z-Wave devices in the API Browser+ for its NW Security Level and if you have ZWAVE_S2_FAILED devices, you should re-pair them to get to ZWAVE_S2_AUTHENTICATED state. While I can’t say for sure that will fix all the issues being reported, it’s one less variable in the equation.
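If you have a lot of devices, a small script can keep track of what you find. This is only a sketch: the device names are made up, the list is assumed to be typed in by hand from the API Browser+ device pages, and `summarize` is a hypothetical helper, not part of any SmartThings tool.

```python
from collections import Counter

# Hypothetical records, hand-copied from the API Browser+ device pages.
DEVICES = [
    ("Hall dimmer", "ZWAVE_S2_AUTHENTICATED"),
    ("Porch switch", "ZWAVE_S2_FAILED"),
    ("Garage outlet", "ZWAVE_LEGACY_NON_SECURE"),
    ("Attic switch", "ZWAVE_S2_FAILED"),
]


def summarize(devices):
    """Count devices per security level and list the re-pair candidates."""
    counts = Counter(level for _, level in devices)
    needs_repair = sorted(
        name for name, level in devices if level == "ZWAVE_S2_FAILED"
    )
    return counts, needs_repair


if __name__ == "__main__":
    counts, needs_repair = summarize(DEVICES)
    for level, count in counts.most_common():
        print(f"{level}: {count}")
    print("Re-pair candidates:", ", ".join(needs_repair))
```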
I know where you’re going with this, that the devices online are having issues finding their way to the hub. The thing is these devices all work fine until any offline device is touched by a routine/interacting with the device in the app – the changes to the zwave network are already baked in when zwave commands start backing up after a zwave command tries to be sent to an offline device.
Interestingly, in my case the offline device was already offline at the time I added the new Edge-based GE S2-secured devices. I checked API Browser+ and all online S2 devices are ZWAVE_S2_AUTHENTICATED. I have one offline device that is ZWAVE_S2_FAILED.
Might be worth excluding/including that device and seeing if it makes any difference. For S2 devices, I always find that adding via the QR code is more reliable than a Scan Nearby and then manually entering the DSK.
In almost all cases these devices aren’t really offline: they are being marked off-line by a SmartThings – specific architecture layer that is independent of the Z wave specification. So there isn’t really anything I can add since I am not familiar with the details of that overlay. I know it’s there, I know odd things can happen, but it isn’t really a Z wave issue, per se. It’s a SmartThings issue.
In my case, the device is truly offline and unplugged in the Routine I used/that I “touched” in the app, after which other Zwave activity is slow for several minutes.
Do you have any offline unplugged devices you can try to replicate? It should be quite simple.
I have the CLI, if there’s something I should try to get diagnostics to see what’s happening I can do that, too.
My only offline devices are a mystery Zigbee device that comes and goes (I have no idea what it is) and a Matter device that I don’t have plugged in. No Z-Wave devices.
You could try doing some driver logging for the offline device and others in the Routine that seem sluggish or slow to respond. It might give a clue as to what is going on. There isn’t really any Routine-level logging that is accessible to us users, only to support. Driver logging is “smartthings edge:drivers:logcat”; you will be prompted for your hub address and then for which driver to log from. You can add --all to get logging for all drivers.
I wonder if some of the issues are from having a mix of S2 and non-S2 connections?
None of my devices are S2 authenticated. I’ve got five Leviton devices that are listed as ZWAVE_LEGACY_NON_SECURE, one Zooz ZEN26 that is ZWAVE_S2_UNAUTHENTICATED, and fourteen other switches and dimmers that are ZWAVE_S2_FAILED.
Those are all on the Edge stock Z-Wave Switch driver. I’ve also got a handful of Z-wave devices that aren’t yet on Edge drivers that I believe are all ZWAVE_S2_FAILED.
Are S2 AUTHENTICATED devices effectively on a separate network or can both secure and insecure devices route thru the entire mesh?
It’s complicated, and the smartthings implementation makes it more complicated.
Under the independent third-party specifications, S2 devices are backwards compatible with devices at other security levels, except for association. The hub has to support S2 level security but at that point you should be able to have a mix of devices.
Here’s the official answer from Silicon Labs:
Q: Is S2 device security compatible with older devices?
A: Yes, S2 is fully backward compatible. The device and controller will use the highest common security scheme. Non S2 devices can join S2 networks and vice-versa.
Repeaters do not check security encapsulation when repeating; devices with S2 can be repeaters for devices without S2, and vice versa.
Using the old post office analogy, repeaters just check the address on the envelope, they don’t try to read the letter.
Nice side discussions, but … anyone else confirm that the Hub Zwave gets hosed for several minutes after a Routine executes that has an offline Zwave device? Renamed this topic to better reflect that.
RBoy
(www.rboyapps.com - Making SmartThings Easy!)
That’s not quite accurate: sending and receiving are asynchronous activities. While Lua is non-preemptively multithreaded, the transmit doesn’t wait for a response. Those are 2 separate activities.
However, note that routines are an app that runs completely separately from the Z-Wave stack. It’s possible that the routine is waiting for a confirmation or response from the Z-Wave device before moving on to the next device, which is a little unlikely, but it’s also possible that the Z-Wave stack tries to “activate” the “offline” device before sending it the command it receives from the routine. An easy way to verify whether it’s the routine or the stack causing the delay would be to issue the “offline” device + other devices a command from a different app (not a routine) and see the response time.
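The difference between a routine that waits for each confirmation and one that just dispatches its commands can be sketched roughly as below. This is an illustration only, not how SmartThings routines are implemented: `send`, `run_waiting`, and `run_concurrent` are hypothetical names, and the timings are assumed and scaled down so the script finishes quickly.

```python
import asyncio
import time

ACK_S = 0.01     # assumed reply time of an online device (scaled down)
TIMEOUT_S = 0.3  # assumed give-up time for a silent device (scaled down)


async def send(device, offline):
    """Pretend to send one command and wait for its acknowledgement."""
    await asyncio.sleep(TIMEOUT_S if device in offline else ACK_S)


async def run_waiting(devices, offline):
    """Routine that waits for each device before moving to the next."""
    start = time.monotonic()
    finished = {}
    for device in devices:
        await send(device, offline)
        finished[device] = time.monotonic() - start
    return finished


async def run_concurrent(devices, offline):
    """Routine that dispatches every command without waiting in between."""
    start = time.monotonic()
    finished = {}

    async def tracked(device):
        await send(device, offline)
        finished[device] = time.monotonic() - start

    await asyncio.gather(*(tracked(d) for d in devices))
    return finished


if __name__ == "__main__":
    devices = ["offline-dimmer", "lamp"]
    offline = {"offline-dimmer"}
    print("waiting:   ", asyncio.run(run_waiting(devices, offline)))
    print("concurrent:", asyncio.run(run_concurrent(devices, offline)))
```

In the waiting version the lamp only completes after the offline dimmer times out; in the concurrent version it completes almost immediately. Which of the two the real routine engine resembles is exactly what the test above would help narrow down.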
I think it’s much more than that, as I wrote in the OP, if I manually try to change the state of an offline device by directly accessing it in the app, it causes the same ill effects on all future zwave commands coming from the hub for about the next 2 minutes. So, I wouldn’t blame it on Routine logic.
It should be easy to test out, if you have any offline ZWave devices (DTH or Edge doesn’t matter). Try to turn the offline device on/off, or set the dimmer value. Then try issuing commands from the app/Alexa/button-scene presses to other devices, you’ll see delayed response of around 15-20 seconds, for about 2 minutes.
I’m fairly certain that a lot of the transient ZWave performance issues people are reporting are due to something trying to set an offline device, and I suspect it’s a Routine firing around the same time or during testing. Having an offline device referenced in a Routine really shouldn’t cause issues outside of the Routine.
RBoy
(www.rboyapps.com - Making SmartThings Easy!)
I’m not seeing this issue. I tested it on 2 different hubs with multiple Z-Wave devices going offline: there’s an instant response to routine triggers, and the commands are sent over Z-Wave immediately.
Thanks so much for trying, I’m glad someone did. Did you try going into the offline device in the App and manually turn it on then try an online zwave device to see if there was any performance degradation, or was the offline device part of any routine with an online device?
RBoy
(www.rboyapps.com - Making SmartThings Easy!)
I double-checked and it’s working fine. The commands are sent out immediately, even after adding an offline device to the routine.
What you’re experiencing I believe is related to a mesh problem. When a device goes offline, ideally the hub and other repeater devices should update their tables and locate a new faster route. However in real life that often doesn’t happen, it’s possible that the hub is still trying to route the packet through the offline device causing the delay. Try doing a Z-Wave Repair and that may help resolve the issue.
I neglected to mention in the OP that a ZWave repair was done beforehand. Is that possibly a cause? I.e., if the hub no longer has a defined route to the offline device because it was offline and didn’t respond to the repair cmd?