Z-Wave switch offline after OTA update

For over a year I have seen my Z-Wave in-wall switches “randomly” drop offline in SmartThings, and the only way to recover is to forcibly remove them and then re-add them from scratch…

It happened again yesterday, 2 switches offline within a couple of hours of each other…

I’ve been better at keeping tabs on what’s going on through the advanced web interface and I’m now pretty sure it is related to over the air edge driver updates.

Both switches are using Mariano’s z-wave switch and child MC driver - which the advanced interface tells me was updated at 17:24 yesterday.

I have 2 more switches using the same driver… Going on past experience, they will also drop offline in the next few hours…

Anyone have any idea what is going on here??

Why do the switches drop offline? Is the ota update failing? How can I check??

Is there a better way to recover than remove and re-add, which forces me to reconfigure all the associated automations?

Simon

Tagging @Mariano_Colmenarejo.

Thanks for that… Yes, should have tagged Mariano… Though at this stage I wasn’t considering that it could be a driver issue. On reflection, I guess it could be…

Hi @Simon_Townsend Since you ask, I’ll expand a bit.

The Zigbee and Z-Wave networks are managed by the hub firmware, and the online or offline status of these devices depends on the messages the hub receives from each device.
The driver only intervenes in three places: it sets the initial configuration of parameters and association groups when the device is paired, it sends the device the commands received from the app and platform, and it handles the commands received from the device, either by emitting events to the platform or by interrogating the device.

When a driver is updated to a new version, or reinitialized by a hub reboot, the device is not reconfigured. Only the non-persistent variables and tables of each device are initialized, following the code of the Lua libraries in the hub firmware.
This should not affect the online or offline status, unless the data that controls that status is lost or poorly managed in the hub firmware.
Something has changed in the latest firmware version. I don’t know whether it affects your specific case, although several users are complaining about similar problems.

Every time I update the driver I also update it for myself and I don’t detect those problems.

My own experience with my very small Z-Wave network (7 devices, 2 of which are always powered off because I use them for testing) is that the network is robust, but slow to rebuild itself when the devices change. Let me explain.

I had been without any problems in the zwave network for over a year.
A few days ago I powered up one of my test devices and left it connected for several days to debug some changes in a driver. It sits physically between the hub and 2 wall switches that are further away from the hub. I figure that the furthest switch found this new device and the network management decided to use it as a parent (repeater) to better reach the hub.

When I disconnected it a few days later I started having offline problems on the furthest switch. If I turned it on and off physically then it would go back online and after a few hours it would go back offline.

Looking at the driver logs with the CLI, I saw that it almost never responded to the refresh commands the driver sent to the device, but the hub always received the commands from the device when I turned it on or off physically. That makes me think the route from the hub to the device may not be the same as the route from the device to the hub.
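As a rough illustration of that check, here is a minimal Python sketch that counts how many refresh commands went unanswered per device. It does not parse real CLI log output: the `(time_s, device, kind)` event tuples, the `refresh_sent` and `report_received` labels, the device names and the 10-second timeout are all hypothetical, so the log lines would first have to be reduced to this shape by hand or by a parser of your own.

```python
def unanswered_refreshes(events, timeout_s=10.0):
    """Count refreshes with no report from the same device within timeout_s.

    events: iterable of (time_s, device, kind) tuples, where kind is either
    "refresh_sent" or "report_received" (hypothetical labels, not a real
    log format).
    Returns {device: (refreshes_sent, refreshes_unanswered)}.
    """
    events = list(events)
    counts = {}
    for sent_at, device, kind in events:
        if kind != "refresh_sent":
            continue
        # A refresh counts as answered if the same device reports shortly after.
        answered = any(
            other_kind == "report_received"
            and other_device == device
            and sent_at < received_at <= sent_at + timeout_s
            for received_at, other_device, other_kind in events
        )
        sent, missed = counts.get(device, (0, 0))
        counts[device] = (sent + 1, missed + (0 if answered else 1))
    return counts


# Made-up sample data: the far switch reports when toggled by hand (t=100)
# but only answers one of its three refreshes.
SAMPLE_EVENTS = [
    (0.0, "far_switch", "refresh_sent"),
    (0.0, "near_switch", "refresh_sent"),
    (1.0, "near_switch", "report_received"),
    (100.0, "far_switch", "report_received"),
    (200.0, "far_switch", "refresh_sent"),
    (200.0, "near_switch", "refresh_sent"),
    (200.5, "near_switch", "report_received"),
    (400.0, "far_switch", "refresh_sent"),
    (401.0, "far_switch", "report_received"),
]

if __name__ == "__main__":
    for device, (sent, missed) in unanswered_refreshes(SAMPLE_EVENTS).items():
        print(f"{device}: {missed} of {sent} refreshes unanswered")
```

A device that keeps reporting on its own but misses most refreshes fits the pattern described above: messages from the device reach the hub, while messages from the hub do not reliably reach the device.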

I performed several Z-Wave repairs, and they reported network error messages for that device.

I updated the driver, the same one you use, with changes I had made in a subdriver that only affects the child configuration device, and nothing changed.

After two days the problems resolved themselves. The device must have found a new route that does not use the test device: in the CLI logs it now responds well to the refresh commands, and I periodically see basic reports from that device and meter reports from another identical device. I don’t understand why two devices configured the same way don’t behave the same, but…

To sum up: if any device on the network disconnects, or changes its ID because you have excluded and re-included it, and other devices reach the hub through it as a repeater, all of those devices will be affected until they find another way to reach the hub.
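The effect can be sketched with a toy mesh model in Python. This is only an illustration under loud assumptions: real Z-Wave route selection happens inside the hub firmware and is not a simple shortest-path search, and the node names and links below are invented. The sketch lists which devices currently route through a removed node and what alternative route, if any, is left for them.

```python
from collections import deque


def build_links(pairs):
    """Turn a list of (a, b) radio links into a symmetric adjacency map."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def shortest_routes(links, hub):
    """Breadth-first search from the hub; returns {node: [hub, ..., node]}."""
    routes = {hub: [hub]}
    queue = deque([hub])
    while queue:
        current = queue.popleft()
        for neighbour in sorted(links.get(current, ())):
            if neighbour not in routes:
                routes[neighbour] = routes[current] + [neighbour]
                queue.append(neighbour)
    return routes


def impact_of_removal(links, hub, removed):
    """Map each node that routed via `removed` to its new route (or None)."""
    remaining = {
        node: peers - {removed}
        for node, peers in links.items()
        if node != removed
    }
    before = shortest_routes(links, hub)
    after = shortest_routes(remaining, hub)
    return {
        node: after.get(node)  # None means no route is left at all
        for node, route in before.items()
        if node != removed and removed in route
    }


# Invented topology: the far switch reaches the hub in 2 hops through the
# test device, or in 3 hops through the near and mid switches.
LINKS = build_links([
    ("hub", "test_device"),
    ("hub", "near_switch"),
    ("test_device", "far_switch"),
    ("near_switch", "mid_switch"),
    ("mid_switch", "far_switch"),
])

if __name__ == "__main__":
    for node, new_route in impact_of_removal(LINKS, "hub", "test_device").items():
        print(node, "->", new_route)
```

In this model the far switch still has a longer route available once the test device is gone. The model finds it instantly, whereas the real network may need hours or days, and several repairs, to settle on it.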

It is the hub that manages all of this, and there is currently no way to analyze it in SmartThings.

Therefore I would recommend that you try to identify areas of low coverage in your network, be patient and methodical with the changes you make, perform Z-Wave repair several times and wait for the network to rebuild.
Some say that turning off the hub for a while, 30 minutes or so, makes the network rebuild from scratch. It won’t be immediate, though, and if you have poorly covered areas the problems will come back, since this process is dynamic but slow.

In Zigbee it is similar, but in my opinion the network rebuilds faster.

Mariano,

thanks, as ever, for the comprehensive, insightful and informative reply. It’s a massive bonus for us all that you share your knowledge so freely with us…

Also, I think you have nailed it… I have learned the hard way about Z-Wave repair… been caught out many times by physically moving a device, then others drop out…

I did make a small change - moved a z-wave outlet switch - which would have become the next hop for the switch in question… and I did run (just one) z-wave repair. I think what you are telling me is that 1 repair is probably not enough… a bit crap really…

Anyway - I have re-added the 2 switches in question (and re-configured all the automation… grrrrr!) - all back up and running. Then I ran Z-Wave repair 4 times in succession. No errors.

Key message - just don’t change (move) anything… !!! Let’s see how long it lasts…

Simon
