FAQ: Zwave repair not working (how to fix error messages)

I wish this procedure worked for me like it has in the past. I have two ghost devices that simply refuse to leave my system. Support has been trying to fix it for weeks now.

This worked for me, thank you!

I just ran my first Z-Wave network repair and it has proven to be disastrous. The repair seemed to end without error, but now more than 90% of my Z-Wave devices are down:

  • All my Fibaro dimmer 2 and Dual Relays are not responding to the SmartThings app inputs although they are still reporting the power in W when turned ON via the physical switch.

  • The Fibaro motion sensors are also not reporting motion, but still have their temperature, and illuminance reported. The Fibaro dual relays are also not responding to the app.

I saw “Err 113: Not reporting switch status” and “Err 112: ZM EEPROM write fail” in the event log. The Err 113 was referring to a Fibaro RGBW module which is actually still responding.

Everything seems to have gone haywire after the attempted Z-Wave network repair. I’ve tried rebooting and power cycling the hub, and have run numerous Z-Wave network repairs, all with no success. Support seems to be taking forever to respond, so I would really appreciate it if anyone has any ideas. *Keeping my fingers crossed that I don’t have to remove and re-add all my devices. :frowning:*

When you powered off your hub, did you leave it off for at least 15 minutes?

Err 112 is an error in the hub itself. You pretty much have to work with support on that one, unfortunately.

People have reported seeing this particular error much more since the firmware was changed in the spring of 2017. I have some guesses as to why, but nothing that I know of has been officially confirmed. In any case, unlike the other errors, it’s not something that’s specific to an individual device. So working with support is your best bet.

https://support.smartthings.com/hc/en-us

Yes. I left it for almost an hour when I went to attend to a business call after powering it down.

I have contacted support in the UK as I’m on a UK hub. I haven’t heard a word from them yet.

I tried resetting some of the devices and replacing them in the app. That seems to have done the trick for the motion sensor for now. I’m not sure whether it will last, given those Err 112/113 messages. I’m planning to do the rest one by one, although I’m dreading getting the dimmers and relays down from the false ceiling.

When you removed the power did you take the batteries out as well?

I did not install any battery in my hub. Didn’t think it was necessary.

I’m now in the midst of replacing the devices one by one. Some replacements succeed, and some devices were mysteriously removed by the hub during the replace process and so had to be re-added. A painful process. :frowning:

Ok. Just thought I’d ask.
I leave mine out as well.
Good luck.

Thanks Bob. I really need that luck now… :slight_smile:

Just an update: I have managed to get my devices back online by replacing, or removing and re-adding, them one by one. Support came back after everything was done and took a look at my case on request. They commented that a network repair is helpful for a network of no more than 30 devices, but on a bigger network it may tie things up if a device drops off. They suggest rejoining any device that becomes unresponsive or slow to respond. I guess that is roughly what I did, and it is a really tedious task considering the number of devices affected and their locations behind walls or in the false ceiling.

It seems to me that network repair may not be a good idea for networks with more than 30 devices, then. Any thoughts?

Another interesting point that I noted, the affected devices in my network were Fibaro dimmer 2, double relays, motion sensors, Window / Door sensors and water leak sensors. The Fibaro single relay and RGBW modules were somehow not affected. Other devices like Aeon multisensor 6, smart switch 6, Sensative strips and Remotec ZXT-120 were also not affected. Just thought that it is interesting that the devices affected are all Fibaro made, but perhaps I just happen to have a lot of Fibaro devices.

This is one of the sore spots for me as a network engineer who worked with zwave before I ever bought SmartThings. :rage:

Several of the competitors to SmartThings run a Zwave repair every night or every week as a regular maintenance activity. That’s the usual best practice. It will take longer if you have more devices, but it shouldn’t hurt anything if you do.

Z wave repair is how you keep your network running efficiently.

That said… SmartThings support does tell people that it can be a problem if you have a lot of devices. That makes me crazy. I can only assume that there is something in the SmartThings cloud architecture that introduces these issues. I know that in the past one of the senior zwave engineers at SmartThings has said the issue only comes up if there is already a problem in the network, but again, I have to think that has to have something to do with the cloud architecture. Because normally, per the spec, the zwave repair utility should be one of those “can’t hurt, might help” protocols even if there are problems. That’s the whole point of the repair utility, to help identify and fix problems.

Anyway… I have to believe them because it’s their architecture. But I still don’t like it. :disappointed_relieved:

Also: note that if you have any zwave classic devices and you add a new repeater to your system the only way to get the old devices to use it is either to run a zwave repair or to completely rebuild your entire Z wave network from scratch. So even if you have 100 Z wave devices, you’re likely going to want to run a Z wave repair from time to time. Just sayin’…

JD - I have a ticket open with support, but I get this when I run a repair:
zwNwkRepair failure Network repair for Living Room Ceiling Fan

The switch has been working for a long time. No issues until recently. Nothing changed in the wiring either. Not sure what else to check?

I’m running into similar issues with zwave switches all of a sudden

@JDRoberts Thanks for such a detailed description of the Zwave repair messages (some time ago). I’m just trying to 100% understand this Z-Wave repair thing and have one question. It stems from the fact that I have one Zwave plug that is too far from the hub to talk to it directly with any consistency, and it is always the only one that comes back with the ‘Failed to update mesh information’ message.

When you describe:

Does this mean the hub tries to speak DIRECTLY to each device, and this is why I get the ‘Failed to update mesh information’ error for the one device furthest from the hub?

I guess I am wondering if that one device ever meshes with another closer to it instead of trying to always speak directly to the Hub. Is there any way to get the mesh network info out of the Hub to know what device is meshed to what?

One “hop” is the distance that signal will travel between two zwave devices. If you have typical US construction, indoors that’s probably around 50 feet for zwave classic and around 75 for Z wave plus. If you were outdoors on a clear day with a line of sight and no obstacles in between, the distance would be a lot farther, but as a rule of thumb when laying out home automation devices it’s going to be around 15 m, a little more.
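To put rough numbers on that rule of thumb, here is a small sketch. The 50 ft and 75 ft figures are the indoor estimates quoted above; the function names and the 120 ft example distance are made up for illustration, and real-world range depends heavily on walls and interference.

```python
import math

# Indoor rule-of-thumb hop distances from the post above (assumed, not measured).
HOP_RANGE_FT = {"classic": 50, "plus": 75}
FEET_TO_METERS = 0.3048


def hop_range_m(generation):
    """Rule-of-thumb hop distance converted to meters."""
    return HOP_RANGE_FT[generation] * FEET_TO_METERS


def min_hops(distance_ft, generation):
    """Fewest hops needed to cover a straight-line distance, best case."""
    return math.ceil(distance_ft / HOP_RANGE_FT[generation])


# Hypothetical example: a plug 120 ft from the hub.
print(round(hop_range_m("classic"), 2), "m per hop for zwave classic")
print(min_hops(120, "classic"), "hops with classic repeaters")
print(min_hops(120, "plus"), "hops with Z wave plus repeaters")
```

This is only a best-case estimate: it assumes a repeater happens to sit at every hop boundary along a straight line.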

If a zwave device is within one hop of the hub, it will communicate directly to the hub.

If it is further away, it will use the mesh, which is to say it will communicate to a device which it knows to be within communications range, and then that device will pass the message along to the intended destination. This is allowed to take up to four hops.

So it’s like the old pony express mail service. The first horse carries it as far as it can, then the message is given to a second rider who carries it along to the next station, and then a third rider might actually deliver the message.
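As a rough illustration of the hop idea, the sketch below counts the fewest hops from the hub to each device with a breadth-first search. The chain layout and device names are hypothetical, and the four-hop limit is simply the figure quoted above; this is not how the hub itself computes routes.

```python
from collections import deque

MAX_HOPS = 4  # hop limit quoted in the post above


def hops_from_hub(in_range, hub="hub"):
    """Breadth-first search: fewest hops from the hub to each device."""
    hops = {hub: 0}
    queue = deque([hub])
    while queue:
        node = queue.popleft()
        for neighbor in in_range.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Hypothetical layout: a straight chain, each device one hop from the next.
chain = ["hub", "A", "B", "C", "D", "E"]
in_range = {name: set() for name in chain}
for left, right in zip(chain, chain[1:]):
    in_range[left].add(right)
    in_range[right].add(left)

hops = hops_from_hub(in_range)
reachable = {d: h for d, h in hops.items() if d != "hub" and h <= MAX_HOPS}
too_far = sorted(d for d, h in hops.items() if h > MAX_HOPS)
print("within the hop limit:", reachable)
print("beyond the hop limit:", too_far)
```

In this made-up chain the last device needs five hops, so it would be out of reach even though every link in the chain is fine.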

The device chooses which neighbor to hand the message to based on several factors, but the table of available neighbors doesn’t get automatically updated. It’s really important to understand that. The table gets created once, when the device is joined to the hub, and after that it will only get updated if you manually run the “Z wave repair utility,” no matter where you physically move the devices or whether you’ve added new ones.

So if you do a “bench pairing,” as many people do, and pair all your new devices when they are very close to the hub, they won’t know who their true neighbors are after you move them to their intended location. This in itself can really reduce the efficiency of your network.

That’s why after you add a new device and move it to its final location you should run the Z wave repair utility. That way the new device knows who all its true neighbors are and all of those neighbors know the new device has moved into the neighborhood.

Typically you get the “failed to update” message from the device that’s physically farthest away if you didn’t lay out your backbone of repeaters using a distance-based methodology. The one that’s farthest away is the one that’s having the most trouble reaching the hub, and it’s quite possible that a number of devices had neighbor tables that didn’t reflect their true neighbors, so no one really knew how to get the message to that guy.

This is one reason that field techs trying to troubleshoot problems will run a Z wave repair several days in a row, just trying to finally get everybody’s neighbor tables updated correctly. Especially if you haven’t run a repair for a long time. :wink:

So say you have the hub and three devices, A, B, and C.

hub A B C

Each one is maximum distance from the next.

You added all the devices very close to the hub, but at different times, first A, then B, then C. And you’ve never run a zwave repair utility.

That means A thinks its only neighbor is the hub.

B thinks its only neighbors are the hub and A.

And (read this carefully) C thinks its only neighbors are the hub and A. Not B. And it’s wrong on both counts. :scream:

In this setup, no one knows how to reach device C. It’s too far away from both the hub and A. B doesn’t know C exists and C doesn’t know B exists.

Oops!

It’s no wonder if messages to and from C can be a little flaky, even if it’s physically sitting right next to B.
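The A/B/C example can be replayed in a few lines of Python. This is only a toy model under stated assumptions: positions are measured in "hops" along a line, every device records its neighbor table while sitting next to the hub, and all names (`bench_pair`, `true_neighbors`, and so on) are made up for illustration.

```python
RANGE = 1  # one hop, in arbitrary distance units
HUB_POSITION = 0
FINAL_POSITION = {"A": 1, "B": 2, "C": 3}  # hub A B C, each one hop apart


def bench_pair(order):
    """Pair each device next to the hub, then move it to its final spot."""
    placed = {"hub": HUB_POSITION}
    tables = {}
    for device in order:
        # The neighbor table is recorded at the bench, right beside the hub.
        tables[device] = {
            name for name, pos in placed.items()
            if abs(pos - HUB_POSITION) <= RANGE
        }
        placed[device] = FINAL_POSITION[device]
    return tables


def true_neighbors(device):
    """Who is really within one hop once everything is installed."""
    positions = {"hub": HUB_POSITION, **FINAL_POSITION}
    here = positions[device]
    return {
        name for name, pos in positions.items()
        if name != device and abs(pos - here) <= RANGE
    }


believed = bench_pair(["A", "B", "C"])
for device in ["A", "B", "C"]:
    usable = believed[device] & true_neighbors(device)
    print(device, "believes", sorted(believed[device]),
          "| actually usable", sorted(usable))
```

In this model C ends up believing in the hub and A, neither of which it can reach, while its one real neighbor, B, is missing from its table.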

The first Z wave repair you run should get A totally cleaned up and should make B aware that C exists. But B will likely not have passed along the “update your neighbor table” message to C when it first came through. So you’ll get the error message for C. The hub knows that C exists, but it still isn’t quite sure where it is.

However, once everybody’s neighbor tables are updated, which can take a while, and you run the Z wave repair utility a second time, C should get the rebuild message because B now knows that C is one of its neighbors. As long as C is physically close enough to B to receive a transmission, the error message will probably go away.
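Here is a toy simulation of why a second repair pass can succeed where the first one failed. It is a deliberately simplified model, not the real Z-Wave algorithm: it assumes a link is usable only when two devices are physically in range and at least one of them lists the other in its neighbor table, and that each pass routes using the tables as they stood when the pass started. All names and the starting tables are hypothetical.

```python
from collections import deque

RANGE = 1  # one hop, in arbitrary distance units
position = {"hub": 0, "A": 1, "B": 2, "C": 3}


def in_range(a, b):
    return a != b and abs(position[a] - position[b]) <= RANGE


def true_neighbors(node):
    return {other for other in position if in_range(node, other)}


def reachable_from_hub(tables):
    """Nodes the hub can reach over links that are in range and known to one end."""
    seen = {"hub"}
    queue = deque(["hub"])
    while queue:
        node = queue.popleft()
        for other in position:
            known = other in tables[node] or node in tables[other]
            if other not in seen and in_range(node, other) and known:
                seen.add(other)
                queue.append(other)
    return seen


def repair_pass(tables):
    """One pass: every reachable node rediscovers its true neighbors."""
    reached = reachable_from_hub(tables)  # routes use the tables as they stand now
    updated = {
        node: (true_neighbors(node) if node in reached else set(table))
        for node, table in tables.items()
    }
    failed = sorted(set(position) - reached)
    return updated, failed


# Stale tables left over from bench pairing A, then B, then C next to the hub.
tables = {"hub": {"A", "B", "C"}, "A": {"hub"}, "B": {"hub", "A"}, "C": {"hub", "A"}}
tables, failed_first = repair_pass(tables)
tables, failed_second = repair_pass(tables)
print("errors on first pass:", failed_first)
print("errors on second pass:", failed_second)
```

In this model the first pass fixes A and B but cannot reach C, and the second pass gets through because B's table now includes C.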

With zwave plus, the network gets the benefit of “explorer frames,” which are sort of like one device just calling out “can anybody hear me?” and then using that information. (That’s a bad analogy but it will work for now.) So if both B and C were Z wave plus devices they might eventually have found each other anyway and gotten their neighbor tables updated. But most people have a mix of zwave plus and Z wave classic devices, so you can’t count on that.

I hope that helps.

But to answer your specific question: the whole point of mesh is that devices talk directly to other devices to pass the messages around; they don’t all have to have a single-hop path to the hub. That’s why Z wave and zigbee devices can have a much lower energy draw than devices that do have to be able to cover everything in one hop, and why Z wave and zigbee sensors have much better battery life than Wi-Fi sensors with comparable functions. :sunglasses:

The IDE shows I have over 150 devices; however, I believe the number of actual Z-Wave devices to be around 100. When I took a peek at the mesh network map using the Zensys tool, it appeared that every single device had a large number of nodes it could talk to, so if the routing table is optimized, none of my devices should ever have an issue reaching the hub… but that doesn’t seem to be the case for me, as some devices either half work (report events but are not controllable, or do not report events but are controllable) or just drop dead (no control, no events) until they are excluded and added back. I also see events in the hub log from device IDs with no names, which I am guessing are “ghosts” as those IDs do not appear in my device list. Network repairs used to help, but now they often seem to do nothing or possibly even break the system more. ST support insists on telling me NOT to use the tool, but I really can’t follow their suggestion given the frequent additions and/or changes I make. I don’t know for sure whether it actually is a routing issue, but it does seem related. Typically a device reset, exclude and include will fix the issues for some time, but it is a real pain to set everything back up in smart apps and webCore. Errors like “can’t update mesh info” and “Can’t read protocol info” just never seem to go away lately, whereas I once had to run the repair only 2 or 3 times to get rid of the errors (whether it actually helped with my issues or not).

I have wired zwave devices in every room (dimmers, fan controllers) maybe around 50 or so of such devices all over the house so that should be a robust backbone evenly spread in the 3400sqft area. I have also added 10 or more zwave plus outlets where needed and close to my 3 door locks (for beaming). I also have a good number of plug in zwave and zwave plus switches and dimmers evenly spread around the house. Typically I try to pair the devices in their final operating location and resort to doing so close to the hub only when it is just impossible to get it included. When including devices I often get no feedback in the ST app that the device was found and added so I end up resetting it and trying again which results in ghost devices everywhere… now that I know I am more careful about it but why isn’t the ST app showing it found the device? Or why is the wait so long (even the app says it is taking longer than usual).

The latest beta zwave radio firmware did not seem to fix these issues either… so what next?

I hate that kind of problem. You have my sympathies. :scream:

You probably won’t want to do this, but if you want the nuclear option for troubleshooting this kind of set up, it would be to add a different Z wave controller, maybe a vera, as a secondary. Then shift it to be the primary. Then run a Z wave repair from that hub using its utilities. Fix as many errors as you can and see if that solves the base performance problems.

If it does, the issues are due to the smartthings architecture and not just zwave.

But because smartthings doesn’t support controller shift, if you want to move everything back to smart things you will probably have to re-add each device individually, which is why I said this was a nuclear option.

So I’m not saying this would be a good thing to do. I’m just saying it’s what a field tech would do next if they had a network that they had to get fixed.

Unfortunately it would take me days (or weeks) to do that given the little time I have available at home (kids!). Also, unless I figure out how this is happening, I might just end right back where I was after all the effort but I might just have to bite the nuke at some point…
If only the zwave alliance allowed the zwave IMA tool to be purchased by end users at a reasonable price, maybe I could figure out the routing issues and/or bad node(s) so I can get rid of them. You have recommended against the zwave toolbox (which is pretty pricey as well) in the past but I really am not left with much other choice. I have read about the Zwave.me RaZberry but I can’t tell whether it can be used solely as a diagnostic tool without having to exclude devices from my current mesh in order to use its diagnostic features. Maybe @erocm1231 can provide some info on that since he has one.

I tried the Zensys tool but it was a struggle for it to find the USB Zstick every time and the UI is plain terrible. The large network map can’t be resized and even with 4k screens I can’t see it all. It might provide what I need but it is hard to use compared to something with a web gui designed for the purpose. I noticed that Homeseer provides all the tech detail that I would likely need but I don’t like their HA setup (at least when I reviewed it 3 years ago after buying their RPI based controller). I am hoping that ST will see the light at some point and provide a few more diagnostic tools and technical details to its users in the future rather than taking the Apple approach where they treat their customers as clueless brain-dead zombies buying their products by hiding all the nuts and bolts to make it grand-grandpa friendly.
