Z-wave repair failing

You aren’t alone. But even before the Z-Wave / Z-Wave Repair collapse I got loads of those. I think JD is right and it has to do with some local LAN ping.

None of that is z-wave repair…

Z-Wave network repair failure is not a new problem, unfortunately. It seems to hit most people when their Z-Wave network gets to a certain size, like 20 repeaters or so, but it can also affect smaller networks if there are misbehaving devices or an include or exclude went awry and messed up the network data.

Once it starts failing repeatedly like that it’s usually very difficult to get it back to working again without sophisticated debugging tools which we have not had a chance to develop yet. Often even if people reset their network (by deleting the hub) when they add everything back it will still have the same problems, which makes me think it sometimes is caused by problematic devices in the network.

I feel really bad that I still haven’t made significant progress on these network problems. I’m the only one working on it and I have a lot of other stuff to do too. Sorry. :disappointed:

Across v1 and v2, Z-Wave has always been rock solid for me up until the latest meltdown.

Hey @duncan, thanks for chiming in. Could you please explain what kind of “failure” you are referring to and how it manifests? Also, for consistency purposes, @slagle mentioned that the repair failure is a “new” thing that would be fixed soon. :dizzy_face:

Slagle was responding to Mike_Maxwell’s mention of that bug as a reason why there is a sudden uptick in problems with Z-Wave devices. It’s not going to cause repairs to fail, but usually people don’t run repairs unless they are noticing problems with their devices.

Just getting repair errors isn’t a problem in itself, you can have a working Z-Wave system that consistently throws errors in repair. When someone has a Z-Wave device that stops responding and goes to run a repair and gets errors, they think that’s causing the device’s problems, when more likely it’s the other way around.

I still don’t fully understand this phenomenon, but my advice would be: reboot the hub, wait 15 minutes, preferably without any Z-Wave commands being sent from the hub, then run Z-Wave repair and be patient. If you get a bunch of errors back or a “repair was canceled” event, don’t try to run repair any more. Instead, remove all devices that aren’t responding well (as in not responding to commands – if they are just throwing errors in repair, that’s not necessarily bad). Make sure to only remove via exclude or force delete in the iOS/Android app – never delete a Z-Wave device via the web interface. When everything left is working, add back the other devices one per week, and as soon as problems start again take the last one you added and destroy it with a hammer.

Again, sorry.

Oh and if absolutely none of your z-wave devices are working, don’t try a network repair! It can only hurt at that point. Contact support.
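
For anyone who wants the triage rule above spelled out, here is a rough sketch in Python. It is only an illustration of the advice, not anything the hub or the app actually runs: the `Device` record, its field names, and the `triage` function are all made up for this example. The one rule it encodes is that a device is removed only if it stops responding to commands, while repair errors alone are ignored.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical record for illustration; not a SmartThings data structure.
    name: str
    responds_to_commands: bool  # does it react when controlled from the app?
    repair_error: bool          # did the last repair report an error for it?


def triage(devices):
    """Return (action, names_to_remove) following the advice in this post."""
    if devices and not any(d.responds_to_commands for d in devices):
        # Nothing works at all: a repair can only hurt at this point.
        return "contact support, do not run repair", []
    # Repair errors alone are not a reason to remove a device;
    # only devices that ignore commands are candidates for exclusion.
    to_remove = [d.name for d in devices if not d.responds_to_commands]
    return "stop running repair, exclude unresponsive devices", to_remove


devices = [
    Device("hall switch", responds_to_commands=True, repair_error=True),
    Device("porch light", responds_to_commands=False, repair_error=True),
    Device("door lock", responds_to_commands=True, repair_error=False),
]
action, to_remove = triage(devices)
print(action)
print(to_remove)
```

Note that the hall switch stays even though it throws a repair error, because it still responds to commands.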

I’ll have to try this.

Thanks for the help.

So I’ve removed EVERY one of my devices that reports a route / mesh error. PITA with all the apps involved… Rebooted, waited 15 min, and ran repair, which came back clean, without errors. Great. Then added back only one of the errored devices. Generic GE/Jasco switch using the standard handler. Ran a repair. Eff. Same “failed to update mesh” error. I’m totally stumped at this point.

I think you failed to use the hammer. That is the step I missed also.

Bad switch. These do go bad. Mine only lasted a year or two.

They are all actually working fine(ish). Just these weird repair issues, and occasional flakiness, that started ~23 days ago. I have tried adding back other devices with mesh update errors on a similar one-off basis; the same error persists when the device is re-added.

If it is the devices, then there must have been some massive power surge that only kind of sort of affected these 6 devices at the same time (unlikely).

All the ST issues, something is hosed in Z-wave… or perhaps some device deeper in my mesh is hosed. What is the link to enable debug logging for z-wave again??

I had the same exact answer too…Good thing I read yours LOL…Not looking forward to my aging GE switches. I have 30 or 40 of them. Haven’t had a problem with them yet, but…I am right at the edge of @bravenel encouraging experience…

Tried adding a single device and this is what I am getting. Also removed that device and added another just to ensure it isn’t a device issue, and got the same error. With all the problem devices removed (about 6 out of 40 Z-Wave devices), repair runs fine. Adding back any of the problem devices seems to have the same result. WTF is going on?

Not good. That’s when this platform meltdown, whatever it is, started. Don’t know what to say.

You don’t need to start removing devices just because they throw errors in repair. Only if they don’t consistently respond when sending commands from the app. Occasional flakiness could be platform problems that have nothing to do with Z-Wave. Like I said above you can have a working z-wave network that fails network repair (above a certain size it seems inevitable). Just use re-inclusion to fix routing problems instead.

The best thing to do when things go wrong and you’re not sure if it’s z-wave is to file a support ticket with the precise time of the event and tell them to pull hub logs so I can check for z-wave problems. Don’t reboot the hub because that wipes the diagnostic hub logs that we can access.

We’re planning changes to our protocols to report back for every command whether it reached the device and if not where in the pipeline it was dropped.
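
To make the idea concrete, a per-command report could look something like the sketch below. This is purely hypothetical: the stage names, the `DeliveryReport` type, and its fields are invented for illustration and do not describe the actual planned protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    # Assumed pipeline stages, for illustration only.
    CLOUD = "cloud"
    HUB = "hub"
    ZWAVE_RADIO = "z-wave radio"
    DEVICE = "device"


@dataclass
class DeliveryReport:
    # Hypothetical report attached to each command that was sent.
    command_id: str
    delivered: bool
    dropped_at: Optional[Stage] = None  # set only when delivery failed

    def summary(self) -> str:
        if self.delivered:
            return f"{self.command_id}: reached the device"
        where = self.dropped_at.value if self.dropped_at else "unknown stage"
        return f"{self.command_id}: dropped at {where}"


print(DeliveryReport("lock-1", delivered=True).summary())
print(DeliveryReport("lock-2", delivered=False, dropped_at=Stage.HUB).summary())
```

With a record like this per command, a missed event could be traced to the stage where it was lost instead of guessing.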

That would be awesome! It would solve the mystery of my lock missing events…

This is great news and is way overdue!

With all due respect, I think it has been disingenuous of SmartThings (and other vendors for that matter) to just blame faulty Z-Wave hardware for corrupting the Z-Wave and/or Zigbee mesh. The hub is the only piece of the architecture capable of isolating these accused devices.

It is the nature of the beast and I’ve always felt that vendors that don’t recognize this will fail. Just think of the ill will and support cost when we know at some point a device will fail.

By “re-inclusion” do you mean use the “replace” utility? Or something else?

Indeed. I’ve certainly tried both the repair and a full remove / re include for devices with reoccurring mesh / route errors. Where in the past either method would resolve stubborn mesh errors, neither method works now.

Wow… I’ve been overly patient with SmartThings but it gets very frustrating that as soon as one thing is fixed (scheduler) another breaks (can’t handle too many Z-Wave repeaters). Given that I manage a team of engineers in a high tech company, I understand the pitfalls of being an early adopter, but DAMN!!! Let’s get this solution stable and focus on fixing the existing customer problems before trying to grow the business.

I have 45 Z-Wave GE Light Switches and SEVERAL of them stopped working within the last 24 hours. I’ve tried to rebuild the Z-Wave network and got several failed messages. I tried repairing the nonfunctioning switches and one was successfully repaired, but about 5 others failed to repair, giving one of two messages (No failure found or a timeout error). I’ve had problems with the rebuild utility for months but I was happy as long as the devices were working. Well, the devices are no longer working and neither is my happiness. What makes it worse is hearing that only one SmartThings engineer is assigned to resolve the problem and it’s not even high on his priority list. Seems like SmartThings management should adjust some priorities. So let me know what I should do with a nonfunctional SmartThings v2.0 automation hub?