Need help troubleshooting a distance issue

My network has about 30 devices: 3 locks, 20 lights/dimmers by GE, 4 repeaters by Aeon, and 3 motion sensors by Aeon. Most of the devices are concentrated in the front part of the house. The hub is located in the mid rear of the house. It’s definitely out of range without the repeaters. When I migrated over to v2, I specifically added the repeaters first and then the rest of the devices. For about 2-3 weeks things worked, but occasionally (about once a day) some command would fail to route. This was especially true when I tried to run a large routine, like turning all lights on or all lights off. I would end up with some lights not responding.

So last night I decided to try clearing the routing table, hoping that would resolve the delay. I did the following:

  1. Unplugged power and took the batteries out of the v2 hub for 15+ minutes. As I understand it, this puts all devices in “lost” mode.
  2. Turned the hub back on and ran a zwave network repair.

Worked well, extremely well for a couple of hours. There was very little delay, everything was great!

But this morning, the lights, locks, and motion sensors on the outer limits of the range stopped working.

I ran more repairs, but they all finished within a minute with no improvement.
Can someone here explain what could have caused this behavior? What’s the right way to optimize the routing of the zwave network?

I thought the usual advice was, repair several times, or something like that, to be more likely to catch the sleepy devices when they wake up briefly.

It doesn’t really explain why you have good performance that then erodes.

The radio environment changes surprisingly often so I would add more repeaters and expect failures.

Also locate your hub higher and change orientation.

Also stand on your head. Hey it could help.


Zwave has a 4 hop max. Yes, repeaters can help make a zwave network more robust, but if it takes more than 4 hops to get to the hub/device, the command will still time out. You may want to consider adding a few zwave plus devices to give you better range.
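To make the hop limit concrete, here is a rough Python sketch. It is a toy model, not how the hub actually works: the device names and the `chain` layout are made up, and the only number taken from this thread is the 4 hop max. It counts the fewest hops between two devices and checks that count against the limit.

```python
from collections import deque

MAX_HOPS = 4  # the hop limit quoted in this thread


def hops_to(links, start, goal):
    """Breadth-first search: fewest hops from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def reachable(links, start, goal, max_hops=MAX_HOPS):
    """True only if the shortest path fits inside the hop limit."""
    hops = hops_to(links, start, goal)
    return hops is not None and hops <= max_hops


# Hypothetical layout: every device only hears the next one down the line.
chain = {
    "hub": ["r1"],
    "r1": ["hub", "r2"],
    "r2": ["r1", "r3"],
    "r3": ["r2", "r4"],
    "r4": ["r3", "lock"],
    "lock": ["r4"],
}

print(hops_to(chain, "hub", "lock"))    # 5
print(reachable(chain, "hub", "lock"))  # False: one hop too many
print(reachable(chain, "hub", "r4"))    # True: exactly 4 hops
```

In this made-up chain, adding a fifth repeater would not help the lock. What helps is a device with enough range to skip one of the links.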

This is typically a sign of intermittent outside interference. Other devices that are also in the 900 MHz range are where I’d start, or maybe the hub is too close to the kitchen and the microwave is messing with the signal.

If the zwave has a 4 hop limit, does that mean having too many repeaters might actually be bad?

No, the devices are smart enough to build a route that takes maximum advantage of what’s there.

If there are seven repeaters in a room, but you’re trying to get a message to a device down the hallway in a different room, the route will pick the neighbor that is all the way across the room to do the repeat, so you won’t run out of hops before you get there. It tries to go as far as it can in each hop.
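Here is a small Python sketch of that "go as far as you can in each hop" idea. It is only an illustration under simplifying assumptions: devices sit on a straight line, every radio has the same range, and the names, positions, and the `greedy_route` helper are all invented for the example. Real zwave routing is more involved.

```python
def greedy_route(positions, radio_range, src, dst, max_hops=4):
    """Each hop hands the message to the in-range device closest to dst.

    positions maps a device name to its distance along a line.
    Returns the list of devices visited, or None if the message gets
    stuck or would need more than max_hops hops.
    """
    route = [src]
    current = src
    while current != dst:
        if len(route) - 1 >= max_hops:
            return None  # hop budget used up
        here = positions[current]
        in_range = [n for n in positions
                    if n != current and abs(positions[n] - here) <= radio_range]
        if dst in in_range:
            nxt = dst
        else:
            # Only consider devices that actually bring the message closer.
            closer = [n for n in in_range
                      if abs(positions[n] - positions[dst]) < abs(here - positions[dst])]
            if not closer:
                return None  # nobody in range can make progress
            nxt = min(closer, key=lambda n: abs(positions[n] - positions[dst]))
        route.append(nxt)
        current = nxt
    return route


# Seven repeaters bunched in one room, and a light down the hallway.
positions = {"hub": 0, "hall_light": 17}
positions.update({f"rep{i}": i + 1 for i in range(1, 8)})  # rep1..rep7 at 2..8

print(greedy_route(positions, 10, "hub", "hall_light"))
# ['hub', 'rep7', 'hall_light']
```

The six nearer repeaters are simply skipped, so having them in the room costs no hops.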

But to be efficient, it has to know its true neighbors, not just the ones it had when it first joined the network. :sunglasses:

Can you elaborate what it means to be “true neighbors”?

When the hub sends out a command to a designated device, is the message sent with a specific route?

As @JDRoberts mentioned, no, it’s not bad… but if the only thing a device does is repeat, it can be unnecessarily expensive without fixing a range issue. Zwave plus devices have ~5x the range of standard zwave devices, so they can do a better job of covering more area.

That’s a really complicated question to answer because there’s a primary mesh routing philosophy, and then there are a whole bunch of engineering tweaks to it.

So the next thing I’m going to say is not strictly true, but is the intent of mesh.

(Both Z wave and the zigbee networks controlled by SmartThings are mesh topologies.)

Resilient routing

The idea of mesh was that there would be no fixed routes. Instead, the hub would pass the message with the destination on it to a device, who would then relay it by the currently available best route to the neighbor in its local table that was closest to the intended destination. And then that neighbor would start over trying to find the best route to the destination using its neighbors.

The whole reason for this is that it allows you to have battery operated devices that are not continuously awake and attached to the network. If a particular device is off-line for any reason, because it’s sleeping to save battery, because it’s having its batteries changed, because it’s defective, it shouldn’t hold up any routing. The other active nodes will just find another way to get the message through. And when that original device does come back to the network, nobody has to do anything to get it back into the message relaying business. If it’s the best available node, it will get used.
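A toy Python sketch of that resilience, with a made-up five-device mesh (the names, the `mesh` table, and the `find_route` helper are assumptions for illustration, not anything SmartThings exposes). The route is worked out fresh each time from whichever relays are awake, so a sleeping device is bypassed and then used again once it returns.

```python
from collections import deque


def find_route(links, src, dst, offline=frozenset()):
    """Shortest route from src to dst that only passes through awake nodes."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links[node]:
            if nxt in seen or nxt in offline:
                continue  # already visited, or asleep / unplugged
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical mesh: two ways to get from the hub to the light.
mesh = {
    "hub": ["a", "b"],
    "a": ["hub", "light"],
    "b": ["hub", "c"],
    "c": ["b", "light"],
    "light": ["a", "c"],
}

print(find_route(mesh, "hub", "light"))                  # ['hub', 'a', 'light']
print(find_route(mesh, "hub", "light", offline={"a"}))   # ['hub', 'b', 'c', 'light']
print(find_route(mesh, "hub", "light"))                  # 'a' is back, so it gets used again
```

Nothing was reconfigured between the three calls. The only thing that changed was which nodes were available.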

So far so good. There are a lot of advantages to this, in particular cost savings, relative to devices that are continuously connected with fixed routing. There are also some disadvantages. It just depends on what you need to do.

So every node has a short list of potential candidates to relay messages to other nodes. But only the hub knows all the nodes.

Neighbors

The potential candidates that a single device has on its address list are “neighbors.” They’re intended to be the devices within its own signal range. The idea is there’s no need for it to know about nodes that are three hops away, since it can’t talk to them anyway. So when a message comes through, it basically says “can I talk to this guy directly? If not, I’ll pass it to the guy who sits as far away from me as possible and he can see if he can reach it.”

Wait, what if it sends the message in the wrong direction? Like downstairs to the basement when it’s needed upstairs in the hallway?

OK, right about now, somebody usually says, “Doesn’t that mean messages bounce around a lot? What if the message heads in the wrong direction the first time?”

And the answer is yes! So in fact, if the resources are available, the hub does do some preplanning and lays out some routes in advance based on who is available. It then tells the individual devices to remember those routes and try them first, and the whole thing gets really complicated.

These are the engineering tweaks I mentioned. They’re all about improving efficiency. But they’re just add-ons to the basic philosophy. They only run if there are resources to run them. But even with the tweaks, each individual node only knows about a few other nodes. And any individual node can leave the network without breaking the ability to relay messages around the remaining network.

The true neighbors change as the physical location of the device changes

In order for all of this to work, the list of neighbors that each device has should be the devices that that device can talk to in one hop. The candidates for relaying messages. It’s fine if some of them are off-line. But if any of them are physically out of range, it messes up the whole system.

So “true neighbors” are the devices physically within one hop. Moving a device around the house changes who the true neighbors are. But the device will not know it until its address tables are rebuilt. It will still try to use the neighbors it has on file from the last time the address tables were built.

So that’s what a network “heal” is for. It forces each individual device to rebuild its little subset of neighbor addresses so that they are devices it actually could reach. The “true” neighbors instead of just the ones that were nearby last time.
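If it helps to see it, here is a rough Python sketch of stale versus true neighbors. Everything in it is an assumption made for the example: devices are points on a floor plan, every radio has the same fixed range, and `build_neighbor_tables` stands in for what a heal does. The real utility works from actual radio tests, not coordinates.

```python
import math


def build_neighbor_tables(positions, radio_range):
    """Toy 'heal': each device lists every other device within radio range."""
    tables = {}
    for name, (x, y) in positions.items():
        tables[name] = sorted(
            other for other, (ox, oy) in positions.items()
            if other != name and math.hypot(x - ox, y - oy) <= radio_range
        )
    return tables


def stale_entries(tables, positions, radio_range):
    """Neighbors still on file that are no longer physically in range."""
    fresh = build_neighbor_tables(positions, radio_range)
    stale = {}
    for name, on_file in tables.items():
        gone = [n for n in on_file if n not in fresh[name]]
        if gone:
            stale[name] = gone
    return stale


RANGE = 10  # made-up radio range, same units as the coordinates
positions = {"hub": (0, 0), "switch": (8, 0), "sensor": (5, 0)}
tables = build_neighbor_tables(positions, RANGE)

# Carry the sensor to the far end of the house without repairing.
positions["sensor"] = (17, 0)
print(stale_entries(tables, positions, RANGE))
# {'hub': ['sensor'], 'sensor': ['hub']}

# Rebuilding the tables in place (the heal) clears the stale entries.
tables = build_neighbor_tables(positions, RANGE)
print(stale_entries(tables, positions, RANGE))  # {}
```

Until the rebuild, the hub and the sensor would each keep trying a neighbor they can no longer reach directly, even though the switch sits in range of both.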

That’s a very detailed answer. I really appreciate you taking the time to educate me.

How can I get each device to report who their neighbors are? Is there a smart app that would gather these stats?

To report to you? You need a system mapping tool, and they’re quite expensive. SmartThings doesn’t give us access to this information. But most of the time you wouldn’t need it. You just run the repair utility, and that should take care of things. The whole point is that everything is supposed to happen without human intervention anyway.

If you mean how do you get the devices to report their neighbors to the hub, see the following FAQ:

Ok, so tonight, I have discovered that the “repair” function launched from the app and the one launched from the graph.api site are completely different. The app one finishes in 1 minute, while the graph.api one takes 20+ min.

However, I am in a worse state now. After doing 2 repairs, all the devices on the front side of the house stopped working. The devices on the rear side of the house still work. But I think they work because they are within range of the hub.

Should I move the hub to the front of the house and run a proper repair and then move the hub back to the center of the house? I am beginning to feel that I must have all the devices re-join the network again… help please…

I feel your pain, that sounds awful. :scream:

As far as I know, it’s the same utility in both places, but maybe the phone app one is actually never executing at all. You never know with SmartThings.

At this point it could be any of the many things, and I’m dealing with some database corruption issues at my own house that seem to have been caused by the platform update yesterday (switches that are neither on nor off, and some commands being sent twice randomly), so maybe we just both got caught up in something in the cloud. It’s very frustrating.

Don’t move the hub. The hub is one of the devices that repeats, so if you move it, you end up with bad address tables. For a repair, the hub needs to stay wherever it will live permanently while the network is running.

You can move the hub temporarily if you need to exclude and include a fixed device, like a doorlock that’s already in place, but then after you move the hub back to its usual location you need to run a network repair to fix the address tables again.

At this point, it may be that the only thing you can do is get in touch with support@smartthings.com and get their help. I wish I had a better answer for you.

Yeah, I have noticed the ‘on’ command sent twice back to back for one switch, and the switch is stuck in a false on mode. I couldn’t even get it to exclude!

Where is the official notice for platform updates? If the database is corrupted the only option is to run exclude and include again?

I am once again happy with smartthings.
Steps taken to get back to normal were:

  • moved the hub to the front of the house to exclude devices one at a time.
  • during the exclusion I experienced some weird errors. One of the lights wouldn’t exclude at all. Two other lights needed to be excluded twice. Twice during exclusion I got an error message saying another device was excluded.
  • re-included all the devices and purposely left out a repeater this time.
  • moved the hub back and ran repair from the graph.api interface. The first repair finished super fast, like under a minute. I knew that one didn’t do a thing so I kicked off the second one. During repair nothing worked.
    But once repair finished everything was working again and life is good.

I believe somehow my previous setup got messed up at the identification level. Kind of corrupted itself or at the db level. But I can’t tell exactly what happened.


Excellent!

BTW, it is true that while the zwave repair is running nothing else will work. It’s a system utility that takes priority over all of the messages, and basically your network will just be off-line until the utility stops running.

Once the hub is finished running the utility, typically about 20 minutes but it depends on how many devices you have, the individual devices will work again, but for some complicated technical reasons you may not see the full improvements from the utility until the next day.

Even for devices that say to include close to the hub, I first try in the location where they will live; it makes things easier. 95% of the time this works, especially if I already have wired devices such as switches nearby. When I set up a new house, I did the hard-wired light switches, on/off switches, garage opener, etc. first. Basically anything that would be a repeater. Then I moved on to battery devices such as motion and open/close switches, water detectors, etc. I only had to move stuff close to the hub once or twice. I also place the hub in a central location so I am only going through 1 floor and 2-3 walls max.

Hmm, my experience tells me that’s not at all how pairing mode works. I actually did try to pair things from where the hub sits, expecting my switches and relays to route the “pairing” requests, unsuccessfully. I think it’s because existing in-network devices can’t route to a device that’s not yet in the network. Otherwise it’s kind of scary that an unpaired device can get existing devices to talk to it. Sounds like a security issue if that’s allowed.

It’s not allowed. They both have to belong to the same primary controller.

That said, with zwave plus there is a feature which allows the pairing requests to be relayed if they’re at a physical distance from the hub. But the hub has to be in pairing mode and it’s just asking the device already on the network to help pass the messages onto the new candidate member. The candidate can’t actually do anything until it’s formally accepted by the hub and assigned a network ID for this particular network.

So it’s really just a signal boost kind of concept, not letting an unknown device onto the network before it’s approved.

BTW, the Aeon minimote is a secondary controller that can be authorized by the SmartThings hub that is its primary controller to add new devices to the network on the hub’s behalf, and that can be very helpful, especially for light switches that are already wired in place. However, I don’t think it’s allowed to add new locks, but to be honest I don’t remember for sure.

I suppose that’s missing from “legacy” zwave version… which happens to be 99.9% of my devices currently.