FAQ: Why do Zigbee "things" just stop working?


(David Wright) #1

I have a Centralite open/close sensor that just quit being recognized by ST. I removed it from the app, reset the unit and re-paired. All is now OK. My question is: why does this happen? It worked fine day after day and one day became “unavailable”. Knowledgeable input would be appreciated.


Things keep disconnecting
(ActionTiles.com co-founder Terry @ActionTiles; GitHub: @cosmicpuppy) #2

First things first… What exactly do you mean by “quit being recognized by ST”?

  • Was it still in your Things list?
  • Do you have Health Check disabled? (I still recommend not using Device Health Check!)
  • Did it just not reflect the current state of the contact sensor (open/closed)? Did you try at different times of the day or was it solidly dead?
  • Any other symptoms?
  • Did you check Live Logging (https://account.SmartThings.com)?
  • Did you check the Device Event History (same URL)?

Reasons they “stop”…

  • Dead or low battery.
  • Defective device.
  • Temporary 2.4 GHz RF interference: WiFi, microwaves, old cordless phones, other ZigBee networks, …
  • Weak signal path through your ZigBee mesh: Repair your mesh by powering down your Hub (unplug it and remove its batteries) for 15 to 30 minutes. All devices will then attempt to communicate with their strongest mains-powered (non-battery) neighbor in order to determine the best possible route. If you don’t have mains-powered devices (e.g., ZigBee outlets or light switches), then add some to your SmartThings Hub first.

#3

There are a number of different possible reasons why this can happen, and it depends in part on your specific setup.

@tgauchat has given you a good starting list for Zigbee devices (which is what the Centralite sensors use to communicate with the hub). I believe he is still running a V1 hub, so he may be unaware of what is probably the most common cause on the newer hub models: those models come with “insecure rejoin” disabled. (This option was not available on the V1 hub.)

There’s a long and complicated technical explanation of what this means, but here’s the short form:

  1. Zigbee devices are supposed to check in with the hub on a regular basis.

  2. The exact frequency of this check-in is left up to each manufacturer, as the check-in will use up some of the battery life. But the “Zigbee Home Automation profile” that SmartThings uses does have a maximum of under one hour.

  3. If the device fails to check in, the standard provides for two possible options.

3a) “Insecure rejoin,” which essentially tells the hub not to worry about it, and if the device shows up again later, it just rejoins and everything goes forward. However, this can create a security vulnerability if someone is intentionally trying to hack your network.

3b) “Secure rejoin,” which requires an additional authentication step at the time of the rejoin. This eliminates the hacking vulnerability, but means that if any Zigbee device doesn’t report in time, it will be marked as offline or unavailable, and you will have to go through a reset step to bring it back. This can be inconvenient, so SmartThings does give hubs V2 and up the option of choosing between the two methods. But the default is to “disable insecure rejoin”: the double negative means that if a device takes too long to check in for whatever reason, you then have to reset it to get the network to recognize it again.
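The short form above can be sketched as a tiny decision function. This is only a toy model of the behavior as described in this post, not the actual hub firmware or the Zigbee stack: the name `insecure_rejoin_enabled`, the `HubSettings` class, and the 55-minute limit are all assumptions made for illustration (the post only says the maximum is "under one hour").

```python
from dataclasses import dataclass

# Assumption: illustrative value only. The post says the limit is
# "under one hour"; it does not give an exact number.
CHECK_IN_LIMIT_MIN = 55


@dataclass
class HubSettings:
    # Hypothetical flag mirroring the hub option described in the post.
    # False models the default described for hubs V2 and up
    # ("disable insecure rejoin").
    insecure_rejoin_enabled: bool = False


def status_after_silence(minutes_silent: int, settings: HubSettings) -> str:
    """Status of a device that reappears after being silent for a while."""
    if minutes_silent <= CHECK_IN_LIMIT_MIN:
        return "online"  # checked in on time, nothing to decide
    if settings.insecure_rejoin_enabled:
        return "online"  # 3a: the late device simply rejoins
    return "unavailable"  # 3b: needs a reset and re-pair to come back
```

With the default settings, `status_after_silence(90, HubSettings())` gives `"unavailable"`, while the same silence with `HubSettings(insecure_rejoin_enabled=True)` gives `"online"`.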

But why didn’t my zigbee device check in on time?

So the next question, of course, is why would the device fail to check in on time?

Again, there are multiple possibilities for this.

  1. Maybe the batteries died and you didn’t realize it. Or you just didn’t get around to replacing them within the check-in window.

  2. Maybe the device is not fully adhering to the ZHA standard. This is true, for example, of the Xiaomi devices. These are very popular sensors because they are well engineered and very inexpensive. However, they are made by a Chinese company that only intended them to be used with its own gateway, and they do not fully implement the ZHA standard. In order to preserve battery life, they are scheduled to check in exactly once each hour, which is just over the limit in the standard that SmartThings uses. So depending on how often the devices are activated at your own house, you may find that this specific brand drops off the network a lot. On the other hand, if the sensor is in a location where it frequently triggers, you won’t have a problem because the hub will still know that it’s “alive.”

  3. You may have added additional devices to your Zigbee network, or moved devices around, in such a way that your network is not operating as efficiently as it used to. This can cause messages to get lost or have to be retransmitted, and you may miss some check-in windows because of that. This is why support will ask you if you’ve added any new devices or changed the physical location of any of your devices recently.

  4. Similar to three, one of your existing repeater devices may start to go bad for whatever reason. If the child device can’t talk to the parent, then the child device may miss a check-in.

  5. Sources of local interference, such as those @tgauchat mentioned, may have increased, particularly boosted Wi-Fi, and that can also cause check-in messages to get lost or delayed, again triggering the secure rejoin protocol. In this case, it doesn’t even have to be something that you were doing: you may have a neighbor who has added a new boosted Wi-Fi device.

  6. Sometimes, in an effort to troubleshoot intermittent problems, people will add polling or refresh requests to the network. This is almost always a mistake for Zigbee, as it just overburdens the network with more traffic, causing more messages to be delayed or lost. :disappointed_relieved:

  7. You may have recently added a Zigbee device that does energy monitoring. These can absolutely flood your network with messages, making it difficult for other traffic to get through, and then causing other devices to miss the check-in.

  8. One that is similar to seven, but doesn’t actually have anything to do with insecure rejoin, is if you have added multiple Zigbee lightbulbs to your network that connect directly to the hub (not to a Hue bridge). For various reasons, most of the Zigbee lightbulbs available at retail now turn out to be unreliable repeaters except for each other. If a sensor ends up choosing one of these bulbs as its parent, and the bulb keeps dropping your messages, then the sensor may miss the check-in.

  9. You switched to rechargeable batteries. Rechargeables don’t have the same usage curve as non-rechargeables, and as their level fluctuates, they may fall below the threshold where the message can be delivered even though the battery level report is OK. So, once again, the device may miss the check-in period. This is another one where you may see very inconsistent effects, because maybe the battery is still good enough to reach one repeater but not another, and it just depends on which one is busy when the message is sent. Also, the battery may react to temperature, changing the strength of the transmission signal on very hot or very cold days.

  10. Your hub was offline for a while, and when it came back online devices chose different parents and a bottleneck condition caused some devices to be orphaned. There’s another long technical explanation for this, but the main point is that you might be adding devices back to the network in a different order than you used the first time around. But if this is the cause, you will know that the hub was either rebooted or off power for a while before the problems started.

  11. Because SmartThings is primarily a cloud-based system, it’s also possible that problems on the cloud side can cause the hub to ignore messages for a while, which means the check-in messages will be sent just fine but the hub won’t realize that they were sent. I would like to say that this is rare, but it seems to be one of those things that can happen for several days in a row and then not happen again for several months. It’s just one of the vulnerabilities of a cloud-based architecture. :cloud_with_lightning_and_rain:
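To make the Xiaomi case in the list above more concrete, here is a rough sketch of how a check-in window behaves when any received message (a scheduled check-in or an open/close event) resets the hub's timer. It is a simplified model under stated assumptions, not how the hub is actually implemented: the function name `first_missed_window` is made up for illustration, and the 55-minute limit is a placeholder for the "under one hour" maximum mentioned earlier.

```python
from typing import List, Optional


def first_missed_window(message_times_min: List[int], limit_min: int) -> Optional[int]:
    """Return the minute at which the hub would first give up waiting, or None.

    Assumptions: the device was last heard at minute 0 (pairing), every
    received message resets the timer, and only gaps between the listed
    messages are checked (not the time after the last one).
    """
    last_heard = 0
    for t in sorted(message_times_min):
        if t - last_heard > limit_min:
            # The limit ran out before this message arrived.
            return last_heard + limit_min
        last_heard = t
    return None


# A sensor that only sends its hourly check-in, against an assumed 55-minute limit.
quiet_sensor = [60, 120, 180]

# The same sensor on a busy door: open/close events land between check-ins.
busy_sensor = [40, 60, 100, 120, 150, 180]
```

In this model the quiet sensor misses its window at minute 55, before the first hourly check-in arrives, while the busy sensor never exceeds the limit because its events keep resetting the timer. The same function can be used to reason about the other causes in the list: a lost or delayed check-in is simply a missing entry in the list of message times.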

I know there are a couple more that I’m missing, but I’m tired today. Anyway, this will give you some idea of why a battery-operated Zigbee device goes offline occasionally, in addition to the primary factors list that @tgauchat already gave you. (Primary factors being issues with the sensor itself, such as insufficient power, defective device, or local interference.)

The ones I’ve added here are mostly secondary factors where the message is being formatted and transmitted but not being received, or in particular not being received within the check-in window for secure rejoin.


(David Wright) #4

Thanks for the input, I have learned a lot about device failures. Maybe I can do some things differently to avoid this in the future.


(David Wright) #5

Yes. It was still in the Things list but marked unavailable. Device Health is on. It did not reflect the open/closed status, and I did try for a few days. No, I did not check Live Logging or Device Event History. My bad. The unit is fairly new, so I don’t think the battery is the problem. The mesh shouldn’t have changed recently. If I had to guess, I would blame RF interference. Thanks for the input.


#6

The following FAQ might be of interest.


(Steve Jackson) #7

Thanks JD. I haven’t had any ZigBee issues but learned a lot from the post.


(ActionTiles.com co-founder Terry @ActionTiles; GitHub: @cosmicpuppy) #8

Try turning it off.