So I am new to developing for SmartThings but not new to Z-wave development. I have a really good understanding of the Z-wave protocol. I will only be working on Edge drivers (I think that’s the only option nowadays anyway).
The problem I am trying to solve: I am working with a sleepy battery device that, for some users, gets flagged as offline and does not show as online again until it is triggered manually. It works perfectly fine and communicates its state to the hub when triggered, so it is in fact still functional and should not be marked offline. Since it is a leak sensor, these devices sit idle nearly all the time except for the scheduled wake up. Either the hub is not getting the wake up or the wake up stops keeping it online, but it does not impact all users, so it could be some sort of Z-wave signal problem for specific people. Either way, I just want to make the driver more robust around the wake ups and health checks to hopefully keep it online as much as possible.
The questions I cannot seem to find answers for are below. If this is in the docs already, please give me links. I am having a terrible time trying to search the docs. I have tried the built-in search and Google searches, and it is very hard to find whatever I am looking for.
Do I need to do anything with the healthCheck capability? I seem to be finding mixed information as to whether it should be included anymore or not?
Do all z-wave devices automatically get health-checked, or only sleepy devices, or none? Example, if I disconnect a light switch and try to command it, will it get flagged as offline at some point?
If automatically health-checked can a driver opt-out of the health check somehow?
How is the check interval determined on mains and sleepy devices? For example, on a sleepy device is it calculated from the Wake Interval reported by the device?
How can we see the calculated/determined check interval for a device?
What are the criteria for a sleepy device to get flagged as offline? For example, does it take one missed wake up, or multiple? Or some other time interval?
What are the criteria for a sleepy device to be flagged as online again?
No, the health check interval is defined by an internal algorithm when the device is installed, so, even if you include the capability it won’t have any effect.
AFAIK, all devices are health checked, but it’s more noticeable when they are sleepy. I will confirm with the engineering team if non-battery-powered devices are also checked based on your example.
As we cannot control the algorithm, there’s no way to mark a device as “untracked”
I don’t have information on how the algorithm determines this, but I’ll check with the engineering team from the “wake interval” perspective.
I don’t think it’s possible to see this value directly, only monitoring the driver logs but I’ll confirm as well.
Let me check this as well with the engineering team, because I’ve seen devices go offline after one ping from the health check is missed, but that was in the case of smart locks. I’m not sure if it’s different for other device types.
The device should come back once SmartThings receives a new event from it. In the case of routines where the device is included, commands to it are still sent and if a response is received, it should go back online.
You mentioned you see the device report its wake up? Are you catching this event in a custom handler or just see the report in the logs?
I am not catching the wake up report, but I can see it coming in the logs with the resulting actions taken. The driver does have the battery and refresh capabilities, so the wake up triggers a battery request, which results in a battery report and a corresponding battery event posted. I figured it was easier to just let the hub do its default actions here instead of coding it all myself.
I also do not have the offline issue myself so it is hard to troubleshoot. I am just trying to optimize the driver to have the best chance of success. I need to know a lot of the info I asked above to understand what will get the best results.
Well I can only assume the same happens for other users since I cannot reproduce the issue myself. But yes the way the driver is coded every wake up should result in a battery report and event.
It has been reported that the device stays offline until manually triggered via a water test.
The battery might be the same value every time but the driver posts it to the hub via the default zwave handler.
@nayelyz I seem to be onto something. Last night I set the wake interval to 6 hours and then forced a wakeup (which also forced an interval request). So this way the hub knows it is online and knows what the wake is set to.
Then I removed the battery from the device. It got marked as offline close to 6 hours afterwards.
This morning I did the same procedure with a 1hr wakeup and the device got marked offline around 1.5hrs later. So the health check seems to be tied to the wake interval reported by the device.
The bad thing here is that it seems to only account for one single missed wake up, which may or may not be good. It’s good to know the device is offline ASAP for certain types of devices. On the other hand, with Z-wave there could be times when the device tries to send its wake up but it doesn’t make it to the hub. There could be a lot of traffic on the mesh, a repeater could be down, etc. Since it is a sleepy device, the hub cannot ping it to check whether it is really offline or not; it just has to wait for another wake up.
I am running a more precise test now with a 2hr wake up, and I recorded the exact time I forced the wake and pulled the batteries. So I can see if it’s 2.5hrs now or some other time frame.
This should either be extended to at least 2 missed wake-ups, or be optionally configurable by the driver to override the hub default and either set a check interval time manually, or set a multiplier to extend the hub’s default calculated interval.
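To make that proposal concrete, here is a minimal Python sketch of what a configurable interval could look like. It is not SmartThings code and does not use any real hub or driver API; `check_interval`, `missed_allowed`, and `grace` are names made up purely for illustration.

```python
from datetime import timedelta


def check_interval(wake_interval: timedelta,
                   missed_allowed: int = 0,
                   grace: timedelta = timedelta(0)) -> timedelta:
    """Silence allowed before a sleepy device is flagged offline.

    Hypothetical policy, not the hub's actual algorithm:
    missed_allowed=0 means a single missed wake up flags the device,
    missed_allowed=1 tolerates one lost wake up, and so on.
    """
    if missed_allowed < 0:
        raise ValueError("missed_allowed must be >= 0")
    return wake_interval * (missed_allowed + 1) + grace


# A 1hr wake interval that tolerates one lost wake up,
# plus 30 minutes of slack for a late report.
relaxed = check_interval(timedelta(hours=1), missed_allowed=1,
                         grace=timedelta(minutes=30))
```

With a multiplier like this, a truly dead device is reported later (here after 2.5hrs instead of about 1hr), which is the trade-off for fewer false offline flags.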
What are the chances this would actually be changed? Or do I just need to find a way to work around it?
Oh also, I can confirm that once the device is offline, as soon as I put the batteries in it sends a wake up and it goes back to online again. So that part is working correctly. I now assume in the other reports from users they were not waiting long enough for the next wake up, and not manually triggering one. They just went straight to testing the sensor by triggering it manually.
So, the engineering team confirmed the hub watches incoming messages for the WAKE_UP_INTERVAL_REPORT message from the device and sets the checkInterval based on that.
Also, if one ping is missed, the device will be marked offline. I shared your comments with them to start a conversation about this, but we cannot share anything at the moment.
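For anyone following along, here is a rough Python model of the behavior described in this thread: any received message restarts a timer, the device is flagged offline when the timer runs out, and it comes back online on the next message. It is only a sketch under those assumptions; `status_timeline` and its parameters are hypothetical, times are plain seconds, and the hub's real algorithm is not public.

```python
def status_timeline(message_times, check_interval, end_time):
    """Return a list of (time, status) transitions for one device.

    message_times: times (seconds) at which the hub received any
                   message from the device.
    check_interval: allowed silence (seconds) after each message.
    end_time: when the simulation stops.
    """
    transitions = []
    online = False
    deadline = None
    for t in sorted(message_times):
        if t > end_time:
            break
        # The timer expired before this message arrived.
        if online and t > deadline:
            transitions.append((deadline, "offline"))
            online = False
        if not online:
            transitions.append((t, "online"))
            online = True
        deadline = t + check_interval  # every message restarts the timer
    # The timer may also expire after the last received message.
    if online and deadline < end_time:
        transitions.append((deadline, "offline"))
    return transitions


# 1hr wake interval; the wake up expected at 7200 s is lost on the mesh.
timeline = status_timeline([0, 3600, 10800], check_interval=3600,
                           end_time=12000)
```

In this model the device is flagged offline at 7200 s and returns to online at 10800 s, even though it was never actually dead.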
You can allow users to configure the wake-up interval to be shorter so the device reports more often, with the disclaimer that it consumes more battery.
There was also a note about this:
Sleepy Z-Wave devices, since they actually turn their radios off, they will likely always miss the hub’s pings, and we are relying on them sending wakeup notifications.
It would be helpful to know this workaround works:
We know that devices cannot be controlled from the app when they are marked offline
But we have the possibility of going into the Advanced Users app and sending a command to the device through the API.
If you use a capability like “refresh”, you could call the refresh command, “catch” it in your driver and send a Get request for the current status and see if the device responds.
Shortening the wake-up interval won’t help, because then the system just adjusts the check interval to match the new wake interval, thus creating the same problem only at shorter intervals.
Sending a Get request won’t work, because the device is asleep and will not respond to any Get requests. The only option is to manually wake the device by triggering it or by pressing a wake up button on the device itself. This is true for ALL sleepy Z-wave devices (other than FLiRS devices like door locks and blinds).
If the check interval is only set based on incoming interval reports and it does not look at outbound interval set commands then I have a possible hack workaround I have conjured up in my head. It should not come to this but I may have to try it.
RBoy
(www.rboyapps.com - Making SmartThings Easy!)
This one-missed-event timeout is a problem for all devices, including Zigbee. If the mesh loses a packet, the device goes offline. I’m seeing packet loss issues with some Zigbee devices also, but am in the same bucket.
I plan on testing a workaround once I get the code worked out. In theory it may let me trick the hub into a longer check interval of any time I want. I will report back here if it works. May not work for Zigbee though. Do you code Zigbee drivers at all or you just have issues for devices using other drivers?
Have you ever tried to do device:online() for Z-wave devices? I thought I read in the docs to only use it for LAN devices, but now of course I cannot find that doc (I find it very hard to find anything on that site for some reason). But I tried it anyway and it does not throw any errors. Since they hide all the information from us, though, I don’t think I can see whether it is doing anything or being ignored.
I set up a little test on a device with a 1hr wake up and a 1hr delayed online(). I pulled the battery so it cannot wake up any more. So we will see if it gets marked offline in 1.5hrs or 2.5hrs.
If that does not work I have another more convoluted idea for z-wave only.
Hmm… also I wonder if just posting a delayed fake event (repeat battery or something) would also work, another thing to try if the online call does not work.
RBoy
Posting any event will mark the device online. The issue isn’t that, it’s about differentiating between when the device is actually offline vs when there is packet loss which is causing it to go offline
Well, I was thinking of scheduling a delayed event at the wake up, out far enough to be close to the next scheduled wake up. That would basically extend the check to allow for one missed wake up; after that the device would go offline. This should nearly eliminate false offline reports. It would just slightly delay a real offline event.
Not even the delayed event is working, unless duplicate events do not count and it needs to be a unique value or something. The delayed device:online() did not work either; the hub seems to just ignore it for Z-wave.
Why is ST so dead set on developers not having control over the health check? Seems like they put a lot of effort into making it impossible to override the defaults. Just give me a function to adjust the check interval!
I tested it with a custom capability that is on the driver “SyncStatus”. Just set it to the same status it had already. I can see in the log my delayed function ran and it sent the event, but it did not extend the time for the device to get marked offline.
I suspect that is what may work, but I am not thrilled with posting a bunch of fake events that will show in the history. I would have to set up a check so it only posts the fake event if it looks like one wake up got missed.
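The check mentioned above could look roughly like this Python sketch of the decision logic only. The function names, the `slack` value, and the `already_posted` flag are all hypothetical, and nothing in this thread confirms that a synthetic event actually resets the hub's health timer, so treat it as an idea to test rather than a fix.

```python
def wake_up_looks_missed(last_wake, now, wake_interval, slack=300):
    """True once the expected wake up (plus slack) has passed.

    All values are in seconds. slack is an assumed allowance for a
    late report; it would need to be smaller than whatever margin the
    hub applies before flagging the device offline.
    """
    return now - last_wake > wake_interval + slack


def should_post_keepalive(last_wake, now, wake_interval,
                          already_posted, slack=300):
    """Post at most one keep-alive event per missed wake up.

    already_posted is reset by the driver whenever a real wake up
    arrives, so a device that is truly dead still goes offline after
    the second missed wake up.
    """
    return (wake_up_looks_missed(last_wake, now, wake_interval, slack)
            and not already_posted)
```

This keeps the history clean in the normal case, since nothing is posted while wake ups keep arriving on schedule.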