Anyone else losing confidence in Device Health besides me?

FYI @slagle, @JDRoberts and anyone else that wants to voice their displeasure…

Here’s the reply to my support ticket I emailed in yesterday. Of course, ST support closed the ticket assuming this was the perfect resolution. Unreal…

I’m not going to reply because it’s useless. It’s not just in the mobile app, but in the IDE, and resetting the hub isn’t the savior of all things wrong with ST.

Xxxxxxx X (SmartThings)
May 29, 8:59 AM MST

Hello

Thank you for contacting SmartThings Customer Support. I’m sorry to hear you’re having trouble with your SmartThings App showing incorrect states, but I’ll be happy to help!

We’ve had a few customers experience this issue. There are a couple ways to fix this. The first is to log out of your SmartThings App, force close it, then open it again and log back in. This will refresh the states of the devices on the app. If you are still experiencing this issue, then a hub reset can help as well. Just push the reset button on the back of your hub and wait for it to reboot. Give these a try and let us know if the problem still persists. Thank you!

Please let me know if you have any further questions or concerns. Thank you!

Hope you have a great day!

Xxxxxxxx Xxxxx - SmartThings Customer Support
:scream:

Finally! I’m a member of the club! :sunglasses:

Two of my switches were tagged ‘unavailable’ and their on/off controls removed.

Solution: turn off Device Health & refresh list of things in My Home – both switches work fine.

Returned Device Health to ‘on,’ just to see if it might happen again…

My advice is to write in to support (you don’t necessarily need to follow steps like logging out if you don’t want to, though some might actually resolve isolated incidents) and kindly note that you have noticed a few devices are causing trouble with device health and seem to report false positives. (It is possible that at the time the device health offline event was triggered the device was in fact “offline,” but only so briefly that it wouldn’t be something you or I would consider offline.) Be as detailed as possible about the device and, if you can, about when the problem seems to happen. If you know how to use the IDE, the UUID is helpful (this is that random string in the URL when viewing that specific device, like 7bde5337-5709-489b-b2ea-0c1b91b2fc9a), as is the Device Network ID (it might look like C91F) and the “Type” field shown when you click the edit button, as if you were going to re-type the device. Also, please include info on what the device actually is, including manufacturer and model. Remember, be as specific and detailed as possible.
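If copying the UUID by hand is error-prone, a small script can pull it out of the address bar text. This is only a sketch: the URL used below is a made-up placeholder (not the real IDE address), and `device_uuid_from_url` is a hypothetical helper name. It relies only on the standard 8-4-4-4-12 hex layout of the example UUID above.

```python
import re
from typing import Optional

# Standard UUID layout: 8-4-4-4-12 hexadecimal characters.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def device_uuid_from_url(url: str) -> Optional[str]:
    """Return the first UUID-shaped string in `url`, or None if there is none."""
    match = UUID_RE.search(url)
    return match.group(0).lower() if match else None


# Hypothetical URL, for illustration only.
example_url = "https://ide.example.com/device/show/7bde5337-5709-489b-b2ea-0c1b91b2fc9a"
print(device_uuid_from_url(example_url))
```

A short Device Network ID such as C91F does not match this pattern, so it has to be copied separately.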

My understanding of support is that if enough requests are received that appear not to be isolated incidents, a problem ticket will be created and passed to engineering for review. (I will note that at the time of writing this comment I have not actually checked our bug tracking to see if such a ticket exists.)

Even if your thinking is, “oh, it is just this one device and not really frequent, I am just turning device health off for now,” please send it in so the data gets collected! It is likely that a few DTHs don’t play all that kindly with some devices and we will need to re-evaluate them.

@professordave Communication to the device is definitely possible. As I understand it, one piece of the device health logic is that if the device doesn’t check in after some period of time (generate some sort of event, even if just a battery event), the system will try to ping the device, and if it doesn’t hear back, mark it as offline. Sometimes toggling a device can generate that event. For instance, in my case, my two Cree bulbs in my closet went offline. Turns out I had flipped the switch off. However, restoring power still did not allow me to communicate with the devices. I imagine it was trying to route through another device whose battery had died. :confused: (It was a sensor that I wasn’t actively using, so I didn’t care so much about it.)

Is this how it works for ‘sleepy’ devices, too?

Hi @professordave,

I work on Device Health here at SmartThings. Thanks for the feedback and know that we are taking this feedback very seriously. Let me briefly explain how Device Health works:

We determine the “aliveness” of a device based on events generated by the device. Devices on our platform usually have an expected reporting interval: even in the absence of any other event, they report in values such as temperature, battery level, or light level. We use the “3 strikes you’re out” rule to determine if a device is OFFLINE.

For example, let’s say a SmartPower Outlet reports in every 5 minutes. We expect to hear from this device every 5 minutes. So after, say, 10 minutes or so, we ping the device. The Device Type Handler implements a “ping()” method, which attempts to generate an event from the device. If we still do not hear from the device within 1 minute, we mark it OFFLINE.
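To make that timing concrete, here is a rough Python sketch of the example above. It is not the actual Device Health code: the names `HealthPolicy`, `Status` and `evaluate` are invented for illustration, and reading “10 minutes or so” as exactly two missed 5-minute reports is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ONLINE = "ONLINE"
    PING_PENDING = "PING_PENDING"  # ping sent, still waiting for an event
    OFFLINE = "OFFLINE"


@dataclass(frozen=True)
class HealthPolicy:
    report_interval_s: int = 5 * 60  # expected reporting interval (5 min in the example)
    missed_before_ping: int = 2      # assumption: "10 min or so" = two missed reports
    ping_grace_s: int = 60           # wait 1 minute after the ping before giving up


def evaluate(seconds_since_last_event: float,
             policy: HealthPolicy = HealthPolicy()) -> Status:
    """Classify a device by how long it has been silent."""
    ping_at = policy.report_interval_s * policy.missed_before_ping
    if seconds_since_last_event < ping_at:
        return Status.ONLINE
    if seconds_since_last_event < ping_at + policy.ping_grace_s:
        return Status.PING_PENDING
    return Status.OFFLINE


print(evaluate(4 * 60))   # Status.ONLINE
print(evaluate(10 * 60))  # Status.PING_PENDING
print(evaluate(11 * 60))  # Status.OFFLINE
```

With these numbers, a device that stays silent for 11 minutes ends up OFFLINE, and any event arriving before then resets the clock.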

Hope this helps! In the coming weeks, I will be working with our Customer Support and Knowledge Base team to get more documentation and streamline the Device Health triage process.

cc @dckirker @jesse

From a network engineering standpoint, for Z-Wave and Zigbee mesh networks I would be reluctant to mark any device offline unless there had been no events for at least 12 hours. And maybe 24. The whole point of this topology is that it is supposed to be resilient. If you’re moving some devices around, if you’re changing batteries, if you’re doing some tests, the network should run as it always runs and self-correct as needed.

I understand the customer desire for immediate feedback, particularly from customers who are used to Wi-Fi networks and star topologies, but mesh networks are designed to operate with a light hand: tiny messages, sent infrequently.

This design feels like something intended for a Wi-Fi platform.

JMO

Very good feedback @JDRoberts. From using Z-Wave sensors, I get your point. Let me bring this idea back to the team and our partners to see if a 12 / 24 hour checkInterval tolerance would be sufficient.

You’re right; the design does work better for LAN & Cloud connected devices.

cc @mortent

@jackchi
Can you check the interaction between the Linear auxiliary dimmer switch device type and device health?

Is there a possibility there is a bug there for this one device type?

All of my other devices are behaving fine; it is just my 3 Linear auxiliary dimmer switches.

Thanks!!

I really don’t like adding new Zigbee devices as I typically get issues with existing Zigbee devices every time I do. Even moving an existing one can be a problem. I’ve got 20 Zigbee devices on the shelf and right now I don’t have the willpower to install them due to these issues.

Personally not had any issues with Z-Wave yet.

What do you have? Interested in trade/sale?

Motion, contact and leak sensors. Not selling. I will eventually install - just want the platform more stable first.

It will never be stable, you should cut your losses and sell them all to me at a deep discount. :grinning:

No one likes an ambulance chaser :grin:.

@jackchi @mortent

Another Question: When a device is marked as not available, other than showing as unavailable in the App, is the device treated differently while unavailable?

Is the device removed from the Z-Wave or Zigbee mesh?

Are scheduled events not sent to the device?

If I perform a Z-Wave repair while a device is marked unavailable, is that device not included in the Z-Wave repair?

If marking a device as unavailable is just a status and no behavior is changed, a statement would be appreciated as there is some confusion on these forums as to whether marking a device as unavailable causes other issues.

If marking a device as unavailable does in fact change behavior, maybe device health should instead just be a status, and no behavior should be changed.

If this were just a status, then when a device is unavailable for only a short while, the “unavailable” status might be accurate at that moment, but once the device comes back online, everything is fine!!

Suggestion/Question

I assume from the behavior I see that when I click on the “X” inside a device, NO poll request is sent to the device.

My suggestion: when the “X” is clicked, a poll request should be sent. What I have to do now is go and toggle the device on, which will then mark the device as available.

If clicking on the “X” does send a poll request, then there is a bug with the Linear Z-Wave aux dimmers. If I turn them on and off, they become available again. But clicking on the “X” does not make them available.

I’ve been having several seemingly related issues over the last 2 weeks or so. Several of my devices (all SmartThings brand ZigBee devices) are showing up as unavailable. They appear to be working, then randomly go unavailable and actually do stop working. Not sure if they are stopping due to Device Health marking them or some other issue? In any case, I cannot get them to come back “online” without removing/replacing the battery. Battery level is good in all of them and the majority of these devices are <40’ from the hub. I’ve used this method to get them back several times, plus rebooted the hub twice, but the same thing keeps happening. None of these devices has ever had issues until about 2 weeks ago. Anyone experience this? I know there is a hub update expected today so I’ll probably wait until tomorrow to see if it improves, but otherwise I think I’ll need to open a ticket.

I am having the exact same issues. It even happens on devices within 15 feet of the hub. Clearly there are problems with this feature. All of my 10 or so sensors have shown up as unavailable at some point yet they all work fine when device health is turned off. Time to give up.

Either my memory is failing, or Device Health got turned ‘on’ by itself, somehow. I was certain I’d switched it ‘off’ on both hubs!

Anyone else notice it ‘on’ again after turning it off?

Yeah, someone from ST mentioned that there were some major improvements made to device health and it was automatically enabled during one of the recent app updates. I’ve noticed an improvement since I first decided to disable DH about a year ago. So far it’s still enabled :slight_smile:

I have had lots of issues with device health marking working devices as unavailable. Click the X to ignore it, hit refresh, and the device works fine. Each time, support said to turn off device health. Each time, I told them I have 5 leak sensors that report 100% battery because of that bug and I want to know when they are offline. They then said don’t turn off device health. :confused:

So for me I really do hope they made it better like they said they did with the last firmware update.