Device Health Beta 0.24.5

tmleafs · September 13, 2018, 3:25pm

Centercode doesn’t make it clear how big the problem is under the current firmware, so please post a list of devices that are reporting as unavailable, using the format below.

Device Name:
Product:
Time Unavailable:
Still Function:
Battery/Mains:
Notes:

My List

Device Name: Stevens Presence
Product: SmartThings Presence Sensor
Time Unavailable: 3 Days
Still Function: Yes
Battery/Mains: Battery
Notes:

Device Name: Stevens Car
Product: SmartThings Presence Sensor
Time Unavailable: 3 Days
Still Function: Yes
Battery/Mains: Battery
Notes:

Device Name: Muiltsensor Kitchen
Product: Aeon Multisensor 6 (Running 1.11eu)
Time Unavailable: 2 Days
Still Function: Yes
Battery/Mains: Mains
Notes:

Device Name: Muiltsensor Office
Product: Aeon Multisensor 6 (Running 1.11eu)
Time Unavailable: 2 Days
Still Function: Yes
Battery/Mains: Mains
Notes:

jkp · September 13, 2018, 3:35pm

OK, two responses to this. 1) Why limit it to devices that are reporting as unavailable but still functioning? What about devices that report as unavailable and are not controllable? 2) Why not just post in the existing beta firmware thread for 24.5? Why do we need another thread?

tmleafs · September 13, 2018, 3:41pm

Changed to all devices

I feel this is a huge issue and needs its own post to get the attention it needs

jjslegacy · September 13, 2018, 4:34pm

no issues with device health for me in the beta…

johnconstantelo · September 14, 2018, 12:38am

I’ll be here all night updating this thread…

I did add another Centercode bug, and updated another bug for Zigbee devices dropping again.

Beta 24.5 has not gone well in my opinion, and I agree this needs some serious attention right now.

tpmanley · September 14, 2018, 10:12pm

Just wanted to acknowledge we’ve seen this post and are investigating a few device health related issues. One of them started late last night and is related to device control and reporting. This issue is mentioned on our public status page and is actively being worked on.

We’re also looking into the issue where devices are offline but controllable and reporting. At this time it doesn’t appear to be related to the beta firmware and our Device Health team is actively looking into it.

We’ve been looking into the other issues that have been reported and debugging them one by one. Thank you for the reports and assistance in tracking down issues.

professordave · September 15, 2018, 12:06am

Thanks for the update!

Any info is appreciated!

professordave · September 15, 2018, 12:22am

@tpmanley
Is it possible that these issues are affecting Z-Wave direct association?

I have 3 Linear aux in-wall dimmers acting as masters, which I use in 3 different places for 3-way lights.

The behavior is almost as if the CPUs in the master and slave are at 100% and commands are delayed and queued up.

I can click several times, let’s say on and off; there might be a delay, and then I can see by the status LED on the switch that it slowly sends that many commands to the slave switch.

I used to be able to use the master switch to dim the slave lights. Now the response is so slow I can only turn the slave on and off, because holding the master to dim up or down is delayed and then sets the slave to either full on or fully dim.

So again on the master side, I can click on the paddle, but if I click more than once it seems to queue up the commands, which I can see going out later by the flash of the status LED.

Okay, I just remembered a setting that might cause something like this.

But how did all three get their settings corrupted?

I just remembered that these switches have association groups: a switch sends to association group 1, then waits some amount of time, then sends to association group 2, etc.

professordave · September 15, 2018, 12:57am

My direct Z-Wave association issue might be fixed. I used Z-Wave Tweaker.

Each of the 3 master switches had extra devices listed as slaves.

All the extra devices’ IDs just happened to match “ghost” devices that appeared this week that I had to request support to remove.

Each master only ever controlled 1 and only 1 slave. So it is unclear why multiple devices were listed for each target association group in each switch.

professordave · September 15, 2018, 1:07am

Okay, I think there are 2 scenarios.

1) Back in 2015 I accidentally added 3 devices instead of just 1 to each switch, but for over 2 1/2 years it did not matter until this week, when suddenly something changed and those extra devices started causing problems.
2) Those extra devices somehow got added this week, when the problem started.
??

RLDreams · September 17, 2018, 3:11pm

I’ve been noticing similar behavior with Schlage locks. It hadn’t really been an issue until last night. The lock was "unavailable" when the good night routine ran, so it didn’t lock. When the lock came back online it (correctly) reported unlocked, so the welcome home routine ran, turning on lights, disarming SHM, etc.

johnconstantelo · September 17, 2018, 6:00pm

Hi @tpmanley,

Thanks for that update. I’m starting to see behavior in online/offline statuses unlike anything I’ve seen in the past.

Take a look:

Those Z-Wave switches never go that long for Last Activity. Typically it’s an hour, and for anything past that I’d typically see the erroneous offline message.

For the first time ever I see my dimmer show up. You can also see my lock as well. The lock was manually used much less than 13 hours ago (5:25am EST to be specific).

My water valve has also been rock solid for months, and now it’s gone wonky.

Something must have changed very recently, and not for the better, unless last activity and offline/online status are going to be used or displayed differently than in the past.

EDIT: @tpmanley

So 5 hours later I’m trying to get the Foyer Lights working…

In the app (classic) I can turn it on, but not off because the state doesn’t update, BUT Alexa will turn it off - followed by the message “Foyer Lights isn’t responding. Please check network connection and power supply”. Once she turns it off, I can turn it on again via the app, but not off. This is easily repeatable.

I fixed this problem by using the air gap switch. Once I did that, it started working again:

All the other devices in the first pic above started working on their own, but now I have new devices that have been offline for a few hours that I should track down…

EDIT: I just had to do the same process on another device, Steph’s Ceiling Fan:

veeceeoh · September 17, 2018, 7:39pm

@tpmanley

Sorry to interject here, but with a good number of posts and complaints and confusion in the past week or so, this seems like the best thread to ask about this:

What is SmartThings’ stance on Device Health support for devices using custom device handler code?

Brad_ST · September 18, 2018, 11:12pm

I’m not sure I understand what you mean by stance. Are you asking if custom device handlers can use device health?

veeceeoh · September 18, 2018, 11:39pm

Exactly. Sorry if the question wasn’t clear.

And if custom device handlers can use device health, where is documentation on how to properly implement it?

Brad_ST · September 19, 2018, 12:23am

There isn’t user-facing documentation yet.

Previously I posted this:

Device Health utilizes a new capability Health Check which was added to most official device handlers. For example:
SmartThingsPublic/devicetypes/smartthings/smartsense-open-closed-sensor.src/smartsense-open-closed-sensor.groovy

This capability uses the state checkInterval to track the device’s health. In the device handler above, the interval is set to 12 minutes (60 * 12 seconds), which displays as 720 in the IDE. This device is polled every 5 minutes, so 12 minutes allows for two missed checks and a small buffer.
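
The arithmetic behind that 720 can be sketched in plain Groovy. This is only an illustration: checkIntervalSeconds is a hypothetical helper, not part of any SmartThings API, and the 2-minute buffer is inferred from the figures above (12 minutes minus two 5-minute polls) rather than stated anywhere.

```groovy
// Hypothetical helper (not a SmartThings API): derive a checkInterval value in seconds
// from how often the device is polled and how many checks it is allowed to miss.
int checkIntervalSeconds(int pollMinutes, int allowedMisses, int bufferMinutes) {
    (pollMinutes * allowedMisses + bufferMinutes) * 60
}

// Figures from the post: 5-minute polling and two missed checks.
// The 2-minute buffer is an assumption inferred from the 12-minute total.
int interval = checkIntervalSeconds(5, 2, 2)
println "checkInterval = ${interval} seconds"
```

With these inputs the result matches the 720 shown in the IDE; a device that reports less often would need a proportionally larger value.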

and Jim added this:

In case you look at other Device Handlers, you might see that certain devices that support the “Health Check” capability (which defines the checkInterval attribute) don’t actually send a checkInterval event. There is currently inconsistency in how the device status is reported across devices, with some devices using checkInterval and others using a different event.

The plan is to consolidate and make the usage consistent, but in the meantime just don’t get too confused if you see different devices doing it differently.

Automated_House · September 19, 2018, 3:53am

I’ve followed the above, did some reverse engineering from the public GitHub, and added device health to some of my more important custom handlers. Works well.

veeceeoh · September 19, 2018, 5:29pm

Okay, so does that mean Health Check is not yet supported for use in custom device handlers?

Without documentation, looking at official device handler code, it appears there are two elements: the checkInterval event that sets up the time interval for device health checks, and the ping function that uses some kind of read command (e.g., readAttribute if a ZigBee device) for performing the actual check. Is that correct?

Also, I’m wondering: If (for whatever reason) a device doesn’t support a read command as a polling method, then does that mean Health Check should not be used with it?

Brad_ST · September 20, 2018, 10:05pm

Well, as Jimmy mentioned above, it is possible to incorporate it into custom device handlers. So I wouldn’t say it isn’t supported. It just isn’t documented in a way that lets community developers incorporate it easily.

I spoke with a dev more familiar with device health and he said for Zigbee/Z-Wave DTHs, you should use the Health Check capability and a ping function. For cloud-to-cloud/LAN devices, there are some enrollment/status events that should be included in the initialize method. Similar to this:

github.com/SmartThingsCommunity/SmartThingsPublic/blob/61b864535321a6f61cf5a77216f1e779bde68bd5/devicetypes/smartthings/testing/simulated-contact-sensor.src/simulated-contact-sensor.groovy#L50


      
    def installed() {
        log.trace "Executing 'installed'"
        initialize()
    }

    def updated() {
        log.trace "Executing 'updated'"
        initialize()
    }

    private initialize() {
        log.trace "Executing 'initialize'"

        // Mark the device online and enroll it with Device Health as an untracked cloud device
        sendEvent(name: "DeviceWatch-DeviceStatus", value: "online")
        sendEvent(name: "healthStatus", value: "online")
        sendEvent(name: "DeviceWatch-Enroll", value: [protocol: "cloud", scheme:"untracked"].encodeAsJson(), displayed: false)
    }

    def parse(String description) {
        // The description arrives as "name:value"; split it and turn it into an event
        def pair = description.split(":")
        createEvent(name: pair[0].trim(), value: pair[1].trim())
    }
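
For the Zigbee/Z-Wave case mentioned above (the Health Check capability plus a ping function), the two pieces might be sketched as follows. This is a plain Groovy class that only mimics the shape of a device handler so it can run anywhere: HealthCheckSketch, its sendEvent and its readAttribute are local stand-ins written for this example, not the platform's own implementations, and the 12-minute interval is simply the figure quoted earlier in the thread. In a real handler, sendEvent and the zigbee helper would come from the platform, and `capability "Health Check"` would be declared in the handler's metadata.

```groovy
// Sketch only: a plain Groovy class that mimics the shape of a Zigbee device handler.
// sendEvent() and readAttribute() are local stand-ins (assumptions) for what the
// SmartThings platform and its zigbee helper would provide inside a real handler.
class HealthCheckSketch {
    // Records events locally instead of sending them to the platform
    List<Map> sentEvents = []

    // Stand-in for the platform's sendEvent
    void sendEvent(Map event) {
        sentEvents << event
    }

    // Stand-in for a Zigbee attribute read; returns a description of the command
    List<String> readAttribute(int cluster, int attribute) {
        ["read cluster 0x${Integer.toHexString(cluster)} attribute 0x${Integer.toHexString(attribute)}".toString()]
    }

    // Piece 1: publish checkInterval (in seconds) so Device Health knows how long
    // the device may stay silent before it is considered offline
    void configure() {
        sendEvent(name: "checkInterval", value: 12 * 60, displayed: false, data: [protocol: "zigbee"])
    }

    // Piece 2: ping() issues a read so that a healthy device answers.
    // 0x0001 / 0x0020 are the Zigbee power configuration cluster and its battery voltage attribute.
    List<String> ping() {
        readAttribute(0x0001, 0x0020)
    }
}

def handler = new HealthCheckSketch()
handler.configure()
println handler.sentEvents
println handler.ping()
```

Which attribute to read in ping() depends on the device; any read the device reliably answers should serve the same purpose.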
