[SmartThings Edge] Issue with the device health (Zigbee)

Hi @nayelyz,
I have found a problem with how a Zigbee device is flagged Offline and Online when it uses an Edge driver.

I have seen that when power is removed from a Zigbee bulb paired with the stock Zigbee Switch Edge driver, the bulb is marked as Offline in the App and API after about 20 minutes without any message being received from the reports configured for its attributes.

The “health check read” procedure is carried out correctly approximately every 8 minutes when the report, configured with a maximum interval of 5 minutes, is not received.

  local monitored_conf = {
    expected_interval = config.maximum_interval * 1.5,
    mfg_code = config.mfg_code,
    last_read_time = os.time() - (math.random(config.maximum_interval - 120, config.maximum_interval)),
    last_heard_time = nil
  }
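
The roughly 8-minute spacing can be reproduced with a small standalone simulation. This is only a sketch: it does not use the SmartThings libraries, it starts `last_read_time` at the last report instead of the randomized value shown above, and the 30-second polling period is an assumption on my part, chosen because it matches the timestamps in the logs below.

```lua
-- Standalone simulation of the health check timing; not library code.
local MAX_INTERVAL = 300             -- configured maximum reporting interval (s)
local EXPECTED = MAX_INTERVAL * 1.5  -- 450 s, same formula as monitored_conf
local POLL_PERIOD = 30               -- ASSUMPTION: how often the check runs (s)

-- Returns the times (in seconds after the last report) at which a
-- health check read would be sent while the device stays silent.
local function simulate_reads(last_heard, stop)
  local reads = {}
  local last_read = last_heard       -- simplification of the randomized start
  for now = last_heard + POLL_PERIOD, stop, POLL_PERIOD do
    if (now - last_heard > EXPECTED) and (now - last_read > EXPECTED) then
      last_read = now
      reads[#reads + 1] = now
    end
  end
  return reads
end

local reads = simulate_reads(0, 1500)
for _, t in ipairs(reads) do
  print(t)  -- 480, 960, 1440
end
```

Under these assumptions the first poll that exceeds 450 s falls at 480 s, so a read goes out every 8 minutes, not every 7.5.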
*********************** At 12:45 Turn On bulb in app and then turn off mechanical switch of bulb *******************************

2022-04-17T12:45:04.485871627+00:00 TRACE Zigbee Light Multifunction v6  Received event with handler zigbee      
2022-04-17T12:45:04.494392293+00:00 INFO Zigbee Light Multifunction v6  <ZigbeeDevice: a983309d-3963-4bdd-bc3b-3855f1106d14 [0x1EE7] (Luz H. Irene)> received Zigbee message: < ZigbeeMessageRx || 
type: 0x00, < AddressHeader || src_addr: 0x1EE7, src_endpoint: 0x01, dest_addr: 0x0000, dest_endpoint: 0x01, profile: 0x0104, cluster: OnOff >, lqi: 0xC0, rssi: -52, body_length: 0x0007, < ZCLMessageBody || < ZCLHeader || frame_ctrl: 0x18, seqno: 0x00, ZCLCommandId: 0x0A >, < ReportAttribute || < AttributeRecord || AttributeId: 0x0000, DataType: Boolean, OnOff: true > > > >
2022-04-17T12:45:04.516458960+00:00 TRACE Zigbee Light Multifunction v6  Found ZigbeeMessageDispatcher handler in zigbee_light_multifunctions
2022-04-17T12:45:04.522076960+00:00 INFO Zigbee Light Multifunction v6  Executing ZclClusterAttributeValueHandler: cluster: OnOff, attribute: OnOff
2022-04-17T12:45:04.530086293+00:00 INFO Zigbee Light Multifunction v6  <ZigbeeDevice: a983309d-3963-4bdd-bc3b-3855f1106d14 [0x1EE7] (Luz H. Irene)> emitting event: {"attribute_id":"switch","capability_id":"switch","component_id":"main","state":{"value":"on"}}

********************** Doing first health check read after first missing configured 300 sec interval report

2022-04-17T12:52:57.830239635+00:00 INFO Zigbee Light Multifunction v6  Doing health check read for [1EE7]:0006:0000
2022-04-17T12:52:57.978352635+00:00 INFO Zigbee Light Multifunction v6  <ZigbeeDevice: a983309d-3963-4bdd-bc3b-3855f1106d14 [0x1EE7] (Luz H. Irene)> sending Zigbee message: < ZigbeeMessageTx || Uint16: 0x0000, < AddressHeader || src_addr: 0x0000, src_endpoint: 0x01, dest_addr: 0x1EE7, dest_endpoint: 0x01, profile: 0x0104, cluster: OnOff >, < ZCLMessageBody || < ZCLHeader || frame_ctrl: 0x00, seqno: 0x00, ZCLCommandId: 0x00 >, < ReadAttribute || AttributeId: 0x0000 > > >

********************** Doing second health check read after second missing configured 300 sec interval report

2022-04-17T13:00:57.760339488+00:00 INFO Zigbee Light Multifunction v6  Doing health check read for [1EE7]:0006:0000
2022-04-17T13:00:57.795747155+00:00 INFO Zigbee Light Multifunction v6  <ZigbeeDevice: a983309d-3963-4bdd-bc3b-3855f1106d14 [0x1EE7] (Luz H. Irene)> sending Zigbee message: < ZigbeeMessageTx || Uint16: 0x0000, < AddressHeader || src_addr: 0x0000, src_endpoint: 0x01, dest_addr: 0x1EE7, dest_endpoint: 0x01, profile: 0x0104, cluster: OnOff >, < ZCLMessageBody || < ZCLHeader || frame_ctrl: 0x00, seqno: 0x00, ZCLCommandId: 0x00 >, < ReadAttribute || AttributeId: 0x0000 > > >

********************** At 13:06 app Bulb device was flagged as Offline status (Nothing is seen in the logs) ******************************

********************** Doing health check read every 8 minutes (in app offline status is flagged) *******************************

2022-04-17T13:08:57.898460922+00:00 INFO Zigbee Light Multifunction v6  Doing health check read for [1EE7]:0006:0000
2022-04-17T13:08:57.942689255+00:00 INFO Zigbee Light Multifunction v6  <ZigbeeDevice: a983309d-3963-4bdd-bc3b-3855f1106d14 [0x1EE7] (Luz H. Irene)> sending Zigbee message: < ZigbeeMessageTx || Uint16: 0x0000, < AddressHeader || src_addr: 0x0000, src_endpoint: 0x01, dest_addr: 0x1EE7, dest_endpoint: 0x01, profile: 0x0104, cluster: OnOff >, < ZCLMessageBody || < ZCLHeader || frame_ctrl: 0x00, seqno: 0x00, ZCLCommandId: 0x00 >, < ReadAttribute || AttributeId: 0x0000 > > >

The device remains in the offline state until power is reapplied and the first message of a monitored attribute is received.

So far everything is correct.

The observed problem is the following:

In an Edge driver that has custom capabilities (Zigbee Light Multifunction Mc, Zigbee Switch Mc, …), where it is necessary to emit a value or state determined by the user’s preferences, the following problem occurs if the hub is rebooted or the driver version is updated:

When the device that is without power has already been marked Offline:

  • When the init lifecycle runs because of a hub reboot or a driver update, and the driver emits an event for one of the custom capabilities to set the value the user has selected, the device is marked as Online in the App and API, with the last state it had when power was removed.
  • The procedure that marks the device as Offline no longer works, and the device stays Online until power is restored.
  • If power is then removed again, the device is marked as Offline again after 20 minutes.

According to the default 42.x libraries, the Offline and Online status of Zigbee and Z-Wave devices is determined automatically at the radio level.

--- 42.x edge libraries

--- Mark device as being online
---
--- Only useable on LAN type devices, calls to this API for ZIGBEE or ZWAVE type devices are
--- ignored as their online/offline status are automatically determined at the radio level.
---
--- @return status boolean Status of whether the call was successful or not
--- @return error string The error that occured if status was falsey
function Device:online()
    return self.device_api.device_online(self)
end

--- Mark device as being offline and unavailable
---
--- Only useable on LAN type devices, calls to this API for ZIGBEE or ZWAVE type devices are
--- ignored as their online/offline status are automatically determined at the radio level.
---
--- @return status boolean Status of whether the call was successful or not
--- @return error string The error that occured if status was falsey
function Device:offline()
    return self.device_api.device_offline(self)
end

I don’t understand why emitting an event for any capability marks the device as Online, even though no message has been received from the device over the radio.

If it is marked as Online, why is the procedure that marks it as Offline not restarted, so that it goes Offline again if no new message is received?

This would be correct for a presence sensor, which is not marked as Offline even if no messages are received.

How can I get, in the driver, the Online or Offline status of each device connected to it, so that I can handle this without re-marking a device as Online when it is actually Offline?

I have tried several things, such as resetting last_heard_time = os.time() and relaunching the device configuration…, but I cannot get the procedure that flags the device as Offline to work again.
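
To make the question concrete, this is a minimal sketch of the kind of driver-side check I have in mind: remember when the device was last heard, and skip the preference-driven emits in init if that was too long ago. The helper names and the field key are hypothetical, the `{ persist = true }` option is my assumption about `set_field`, and the fake device exists only so the sketch runs outside the hub.

```lua
-- Hypothetical helpers; only get_field/set_field are modeled on the device API.
local LAST_HEARD_KEY = "last_heard_from_device"  -- hypothetical field name
local OFFLINE_AFTER = 20 * 60                    -- seconds, from the observed ~20 minutes

-- Would be called from the Zigbee handlers whenever the device sends a message.
local function note_heard(device, now)
  -- ASSUMPTION: persist option keeps the value across driver restarts
  device:set_field(LAST_HEARD_KEY, now, { persist = true })
end

-- Would be called in init before emitting events derived from preferences.
local function heard_recently(device, now)
  local last = device:get_field(LAST_HEARD_KEY)
  return last ~= nil and (now - last) <= OFFLINE_AFTER
end

-- Minimal stand-in for a device object, for running the sketch on a PC.
local function fake_device()
  local fields = {}
  return {
    set_field = function(self, key, value, _opts) fields[key] = value end,
    get_field = function(self, key) return fields[key] end,
  }
end

local dev = fake_device()
print(heard_recently(dev, 5000))  -- false: never heard
note_heard(dev, 5000)
print(heard_recently(dev, 5600))  -- true: heard 10 minutes ago
print(heard_recently(dev, 7000))  -- false: silent for more than 20 minutes
```

This is only a guess at the Offline state made inside the driver, not the real status held by the hub, and it would have to be bypassed for presence sensors, which must emit events while silent.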

Thanks

This part is to change the reporting interval of an attribute, right?

As far as I know, we don’t have control over the health check interval; it is handled automatically by the driver.
I understand your point: the lack of activity between the physical device and the driver is not taken into account when you send an internal event (driver > capability). I will check this with the team to see what could be causing this behavior.
Once I have more info, I’ll let you know.

This is the monitored attributes configuration table used to monitor the attributes and do the health check in the default library st.zigbee.device.lua:

--- Add a monitored attribute for this device
---
--- A monitored attribute will monitor responses from the device for the corresponding attributes and send periodic reads if the value isn't updated in too long.  That length is determined by config.maximum_interval * 1.5
---
--- @param config AttributeConfiguration the attribute configuration to add
function ZigbeeDevice:add_monitored_attribute(config, opts)
  if not self:supports_server_cluster(config.cluster) and not (opts or {}).force then
    log.info(string.format("Device does not support cluster 0x%04X not adding monitored attribute", config.cluster))
    return
  end
  local monitored_conf = {
    expected_interval = config.maximum_interval * 1.5,
    mfg_code = config.mfg_code,
    last_read_time = os.time() - (math.random(config.maximum_interval - 120, config.maximum_interval)),
    last_heard_time = nil
  }
  local monitored_attrs = self:get_field(MONITORED_ATTRIBUTES_KEY) or {}
  monitored_attrs[config.cluster] = monitored_attrs[config.cluster] or {}
  monitored_attrs[config.cluster][config.attribute] = monitored_conf
  self:set_field(MONITORED_ATTRIBUTES_KEY, monitored_attrs)
end

--- Check all monitored attributes for this device and send a read where necessary
---
--- This will look through all monitored attributes that have been added to this device
--- and if we have not heard from or sent a read in above the expected interval, we will
--- send a read attribute to update our status.
function ZigbeeDevice:check_monitored_attributes()
  local monitored_attrs = self:get_field(MONITORED_ATTRIBUTES_KEY) or {}
  local cur_time = os.time()
  for cluster, attrs in pairs(monitored_attrs) do
    for attr, config in pairs(attrs) do
      if (cur_time - (config.last_heard_time or 0) > config.expected_interval) and
          (cur_time - config.last_read_time > config.expected_interval)
      then
        config.last_read_time = cur_time
        log.info(string.format("Doing health check read for [%s]:%04X:%04X", self.device_network_id, cluster, attr))
        self:send(device_management.attr_refresh(self, cluster, attr, config.mfg_code))
      end
    end
  end
end
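
The read condition in `check_monitored_attributes` can be isolated and tried with a few example values. The following is a standalone sketch, not library code, and the timestamps are arbitrary. It shows that resetting `last_heard_time` only suppresses the next read. Nothing in the function quoted above marks the device Offline, which is consistent with that reset not restoring the Offline flag.

```lua
-- Standalone copy of the condition used in check_monitored_attributes.
local function needs_read(config, cur_time)
  return (cur_time - (config.last_heard_time or 0) > config.expected_interval)
    and (cur_time - config.last_read_time > config.expected_interval)
end

local now = 100000  -- arbitrary example timestamp (seconds)

-- Never heard from the device, last read 500 s ago.
local never_heard = { expected_interval = 450, last_read_time = now - 500, last_heard_time = nil }
-- Same, but last_heard_time was just reset to the current time.
local just_reset = { expected_interval = 450, last_read_time = now - 500, last_heard_time = now }
-- Never heard, but a read was sent 10 s ago.
local just_read = { expected_interval = 450, last_read_time = now - 10, last_heard_time = nil }

print(needs_read(never_heard, now))  -- true
print(needs_read(just_reset, now))   -- false
print(needs_read(just_read, now))    -- false
```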

Hi, @Mariano_Colmenarejo

We are still investigating your case. We will keep you updated on any news we may have.

Thanks for your reply,

Just to emphasize again: for Zigbee presence devices, the device must be marked as Online when the “Not present” event is emitted, even if no messages are received from the device.

For the rest, perhaps it would be enough to restart the procedure that marks the device Offline again if no messages are received.

Hi, @Mariano_Colmenarejo

Is this issue still happening?

Hi @andresg

Yes, this stays the same.

When a device is marked offline, for example:

  • A Zigbee bulb that was removed from power is marked Offline about 20 minutes later.
  • If the driver is updated to add anything, or the hub is updated or rebooted, and a value is emitted for a capability in some lifecycle handler, then the device is marked Online and is never marked Offline again.