Try_create_device: though device may be "added", not always "init"ed

I am creating a Legrand RFLC edge driver for my SmartThings hub that will mDNS-discover neighboring Legrand LC7001 hubs on its LAN. For each LC7001 discovered, my LC7001 sub_driver handles its lifecycle init. Subsequent device discoveries will find each RF Switch and Dimmer that the LC7001 controls and request a device be created (try_create_device) for each of them. These are handled by my Switch and Dimmer sub_drivers.

Not all of these device creation requests are honored. That is, not all result in a device lifecycle “added” event. Those that do show up in the SmartThings app. Unfortunately, even among those that are “added”, not all are "init"ed. My lifecycle init handler is where I call device:emit_event with the current device state. Since this is not done, these devices say “Checking…” in the app and, since my driver has not set anything up for them, they cannot be controlled.

The current documentation suggests that “added” is called only when first added but that “init” is called afterwards and always during device initialization. So, I am expecting to always handle a lifecycle init event but it does not always come. Why?

Also, in the SmartThings IDE, these don’t show up as “local” devices (Execution location is “cloud”). Why? Making these execute locally on the SmartThings hub is the whole point.

I can’t answer any of the other questions, but the IDE will be going away when the Groovy cloud goes away, and it has not been updated to work with Edge drivers. Edge drivers just show up as “placeholders”, and the way they run locally is not the same as in the previous architecture. So not being marked local in the IDE doesn’t mean anything as far as Edge drivers go.

Are you sure each device you’re trying to create is getting a unique device ID?

Init is also called whenever the driver restarts, including when the hub is rebooted or the driver is updated, so you have other options for querying the device for an initial state. It’s also helpful to include the refresh capability since you can trigger it with a swipe down even when the other device controls are locked.

Yes. I identify each device with a unique device_network_id. For the LC7001 hubs, I use their MAC address. The lights supported by a hub are identified by a unique ordinal number. For these, I concatenate the hub MAC address with the ordinal number.
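As a rough illustration of that naming scheme, here is a minimal sketch. The helper names (hub_dni, light_dni), the MAC normalization, and the ":" separator are assumptions for the example; the post only says the MAC address and the ordinal are concatenated.

```lua
-- Hypothetical helpers, not taken from the actual driver.

-- Normalize a MAC address so the same hub always yields the same id
-- (assumption: separators are stripped and letters upper-cased).
local function hub_dni(mac)
    return (mac:gsub("[:%-]", ""):upper())
end

-- A light's id is the hub id plus the light's ordinal on that hub.
-- The ":" separator is an assumption; any unambiguous separator works.
local function light_dni(mac, ordinal)
    return string.format("%s:%d", hub_dni(mac), ordinal)
end
```

Two lights on the same hub differ by ordinal, and lights on different hubs differ by MAC, so the resulting ids do not collide.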

The device “added” has an id (UUID) that, I assume, is unique (it was just “added”, right?).

My “init” lifecycle handler seems to be the appropriate place to associate my value with the SmartThings device. Unfortunately/inappropriately, it is not being called reliably.

This is still a problem.

Rebooting the hub does seem to “init” all of the previously “added” devices, some of which were not "init"ed immediately after being “added” (which is the problem). A reboot should not be required.

Take a hard look at your channel handlers or timer routines to make sure they aren’t blocking for any length of time. Sounds like the driver thread may be starved.

Don’t loop on socket receives; process one message and exit and let the cosock select call your channel handler again.

Had exact same problem as this with a UPnP driver that was discovering and creating multiple devices in quick succession. Monitoring busy multicast addresses can do this.

I am not blocking anywhere in my code (who knows what happens in try_create_device). All of my potentially blocking calls (cosock receives) are guarded with cosock.socket.select to ensure that they are ready first.

There are two problems here:

  1. Not all try_create_device attempts succeed (actually create a device)
  2. For those that do succeed (are lifecycle “added”), not all are lifecycle "init"ed.

I was, certainly, blasting out many try_create_device attempts at once (over 30) and experienced this bad behavior.

Now, before attempting another try_create_device, I ensure that the last one succeeded (was lifecycle "init"ed). The round-trip time slows down creation attempts substantially but, so far, I am able to discover all of my devices and create them successfully in one scan.

I feel your pain. I’ve had both these exact issues at one point or another during rapid discovery of multiple devices. If you’ve got your stuff on github I’d be happy to try and find anything obvious. And yes, I think at one point I tried putting a 1-second sleep between each device creation to try and alleviate the problem.

Thanks.

I tried cosock.socket.sleep in all sorts of places with all sorts of values.
I really hate that kind of kludgy solution.
What I have now works better than anything else that I have tried.

Solution is simpler now.
There does not seem to be a need to run a separate build thread during discovery.
try_create_device is run either

  • immediately upon discovery, if the last one has already completed being built
  • immediately after the last one does complete being built (at the end of its lifecycle init)

How are you determining the status of the prior discovered device init completion?

For example, my DIMMER sub_driver, on a lifecycle init, creates a Dimmer Adapter for the device and adds it to those already built.

lifecycle_handlers = {init = function(driver, device) built:add(Dimmer(driver, device)) end}

built:add remembers this adapter and emits an indication that this event has occurred

add = function(self, adapter)
    local device_network_id = adapter.device.device_network_id
    self.adapter[device_network_id] = adapter
    self:_emit(device_network_id, adapter)
end,

built:_emit calls any handler registered for this event once (automatically un-registers it)

_emit = function(self, device_network_id, adapter)
    local handler = self._handler[device_network_id]
    if handler then
        handler(adapter)
        self._handler[device_network_id] = nil
    end
end,

A handler for this event might have been registered once in a built:after call …

_once = function(self, device_network_id, handler)
    self._handler[device_network_id] = handler
end,

after = function(self, device_network_id, handler)
    if not device_network_id then
        handler()
    else
        local adapter = self.adapter[device_network_id]
        if adapter then
            handler(adapter)
            return adapter
        end
        self:_once(device_network_id, handler)
    end
end,

… which ensures that the handler is either called immediately or once after this event happens.
built:after is called when the need for a build is discovered.

local last_device_network_id
local function build(device_network_id, model, label, parent_device_id)
    built:after(last_device_network_id, function()
        driver:try_create_device{
            type = "LAN",
            device_network_id = device_network_id,
            label = label,
            profile = model,
            manufacturer = "legrand",
            model = model,
            parent_device_id = parent_device_id,
        }
    end)
    last_device_network_id = device_network_id
end

Thus try_create_device is only called after the last device adapter was built.
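To see the chaining in isolation, the fragments above can be assembled into a small simulation that runs outside the hub. This is only a sketch: the fake driver, the init function that stands in for the lifecycle handler, and the log table are assumptions for the demo and are not SmartThings APIs. The fake try_create_device only records the request, and "init" is delivered by hand.

```lua
-- Registry assembled from the fragments above (one-shot handlers per id).
local built = {
    adapter = {},   -- device_network_id -> adapter, set once init has run
    _handler = {},  -- device_network_id -> callback waiting for that init

    add = function(self, adapter)
        local device_network_id = adapter.device.device_network_id
        self.adapter[device_network_id] = adapter
        self:_emit(device_network_id, adapter)
    end,

    _emit = function(self, device_network_id, adapter)
        local handler = self._handler[device_network_id]
        if handler then
            handler(adapter)
            self._handler[device_network_id] = nil
        end
    end,

    after = function(self, device_network_id, handler)
        if not device_network_id then return handler() end
        local adapter = self.adapter[device_network_id]
        if adapter then return handler(adapter) end
        self._handler[device_network_id] = handler
    end,
}

-- Fake driver (assumption): records create requests instead of creating.
local log = {}
local driver = {
    try_create_device = function(self, metadata)
        log[#log + 1] = "create " .. metadata.device_network_id
    end,
}

-- Stand-in for the lifecycle init handler of a sub_driver.
local function init(device_network_id)
    log[#log + 1] = "init " .. device_network_id
    built:add({ device = { device_network_id = device_network_id } })
end

local last_device_network_id
local function build(device_network_id)
    built:after(last_device_network_id, function()
        driver:try_create_device({ device_network_id = device_network_id })
    end)
    last_device_network_id = device_network_id
end

-- Discovery finds three devices at once; only the first create is issued.
build("A"); build("B"); build("C")
init("A")   -- releases the create for B
init("B")   -- releases the create for C
init("C")
```

After the run, log holds create A, init A, create B, init B, create C, init C in that order, so no create request is issued before the previous device's init has been handled.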

I have to say I bow down to your OO Lua skills. I thought I might be able to help by doing a quick scan of your driver on github but quickly realized your level of expertise and that it would take me some time to study your code to understand how you are even doing everything!

I’m glad you found a solution and I may have to look into doing something similar in my own problem driver. I intend to study your code in more depth to up my own game!

Thanks.

I do not know why I have to do this, only that, empirically, doing so seems to work reliably. There is really no harm done in waiting for the creation (build-built) round trip to complete since, at best, the device is unusable until it does and, at worst, it is not usable until a hub reboot.

I hope this can help you and/or others.
Maybe someone can explain why this works or suggest a better way.

OO skills.

Although I have a lot of experience with OO, I am a Lua newbie.
I just took the concepts from Programming in Lua (first edition) and implemented them in a classify module.
I use these classify methods similar to how I would write OO in other languages and I stop worrying about the implementation.
I have added comments to classify.lua to better explain what is going on under the hood.
For the most part, I try to keep the hood closed.

I am seeing this as well. I have to say, as a Lua noob, I am lost in the code above. In my case, discovery finds a bunch of devices on a connected bridge and issues a bunch of try_create_device calls. Those devices all receive added and doConfigure callbacks, but the init is not called.

Reinstalling the driver works fine, as all of the existing devices have their init called.

Are there any more insights on this?

One more note…

I added a workaround of waiting in between device creations, and it does init them for me. So my discovery loop essentially finds a single device, then waits 10 seconds, then finds the next device.

Hi, continuing the discussion of this post on this thread:

The team mentioned the following:

  • The init lifecycle should be executed after an added event if the device was previously unknown.
  • Also, an init event should be triggered for each known device on startup.

If this is not happening, there might be a function you’re using that could be yielding the device’s thread (for example, calling receive on a socket or channel) which is preempting the init event from being able to call the callback.

Yes, I expected that it should be but it is not, always:

I don’t think so. I presume it is in the “device’s thread” where all such lifecycle events would be handled. The problem is, my code is not even given the opportunity to handle its first one. That is, init is the first lifecycle event that I am prepared to handle and it isn’t even called.

I have worked around the problem by only issuing a try_create_device in the discovery thread if any previous attempt has completed its lifecycle init; otherwise, the next try_create_device attempt will be pending and will be performed just before the lifecycle init of the previous device returns. This results in a try_create_device → (init, try_create_device) → (init, try_create_device) … daisy chain.
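For readers who find the callback version hard to follow, the same daisy chain can be sketched with a plain queue. Everything here is hypothetical (pending, in_flight_dni, request_create, on_init, and the fake driver are names made up for the example); it is one possible shape for the idea, not the code from the actual driver.

```lua
local pending = {}        -- creation requests waiting their turn
local in_flight_dni = nil -- id of the device whose init is still awaited

-- Issue the next queued request, if any, and remember which id is in flight.
local function create_next(driver)
    local metadata = table.remove(pending, 1)
    in_flight_dni = metadata and metadata.device_network_id or nil
    if metadata then
        driver:try_create_device(metadata)
    end
end

-- Called from discovery: queue the request and start the chain if idle.
local function request_create(driver, metadata)
    pending[#pending + 1] = metadata
    if not in_flight_dni then
        create_next(driver)
    end
end

-- Called at the end of the lifecycle init handler. Inits for other devices
-- (for example, known devices at driver startup) do not advance the chain.
local function on_init(driver, device)
    if device.device_network_id == in_flight_dni then
        create_next(driver)
    end
end

-- Demo with a fake driver (assumption) that only records requests.
local created = {}
local fake_driver = {
    try_create_device = function(self, metadata)
        created[#created + 1] = metadata.device_network_id
    end,
}
request_create(fake_driver, { device_network_id = "A" })
request_create(fake_driver, { device_network_id = "B" })
request_create(fake_driver, { device_network_id = "C" })
on_init(fake_driver, { device_network_id = "Z" })  -- unrelated init, ignored
on_init(fake_driver, { device_network_id = "A" })  -- releases B
```

One caveat with any chain like this: if a try_create_device request never leads to an init, everything queued behind it stalls, so a real driver would probably want some kind of timeout, which is not shown here.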

While my work-around appears to work, I do not know why and I should not have to do it.

This non-blocking “bug” driver exhibits the issue.
