Try_create_device: though device may be "added", not always "init"ed

I feel your pain. I’ve had both these exact issues at one point or another during rapid discovery of multiple devices. If you’ve got your stuff on GitHub I’d be happy to try and find anything obvious. And yes, I think at one point I tried putting a 1-second sleep between each device creation to try and alleviate the problem.

Thanks.

I tried cosock.socket.sleep in all sorts of places with all sorts of values.
I really hate that kind of kludgy solution.
What I have now works better than anything else that I have tried.

The solution is simpler now.
There does not seem to be any need to run a separate build thread during discovery.
try_create_device is run either

  • immediately upon discovery, if the previous device has already finished being built, or
  • immediately after the previous device finishes being built (at the end of its lifecycle init).
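For anyone who prefers an explicit queue over chaining handlers, here is a minimal sketch of the same two rules in plain Lua. It is a swapped-in technique (a FIFO plus an "awaiting" marker), not the code from this thread: new_serial_creator, discovered and initialized are hypothetical names, and create stands in for driver:try_create_device. Keying initialized on device_network_id keeps the init of an already-known device at startup from advancing the queue.

```lua
-- Hypothetical sketch (not from this thread): serialize device creation so
-- the next create call is issued only after the previous device's init.
-- `create` stands in for driver:try_create_device.
local function new_serial_creator(create)
  local pending = {}    -- FIFO of creation requests not yet issued
  local awaiting = nil  -- device_network_id whose init we are waiting for

  -- Issue the oldest pending request, or go idle if there is none.
  local function issue_next()
    local request = table.remove(pending, 1)
    awaiting = request and request.device_network_id or nil
    if request then
      create(request)
    end
  end

  local creator = {}

  -- Call from discovery for each newly found device.
  function creator.discovered(request)
    table.insert(pending, request)
    if awaiting == nil then
      issue_next()
    end
  end

  -- Call at the end of each device's lifecycle init.
  function creator.initialized(device_network_id)
    if device_network_id == awaiting then
      issue_next()
    end
  end

  return creator
end

-- Usage with a recording stand-in for try_create_device.
local issued = {}
local creator = new_serial_creator(function(request)
  table.insert(issued, request.device_network_id)
end)
creator.discovered{device_network_id = "lc7001"}
creator.discovered{device_network_id = "dimmer-1"}
creator.discovered{device_network_id = "dimmer-2"}
-- Only "lc7001" has been issued so far.
creator.initialized("lc7001")   -- issues "dimmer-1"
creator.initialized("dimmer-1") -- issues "dimmer-2"
```

The trade-off versus chaining through built:after is that all pending work is visible in one table, which makes it easy to log or inspect how many creations are still queued.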

How are you determining the status of the prior discovered device init completion?

For example, my DIMMER sub_driver, on a lifecycle init, creates a Dimmer Adapter for the device and adds it to those already built.

lifecycle_handlers = {init = function(driver, device) built:add(Dimmer(driver, device)) end}

built:add remembers this adapter and emits an indication that this event has occurred.

add = function(self, adapter)
    local device_network_id = adapter.device.device_network_id
    self.adapter[device_network_id] = adapter
    self:_emit(device_network_id, adapter)
end,

built:_emit calls any handler registered for this event, once, and then automatically unregisters it.

_emit = function(self, device_network_id, adapter)
    local handler = self._handler[device_network_id]
    if handler then
        handler(adapter)
        self._handler[device_network_id] = nil
    end
end,

A handler for this event might have been registered once in a built:after call …

_once = function(self, device_network_id, handler)
    self._handler[device_network_id] = handler
end,

after = function(self, device_network_id, handler)
    if not device_network_id then
        handler()
    else
        local adapter = self.adapter[device_network_id]
        if adapter then
            handler(adapter)
            return adapter
        end
        self:_once(device_network_id, handler)
    end
end,

… which ensures that the handler is called either immediately or once after this event happens.
built:after was called when discovery found a device that needed to be built.

local last_device_network_id
local function build(device_network_id, model, label, parent_device_id)
    built:after(last_device_network_id, function()
        driver:try_create_device{
            type = "LAN",
            device_network_id = device_network_id,
            label = label,
            profile = model,
            manufacturer = "legrand",
            model = model,
            parent_device_id = parent_device_id,
        }
    end)
    last_device_network_id = device_network_id
end

Thus try_create_device is only called after the last device adapter was built.
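Here is a self-contained sketch of the built registry and build function above, runnable in plain Lua without a hub. The driver object and the init function are stand-ins (assumptions), not SmartThings APIs: try_create_device only records the request, and calling init by hand plays the role of the hub delivering the lifecycle init. One small deviation from the quoted _emit: the handler is unregistered before it is called, so an error inside it cannot leave it registered.

```lua
-- Self-contained sketch of the `built` registry and `build` above.
local built = {
  adapter = {},   -- device_network_id -> adapter built by lifecycle init
  _handler = {},  -- device_network_id -> one-shot handler awaiting that build
}

-- Call the handler waiting on this device once, then forget it.
function built:_emit(device_network_id, adapter)
  local handler = self._handler[device_network_id]
  if handler then
    self._handler[device_network_id] = nil -- unregister before calling
    handler(adapter)
  end
end

function built:_once(device_network_id, handler)
  self._handler[device_network_id] = handler
end

-- Remember the adapter and notify whoever was waiting for it.
function built:add(adapter)
  local device_network_id = adapter.device.device_network_id
  self.adapter[device_network_id] = adapter
  self:_emit(device_network_id, adapter)
end

-- Run handler now if the device is already built (or none is given),
-- otherwise run it once when built:add is called for that device.
function built:after(device_network_id, handler)
  if not device_network_id then
    handler()
    return
  end
  local adapter = self.adapter[device_network_id]
  if adapter then
    handler(adapter)
    return adapter
  end
  self:_once(device_network_id, handler)
end

-- Stand-in driver (assumption): records each request instead of creating.
local created = {}
local driver = {}
function driver:try_create_device(msg)
  table.insert(created, msg.device_network_id)
end

local last_device_network_id
local function build(device_network_id, model, label)
  built:after(last_device_network_id, function()
    driver:try_create_device{
      type = "LAN",
      device_network_id = device_network_id,
      label = label,
      profile = model,
      model = model,
    }
  end)
  last_device_network_id = device_network_id
end

-- Stand-in for the sub_driver's lifecycle init (the hub would call it).
local function init(device)
  built:add({device = device})
end

-- Discovery finds three devices in quick succession...
build("a", "dimmer", "Dimmer A")
build("b", "dimmer", "Dimmer B")
build("c", "switch", "Switch C")
-- ...but only "a" is requested until its init arrives.
init({device_network_id = "a"}) -- releases the request for "b"
init({device_network_id = "b"}) -- releases the request for "c"
```

Each init releases exactly one pending creation request, so requests never overlap with an unfinished init.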

I have to say I bow down to your OO Lua skills. I thought I might be able to help by doing a quick scan of your driver on GitHub but quickly realized your level of expertise and that it would take me some time to study your code to understand how you are even doing everything!

I’m glad you found a solution and I may have to look into doing something similar in my own problem driver. I intend to study your code in more depth to up my own game!

Thanks.

I do not know why I have to do this, only that, empirically, it seems to work reliably. There is really no harm in waiting for the creation (build → built) round trip to complete: at best, the device is unusable until it does and, at worst, it is not usable until a hub reboot.

I hope this can help you and/or others.
Maybe someone can explain why this works or suggest a better way.

OO skills.

Although I have a lot of experience with OO, I am a Lua newbie.
I just took the concepts from Programming in Lua (first edition) and implemented them in a classify module.
I use these classify methods similar to how I would write OO in other languages and I stop worrying about the implementation.
I have added comments to classify.lua to better explain what is going on under the hood.
For the most part, I try to keep the hood closed.
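For other Lua newcomers following along, here is a minimal sketch of the Programming in Lua class pattern that makes a call like Dimmer(driver, device) possible. The class helper, Adapter and set_level are hypothetical; this is not the classify module, just one common way to get constructors and inheritance from metatables (__index for method lookup, __call to make the class itself callable).

```lua
-- Hypothetical helper in the Programming in Lua style; not the author's
-- classify module, just one common way to make `Dimmer(driver, device)` work.
local function class(parent)
  local c = {}
  c.__index = c                       -- instances look up methods in c
  setmetatable(c, {
    __index = parent,                 -- c inherits methods from parent, if any
    __call = function(cls, ...)       -- calling the class constructs an instance
      local instance = setmetatable({}, cls)
      if cls.init then instance:init(...) end
      return instance
    end,
  })
  return c
end

local Adapter = class()
function Adapter:init(driver, device)
  self.driver = driver
  self.device = device
end
function Adapter:id()
  return self.device.device_network_id
end

local Dimmer = class(Adapter)
function Dimmer:init(driver, device)
  Adapter.init(self, driver, device)  -- chain to the parent constructor
  self.level = 0
end
function Dimmer:set_level(level)
  self.level = level
end

local d = Dimmer({}, {device_network_id = "dimmer-1"})
d:set_level(42)
```

Here d:id() is found on Adapter through the __index chain, while set_level comes from Dimmer.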

I am seeing this as well. I have to say, as a Lua noob, I am lost in the code above. In my case, discovery finds a bunch of devices on a connected bridge and issues a try_create_device call for each. Those devices all receive added and doConfigure callbacks, but init is not called.

Reinstalling the driver works fine, as all of the existing devices have their init called.

Are there any more insights on this?

One more note…

I added a workaround that waits between device creations, and it does init them for me. So my discovery loop essentially finds a single device, waits 10 seconds, then finds the next device.

Hi, continuing the discussion of this post on this thread:

The team mentioned the following:

  • The init lifecycle should be executed after an added event if the device was previously unknown.
  • Also, an init event should be triggered for each known device on startup.

If this is not happening, there might be a function you’re using that yields the device’s thread (for example, calling receive on a socket or channel), which prevents the init event from being dispatched to its callback.

Yes, I expected that it would be, but it is not always:

I don’t think so. I presume it is in the “device’s thread” where all such lifecycle events would be handled. The problem is, my code is not even given the opportunity to handle its first one. That is, init is the first lifecycle event that I am prepared to handle and it isn’t even called.

I have worked around the problem by only issuing a try_create_device in the discovery thread if any previous attempt has completed its lifecycle init; otherwise, the next try_create_device attempt will be pending and will be performed just before the lifecycle init of the previous device returns. This results in a try_create_device → (init, try_create_device) → (init, try_create_device) … daisy chain.

While my work-around appears to work, I do not know why and I should not have to do it.

This non-blocking “bug” driver exhibits the issue.

Thank you for providing this sample, I will share it with the team so they can replicate the issue and give us feedback.

Great sample @rossetyler :+1:

Hi, @rossetyler. Sorry for the delay, the engineering team mentioned the following about your sample:

  • In the driver you aren’t using any sockets at all, so there are no calls to receive, but you do use both cosock.socket.sleep and cosock.socket.select.
  • From this file, it is not clear whether one of those functions is being called in the added lifecycle handler and is blocking init.
  • You should check whether the creation of each device succeeded, using assert(driver:try_create_device(...)) or local success, err = driver:try_create_device(...). This way, you can see if there were immediate errors.
    • This matters especially because, in your sample, you’re calling try_create_device in a hot loop 100 times, which might hit the rate limit on devices created within a given timespan.

Was this number of devices just an example, or do you actually have drivers that create 100 devices? If so, could you share the details of that use case?

  • Any time something could block the current thread, you can avoid that entirely by calling cosock.spawn. So, something like cosock.spawn(function() local value, err = socket:receive() end) would leave the device thread free to handle other device events. For example:

This sample shows how the device thread can be blocked:


-- This driver will lock up the device thread by
-- yielding it into a deadlock: init cannot be
-- processed until added completes, but added
-- is waiting for init.
local cosock = require "cosock"
local Driver = require "st.driver"

local tx, rx = cosock.channel.new()
local function added(driver, device)
  -- wait for init (it never arrives: this blocks the device thread)
  local init, err = rx:receive()
end

local function init(driver, device)
  -- send init
  tx:send(device.id)
end

local driver = Driver("...", {
  lifecycle_handlers = {
    added = added,
    init = init,
  }
})

driver:run()

Building on the previous implementation, this is how the blockage can be avoided using cosock.spawn:

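local cosock = require "cosock"
local Driver = require "st.driver"

local tx, rx = cosock.channel.new()
local function added(driver, device)
  -- wait for init in a separate task so that added returns
  -- immediately and the device thread can go on to process init
  cosock.spawn(function()
    local init, err = rx:receive()
    -- ... act on init here ...
  end)
end

local function init(driver, device)
  -- send init
  tx:send(device.id)
end

local driver = Driver("...", {
  lifecycle_handlers = {
    added = added,
    init = init,
  }
})

driver:run()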
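To see the deadlock without a hub, here is a toy model in plain Lua coroutines. It is not cosock: run, api.send, api.receive and api.spawn are hypothetical stand-ins, and the scheduler simply declares a deadlock when a full round makes no progress. The device thread handles added and then init strictly in order, as the team describes; the only difference between the two handler sets is whether added waits inline or in a spawned task.

```lua
-- Toy model only: plain Lua coroutines, not cosock's real scheduler.
local function run(events, handlers)
  local tasks, channel, log = {}, {}, {}
  local changes = 0 -- bumped on every send, successful receive and spawn

  local api = {}
  function api.send(value)
    table.insert(channel, value)
    changes = changes + 1
  end
  function api.receive()
    while #channel == 0 do
      coroutine.yield() -- "block": give other tasks a turn
    end
    changes = changes + 1
    return table.remove(channel, 1)
  end
  function api.spawn(fn)
    table.insert(tasks, coroutine.create(fn))
    changes = changes + 1
  end
  function api.log(line)
    table.insert(log, line)
  end

  -- The "device thread": lifecycle events are handled strictly in order.
  api.spawn(function()
    for _, event in ipairs(events) do
      handlers[event](api)
    end
  end)

  -- Round-robin; a round with no finished task and no change is a deadlock.
  while #tasks > 0 do
    local before, finished = changes, false
    for i = #tasks, 1, -1 do
      local task = tasks[i]
      local ok, err = coroutine.resume(task)
      assert(ok, err)
      if coroutine.status(task) == "dead" then
        table.remove(tasks, i)
        finished = true
      end
    end
    if not finished and changes == before then
      return "deadlock", log
    end
  end
  return "done", log
end

-- added waits inline, so init can never be dispatched.
local blocking = {
  added = function(api) api.log("added got " .. api.receive()) end,
  init = function(api) api.send("init"); api.log("init sent") end,
}

-- added waits in a spawned task, so the device thread moves on to init.
local spawning = {
  added = function(api)
    api.spawn(function() api.log("added got " .. api.receive()) end)
  end,
  init = blocking.init,
}

local status1, log1 = run({"added", "init"}, blocking)
local status2, log2 = run({"added", "init"}, spawning)
```

With the blocking handlers, run reports a deadlock and nothing is logged; with the spawning handlers, "init sent" is logged first and the spawned task then receives the value.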

cc @blueyetisoftware

I have said most of this before …

There are two drivers that I have discussed:

  • Legrand RFLC
  • Bug

Legrand RFLC tries to create ~35 devices all at once: one for the LC7001 lighting controller (bridge) and the rest for the lights that it controls.

Bug was created in an attempt to simply illustrate the issue.

Neither of these has an added lifecycle handler, so neither could block my init lifecycle handler.
Yet my Legrand RFLC logs added events and some are not followed by inits.
Failed try_create_device calls and rate limiting do not explain this.

Legrand RFLC does use cosock sockets (see lc7001 module).
Bug does not use any cosock features nor does it explicitly block anywhere (see code).

An LC7001 lighting controller can control up to 100 lights (that is where 100 comes from). Mine, currently, only has ~34.

What is the rate limit on try_create_device calls?

Where else are rate limits going to bite me?
I am concerned about the refresh capability on my parent LC7001 controller, which refreshes each of its children.
For each child, it will make:

  • device:online or device:offline
  • device:emit_event switch on or off
  • device:emit_event switch level

So, refreshing ~33 dimmers will make ~100 such calls all at once.
If this is a problem, how can I deal with it?
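One common way to soften a burst like this, assuming it actually is a problem (no rate-limit figure is given in this thread), is to issue the calls in small batches with a pause between them. run_in_batches is a hypothetical helper, and sleep is injected so the sketch runs anywhere; in a driver one might pass cosock.socket.sleep from inside a task started with cosock.spawn, per the team's advice above, so the pauses do not hold up the device thread. The batch size and pause used here are arbitrary.

```lua
-- Hypothetical sketch: spread a burst of per-child calls over time.
-- `calls` is a list of zero-argument functions (e.g. wrappers around
-- device:online() or device:emit_event(...)); `sleep` is injected.
local function run_in_batches(calls, batch_size, pause_seconds, sleep)
  assert(batch_size >= 1, "batch_size must be at least 1")
  for i, call in ipairs(calls) do
    call()
    -- Pause after every full batch, but not after the very last call.
    if i % batch_size == 0 and i < #calls then
      sleep(pause_seconds)
    end
  end
end

-- Usage with recording stand-ins for the device calls and for sleep.
local done, pauses = 0, 0
local calls = {}
for i = 1, 99 do -- ~33 dimmers x 3 calls each
  calls[i] = function() done = done + 1 end
end
run_in_batches(calls, 10, 0.5, function(seconds) pauses = pauses + 1 end)
```

With 99 calls in batches of 10, the helper pauses 9 times; whether any pause is needed at all depends on limits the thread has not yet identified.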