I feel your pain. I’ve had both these exact issues at one point or another during rapid discovery of multiple devices. If you’ve got your stuff on github I’d be happy to try and find anything obvious. And yes, I think at one point I tried putting a 1-second sleep between each device creation to try and alleviate the problem.
Thanks.
I tried cosock.socket.sleep in all sorts of places with all sorts of values.
I really hate that kind of kludgy solution.
What I have now works better than anything else that I have tried.
Solution is simpler now.
There does not seem to be a need to run a separate build thread during discovery.
try_create_device is run either
- immediately upon discovery, if the last one has already completed being built, or
- immediately after the last one does complete being built (at the end of its lifecycle init).
How are you determining the status of the prior discovered device init completion?
For example, my DIMMER sub_driver, on a lifecycle init, creates a Dimmer Adapter for the device and adds it to those already built.
lifecycle_handlers = {init = function(driver, device) built:add(Dimmer(driver, device)) end}
built:add remembers this adapter and emits an indication that this event has occurred.
add = function(self, adapter)
    local device_network_id = adapter.device.device_network_id
    self.adapter[device_network_id] = adapter
    self:_emit(device_network_id, adapter)
end,
built:_emit calls any handler registered for this event once (it is automatically un-registered).
_emit = function(self, device_network_id, adapter)
    local handler = self._handler[device_network_id]
    if handler then
        handler(adapter)
        self._handler[device_network_id] = nil
    end
end,
A handler for this event might have been registered once in a built:after call …
_once = function(self, device_network_id, handler)
    self._handler[device_network_id] = handler
end,
after = function(self, device_network_id, handler)
    if not device_network_id then
        handler()
    else
        local adapter = self.adapter[device_network_id]
        if adapter then
            handler(adapter)
            return adapter
        end
        self:_once(device_network_id, handler)
    end
end,
… which ensures that the handler is either called immediately or once after this event happens.
built:after is called when the need for a build is discovered.
local last_device_network_id
local function build(device_network_id, model, label, parent_device_id)
    built:after(last_device_network_id, function()
        driver:try_create_device{
            type = "LAN",
            device_network_id = device_network_id,
            label = label,
            profile = model,
            manufacturer = "legrand",
            model = model,
            parent_device_id = parent_device_id,
        }
    end)
    last_device_network_id = device_network_id
end
Thus try_create_device is only called after the last device adapter was built.
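For anyone lost in the snippets above, here is a minimal, self-contained sketch of the same chaining idea that runs in plain Lua outside the hub. The built table is condensed from the snippets above and is not the actual Legrand RFLC code; fake_driver and created are hypothetical stand-ins for the real driver and the hub, and the built:add calls at the end simulate the lifecycle init that the hub would normally trigger.

```lua
-- Condensed stand-in for the `built` registry described above.
local built = {
  adapter = {},   -- device_network_id -> adapter, recorded once init has run
  _handler = {},  -- device_network_id -> one-shot handler waiting for that init
}

-- Record the adapter and fire the waiting handler (if any) exactly once.
function built:add(adapter)
  local id = adapter.device.device_network_id
  self.adapter[id] = adapter
  local handler = self._handler[id]
  if handler then
    self._handler[id] = nil -- un-register before the call
    handler(adapter)
  end
end

-- Run handler now if `id` is nil or already built, otherwise once it is built.
function built:after(id, handler)
  if not id then return handler() end
  local adapter = self.adapter[id]
  if adapter then return handler(adapter) end
  self._handler[id] = handler
end

-- Hypothetical fake driver: records requests instead of contacting the hub.
local created = {}
local fake_driver = {}
function fake_driver:try_create_device(metadata)
  created[#created + 1] = metadata.device_network_id
end

-- Same shape as build() above, reduced to the device_network_id.
local last_id
local function build(id)
  built:after(last_id, function()
    fake_driver:try_create_device{ device_network_id = id }
  end)
  last_id = id
end

-- Discovery finds three devices back to back; only the first is requested.
build("a"); build("b"); build("c")
local requested_during_discovery = #created

-- Simulate the lifecycle init of each device; each one releases the next request.
built:add{ device = { device_network_id = "a" } }
built:add{ device = { device_network_id = "b" } }
built:add{ device = { device_network_id = "c" } }
```

Each build registers its request against the previously discovered device, so a request is only issued once that device's init has been recorded.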
I have to say I bow down to your OO Lua skills. I thought I might be able to help by doing a quick scan of your driver on github but quickly realized your level of expertise and that it would take me some time to study your code to understand how you are even doing everything!
I’m glad you found a solution and I may have to look into doing something similar in my own problem driver. I intend to study your code in more depth to up my own game!
Thanks.
I do not know why I have to do this, only that, empirically, doing so seems to work reliably. There is really no harm in waiting for the creation (build-built) round trip to complete since, at best, the device is unusable until it does and, at worst, it is not usable until a hub reboot.
I hope this can help you and/or others.
Maybe someone can explain why this works or suggest a better way.
OO skills.
Although I have a lot of experience with OO, I am a Lua newbie.
I just took the concepts from Programming in Lua (first edition) and implemented them in a classify module.
I use these classify methods similar to how I would write OO in other languages and I stop worrying about the implementation.
I have added comments to classify.lua to better explain what is going on under the hood.
For the most part, I try to keep the hood closed.
I am seeing this as well. I have to say, as a Lua noob, I am lost in the code above. In my case, discovery finds a bunch of devices on a connected bridge and issues a bunch of try_create_device calls. Those devices all receive added and doConfigure callbacks, but the init is not called.
Reinstalling the driver works fine, as all of the existing devices have their init called.
Are there any more insights on this?
One more note…
I added a workaround of waiting between device creations, and it does init them for me. So my discovery loop essentially finds a single device, then waits 10 seconds, then finds the next device.
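A rough sketch of that kind of paced loop, runnable in plain Lua. The create_paced helper, fake_driver, and fake_sleep are all hypothetical names used for illustration; in a real driver the sleep function would presumably be cosock.socket.sleep and the driver would be the real one. The 10-second delay is just the value mentioned above, not a documented requirement.

```lua
-- Hypothetical helper: request each discovered device, pausing between requests.
-- `sleep` is injected so the sketch can run without the hub runtime.
local function create_paced(driver, found, delay_seconds, sleep)
  for i, metadata in ipairs(found) do
    driver:try_create_device(metadata)
    if i < #found then
      sleep(delay_seconds) -- leave time for this device's added/init to run
    end
  end
end

-- Stand-ins so the sketch runs outside the hub.
local requested, slept = {}, 0
local fake_driver = {}
function fake_driver:try_create_device(metadata)
  requested[#requested + 1] = metadata.device_network_id
end
local function fake_sleep(seconds)
  slept = slept + seconds -- count the delay instead of actually waiting
end

create_paced(fake_driver, {
  { device_network_id = "light-1" },
  { device_network_id = "light-2" },
  { device_network_id = "light-3" },
}, 10, fake_sleep)
```

The obvious cost is that discovery time grows linearly with the number of devices, which is why chaining on init (as described earlier in the thread) may be preferable.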
Hi, continuing the discussion of this post on this thread:
The team mentioned the following:
- The init lifecycle should be executed after an added event if the device was previously unknown.
- Also, an init event should be triggered for each known device on startup.
If this is not happening, there might be a function you’re using that could be yielding the device’s thread (for example, calling receive on a socket or channel), which is preempting the init event from being able to call the callback.
Yes, I expected that it should be, but it is not always:
I don’t think so. I presume it is in the “device’s thread” where all such lifecycle events would be handled. The problem is, my code is not even given the opportunity to handle its first one. That is, init is the first lifecycle event that I am prepared to handle and it isn’t even called.
I have worked around the problem by only issuing a try_create_device in the discovery thread if any previous attempt has completed its lifecycle init; otherwise, the next try_create_device attempt will be pending and will be performed just before the lifecycle init of the previous device returns. This results in a try_create_device → (init, try_create_device) → (init, try_create_device) … daisy chain.
While my work-around appears to work, I do not know why and I should not have to do it.
This non-blocking “bug” driver exhibits the issue.
Thank you for providing this sample, I will share it with the team so they can replicate the issue and give us feedback.
Hi, @rossetyler. Sorry for the delay, the engineering team mentioned the following about your sample:
- In the driver you aren’t using any sockets at all, so there are no calls to receive, but you use both cosock.socket.sleep and cosock.socket.select.
  - In this file it is not clear if one of those functions is getting called in the added lifecycle and that is blocking init.
- You should check if the creation of the devices was successful using assert(driver.try_create_device(...)) or local success, err = driver.try_create_device(...). This way, you can see if there were immediate errors.
  - Especially because, in your sample, you’re calling try_create_device in a hot loop 100 times and this might be reaching the rate limit of devices created in a given timespan.
Was this number of devices just an example, or do you actually have drivers that create 100 devices? If so, could you share the details of that use case?
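A minimal sketch of the suggested result check, runnable in plain Lua. It assumes, as the suggestion above implies, that try_create_device returns a truthy value on success and nil (or false) plus an error message on failure; that return convention is an assumption here, not something confirmed in this thread. The create_all helper and the fake_driver (which rejects everything after the second request to imitate a rate limit) are hypothetical.

```lua
-- Hypothetical wrapper: attempt every creation and collect the failures
-- instead of silently ignoring the return values.
local function create_all(driver, found)
  local failures = {}
  for _, metadata in ipairs(found) do
    local success, err = driver:try_create_device(metadata)
    if not success then
      failures[#failures + 1] = {
        device_network_id = metadata.device_network_id,
        err = tostring(err),
      }
    end
  end
  return failures
end

-- Fake driver: accepts two requests, then fails, imitating a rate limit.
local fake_driver = { count = 0 }
function fake_driver:try_create_device(metadata)
  self.count = self.count + 1
  if self.count > 2 then
    return nil, "rate limited (simulated)"
  end
  return true
end

local failures = create_all(fake_driver, {
  { device_network_id = "light-1" },
  { device_network_id = "light-2" },
  { device_network_id = "light-3" },
})

-- Report anything that was rejected immediately.
for _, failure in ipairs(failures) do
  print(failure.device_network_id, failure.err)
end
```

Collecting failures rather than calling assert keeps discovery running for the remaining devices, so the rejected ones could be retried later.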
- Any time something could block the current thread, you can avoid that entirely by calling cosock.spawn. So, something like cosock.spawn(function() local value, err = socket:receive() end) would allow the use of the device thread for any other device events. For example:
This sample shows how the device thread can be blocked:
local cosock = require "cosock"
local Driver = require "st.driver"
-- This driver will lock up any device thread by
-- yielding the thread into a deadlock because
-- init cannot be processed until added is complete
-- but added is relying on init to complete
local tx, rx = cosock.channel.new()
local function added(driver, device)
    -- wait for init (blocks the device thread)
    local init, err = rx:receive()
end
local function init(driver, device)
    -- send init
    tx:send(device.id)
end
local driver = Driver("...", {
    lifecycle_handlers = {
        added = added,
        init = init,
    }
})
Considering the previous implementation, this is how we can avoid the blockage using cosock.spawn:
local cosock = require "cosock"
local Driver = require "st.driver"
local tx, rx = cosock.channel.new()
local function added(driver, device)
    cosock.spawn(function()
        -- wait for init in a separate coroutine,
        -- so added returns without blocking the device thread
        local init, err = rx:receive()
    end)
end
local function init(driver, device)
    -- send init
    tx:send(device.id)
end
local driver = Driver("...", {
    lifecycle_handlers = {
        added = added,
        init = init,
    }
})
I have said most of this before …
There are two drivers that I have discussed
- Legrand RFLC
- Bug
Legrand RFLC tries to create ~35 devices all at once: one for the LC7001 lighting controller (bridge) and the rest for the lights that it controls.
Bug was created in an attempt to simply illustrate the issue.
Neither of these has an added lifecycle handler so neither could block my init lifecycle handler.
Yet my Legrand RFLC logs added events and some are not followed by inits.
Failed try_create_device calls and rate limiting do not explain this.
Legrand RFLC does use cosock sockets (see lc7001 module).
Bug does not use any cosock features nor does it explicitly block anywhere (see code).
An LC7001 lighting controller can control up to 100 lights (that is where 100 comes from). Mine, currently, only has ~34.
What is the rate limit on try_create_device calls?
Where else are rate limits going to bite me?
I am concerned about the refresh capability on my parent LC7001 controller that refreshes each of its children.
For each, it will make
- device:online or device:offline
- device:emit_event switch on or off
- device:emit_event switch level
So, refreshing ~33 dimmers will make ~100 such calls all at once.
If this is a problem, how can I deal with it?