Edge Driver Shutdown

I’m seeing an odd behavior locally on my hub. I have a LAN driver that is restarting every 15-16 minutes. I have verified that it isn’t leaking memory nor crashing. There is no activity when the restarts occur. As a sanity check, I commented out all of the code in the driver, so it was just left with the most basic st.Driver and the Driver.run command. The driver_lifecycle callback is called after 15-16 minutes with a shutdown command, indicating that this is a controlled restart from the hub. This happens every 15 minutes for hours on end. I have tried restarting the hub and reinstalling the driver.

local Driver = require 'st.driver'
local capabilities = require 'st.capabilities'
local utils = require 'st.utils'
local log = require 'log'

local driver = Driver("mydriver", {
    driver_lifecycle = function(driver, event)
        log.error(string.format('driver received lifecycle event %s', event))
    end
})


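-- RUN_COUNT persists in the datastore across restarts, so it identifies the session
local session = driver.datastore.RUN_COUNT or 0
local uptime = 0
-- log uptime and Lua heap size once a minute
driver:call_on_schedule(60, function()
    --collectgarbage("collect")
    -- collectgarbage("count") reports the heap size in KB
    local memory = math.ceil(collectgarbage("count"))
    log.debug(string.format('session %d: uptime %d minutes, memory %d KB', session, uptime, memory))
    uptime = uptime + 1
end)
-- bump the counter so the next restart logs a new session number
driver.datastore.RUN_COUNT = session + 1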

driver:run()

@nayelyz Do you know what conditions would cause the hub to send a shutdown command to the driver? I have only ever seen this in the past when installing an update from the CLI. Is the consistent 15 minute interval a clue? There are no timers in the driver other than the logging every minute. It seems like something is going on in the internals of the hub that is restarting this.

Hi, @blueyetisoftware

I already asked the team about this to see if there’s something that can cause this. In the meantime, could you send your Hub logs and share your Hub ID, please?

Note: The Hub ID required is the one that appears in the IDE

  1. In the IDE, enter “my hubs”
  2. Enter the corresponding Hub and go to “view utilities”
  3. Click on “send” below “send hub logs”

Do you happen to have 15 running drivers? If so, try running logcat on all drivers. At low memory the hub restarts one driver per minute, cycling through all of them.

@nayelyz/@philh30 I actually just verified it is happening in some of the other drivers on the hub as well. Not sure if this started with v46 of the hub firmware, but I think I would have noticed this prior to that.

@philh30 That’s a heck of a guess. I have 17 drivers at the moment, so roughly corresponds. I still don’t know where the memory limits are. Technically, I see 16 of my one-minute log statements, so presumably the restart is happening on the 17th minute.

I guarantee it’s happening to all of your drivers. The restarts are fairly precise at 60 second intervals, hitting whichever driver is the longest running at the time. I think they added this with FW 45.x, but it seemed to me that 46.x lowered the threshold where it kicks in.
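To see why the restart interval tracks the driver count, here is a toy model of the behavior described above. It is only a sketch: it assumes the hub restarts exactly one driver per minute and always picks the longest-running one, and the function name `simulate` is made up for illustration. It does not use any hub API.

```lua
-- Toy model: every minute the "hub" restarts the longest-running driver.
-- Returns a list with the uptime (in minutes) each restarted driver had reached.
local function simulate(driver_count, minutes)
    local started_at = {}
    for i = 1, driver_count do
        started_at[i] = 0          -- assume all drivers start together at minute 0
    end

    local uptimes = {}
    for minute = 1, minutes do
        -- find the driver with the earliest start time (the longest running)
        local oldest = 1
        for i = 2, driver_count do
            if started_at[i] < started_at[oldest] then
                oldest = i
            end
        end
        uptimes[#uptimes + 1] = minute - started_at[oldest]
        started_at[oldest] = minute   -- restarted, so its uptime resets
    end
    return uptimes
end

-- After the first full cycle, every restart hits a driver that has been
-- up for exactly driver_count minutes.
print(simulate(17, 60)[60])  --> 17
print(simulate(15, 60)[60])  --> 15
```

Under those assumptions, a driver that logs once a minute would print 16 lines on a 17-driver hub and get restarted on the 17th minute.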


That’s a great answer. I spent days trying to debug my changes and was never able to pin it down. At times, it would just stop and work for a few days. Presumably, I was back under the limit. Really appreciate it. Now I just need to figure out where the memory is being used :beers:

Update:
@philh30 Deleted unused drivers and now running for 1 hour +

This also explains why the delta started at 15 min, then 16, then 17. I was installing debug versions of the driver and switching to them.


@nayelyz If this is confirmed by the engineers as the reason, I’d like to make a feature request that the driver_lifecycle callback receive something like a low_memory event in addition to the shutdown event. At a minimum, a logging statement would have been a huge help as well.
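Until something like that exists, a driver can at least log how long it ran before the shutdown arrived, which makes the pattern easier to spot. This is a rough sketch, not anything official: `make_lifecycle_handler` is a hypothetical helper, the logger and clock are passed in so it runs outside the hub, and it assumes the event arrives as the string "shutdown".

```lua
-- Builds a driver_lifecycle handler that reports uptime when a shutdown arrives.
-- log_fn: function taking a message string (e.g. log.warn on the hub)
-- clock:  function returning the current time in seconds (e.g. os.time)
local function make_lifecycle_handler(log_fn, clock)
    local started = clock()
    return function(_driver, event)
        if event == "shutdown" then   -- assumption: event is the string "shutdown"
            local minutes = math.floor((clock() - started) / 60)
            log_fn(string.format(
                "shutdown after %d minutes; if this matches the number of installed drivers, suspect low memory",
                minutes))
        else
            log_fn(string.format("driver received lifecycle event %s", tostring(event)))
        end
    end
end

-- On the hub this would be wired up roughly like:
--   local driver = Driver("mydriver", {
--       driver_lifecycle = make_lifecycle_handler(log.warn, os.time)
--   })
```

Injecting the logger and clock keeps the handler testable on a desktop Lua interpreter without the st libraries.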

How many drivers are you at to get below the restart threshold?

Removed two debug drivers I had installed. No devices were running those at the time of the issue.

That’s interesting. I didn’t think drivers with no devices attached would have an impact…

Yes. Very odd. Removed those and it worked for more than an hour. Then I used it to discover 50+ devices on the LAN and it’s still running fine. Are those unused drivers stored in the same flash memory? Not sure.

Edit: 50+ was pushing it. Started shutdown sequence again :slight_smile: I’m just happy to finally have a thread to pull on

It is able to run the drivers during discovery, even without devices, so it must live somewhere that allows it to be loaded quickly.

nervously looking at my V3 hub with 22 installed drivers… :grimacing:


If you have Z-Wave and Zigbee drivers, you can run that number up. There is a shared platform for those on the hub, so they are lighter. The LAN drivers are heavier since each one has to roll its own protocol handling, messaging, etc. If the stream is encrypted, then it really gets big. Most crypto packages are big if you don’t trim out the unused stuff.

For reference, at the time of the issue I was running 6 LAN drivers, 1 virtual device driver, and 10 zwave drivers. I can condense that down quite a bit. Most of the brand specific zwave drivers aren’t necessary. To get things working, I dropped 2 of the LAN drivers that were just development projects.