Socket with uuid * does not exist, and other LAN oddities


Oddity 1. socket with uuid * does not exist

Consider the following simple edge driver …

local cosock    = require "cosock"
local Driver    = require "st.driver"
local log       = require "log"

Driver("bug", {
    discovery = function(_, _, should_continue)

        -- address of the linux host running the server (change as necessary)
        local address = ""
        local port = 2222

        local block_size = 128
        local block_count = 16

        -- log the command that starts a matching server
        log.info(
            "# on linux host reachable by smartthings hub on address "
            .. address
            .. ", run:\n"
            .. "while :; do dd if=/dev/zero bs="
            .. block_size
            .. " count="
            .. block_count
            .. " | socat - TCP-LISTEN:"
            .. port
            .. ",reuseaddr; done"
        )

        while should_continue() do
            local socket, socket_error = cosock.socket.tcp()
            if socket_error then
                log.error("socket", socket_error)
            else
                local _, connect_error = socket:connect(address, port)
                if connect_error then
                    log.error("connect", connect_error)
                else
                    -- read until the server closes the connection
                    local receive_count = 0
                    while true do
                        local whole, receive_error, part = socket:receive(block_size)
                        local receive = whole or part
                        if receive then
                            receive_count = receive_count + #receive
                        end
                        if receive_error then
                            log.info("receive", receive_count, receive_error)
                            break
                        end
                    end
                end
                socket:close()
            end
            -- try again in a second
            cosock.socket.sleep(1)
        end
    end,
}):run()

… whose discovery implementation connects and reads everything it can from a server every second until discovery no longer should_continue.

As the code suggests, one can implement the server simply on a linux machine at the address referenced in the code (change as necessary).

while :; do dd if=/dev/zero bs=128 count=16 | socat - TCP-LISTEN:2222,reuseaddr; done

It should work fine, but often it does not, and I see this in the log as a connect_error:

socket with uuid `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` does not exist

What can I do about that?

Oddity 2. First part of HTTP response missing

I am writing a LAN-based edge driver that interfaces with its devices (for commands and refreshing status) using HTTP. I always expect a complete, well-formed HTTP response but, occasionally, the first part of the response will get dropped and the first part that my code will see will be mid-stream.

I don’t see this behavior outside the SmartThings hub.

I have port-mirrored my SmartThings hub to my Linux development machine on my switch and witnessed a complete, well-formed response from the device on the wire being seen by my edge driver as starting mid-body. This clearly suggests that something is being dropped in the hub.
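
To catch this condition in the driver log when it happens, one could check whether the bytes received begin with an HTTP status line. This is only a sketch: the function name is made up for illustration, and it assumes the driver has already accumulated the start of the response in a string.

```lua
-- Hypothetical helper: true when `response` begins with an HTTP status line
-- such as "HTTP/1.1 200 OK", false when it appears to start mid-stream.
local function starts_with_status_line(response)
    return response:match("^HTTP/%d%.%d %d%d%d") ~= nil
end

-- Example: a well-formed start versus one that begins mid-header.
print(starts_with_status_line("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"))
print(starts_with_status_line("ength: 2\r\n\r\nok"))
```

Logging the first few bytes whenever this check fails would make it easier to line up the hub-side view with a capture taken on the wire.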

I wrote the driver above in an effort to demonstrate this simply but, so far, it has just exposed another oddity (which I had occasionally seen before as well).

Oddity 3. socket:receive returns success and error all at once.

I am not sure this is a SmartThings-specific oddity or not but, regardless, it certainly seems odd to me. I have always considered these things (success and error) to be mutually exclusive.

However …

local whole, receive_error, part = socket:receive(2048)

… will sometimes both return (successfully) something in whole or part but at the same time will also return closed in receive_error.

Oddity 4. TCP listener on INADDR_ANY needs to restart on IP address change.

When the SmartThings hub decides it needs to change its IP address (for example, when its DHCP lease cannot be renewed on its old address), TCP listeners on any/every interface/address (INADDR_ANY) are no longer reachable. Restarting them fixes the problem but one should not have to do this. The socket should continue to be bound to any/every interface/address.

Oddity 5. cosock.socket.bind shortcut does not work.

This cosock shortcut creates the tcp socket, binds it, listens on it and only then sets the reuseaddr socket option to true. I have never seen this ordering before. Even the code seems confused about it:

-- I don't know why, but this is what the docs say
ret, err = skt:setoption("reuseaddr", true)

I don’t know what/where “the docs” are but usually such reuseaddr behavior is required before the bind as that is where address reuse becomes an issue.

Anyway, much worse behavior is seen on the SmartThings hub where my code fails in this setoption call. I stopped using the cosock.socket.bind shortcut and do the tcp create, bind and listen steps myself.
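
The separate steps might look something like the sketch below. The function name open_listener and its parameters are made up for illustration. It takes the socket module as an argument (cosock.socket in a driver) and assumes that module follows the LuaSocket conventions for tcp, bind and listen, returning nil and an error message on failure. It deliberately leaves out the setoption call, since that is the call that fails on the hub.

```lua
-- Hypothetical helper: create, bind and listen in separate steps.
-- `socket_module` is assumed to behave like LuaSocket's socket table
-- (for example cosock.socket inside an edge driver).
local function open_listener(socket_module, address, port, backlog)
    local server, err = socket_module.tcp()
    if not server then
        return nil, "tcp: " .. tostring(err)
    end
    local ok
    ok, err = server:bind(address, port)
    if not ok then
        server:close()
        return nil, "bind: " .. tostring(err)
    end
    -- 8 is an arbitrary backlog chosen for this sketch
    ok, err = server:listen(backlog or 8)
    if not ok then
        server:close()
        return nil, "listen: " .. tostring(err)
    end
    return server
end

-- In a driver this might be called as:
--   local server, err = open_listener(require("cosock").socket, "*", 0)
```

Returning the failing step along with the error message makes it obvious in the log which of the calls went wrong.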


Hi, @rossetyler

I’m checking this with the engineering team. So far, we have the following info:

Case 1. socket with uuid * does not exist
This is a bug, so the team will work on the fix.

Case 2. First part of HTTP response missing
This requires more investigation. If it turns out to be an issue on our side, the engineering team will work on that as well. If we need more information, I’ll contact you directly.

Case 3. socket:receive returns success and error all at once
It is not clear what could be causing this behavior but the results in the variables depend on the following:
sock:receive returns up to 3 values:

  • If the first value is nil then the second value will be a string and the 3rd value may be a string
  • If the first value is not nil then the other returned values should be nil

Is that consistent with what you’re experiencing?

Case 4. TCP listener on INADDR_ANY needs to restart on IP address change.
The interface used on the Hub is set explicitly, so when we use INADDR_ANY it actually looks up the correct IP address and binds the socket to that address.
Have you frequently experienced the Hub changing its IP address?

Case 5. cosock.socket.bind shortcut does not work.
I understand the shortcut you’re referring to is setoption, right?
Where did you find the reference you mention? I only found that it is not implemented for TCP sockets here

Yes I can confirm this happens often, once every few weeks

  3. socket:receive returns success and error all at once

The behavior that I am seeing is not SmartThings specific.
I have seen it on my Fedora Linux Lua implementation(s) as well.
Indeed, the “odd” behavior that I see is consistent with the documentation:

In case of error, the method returns nil followed by an error message which can be the string ‘closed’ in case the connection was closed before the transmission was completed or the string ‘timeout’ in case there was a timeout during the operation. Also, after the error message, the function returns the partial result of the transmission.

When reading an HTTP stream, unless you are reading 1-byte chunks, almost every successful read will be a partial result. Sometimes these successful reads come with a closed error at the same time. I guess this is just the way it is, as it is consistent with the documentation. I still find it very odd (success and error all at once).
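
Given that behavior, a read loop has to keep the partial result even when an error comes back with it. The following is a minimal sketch of that idea, not code from my driver. read_until_closed and fake_socket are names made up for illustration, and fake_socket only exists so the sketch can run without a hub or a server. It assumes the receive method follows the documented convention quoted above.

```lua
-- Hypothetical helper: read until the peer closes the connection,
-- keeping any partial data that arrives together with an error.
local function read_until_closed(sock, block_size)
    local chunks = {}
    while true do
        local whole, err, part = sock:receive(block_size)
        local data = whole or part
        if data and #data > 0 then
            chunks[#chunks + 1] = data
        end
        if err == "closed" then
            -- normal end of stream; the final partial block was kept above
            return table.concat(chunks)
        elseif err then
            -- any other error (for example "timeout"): report it with what was read
            return nil, err, table.concat(chunks)
        end
    end
end

-- Stand-in for a socket: serves `payload` in blocks and reports "closed"
-- together with the final partial block.
local function fake_socket(payload)
    local position = 1
    return {
        receive = function(_, n)
            local remaining = #payload - position + 1
            if remaining >= n then
                local block = payload:sub(position, position + n - 1)
                position = position + n
                return block
            end
            local part = payload:sub(position)
            position = #payload + 1
            return nil, "closed", part
        end,
    }
end

local body = read_until_closed(fake_socket(("x"):rep(300)), 128)
print(#body)
```

The point is that closed is treated as the normal end of the stream rather than as a failure, so the last 44 bytes of the 300 in this example are not thrown away.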

  4. TCP listener on INADDR_ANY needs to restart on IP address change.

Yes, apparently. My point is that it should not behave this way. INADDR_ANY implies any (that is, no) address binding. On Linux, binding a socket to INADDR_ANY behaves as I expect. This is what most Linux services do …

netstat -tl | grep ssh
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN

… and because they do so, they don’t have to restart when interfaces come and go or are readdressed. (0.0.0.0 is IPv4 INADDR_ANY).

Yes, but only because I have induced it to test for this possibility. Unless one (has the ability to and) creates a static DHCP lease for the SmartThings hub (I do and I do), one should expect that its address might change and code defensively. An edge driver that does not restart its TCP listeners when this happens will be broken until the driver is reloaded (e.g. the hub is rebooted). An edge driver should not have to do this and would not have to do this if binding to INADDR_ANY worked as it does on Linux (as I expected).

This is a case that, I imagine, many SmartThings edge developers will overlook because they are used to better/correct INADDR_ANY binding behavior. When their edge driver breaks months/years into deployment, it will be very hard to figure out why. This could/should be corrected in the SmartThings edge environment to honor the INADDR_ANY binding intent.
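
Until then, coding defensively could look something like the sketch below. Everything named here is hypothetical: get_address stands for whatever way the driver has of learning the hub's current address, and start_listener stands for the driver's own code that creates, binds and listens on a socket. The sketch only shows the bookkeeping needed to restart the listener when the address changes.

```lua
-- Hypothetical guard: returns a function that, each time it is called,
-- restarts the listener if the hub's address differs from the one that
-- was current when the listener was last started.
local function make_listener_guard(get_address, start_listener)
    local bound_address, listener
    return function()
        local address = get_address()
        if address and address ~= bound_address then
            if listener then
                listener:close()
            end
            listener = start_listener(address)
            -- remember the address only if the listener actually started
            bound_address = listener and address or nil
        end
        return listener
    end
end
```

The returned function would have to be called periodically, or whenever the driver has reason to suspect the address has changed. If start_listener fails, the next call tries again because no address was remembered.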

  5. cosock.socket.bind shortcut does not work.

The “shortcut” that I am referring to is cosock.socket.bind.
See its definition in lua_libs-api_v0_OPEN_BETA_41X/lua_libs-api_v0/cosock/socket.lua.

This bind is not the same as the bind method on a cosock.socket.tcp instance. It is a shortcut for the tcp object’s bind method after it creates the tcp object. It also bundles in a listen call and a setoption call. So, it is a shortcut for these 4 calls (see the code).

In the SmartThings hub environment, the cosock.socket.bind shortcut does not work because the setoption call fails. Again, neither I nor the code seems to know why this call is even made.

Thanks for the details, I already shared them with the team. Once I get feedback about that, I’ll let you know

OK, the team has located this issue in the helper that makes the three calls (tcp(), listen() and bind()), and they will fix it. There is no ETA for when the change will be made.
For now, you need to make the calls separately. For example:

local client_sock = cosock.socket.tcp()
client_sock:bind('*', 80)