I understand. The team mentioned that packet fragmentation shouldn’t cause the issue, and since you could replicate it locally, it must be something else, as you mentioned. They also suggested another test:
Split the HTTP head into two messages manually to see the behavior; just make sure the CRLFs are in place, for example:
local socket = require "cosock.socket"
local s = socket.tcp()
s:connect("192.168.1.10", 80) -- device IP and port (placeholders)
s:send("GET / HTTP/1.1\r\n")
s:send("Host: 192.168.1.10\r\nConnection: close\r\n\r\n")
Thanks Nayely. If the team says packet fragmentation is not a problem, then I won’t pursue that path; I believe them! I’ve switched back to using socket.http in my driver, since trying to do everything with raw TCP sockets was turning into a big chore!
I’ll have to keep trying different things to see if I can figure out what the problem could be.
Thanks for your patience @TAustin. I think the key distinction right now is that how the bytes of the HTTP request are sent should not cause a problem for any server that isn’t making bad assumptions about how TCP works. It is entirely possible that a server wrongly assumes the first read of bytes on a socket will contain the start line plus a full set of headers; that is definitely not guaranteed, even if the client hands all of those bytes to the OS in a single send/write (there are many layers of buffering, parameters impacting the MTU for TCP segments, etc.).
As mentioned by @nayelyz, if the server is buggy in this way (and the root cause here isn’t some other issue), the fallback is probably to make the socket calls directly, which gives you a better chance of chunking the data in the way the server expects.
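As a sketch of that direct-socket fallback (the method, path, and header values below are placeholders, and plain luasocket stands in when testing off-hub), the whole request head can be assembled into one string and handed to a single send(), giving the client its best chance of keeping the start line and headers together:

```lua
-- Build the full request head as one string so a single send() call
-- hands all the bytes to the OS at once (segmentation beyond that
-- point is still up to the TCP stack).
local function build_head(method, path, headers)
  local lines = { string.format("%s %s HTTP/1.1", method, path) }
  for _, h in ipairs(headers) do        -- array form keeps header order
    lines[#lines + 1] = h[1] .. ": " .. h[2]
  end
  return table.concat(lines, "\r\n") .. "\r\n\r\n"
end

-- Hypothetical usage (address is a placeholder):
-- local socket = require "socket"   -- or "cosock.socket" in a driver
-- local s = socket.tcp()
-- s:connect("192.168.1.10", 80)
-- s:send(build_head("GET", "/", {
--   { "Host", "192.168.1.10" },
--   { "Connection", "close" },
-- }))
```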
Hopefully that makes sense. We generally try to provide a very consistent implementation of APIs like luasocket, for consistency and ease of testing, but in this case the timing of adjacent TCP send calls, passing through the additional layers of the sandboxed execution environment, may result in slightly different behavior in terms of the segments emitted at the transport layer.
Hi Paul - Thanks very much for the response on this. I am still struggling with this problem, but here’s what I know so far:
I don’t think it’s a ‘chunking’ problem (although let’s be careful with that term, since these HTTP requests do not include a body; we’re just talking about the start line and headers). The reason I don’t think so is that I have captured tcpdumps of even a browser sending the data in separate TCP segments, and it still worked fine.
I have bent over backwards to ensure that my headers are IDENTICAL to those sent by a browser. This is no easy task in Lua: the socket.http module sticks in a ‘TE’ header, and you normally have no control over the ORDER in which the headers are sent (thank you, Lua tables, whose pairs() iteration order is undefined). Wondering if the order made any difference, I patched the socket.http module to brute-force the headers, including their order. Still no joy.
All this testing is done off-hub using Lua 5.3, so it’s not even really an Edge issue; it’s a Lua issue. I’ve also tried sending the same HTTP request from a Python script using the requests module, and it works just fine. EDIT: I just tried curl, and it works fine as well!
So at this point I’m at a loss. I have the tcpdump logs from each of these scenarios, so perhaps someone with more expertise could look at them and pinpoint the issue. If you have anyone on staff with those abilities, I’ll package them up and send them to the firstname.lastname@example.org account. I haven’t done so yet because I wanted to keep testing in hopes I’d eventually figure it out; and again, it’s not an Edge issue per se.
Out of curiosity, when you get the ‘closed’ responses, what is going on at the other end? Is it receiving and acting on the requests? Is there any difference between response messages sent when you get ‘closed’ and when things look more conventional?
Perhaps most importantly … Do things actually make sense if you don’t consider ‘closed’ as an error?
I think the original poster was getting closed errors. I’m getting no return at all from the device at times, so the socket receive times out. I’m beginning to think perhaps my problem is due to a bug in the device firmware.
There is a separate issue with devices that quickly close the connection after sending data. What happens is that if you have a listening socket server and get a connection request, by the time you do a getpeername to find out the sender’s IP address, you get an error that the connection is closed. You can still proceed to receive the data, but unless it contains self-identifying info, you won’t even know who it’s from!
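A defensive pattern for that race (a sketch only; the client object here is assumed to follow the luasocket/cosock socket API, and the mockable handler shape is my own) is to treat a getpeername failure as non-fatal and read whatever data is still buffered:

```lua
-- Accept-side handler that tolerates a peer which closed immediately
-- after sending. getpeername may fail once the peer has closed, but
-- data the peer already sent is often still buffered and readable.
local function handle_client(client)
  local ip = nil
  -- pcall guards against getpeername raising instead of returning nil.
  local ok, addr = pcall(function() return client:getpeername() end)
  if ok and addr then ip = addr end
  -- Read what the peer sent before (or while) it closed the socket.
  local data, err, partial = client:receive("*a")
  return ip, data or partial, err
end
```

In a driver you would call this from the accept loop; when ip comes back nil, the payload itself is the only way to identify the sender.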
I am seeing these errors as well. It seems to be related to the request load: if I hit an API that returns a large amount of data, or make a bunch of calls very rapidly, I will see these errors. For now, I am simply retrying the calls when I see this, but the retry rarely works, probably because it adds to the load. This started showing up for me when I integrated the Hue API, which can return thousands of lines of JSON.
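Since an immediate retry just adds to the load, spacing the retries out may fare better. A sketch with exponential backoff (the sleep function is injected so it can be cosock.socket.sleep on the hub; the attempt counts and delays are arbitrary):

```lua
-- Retry a request function with exponential backoff instead of
-- hammering the server immediately after a "closed" error.
-- request_fn returns a result, or nil plus an error string.
local function with_retries(request_fn, attempts, sleep_fn)
  local delay = 1
  for i = 1, attempts do
    local result, err = request_fn()
    if result then return result end
    if i < attempts then
      sleep_fn(delay)       -- e.g. cosock.socket.sleep in a driver
      delay = delay * 2     -- back off: 1s, 2s, 4s, ...
    end
  end
  return nil, "all retries failed"
end
```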
I’ve been able to use Wireshark to trace these cases, and in my captures it is one of two scenarios:
I am sending data TO the device and it doesn’t like it for whatever reason and simply ignores the message and closes the socket
The device is sending a message to me and closes the socket as soon as it’s done transmitting, not waiting for any acknowledgement. With all the coroutine stuff going on in the hub, it seems that sometimes by the time my driver gets a chance to accept the connection and receive the data, the socket is already closed. The data is usually still available to read, but there is also a bug where getpeername fails, so you can’t get the IP address of the sender (you get a ‘socket doesn’t exist’ error - probably a separate issue).
To clarify, I am seeing it with HTTP GET requests on the hub. To see this, I am not doing any socket work of my own. I have not been able to replicate it with curl or any other desktop GET request against that same API.