Hub disconnect and flapping (Nov 2019)

Went into the app to check a battery status and noticed lots of devices showing offline. So I logged into the API, and my hub appears to have been flapping a lot over the last 30 mins. It first started 2019-11-04 1:01:40.483 PM CST with a disconnect and 0.1 sec later a ‘now active’. It did this every 1 min for at least 10-15 mins according to Hub Events. It’s been ‘stable’ I guess for about 10 mins now, but devices are still not all online…waiting before I start bouncing things. But yeah…middle-of-the-day something. Internet has not been an issue; I’ve been here connected, watching streaming video, on an active IRC…no disconnects in that time period.
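For anyone wanting to check their own Hub Events for the same pattern, here is a rough sketch of how the flapping described above (a disconnect followed a fraction of a second later by an ‘active’ event, repeating every minute) could be spotted in an exported event list. The event names `"disconnected"` and `"active"`, the `(timestamp, name)` tuple layout, and the `find_flaps` helper are all assumptions made for illustration, not the actual format of the SmartThings event log. Only the first timestamp comes from the post; the rest of the sample data is made up.

```python
from datetime import datetime, timedelta


def find_flaps(events, max_gap=timedelta(seconds=1)):
    """Return (disconnect_time, reconnect_time) pairs where the hub
    reported active again within max_gap of a disconnect.

    `events` is a list of (timestamp, name) tuples; the names
    "disconnected" and "active" are hypothetical labels.
    """
    flaps = []
    last_disconnect = None
    for ts, name in sorted(events):
        if name == "disconnected":
            last_disconnect = ts
        elif name == "active" and last_disconnect is not None:
            if ts - last_disconnect <= max_gap:
                flaps.append((last_disconnect, ts))
            last_disconnect = None
    return flaps


# Illustrative data: a disconnect and a reconnect 0.1 s later, once a minute.
start = datetime(2019, 11, 4, 13, 1, 40, 483000)
events = []
for i in range(3):
    t = start + timedelta(minutes=i)
    events.append((t, "disconnected"))
    events.append((t + timedelta(milliseconds=100), "active"))

flaps = find_flaps(events)
print(f"{len(flaps)} quick disconnect/reconnect cycles found")
```

A real outage, where the hub stays disconnected for minutes, would not be counted, because the reconnect falls outside `max_gap`.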

https://status.smartthings.com/incidents/4jjbmwgsh1g6

Monitoring

Between the times of 1:55 PM and 2:20 PM ET, some users in the Americas may have experienced issues with delays or intermittent failures in device control from the mobile app and automations. Users may have also seen issues with Hubs incorrectly showing offline.

These issues have been resolved but some users may still see delayed device health status updates while we ensure all services fully recover. We will provide updates as available.

Posted Nov 04, 2019 - 14:49 EST

Yeah, I kinda laughed that within about 10-15 mins of me making this post…they throw up that status page. Kinda like ‘no one notices, so let’s not worry about it’… Having worked in a system where it’s like…no one notices and complains, so it’s not a problem, right… I understand, sorta.

I’ve been having loads of issues the past 72 hours. Granted, I have a boatload of SmartThings stuff deployed, so I’m going to encounter the issues way more easily and frequently than your average consumer. I informed support of two different service issues days ago and they only just recently put out the two degraded performance alerts. The response to all of this has been rather disappointing.

FWIW, the offline devices appear to be operating normally, just reporting a false status. I also have been unable to add or manage hubs with the app on both Android and iOS.

Well, I’m sitting at about 14% of my devices in either HUB_DISCONNECTED or offline status. I’ve done a reboot hub from the API utilities…but still having issues after that. Have waited about 15 mins now since the last reboot without any measurable change. I’ll give it a bit longer before pulling power for 15+ mins…but this is actually the worst ‘outage’ I’ve had in months…kinda sad too.

I have a device that says offline in the IDE, but it’s not offline. It’s an ST water sensor that I just tested with a wet paper towel and it’s working fine.

The status page is updated by a team that strives to ensure they capture and effectively communicate the scope of issues. It is understandable feeling the status page update is reactionary but ongoing monitoring kicks off the process in advance of user feedback.

As referenced, there are still some delayed device health statuses. Disconnecting your hub may bring devices back online but can also contribute to the backlog of events that need to be handled before device health events are processed in real-time again.

Believe me, I understand. I’ve worked on the operations side of sysadmin for 15 years. There’s a fine line of too much info. Monitoring systems that are too talkative tend to get ignored, which is why I still think Nagios was named after being nagged. So I get the point of a bit of lag on the status page. But at the same time, just like with targeted ads…when you ‘feel’ the targeting is when it gets creepy. Like people saying their phones are listening to them talk and then seeing an ad for pizza. When you complain of an outage, and then the status page is updated…it feels…reactionary :wink: know what I mean.

Still happening and worse now.

Agreed. A little over an hour ago I was down to like 4 devices saying hub_disconnected (so about 4%); now easily half of my 104 devices are saying disconnected. As Brad stated, this has a lot to do with health checks…but it’s not getting better, it seems.

I currently have 67 out of 223 devices showing either offline or HUB_DISCONNECTED.

Edit: Now 91 devices either offline or HUB_DISCONNECTED
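To put counts like these on the same footing as the percentages quoted earlier in the thread, a quick tally over a list of device statuses can be sketched as below. This assumes the statuses have already been exported as plain strings; `HUB_DISCONNECTED` appears in the posts above, while the `OFFLINE` and `ONLINE` labels, the `unhealthy_share` helper, and the split between offline and disconnected devices in the sample are assumptions for illustration only.

```python
from collections import Counter


def unhealthy_share(statuses, bad=("OFFLINE", "HUB_DISCONNECTED")):
    """Return (bad_count, total, percent) for a list of status strings."""
    counts = Counter(statuses)
    bad_count = sum(counts[s] for s in bad)
    total = len(statuses)
    percent = 100.0 * bad_count / total if total else 0.0
    return bad_count, total, percent


# Hypothetical breakdown that adds up to the 67 of 223 reported above.
statuses = ["OFFLINE"] * 40 + ["HUB_DISCONNECTED"] * 27 + ["ONLINE"] * 156

bad_count, total, percent = unhealthy_share(statuses)
print(f"{bad_count} of {total} devices unhealthy ({percent:.1f}%)")
```

Running the same tally every few minutes would show whether the backlog is draining or growing, which is hard to judge from the app alone.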

This is happening all too often. I wish Samsung would get their act together to prevent these kinds of issues. They will lose customers if they don’t…

Email came from the status page saying ‘this issue has been resolved’. Yet I still have many devices reporting as not connected. Sooooo if this is ‘resolved’, I’m not sure for whom, because it’s not me.

Ok, yet another ‘reboot hub’ from within the API finally got it to have things show up as online.

@Brad_ST are we seeing this again? 3 hours ago, starting at 5:58 AM CST until at least 6:22, I am seeing the same flapping state on the hub. Again, not having internet issues. Also, at least 10% of my Zigbee devices are now stating ‘offline’, not ‘hub disconnected’ though… Haven’t touched the hub yet.

Welp, spoke too soon: of the 10% (14 devices), 4 are now hub disconnected. Is device health taking a poop again? Are we back to where health checks are going to be total crap for a while like in the past? The only reason I even went to look is because my kitchen motion sensor battery is finally dead, I think. Over 2 months of ‘omg 1%’ warnings before it gets to the state of constant motion events, thus signaling the battery is really dead. Went to look and boom, tons of offline devices. My usual stance of ignorance is bliss usually works; guess not now lol.

Annnnddddd…1 hour after posting we get the status update email :slight_smile: batting 1000 on calling this, at this point, weekly issue :wink:

How is it that this issue continues on a regular basis? It is as though things are just being restarted so that they will run a while, but the real problem (whatever it is) goes unresolved.

With the recent issues, including the fact that for over a week I have been unable to add a hub with awful response time from support, I think I’ve learned the hard way that SmartThings cannot be trusted to be stable. I really like the concept, but it feels like there’s a major lack of resources or lack of leadership driving ST into a state of chronic unreliability.

Well, I’ll say this… so far it didn’t affect my automation at all. The only reason I knew something was up was that I was in the process of changing batteries. Just like last week, changing the battery in an ST motion sensor that has been doing the usual ‘nearly constant motion’, thus signaling the battery is finally dead…despite telling me it’s at 1% for many, many months.

I’ve found that with a proper ‘ST minded’ setup, I rarely see an issue with operation. Everything was automating as expected; it’s just that when I went into the app, things weren’t online. I so rarely go into the app since I set up automations that most of this stuff is transparent to me. I’ll give them that, it’s definitely much better than 3 years ago. Sure, it’s not the fanciest of setups…but every room in the house, the garage, and the outside is fully automated. The only light switch I hit is the bathroom because I don’t want to automate it. So yeah… if you set up the system with a mind to how ST sees people using it…it’s pretty stable.

But there’s always holes…