SmartThings Outage - Jan 03 2018

If the repair was run from https://account.smartthings.com/login via “My Hubs” -> “View Utilities” -> “Repair Z-Wave Network”, a link appears afterwards that you can click to view the status. That should tell you which specific device has issues.

Running the repair multiple times cleared the problem devices. It can take a while for details to show up in the IDE. Try not to navigate off the page or multi-task, as many browsers (e.g. Chrome) nowadays will put a tab to sleep if it’s in the background, so you’ll miss the log output.

I have Ring as well. Haven’t noticed any issues, but now I’m intrigued and will test. :slight_smile:

AWS status page: https://status.aws.amazon.com/

I also noticed Ring recording and notification failures around the same time SmartThings was failing. Currently, Ring seems OK; SmartThings seems to fluctuate, judging by the responses from the Xfinity Keypad and Lannouncer/Big Talker speech.

I’ve noticed that if I go into the /dev interface, select my hub, and then List Events, I’m seeing hubStatus zb_radio_off events right around the time a lot of my devices become unavailable. I’m pretty sure it’s my Zigbee devices (e.g. Cree bulbs). About a minute after the zb_radio_off event I get a zb_radio_on event. I suspect that when the radio is cycled it takes a while for all the Zigbee devices to reconnect.

I have no idea why the zb_radio_off events are occurring, but I got a few of them early this morning (like around 5-6 AM, EST) and the one in the log capture below early this afternoon. I’m not doing anything to my hubs like resetting them or power cycling them. Dunno if SmartThings is able to issue commands from the cloud that are cycling my zb radio. Wondering if this could be a hack/virus?

Event Log screenshot, note the zb_radio_off event at 1:49:43 PM:
Screenshot-2018-1-7 Events List
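
If anyone wants to track how long the radio stays off each time, here is a rough sketch that pairs each zb_radio_off with the next zb_radio_on. It assumes you copy the rows out of List Events by hand into (time, value) pairs; the list format and the `radio_gaps` name are made up for illustration, not anything the IDE exports. Only the 1:49:43 PM off time is from the screenshot above; the on time is a placeholder for "about a minute later".

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical hand-copied rows from the hub's List Events page.
# Only the 13:49:43 zb_radio_off time is taken from the post;
# the zb_radio_on time is an illustrative placeholder.
EVENTS = [
    ("2018-01-07 13:49:43", "zb_radio_off"),
    ("2018-01-07 13:50:45", "zb_radio_on"),
]


def radio_gaps(events):
    """Pair each zb_radio_off with the next zb_radio_on.

    Returns a list of (off_time, gap) tuples, where gap is a timedelta.
    Events with any other value are ignored, and an "off" with no
    matching "on" is left out.
    """
    gaps = []
    off_at = None
    # Timestamps in this format sort chronologically as plain strings.
    for stamp, value in sorted(events):
        when = datetime.strptime(stamp, FMT)
        if value == "zb_radio_off" and off_at is None:
            off_at = when
        elif value == "zb_radio_on" and off_at is not None:
            gaps.append((off_at, when - off_at))
            off_at = None
    return gaps


if __name__ == "__main__":
    for off_time, gap in radio_gaps(EVENTS):
        print(f"radio off at {off_time:%H:%M:%S} for {gap.total_seconds():.0f}s")
```

Comparing those off times against when your devices show as unavailable would at least confirm or rule out the radio cycling as the cause.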

$160 / month! I pay $50 for 250 Mbit. You need to call Comcast and ask for a better deal.

Hehe, no I don’t. My ISP plan is a business one (24x7), not just residential, with 10 additional email addresses and other options. Downtime? Don’t know what that is with my ISP. Knock knock knock… :sunglasses:

So I have been live on my hub since Black Friday. This outage lingered on longer than I expected. Things seem to be normal for me as of this morning. This is my first outage.

So how frequent are these outages?

I have been using SmartThings for a little over a year and this is the most significant / widespread / longest outage that I have been witness to. Long-timers with 3 to 5+ years on SmartThings can tell you about outages of the past that top this little one by miles. Outside of the outage there have been a bunch of smaller-scale things, such as the UK having problems with Modes, and Routines ignoring those Modes, for about 3 weeks. There have been some functional elements that have been removed or broken that to some are pretty significant. Overall my ST experience has been about 90% positive; then again, I treat this as more of a hobby-type project and don’t allow my home to be 100% reliant on SmartThings. Here’s a topic that might be useful to you to plan for the future:

I have a feeling that this wasn’t a complete outage from start to finish. Day 1 appeared to only affect users on the na02 shard (myself included). Day 2: all na02 users for the most part were restored (I never had an issue after the first 24 hours) and na04 users were now affected. Day 3 seemed to be more sporadic issues for various users, and maybe another shard. It almost looks like each day represented a different set of users or shard / URL. That’s just a guess based on what was posted. If every single person were to post what country and shard they are on, it would be easier for us community members to make a little more rhyme or reason of when, who and what is affected when we haven’t heard anything back directly from ST.

There were 7 planned outages in 2017, but some of them had problems and there had to be multiple fixes deployed which resulted in more outages over a couple of days.

There were at least 12 unplanned outages as well.

Sometimes the outages only affect one region, sometimes they affect multiple. Sometimes they only last about 15 minutes, but sometimes several days or, in the case of device-specific problems, weeks.

Major outages like the one we just had that affected a large percentage of users for 12 hours or more seem to happen about twice a year just based on the historical data.

This outage made my wife say, “Don’t they make anything that works?” If Homey makes it across the pond I may have to buy me one.

Across the pond which way? Homey started in Europe and has been available there for about a year. Like all systems, it has pros and cons. Like most home automation systems, it tends to score about three stars on Amazon.

The most stable systems right now are probably HomeKit or a Z-Wave-only system that runs locally (but then you won’t have voice control). But the trade-off for that stability is much, much simpler rules engines and a much more limited set of devices.

Are we really not going to get an explanation from SmartThings on the reason for such a serious outage? It’s been 4+ days since the outage, and 2+ days since it’s been “resolved” on the status page… Can we please get an update on the root cause and on how the company is mitigating it for the future?

Agree. Came back to this thread in the hope we’d have an explanation. SmartThings staff, where are you?

Overall my system has been completely stable, so this was for me a small annoyance. (I’ve tried to ensure that there is a physical backup for everything and tried to use locally running handlers and SmartApps.)

JD Roberts is once again very helpful in suggesting strategies and plans. At this stage in the evolution of smart home applications we need to realize that this kind of planning is necessary, even when the products appear to be consumer-friendly and are marketed to non-geeks.

Agreed. My house was down and possessed for over 20 hours and not even so much as “uhhhh… sorry 'bout that.”

Nice.

They are following the latest trend in USA personal behavior. “Never admit to making a mistake, never apologize when you are caught red handed”. What you experienced was the system being “improved”

My ST is down again.

Everyone use the current thread posted above.

This thread is for the last outage.

Closed so we can manage feedback efficiently.
