SmartThings Outage - Jan 03 2018

That’s a pretty good summary of what’s happening. An outage like this makes me doubt the architecture of the entire platform. Not the first time I’ve had those doubts either. C’mon Amazon, fully embrace the home automation space and put SmartThings out of business.


I did. I originally thought I was preventing the links by leaving out the https.

Did not want someone clicking the bad one.

Six of one, half a dozen of the other?


This was the longest unplanned outage I have experienced with ST in over a year.

Outside of some minor inconveniences, nuances or weird idiosyncrasies, something like this can happen, does happen and will happen again. So for those of you who rely on HA / ST to be available 100% of the time, you need to have a backup plan in place ahead of time so that you aren’t consumed and paralyzed by an outage: making sure you can flip a light switch off and on to control lighting, having a secondary app (e.g. the Hue app with a Hue Bridge) to control bulbs outside of ST, having access to normal outlets to plug things back into where they were plugged into Smart Outlets before, and having access to a thermostat directly or through that device’s app. And if the power goes out completely, well, make sure you have candles, a flashlight and a battery-operated radio.


Yep, huge delays for me too as of this morning. I was hoping things like sensors and light switches would be more responsive locally. A good five-minute delay for me.

Regarding using Hue lights w/the Hue Bridge to provide control when ST is acting up - the problem for me was that last night when the ST Minimotes wouldn’t turn off our bedroom lamps, I tried to turn off the two Hue lights using the Hue app, but then ST kept turning the lights back on again.

The wife was not impressed with what she called “Your dumb-home stuff.” <eek!>


[quote=“WB70, post:63, topic:113455, full:true”]
This was the longest unplanned outage I have experienced with ST in over a year.
[/quote]

WAS??? Still IS for me.

:joy: That’s a fun one. Probably pieces of ST still working / communicating (Routines or Smart Lighting most likely). That’s when you run to the light switch and just turn it off so the bulbs can’t come back on again and so she doesn’t see it.

I don’t doubt it. I haven’t opened the app or logged into the IDE since I woke up, or since I experienced it last night. Staying completely out of it until I hear that things appear operational again. I have my fail-safes in place so that it doesn’t impact my home without ST in the mix.

All my Routines / Pistons that, say, turn on lights when motion is detected don’t function after sunrise. Then I still have Alexa with the ability to turn on/off local devices (Smart Outlets / TV) and Nest Thermostat functions.

We are all living on the edge. :grin:

I could live with it if there seemed to be any forward progress towards stability from Samsung, but they don’t seem terribly interested.

I’m still baffled that user workloads seem to be permanently tied to “shards” in their system, and that they can have a 2-day outage on one of those shards without the ability to move affected customers to working hardware. This is way too brittle and smells badly of an under-resourced DevOps team doing things by hand à la 2002.

If Samsung cared they would have immediately fixed that situation on purchase of SmartThings. It seems they have zero grasp of the ability of SmartThings to negatively impact the Samsung brand as a whole, but that’s where I stand with it. Samsung’s inability to make SmartThings work makes me wary of buying a washing machine or TV from them. Why would anybody think their QC and managerial oversight is any better in those divisions?

Yeah, that’s what I told her to do, and that’s when she coined her “dumb-home” comment. :slight_smile:

As always, everything that doesn’t work is directly my fault.


Oh yes, because Amazon is perfect.

How a typo took down S3, the backbone of the internet

AWS’s S3 outage was so bad Amazon couldn’t get into its own dashboard to warn the world

You might be surprised to know this, but SmartThings uses AWS. Hmmmm…


I’m not surprised to know they use AWS.

I use AWS all the time, and while they have outages too, I’ve never seen one that lasted going on 12 hours like this one has. It’s not about the outage (mistakes happen); it’s about your ability to recover.

I remember that S3 outage, and I was largely unaffected because I don’t use much S3, and I keep most of my stuff in us-west-2 which was not affected. But I also have the ability to rebuild in another region in about 45 minutes if necessary because I rigorously capture my infrastructure in code.

I have that much DevOps redundancy and I’m a one-man team for my company. The fact that a single “shard” can be broken for so long makes me think that each shard is a one-off, long-running snowflake. If so, this is a terrible architectural decision and, as I said elsewhere, it smells like an under-resourced team managing their system in ways that are a decade out of date.
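To make the "move affected customers to working hardware" idea concrete, here is a minimal Python sketch of a user-to-shard lookup table that can reassign users when a shard goes bad. Everything in it (`ShardDirectory`, `evacuate`, the shard and user names) is hypothetical and for illustration only; it says nothing about how SmartThings actually routes users.

```python
from collections import Counter


class ShardDirectory:
    """Hypothetical user-to-shard lookup table (not SmartThings' real design)."""

    def __init__(self, shards):
        self.healthy = set(shards)   # shards currently accepting users
        self.assignment = {}         # user id -> shard name

    def assign(self, user, shard):
        if shard not in self.healthy:
            raise ValueError(f"{shard} is not a healthy shard")
        self.assignment[user] = shard

    def evacuate(self, bad_shard):
        """Mark a shard unhealthy and move its users to the least-loaded healthy shard."""
        self.healthy.discard(bad_shard)
        if not self.healthy:
            raise RuntimeError("no healthy shards left")
        # Count how many users each healthy shard already holds.
        load = Counter({s: 0 for s in self.healthy})
        for shard in self.assignment.values():
            if shard in self.healthy:
                load[shard] += 1
        moved = []
        for user, shard in sorted(self.assignment.items()):
            if shard == bad_shard:
                # Ties are broken alphabetically by shard name.
                target = min(sorted(self.healthy), key=lambda s: load[s])
                self.assignment[user] = target
                load[target] += 1
                moved.append(user)
        return moved


directory = ShardDirectory(["shard-a", "shard-b", "shard-c"])
directory.assign("u1", "shard-a")
directory.assign("u2", "shard-a")
directory.assign("u3", "shard-b")
moved_users = directory.evacuate("shard-a")
```

The lookup table is the easy part. A reassignment like this only helps if each user's state can also be restored on the target shard, which is presumably the hard part for a long-running snowflake.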

I think the same thing every time I log into the IDE and it doesn’t properly route me to the na02 shard. I get that “You don’t have any hubs yet…” message until I click on “My Locations”, and then suddenly my hub appears. That functionality has literally been broken for YEARS. While Amazon has its quirks, I’ve never seen that kind of longstanding kludgery from them.

I seriously question Samsung’s commitment to this product and their technical and managerial ability to EVER deliver a robust, reliable solution. It appears to me that they’re just sitting on SmartThings as some sort of globo-corp strategic play.

I don’t have similar questions about Amazon. When they do something they do it.


yup.

Post must be 10 characters so here’s some more.


Hehe. Well, technically it is. You got her into HA in the first place and when it’s working, she loves it, but when the first thing goes wrong, it’s Negative Nellie or Debbie Downer, am I wrong? You have to find that happy medium where, in the event of failure, she can operate as we did during the days of Little House on the Prairie with as little negative impact as possible.


I used to know how to quote part of someone else’s post, but now can’t figure it out. What am I doing wrong?

Highlight all of the text you want to quote and then press “Quote”

My wife made a similar comment when she got me out of bed because she couldn’t turn a lamp off and I had removed the screw-on lamp switch to prevent someone from manually turning it off (doh!). I had to bite my tongue not to ask her why she didn’t just think to unscrew the light bulb and let me sleep.



Using my yet-unreleased APC UPS DTH which pushes status every 60 seconds, it appears that latency has dropped back to 19 minutes.
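For anyone wanting to estimate latency the same way, here is a minimal Python sketch. It assumes each status push carries the device's own send time and that you can see when the platform recorded the event; the function name and the timestamps are made up for illustration and are not taken from the actual DTH.

```python
from datetime import datetime, timedelta


def latency_minutes(sent_at, received_at):
    """Delay in minutes between a device sending a status and the platform recording it."""
    return (received_at - sent_at).total_seconds() / 60


# Made-up example: a status sent at 12:00 that shows up at 12:19.
sent = datetime(2018, 1, 3, 12, 0, 0)
received = sent + timedelta(minutes=19)
delay = latency_minutes(sent, received)
```

With a push every 60 seconds, the delay on the most recent event gives a reading that is at most about a minute stale, provided the device and platform clocks roughly agree.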
