SmartThings Outage - Mar 12 2018

More blog posts like this might begin to get the attention of the CEO, or of someone with enough authority to get whoever is in charge out of there (including the CEO). That, and get the staff to mutiny.

Get enough of these posts picked up by some bigger news outlets and I guarantee more attention will be paid to what is going on internally.

I guess that since I am new I cannot send a message. The message option is missing for me. So hopefully @gausnes will see my message.

He will see it directly because you put in his name with the @ symbol.

He probably has 100s of messages already, so just be patient.

Yeah, my ST Hub sits right next to my Wink hub and I’m thinking I should transfer functions over…

We do internal postmortems on any outages. I can bring up publicizing some of this information during our retro.

We have been asking for this for a long time. Even if it were just for a complete platform outage rather than the ticky-tack stuff.

Nick had stated this before:

Continuing the discussion from Hub Firmware Release Notes - 20.17:

Even if it wasn’t announced publicly, documenting it on status.smartthings.com with the root cause, what was impacted, who was impacted, for how long, and what the specific resolution was would go a long way toward helping customers understand what transpired and why.

So what’s the difference between North America and the UK, which is still online… Hardware? Dev teams? Software versions?

I have been studying this video for hours and trying to educate everyone on how it works. https://youtu.be/8UQsKy1llXA :slight_smile:

You’ve been studying a 7 second video for hours?

The fine motor skills to operate a light switch manually are slowly coming back to me.

I am still not able to get devices from ST to my Harmony. It just shows this:

So, it wiped out all my functions because someone from ST support said to log in again.

The SmartThings platform is obviously not able to handle the requirements that customers expect of it. The question is whether or not SmartThings / Samsung management feels it is sufficient to meet their internal goals.

SmartThings has had outages much worse than the recent one - not that it is any consolation.

Not necessarily from the perspective of the business:

  • Email is extremely critical! How many businesses could operate with Gmail down for 12 hours? How many billions of dollars in lost productivity across the nation or the world?
  • eBay has millions of dollars in transactions daily. A serious outage of the eBay platform would send shareholders running for the hills.

SmartThings explicitly states in the Terms of Use that their product is not to be used for anything critical. There are no immediate financial consequences if the platform is down: there are no monthly, annual, or transactional fees, so zero revenue is lost.

You’ll say that reputation is lost and that means lost “future customers”. The mega outages of the past few years haven’t stopped the company from continuing to grow or Samsung from continuing to express confidence in its long-term prospects, nor have they lowered the price of Samsung’s stock by even $1.00 (or the equivalent in whatever Korean currency it is traded in).

I’m not saying that ST doesn’t take these problems seriously; but “serious” is relative. As customers, the impact on us individually is substantially different from the impact on Samsung’s long-term business goals.

I wish…:wink:

Different “shards.” It’s not at all uncommon for a problem to affect only one shard.

So I guess the question, which will be answered if they publish the root cause, is what they tested on our shard that they didn’t test on the UK shard.

Of course, with the customer support outage as well, one might think they had a datacenter/hardware failure.

Ummm… How long have you been around here?

The assurances during the “great outage of spring 2016” didn’t prevent yesterday’s outage. I write this over, and over, and over again: Why are folks so anxious to hear meaningless words and promises that are more likely to be broken than kept? There are plenty of promising posts from multiple levels of the ST organization (all the way to the C-level) which exist on the forum.

No disrespect or insult intended: I’m just genuinely curious why folks have such psychological “neediness” for “promises” to the point of being blind to all the contrary evidence here in the forum in black and white… :confused:


It is much less emotionally taxing (to me, at least) to not hear any promises: under-promise / over-deliver. Exceed expectations rather than setting them.


A convenient history of some post-incident announcements: Profile - alex - SmartThings Community

Traffic patterns can also be very different, from DoS attacks to just new product releases causing spikes in new accounts. Sometimes it’s just a “hotspot” in a database and not really predictable. (Google “Cassandra hotspots”).

The point is just that the shards are isolated from each other, so a cascading failure or load problem will rarely affect more than one shard at a time, even if they are running identical code and hardware.
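
To make the isolation point concrete, here is a rough sketch in Python. Everything in it is made up for illustration (the shard names `na` and `eu`, the capacity of 1000, the traffic numbers); it is not how SmartThings actually routes traffic, just a toy model of two shards running the same code where a spike lands on only one of them.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """One isolated deployment. Names and capacities are hypothetical."""
    name: str
    capacity: int  # requests the shard can absorb in one interval
    load: int = 0  # requests received so far in this interval

    def handle(self, requests: int) -> None:
        # Load accumulates on this shard only; nothing spills to other shards.
        self.load += requests

    @property
    def healthy(self) -> bool:
        return self.load <= self.capacity


def route(shards: dict[str, Shard], traffic: list[tuple[str, int]]) -> None:
    """Deliver each (shard_name, request_count) burst to its own shard."""
    for shard_name, requests in traffic:
        shards[shard_name].handle(requests)


# Identical "code and hardware": both shards get the same capacity.
shards = {
    "na": Shard("na", capacity=1000),
    "eu": Shard("eu", capacity=1000),
}

# A traffic spike (the 900-request burst) happens to land on "na" only.
route(shards, [("na", 400), ("eu", 300), ("na", 900)])

status = {name: shard.healthy for name, shard in shards.items()}
print(status)  # {'na': False, 'eu': True}
```

The overloaded shard goes unhealthy while the other one never sees the extra load, which is the whole reason for keeping them separate.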

I’m having a very similar problem with Alexa - it’s as though it’s looking in the wrong place - I’m in the UK in case it makes any difference.

The only inconvenience I faced last night was trying to access SmartThings from my Harmony remote and change a device for a new activity. My automations continued to work just fine.

How would a (another) post-mortem increase your trust? Should it?

Have you read the existing (few) post-mortems already in the forum, including by the CEO and CTO and other high level operations management?

I’m sure absolutely everything they said was, is, and will be entirely sincere. But words are cheap. Trust cannot be bought - it must be earned.

No, I haven’t read any of the existing ones. Didn’t know they existed but I’m pretty new to this.

I would have increased trust if there were a detailed explanation of what happened, along with a detailed list of actions that are being taken to prevent it from happening again.

Statements like “we’re working diligently to improve” or “we apologize” don’t cut it.