SmartThings Outage - Mar 12 2018

You’ve been studying a 7 second video for hours?

2 Likes

The fine motor skills to operate a light switch manually are slowly coming back to me.

6 Likes

I am still not able to get devices from ST to my Harmony. It just shows this:

So, it wiped out all my functions because someone from ST support said to re-login

The SmartThings platform is obviously not able to handle the requirements that customers expect of it. The question is whether or not SmartThings / Samsung management feels it is sufficient to meet their internal goals.

SmartThings has had outages much worse than the recent one - not that it is any consolation.

Not necessarily from the perspective of the business:

  • Email is extremely critical! How many businesses could operate with Gmail down for 12 hours? How many $ billions in lost productivity across the world?
  • eBay has $ millions in transactions daily. A serious outage of the eBay platform would send shareholders running for the hills.

SmartThings explicitly states in the Terms of Use that their product is not to be used for anything critical. There are no immediate financial consequences if the platform is down - there are no monthly, annual, or transactional fees, so zero revenue is lost.

You’ll say that reputation is lost and that means lost “future customers”. The mega outages of the past few years haven’t stopped the company from continuing to grow and for Samsung to continue to express confidence in the long term prospects, nor has it lowered the price of Samsung’s stock by even $1.00 (or whatever Korean currency it is traded in).

I’m not saying that ST doesn’t take these problems seriously; but “serious” is relative. The impact on us as individual customers is substantially different from the impact on Samsung’s long-term business goals.

5 Likes

I wish…:wink:

15 Likes

Different “shards.” It’s not at all uncommon for a problem to affect only one shard.

2 Likes

So I guess the question, which will be answered if they publish a root cause, is: what did they test on our shard that they didn’t test on the UK shard?

Of course, with the customer support outage as well, one might think they had a datacenter/hardware failure.

Ummm… How long have you been around here?

The assurances during the “great outage of spring 2016” didn’t prevent yesterday’s outage. I write this over, and over, and over again: Why are folks so anxious to hear meaningless words and promises that are more likely to be broken than kept? There are plenty of promising posts from multiple levels of the ST organization (all the way to the C-level) which exist on the forum.

No disrespect or insult intended: I’m just genuinely curious why folks have such psychological “neediness” for “promises” to the point of being blind to all the contrary evidence here in the forum in black and white… :confused:


It is much less emotionally taxing (to me, at least), to not hear any promises: Under-promise / over-deliver. Exceed expectations rather than setting them.


A convenient history of some post-incident announcements: https://community.smartthings.com/u/alex/activity

4 Likes

Traffic patterns can also be very different, from DoS attacks to just new product releases causing spikes in new accounts. Sometimes it’s just a “hotspot” in a database and not really predictable. (Google “Cassandra hotspots”).

The point is just that the shards are isolated from each other, so a cascading failure or load problem will rarely affect more than one shard at a time, even if they are running identical code and hardware.
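To make the isolation point concrete, here is a toy sketch, not anything taken from SmartThings itself. The shard names, the capacity figure, and the hash-based account-to-shard assignment are all assumptions made up for illustration; the thread does not say how the real platform assigns accounts. It only shows how one "hot" account can push a single shard over its limit while the others stay healthy.

```python
import hashlib
from collections import Counter

# Hypothetical shard names and per-shard capacity, for illustration only.
SHARDS = ["na-1", "na-2", "eu-1"]
CAPACITY = 100  # requests one shard can absorb in a time window (assumed)


def shard_for(account_id: str) -> str:
    """Pin an account to one shard with a stable hash (an assumed scheme)."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def overloaded_shards(requests: list[str]) -> set[str]:
    """Return the shards whose request count exceeds CAPACITY."""
    load = Counter(shard_for(account) for account in requests)
    return {shard for shard in SHARDS if load[shard] > CAPACITY}


# Ordinary traffic: 150 accounts, one request each.
normal = [f"user-{i}" for i in range(150)]

# The same traffic plus one "hot" account hammering its own shard.
hot = normal + ["user-7"] * 200

print(overloaded_shards(normal))
print(overloaded_shards(hot))
```

With the hot account added, only the shard that account is pinned to goes over capacity. The other shards see exactly the same load as before, which is the sense in which identical code and hardware can fail on one shard and not another.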

3 Likes

I’m having a very similar problem with Alexa - it’s as though it’s looking in the wrong place - I’m in the UK in case it makes any difference.

The only inconvenience I faced last night was trying to access SmartThings from my Harmony remote and change a device for a new activity. My automations continued to work just fine.

1 Like

How would a (another) post-mortem increase your trust? Should it?

Have you read the existing (few) post-mortems already in the forum, including by the CEO and CTO and other high level operations management?

I’m sure absolutely everything they said was, is, and will be entirely sincere. But words are cheap. Trust cannot be bought - it must be earned.

4 Likes

No, I haven’t read any of the existing ones. Didn’t know they existed but I’m pretty new to this.

I would have increased trust if there were a detailed explanation of what happened, along with a detailed list of actions being taken to prevent it from happening again.

Statements like “we’re working diligently to improve” or “we apologize” don’t cut it.

Here you go: A convenient history of (some) post-incident announcements: https://community.smartthings.com/u/alex/activity


(And yup… It’s time for me to stop trolling. :zipper_mouth_face: :speak_no_evil: I seriously just enjoy the laugh; and hope no one takes it personally, including and especially the diligent SmartThings staff. The product is and continues to be amazing and groundbreaking. I just don’t want community folks to expect explanations for the glitches. It’s the wrong thing to expect.).

1 Like

Sounds like either a fragile architecture or a fragile development process, or both.

Just from what I’ve seen over the last year, I believe there is little or no Change Management process in place, and that the developers / engineers are shooting from the hip and implementing things the cowboy way (based on demand from higher-ups). My bet is that they don’t have a true Development to UAT to Production environment / process in place; otherwise they would have a base set of customers testing (true beta, not production beta) in UAT before changes are rolled into Production. At least, this is how it appears to me as a customer who sees an issue with 2 out of 3 changes that go in.

Not sure I’d label what’s been happening as “glitches”

1 Like

I’m just providing the data point that as a customer, this is what I would appreciate, if the alternative is a black box of random outages. If you feel differently, that’s fine.

1 Like

In case anyone is having the same problem, I was able to discover the devices by using alexa.amazon.com instead of the app

1 Like

What was the problem?

It would be helpful if you quoted your earlier post, which is hundreds of posts up, so that people could identify the issue that made you use the web instead of the app to discover devices.