SmartThings Outage - Mar 12 2018

I’m having a very similar problem with Alexa - it’s as though it’s looking in the wrong place - I’m in the UK in case it makes any difference.

The only inconvenience I faced last night was trying to access SmartThings from my Harmony remote and change a device for a new activity. My automations continued to work just fine.

1 Like

How would a (another) post-mortem increase your trust? Should it?

Have you read the existing (few) post-mortems already in the forum, including by the CEO and CTO and other high level operations management?

I’m sure absolutely everything they said was, is, and will be entirely sincere. But words are cheap. Trust cannot be bought - it must be earned.

4 Likes

No, I haven’t read any of the existing ones. Didn’t know they existed but I’m pretty new to this.

I would have increased trust if there were a detailed explanation of what happened, along with a detailed list of the actions being taken to prevent it from happening again.

Statements like “we’re working diligently to improve” or “we apologize” don’t cut it.

Here you go: A convenient history of (some) post-incident announcements: Profile - alex - SmartThings Community


(And yup… It’s time for me to stop trolling. :zipper_mouth_face: :speak_no_evil: I seriously just enjoy the laugh, and hope no one takes it personally, including and especially the diligent SmartThings staff. The product is and continues to be amazing and groundbreaking. I just don’t want community folks to expect explanations for the glitches. It’s the wrong thing to expect.)

1 Like

Sounds like either a fragile architecture or a fragile development process, or both.

Just from what I’ve seen over the last year, I believe that there is little or no Change Management process in place, and that the developers / engineers are shooting from the hip and implementing things the cowboy way (based on demand from higher-ups). My bet is that they don’t have a true Development to UAT to Production process in place; otherwise they would have a base set of customers testing (true beta, not production beta) in UAT before changes are rolled into Production. At least this is how it appears to me as a customer who sees an issue with 2 out of 3 changes that go in.

Not sure I’d label what’s been happening as “glitches”

1 Like

I’m just providing the data point that as a customer, this is what I would appreciate, if the alternative is a black box of random outages. If you feel differently, that’s fine.

1 Like

In case anyone is having the same problem, I was able to discover the devices by using alexa.amazon.com instead of the app.

1 Like

What was the problem?

It would be helpful if you quoted your earlier post (it’s hundreds of posts up) so that people could identify which issue made you use the web instead of the app to discover devices.

The Hubitat community and team are awaiting your arrival. Actions over there are speaking very loudly.

Anyone reading this that has not voted for ActionTiles development on Hubitat, please do so.

4 Likes

I’ve updated the previous posts. I’m just too used to forums with threading. I’d missed the quote button.

1 Like

Yep, saw it. Thanks for that. It will make it much easier for someone to be able to click on that and then go directly up to that post and read everything around it by you or anyone else that replied. :slight_smile:

1 Like

Why is there not a redundant backup server for when one goes down? My goodness… let this show you all what happens when so many depend on this system!

Don’t get me started on the Android Presence problems, which are just as annoying!!! Aaaargh!!!

Good thing Samsung doesn’t make flight equipment for the airline industry!… At least I HOPE not!

And everyone keeps pushing the cloud as the way to go… now we see why… lol.

1 Like

If you want that ballpark reliability, be prepared to spend tens of thousands just for a hub.

2 Likes

I don’t see why it’s so difficult to keep a system up. Assuming they actually test before release and have common-sense deployment patterns, the max outage should be the time it takes to flick traffic between clusters or the time to restore a DB. It’s all on AWS, so all of it should be automated anyway, and unless multiple AWS regions go down there is no excuse for ‘the infrastructure broke’.
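
To make the "flick traffic between clusters" idea concrete, here is a rough Python sketch of a blue/green style switch. The `Cluster` and `Router` names and the health checks are hypothetical, made up purely for illustration; this is not how SmartThings or AWS actually does it, just the general pattern under the assumption that two clusters exist and each can report its own health.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Cluster:
    name: str
    version: str
    healthy: Callable[[], bool]  # hypothetical health probe


class Router:
    """Toy traffic switch between two clusters (blue/green style)."""

    def __init__(self, clusters: Dict[str, Cluster], live: str) -> None:
        self.clusters = clusters
        self.live = live  # name of the cluster currently serving traffic

    def switch_to(self, target: str) -> bool:
        """Point traffic at `target` only if it passes its health check."""
        candidate = self.clusters[target]
        if not candidate.healthy():
            return False  # keep serving from the current cluster
        self.live = target
        return True


router = Router(
    {
        "blue": Cluster("blue", "v1", healthy=lambda: True),
        "green": Cluster("green", "v2", healthy=lambda: False),
    },
    live="blue",
)

# The new release fails its health check, so traffic stays on blue.
switched = router.switch_to("green")
print(switched, router.live)
```

The point of the pattern is that the old cluster keeps running untouched until the new one proves itself, so a bad release never receives traffic and switching back is just changing which name is live.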

They are probably following the DevOps philosophy, which means most testing is just done in production. No proper separation between Dev, testing, production, etc.

1 Like

That’s scary. Even so, they should just be able to fail over to the previous version. The longer I use SmartThings, the more dubious it gets.