SmartThings Outage - Jan 16 2018

However, that will only work if:

One) the Minimote is working. The stock DTH for these broke for a week or so earlier this month. :disappointed_relieved:

Two) the sirens are using a stock DTH so they can also run locally. Many people use custom DTHs for sirens to access the advanced features of the device.

Three) and as @epj3 points out below, even if both the Minimote and the sirens are working locally, the Minimote can only turn off the siren; it can’t disarm Smart Home Monitor. (Nothing can change the mode or the Smart Home Monitor armed state locally. Both of those changes require access to the cloud.) So the siren may keep going off because of one of the rules you set up for security. :disappointed_relieved:

Thanks – I wish your advice were the solution, but it isn’t. I have two Aeon Minimotes, one on my nightstand and one on my wife’s. I haven’t found them to be entirely reliable. Buttons 1 and 2 control the nightstand lights (a long press controls all lights in the house). On buttons 3 and 4, a short press silences or activates the sirens, and a long press runs “Good night” or “Good morning.”

I already have the same rule that you posted, which is great for turning the siren off. The problem is that I wasn’t able to disarm at all, even through the Aeon remote. This means that every time a motion sensor or door switch was tripped, the siren sounded again. That’s extremely inconvenient, since I tend to move around my house from time to time :slight_smile:

Quite literally the only option last night was to unplug the hub and remove its batteries.

Really? I’m having no issues on my end. I’m also not using a multi-endpoint device. I’m choosing the device and letting SL prompt for which button to use. Rock solid for me.

Very true. For me, I just use the Utilitech Z-Wave sirens which have no special features and run locally.

That’s odd. SHM doesn’t do that for me. Once an alarm is tripped, it’s tripped until I acknowledge it. If I turn the sirens off, they do not re-trigger. I keep the complexity of my SHM configuration to a minimum since it is fraught with limitations, this being one. :slight_smile:

Down again as well. Can’t control anything from the app. No logs in the IDE. Just rebooted; maybe that’ll help. Doubt it. Great time for another outage.

If only Home Assistant worked with Garage Doors and “security” devices… ahhh.

Mine is working fine. I saw your post, pulled up the app and cycled through all the routines. All is well here in South LA. Perhaps you have a local internet issue interfering? Or maybe ST is carried on redundant servers and I have “the good one”.

Not local, this is being reported by multiple forum members. But not affecting everyone either:

I see the multiple reports; I guess I just don’t understand how these outages impact only select users and not all. For example, arming and disarming the alarm is cloud based, right? I can still do it, and did so this morning during the outage. I was not aware of the outage until I saw the post. I’m just trying to understand how I have managed to evade two outages now, even while using the Smart Home Security (granted, only to test it, not to actually use it yet).

I am assuming there are multiple servers, so the outages only impact some servers but not all? I have no idea.

They must be working on something behind the scenes, as it’s only recently started to act up like this. It had been working “flawlessly” (sort of) since I bought the v2 a couple of years ago…
It would have been nice if they had kept us in the loop; then we probably wouldn’t have turned to anger and frustration over this…
All this happening without a word about what it is only causes people to change platforms and never come back, and it also affects word-of-mouth recommendations to others who want in on an HA system…
Staff should consider this, I think…

We all have different shards (URLs) that we log in to, and the first outage didn’t impact everyone at once. The first 12 hours only impacted na02-useast1 users, and then after we were all back up and running, it appeared that all users on na04-useast2 were impacted. It was a rolling outage. The outage yesterday impacted everyone globally (I think) at the exact same time.

If you log in to https://account.smartthings.com, you will be redirected to your shard, and that’s the server that your account is assigned to.
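If you want to pull the shard name out of the address you land on after that redirect, here is a minimal Python sketch. It assumes the shard is the first label of the hostname, optionally prefixed with `graph-` (as in the `graph-na04-useast2` name mentioned in this thread). The full example URL and the function name `shard_from_url` are illustrative assumptions, not something SmartThings documents.

```python
from urllib.parse import urlparse


def shard_from_url(url: str) -> str:
    """Return the shard label from the URL you were redirected to.

    Assumption: the hostname's first label is the shard name,
    optionally prefixed with 'graph-' (e.g. 'graph-na04-useast2').
    """
    host = urlparse(url).hostname or ""
    first_label = host.split(".")[0]
    prefix = "graph-"
    if first_label.startswith(prefix):
        first_label = first_label[len(prefix):]
    return first_label


# Hypothetical redirect target, used only for illustration.
print(shard_from_url("https://graph-na04-useast2.api.smartthings.com/login"))
```

Comparing that label with what other people report is a quick way to tell whether an outage is hitting your shard or someone else's.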

As for the centralized backend servers, I don’t know how the overall architecture is built: whether we are all reading from and writing to the same set of servers, or whether each of these shards has its own dedicated set of servers within that environment.

Quick update - I can now see events in my live log and control from the app is working. Have to wait and see how long things work.

I have not been impacted at all today on na02-useast1. Because I’m not seeing posts from a bunch of people, I’m going to guess this is isolated to you and your account.

Thanks for the explanation. It seems I am assigned to the na04-useast2 shard. I am guessing there is some kind of sorting hat that assigns us to these individual houses :slight_smile:

Getting frustrated is totally understandable. But if it goes as far as making you angry, well, this isn’t going to be the last time something like this happens. If it’s making someone that angry, I would look at another platform that has more reliability and stability with a lot fewer bells and whistles, and keep it very simple. Home automation across the board is going to come with this for years to come, especially with these low-cost systems. Some less than others, but nonetheless it will continue to happen, unless of course you are willing to shell out a ton more money for a system that can achieve close to 100% uptime. If it leads to anger, well, home automation today might not be for that person. JMO. :slight_smile:

They don’t have their own servers; they use Amazon Web Services. That deployment is divided into “shards,” which you can think of as different server farms.

So the first thing that can be different is a different shard might be affected.

The second thing, which has happened multiple times in the past, is what are called “hotspots” in the database, where you just get temporary, fairly random corruption of certain accounts. It doesn’t have anything to do with the configuration of that specific account; it has to do with where it falls in the database retrieval structure itself. Sometimes this is caused by traffic overload, but it can just as often be caused by a bunch of other random factors. This is what usually produces the “some members may…” messages on the status page.

Google “Cassandra hotspots” for examples of this kind of database problem. It tends to happen in highly dynamic databases with tens of thousands of users and highly irregular usage patterns.

Third, it may be a problem based on a very specific configuration: which devices, which device type handlers, which version of the mobile app, which specific hub model, whether you have a Samsung ID or a SmartThings ID, the specific firmware version on a specific device, all kinds of stuff.

So you put all three of those together, and it’s quite common for problems to only be affecting some customers.
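To illustrate the hotspot point above, here is a rough Python sketch of how one busy key can overload a single partition and drag down unrelated accounts that happen to hash to the same place. This is a generic hash-partitioning toy, not Cassandra's actual partitioner and not SmartThings' real data layout; the partition count, account names, and request volumes are all made up for the example.

```python
import hashlib
from collections import Counter

PARTITIONS = 8  # hypothetical partition count


def partition_for(account_id: str, partitions: int = PARTITIONS) -> int:
    """Map an account to a partition by hashing its ID (stable across runs)."""
    digest = hashlib.md5(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_per_partition(requests, partitions: int = PARTITIONS) -> Counter:
    """Count how many requests land on each partition."""
    load = Counter({p: 0 for p in range(partitions)})
    for account_id in requests:
        load[partition_for(account_id, partitions)] += 1
    return load


# 1000 made-up accounts send one request each; one of them sends 500 more.
accounts = [f"account-{i}" for i in range(1000)]
requests = accounts + ["account-42"] * 500

load = load_per_partition(requests)
hot_partition = partition_for("account-42")

# Accounts that did nothing unusual but share the overloaded partition.
bystanders = [a for a in accounts
              if a != "account-42" and partition_for(a) == hot_partition]

print(f"partition {hot_partition} handles {load[hot_partition]} of {len(requests)} requests")
print(f"{len(bystanders)} other accounts share it")
```

The bystander accounts are the “some members may…” group: nothing about their own setup is wrong, they just sit next to the hot key.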

Not sure what the order of precedence is for how we are assigned to a specific shard. Newer users in the States are typically assigned to na02, na04, etc. Older users are assigned to graph.api… and users in the UK are assigned to eu01-euwest1…

I’m on graph-na04-useast2 and it keeps going in and out.

It’s odd, though. Some things appear “normal”. The automations that people notice (such as turning on a light) are still working, or were the last time I checked. I’m only noticing because I’m actively working on the TTS and audio stuff: it keeps going in and out, and the logs stop scrolling in my live log window.

Walk away :slight_smile:

I’m on the na04 shard and not having any issues today.

Graph.API is not a shard. It’s just an old URL. All the real shards have the Amazon names, but those regions are Amazon’s regions, not SmartThings’.

For regions where the privacy laws are different, you have to be assigned to a shard which follows those laws. So UK users are on a different shard than US users.

But other than that, they just added shards as they needed them to handle more customers, so customers who signed up at about the same time will be on the same shard regardless of where they live.

The current shard accepting new North American accounts is NA04. But it doesn’t matter whether you live in New York or Los Angeles, if you signed up today you would be put on that shard.
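As a toy model of that assignment rule, the Python sketch below picks a shard from the privacy region only, ignoring where in the region you live. The shard names come from this thread; the lookup tables, the function name, and the assumption that `eu01-euwest1` is the shard currently open for UK sign-ups are illustrative guesses, not how SmartThings actually implements it.

```python
# Shard assumed to be accepting new accounts in each privacy region.
# Names are taken from this thread; the mapping itself is an assumption.
OPEN_SHARD = {
    "NA": "na04-useast2",
    "EU": "eu01-euwest1",
}

# Hypothetical lookup from a user's city to the privacy region it falls under.
PRIVACY_REGION = {
    "New York": "NA",
    "Los Angeles": "NA",
    "London": "EU",
}


def shard_for_new_account(city: str) -> str:
    """Pick the shard for a sign-up today: only the privacy region matters."""
    return OPEN_SHARD[PRIVACY_REGION[city]]


for city in ("New York", "Los Angeles", "London"):
    print(city, "->", shard_for_new_account(city))
```

New York and Los Angeles land on the same shard because the sign-up date and the privacy region decide it, not geography within the region.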