5th instance of "Device Control Issues" in the last 30 days.... what's going on!?


(Rob) #1

Just received the 5th “Device Control Issues” email in the last 30 days. Can anyone enlighten us on what’s going on with the system and its lack of stability?

For the last 2 days, my garage door has opened itself at 3 AM, reporting that I had arrived “Home” when I’d already been home for the prior 6 hours. It wasn’t a delayed event, either, but a second, repeated firing of the same event.

Looking for some transparency as to what’s truly going on and what’s being done to re-establish long-term stability with the system.

Thank you,

-Rob


(Bruce) #2

It’s really a daily problem, with some days worse than others. My whole system is dead at the moment. ST just can’t get the thing to stay functioning. Very disappointing.


(Joe) #3

I know… It’s very disappointing. I’ve had problems every day this week it seems.


(April Wong) #4

Hey Rob,

I suppose this is one consequence of being transparent with our users. Every time we migrate from one server to another because a server is showing abnormalities, we update the status page.

This, however, is separate from the question of why your garage door opened itself. I suggest contacting support@smartthings.com about that issue so they can identify what’s going on.

So, we’re getting into the habit of notifying you whenever we see something weird. When we detect that a server is starting to fail, for example, we deploy everything to a new server and trash the old one; that’s how we establish long-term stability with the system. You ~might~ feel a hiccup, but you also may not. Again, this is our way of being transparent with you. As we continue to grow, we’ll be able to anticipate these instances even better, to the point where we minimize the hiccups that occur. When that time comes, you’ll get fewer of these notifications.

Cheers,
April


(Todd Whitehead) #5

I set my garage door rules to only work between 8:30 am and 9:30 pm. ST presence sensors are just too unreliable.
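For anyone wanting to do the same, here is a minimal sketch of the time-window check Todd describes. It is plain Python, not SmartThings code: the function name `arrival_may_open_door` and the window constants are hypothetical, and it assumes the rule receives the local time of the presence “arrived” event.

```python
from datetime import datetime, time

# Hypothetical window matching the post: 8:30 am to 9:30 pm (local time).
WINDOW_START = time(8, 30)
WINDOW_END = time(21, 30)


def arrival_may_open_door(now: datetime,
                          start: time = WINDOW_START,
                          end: time = WINDOW_END) -> bool:
    """Return True only if an 'arrived' event at `now` falls inside the allowed window."""
    t = now.time()
    if start <= end:
        # Ordinary daytime window, e.g. 08:30 to 21:30.
        return start <= t <= end
    # Window that wraps past midnight, e.g. 22:00 to 06:00.
    return t >= start or t <= end


print(arrival_may_open_door(datetime(2015, 3, 2, 3, 0)))    # False: a 3 AM firing is ignored
print(arrival_may_open_door(datetime(2015, 3, 2, 18, 15)))  # True
```

The trade-off is that a genuine late-night arrival also won’t open the door, which is exactly the point when the presence sensor can’t be trusted.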


(The fish is still dead.) #6

If some of the dozen or so notifications over the last month were for planned things like server migrations, PLEASE put that in the notification e-mail. It would keep us from thinking “Great, here we go again” every time we see a message come in.

Better yet, for planned migrations, send the notification out ahead of time!


(Rob) #7

Agreed, please clarify what you’re doing with these things so we know if it’s more of a planned failover or a real issue.

Regarding my garage door situation, I’ve adjusted the times for it and will see if I get any secondary firings this week.

Transparency is awesome, as long as we know what’s going on so we can judge the nature of incidents.

Thanks April!

-Rob


(April Wong) #8

Certainly. I’ve passed the feedback on to the teams, and we’ll look at how we can do that moving forward. Things happen, and when we see a server starting to show abnormalities, we want to nip it in the bud for the people affected, so we deploy new servers.

:slight_smile: For planned migrations, though, we would certainly send notifications ahead of time.


(Patrick Musselman) #9

@April Please do not take this the wrong way, but shouldn’t server planning have been a top priority from day one? With that in mind, you would think server planning could be simplified by estimating from the number of hubs produced and the average user’s device count. I’m sure there is more to the equation, but I think I have made my point. Also, adding new equipment in a pinch is a bad idea unless it is carefully planned. Just saying.

From the first day, I found the ST experience to be excellent. After the third week, system stability had become more of a frustration than an enjoyment. My wife is starting to ask about using some other type of controller, and I have to say that I am starting to feel the same way. Everyone’s problems are different, and some are more urgent, like garage doors opening at 3 AM. For this very reason, ST should not be trusted as a home security system. I certainly hope that upper management sees the issues this has caused and puts more thought into critical business systems. Every controller will have its issues, but this many is really out of control.


(April Wong) #10

We practice something called “immutable infrastructure”, where we completely destroy servers when we need to update them, replacing them rather than modifying them. It sounds crazy, but it’s actually better for stability in that it’s easier for us to keep the servers running exactly the same.
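To make the idea concrete, here is a toy Python model of immutable infrastructure. It is not SmartThings’ actual tooling; `Server`, `launch`, and `roll_out` are hypothetical names. Servers are frozen records built from a versioned image, and an “update” builds fresh servers from the new image and discards the old ones rather than patching them in place.

```python
from dataclasses import FrozenInstanceError, dataclass
from itertools import count

_next_id = count(1)  # hypothetical server ID allocator


@dataclass(frozen=True)
class Server:
    """A server is never modified after creation; frozen=True enforces that here."""
    server_id: int
    image_version: str


def launch(image_version: str) -> Server:
    # Every server is built fresh from a versioned image.
    return Server(next(_next_id), image_version)


def roll_out(pool: list, new_version: str) -> list:
    """Replace every server with a fresh one built from new_version.

    The old servers are not patched; they are simply dropped ("destroyed")
    once their replacements exist.
    """
    return [launch(new_version) for _ in pool]


pool = [launch("v1") for _ in range(3)]   # servers 1, 2, 3 on v1
pool = roll_out(pool, "v2")               # servers 4, 5, 6 on v2
print([(s.server_id, s.image_version) for s in pool])
# → [(4, 'v2'), (5, 'v2'), (6, 'v2')]

try:
    pool[0].image_version = "v3"          # in-place modification is refused
except FrozenInstanceError:
    print("in-place update refused")
```

Because every server in the pool came from the same image, none of them can drift into a one-off configuration, which is the stability argument April is making.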


#11

For those not into the nitty gritty details of cloud architecture, the “servers” that are destroyed and replaced in an immutable infrastructure protocol are not the physical computers in a server farm, but rather the library services, etc. and, of course, the cloud server or virtual server or both.

Another case of one term, four meanings. Welcome to IT. :wink:


(Bruce) #12

Not much evidence of stability out here in the real world. Flaky system, pretty much every day.


#13

Immutable infrastructure can’t improve Quality of Service if the previous version was flawed. (Otherwise known as “growing pains.”)

Version X: adding a bulb doesn’t work; sunset processing floods and fails.

Version X+1: adding a bulb works, but removing a bridge is broken; sunset processing doesn’t flood, but push notifications cause unpredictable results.

There’s no “golden version” to go back to, so immutable infrastructure can’t help. Or rather the golden version you have available was pyrite anyway. :cold_sweat:
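The point can be put as a tiny rollback lookup. This is an illustrative Python sketch, not anything ST actually runs: `KNOWN_ISSUES` and `golden_version` are hypothetical names, and the issue lists simply restate the two versions above. Rolling back means picking the newest version with no known issues, and when every version has some, there is nothing to pick.

```python
# Known issues per release, restated from the two versions above (illustrative only).
KNOWN_ISSUES = {
    "X": {"adding a bulb fails", "sunset processing floods and fails"},
    "X+1": {"removing a bridge is broken", "push notifications unpredictable"},
}


def golden_version(history):
    """Return the most recent version with no known issues, or None if there isn't one."""
    for version in reversed(history):
        if not KNOWN_ISSUES.get(version):
            return version
    return None


print(golden_version(["X", "X+1"]))  # → None: nothing clean to roll back to
```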


(Darryl) #14

@JDRoberts: Hah, I was just thinking of “golden images” (your “golden version”). My company has its own internal cloud, and we build hundreds of servers off the “golden image” at boot. We can take these systems offline and spin new ones up dynamically. When we mark a server to go offline, it stops accepting new connections while existing users disconnect on their own; once it hits 0 active users, it shuts down, restarts, or has a new image put into place.
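The drain-then-replace procedure described above can be modeled in a few lines. This is a hypothetical Python sketch, not Darryl’s company’s tooling: `DrainableServer` and its methods are invented for illustration, and it tracks only a count of active users.

```python
class DrainableServer:
    """Toy model of marking a server offline and letting its users drain away."""

    def __init__(self, name: str):
        self.name = name
        self.active = 0          # current user connections
        self.draining = False    # set once the server is marked to go offline
        self.state = "running"

    def accept(self) -> bool:
        """Try to open a new user connection; refused once draining has started."""
        if self.draining:
            return False
        self.active += 1
        return True

    def disconnect(self) -> None:
        if self.active > 0:
            self.active -= 1
        self._maybe_retire()

    def mark_offline(self) -> None:
        self.draining = True
        self._maybe_retire()

    def _maybe_retire(self) -> None:
        # At zero active users, the server can be shut down, restarted, or reimaged.
        if self.draining and self.active == 0:
            self.state = "retired"


s = DrainableServer("app-01")
s.accept()
s.accept()
s.mark_offline()
print(s.accept())   # False: no new connections while draining
s.disconnect()
print(s.state)      # running: one user still connected
s.disconnect()
print(s.state)      # retired: ready to be replaced from the golden image
```

The design choice here is that existing users are never cut off; the server only goes away once the last of them leaves on their own.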

The golden images are built through lots of life-cycle testing and often go through a beta group and end-user acceptance before even a DLL or patch is installed.