5th instance of "Device Control Issues" in the last 30 days.... what's going on!?

rob_gore · March 12, 2015, 8:55pm

Just received the 5th “Device Control Issues” email in the last 30 days. Can anyone enlighten us on what’s going on with the system, and its lack of stability?

The last 2 days, my garage door has opened itself at 3AM, stating that I was “Home” when I’d been home for the prior 6 hours. This was not a delayed event either, but a second, repeated firing of the same thing.

Looking for some transparency as to what’s truly going on and what’s being done to re-establish long-term stability here with the system.

Thank you,

-Rob

bravenel · March 12, 2015, 9:19pm

It’s really a daily problem, with some days worse than others. My whole system is dead at the moment. ST just can’t get the thing to stay functioning. Very disappointing.

Keo · March 12, 2015, 9:47pm

I know… It’s very disappointing. I’ve had problems every day this week it seems.

April · March 12, 2015, 9:53pm

Hey Rob,

I suppose this is one of the side effects of being transparent with our users. Every time we detect abnormalities on a server and migrate to another one, we have been updating the status pages.

This, however, does not explain why your garage door has opened itself. I suggest contacting support@smartthings.com for that issue, so they can identify what’s going on with that.

So, we’re getting in the habit of notifying you whenever we see something weird. When we detect that a server is starting to fail, we deploy everything over to a new server and trash the old one, which is how we establish long-term stability with the system. You ~might~ feel a hiccup, but you also may not. Again, this is our way of being transparent with you. Soon, as we continue to grow, we’ll be able to anticipate these instances even better, to the point where we’ll minimize the number of hiccups that may occur. When that time comes, you’ll get fewer of these notifications.

Cheers,
April

Todd_Whitehead · March 12, 2015, 10:07pm

I set my garage door rules to only work between 8:30 am and 9:30 pm. ST presence sensors are just too unreliable.
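
A time-window rule like the one described above could be sketched roughly as follows. This is only an illustration in Python, not SmartThings code: the function names `in_window` and `should_open_garage` are hypothetical, and the only values taken from the post are the 8:30 am and 9:30 pm limits.

```python
from datetime import time

# Window taken from the post: 8:30 am to 9:30 pm.
WINDOW_START = time(8, 30)
WINDOW_END = time(21, 30)


def in_window(now: time, start: time = WINDOW_START, end: time = WINDOW_END) -> bool:
    """Return True if `now` lies inside [start, end].

    Also handles windows that wrap past midnight (start later than end).
    """
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end


def should_open_garage(presence_arrived: bool, now: time) -> bool:
    """Hypothetical rule: act on an arrival event only inside the window."""
    return presence_arrived and in_window(now)
```

With a guard like this, a spurious "arrived" event at 3 AM is ignored, while a real arrival at 6 PM still opens the door.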

btk · March 12, 2015, 10:35pm

If some of the dozen or so notifications over the last month were for planned things like server migrations, PLEASE put that in the notification e-mail. It would keep us from thinking “Great, here we go again” every time we see a message come in.

Better yet, for planned migrations, send the notification out ahead of time!

rob_gore · March 12, 2015, 10:48pm

Agreed, please clarify what you’re doing with these things so we know if it’s more of a planned failover or a real issue.

Regarding my garage door situation, I’ve adjusted the times for it and will see if I get any secondary firings this week.

Transparency is awesome, as long as we know what’s going on so we can judge the nature of incidents.

Thanks April!

-Rob

April · March 12, 2015, 11:08pm

Certainly, I’ve passed the feedback along to the teams, and we can see how to do that moving forward. Things happen, and when we see a server starting to show abnormalities, we want to nip it in the bud for the people who are affected, so we then deploy new servers.

Certainly for planned migrations, though, we would send notifications ahead of time.

pmusselman · March 12, 2015, 11:46pm

@April Please do not take this the wrong way. But shouldn’t server planning have been a top priority from day one? And with that in mind, you would think that server planning could be simplified by determining the number of hubs produced by the average user’s devices? I’m sure there is more to the equation, but I think I have made my point. Also, adding new equipment in a pinch is a bad idea unless it is carefully planned. Just saying.

From the first day I found the ST experience to be excellent. After the third week, the system stability had become more of a frustration than an enjoyment. My wife is starting to ask about using some other type of controller, and I have to say that I am starting to feel the same way. Everyone’s problems are different and more urgent, like problems with garage doors opening at 3am. For this very reason ST should not be trusted as a home security system. I certainly hope that upper management sees the issues this has caused and really puts more thought into critical business systems. Every controller will have its issues, but this many is really out of control.

April · March 13, 2015, 12:04am

We practice something called “immutable infrastructure”, where we completely destroy servers when we need to update them, replacing them rather than modifying them. It sounds crazy, but it’s actually better for stability in that it’s easier for us to keep the servers running exactly the same.
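
As a rough illustration of the replace-rather-than-modify idea, here is a minimal Python sketch. It is not SmartThings' actual tooling: the `Server` class, `launch`, and `roll_out` are hypothetical names, and a "server" here is just a record of which image it was built from.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # hypothetical ID generator for newly launched servers


@dataclass(frozen=True)  # frozen: a server record cannot be modified after creation
class Server:
    server_id: int
    image: str  # version of the image this server was built from


def launch(image: str) -> Server:
    """Build a brand-new server from the given image."""
    return Server(server_id=next(_next_id), image=image)


def roll_out(fleet: list[Server], new_image: str) -> list[Server]:
    """Update a fleet by replacing every server, never by patching one in place."""
    return [launch(new_image) for _ in fleet]
```

After `roll_out(fleet, "v2")`, every server in the returned fleet is a fresh instance built from `"v2"`, and none of the old server IDs survive, so all servers are known to be running exactly the same thing.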

JDRoberts · March 13, 2015, 12:16am

For those not into the nitty gritty details of cloud architecture, the “servers” that are destroyed and replaced in an immutable infrastructure protocol are not the physical computers in a server farm, but rather the library services, etc. and, of course, the cloud server or virtual server or both.

Another case of one term, four meanings. Welcome to IT.

bravenel · March 13, 2015, 12:32am

Not much evidence of stability out here in the real world. Flaky system, pretty much every day.

JDRoberts · March 13, 2015, 1:35am

Immutable infrastructure can’t improve Quality of Service if the previous version was flawed. (Otherwise known as “growing pains.”)

Version X : add a bulb doesn’t work, sunset processing floods and fails

Version X+1 : add a bulb works, remove a bridge is broken, sunset processing doesn’t flood, push notifications cause unpredictable results.

There’s no “golden version” to go back to, so immutable infrastructure can’t help. Or rather the golden version you have available was pyrite anyway.

darrylb · March 13, 2015, 12:46pm

@JDRoberts : Hah, I was just thinking of “golden images” (your “Golden Version”). My company has its own internal cloud, and we build hundreds of servers off the “golden image” at boot. We can take these systems offline and spin new ones up dynamically. We mark a server to go offline, and as users disconnect, it does not allow new connections—this way, once it hits 0 active users, it shuts down, restarts, or has a new image put into place.

The golden images are built through lots of life-cycle testing, and often have a beta group and end user acceptance before even a DLL or patch is installed.
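
The drain-then-replace procedure described in this post could be sketched like this in Python. It is only a toy model under stated assumptions: the `DrainingServer` class and its method names are hypothetical, and "going offline" is reduced to flipping a flag where a real system would shut down, restart, or re-image the machine.

```python
class DrainingServer:
    """Toy model of a server that can be drained before being replaced."""

    def __init__(self, image: str) -> None:
        self.image = image        # golden image the server was built from
        self.active_users = 0
        self.draining = False     # True once the server is marked to go offline
        self.online = True

    def connect(self) -> bool:
        """Accept a new connection unless the server is draining or offline."""
        if self.draining or not self.online:
            return False
        self.active_users += 1
        return True

    def disconnect(self) -> None:
        """Drop one user; a draining server goes offline when it reaches zero."""
        if self.active_users > 0:
            self.active_users -= 1
        if self.draining and self.active_users == 0:
            self.online = False

    def mark_for_drain(self) -> None:
        """Stop accepting new connections; go offline at once if already idle."""
        self.draining = True
        if self.active_users == 0:
            self.online = False
```

Existing users are never cut off: the server only goes offline after the last of them disconnects, at which point a new image can be put in place.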
