Smart apps not working tonight?

Hey guys, we’re really sorry for the inconvenience. As you realized, we recently ran into trouble with actions and alerts not working as expected. We’ve isolated the issue and corrected it. We’ve updated our status page accordingly, and things should be back to normal: http://status.smartthings.com/

If you are still having trouble, though, please feel free to shoot an email to support@smartthings.com

Thanks again for working with us!

Reliance on the cloud is not an issue at all. There are tons of cloud services we use every day that work very reliably. The whole point of a cloud infrastructure is redundancy. Amazon EC2 is highly available and reliable, and outages happen once in a blue moon.

SmartThings issues and outages, on the other hand, happen much more frequently. This is an implementation problem rather than one inherent to the cloud infrastructure. It’s evident, however, that they have been making a continued effort to improve the reliability of their backend. Things used to be much worse only a few months ago, and are getting better and better.

As for status updates regarding the ST backend: Please, subscribe to updates at status.smartthings.com.

Forums are far from an ideal medium for reporting and resolving bugs in the system. If ST is serious about living up to their aspiration of becoming an “Open Platform”, they should open up a real bug tracking system to developers.

Would this outage have caused my Siren to go off last night for no reason…?

Wife was not happy with me… two weeks into starting SmartThings.

Kudos to Ryan and the ST team. I reported the issue at 11:23 AM and they had it fixed and a reply to my support request by 11:59 AM.

I have almost always received immediate support using support@smartthings.com

I use the forums when I want to post something or ask you all (the community of users) a question.

Now this is my opinion, but knowing the dudes and dude-ttes at ST, they’d be open to hearing what the community has to say.

IMHO,
Twack

It is an issue for me. I don’t need a security system that’s easily circumvented by simply unplugging my broadband cable. Nor do I need a flood alert system that’s down just for a couple of hours at the wrong moment.

I have a few HA devices that I made myself - all of them are capable of working completely autonomously. Because outages happen. Even if you achieve five nines in your datacenter, it’s of no use to me because of all those wires between your datacenter and my house. Realistically, if you’re sufficiently paranoid, have a generator, an array of solar panels on the roof and a stack of batteries (all of which I happen to have, but for other reasons), you’ll still be lucky if you manage even two nines. And two nines is close to four days of downtime a year - a lot can happen in four days.
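To put rough numbers on that chain of wires: when every link has to be up for an alert to get through, the availabilities multiply. Here is a minimal sketch in Python. The per-link figures are made-up assumptions for illustration, not measurements of SmartThings, any ISP, or any utility, and the function names are hypothetical.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def series_availability(links):
    """Availability of a chain in which every link must be up at once."""
    return prod(links)


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical figures: datacenter at five nines, home broadband at 99.5%,
# home power at 99.9%.
chain = [0.99999, 0.995, 0.999]
total = series_availability(chain)
print(f"end-to-end availability: {total:.4%}")
print(f"expected downtime: {downtime_hours_per_year(total):.1f} hours/year")
```

With these assumed inputs the chain comes out at roughly 99.4%, a bit over two days of downtime a year, even though the datacenter alone is at five nines. The weakest link dominates.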

I’m glad it was fixed within an hour of your report, but the fact remains that the issue was reported here almost a full day before that. I think this is what needs to be addressed.

Whether it is more error handlers in the app that automatically report broken throughput, or something more than what currently exists on the server side, something should be in place to monitor and report this kind of failure.

I am hopeful that the causes of the various failures, as they occur, are fed into automated routines designed to monitor, report, and even fix future occurrences wherever the same failure could recur.
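As a rough illustration of the kind of monitoring meant here, a minimal heartbeat check in Python: each device or app reports in, and anything that stays silent past a threshold gets flagged. This is only a sketch under stated assumptions. The `Heartbeat` class, the device names, and the 300-second threshold are all hypothetical and have nothing to do with the actual SmartThings backend or API.

```python
import time


class Heartbeat:
    """Tracks when each source last reported and flags prolonged silence."""

    def __init__(self, max_silence_s):
        self.max_silence_s = max_silence_s
        self.last_seen = {}  # source name -> timestamp of its last report

    def report(self, name, now=None):
        """Record that `name` was heard from (defaults to the current time)."""
        self.last_seen[name] = time.time() if now is None else now

    def stale(self, now=None):
        """Return the sorted names that have been silent for too long."""
        now = time.time() if now is None else now
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.max_silence_s
        )


# Hypothetical device names and timestamps (seconds), for illustration only.
hb = Heartbeat(max_silence_s=300)
hb.report("motion-hall", now=1000)
hb.report("siren", now=1200)
print(hb.stale(now=1400))  # only "motion-hall" has been silent for over 300 s
```

A check like this catches the "events silently stopped flowing" case without waiting for a user to notice and post about it.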

The data connection issue with your security system is a different one, though: You’ll have the same problem with an old-school monitored security system. If someone cuts your phone line, an alarm will never go through to the monitoring company. Unless, of course, your security system has a cellular network failover…

The same applies to SmartThings: Connect your hub to a UPS and a router with integrated 4G failover to account for power and data connection loss. These days, both are available to the SOHO market.

And you are right: Even with 99% availability, there will be an average of about 88 hours of outages every year. If that happens while your basement is flooding, then you’re out of luck. I’d argue, however, that for a consumer home automation system that would be acceptable for a great majority of users. For example, knowing that I would be properly alerted 99% of the time my basement was flooding would leave me satisfied.

The problem is that SmartThings currently does not mirror the reliability of the available cloud infrastructure. There is some sort of (albeit usually short-lived) backend issue roughly once a month, and things like a siren randomly going off are not an availability issue - that’s just annoying. Extending compatibility with other devices is great and all, but focusing on the core competencies is really mission critical if they want to become the de facto standard in modern home automation. Reliability is paramount, but blaming cloud infrastructure is just being lazy, when we really should be breathing down SmartThings’ neck to improve their backend reliability no matter the implementation details.

Again, in their defense, they have been promising a continued effort, and improvements have already been made!

On a separate note, you should definitely share some details about your generator / solar setup :slight_smile: That sounds really awesome.

You will usually get a much quicker response to an email to support@smartthings.com, as compared to reporting it on the forum. Not saying that’s a good thing, but that’s just the way it is right now.

Scott,

I can personally tell you that the ST folks think exactly like you do. While perfection is generally unobtainable, it is still the desired goal.

I suggest that if you need 99.99999%, you calculate the cost and hardware requirements to install the needed system redundancy. I lived in Somerset, so I know how much you lose power in Pollock (ouch). I dabble in Quality and Risk Management, and this is probably not just “my” opinion, but with ST I think you are getting one hell of a value.

Keep up the positive critique and your voice will be heard and enjoyed by all. Now back to our regularly scheduled program. :slight_smile: Let’s get a :beers: someday

Twack

These measures will not help against ST system outages, and this is what makes ST a poor choice as a security system. It’s excellent at what it does well, namely casual home automation and monitoring, but trying to market it as a security system is misleading to put it mildly. A real security system should be UL-certified. Everything else is pretty much just a toy.

I don’t think we disagree. :slight_smile: You’re saying that ST reliability does not match the reliability of the cloud yet. I agree. I also agree that we should cut the guys some slack. They’ve been growing fast. But they are hiring people, improving processes, learning to communicate proactively, and so on and so forth. I’m sure they’ll get there pretty soon.

What I’m saying is that they should ALSO think about moving some of the mission critical functionality to the hub. There’s no reason not to.

My generator/solar setup is pretty vanilla. The solar is installed and wired into the system by one of those countless companies doing it nowadays. The generator is stripped of all the liquids, sealed and stored in a shack. I am not a prepper, I do not believe in the doomsday scenario. :slight_smile: But I live in California where there’s plenty of sun and earthquakes are frequent. I also drive an electric car, so solar just makes economic sense.

@geko

I hear your concern about security, but I see it differently. I like SmartThings better than our UL system, which nobody liked turning on AND off. Which is worse: something that is on all the time and works 99% of the time, or something that works 99.9999% of the time but is rarely turned on?

I see your point. Arming and disarming a security system may be tedious. The best analogy I can come up with is flossing - we do it not because we enjoy it, but because we realize the long-term benefits of doing it. :smile:

My understanding from some conversations with Ben from ST is that the hub as it stands now is exceptionally limited in capacity - so having the hub in its current form process events locally is probably not realistic.

(( As a point of interest Ninja have been trying to move from a cloud only to distributed / locally processed platform for some time now. It has caused significant delays to their new flagship product “NinjaSphere” - and is many many months overdue on their existing NinjaBlocks devices. ))

I agree that ST are getting better at communicating: they were asked to contribute to this particular thread, and they did. I also think that they are learning how best to monitor their platform, and that presents the best value for us as users and developers.

Once appropriate monitoring is in place, and the platform is able to be geographically distributed, outages like this should become infrequent. Adopting the practice, used by other large cloud providers, of staggering changes across geographic centres will also help.

I have been particularly hard on ST, and I don’t apologise for it - I made a significant investment and was not seeing the value in it. In return for our expectations of ST, I think we equally need to provide the support and feedback that they need to grow effectively.

There is tremendous value in a semi-public bug tracking system, in addition to the developer support that I discussed in my initial reply. Another thing that I would potentially like to see is some sort of certification process - a way to display a level of understanding of the platform; devices and apps. Once validated - there should be some way to not just report bugs - but to report potential platform failures like the one identified here. These reports should be routed differently to receive more immediate attention.

While discussions are had and decisions are made about everything that has been raised in this thread (and others) - I think the simplest short term solution would be to add another category called “Platform” (or something similar) to allow us to post suspected platform issues - with the assumption somebody from ST will always be subscribed.

As a Director of IT, I’m well versed in uptime. For those who want to see a chart on it, click here - LINK Remember, there are 10,080 minutes in a week (for perspective).

I work in a 22 hour shop (we are not quite 24 hour, but we are pretty damn close with locations across the country and multiple shifts). I generally shoot for 3-4 “nines” and I think that is a reasonable expectation for most cloud providers. Several posts above have quoted six nines. Unless you work for NASA and need to support things that CAN NEVER go offline, that is fairly unreasonable, as that is approximately 31.5 seconds of downtime A YEAR!
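For anyone who wants to check the arithmetic behind the "nines", here is a short Python sketch. It assumes a 365-day year and whole numbers of nines, and the function names are just illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def availability(nines):
    """Availability for a whole number of nines, e.g. 3 -> 0.999."""
    return 1 - 10 ** -nines


def downtime_seconds_per_year(nines):
    """Allowed downtime per year, in seconds, at the given number of nines."""
    return SECONDS_PER_YEAR / 10 ** nines


# Print the yearly downtime budget for two through six nines.
for n in range(2, 7):
    secs = downtime_seconds_per_year(n)
    print(f"{n} nines ({availability(n):.4%}): {secs / 3600:10.3f} hours/year")
```

Six nines works out to 31.536 seconds a year, which matches the figure above, while two nines allows 87.6 hours and four nines a little under 53 minutes.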

So, 3-4 nines is a reasonable expectation. I would like to see SmartThings track their up and downtime better. That would help them AND us. We, as a community, should also try to be as helpful as possible. When submitting outage reports, we need to be very specific, detail exactly what is wrong, try to reproduce, give screenshots, etc.

I realize that ST is in growth mode, so I’m currently expecting between 99 and 99.5% uptime for the most part. Remember, there is a good chance that downtime might occur when you are asleep! :smile:
