Fix the stupid smart home monitoring already. No support whatsoever


(Larry) #1

I have at least 4 tickets open with no response. My SHM is unusable. I have removed and rebuilt security multiple times. This morning, again, the Good Morning routine ran, and I even went into the app and checked, and it said disarmed and home mode. Then a half hour later I get an intrusion alert, and when I check, the system is magically armed again with no notification about how it was armed.


(Matt) #2

http://status.smartthings.com

They are still working on it. Do a search for “Zombie rules” and you will find some good threads.


(Larry) #3

I don’t have any zombie rules… SHM just doesn’t work… and it is not an overloaded-database issue as mentioned… it should not say disarmed and then, a half hour later, suddenly be armed again and giving off alerts with no notification showing how it armed… i.e. no rule fired… basically the app is just out of sync with the back end, or it is arming itself… and then there is the usual problem where it suddenly cannot disarm and just spins and spins… where it worked previously… I delete and rebuild it and it works for a day, then dies again.


(ActionTiles.com co-founder Terry @ActionTiles; GitHub: @cosmicpuppy) #4

Here’s a really approximate description of what is happening…

SmartThings uses a distributed database (“Cassandra”, I believe) which operates in a way called “eventual consistency”. This lets the SmartThings cloud be distributed over dozens, hundreds, or thousands of servers.

When the database is performing properly, updates are distributed across hundreds(?) of servers in probably less than a second or two.

When the database is overloaded, updates (such as to turn the alarm on or off) take… minutes? hours? … and an update that hits one server might overwrite an update written to another server. Thus you turn the alarm off (“Stay”) and it is updated on one of the servers, but before the update is distributed across the other servers, an intrusion or safety Event might happen and when SmartThings checks if your alarm is “Away” or “Stay”, then it will find that the alarm is still … “Away” if it happens to hit the wrong server. A few minutes later, the “Stay” eventually hits that server, but it is too late…

“Eventual Consistency” is a very risky data model when performance is bogged down.
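A toy sketch of the race described above, for illustration only. It assumes two replicas and a simple queue of undelivered updates; the class and function names (`Replica`, `Cluster`, `handle_motion_event`) are made up for this example and say nothing about how SmartThings or Cassandra are actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server's local copy of the alarm state (hypothetical model)."""
    name: str
    alarm_mode: str = "Away"


@dataclass
class Cluster:
    replicas: list
    # Updates that were accepted by one replica but not yet delivered to the others.
    pending: list = field(default_factory=list)

    def write(self, replica_index, mode):
        """Apply the write on one replica immediately; queue it for the rest."""
        self.replicas[replica_index].alarm_mode = mode
        for i, replica in enumerate(self.replicas):
            if i != replica_index:
                self.pending.append((replica, mode))

    def read(self, replica_index):
        """Read whatever this one replica currently believes."""
        return self.replicas[replica_index].alarm_mode

    def replicate(self):
        """Deliver every queued update (quick when healthy, slow when overloaded)."""
        for replica, mode in self.pending:
            replica.alarm_mode = mode
        self.pending.clear()


def handle_motion_event(cluster, replica_index):
    """Return True if an intrusion alert would fire for this event."""
    return cluster.read(replica_index) == "Away"


cluster = Cluster([Replica("server-a"), Replica("server-b")])
cluster.write(0, "Stay")                         # user disarms; the write lands on server-a
alert_before = handle_motion_event(cluster, 1)   # event is checked against stale server-b
cluster.replicate()                              # the update finally reaches server-b
alert_after = handle_motion_event(cluster, 1)
print(alert_before, alert_after)
```

The longer `replicate()` is delayed, the wider the window in which an Event can be evaluated against the stale “Away” value.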


### Possible Fixes?
An individual SmartThings Customer doesn’t need the power of 1000 servers, but giving 1 server per Customer isn’t cost effective either.

Again, approximately, SmartThings can divide all the Customers across “shards” (slices) of the cloud, so you and I might be on different shards, each with 500 servers. Now a performance issue on one shard doesn’t affect users on the other. Currently SmartThings only has two shards: US and UK. They are working on adding more.
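A rough sketch of the shard idea, assuming customers are assigned to shards by hashing an account ID. The shard names and the hashing scheme are hypothetical; the post only says that the current shards are US and UK, which suggests a geographic split rather than this one.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(customer_id, shards=SHARDS):
    """Map a customer to one shard with a stable hash of the account ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def customers_affected(customers, overloaded_shard, shards=SHARDS):
    """Only customers living on the overloaded shard feel its slowdown."""
    return [c for c in customers if shard_for(c, shards) == overloaded_shard]


customers = ["larry", "matt", "terry"]
placement = {c: shard_for(c) for c in customers}
print(placement)
print(customers_affected(customers, "shard-0"))
```

One known drawback of plain modulo hashing is that changing the number of shards reassigns most customers, which is part of why adding shards also means moving Customers and data around.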

Also, there are some performance gains from adding more servers to each shard, but the gain isn’t linear. So a 2000-server shard isn’t twice as fast as a 1000-server shard … it is still, generally, better… but the performance curve flattens out. More servers means that more replication / data distribution needs to take place. So you get an improvement in processing, but longer periods of inconsistent data.
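To put rough numbers on the flattening curve, here is a sketch using Gunther’s Universal Scalability Law, a general capacity model and not anything SmartThings has published. The coefficients `contention` and `coherency` are invented for the example; real values would have to be measured.

```python
def relative_capacity(servers, contention=0.001, coherency=0.0):
    """Universal Scalability Law: capacity relative to a single server.

    contention: fraction of work that is serialized (assumed value).
    coherency:  cost of keeping servers in sync with each other (assumed value).
    """
    n = servers
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


for n in (1, 500, 1000, 2000):
    print(n, round(relative_capacity(n), 1))

# Ratio of a 2000-server shard to a 1000-server shard under these assumptions.
ratio = relative_capacity(2000) / relative_capacity(1000)
print(round(ratio, 2))
```

With these made-up coefficients, doubling from 1000 to 2000 servers helps, but by well under 2x. Setting `coherency` above zero makes the model eventually lose capacity as servers are added, which is one way to picture replication overhead.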

NB: 50% or more of what I said above could be technically wrong or poorly written. I wish SmartThings would explain their Cloud architecture in more detail so that we don’t have to research general theory about Cassandra, etc., and so that they can give specific explanations. This info, though, can be useful to competitors building their own Smart Home Clouds.


(Larry) #5

OK, helpful. But a half hour later? Really, that is the delay they are experiencing? If so, we are so #$#$ed… nothing in terms of optimizations can fix a delay of a half hour.


(ActionTiles.com co-founder Terry @ActionTiles; GitHub: @cosmicpuppy) #6

I edited my post; so not sure what portion you read. Check again to be sure…

The point: There is an optimal balance between the number of servers and the way the Customers and data are distributed among them. But the platform can be thrown off balance if there is some sudden internal or external factor messing up the expected load. Lots of new customers… sure, but unlikely all of a sudden, though “hockey-stick” growth curves do happen. Database corruption and self-healing load? Internet backbone or Amazon cloud issues? A bug introduced by a poorly analyzed and insufficiently tested change? … lots and lots of variables and both reasonable and unreasonable human error.

In the cases above, sometimes just rebalancing can help; in other cases, throwing a lot of servers at the problem can help … but usually you need to do both.

I think SmartThings is definitely planning on rushing out new shards, but … need to balance at the same time. And move Customers and data from one shard to the other, etc., etc…


It’s very, very complex. Not impossible, and not unprecedented (though IoT events are much more frequent than even a busy online store!). And the complexity goes way up if any mistakes are made or factors are missed.


(Larry) #7

The fact that it has been going on now for 5 days or more with basically the same issue/problem implies that they have no clue and are just floundering and trying anything… but that is just my take/opinion on it… Anyway, I switched to Smart Alarm, as I am not using Scout or the streaming video anyway… Now I have another problem: I wanted to modify Smart Alarm to turn a Hue light a color when there is an alert, but I cannot get the source from the “create new smartapp” from-template list, as the list is not wired to the correct web/result.


(ActionTiles.com co-founder Terry @ActionTiles; GitHub: @cosmicpuppy) #8

Search the SmartThings Community Public GitHub repository instead. You can view “Raw” and copy / paste.

Not sure it is more up to date … but if it is open source code for the Marketplace, this is the right repository for it.


(Larry) #9

thanks…

finished my modified version

ie