UPDATE: Recent SmartThings User Experience & Platform Performance

I think the real problem at the root is the timer.

  1. It blindly includes idle time waiting for responses. It should only count CPU time.
  2. It only takes one slow device or cloud latency to kill a process. So your precious morning routine? If one thing in it is poky, it’s summarily killed with no retry. If you have twenty devices in the routine, that’s twenty chances for the routine to fail. I believe our routines are killed when there is a storm in the cloud.
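To put a rough number on the "twenty chances" point: if each device call independently times out with some small probability p, the chance that at least one of n calls times out is 1 - (1 - p)^n. The sketch below only evaluates that formula. The 2% per-device figure and the function name are made up for illustration; they are not measured SmartThings numbers.

```python
def routine_failure_probability(per_device_failure: float, device_count: int) -> float:
    """Chance that at least one of device_count independent calls fails."""
    return 1.0 - (1.0 - per_device_failure) ** device_count


if __name__ == "__main__":
    p = 0.02  # assumed per-device timeout probability (hypothetical value)
    for n in (1, 2, 5, 20):
        chance = routine_failure_probability(p, n)
        print(f"{n:2d} devices: {chance:.1%} chance that something times out")
```

With those assumed numbers, a one-device app fails about 2% of the time while a twenty-device routine fails roughly a third of the time, which is the argument for keeping each app down to one or two actions.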

This is why I try to have every app do only one or two things. Less chance of it failing. I also try to have them trigger on events, so that if the first motion is missed, the second is not.

As awesome as Blink and RBoy app are, I suspect that the SHM and Blink combo has issues due to Blink response time.

Just my thoughts… My system is quite reliable, so I believe (right or wrong) that I’m doing something right. I can’t just be this lucky for this long.

2 Likes

My two cents on the problem:

As RM was being pulled out, I made changes to it to account for some “quiet time” requirement before an event triggers actions again. After implementing that, I started using modes in RM. Then I noticed different rules running even though they were meant to be exclusive of each other (namely Alarm Armed and Alarm Disarmed - not SHM based). Looking through the code, I found that the mode wasn’t properly filtered on and fixed that. Since then, Alarm Armed runs when AT&T DL arms, though it runs twice (why?), and Alarm Disarmed only runs when AT&T DL disarms. Again, twice, but hey… before I fixed it, it was triggering a cycle as soon as I added Mode to the conditions in RM. I only used triggers with RM, or maybe MOSTLY triggers.

The root of all evil is perhaps the time limit per SmartApp execution. That kills a lot of joy: as @bridaus mentioned, one cloud timeout or one unresponsive device brings down the whole chain. What I am doing to avoid this is pretty much adding lots of overhead for the sake of stability. I think, like @tgauchat said, instrumentation has no cost limits, and these costs come in cloud resources. Here’s what my plan is:

  1. event comes in, evaluate the whole rule, figure out what actions to do and whatnot. Each action then generates tasks to be executed. Store the tasks in a state list variable. When done with the evaluation and action/task scheduling, reschedule all time triggers based on their last execution. Again, into a state list. No physical action is taken at this time. Once the list of all “changes” is complete, use atomicState to read the tasks list and update the list using the list variable (I call it tasker). Add what’s new, replace what already exists, remove what’s no longer needed. Then immediately save the tasks back to the atomicState. Let me explain:

  2. the reason for the atomicState.tasks variable. Timeouts. I can handle tasks in that list one at a time and remove them atomically (one by one) as (and before) they are being physically executed at hub level. This means that if one task fails (say, turn on the light in the kitchen - device unresponsive) then I get a chance at finishing up all the other tasks in the list on my next iteration.

  3. the multi-threaded nature of ST is a huge benefit but also the programmer’s worst nightmare. Since I am “threading” everything into a list and we’re dealing with asynchronous execution of events, I have to make sure that no two instances of the same SmartApp run at the same time and execute the same tasks. This means that every time I go to process a task, I have to remove it from the list so that no one else executes it. This means atomicState. This means more resources (SQL requests in the backend). Overhead. Perhaps ST should consider executing a predefined procedure if and when time runs out, to give the SmartApp a chance to “save” some info and recover.

  4. rather than having multiple ST schedules set up to run, due to the nature of the UI (allows many triggers at many different times), I am running this incrementally. Every evaluation ends with the aforementioned processing of the tasks list. I then look for the next event in the future and set up an ST schedule for that time using runIn(seconds). I then figure out if there’s any immediate action that needs to be taken care of, as in tasks that are already due, including those that were just set up for immediate execution during the evaluation process. If I have any, I set up a second ST schedule using runIn(50s - time_already_spent_during_current_execution +/- random(10s)) to act as a safety net for any failed task. If this runIn ever gets to execute later, I simply resume the task execution process. I then proceed to execute the due tasks, removing them one by one from the tasks list as I progress. If and when I manage to empty the due task list (no tasks pending immediate action), I remove the second ST schedule (the safety net). The first schedule will run whenever it needs to and gets updated as time passes.
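A rough sketch of the task list idea from the steps above, written in Python rather than SmartApp Groovy so it can be run anywhere. Everything here is a stand-in: `atomic_state` is a plain dict behind a lock (not ST's real atomicState), the function names are hypothetical, and a raised exception plays the role of a failed device call. On the real platform a timeout kills the whole execution, and the safety-net schedule is what resumes the remaining tasks.

```python
import random
import threading

# Stand-in for ST's atomicState: a dict guarded by a lock (assumption for this sketch).
_lock = threading.Lock()
atomic_state = {"tasks": {}}


def merge_tasks(new_tasks: dict, obsolete_ids: set) -> None:
    """Add what's new, replace what already exists, remove what's no longer needed."""
    with _lock:
        tasks = dict(atomic_state["tasks"])
        tasks.update(new_tasks)
        for task_id in obsolete_ids:
            tasks.pop(task_id, None)
        atomic_state["tasks"] = tasks  # save the whole list back in one step


def claim_next_due_task(now: float):
    """Atomically remove and return the earliest due (id, task), or None if nothing is due."""
    with _lock:
        due = [(t["due"], tid) for tid, t in atomic_state["tasks"].items() if t["due"] <= now]
        if not due:
            return None
        _, task_id = min(due)
        tasks = dict(atomic_state["tasks"])
        task = tasks.pop(task_id)
        atomic_state["tasks"] = tasks
        return task_id, task


def process_due_tasks(now: float, execute) -> list:
    """Claim each due task BEFORE running it, so no other instance runs it too."""
    failed = []
    while True:
        claimed = claim_next_due_task(now)
        if claimed is None:
            return failed
        task_id, task = claimed
        try:
            execute(task)
        except Exception:
            failed.append(task_id)  # one bad device does not block the remaining tasks


def safety_net_delay(elapsed_s: float, budget_s: float = 50.0, jitter_s: float = 10.0) -> float:
    """Delay for the safety-net runIn: budget minus time spent, plus or minus random jitter."""
    # The 1-second floor is an assumption of this sketch, not part of the original plan.
    return max(1.0, budget_s - elapsed_s + random.uniform(-jitter_s, jitter_s))
```

The design choice worth noting is the order of operations in `process_due_tasks`: the task leaves the shared list before it is executed, which trades a possible lost command for the guarantee that two concurrent instances never send the same command twice.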

Then I have a THIRD ST schedule to run at random times between midnight and 2am to update the sunset/sunrise and any time triggers that were not properly scheduled (figuring out when a trigger needs to run next is tricky, especially when using the “every minute” repeat mode with filters on days, or months or years. Run every minute, but only on Sundays…). I give up trying after so many cycles, so I may come up with “not anytime soon”. This changes as time gets closer to Sunday, so I need to account for those. I sure hope this strategy makes it easier on ST (I should think that running fewer schedules may have a good impact) and that the atomicState is not going to kill it…
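The "give up after so many cycles" behaviour can be sketched like this. It is a minimal Python illustration under assumptions: the function name, the weekday-only filter and the 1000-cycle cap are all hypothetical, and the real rule engine presumably handles more filter types.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run(after: datetime, step: timedelta, allowed_weekdays: set,
             max_cycles: int = 1000) -> Optional[datetime]:
    """Step forward from `after` until a candidate lands on an allowed weekday.

    Returns None ("not anytime soon") if nothing matches within max_cycles steps.
    """
    candidate = after + step
    for _ in range(max_cycles):
        if candidate.weekday() in allowed_weekdays:  # Monday is 0, Sunday is 6
            return candidate
        candidate += step
    return None


if __name__ == "__main__":
    every_minute = timedelta(minutes=1)
    sundays_only = {6}
    # Thursday morning: Sunday is more than 1000 minutes away, so the search gives up.
    print(next_run(datetime(2016, 5, 12, 11, 58), every_minute, sundays_only))
    # Late Saturday night: Sunday is within reach, so a real time comes back.
    print(next_run(datetime(2016, 5, 14, 23, 0), every_minute, sundays_only))
```

This is why a nightly re-check is needed in such a scheme: a trigger that came back as "not anytime soon" on Thursday resolves to a concrete time once the search window reaches Sunday.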

Enough complaining already, off to work.

6 Likes

Day three of using ST routines and smart lighting.

Guess which two piece-of-crap aspects of ST I will NOT be using after today!

Both were crap 6 months ago and they are more crap now.

I am thoroughly disgusted.

1 Like

Just to clear things up, the “enough complaining” was self-addressed :slight_smile:

4 Likes

So down to 98.5%? :joy:

1 Like

Yeah… 98.5% failure!!!

1 Like

I’m here:

╔═══════ Done in 262ms
║╔══════ Done in 254ms
║║░░░░░░ Removing ST safety net
║║░░░░░░ Scheduling ST to run in 59s, at Thu, May 12 2016 @ 11:58 AM EDT
║║░░░░░░ Rescheduling time triggers
║║╔═════ Event processing took 110ms
║║║░░░░░ Primary IF block evaluation result is true
║║║░░░░░ ♠ Function eval_cond_is_inside_range for Front Door Wasp has temperature [74.3] is inside range 72 - 85 returned true
║║║░░░░░ Event eligibility for the primary IF block is 2 - ELIGIBLE (triggers required, event is a trigger)
║║╚═════ Processing event time with id time and value null, generated on Thu May 12 15:57:40 UTC 2016, about 1028ms ago
║║░░░░░░ Broadcasting time event for primary IF block, condition #4
║║░░░░░░ Installing ST safety net
║╚══════ Processing tasks
╚═══════ Received a time event

Read between the lines: Timing Belt complete, onto actions. @bamarayne rejoice. Beer: banned.

1 Like

Bro, my system is disintegrating! RM is rolling over and sinking! ST… Well, that carcass will keep me warm when on Hoth.

1 Like

I have a very similar problem with my routines, but it’s with my Schlage locks. When my wife and I leave home (each with a presence sensor), I have the routine (either Goodbye or I’m back) change the mode (either Home or Away). In the same routine, I also have 4 Schlage locks lock or unlock, as well as lights turn on/off, depending on whether we’re coming or going. It’s very rare that all the locks lock/unlock from the routine, and it’s usually 2 out of the 4 doors with this issue. After more than a year dealing with support, I have switched out my locks, bought repeaters, split up my dimmers and GE Link bulbs with a different automation than the regular wall switches, and got a wireless booster. Still not working. What do you suggest I do here? I can switch all of the lights to smart lighting and keep the mode change with the routine, but is there another place where I can lock/unlock the doors based on presence? Any help would be appreciated.

Use a single Rule or instance of an app to unlock each lock based on an event.

So four locks equals four apps.

1 Like

Unless it’s the ST routines and smart lighting… Then I recommend just buying a key.

4 Likes

I have had only a couple of problems with routines and not once with SL.

1 Like

Do you know if there’s another app to use to lock/unlock other than routines?

There’s the discontinued Rule Machine and the upcoming CoRe, and some others here will know of other apps.

Maybe SHM but I don’t know if I would trust it.

Four routines might work too, yes?

If support determined that it is a problem with your lock (you said you needed to add repeaters), then there is nothing you can do. No app will compensate for that. My locks skip a beat now and then; it’s just something you need to live with, unfortunately.

Does this sound familiar? It’s still happening with mine…

With troublesome devices a long time ago in other systems, I’d program the system to send the command twice on a delay. Doubled the chances, helped a little.

Although the best thing to do is fix the root cause.
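For what it's worth, the send-it-twice workaround is easy to sketch. This is plain Python with a hypothetical `send` callable standing in for whatever actually issues the device command; it is not SmartThings code. It repeats blindly, as described, rather than checking whether the first attempt worked.

```python
import time


def send_with_repeat(send, repeats: int = 2, delay_s: float = 2.0, sleep=time.sleep) -> int:
    """Call send() `repeats` times, pausing delay_s between calls.

    Returns how many of the calls raised an exception.
    """
    errors = 0
    for attempt in range(repeats):
        if attempt:
            sleep(delay_s)  # wait before every call except the first
        try:
            send()
        except Exception:
            errors += 1
    return errors
```

If a single send gets through with probability p and the two sends are independent, at least one gets through with probability 1 - (1 - p)^2, so an assumed p of 0.8 becomes 0.96. This only makes sense for commands that are safe to repeat, such as lock, on or off; repeating a toggle would undo itself.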

1 Like

I have been having fairly good luck over the last few weeks with SHM. I am almost ready to try adding my siren back into the mix. Glad to see improvement with SHM arm/disarm issues. One thing I have definitely noticed though is a continuing latency issue with notifications and even routines for that matter. A routine that should fire at 11:00 doesn’t send a notification until 11:01. Door open/close notifications sometimes take several seconds to notify. Yesterday, I was almost to my car when I finally got the Door closed notification. Also seeing a delay when turning things on and off manually. When I first got my ST setup it was blazing fast, which is why I can safely say that things have slowed down a lot. I wish I didn’t have to sacrifice performance for reliability, but considering how things were a couple months ago, I will gladly take it…

2 Likes

I’ve had SHM running with my sirens for a few weeks now, not one unexpected peep out of it - and I did test it a couple times to make sure it was actually working.

2 Likes

So today I can’t log in to the ST iOS app. I am getting this message.