SmartThings Freakout last night


#1

I have a couple of CoRE rules for my garage that failed in a rather frightening way last night.

First off, I have a relay hooked up to my garage door opener that’s set up as a virtual momentary switch in ST.

Between 12:59 and 1:01 am the virtual switch appears to have toggled back and forth about 50 times. I just went outside for the first time today and noticed that my garage door was halfway open and appears to have been like that for 13 hours. Wow.

On top of that, I have a radiant ceiling-mounted space heater in the garage that’s hooked up to a Z-Wave switch. The switch is tied to a CoRE rule that turns the heat on when the garage temp is less than or equal to 32F and turns it off again when it’s greater than 32F.

Anyway, I just found that the heater had been on since 5am as well. It seems that the rule triggered properly and turned the heat on, but the off trigger was missed.

WTF? I’ve become more and more trusting of ST lately but this is horrifying and calls the whole platform into question. It seems like a platform that allows something like this to ever happen isn’t actually stable and in fact can never be considered stable.


#2

That’s scary! :scream:

You can check the logs if you want to investigate for any clue as to what happened. At first I thought maybe there was a power fluctuation, but that wouldn’t explain the problem with the heater.

I don’t know if there are any additional things you can check for CoRE activity. The folks in the peer assistance thread should know.


(Alexander Ng) #3

I’m just taking a guess here, but when I bought my house 16 yrs ago (2nd owner) it came with 2 Chamberlain garage door openers with a matching outside wireless keypad and 2 wired openers inside the garage. I had just passed papers and was in the middle of moving between the old and new house when I found the garage door open halfway even though I knew I had closed it. This happened for the next 3 days! It turned out to be a dying 9-volt battery in the wireless keypad. Once I replaced the battery everything was fine. BTW, only one of the doors kept opening halfway, as I recall.


(Jason "The Enabler" as deemed so by @Smart) #4

I have a question…

Do you have and use Minimotes?


#5

Hmm… this is a really interesting theory. If the RF opener was freaking out and opening the door, maybe my ST nighttime rule was trying to close it.

Doesn’t explain the miss on the heat power off. That rule has worked for months. I was really surprised to find that on.

No minimotes.


(Bill Stefanelli) #6

I had a CoRE rule that had been operating flawlessly for several months miss in a similar fashion (device turned on, but didn’t turn off). Nothing had changed related to the rule or the devices therein, but it had been 30 days or so since a hub reset. It doesn’t surprise me that after a long period of time the system needs housecleaning. The problem hasn’t returned, and I made no changes other than the reset. I’ve concluded/accepted that ST isn’t the system to use if you are looking for a near fail-safe environment. That said, I love it and CoRE, and am satisfied with the semi-regular hiccups. I’ve added a monthly reset to my list of things to keep the system optimal - if only we could have CoRE do that!


(Dale C) #7

Doesn’t fix your root issue but just a thought to add another piston;

Or, if you use SHM, there is a built-in option under “Custom” monitoring to add a time-based check on anything left open for more than x minutes.
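For anyone who wants to see the logic spelled out, here is a rough sketch of that kind of "left open too long" check. It is plain Python, not a CoRE piston or SHM configuration, and the function name and the 10-minute default are made up for illustration.

```python
def open_too_long(opened_at, now, limit_minutes=10.0):
    """Return True if the door has been open for at least limit_minutes.

    opened_at and now are timestamps in seconds; opened_at is None when
    the door is currently closed. All names here are hypothetical.
    """
    if opened_at is None:
        return False
    return (now - opened_at) / 60.0 >= limit_minutes


# Door opened at t=0 and still open 13 hours later: should alert.
alert_after_13h = open_too_long(0.0, 13 * 3600.0)
# Door opened 5 minutes ago: no alert yet.
alert_after_5m = open_too_long(0.0, 5 * 60.0)
```

A check like this would not have stopped the toggling, but it would have flagged the open door long before 13 hours had passed.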


(Eric) #8

What relay is this? The LFM-20 is anecdotally unreliable.

Temperature control over internet is unreliable. Sounds lucky this time - congratulations for a cheap lesson. Doesn’t your radiant heater have connections for a local thermostat?

If you don’t implement setpoints with a differential, and instead switch both on and off at 32F as you have described, then you will sometimes get short-cycling of the load when your reported temperature value dances around the setpoint.
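To illustrate what a differential looks like, here is a minimal sketch in plain Python (not CoRE). The 2F differential and the sample readings are assumptions picked for the example, not values from the setup described above.

```python
def heater_command(temp_f, heater_on, setpoint_f=32.0, differential_f=2.0):
    """Decide the heater state with a deadband above the setpoint.

    On at or below the setpoint, off only once the temperature has risen
    by the (assumed) differential; inside the band, keep the current state.
    """
    if temp_f <= setpoint_f:
        return True
    if temp_f >= setpoint_f + differential_f:
        return False
    return heater_on


# Readings that dance around 32F; without the deadband the heater would
# flip on nearly every report.
readings = [33.0, 32.0, 32.4, 31.9, 33.1, 34.0, 33.5]
state = False
states = []
for temp in readings:
    state = heater_command(temp, state)
    states.append(state)
```

With these readings the heater switches on once at 32.0F and off once at 34.0F, instead of cycling each time the value crosses 32F.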


#9

I don’t understand why temp control over the internet needs to be unreliable. It seems to me that if that’s the case, it’s SmartThings’ design that’s unreliable. If you’re going around assuming your messages were delivered, you’re not using IP correctly; it’s cheap to ACK a message. I’d more likely suspect Z-Wave here, and in fact, after this problem happened again, I’ve put together a post mortem.

My working theory is this. I had the garage door set up with an ST multipurpose sensor to track the garage door position. I also had a CoRE rule that tells the opener to close the garage door if the sensor isn’t in the right side up orientation and night mode is on.

After seeing the problem one more time I’ve determined that the sensor wasn’t reporting correctly, or wasn’t reporting in a timely manner, and I wasn’t using it correctly. I had the rule set up to look at sensor position, but I should just be checking simple open or closed.

Since my opener relay is set up as a virtual momentary contact switch, every command it sees is just a “button press” that toggles the door. My rule kept sending button presses because the sensor was never in the expected state: when the door was closed it wasn’t yet reporting “right side up”, so button press; when it was open it wasn’t “right side up” either, so another button press.
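Here is a toy simulation of that loop, just to show why it never settles. It is plain Python rather than the actual CoRE rule, and the "on edge" orientation label is a made-up stand-in for whatever the sensor was really reporting.

```python
def simulate(reports, door_open=True):
    """Count button presses when every 'wrong' orientation report fires the rule.

    A momentary switch has no explicit "close" command: each press only
    toggles the door, so a condition that is never satisfied keeps
    flipping it back and forth.
    """
    presses = 0
    for orientation in reports:
        if orientation != "right side up":
            door_open = not door_open  # toggle, not "close"
            presses += 1
    return presses, door_open


# 50 stale reports in a row, none of them "right side up" (hypothetical values).
presses, door_open = simulate(["on edge"] * 50)
```

Every report produces a press, and the door ends up wherever the last toggle happened to leave it.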

Having the door open lowered the garage temp past my threshold and turned the heater on, but the sustained flurry of momentary contact switch events flooded the Z-Wave network and caused it to drop the heater off event? That’s my theory anyway.

I’ve rewritten the rule to use the correct open/close values and I’ve added a wait in there for things to settle before sending another event. It won’t prevent the same problem if the sensor is wrong again, but it’ll slow down the cycle and if it is a race condition, might solve it.
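Roughly, the new logic looks like the sketch below. This is plain Python, not the piston itself, and the class name, the "open"/"closed" strings, and the 60-second settle time are placeholders I picked for illustration.

```python
class CloseDoorRule:
    """Press the opener only if the door reads open, night mode is on,
    and enough time has passed since the last press (hypothetical sketch)."""

    def __init__(self, settle_seconds=60.0):
        self.settle_seconds = settle_seconds
        self.last_press = None  # timestamp of the most recent press, in seconds

    def should_press(self, contact, night_mode, now):
        if not night_mode or contact != "open":
            return False
        if self.last_press is not None and now - self.last_press < self.settle_seconds:
            return False  # still waiting for the door and sensor to settle
        self.last_press = now
        return True


rule = CloseDoorRule(settle_seconds=60.0)
# Worst case: a sensor stuck on "open", reporting every 2 seconds for 2 minutes.
report_times = list(range(0, 120, 2))
press_times = [t for t in report_times if rule.should_press("open", True, float(t))]
```

In this worst case the 60 stuck reports result in only 2 presses (at 0 and 60 seconds), so a bad sensor still cycles the door, just far more slowly.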

As for the heater, it is a simple switch and I don’t need super exact control, I just need the garage not to freeze. As far as the differential goes, I haven’t seen too much short cycling. Every 15 minutes or so at worst. I think the report interval from the temp sensor pretty much ensures a minimum runtime.