Events not processing or extensively delayed (multiple rules engines)


(Cosmo) #1

I’ve heard from numerous people (Facebook, the WC forum, personal msgs, and here on the ST community) using Stringify, webCoRE, and Smart Lighting who have had problems in the past 24-36 hrs: things not happening at all, happening only partially, and/or being processed with extended delays, and in some cases hubs going fully offline.

You haven’t pushed new firmware over the weekend, right? Kinda sounds like there’s an ST cloud issue going on…
@slagle @vlad @Jim


(Alex) #2

@cozdabuch - In my case it appears that everything that depends on the cloud is suffering much longer delays, whether it involves webCoRE or not. I was chalking it up to networking issues, as other devices on my network (2 Raspberry Pis) were super slow too (anything I do over SSH takes ages), but if you are having similar issues then it looks like I am seeing the same thing (and the issues with the RPis are unrelated… I do run beta sw on them!).


(Jimmy) #3

I’ve noticed cloud-based rules processing has been slow lately, too.


#4

This person reported a delay of several hours:


#5

If you have this issue, definitely report it to support.

https://support.smartthings.com/hc/en-us


(Glen King) #7

I was wondering about things over the past couple days.
I noticed ST went down sometime yesterday, and then came back up.

I’ll have to do a little inspection when I get home from work later.


(Jimmy) #8

Support suggested I reboot the hub, so I did. Too early to tell if it will help.


(vlad) #9

There haven’t been any firmware updates lately, and as far as known issues go, the only one I am aware of is a problem with the Arlo integration, where there has been a significant chance of their servers timing out. That drastically increases routine run time and may trigger the sandbox to be killed because it went over the max allowed execution time, which can result in partially completed routine/app executions. We have updated the Arlo integration to use async processing to mitigate the effect this has on other devices, and this is currently going through verification. Apart from that, we are not aware of any performance issues like the ones you describe. Yesterday’s incident was a very limited issue - functionally, the only thing that was affected was saving and publishing custom smart apps. I’d suggest going through support with as much information as possible.

I’m not support, but if someone has a specific instance of recent “slowness” with the platform, feel free to shoot me a DM (no promises on being able to help everyone who DMs, but I will try to get to as many as I can) and I can take a look at logs. It would be really helpful if the following could be provided:

  1. username
  2. description of issue & expected results
  3. name of app/devices involved
  4. times, specifically when the app was supposed to fire and when it actually did, or how long it normally takes to execute and how long it took in the slow instance
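
For anyone collecting the timing details in the list above, here is a minimal Python sketch of logging expected versus actual fire times and working out the delay. The `Execution` record, the 5-second threshold, and the sample timestamps are all hypothetical and made up for illustration; none of this is a SmartThings or webCoRE API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Execution:
    """One observed run of an app or routine (hypothetical record, not a SmartThings type)."""
    app_name: str
    expected: datetime  # when the app was supposed to fire
    actual: datetime    # when it actually fired

    @property
    def delay(self) -> timedelta:
        # Difference between the observed and the expected fire time.
        return self.actual - self.expected


def slow_runs(runs, threshold=timedelta(seconds=5)):
    """Return the runs whose delay exceeds the threshold (5 s is an arbitrary default)."""
    return [r for r in runs if r.delay > threshold]


# Made-up sample data, purely for illustration.
runs = [
    Execution("Lights to white", datetime(2018, 1, 1, 7, 0, 0), datetime(2018, 1, 1, 7, 0, 2)),
    Execution("Lights to white", datetime(2018, 1, 1, 19, 0, 0), datetime(2018, 1, 1, 19, 1, 30)),
]

for r in slow_runs(runs):
    print(f"{r.app_name}: expected {r.expected:%H:%M:%S}, "
          f"fired {r.actual:%H:%M:%S}, delay {r.delay}")
```

A note like the printed line, one per slow run, covers the "when it was supposed to fire and when it actually did" part of the request.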

(Brent) #10

Same here, been having problems the last couple of days. I have a webCoRE piston that turns LIFX and Hue bulbs back to white from red when SHM is disarmed. The pistons have been extremely delayed and the lights are not shutting off reliably. 2 to 3 color bulbs will not shut off, and it is random as to which ones. Sometimes LIFX, sometimes Hue bulbs, and sometimes both.


(Cosmo) #11

@bscuderi13 input?


(vlad) #12

I do see some timeouts in your webCoRE piston but do not know enough about webCoRE to help troubleshoot. It looks like between locks being held by the piston & outbound requests to the LIFX API, the piston is being killed mid-execution because a method invocation is taking more than 20 seconds. (The hard part is figuring out exactly where the slowness is coming from: it could be the platform, database queries, outbound HTTP requests such as LIFX, webCoRE itself, etc…) Maybe if @ady624 is up for it we can try to figure out what is going on (and see if this is more of a widespread issue). If you are up for it… could you start a group message with @ady624 and myself (if you feel comfortable sharing information on this piston with myself and @ady624)?
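
To illustrate the idea of finding where the time goes inside a single invocation, here is a rough Python sketch that times named steps and compares the total with the 20-second limit mentioned above. It is a generic illustration only, not webCoRE or SmartThings code: the `timed` helper, the step names, and the `time.sleep` stand-ins are all assumptions.

```python
import time
from contextlib import contextmanager

EXECUTION_LIMIT_S = 20.0  # per-invocation limit mentioned in the post above

timings = {}  # step name -> accumulated seconds


@contextmanager
def timed(step):
    """Accumulate the wall-clock time spent inside the `with` block under `step`."""
    start = time.monotonic()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + (time.monotonic() - start)


def report(step_timings, limit=EXECUTION_LIMIT_S):
    """Return (total seconds, slowest step name, whether the total exceeds the limit)."""
    total = sum(step_timings.values())
    slowest = max(step_timings, key=step_timings.get) if step_timings else None
    return total, slowest, total > limit


# Hypothetical steps; the sleeps stand in for real work.
with timed("wait_for_lock"):
    time.sleep(0.01)
with timed("lifx_request"):
    time.sleep(0.05)

total, slowest, over_limit = report(timings)
print(f"total={total:.2f}s slowest={slowest} over_limit={over_limit}")
```

Breaking the run down per step like this shows whether the bulk of the time sits in waiting on a lock, in an outbound request, or somewhere else.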


(I hate Mondays) #13

I am game :slight_smile:


(vlad) #14

FYI, we noticed a few database nodes that are exhibiting higher than normal latency at certain times throughout the day. This could potentially affect a routine execution, especially if the child devices are accessing device data. We are going to run some maintenance on these nodes to see if we can get them back to a healthier state. This specific latency issue is occurring only in NA02. The latency spikes are periodic, so we will see if the spikes continue after maintenance and provide an update. I haven’t received any DMs so far, so I’m just looking at the overall health of various clusters atm.