Another Outage Jan. 7, 2021

@garrett.kranz,
Is this the fix for the 16-hour delay in Device history?
Yes or no?

1 Like

@Mariano_Colmenarejo I’m afraid not. This is regarding a brief incident that occurred this morning (1/7/2021) and is not related to the issues with Event History for certain users on the EU shard, which are being worked on with a high priority.

2 Likes

@garrett.kranz,
Thanks for the info

1 Like

My wife tells me “Why don’t you get a lover like all the other husbands? You’d be happy at least” :grinning:

3 Likes

@garrett.kranz if there is still an issue with EU event history, then why does the status page not reflect that? You can post region-specific statuses, no? As others have said, the event history problem has carried on for days now.

4 Likes

@prjct92eh2 FWIW the EU Event History troubles are already recovering and should be resolved soon as all the event queues catch up. If this does not happen as we expect it to, we are prepared to create a Statuspage post about it.

This did not impact everyone (not even everyone on the EU shard): the App still loaded, devices could still be controlled, Automations and Device Types still executed, and Hubs were not offline. These are the factors we look at when deciding whether something counts as an incident on our end.

Whether custom SmartApps can fetch data, or whether Event History is 100% current, is not in the list of items, but the argument could be made that it should be. I don’t want to be argumentative about it, and I sympathize with anyone having troubles in EU. We have been actively working on it since it was reported via the Community.

I hope this makes sense, and thank you for the feedback. My Team and I are actively working on improving public-facing Incident communication, and this feedback helps us do that.

6 Likes

Thank you for the response! It’s good to know what the criteria are for a status to be posted.

3 Likes

Access token push notifications have been out since yesterday. I received one from the Classic app just after I activated something from Google, and I couldn’t be bothered to click on it since the Classic app and event history have been bunkered. So being fast… Let’s say, you are a day late…

In the case of the event history, the argument is quite easy to make: it is sometimes the way we actually know that devices are being controlled and that automations and device types have executed, or, more to the point, that they haven’t and that you may have an incident.

It could be argued that knowing whether event history is functioning correctly is critical information, and is actually more important than the event history itself.

On a sort of related note, for me just about the most disruptive change made to ST of late has been that the battery level has vanished from the device event history without any apparent comment.

4 Likes

Just noting this is not working AGAIN. App still works, but no Alexa. Not testing Google because I am too irritated to try.

Not entirely sure what you mean by this.

But, again, I’m not downplaying the importance of Event History. Just that the nature of the Event History troubles in EU did not meet any sort of general incident reporting threshold for us. We’ve since updated status.smartthings.com to reflect the situation, and will obviously have this scenario in mind going forward. Feel free to PM me if you have any other thoughts or concerns.

2 Likes

@Nameless everything I’m looking at for Amazon/Google with relation to SmartThings looks normal. If you’re still experiencing troubles and want to PM me, I’d be happy to have a look.

1 Like

PM sent. Confirmed this is tied to Alexa. Google is working.

Just wanted to follow up and say thanks to @garrett.kranz for some suggestions to get my system back online. Not sure what caused it, but Alexa kept telling me “the server is unresponsive”. Disabling the skill from Alexa and enabling it again brought everything back and also kept my routines and groups intact.

5 Likes

Event history is critical for automation debugging.
For example, I have a situation where a relay is being switched off after less than 30 minutes of inactivity, after the motion sensor has gone through a sequence (motion, no-motion, motion), while the condition in the automation is to switch off after 30 minutes of no-motion. It’s kind of hard to deal with this if you have to wait 9-14 hours for the exact timestamps of the trigger events.
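
To show the kind of check I would run once the timestamps finally arrive, here is a rough Python sketch. It assumes the history can be exported as `(timestamp, device, value)` tuples; that format, the device names `"motion"` and `"switch"`, and the function name are all made up for illustration and are not a SmartThings API.

```python
from datetime import datetime, timedelta

# Assumption: the automation should switch off after 30 minutes of no-motion.
NO_MOTION_TIMEOUT = timedelta(minutes=30)

def premature_switch_offs(events, timeout=NO_MOTION_TIMEOUT):
    """Return (off_time, quiet_duration) for every switch-off that fired too early.

    quiet_duration is None when the switch-off happened while motion was
    active, or before any no-motion event was seen.
    """
    quiet_since = None  # start of the current no-motion period, if any
    flagged = []
    for ts, device, value in sorted(events):
        if device == "motion":
            # A new "active" event resets the no-motion period.
            quiet_since = ts if value == "inactive" else None
        elif device == "switch" and value == "off":
            if quiet_since is None:
                flagged.append((ts, None))
            elif ts - quiet_since < timeout:
                flagged.append((ts, ts - quiet_since))
    return flagged

# Hypothetical history: motion, no-motion, motion, no-motion, then two switch-offs.
t0 = datetime(2021, 1, 7, 10, 0)
history = [
    (t0, "motion", "active"),
    (t0 + timedelta(minutes=5), "motion", "inactive"),
    (t0 + timedelta(minutes=10), "motion", "active"),
    (t0 + timedelta(minutes=12), "motion", "inactive"),
    (t0 + timedelta(minutes=30), "switch", "off"),  # only 18 minutes quiet
    (t0 + timedelta(minutes=50), "switch", "off"),  # 38 minutes quiet, fine
]
flagged = premature_switch_offs(history)
```

With this made-up data only the first switch-off is flagged. None of it helps, of course, while the real timestamps are 9-14 hours behind.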

Generally speaking, I find event history to be largely informative and/or diagnostic. Yes, it can be used by apps like webCoRE for more exotic conditions but is that really what it is designed for?

If I look at the event history there are two broad possibilities:

  1. Event history is working as intended and so what I am seeing, or not seeing, is meaningful.
  2. Event history is not working as intended so I can’t rely on what I am seeing, or not seeing, so I’ll have to come back another time.

In terms of the implications:

  1. is obviously what we are all aiming for.
  2. is an inconvenience.

The problem can be knowing if it is 1) or 2). Not knowing can mean me wasting my time, but a status message could mean tea and a biscuit.

@garrett.kranz, the history lag is decreasing; it has gone from 16 hours to 1 hour at this time.
I mention it in case you need any feedback to confirm the progress of the actions taken.

1 Like

Yes, lag is at 1 hour now, but do you see a missing chunk of history between 2 hours ago and 6 hours ago? 4 hours missing.
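
For anyone who would rather check than eyeball it, a minimal Python sketch, assuming you can get a plain list of event timestamps out of the history. The function name and the one-hour gap threshold are my own guesses, not anything SmartThings provides.

```python
from datetime import datetime, timedelta

def history_report(timestamps, now, max_gap=timedelta(hours=1)):
    """Return (lag, gaps) for a list of event timestamps.

    lag  -- age of the newest event, or None if there are no events
    gaps -- (start, end) pairs of consecutive events more than max_gap apart
    """
    ordered = sorted(timestamps)
    if not ordered:
        return None, []
    lag = now - ordered[-1]
    gaps = [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]
    return lag, gaps

# Hypothetical data: events 7h, 6h, 2h and 1h ago, nothing in between.
now = datetime(2021, 1, 8, 12, 0)
stamps = [now - timedelta(hours=h) for h in (7, 6, 2, 1)]
lag, gaps = history_report(stamps, now)
```

Here `lag` comes out as one hour and `gaps` holds the single stretch from 6 hours ago to 2 hours ago. A quiet device produces gaps on its own, so the threshold only means something for devices that normally report often.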

@Alwas, I have all of the history up to 5 minutes ago.

History got worse again, 1 hour delay