Another Outage Jan. 7, 2021

The wife approval factor has reached an all-time low. I’ve received an ultimatum: make it all work or take it all out. I told her she now has full permission to use the actual switches on the walls or lamps when it doesn’t work.
Gasp!

3 Likes

I don’t have a particular problem with the use of ‘Something is not quite right’ in the description as the more formal description is also available. However users on the EU shard (not me, I am on NA shards even though I am in the UK) have reported issues with event history for about four days now, and they have been acknowledged as an identified problem, so where is the ‘Something is well and truly b******sed’ status for that?

The status reporting is rather US biased at the moment, as is the ability to report a problem.

5 Likes

Hi @jlv and @orangebucket , the feedback we’ve received in general on the Community and social platforms is that we need to update the official Status page faster during an incident. In order to do so, we would need to be less specific on the initial post. Additional details were provided afterwards when triaging the issue and confirming impact with the Engineering Team.

If either of you have any feedback on better wording for this, I’d be happy to take it and adjust accordingly.

5 Likes

I have no problem with the wording at all, but then I’m British and so fluent in verbal irony. For me ‘Something is not quite right’ covers anything up to and including the apocalypse.

I was commenting on what a kick in the teeth it must be for those users who have been suffering all week with problems with event history in the EU without any acknowledgement of their issues to suddenly see a status report for another incident.

9 Likes

@garrett.kranz,
Is this the solution for the 16-hour Device history delay?
Yes or no?

1 Like

@Mariano_Colmenarejo I’m afraid not, this is regarding a brief incident that occurred this morning (1/7/2021) and is not related to the issues with Event History for certain users on the EU shard, which is being worked on with a high priority.

2 Likes

@garrett.kranz,
Thanks for the info

1 Like

My wife tells me “Why don’t you get a lover like all the other husbands? You’d be happy at least” :grinning:

3 Likes

@garrett.kranz if there is still an issue with EU event history, then why does the status page not reflect that? You can post region specific statuses, no? As others said, the event history problem has carried on for days now

4 Likes

@Automated_House FWIW the EU Event History troubles are already recovering and should be resolved soon as all the event queues catch up. If this does not happen as we expect it to, we are prepared to create a Statuspage post about it.

This did not impact everyone (not even everyone on the EU shard): the App still loaded, devices could still be controlled, Automations and Device Types still executed, and Hubs were not offline. Those are the factors that determine what is considered an incident on our end.

Whether custom SmartApps can fetch data and whether Event History is 100% current are not on that list of items, but the argument could be made for them to be. I don’t want to be argumentative about it, and I sympathize with anyone having troubles in EU. We have been actively working on it since it was reported via the Community.

I hope this makes sense, and thank you for the feedback. My Team and I are actively working on improving publicly-facing Incident communication and this feedback helps do that.

6 Likes

Thank you for the response! It’s good to know what the criteria are for a status to be posted.

3 Likes

Access token push notifications were out since yesterday. I received one from the Classic app just after I activated something from Google, and I couldn’t be bothered to click on it since the Classic app and event history have been bunkered. So as for being fast… Let’s say you are a day late…

In the case of the event history it is quite easy to make the argument as it is sometimes the way we actually know that devices are being controlled and automations and device types have executed, or more to the point that they aren’t and that you may have an incident.

It could be argued that knowing whether event history is functioning correctly is critical information, and is actually more important than the event history itself.

On a sort of related note, for me just about the most disruptive change made to ST of late has been that the battery level has vanished from the device event history without any apparent comment.

4 Likes

Just noting this is not working AGAIN. App still works, but no Alexa. Not testing Google because I am too irritated to try.

Not entirely sure what you mean by this.

But, again, I’m not downplaying the importance of Event History. Just that the nature of the Event History troubles in EU did not meet any sort of general incident reporting threshold for us. We’ve since updated status.smartthings.com to reflect the situation, and will obviously have this scenario in mind going forward. Feel free to PM me if you have any other thoughts or concerns.

2 Likes

@Nameless everything I’m looking at for Amazon/Google with relation to SmartThings looks normal. If you’re still experiencing troubles and want to PM me, I’d be happy to have a look.

1 Like

PM sent. Confirmed this is tied to Alexa. Google is working.

Just wanted to follow up and say thanks to @garrett.kranz for some suggestions to get my system back online. Not sure what caused it, but Alexa kept telling me “the server is unresponsive”. Disabling the skill from Alexa and enabling it again brought everything back and also kept my routines and groups intact.

5 Likes

Event history is critical for automation debugging.
For example, I have a situation where a relay is being switched off after less than 30 minutes of inactivity, when the motion sensor has gone through a sequence (motion, no-motion, motion), while the condition in the automation is to switch off after 30 minutes of no-motion. It’s kind of hard to deal with this if you have to wait 9-14 hours for the exact timestamps of the trigger events.
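
To illustrate why the exact timestamps matter, here is a minimal sketch of the check you would want to run against the event history once it arrives. It assumes the rule is simply "switch off after 30 minutes of continuous no-motion"; the function name, the event format, and the sample times are all hypothetical and not taken from any SmartThings API.

```python
from datetime import datetime, timedelta

# Assumed from the automation's condition: off after 30 minutes of no-motion.
NO_MOTION_TIMEOUT = timedelta(minutes=30)


def switch_off_is_premature(events, off_time, timeout=NO_MOTION_TIMEOUT):
    """Return True if a switch-off at off_time is not justified by the rule.

    events: list of (timestamp, state) tuples sorted by time, where state is
    "motion" or "no-motion" (hypothetical format for this sketch).
    """
    # Only events at or before the switch-off can have triggered it.
    prior = [e for e in events if e[0] <= off_time]
    if not prior:
        # Nothing in the history explains the switch-off at all.
        return True
    last_ts, last_state = prior[-1]
    if last_state != "no-motion":
        # The sensor last reported motion, so the timer should have been reset.
        return True
    # The no-motion period must have lasted at least the full timeout.
    return off_time - last_ts < timeout


# Hypothetical timestamps for the motion, no-motion, motion sequence.
start = datetime(2021, 1, 7, 10, 0)
history = [
    (start, "motion"),
    (start + timedelta(minutes=5), "no-motion"),
    (start + timedelta(minutes=20), "motion"),
]
print(switch_off_is_premature(history, start + timedelta(minutes=35)))
```

With delayed event history, neither the sensor events nor the switch-off time are available when the problem happens, so a check like this cannot be done until hours later.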

Generally speaking, I find event history to be largely informative and/or diagnostic. Yes, it can be used by apps like webCoRE for more exotic conditions but is that really what it is designed for?

If I look at the event history there are two broad possibilities:

  1. Event history is working as intended and so what I am seeing, or not seeing, is meaningful.
  2. Event history is not working as intended so I can’t rely on what I am seeing, or not seeing, so I’ll have to come back another time.

In terms of the implications:

  1. is obviously what we are all aiming for.
  2. is an inconvenience.

The problem can be knowing if it is 1) or 2). Not knowing can mean me wasting my time, but a status message could mean tea and a biscuit.