Device events and history disappeared in graph.api and Groovy SmartApp, then slowly came back. Mesh maintenance, or a publishing conflict between the old Groovy SmartApp and the newer API SmartApp? (Aug 2021)

Hi All,
@nayelyz @erickv around 3:40 AM ET on 7/30, our 30 customer accounts stopped receiving device events and event history. As the day went on, the device history incrementally caught up to the current time. It took about 12 hours for the device events and the history to catch up.

The history in graph api was showing the delay as well. Is there maintenance going on that is affecting the device event history?

It happened again: at about 6:10 AM ET on 8/3, the event history and all events stopped being sent to our legacy Groovy smartApp. We can’t pull any event history. What I did do earlier (about 45 minutes before the outage) was again publish my new devel smartapp that uses the new APIs. I think this may have happened the first time too (I’m struggling to remember here). Not sure. I’ve unpublished the devel smartapp and am waiting to see whether the events and event history catch up in our legacy Groovy smartApp.

My latest theory is that when I test publish the devel API I’m screwing up the ST platform support between the old smart app and the new API smartapp.

Also, the graph api and the phone app history are behind as well. It’s not just in the legacy Groovy app that we see this.

Any insights or thoughts appreciated.

There is an issue with event processing, and the system is still working to catch up. It is mostly caught up, but a few events may still be missing.

@Brad_ST @nayelyz we are about 6 hours behind right now… It’s slowly coming back. Any idea what’s going on? We are history intensive and porting to the new APIs as fast as possible…
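
For anyone who wants to put a number on "hours behind", here is a minimal sketch. It assumes you already have the timestamp of the newest event visible in history as an ISO 8601 string with a UTC offset; how you fetch it is up to you. The function name and the sample timestamps are hypothetical and not part of any SmartThings API.

```python
from datetime import datetime, timezone


def event_lag_hours(latest_event_iso: str, now: datetime) -> float:
    """Return how many hours the newest stored event trails `now`.

    `latest_event_iso` is an ISO 8601 timestamp such as
    "2021-08-03T10:10:00+00:00" (hypothetical sample value).
    """
    latest = datetime.fromisoformat(latest_event_iso)
    if latest.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        latest = latest.replace(tzinfo=timezone.utc)
    return (now - latest).total_seconds() / 3600.0


if __name__ == "__main__":
    # Hypothetical example: newest event at 10:10 UTC, checked at 16:10 UTC.
    checked_at = datetime(2021, 8, 3, 16, 10, tzinfo=timezone.utc)
    print(event_lag_hours("2021-08-03T10:10:00+00:00", checked_at))
```

Running it periodically and logging the result would show whether the backlog is shrinking or growing.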

Yes, the amount of lag in storing device events is greatly reduced, but the system is still playing catch-up. To be clear though, as you noted, this issue is not a matter of Groovy versus the new API. Both are impacted, as is evident from the phone’s events being behind as well as the graph IDE.

@Brad_ST These history issues, as you know, have been a regular problem since you introduced this new system over a year ago. There are many “history missing” threads, and many you’ve personally closed yourself. I’m curious whether you’re able to permanently fix this issue, or whether it’s going to continue for the foreseeable future. I’m assuming that if it were fixable it would have been fixed by now, because in some users this issue can induce panic, which is not your intention. A fix would save support resources and instil confidence. Long-term users, I’m sure, don’t bother to look at or rely on device history anymore.
As a novice, it would appear to me that these lag and caching issues have a wider platform significance: slow device loading times, current device state lag, etc.
I’m guessing these lag and caching issues in the new app/platform are here to stay. The old system, when everything just worked 100% of the time, instantaneously, and was reliable and solid, will remain a distant memory, and there will be many more “history missing” posts to come.

I have to say the History in the (Android) app isn’t a great help.

There are three filtering options: General History (Automations, Rules, Scenes etc), Devices or All (both). Currently at 9:30 UTC, this is what they give me.

Devices gives me up-to-date device history. There may be historical gaps, but as we know it always refreshes and resets your view when you least want it to, so you can’t get far back in the history anyway.

All gives me up-to-date device history but only shows Rules activating up until 4:30 UTC.

You’d think that would mean General History would just show Rules activating up until 4:30 UTC. But …

General History will very occasionally give me two Location Mode changes from this morning until it refreshes, but otherwise gives me nothing at all. So where has the Rules History visible under All gone to?

The event history has stopped again today. It has been inactive since 3:44 pm today (Europe/Madrid time).
According to the SmartThings status page, everything is fine.

UPDATE:
It has already recovered this morning.

Event history is really shot now. It has been down for 24 hours and we are refunding customers. Any idea when this outage ends? We are porting as fast as possible, but what is going on? It’s a holiday weekend in the US and this started Saturday night around 10 PM EDT.

History is not working again.

It stopped at 13:18 (UTC+2) according to Simple Event Logger.

It is already recovering; it has advanced 1 hour.

Let’s see how long it takes to catch up 100%.