Weekly Update from Alex - 06/24/16

Apologies for missing last week’s post while I was traveling. This week’s post will include details from both this week and last week.

First, I want to talk about the downtime we had on Tuesday, June 14th. The downtime was caused by our weekly SmartApp and Device Handler deploy. These deploys are routine and normally go without incident, but last week’s deploy caused us to hit a limit on our caching layer, which led to the login and device control problems some of you were seeing. The caching client we use has a limit of 1MB, and the total size of our approved Device Handlers grew larger than 1MB, causing our platform to behave erratically. We recognized the problem right away and created and deployed a hotfix as quickly as possible.

Although this was unfortunate, the upside is we learned from it and put in safeguards to make sure we don’t see this happen again. We appreciate your patience during our brief downtime.
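To illustrate the kind of safeguard that can catch this class of problem, here is a minimal sketch in Python. It is not the actual SmartThings code: the names `check_cache_value`, `split_for_cache`, and `CacheValueTooLarge` are hypothetical, and the only detail taken from the post is the 1MB per-value limit.

```python
from typing import List

# The 1MB caching-client limit described in the post.
MAX_CACHE_VALUE_BYTES = 1024 * 1024


class CacheValueTooLarge(Exception):
    """Raised when a serialized value would not fit in a single cache entry."""


def check_cache_value(payload: bytes, limit: int = MAX_CACHE_VALUE_BYTES) -> None:
    """Fail loudly (for example, during a deploy) instead of writing an oversized value."""
    if len(payload) > limit:
        raise CacheValueTooLarge(
            f"payload is {len(payload)} bytes; limit is {limit} bytes"
        )


def split_for_cache(payload: bytes, limit: int = MAX_CACHE_VALUE_BYTES) -> List[bytes]:
    """Split a payload into chunks that each fit under the limit (hypothetical approach)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]
```

A check like this could run before the deploy writes to the cache, so an oversized payload stops the deploy rather than degrading the live platform. Splitting the payload is one possible alternative; which option fits depends on how the cached data is read back.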

This week we have also seen a few production issues that have impacted a subset of our customers. Our API cluster is seeing periods with an abnormally high number of database connections. These spikes started after this week’s platform release and, while they seem to happen at random, we are currently looking into every change that went out during the deploy to find the root cause. We will update you with more information when we have it.


Device Handler Improvements

As part of the push for improvement on the basics, we’ve begun to turn to nuanced problems specific to individual third-party device integrations. Here are a couple of examples where there was strong progress in the past week.

Netatmo Updates
We found and resolved a few problems with the Netatmo integration this past week. The Netatmo Device Handler and Service Manager were throwing approximately 100k backend errors per day and we were able to cut that down to basically zero. Not only does this help the platform, but it also improves the entire Netatmo integration experience.

Iris Smart Plug
We worked with @blebson to resolve another Device Handler with a high error count. A Null Pointer Exception was being thrown in some cases, creating 250k errors over a 24-hour period. We were able to work with @blebson and the consumers of his Device Handler, and in a week’s time ⅓ of Iris Smart Plug users updated to the revised version of the Iris Smart Plug Device Handler, which has reduced those errors significantly.
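For readers curious about what this kind of fix tends to look like, here is a small sketch in Python of guarding against a missing value before using it. It is an illustration only, not the actual Iris Smart Plug fix: the function `parse_power_report` and the `"power"` field are hypothetical, and the post does not say which value was null.

```python
from typing import Optional


def parse_power_report(message: Optional[dict]) -> Optional[float]:
    """Return the reported power as a float, or None if no usable value is present."""
    # Guard 1: the whole message may be missing or empty.
    if not message:
        return None
    # Guard 2: the field may be absent or explicitly null.
    value = message.get("power")
    if value is None:
        return None
    # Guard 3: the field may be present but not numeric.
    try:
        return float(value)
    except (TypeError, ValueError):
        return None
```

Returning `None` for a bad report lets the caller skip it quietly instead of raising an error on every message.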


Documentation
We have recently reviewed the best practices we have established internally when writing on our own platform and found that many of them haven’t been clearly reflected in our external docs. As a result, we have released new documentation that outlines the best practices for writing code on the SmartThings platform, which is available to everyone. Let us know how we could make our docs even better.


Finally, I want to point out that we love hearing stories like the one below shared by @sgnihttrams. It is the reason I and the founding team founded SmartThings and the reason we continue to drive to be the greatest Smart Home platform in the world. This story makes it even more clear in my mind why the basics are so important. Thanks for sharing @sgnihttrams!

See you all next week with more data and improvements!

-Alex

18 Likes

There are definitely still issues going on. I’ve had routines and presence acting funky and being missed for the past few days and, judging by other posts I see on here, I am not alone. There have been multiple bulletins sent out this week with issues as well, which we should hear more about.

3 Likes

Although it isn’t an exhaustive answer, the details we have and can share at the moment are here:

2 Likes

Thanks, I actually missed that originally.

1 Like

:+1::grinning::grinning::grinning::grinning::grinning::grinning::+1::+1::+1::+1:

4 Likes

So, how is it possible that this bug was not caught in integration, unless, of course, there’s no integration? Duh!

Yeah, it was nice to have them reach out to me not only with a diagnosis of what was causing the errors but with a possible code fix as well. Props to @slagle :slight_smile:

9 Likes

Another 24 hours… another problem… now I can’t run any of my AskAlexa stuff…

It’s beginning to feel a lot like… all of the time before April 2016!

3 Likes

I use Routine Director to change modes and the last few days, it hasn’t been doing it automatically. I’ve opened a support case. Anyone else have issues like this?

I don’t use Routine Director but have been having all kinds of fun since yesterday. I use a Minimote to change from day to night modes (run by a rule in Rule Machine) and ST would see the button press but would not change modes. I figured RM was on its last legs after the last round of updates, so I swapped that over to a supported app, Button Controller. Same results. I press the button, it is seen by ST, and nothing happens. I open the ST app and click the Good Night routine and it runs fine. But BC and RM do not run the routine.

I’m having the same issue. Button Controller no longer controls anything. I have a security code keypad that I purchased for this Smart Home Monitor. The keypad was used to trigger routines, and the routines to trigger SHM states. New improved ST experience: Button press… sometimes ST recognizes the button press… then nothing happens. Open door… alarm goes off. Apparently the only way I can arm and disarm the SHM now is to carry my phone around with me every time I want to let the dog out.

Also worthy of note: The notifications seem to be very inconsistent. Sometimes it registers the button press, sometimes it does not. Since yesterday, it actually worked twice to disarm, then failed to work for another 17 attempts. A couple notifications stated “Welcome Home! As you requested, I changed mode from Home to Home Disarmed. I was unable to disarm the security system because null is not permitted to do so.”

Is this an open issue or the new expected state of misbehavior for the interaction between my Things, Routines and “Smart” Home Monitor?

-T

I have also been seeing the same things. Last night Goodnight didn’t run. This morning Goodmorning didn’t run unless I used the SmartThings app. The activity log saw the inputs but just didn’t act on them?

From support:

We are unfortunately experiencing a bug on our end that is preventing the use of SmartApps to run any routines outside of the Hello Home module.

At this time, we have no workaround so we ask for your patience as we get this issue resolved. I have flagged your ticket so once we have a fix implemented you will be notified right away.

I sincerely apologize for the inconvenience this causes, but please let me know if you have any questions or concerns in the meantime.

1 Like

Thank you. Good to know that this is an acknowledged problem and not an intentional change to the security model.

1 Like

Are you guys still seeing the error? We just deployed a hotfix that should have resolved it.

8 Likes

@vlad Worked 5 times out of 5 attempts (I tested from CoRE only). Thank you!

1 Like

I’ll find out when I leave the house in a few minutes :slight_smile:

It appears to be working again. Tested Button Controller and Ask Alexa SmartApps. Will continue to monitor.

1 Like

@vlad

Haven’t been home to try it but thank you for a fast response.

Hey Vlad, Did someone unfix the hotfix? I appear to be having the same issue again tonight. I can no longer trigger routines from my SmartApps.

1 Like