Announcement: Update to Database

slagle · January 22, 2016, 7:13am

Hey all!

Just wanted to stop in and give you a heads-up about a database change happening tomorrow, 1/22/16.

We are upgrading and adding more database clusters to our environment to alleviate some of the “pressure” we are having on our current database clusters.

Unfortunately, the current data is not easily “migratable,” so when we make this change your event history for the last seven days will not be available. This event history will begin to build again once the migration is complete. This change should not affect control of your devices or any automations you have set up. You can expect this change to happen around 10am PST on 1/22/16.

The goal of this change is to help with a lot of the platform instability you have seen over the last couple of days. We are confident this will resolve a lot of the problems you may be seeing.

We apologize for any inconvenience this might cause for you. Please reach out to me, @slagle, if you have any questions or concerns. We thank you in advance for any patience you can give us.

For more up-to-date information, please follow status changes at http://status.smartthings.com

Fuzzyligic · January 22, 2016, 7:21am

The status page implies this is US servers only, is this correct?

Any news on EU servers?

slagle · January 22, 2016, 7:34am

For now I believe it is US only. But I’ll confirm in the morning.

CDJ · January 22, 2016, 11:44am

Well, crap. There goes my reading time at lunch today!

It’s good to see ST implementing these updates. I also appreciate the status page information I see this morning. Tell the ST team thanks and keep it going!

bago · January 22, 2016, 12:52pm

@slagle Will this fix the scheduling issues too? They have gotten really bad lately.

keltymd · January 22, 2016, 1:04pm

I would love to be able to schedule things again.

kgiberson · January 22, 2016, 1:09pm

Fingers crossed but $20 this breaks more shit.

keltymd · January 22, 2016, 1:13pm

I have to give them some credit. The update yesterday was almost perfect for me. Just a couple of minor things.

tonesto7 · January 22, 2016, 1:42pm

I agree. I’m starting to see some positive changes happen finally. I notice a pretty large performance increase in device response since yesterday’s update. I realize things have been rocky this last week but for the most part everything has been working for my home and I have just over 100 devices.

So for once I would like to give SmartThings some positive posts and props for definite improvements to the platform. Fingers crossed that the update goes off without a hitch today.

brianlees · January 22, 2016, 1:55pm

Although I’m happy to see ST has found the cause and is addressing it, the IT Director side of me is a bit disappointed. Server loads, whether processor, memory, network, or app-specific utilization, are something that can easily be monitored and trended. (Is that even a word?) Although I can’t say for sure since I’m not sitting in their meetings, and I tell my team “I want facts, not feelings,” it “feels” like ST tends to be reactive rather than proactive when it comes to back-end infrastructure. I was hoping that the Samsung purchase would have changed that and given ST the resources to quickly scale both strategic and on-demand infrastructure. Perhaps there are other issues in the relationship here that are afoot. (My Sherlock word for the day.)

So, again, I’m very happy to see progress on this issue, but I’m a bit saddened that issues have to become very massive before a change is made.

an39511 · January 22, 2016, 1:59pm

I don’t think they have started to do the database upgrade yet, so any performance improvement is not a result of that. Later on today will be the real test. I do think Friday afternoon is a crazy time to be doing these upgrades.

tonesto7 · January 22, 2016, 2:07pm

I have the same thoughts myself (granted, I’m not a Director). My inability to ignore these technical issues would have caused me to fix this stuff months ago, but then again I don’t have the giant Samsung standing over my shoulder telling me what my agenda will be. Either way, things can’t get much worse, with the exception of a total platform outage; it can only get better from here.

btk · January 22, 2016, 2:12pm

I agree, mostly. But this was the first holiday season where the product was available in retail stores.

spyd4r · January 22, 2016, 2:22pm

It’s called forecasting and capacity planning.

jotto · January 22, 2016, 2:26pm

My scheduled routines/events failed this morning (supposed to trigger @ 5:40AM EST, but upon waking I found they obviously had not triggered).

Fingers crossed the additional database clusters fix these issues once and for all.

geko · January 22, 2016, 2:28pm

Personally, I wouldn’t do a massive update like that on Friday, but good luck to you anyway.

kars85 · January 22, 2016, 2:41pm

We have an ideology at my workplace (I’m a sys admin) - we call today “Read-only Friday”. All bets are off if s***'s broke, though!

btk · January 22, 2016, 3:09pm

Maybe not to a fully functional system, but that’s not quite what we have here. If they don’t do anything, the platform’s hosed for another three days until Monday. I applaud their willingness to abandon read-only Friday! =)

justintime · January 22, 2016, 3:25pm

The engineer in me is dying to know the specifics of what’s going on in the infra. I’ve never been through an ‘upgrading and adding more database clusters’ operation where there was data loss. Indeed, the entire point of a cluster is that you can scale it horizontally without having to introduce downtime or data loss.

From what I can sleuth out, ST uses MySQL and Cassandra. Cassandra, being a NoSQL database, is likely the datastore responsible for keeping our event data. Cassandra was architected from the ground up to be easily scaled, so I wonder what they’re doing to the cluster that is making data loss a possibility.

To be clear, I’m not criticizing. I couldn’t care less about that data, and a trade for stability is one I’d make every time. Something tells me that this is more than just scaling out the database layer, though.

@slagle’s post about ‘learning a lot’ from this last issue tells me that they identified something big, and are making moves to fix it. When the dust settles, I’d love to see a post-mortem about what all was done to fix things. Curious geeks want to know!!!

Good luck, godspeed, and #hugops to you all.

darrylb · January 22, 2016, 3:49pm

As a DBA who has experience in multiple database technologies (and currently supports 300+ SQL servers and a few VLDBs (2-10TB each)), I too am very, very curious to learn more.
