The issues in the EU were caused by the rollout of the new SmartThings application and the consolidation of Samsung’s IoT services into the SmartThings platform. Specifically, we are performing a slow rollout of the application to all existing users (not just ST users, but users in the rest of the Samsung ecosystem). As more products and users started using the app, the load we generate on our authentication provider, Samsung Account (SA), grew with them. While SA was able to scale fine in North America and Asia, it had difficulties in the EU region. When we began to notice a slowdown we stopped the rollout completely, but by that point users who already had the new app were still starting to use it, so the load continued to increase slowly. This eventually caused latency issues, which then became timeouts and flat-out failures.
Over the period of the outage, ST and the Samsung Account team worked closely together to try various mitigations. While there are still capacity issues in the EU for Samsung Account, our team was able to implement very heavy caching on the backend to bring the load on SA down to functional levels. At this time the caching is still in place, and the SA team is working on increasing capacity in the EU. Also, because the new app is still in its early days and there is a ton of development work ongoing across many teams to bring everything together, bugs and inefficiencies in how it communicates with the backend are a factor. The mobile teams are looking at these as well.
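To illustrate the kind of caching described above, here is a minimal sketch, not the actual SmartThings backend code. It assumes the backend asks the authentication provider to validate a token on each request, and that a recent result can be reused for a short time. The names `TTLCache`, `fake_validate`, and the 60-second TTL are all hypothetical.

```python
import time


class TTLCache:
    """Reuse a recent validation result instead of calling upstream every time."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # function that calls the upstream provider
        self._ttl = ttl_seconds      # how long a cached result stays usable
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}           # token -> (expires_at, result)
        self.upstream_calls = 0      # counts how often upstream was really hit

    def get(self, token):
        now = self._clock()
        entry = self._entries.get(token)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no upstream call
        self.upstream_calls += 1
        result = self._fetch(token)
        self._entries[token] = (now + self._ttl, result)
        return result


def fake_validate(token):
    # Hypothetical stand-in for a real call to the authentication provider.
    return token.startswith("valid-")


cache = TTLCache(fake_validate, ttl_seconds=60.0)
for _ in range(1000):
    cache.get("valid-abc")
print(cache.upstream_calls)  # 1
```

The trade-off in a scheme like this is staleness: a result can be up to one TTL old, so a token revoked upstream may still be accepted until its cache entry expires.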
I do just want to say… in recent months ST has become much more of a central player at Samsung, which has forced us to scale quickly as we support new users with different usage patterns. While growth is good, scaling this fast has unfortunately cost us reliability. Fortunately, the recent issues have helped highlight the need to put reliability back at the forefront. For example, the priorities of my development team have switched completely from new features to reliability and to making the new app more usable for our current customers. I am hopeful this will pay off, as we went through a similar exercise a couple of years ago and were able to significantly improve the stability of the platform. I just wish that we didn’t have to go through months of incidents to get here.