tl;dr
We are confident that the recent tile-related bugs have been fixed, and a production deploy went out this morning (Feb. 5, 2018).
Tile Bugs
Recently, there have been app crashes, “Something’s Wrong” messages popping up, and a series of visual bugs. These bugs are all related: the fix for the crashes is what caused the visual bugs. For reference, here are a few of the threads that I’ve been watching.
- iOS 2.14.0 released (1/30/18)
- IOS app “Something’s wrong”
- NEW: Aeon Home Energy Monitor v2 Device
- iOS BackGround Colors & Labels broken
History of the Tiles framework
To help understand how such a widespread series of bugs happened, it’s important to understand the history of the Tiles framework. In the early Kickstarter days, we built Tiles to be flexible and extendable above all else. We knew the framework would need to change a lot and that these changes would have to happen quickly. Given the size and complexity of the platform we were building, the number of developers we had building it, and the variety of devices we had to support, we built the most flexible framework that we could. This allowed mobile developers to make changes to a DeviceType Handler and have those changes immediately reflected in the JSON they were working with. This cut out the need for a cloud developer to change code, and more importantly, removed the need for a deployment.
With a flexible Tiles framework in place, we started building proofs of concept and features. These features were built with convention over configuration, and as more features were added, conventions changed. As years passed and hundreds of iterations happened, certain pieces of Tiles would be changed, repurposed, or deprecated, but rarely removed. This ultimately led to a lot of bloated and brittle code. Small changes in one spot could cause bugs in seemingly unrelated places. This is, in my opinion, a direct result of our early startup days, when we were moving so fast that we couldn’t afford to take the time to harden this piece of the platform. Over time, the iterations slowed and the framework mostly matured, so we started planning where we wanted to take the Tiles framework and how to clean it up.
Direction of the Tiles framework
The next version of Tiles will be driven by capabilities, meaning some devices won’t even need to define Tiles. Some devices won’t even need a DeviceType Handler. A large percentage of DeviceType Handlers have identical tile definitions. This is a direct result of years of refinement and working toward a more consistent UI. Unfortunately, it’s also a result of a lot of copy/paste which makes supporting and updating DeviceType Handlers really difficult.
What caused all those bugs?
Rather than changing the massively complex Tiles framework, we added an adapter layer on top of it that can handle generating capability-based Tiles if the DeviceType Handler does not supply them. In order to do this, we built strongly-typed Tile definitions so that we could accurately generate Tiles when we needed to. The first Tiles to go through this layer were Scenes when they were introduced. It was a new feature and therefore a low-risk introduction to this layer. After that, we started passing SmartApp and Routine Tiles through since they are less complex than device Tiles. Slowly and methodically, we worked our way up to devices.
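To make the fallback idea concrete, here is a minimal sketch of an adapter that uses a handler's own Tiles when they exist and otherwise generates them from capabilities. It is not our actual implementation: the `Tile` fields, the `DEFAULT_TILES` mapping, and `tiles_for_device` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tile:
    """Strongly-typed tile definition (hypothetical fields)."""
    name: str
    capability: str
    width: int = 1
    height: int = 1


# Hypothetical mapping from a capability name to the Tile generated for it.
DEFAULT_TILES: Dict[str, Tile] = {
    "switch": Tile(name="switch", capability="switch", width=2, height=2),
    "battery": Tile(name="battery", capability="battery"),
}


def tiles_for_device(
    capabilities: List[str],
    handler_tiles: Optional[List[Tile]] = None,
) -> List[Tile]:
    """Return the handler's Tiles if it supplies any, else generate them."""
    if handler_tiles:
        # The DeviceType Handler defined its own Tiles: pass them through.
        return list(handler_tiles)
    # No Tiles supplied: build them from the capabilities we know about.
    return [DEFAULT_TILES[c] for c in capabilities if c in DEFAULT_TILES]
```

In this sketch a device that only declares capabilities still gets a usable set of Tiles, while a handler with custom Tiles keeps exactly what it defined.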
For the most part, the adapter layer worked well with device tiles. Nothing major came out of our testing, but when the change made it to the App Store builds, we started seeing a lot of crashes and “Something’s wrong” messages. These were a result of errors while parsing Tiles defined in custom DeviceType Handlers. The first iteration of the adapter layer was meant to be a straight pass-through with no changes to the data, but constraining types inherently changed invalid data. For example, strings were no longer being allowed in fields that are meant to be numbers (which is what caused this issue).
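The following sketch shows how constraining types can turn previously tolerated data into a hard failure. It assumes a tile size with integer `width` and `height` fields; `TileSize` and `parse_tile_size_strict` are hypothetical names, not the real parser.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TileSize:
    width: int
    height: int


def parse_tile_size_strict(raw: Mapping[str, Any]) -> TileSize:
    """Build a TileSize, rejecting any field that is not an integer."""
    values = {}
    for field in ("width", "height"):
        value = raw.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{field} must be an integer, got {value!r}")
        values[field] = value
    return TileSize(**values)
```

A definition such as `{"width": "2", "height": 1}` would have passed through an untyped layer unchanged, but here it raises `TypeError`. An unhandled error of that kind is the sort of thing that can surface as a crash or a generic error message.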
We were able to isolate most of the errors and made changes to the parsing logic in this adapter layer in order to stop the crashes. We did this knowing that it could cause visual oddities in the app, but would ultimately stop the crashes. It worked, but caused some weird rendering issues.
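The trade-off can be illustrated with a lenient coercion helper. This is only a sketch of the general approach, assuming numeric fields fall back to a default when the value cannot be read; `coerce_int` is a hypothetical name and the real parsing changes may differ.

```python
from typing import Any


def coerce_int(value: Any, default: int) -> int:
    """Return value as an int if possible, otherwise fall back to default."""
    if isinstance(value, bool):
        # Booleans are technically ints in Python; treat them as invalid here.
        return default
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            # Accept numeric strings such as "2".
            return int(value.strip())
        except ValueError:
            return default
    # Anything else (None, lists, floats, ...) is replaced by the default.
    return default
```

With this approach `coerce_int("2", 1)` yields `2` and nothing crashes, but `coerce_int("wide", 1)` silently yields `1`. The Tile still renders, just not necessarily the way the handler author intended, which is how a crash fix can turn into a visual bug.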
We have been working very hard over the weekend, and I’m confident that we fixed most, if not all, of the Tile-related bugs that were introduced in version 2.14.0. We were able to fix these bugs because they were ultimately cloud issues and not mobile issues. We deployed these changes to production this morning (Feb. 5).
How will we prevent this in the future?
This is the topic of the week. We have a lot of thoughts about this, and we’ll be meeting to discuss them in detail. This will greatly inform how we move forward.
Thank you
This community is amazing. Several of you found workarounds to the crashes before we could even get the fix to production. You pointed out a lot of rendering issues, which helped us isolate the problems and verify our fixes. Your discussions are what make me confident that most, if not all, of the bugs will be fixed with our next deployment. Thank you.