Cassandra was a big driver of issues earlier this year, though this has been largely mitigated. Hopefully the community has noticed that the platform has been more reliable in general since March - that's the result of an organization-wide focus on solving those problems. You are correct that AWS stability has rarely (no recent incidents come to mind) been a cause of downtime.
In recent memory, the (high-level) causes have been:
1. Caching failures that put stress on our relational database (this was the really bad downtime issue you quoted me on).
2. Network connectivity failures with the services that connect to Hubs (Hubs going offline). A lot of effort is being put into finding root causes. (It's probably not fair to mark this as a single issue, but I'm not sure how many details I'm allowed to share here.)
3. Deployments. There have been some pretty drastic architectural changes to facilitate performance improvements and bug fixes. Occasionally some unforeseen bugs creep into Production, but these have largely been very localized (think IDE-only issues, or issues affecting a small subset of users).
There are of course outstanding issues out there that impact users on a day-to-day basis. Engineering communicates with support frequently to get reports on which issues their team is fielding the most, which helps drive our prioritization. This is probably why Tim and Jody are constantly reminding everyone to contact support when they run into problems.