Suggested Improvement to Reboot "Offline" Hubs that are inaccessible

I’m one of many who experienced their v2 Hub going offline today and yesterday. The connection seemed to flap for about an hour before finally not connecting. We were advised via email and the status page to reboot the hub by pressing the button on the back or removing batteries and doing a battery cycle once whatever issue was happening with the ST cloud was resolved.

Quite a few folks here use a hub to monitor or control a vacation home that may be quite some distance from them, so this isn’t really a practical solution all the time. I’m concerned as well. I have routines set up to turn lighting on and off when we’re away from home, and control various home systems. What if I had not been home at the time?

I’m an IT Consultant by trade - I do infrastructure for large enterprises. Part of my job when designing or reviewing systems is looking for redundancy and automation, and making improvements. With so many places running lights-out infrastructure, in some cases from across the world, we always need a way to perform functions remotely as if we were in the same room.

To that end, here are my suggestions on how to handle these situations. Obviously it’d be nice if they didn’t occur. But I’m in IT. I know that stuff happens, and that it’s not always expected. So the best course of action is to have fallback routines.

ST should consider taking a page from the vendors in the enterprise wireless space, like Cisco, Aerohive, Aruba, etc. Many times a company’s wireless access points are mounted in locations that aren’t easily reached - like a warehouse ceiling 20-30ft up. It’s not practical to bring out a bucket truck and have to reset each one by hand when they can’t talk to their controller or master. Two things happen when this connection is lost:

  1. If the access point loses connection with the mother ship, it attempts reconnection. If it cannot reconnect after a certain period of time, say 30-60 minutes, it does a cold reset on itself, as if its power was cycled. It repeats this until it can reconnect.

  2. Within a few minutes of losing connection, the access point begins broadcasting a new wireless network, usually using its MAC address or name as the SSID. A network admin with a laptop can then connect to that SSID, and use SSH to establish a terminal session to the access point. The password was previously set as part of the initial config, and using that they can log in to the access point and adjust its configuration or reload (reboot) it. Once it talks to its controller it automatically stops broadcasting the “emergency” SSID and resumes normal operation. Keeps you from having to get on a lift to touch the thing physically just to plug in a console cable or hit the reset button.

Sorry for the long explanation thus far. So how would this translate to an ST hub going offline?

  1. If the hub is unable to contact the ST cloud, after one hour it automatically reboots. After rebooting, if it still cannot connect, it reboots again in another hour. You can put all kinds of checks here if needed, like only rebooting if it has an IP address, if DNS resolution is available, etc. Simple is usually better though.

  2. While the hub is unable to connect, either spin up a very basic web server or even an SSH server and allow local connections. You’d probably want some type of password protection or other kind of access control that would be set as part of the initial configuration. The web page could be totally basic - maybe just a big button that says “REBOOT HUB”. Click it and the hub power cycles. Once connected to the cloud shut down the web or SSH server.
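To make the first item concrete, here is a rough Python sketch of the decision the hub would make on each check. Everything in it is an assumption for illustration: `HubStatus`, `should_reboot`, and the one-hour constant are made-up names and values, not anything from actual hub firmware.

```python
from dataclasses import dataclass

# One hour offline before rebooting (the figure suggested above; an assumption).
REBOOT_AFTER_SECONDS = 60 * 60


@dataclass
class HubStatus:
    """Hypothetical snapshot of what the hub knows about its own connectivity."""
    seconds_since_cloud_contact: float
    has_ip_address: bool
    dns_resolves: bool


def should_reboot(status: HubStatus, require_local_network: bool = True) -> bool:
    """Return True when the hub has been offline long enough to justify a reboot.

    With require_local_network=True, the hub only reboots when it has an IP
    address and working DNS, so it does not reboot in a loop just because the
    home router or ISP is down.
    """
    if status.seconds_since_cloud_contact < REBOOT_AFTER_SECONDS:
        return False
    if require_local_network and not (status.has_ip_address and status.dns_resolves):
        return False
    return True
```

If the offline timer is counted from boot when the cloud has never been reached, the "reboot again in another hour" behavior falls out of the same check without extra state.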

The only problem with the 2nd method is that it would only help for people monitoring a vacation home or remote location if they have some other device on their vacation home network that they can use as a connection point. I don’t think these options need to be mutually exclusive however, both could be done.
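For the local web page idea, the request handling could be as small as the sketch below. It is only the decision logic, not a real server, and the routes (`/` and `/reboot`), the status codes chosen, and the function name are all hypothetical. The password is assumed to have been set during initial configuration, as described above.

```python
import hmac


def handle_recovery_request(method, path, supplied_password,
                            configured_password, cloud_connected):
    """Decide how the emergency page responds. Returns (http_status, action).

    action is one of "none", "show_button", or "reboot".
    """
    # The recovery page should only exist while the hub is offline.
    if cloud_connected:
        return 404, "none"

    # Constant-time comparison so the check does not leak password contents
    # through response timing.
    if not hmac.compare_digest(supplied_password.encode("utf-8"),
                               configured_password.encode("utf-8")):
        return 401, "none"

    if method == "GET" and path == "/":
        return 200, "show_button"   # the big "REBOOT HUB" button
    if method == "POST" and path == "/reboot":
        return 202, "reboot"        # accepted; the hub power cycles next
    return 404, "none"
```

Requiring a POST for the reboot itself keeps a stray page load or link prefetch from power cycling the hub.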

I’m making the assumption here that the problem that’s cropped up two days in a row is not the hub locking up and becoming unresponsive - that’s a totally different problem space I’d solve by implementing a watchdog process on the hub itself that’s in a protected execution ring and watches the other pieces of the hub software for some type of keep-alive. No keep-alive, reboot.
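The keep-alive idea looks roughly like this in Python. It is a toy, user-level sketch: a real watchdog would sit in hardware or a protected part of the system so it survives the software it is watching. The class name, component names, and timeout are all assumptions for illustration.

```python
import time


class Watchdog:
    """Tracks keep-alives from hub components and flags any that go quiet."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = {}

    def register(self, component):
        """Start watching a component; registration counts as a keep-alive."""
        self._last_seen[component] = self._clock()

    def keep_alive(self, component):
        """Called by a component to say it is still running."""
        self._last_seen[component] = self._clock()

    def stale_components(self):
        """Return the names of components that have missed their deadline."""
        now = self._clock()
        return sorted(name for name, seen in self._last_seen.items()
                      if now - seen > self.timeout_seconds)

    def reboot_needed(self):
        """No keep-alive from any watched component means reboot."""
        return bool(self.stale_components())
```

A supervising loop would call `reboot_needed()` on a timer and trigger the reset when it returns True.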

Just my thoughts on ways to avoid having to physically access the hub when it needs a reboot. My feeling is that this level of technology shouldn’t require a separately purchased “reboot timer” or any similar level of kludge.

25 Likes

At the risk of being just a tad “trolling”… you realize you’re stating the obvious, right? SmartThings may be somewhat understaffed, but they are very aware of the drawbacks to requiring Customers to manually intervene with the Hub.

Your post is still helpful to share with the Community and point out that there are possibly good solutions to the problem experienced today…

I considered that. I’m a consultant at heart though, and that makes me want to help in the areas I can. It also makes me think I can help. :smile: I don’t care so much about placing blame, I like results. I’m sure ST is aware of potential solutions.

That being said, I’ve probably worked with thousands of companies over my career, from large to small. You would really be surprised by how often I offer what even I think are basic common-sense suggestions and get told they’ve never considered that, and then they want to implement them. I know it surprises me every time. :wink:

This isn’t me patting myself on the back; it just seems to me that a different outside perspective can be useful at times.

5 Likes

To put it another way, if this was a project or program I was guiding, one of the driving development maxims I’d want to instill is that the home automation controller should automate as much about itself as possible, including reboots when things go wrong. Otherwise what’s the point? Not saying this isn’t the ST philosophy, of course. I would hope it’s as important to them as it is to us, their customers.

5 Likes

I definitely understand, Rick… I’ve got the same consultant background and instincts myself!

But after nearly 2 years and 11 months in this Community (joined January 2013!), I’ve discovered that, nearly 100% of the time, SmartThings doesn’t want my help. Maybe I’m not delivering it kindly enough, or maybe they think it’s worth what they pay me for it (zilch!) … or both. It’s easy to dismiss suggestions from the Community as “armchair quarterbacking” or “backseat driving” … but you understand the value of a diverse background, outside perspective, brainstorming, and blunt solution outlines that, of course, have to go through refinement and discussion before implementation. Some suggestions are industry best practices, and some are creative, innovative and outside-the-box. There are factors outside our visibility that impede implementation.

(That includes rejection of my candidacy for employment because they couldn’t fit me into one of the available “boxes” they were looking to fill, BTW). So I’ve decided to be content for now to be outside the box and just yell at it. :smiling_imp:

Yah … we agree – It is likely that SmartThings actually “knows what they are doing” and is generally aware of the milestones and elements of a more robust product and service. Yet, as I said above, there is ongoing (3+ years and counting) unexplained resistance to what seem to be good recommendations that should be feasible and should be part of the platform already.

Like how about some risk management and not deploying 3 major releases on the same day (Sept 3rd): Hub V2, App V2, and entering the UK market.

1 Like

No, they just think they’re smarter than you. They call themselves SmartThings after all. :smile:

1 Like

by pressing the button on the back or removing batteries…

I have a v2 hub as well, which has been going offline. I’m sort of curious about that “button” though. Are they talking about that recessed red toggle? (That I would have guessed is for a full system reset) Just curious…

Yah… I’m sure a few of them label me something else that’s “smart” … :horse: (that’s not a horse :stuck_out_tongue_winking_eye:).

3 Likes

I believe so.

There is no on/off power switch on the hub, and unplugging-power/wait/replugging-power won’t work if there are batteries installed, so that is essentially the force-reboot button.

Wow, I sure love the suggestions made here and I’m a bit saddened to hear that the SmartThings team is above community suggestions. I waited to join the SmartThings community until V2 because I was afraid of having a cloud controlled hub. Well, having it go offline for 2 days in a row is not helping ease my fears.

Sorry to rant, but as a consultant myself I’m always looking for a better way to do things.

4 Likes

You’ve certainly been here longer than I have, and I appreciate your view. I think we’re on the exact same page. I’m in full agreement.

One thing you touched upon that interested me. There’s threads to suggest a device. There’s threads to suggest or announce a SmartApp.

There are no threads or forums to suggest improvements to the underlying platform, at least from what I can see, and someone please correct me if I’m wrong.

Maybe it’s because I’ve spent the bulk of my career in customer-oriented businesses, some of which weren’t focused on IT at all (they just had technology), but I do find that omission a little glaring. Here’s a company that has a vibrant user community, which is what attracted me here in the first place. They’re in an area of technology that’s relatively untouched. The right decisions at the right time, listening to their passionate and mostly technically inclined customer base could yield huge competitive advantages. They tout their user community, in fact.

But no prominent place to make suggestions? No separate "suggestions@smartthings.com" email noted often, with perhaps a list of features or suggestions to vote on? Even Microsoft does this with their iOS/Android apps.

I’m not bashing the people of ST, I’ve worked in many situations where there was too much to do and not enough time to do it, and I empathize. This smells like a leadership problem to me. I hope they solve it.

Otherwise there are options that I’ll consider. As this market matures and consolidates, we’ll start seeing clear winners and losers. Really it reminds me of my much earlier years - I’m old enough (just barely, haha) to have had a 286 and Windows 1.0. The best days are yet to come.

So until then I’ll offer unsolicited suggestions and try to make things work, which honestly overall aren’t too bad. My biggest concern is that there’s too little movement in adding new device types. For example I’m using @Lgkahn’s Aeon Smart Switch devicetype. It works great, don’t get me wrong. Why this isn’t a standard device type yet from ST is a big concern - it’s reasonably documented for a Z-Wave device, it’s been out for a while, there’s nothing terribly special about it as far as Z-Wave is concerned. Shouldn’t there be a standard intake process that gives us “official” support?

I’ll get off my soapbox. I’m all for letting this thread serve as something productive, if that’s okay. There are plenty of spots for us to talk about our gripes; I’d love to see anyone’s thoughts about reboot scenarios I may have overlooked.

4 Likes

To be clear, there are super wonderful, dedicated, and talented individuals on the SmartThings team – the majority of them, I presume; many of whom actually silently read these forums (and some sometimes post sincere and genuine interest in and gratitude for our suggestions). These individuals subject themselves to the barrage of complaints and suggestions … some worded more bluntly than others.

There’s a big difference between sincere intentions and the ability to execute, however. As an outsider, I can’t legitimately hold anyone at SmartThings “at fault” for not actually digging into the suggestions and working on implementation; I know there’s too much unknown and outside each person’s control.

If it were my job as a consultant or manager, though, I would hope that company culture would encourage accountability and reward the right kinds of risk taking (i.e., the exploration of creative and innovative solutions in the lab, organizational restructuring, etc.). Accountability is very different from good intentions and competence. Too many people at SmartThings seem to lack accountability – but is that the fault of the individual or the fault of the management chain that isn’t holding folks … accountable…?

2 Likes

Generally correct observation; though I’d argue that there are very productive gems throughout all of the “complaint” Topics, and we’re assured that every single post is read by one or more SmartThings employees.

Very few companies take Customer / Community recommendations literally. It takes a certain type of executive and certain type of rare organizational culture that can figure out the sweet spot, put aside “organizational egos and myopia / shortsightedness” and absorb the good parts of broad scope recommendations into their strategy.

Apple is often cited as an excellent company in this regard. Again … we only have the outside perspective, though I’m sure it is used as a case study in many MBA programs (I hope!). Apple isn’t without major missteps and, frankly, a large part of their success is due to a focus on internally developed and focused strategy, not customer feedback. Customers can never have the same perspective as the vendor – the vendor’s goal is to maximize profit; the Customer never has that as his/her primary objective, for example. They want long term survival of the vendor (especially if it is an ongoing service or cloud!), but they really want to optimize features and minimize cost to themselves.


Yet one tactic that SmartThings has chosen wisely is: Giving us this highly visible and uncensored forum. It helps some of us let off steam, yell at “the box”, commiserate, compare notes, and, most valuably, help each other – all at very minimal dollar expense to SmartThings.

There are even bi-weekly “Developer” conference calls that are fully open to anyone, and video recorded / published. All examples of really good Community engagement and advocacy. All this perhaps has more impact than we know … it just is taking far too long to bubble to the surface as broadly visible quality improvements.

I hesitate to name some individual employees that, to me, really stand out … just because I don’t want to miss mentioning some of them, nor imply that the others aren’t delivering, but just in less directly visible ways. Truth is, there are folks at SmartThings I respect more than others; purely business, not personal.

1 Like

@tgauchat

Thanks for the reminder of where the company came from. And the reminder that there are good people in the company.

The feature of allowing me access to a development kit to put together my own drivers is what drew me to the community and has worked amazingly well for the devices I brought with me.

I should have kept my comments productive and on topic. What I really wanted to say is how beneficial an auto-reboot feature or remote reboot feature would be. My hub is in a centralized place in my home that also happens to be fairly inaccessible. The cloud issues and manual reboot also happened to coincide with the one night my wife was excited to learn about our home automation. Wife acceptance factor took a hit and my wall is a little less straight.

Overall I like the idea of ST, and it has the one killer feature I wanted; the ability to play with the code.

Now for the 2nd killer feature, auto-reboots. User configurable, because not everyone would want that. Now, is it possible to create a smart app to do this? Dang, there goes my weekend.

4 Likes

If you were someone still on V1 with no battery backup (or a V2 user without the batteries in), you could just use a WiFi plug and its native app to reboot. It keeps you from having EVERYTHING consolidated into 1 system/environment, but it gives you a remote reboot option as long as you’ve still got internet.

1 Like

Thanks for the suggestion. I didn’t think about running it without batteries.

1 Like

The – hopefully very rare – requirement for a manual hub reboot is the result of some severe failure in the architecture, infrastructure, or a single unlucky deployment of a firmware update or a bug in the firmware that causes a “hard hang” of the OS.

When these rare edge-case conditions occur, there is no “easy” solution; it’s a problem computer operations experts have studied for decades. Should there be an isolated “watchdog” thread or even a separate processor on the hub whose sole job is to robustly monitor the main process’s heartbeat and be able to issue jumpstarts automatically and upon network wake-boot requests? Even this might not be a 100% solution, but it helps.

We should be thankful that total Hub “bricking” hasn’t occurred (oh gawd, don’t let me jinx it, cross-fingers!) – like what happened with Wink a while back… Wink smart home hubs knocked out by security certificate (update)

Most edge devices these days are designed to maintain two firmware images and have an inherent recovery mechanism in case the active firmware is corrupted for any reason … e.g., interrupted installation, power surge, or a bug in the package. But something has to initiate the firmware recovery process. It is not uncommon for that to be a manual reboot … i.e., the never or rarely updated “bios” of the hub assumes that a hard boot after firmware install means that a recovery should be initiated.
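As a rough illustration of the two-image approach, the boot-time choice could look like the Python sketch below. The slot fields, the retry limit of 3, and the function name are assumptions made for the example; real bootloaders each have their own rules.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_ATTEMPTS = 3  # assumed limit before an image is considered bad


@dataclass
class Slot:
    """Hypothetical description of one firmware image on the device."""
    name: str
    checksum_ok: bool      # image passed its integrity check
    failed_boots: int = 0  # consecutive boots that never reached "healthy"


def is_usable(slot: Slot) -> bool:
    return slot.checksum_ok and slot.failed_boots < MAX_BOOT_ATTEMPTS


def choose_boot_slot(active: Slot, standby: Slot) -> Optional[str]:
    """Prefer the active image, fall back to the standby, else give up.

    Returning None stands for "nothing bootable": the case that still
    needs manual recovery.
    """
    if is_usable(active):
        return active.name
    if is_usable(standby):
        return standby.name
    return None
```

Counting failed boots is what lets an automatic reboot, rather than a person pressing a button, trigger the fallback to the older image.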

Read about Wink’s problem linked above, and you’ll see the type of unpredictable cases that occur in the real world. Some might say that was a case that should have been predicted or proactively managed.

Well… Today’s SmartThings Hub V2 problem is hopefully a rare edge case and hopefully not an indication of poor risk management, or an unwise cost cutting decision, in the hub architecture design.

2 Likes

My guess based on other consumer and enterprise systems like Synology, Apple, Cisco, et al. is that it’s a multi-purpose button. Push and release and it’s a reboot. Push and hold for 10 seconds or longer (or some other combination like push and release, then push again and hold for ten seconds, etc.) and it’s a factory reset. Generally all commercially available tech hardware has some sort of button press combo/timing that does this, the idea being that physical access denotes ownership or authorized access.
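In code, that kind of multi-purpose button usually boils down to timing the press. A minimal Python sketch, assuming the 10-second threshold guessed at above (this is not documented ST hub behavior, and the names are made up):

```python
# Assumed threshold; the actual hub may use different timing entirely.
FACTORY_RESET_HOLD_SECONDS = 10.0


def classify_button_press(held_seconds: float) -> str:
    """Map how long the button was held to an action name."""
    if held_seconds < 0:
        raise ValueError("held_seconds cannot be negative")
    if held_seconds >= FACTORY_RESET_HOLD_SECONDS:
        return "factory_reset"
    return "reboot"
```

The long hold is what keeps the destructive action hard to trigger by accident.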

1 Like

Yup … right again :wink:

The other common “secret” factory reset on all sorts of devices is: push and hold while simultaneously restoring line power. That’s a harder option to hit accidentally.

1 Like

Oh I love that one, seen that more than a few times! Cisco’s router reset is brilliantly complicated as well, I love the word “Ciscomplicated”, it describes their general approach. :smile:

My all-time favorite though has to be the Krups toaster I bought many years ago. It has a digital clock. To set it, you must tap five keys in an exact sequence within 30 seconds of plugging it in. Screw it up? Unplug it and start over. Needless to say I don’t bother setting it after a power outage.

It toasts bagels phenomenally however: crisp on the bottom, chewy on the inside, so I’ll keep it until it breaks. :grin:

I’m all about ROI. The bagels (or even just toast!) come out great no matter what time the toaster thinks it is, and we don’t need another clock in the kitchen to worry about setting. Definitely an over-engineered toaster, without a doubt.

1 Like