I suppose I ought to complete my monologue.
My ISP has been using ‘Wi-Fi Optimization’ for years, meaning that if you use their supplied routers (in the UK it is common to be supplied with an integrated Wi-Fi/router/modem) they twiddle with the settings remotely from time to time, supposedly to give the best performance for their customers. The reality is that the last thing many customers want is their carefully thought-out manual settings being splatted, especially when they have Zigbee and Thread networks to consider. Fortunately it used to be possible to request that they desist with their well-meant but ill-informed and damaging tweaks. Unfortunately it seems that they no longer allow this service to be switched off. Critically, it is pretty clear they have now gone further and turned it back on again. I believe this is the root cause of my problems. In my typical UK home signals don’t carry very far (a 5 GHz Wi-Fi signal, for example, has a range of the order of eight yards/metres) and there are plenty of competing signals from neighbours. The last thing you need is a sudden change to your Wi-Fi channels being thrown into the mix. So I think what I was experiencing was a loss of connectivity between the three V3 hubs in my hub group.
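For anyone wondering why a remote change of Wi-Fi channel matters to Zigbee and Thread, here is a rough sketch of the arithmetic. Both run on the IEEE 802.15.4 channels 11 to 26 in the same 2.4 GHz band as Wi-Fi. The function names are my own invention for illustration, and the nominal 20 MHz Wi-Fi width is an assumption: real transmissions spill beyond it, so treat the results as a best case.

```python
# Rough sketch: which Zigbee/Thread (IEEE 802.15.4) channels sit inside a
# given 2.4 GHz Wi-Fi channel. Function names are hypothetical helpers.

def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of 2.4 GHz Wi-Fi channels 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz Wi-Fi channel from 1 to 13")
    return 2407 + 5 * channel


def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of 802.15.4 channels 11-26 (Zigbee and Thread)."""
    if not 11 <= channel <= 26:
        raise ValueError("expected an 802.15.4 channel from 11 to 26")
    return 2405 + 5 * (channel - 11)


def zigbee_channels_overlapping(wifi_channel: int,
                                wifi_width_mhz: float = 20.0,
                                zigbee_width_mhz: float = 2.0) -> list[int]:
    """802.15.4 channels whose band overlaps the Wi-Fi channel's band.

    Assumes a nominal Wi-Fi width (20 MHz by default); this is optimistic.
    """
    centre = wifi_center_mhz(wifi_channel)
    wifi_lo = centre - wifi_width_mhz / 2
    wifi_hi = centre + wifi_width_mhz / 2
    half = zigbee_width_mhz / 2
    return [
        ch for ch in range(11, 27)
        if zigbee_center_mhz(ch) + half > wifi_lo
        and zigbee_center_mhz(ch) - half < wifi_hi
    ]


def zigbee_channels_clear_of(wifi_channels: list[int]) -> list[int]:
    """802.15.4 channels that overlap none of the listed Wi-Fi channels."""
    blocked = set()
    for wifi_channel in wifi_channels:
        blocked.update(zigbee_channels_overlapping(wifi_channel))
    return [ch for ch in range(11, 27) if ch not in blocked]


if __name__ == "__main__":
    for wifi_channel in (1, 6, 11):
        print(wifi_channel, zigbee_channels_overlapping(wifi_channel))
    print("clear of 1/6/11:", zigbee_channels_clear_of([1, 6, 11]))
```

So if you had picked a Zigbee or Thread channel that sat in a gap between your own Wi-Fi channels, an ‘optimisation’ that shifts the Wi-Fi channel can land directly on top of it.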
I have disabled Wi-Fi on the ISP router so they can’t screw with it again and installed a new router in access point mode. I probably should have done this a long time ago.
A curious thing to me is that looking through my history reveals each of the three hubs having a spell as the Primary hub, and each having been automatically replaced because they were considered to be disconnected. I am unclear how this is identified. How is it determined that it is the Primary that is disconnected and not the Secondaries?
I was largely using three hubs in the group to compensate for the loss of Zigbee routing capacity as I replaced mains powered Zigbee devices with Matter over Thread. However I have also added a number of Sonoff ZBMicro USB smart plugs and I have seen those repeating for fourteen devices each so I probably don’t need to be quite so concerned about that. In the absence of an obvious replacement location for one of the hubs I have removed one hub from the group.
I thought that there was supposed to be a ten-minute delay before an automatic hub backup took place, during which there was a notification about what was about to happen. Well, I’ve never encountered that. All I have seen is a notification that the old Primary is disconnected at the same time as a new one has quietly taken over.
After a number of automatic backups the disconnected original Primary has been reported as connected again and it has been confidently stated that it would be reinstated in minutes. Only once have I seen this actually happen.
After my most recent automatic hub backups I have discovered that my local VIRTUAL devices no longer have the same states and also my Security mode, which depends on those states, has changed. This suggests that actual events may have been generated. My Edge drivers do generate synthetic events but only during the added lifecycle and I don’t expect that to occur. I also use Todd’s LAN Device Monitor driver and I notice those have their monitoring attribute turned to off after the backup. This isn’t good.
Far more damaging is that after my most recent hub backups I have found that all my Matter devices are offline, though only as far as the hub group itself is concerned. Standalone hubs in the same Location with the same devices installed by multi-admin still see the devices as being online and can control them (except once they lost them after that hub was rebooted). The Matter devices were all absolutely fine via Google Home. These devices are all on the hub group’s Thread network so it means Google is happily using the TBRs on the hubs even though SmartThings isn’t.
Ideally it would be possible to add SmartThings hubs to existing Thread networks without creating a hub group. As things stand, the ability to have two or more hubs acting as TBRs on the network is by far the strongest selling point of hub groups and it is the sole reason I still have one.
The automatic hub backup (or failover) is a good selling point for hub groups but only if it works absolutely flawlessly. As things stand it appears to me that it is far more trouble than it is worth and I have completely disabled it.