Currently only for Android but I assume iOS is coming.
Looks very interesting; the network topology diagram grades each connection as “good”, “fair” or “poor”.
You can download a dump of the topology and apparently Gemini has a go at analysing it. I've not tried that yet. Everything is in terms of device IDs, so it's not easy to determine which device, and which room, each node is in.
Looks like a great start though.
Looks nice indeed plus it’s an official app!
I’m surprised I already have 15 Thread devices, the number grew fast with IKEA stuff.
Or ChatGPT:
Analysis/Report (click here)
Consolidated technical report: Thread mesh diagnostics snapshot
1. Executive summary
The snapshot shows a healthy, non-trivial Thread mesh with 32 nodes total, consisting of 16 routed nodes and 16 attached end devices/children. The network is not obviously partitioned, the routed view is internally consistent, and there is clear routing redundancy. The mesh also has two border-routing infrastructure nodes, one of which is advertising a NAT64 prefix, so the network is doing more than simple internal mesh forwarding.
The network’s strongest elements are a compact core/backbone centered on nodes 15360, 32768, 36864, 54272, and 57344, while the weakest elements are a handful of fringe or noisy edge-router links, especially around 5120, 22528, 28672, 35840, and 44032. The mesh should tolerate single-router loss at the routing layer, but loss of a heavily loaded parent router would still cause a noticeable local disturbance because child attachments are concentrated on a few parents.
2. Inventory and overall structure
Every routed node in the snapshot reports the same activeRouters value of 16, and the visible Route64 masks align, which strongly suggests a single coherent partition view rather than competing partitions. The routed-node set is:
5120, 15360, 18432, 20480, 22528, 25600, 26624, 28672, 32768, 33792, 35840, 36864, 37888, 44032, 54272, 57344.
The mesh also contains 16 children/end devices. Those children are attached under six parents:
- 20480 → 1 child
- 32768 → 1 child
- 33792 → 1 child
- 36864 → 2 children
- 54272 → 6 children
- 57344 → 5 children
That parent distribution is one of the most important findings in the file. It shows that the mesh is router-redundant, but child load is concentrated, especially on 54272 and 57344.
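The child-concentration finding above is easy to reproduce from a dump. A minimal sketch, assuming the snapshot has been flattened into (child RLOC16, parent RLOC16) pairs — the real dump nests children per router, so this flattening step is hypothetical:

```python
# Sketch: tallying child attachments per parent router.
# The `attachments` list mirrors the parent-child domains from the report;
# the flat (child, parent) format is an illustrative simplification.
from collections import Counter

attachments = [
    (20481, 20480), (32770, 32768), (33828, 33792),
    (36875, 36864), (36877, 36864),
    (54325, 54272), (54327, 54272), (54331, 54272),
    (54333, 54272), (54334, 54272), (54335, 54272),
    (57345, 57344), (57347, 57344), (57348, 57344),
    (57350, 57344), (57361, 57344),
]

load = Counter(parent for _, parent in attachments)
heaviest = load.most_common(2)  # the two most loaded parent routers
print(heaviest)
```

Running this surfaces 54272 and 57344 immediately, which is exactly the concentration the report flags.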
3. Border routing and external connectivity
The diagnostics consistently show two infrastructure nodes participating in network-wide external routing behavior: 15360 and 32768. The shared network data includes:
- a default external route advertised via 15360 and 32768
- an on-mesh prefix fdxx:5xxa:7cxx:1::/64 advertised by 15360
- a NAT64 prefix fdxx:cxxx:6xxx:2::/96 advertised by 32768, with NAT64 enabled
So the cleanest interpretation is:
- 15360 is the main advertiser of the preferred on-mesh prefix and one of the default-route sources
- 32768 is the NAT64-capable border router and also one of the default-route sources
Operationally, that means internal Thread routing should survive loss of either node, but the service impact would differ:
- losing 15360 would more directly affect the preferred on-mesh prefix/default-route behavior
- losing 32768 would more directly affect NAT64-backed IPv4 reachability
4. Likely control-plane center and backbone
The most likely current Thread Leader is 36864. That is not explicitly stated in leaderData, which is null, so this remains an inference, but it is the strongest one in the snapshot because 36864 is the only routed node reporting leaderCost = 0.
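The leaderCost = 0 inference can be expressed as a one-liner over the routed nodes. A minimal sketch, using a hypothetical mapping of router RLOC16 to its reported leaderCost (the values other than 36864's zero are illustrative):

```python
# Sketch: inferring the likely leader when leaderData is null.
# leaderCost = 0 means the node believes it is zero routing cost away
# from the leader, i.e. it is very likely the leader itself.
leader_costs = {
    5120: 3, 15360: 1, 32768: 1, 36864: 0,  # illustrative costs
    54272: 1, 57344: 2,
}

likely_leader = [r for r, cost in leader_costs.items() if cost == 0]
print(likely_leader)  # [36864]
```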
The likely backbone/core of the mesh is formed by:
- 15360
- 32768
- 36864
- 54272
- 57344
- with 33792 and 26624 as strong secondary core nodes
Why these nodes stand out:
- they have comparatively low average route costs
- they have many cost-1 paths in Route64
- they sit on strong neighbor links
- several of them also carry parent responsibility for children
The strongest single infrastructure relationship in the snapshot is between 15360 and 32768, where the neighbor metrics show an exceptionally strong link, around -35 to -36 dBm on one side with almost no frame or message error. That is the clearest candidate for the top of the mesh’s infrastructure spine.
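The app's “good / fair / poor” grading presumably combines signal strength and error rate in roughly this way. A sketch with made-up thresholds — these cutoffs are my assumption, not the app's actual logic:

```python
# Sketch: grading a link from RSSI and frame error rate.
# Thresholds are illustrative guesses, not the app's real cutoffs.
def grade_link(rssi_dbm: float, frame_error: float) -> str:
    if rssi_dbm >= -65 and frame_error < 0.05:
        return "good"
    if rssi_dbm >= -85 and frame_error < 0.20:
        return "fair"
    return "poor"

print(grade_link(-35, 0.0))   # the 15360 <-> 32768 backbone-class link
print(grade_link(-92, 0.35))  # a fringe link like some around 22528
```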
5. Node-by-node classification
These are analytical roles, not official Thread role names. Some nodes fit more than one category, so the primary label reflects the node’s most important operational role in this snapshot.
| Node | Classification | Notes |
|---|---|---|
| 5120 | Edge router | Routed, but several weak/noisy relationships |
| 15360 | Border router / core infrastructure | Preferred on-mesh prefix advertiser; strong transit role |
| 18432 | Edge router | Routed, usable, but not central |
| 20480 | Parent router (edge-leaning) | Parents 1 child |
| 22528 | Edge router | Fringe-feeling node with very poor margins on some links |
| 25600 | Anomalous/incomplete routed node | Routed view present, but missing some expected neighbor detail |
| 26624 | Core router | Good route-cost profile, useful transit node |
| 28672 | Edge router | Mixed quality, less central |
| 32768 | Border router / core / parent | NAT64 advertiser; parents 1 child |
| 33792 | Core router / parent | Strong route-cost profile; parents 1 child |
| 35840 | Edge router | Routed, but several noisy links |
| 36864 | Core router / parent / likely leader | Only node with leaderCost = 0; parents 2 children |
| 37888 | Edge router | Routed, but more peripheral |
| 44032 | Edge router | Less central, some noisy relationships |
| 54272 | Core router / heavy parent | Parents 6 children |
| 57344 | Core router / heavy parent | Parents 5 children |
| 20481 | Sleepy child | Child of 20480 |
| 32770 | Sleepy child | Child of 32768 |
| 33828 | Sleepy child | Child of 33792 |
| 36875 | Sleepy child | Child of 36864 |
| 36877 | Sleepy child | Child of 36864 |
| 54325 | Sleepy child | Child of 54272 |
| 54327 | Sleepy child | Child of 54272 |
| 54331 | Sleepy child | Child of 54272 |
| 54333 | Sleepy child | Child of 54272 |
| 54334 | Sleepy child | Child of 54272 |
| 54335 | Anomalous/incomplete child | Attached as child, but reports unusual FTD/rx-on state |
| 57345 | Sleepy child | Child of 57344 |
| 57347 | Sleepy child | Child of 57344 |
| 57348 | Sleepy child | Child of 57344 |
| 57350 | Sleepy child | Child of 57344 |
| 57361 | Sleepy child | Child of 57344 |
6. Parent-child domains
The parent-child layout is one of the clearest structural features in the dump. The child domains are:
- 20480 → 20481
- 32768 → 32770
- 33792 → 33828
- 36864 → 36877, 36875
- 54272 → 54333, 54325, 54334, 54327, 54335, 54331
- 57344 → 57350, 57361, 57347, 57345, 57348
Most children behave like classic sleepy end devices: rxOnWhenIdle = false, ftd = false, and supportsCsl = false. So this snapshot shows a mesh where sleepy-device behavior is still dominated by traditional poll/supervision-style attachment, not CSL-based sleepy operation.
The exception is 54335, which is attached as a child under 54272, yet its own direct diagnostic entry reports rxOnWhenIdle = true and ftd = true, while still lacking routed visibility. That makes it the most clearly unusual node in the dataset. It may be a router-eligible child or a diagnostics-implementation oddity, but either way it is not a normal sleepy-child profile.
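The sleepy-versus-anomalous distinction comes down to the three flags the report keys on. A minimal sketch — the flag names mirror the dump, but the classification function and its labels are mine:

```python
# Sketch: classifying a child's attachment profile from its flags.
# Field names (rxOnWhenIdle, ftd, supportsCsl) follow the dump;
# the category labels are illustrative.
def child_profile(rx_on_when_idle: bool, ftd: bool, supports_csl: bool) -> str:
    if not rx_on_when_idle and not ftd:
        # The dominant pattern in this snapshot: traditional polling children.
        return "csl sleepy child" if supports_csl else "polling sleepy child"
    if ftd:
        # The odd pattern seen on 54335: router-eligible device attached as child.
        return "router-eligible (FTD) child"
    return "rx-on minimal child"

print(child_profile(False, False, False))  # the typical child here
print(child_profile(True, True, False))    # the anomalous 54335 entry
```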
7. Radio quality and weak points
The core is strong, but the edges are uneven.
Strong links and healthy neighborhoods
The cleanest core relationship is 15360 ↔ 32768, which looks extremely strong. Other healthy or backbone-relevant relationships appear around:
- 15360 ↔ 54272
- 15360 ↔ 57344
- 32768 ↔ 57344
- 36864 ↔ 54272
- 26624 ↔ 57344
Those links are the reason the network looks resilient overall despite some ugly per-link error rates elsewhere.
Weak or suspicious areas
The mesh’s weak points are concentrated at the edge:
- 22528 looks especially fringe-like, including very poor RSSI/margin on some relationships
- 5120 shows several elevated frame/message error values
- 28672 participates in weaker, more marginal links
- 35840 has some noisy relationships despite being routed
- 44032 is less catastrophic than 22528, but still not part of the clean center
A few specific relationships stand out as particularly poor in this snapshot:
- 5120 → 15360, with very high frame and message error
- 5120 → 26624, with high frame error
- weak/noisy relationships involving 22528
- noisy links around 35840
- some elevated child-side error rates under 54272 and 57344, even though those parent domains are still functioning
This does not mean the network is unstable overall. It means that the outer radio neighborhoods are much less forgiving than the backbone.
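Flagging the worst links is a simple filter over per-neighbor statistics. A sketch, assuming the dump has been flattened into (src, dst, frame error, message error) tuples with illustrative values in the same ballpark as the ones the report singles out:

```python
# Sketch: extracting links whose frame error exceeds a threshold.
# The tuples and the 20% limit are illustrative, not from the dump.
links = [
    (5120, 15360, 0.42, 0.31),   # roughly the kind of values flagged
    (5120, 26624, 0.28, 0.05),
    (15360, 32768, 0.00, 0.00),  # the clean backbone relationship
    (36864, 54272, 0.02, 0.01),
]

FRAME_ERR_LIMIT = 0.20
worst = [(a, b) for a, b, fer, mer in links if fer > FRAME_ERR_LIMIT]
print(worst)
```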
8. Counters and anomalies
Several nodes report very large cumulative MAC counters such as ifInDiscards, ifInErrors, and ifInBroadcastPkts. Those values are worth watching, but by themselves they do not prove a current fault because these are cumulative counters and the mesh clearly mixes different implementations and hardware families. They are better treated as trend indicators than as standalone health verdicts.
A more clearly suspicious data point is the presence of negative MLE timing values on some nodes, such as negative totalTrackingTime and negative routerRoleTime. Those values are not physically meaningful, so the safest interpretation is telemetry overflow / signedness / reporting inconsistency, not an actual negative runtime.
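The signedness explanation is easy to demonstrate. If a device accumulates time in an unsigned 32-bit counter but the exporter decodes it as signed two's complement, large values wrap negative — a sketch:

```python
# Sketch: how an unsigned 32-bit counter can show up as a negative
# timing value when decoded as signed two's complement.
def as_signed32(raw: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed."""
    return raw - (1 << 32) if raw >= (1 << 31) else raw

raw = 0xFFFF_0000            # a large unsigned counter value
print(as_signed32(raw))      # -65536: reads as "negative runtime"
```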
There is also some asymmetry in how much detail different nodes expose. For example, some nodes provide full neighbor and child detail, while others expose a more partial view. That limits the certainty of some per-node conclusions, but it does not undermine the broader topology picture because the main network-wide structures are still consistent across the dump.
9. Failure and resilience analysis
If 54272 fails
This is the single largest local attachment-domain disruption in the snapshot. Six children would need to find new parents. The routed mesh should still survive because the core still includes 15360, 32768, 36864, 33792, and 57344, but the local device cluster under 54272 would likely see the most immediate impact.
If 57344 fails
Very similar story: five children would be displaced, but the backbone should remain intact. This would be a significant local disturbance, not necessarily a mesh-wide collapse.
If 36864 fails
If 36864 is in fact the current leader, this would trigger a leader transition. Because it only carries two children and the mesh still has many other active routers, it would be more of a control-plane event than a catastrophic topology event.
If 32768 fails
The mesh should still route internally, but NAT64 service would be the clearest likely casualty because 32768 is the node advertising the NAT64 prefix. Its child 32770 would also need to reattach.
If 15360 fails
This is likely the more important loss for general border-routing behavior because 15360 is the preferred on-mesh prefix advertiser and one of the default-route sources. Internal Thread routing should remain possible, but external reachability behavior would likely shift more noticeably than with loss of an ordinary router.
10. Practical interpretation
In plain terms, this mesh looks like:
- a strong central Thread backbone
- two real border-routing infrastructure nodes
- heavy child concentration on two parent routers
- a handful of edge routers in poorer RF conditions
- one clearly odd child entry (54335)
- some telemetry/counter artifacts that look more like reporting quirks than protocol failure
So the overall verdict is positive:
This is not a broken mesh. It is a working, reasonably robust Thread network with a solid core and enough router redundancy to survive ordinary single-node loss. Its real weaknesses are RF quality at the edges and attachment concentration under a few parents, not a fundamentally unstable routing plane.
11. Most important conclusions
The shortest technically accurate summary is:
15360 and 32768 are the border-routing spine, 32768 is the NAT64 advertiser, 36864 is the best leader candidate, and 54272 plus 57344 are the most important child-aggregation routers. The backbone looks healthy; the weak spots are mostly peripheral radio links and one unusual child node (54335).
I’ve done it on both. Some variance in the results.
I also provided a devices file to cross-reference. ChatGPT said it was unable to cross-reference, but Gemini said it could.
ChatGPT said the issue I had was having too many routers (17 in total). I'm not able to do much about that short of removing devices (mostly Nanoleaf bulbs, a few Thread outlets, a lightstrip and an air quality sensor).
It also said some of my routers were being overloaded and had huge packet loss. It cited two devices as being the worst. One of these is a bulb lighting a plant which I might experiment with removing.
My house has mostly brick internal walls, which isn't great for WiFi, Thread or Zigbee. Before I had mesh WiFi it was pretty awful.
Power cycled some devices, created a new output and asked the AI to compare the outputs:
Report (click here)
Compared with the earlier snapshot, the mesh itself survived the power cycle intact, but the attachment topology changed a lot. The routed mesh still reports 16 active routers, the same Route64 mask (040xxxxxxxxx480), and the same border-routing data: 15360 still advertises the preferred on-mesh prefix, while 32768 still advertises the NAT64 prefix. That means this is still the same Thread network / same logical fabric, not a split or a rebuilt network from scratch.
The control plane also looks continuous, not replaced. In the old snapshot, the leader data on the leader-looking EFR32 router showed partitionId = 2055738612 and routerId = 36; in the new snapshot, that same partition ID and leader router ID are still present, but the data versions advanced from 91/141 to 96/142. So the power cycle changed membership/attachment state, but it does not look like the mesh switched to a different partition.
The biggest practical change is the parent-child redistribution. Before the power cycle, 54272 had 6 children and 57344 had 5 children, making them the two heaviest parent routers in the mesh. After the power cycle, 54272 is down to 2 children, and a large share of the sleepy devices has moved elsewhere. In particular, 15360 has become a heavy parent with 5 children, even though it had no child table at all in the earlier snapshot.
A second major shift is that 33792 also became a parent hotspot after the power cycle. In the new snapshot it has a 5-entry child table with children 33829, 33793, 33794, 33795, and 33797. Several of those are clearly former children of 54272, because the child IPv6 addresses match the earlier 54327/54325/54334/54333 entries, only with new child RLOC16 values under the 0x8400 router block. That is classic Thread reattachment behavior: same end device identity, new parent, new RLOC16.
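The “new RLOC16 under the 0x8400 router block” observation follows from how Thread encodes addresses: the top 6 bits of an RLOC16 are the router ID and the low 10 bits are the child index (0 for the router itself). A quick sketch:

```python
# Sketch: splitting a Thread RLOC16 into (router ID, child index).
# Top 6 bits = router ID, low 10 bits = child index per Thread addressing.
def split_rloc16(rloc16: int) -> tuple[int, int]:
    return rloc16 >> 10, rloc16 & 0x3FF

print(split_rloc16(33792))  # (33, 0): router 0x8400 itself
print(split_rloc16(33829))  # (33, 37): a child under the 0x8400 block
```

This is why a reattached device keeps its extended address but gets a fresh RLOC16: the child index is assigned by whichever parent it lands under.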
You can see the same pattern in individual devices. For example, the old child 57350 under 57344 had ext address 3eabxxxxxxxxe415 and the address pair ending in ...xxxb:xxxx:xxxx:dxxx / ...7xx2:xxxf:xxxx:1xx7; in the new snapshot, that same device is now 15361 under 15360 with the same ext address and the same non-RLOC addresses. Likewise, the old 57361 moved to 15363, and old 57345 moved to 15366. So those sleepy end devices did not disappear; they reparented.
The former children of 36864 also moved. In the old snapshot, 36864 had children 36877 and 36875. After the power cycle, the child with ext address 3ee47290ceae8c7a reappears under 33792 as 33829, and the child with ext address 1a6xxxxxxxxxa34c reappears under 15360 as 15364. So even the devices that had previously been stable under the likely leader/router 36864 redistributed across the mesh.
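The cross-snapshot matching described above reduces to a join on extended address. A sketch, assuming both snapshots have been reduced to dicts mapping extended address to (parent, child RLOC16) — the masked address string is kept masked, as in the report:

```python
# Sketch: tracking reparented devices across two snapshots by their
# stable extended MAC address. Both dicts are illustrative reductions
# of the dumps: extAddress -> (parent_rloc16, child_rloc16).
before = {
    "3eab...e415": (57344, 57350),
    "3ee47290ceae8c7a": (36864, 36877),
}
after = {
    "3eab...e415": (15360, 15361),
    "3ee47290ceae8c7a": (33792, 33829),
}

moved = {ext: (before[ext][0], after[ext][0])
         for ext in before
         if ext in after and before[ext][0] != after[ext][0]}
print(moved)  # ext address -> (old parent, new parent)
```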
One especially useful clarification concerns 54335, which had looked anomalous in the earlier snapshot because it appeared as a child-like node with unusual FTD/rx-on traits. In the new snapshot, 54335 is much easier to interpret: it is explicitly listed as a child of 54272, with isRxOnWhenIdle = true, isDeviceTypeMtd = true, and hasNetworkData = true, and it has an extremely strong link (-35 dBm, link margin 65, zero FER/MER). That makes it look much less like a parser oddity and much more like a router-eligible or non-sleepy MTD-style child attachment that remained under 54272 while other children moved away.
Another important comparison point is that the fresh per-link error statistics are not directly apples-to-apples with the older snapshot. In the new file, many router-neighbor connectionTime values are only around 700 seconds, which means a lot of relationships are newly formed or newly measured after the power cycle. So when you see large differences in frame/message error on some links, part of that is real topology change, but part of it is simply that the counters and samples are now much newer.
The MAC counters support that interpretation too: some nodes look freshly restarted, while others clearly were not power-cycled. For example, 5120 dropped from extremely large cumulative counters in the old snapshot to very small numbers in the new one, which strongly suggests a reboot. By contrast, 20480 still has very large cumulative counters in the new snapshot, so it likely stayed up while other nodes around it restarted. This matches your description that only some devices were power-cycled.
So the short technical verdict is:
- The routed mesh stayed intact: same 16-router Thread network, same border routers, same partition.
- The child topology changed heavily: 15360 and 33792 became major parents, while 54272 and especially 57344 lost much of their earlier child load.
- Many end devices reattached cleanly: same ext addresses and stable non-RLOC IPv6 addresses, but new child RLOC16s under new parents.
- This looks like normal Thread self-healing after partial power loss, not like mesh failure. The main effect was parent redistribution, not routed-network collapse.
The most important practical takeaway is that the mesh appears to have recovered correctly, but the center of gravity moved. After the power cycle, 15360 and 33792 are doing much more parent work than before, while 54272 and 57344 are doing less. That new balance may be temporary, or it may become the new steady state, depending on RF conditions and whether the power-cycled devices stay online.
After some more analysis, I may have gone down a rabbit hole trying to solve this. I was starting to worry that my Sonos sound system was causing Thread interference. Gemini suggested it might be, and recommended changing my Sonos channel to channel 1, which is what I'm trying. It also suggests not having any Thread devices within 50 cm of a Sonos speaker; that's harder to achieve, as I have 2 Nanoleaf Thread lights near the rear speakers.


