From a device class definition standpoint, it’s a lot different from a binary state device.
That said, your original example was the right one. There are many button devices for both Zigbee and Z-Wave that do distinguish between “press” and “long press,” but that has nothing to do with real-time execution. It just offers multilevel reporting, exactly as you suggested for a dimmer device: the longer you hold the button down, the higher the level of brightness requested.
Or sometimes it’s simply used to give you more options from the same button. Tap, double tap, and long hold are common on many button devices, and that gives you three different control options. In that case, the exact timing of the long hold doesn’t matter, just that it is longer than what is considered a single tap.
On a Zigbee device these are often described as “pushed” versus “held,” where pushed is a short tap and held is the long hold. The button sends a single report indicating either pushed or held. It doesn’t send a release event message, and it doesn’t send the exact time the button was held for.
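To make the pushed/held idea concrete, here is a minimal sketch of the kind of decision the device firmware makes before it sends its single report. The function name and the 800 ms threshold are my own assumptions for illustration; real devices pick their own threshold, and this isn’t taken from any particular firmware.

```python
# Hypothetical threshold: each real device chooses its own value in firmware.
HOLD_THRESHOLD_MS = 800


def classify_press(press_ms: int, release_ms: int,
                   hold_threshold_ms: int = HOLD_THRESHOLD_MS) -> str:
    """Return the single report the device would send: 'pushed' or 'held'.

    Both timestamps come from the device's own clock, so no network
    latency is involved in the measurement.
    """
    duration_ms = release_ms - press_ms
    if duration_ms < 0:
        raise ValueError("release must not come before press")
    # Only the comparison against the threshold matters; the exact
    # duration is discarded and never reported.
    return "held" if duration_ms >= hold_threshold_ms else "pushed"


if __name__ == "__main__":
    print(classify_press(0, 150))    # a short tap
    print(classify_press(0, 2000))   # a long hold
```

Note that a 2 second hold and a 5 second hold produce exactly the same report here, which is the point: the receiver only ever learns “pushed” or “held.”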
So those are two different use cases for a long hold: one to set the level of a multilevel command, the other to simply create a third control option.
In platforms that didn’t have a cloud component, capturing the “button released” event was pretty common, and we used to see that up until about 2012.
The problem once you introduce a cloud component isn’t the additional time, it’s the variability created by cloud latency. It just becomes harder and harder to accurately distinguish between even a tap and a double tap, let alone a long hold.
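A bit of arithmetic shows why variability is the issue rather than the delay itself. In this sketch the receiver timestamps the press and release messages when they arrive, so the hold time it measures is the true hold time plus the difference between the two latencies. The latency figures and the 800 ms threshold are made-up numbers, purely for illustration.

```python
# Hypothetical threshold used by a receiver trying to tell a tap from a hold.
HOLD_THRESHOLD_MS = 800


def observed_duration_ms(true_hold_ms: int, press_latency_ms: int,
                         release_latency_ms: int) -> int:
    """Hold time as measured by a receiver that timestamps on arrival."""
    # A constant delay cancels out; only the difference between the two
    # latencies distorts the measurement.
    return true_hold_ms + release_latency_ms - press_latency_ms


def classify(duration_ms: int) -> str:
    return "held" if duration_ms >= HOLD_THRESHOLD_MS else "pushed"


if __name__ == "__main__":
    # Same 300 ms delay on both messages: measurement is still correct.
    print(classify(observed_duration_ms(500, 300, 300)))
    # A 500 ms tap with a slow release message looks like a hold.
    print(classify(observed_duration_ms(500, 50, 450)))
    # A 1000 ms hold with a slow press message looks like a tap.
    print(classify(observed_duration_ms(1000, 450, 50)))
```

With steady latency the receiver gets the right answer no matter how large the delay is. Once the two messages can be delayed by different amounts, both kinds of misclassification show up.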
So most device manufacturers have shifted to a design where the tap pattern is captured by the device itself. This means the release event message is no longer needed. The end device does the calculation itself and then determines the next action to take. If it’s a setlevel command for a multilevel function, it just sends the desired target level, rather than sending the release event and expecting the receiver to figure out the target level from that.
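As a rough sketch of that device-side calculation, the firmware could map the hold time it measured locally onto a target level and send only that. The linear ramp, the 3 second ramp time, the 1 to 100 range, and the message shape are all assumptions of mine; actual devices may use a different curve or step size.

```python
def hold_to_level(hold_ms: int, ramp_ms: int = 3000,
                  min_level: int = 1, max_level: int = 100) -> int:
    """Map a locally measured hold time to a target level (hypothetical ramp)."""
    # Clamp so that holding longer than the ramp time just gives max_level.
    fraction = min(max(hold_ms / ramp_ms, 0.0), 1.0)
    return round(min_level + fraction * (max_level - min_level))


def build_setlevel_message(hold_ms: int) -> dict:
    """The only thing sent over the network: the desired target level."""
    # Hypothetical message shape; no press or release timestamps are included.
    return {"command": "setlevel", "level": hold_to_level(hold_ms)}


if __name__ == "__main__":
    for hold in (0, 1000, 2000, 3000, 6000):
        print(hold, build_setlevel_message(hold))
```

Because the level is computed before anything leaves the device, it doesn’t matter how long the message takes to arrive: the receiver gets the same target level either way.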
All of which is to say we used to see a lot of release event processing, but now that’s very rare in systems with a cloud component, because it just turned out not to be a reliable method.
Also, the first rule of home automation still applies: “the model number matters.”
Some button devices do calculate the length of time of the hold and send that as a setlevel report.
But other button devices just treat the long hold as a third option. They don’t send the time interval. Just “pushed” or “held.”
This reporting is determined by the firmware on the device, and it’s not something that you can control from the SmartThings side.
But you can make sure that the DTH you are using matches the features of the physical device, and in particular, that it is correctly capturing setlevel if that information is available.
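To illustrate what “matches the features of the physical device” means, here is a small sketch of a handler that covers both styles of button: one that reports a setlevel with a target level, and one that only reports pushed or held. This is plain Python for illustration, not SmartThings DTH code, and the report format and action strings are hypothetical.

```python
def handle_report(report: dict) -> str:
    """Turn an incoming device report into an action (illustrative only)."""
    kind = report.get("type")
    if kind == "setlevel":
        # Devices that compute the level themselves include it in the report.
        level = report.get("level")
        if level is None:
            raise ValueError("setlevel report is missing its level")
        return f"set brightness to {level}%"
    if kind == "pushed":
        return "run the 'pushed' action"
    if kind == "held":
        # No duration is available here, only the fact that it was a hold.
        return "run the 'held' action"
    return "ignored: unknown report type"


if __name__ == "__main__":
    print(handle_report({"type": "setlevel", "level": 65}))
    print(handle_report({"type": "pushed"}))
    print(handle_report({"type": "held"}))
```

If the handler only knew about pushed and held, a setlevel report from a device that sends one would fall through to the last line and be lost, which is the kind of mismatch worth checking for.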