Sonos voice synth question


This might be one for the SmartThings technical staff, but I wanted to put it out there as a general question…

I am about to plunge into the voice piece of home automation and purchase a Sonos speaker. I notice it is possible to simply call the API and have it speak with a Siri-like voice. My question is about WHERE this voice comes from… is there a processor within the speaker that synthesizes the voice, or is the text processed in the ST cloud and simply sent back as an MP3? And if the latter is true, will this functionality remain if the V2 Hub does the processing? Finally, if the ST cloud is doing the processing, I assume the voice speed or even gender could be changed if we had access to the API.

Anyway, if anyone knows the answer to these please let me know…

I’m not sure about the details, but it is definitely happening in the cloud, then SmartThings is just sending an AWS URL to the Sonos (something like this -

Hmmm…that would give me the impression that ST is utilizing AWS compute power to process these…So for my V2 question that would lead me to believe that it would still be cloud dependent on the new hardware.

Thanks for the info, Brice… anyone from ST engineering want to chime in? The only person I have spoken to at ST is @kris

Doing some research, I am curious if ST is utilizing the Ivona system from Amazon…

If this is correct, at least in theory, the voice could be any of the ones in the Ivona library. Anyone from ST that can comment?

Take a look at the discussion below if you are just interested in using the voice messages. This is a much cheaper alternative to buying a Sonos. I am using it and it works quite well. And yes - the text to speech is done in the Sonos cloud from what I can see in the app code.

Thanks…where in the code specifically are you seeing the call outs to Sonos? Maybe I am looking in the wrong place.

Sorry, I am not a coding expert so I might be wrong about this but here is what I understand:

Sonos is a “Music Player” device type. One of the capabilities of this device type is “playText(String)”. So I assume this is something Sonos does. (Similar to a switch that has the capabilities “on()” and “off()”: it is the switch that is turned on or off, not SmartThings.)

The Sonos device type uses a SmartThings method which takes the text and returns a URL. It definitely uses something on the ST cloud, not on the Sonos.
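
To make that flow concrete, here is a rough Python sketch. It is not SmartThings or Sonos code: the function `cloud_text_to_speech`, the class `SpeakerDeviceHandler`, the `voice` parameter, and the example URL are all made up for illustration. It only models the idea that the text goes to a cloud method, a URL to an MP3 comes back, and the speaker is simply told to play that URL.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def cloud_text_to_speech(text: str, voice: str = "default") -> str:
    """Hypothetical stand-in for the cloud-side call: text in, MP3 URL out."""
    # A real service would synthesize audio here; this sketch only derives
    # a stable file name from the voice and the text.
    digest = hashlib.sha1(f"{voice}:{text}".encode("utf-8")).hexdigest()[:12]
    return f"https://tts.example.invalid/{voice}/{digest}.mp3"


@dataclass
class SpeakerDeviceHandler:
    """Hypothetical device handler; the speaker itself only plays URLs."""
    played: List[str] = field(default_factory=list)

    def play_track(self, url: str) -> None:
        # Stand-in for telling the speaker to stream the given URL.
        self.played.append(url)

    def play_text(self, text: str, voice: str = "default") -> str:
        # No synthesis happens on the speaker: ask the cloud for a URL, then play it.
        url = cloud_text_to_speech(text, voice)
        self.play_track(url)
        return url


if __name__ == "__main__":
    handler = SpeakerDeviceHandler()
    print(handler.play_text("Front door opened"))
```

If it really works along these lines, things like voice or speed would be settings on the cloud call rather than on the speaker, but whether that is exposed is something only ST can confirm.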

I think VLC Thing is the only device utilizing “Speech Synthesis”.

Hi @MichaelS, I’m using Ivona TTS in SmartThings. It’s easy to use, and you can change the language and voice.

How? I have played with it, but when you say you are using it with ST, what do you mean?

You can use it with this