SmartThings Community

Voice notifications using local computer for TTS audio rendering?

I’ve been thinking a lot about voice commands and notifications lately, and while I don’t have any development experience myself, I know several people in my everyday life who do.
I also know that many of you are capable of things I can only imagine.

With that said, Microsoft has an API for their TTS engine that I believe can be accessed remotely over the network.
Can the V1/V2 hub fire off URL requests to local servers on the same network, or does it REQUIRE the ST Cloud?

I’m running a voice from Ivona that I’d LOVE to have as my home voice.

Would it be possible to link ST with a SmartApp to the API for MS TTS and have it become the new TTS renderer?

Also, would it be possible to write a SmartApp that is purely for notifications where you can load a list of predefined MP3s on a web server or local computer to use as the audio notification source?

What about DropBox integration to use DropBox as the file store for pre-rendered MP3s?

Thoughts from our developers out there about whether I’m barking up the wrong tree for a “Wanted” item when there are cloud solutions out there already?

The local PC can act as a “cloud” server locally, if the hub can communicate with it locally.


There have been some community member projects along these lines in the past. This might be of interest. Note that most have had the problem where a platform change would break what they were doing for a while.


Here is my favorite voice notification app. It not only does notifications but also plays radio stations.

Or here is a way to have voice notifications in languages other than English :smile:

Better an Reloaded Fix for TTS (TextToSpeech)


Thanks for the links all.

I’ve used Big Talker and @ule’s solutions as well.

What I’m looking to do is offload “cloud” processing of TTS to a local Windows computer that runs all the time in my home.
This is so that I can use a different voice, guarantee that all of my sentences get rendered because my local system is handling the rendering, or at the very least build MP3 files with the notifications I’m looking for.

The other option would be a SmartApp that handles voice notifications and allows you to specify the URL for a pre-rendered MP3 file in the voice of your choice for each notification.
EG: Notifications for door opened/closed could be “Use this file” or be set to “Use TTS engine”. The TTS engine could be the current cloud solution, or it could be the one hosted in-home in Windows.
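As a rough sketch of that per-notification choice (not an actual SmartApp, which would be written against the SmartThings platform), the logic could look something like this in Python. The `Notification` class, the `resolve_audio_url` helper, the IP address, and the `/tts?text=` endpoint are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class Notification:
    """One configured notification: a fixed audio file, or text for a TTS engine."""
    text: str
    file_url: Optional[str] = None  # pre-rendered MP3; None means "Use TTS engine"


def resolve_audio_url(note: Notification, tts_base_url: str) -> str:
    """Return the URL a speaker should be told to play for this notification."""
    if note.file_url:
        # "Use this file": hand the pre-rendered MP3 straight to the speaker.
        return note.file_url
    # "Use TTS engine": hypothetical endpoint that takes the phrase as a query parameter.
    return f"{tts_base_url}?text={quote(note.text)}"


# Hypothetical addresses on the home LAN.
TTS_ENDPOINT = "http://192.168.1.50:8080/tts"
notifications = {
    "door_opened": Notification(
        "Front door opened",
        file_url="http://192.168.1.50:8080/mp3/door_opened.mp3",
    ),
    "door_closed": Notification("Front door closed"),
}
```

Swapping `TTS_ENDPOINT` between a cloud service and the in-home Windows machine would be the only change needed to switch engines in this sketch.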

Would this be possible?

There is a similar app using an Android device to process TTS locally. In combination with Big Talker app, it’s been working very reliably.

Also coming down the pipe is the Aeon Doorbell that can play back up to 100 MP3 sounds. It’s still in community development though.

As @Jimxenus mentioned, I wrote LANdroid/LANnouncer to do that with Android. Windows doesn’t widely use TTS, but the Windows.Media.SpeechSynthesis namespace is available from Windows 8.1 or so onward. It wouldn’t be hard to use the same LANdroid SmartThings modules to work with a Windows-side class in Node or C# to do the audio rendering.

I am using LANnouncer on my phone for when I’m at home. It works great!

So, are you saying that it would be possible to have Windows render the TTS into an MP3, and then play that over Sonos/LANnouncer/DLNA?

You don’t need to do that; Big Talker can do the TTS-to-audio-file. I’m saying that the Android part of LANdroid/LANnouncer could be ported to Windows without too much effort.

I understand that, but my point and part of the original request wasn’t about having LANnouncer in Windows, though that would be awesome.

It was more specifically about wanting to use a specific voice in Windows.
Example: If I want a male voice in the home instead of the generic one that all of the applications seem to use (SmartThings, Ule’s alternatives, etc.), the only way to do that is to utilize a speech engine on a local system, or develop a variation of one of the many SmartApps that are already out there.

I have a specific voice in Windows from Ivona that I would prefer to have above all others. It is the most fluid and consistent voice I have heard so far.

Unfortunately, the voice is ONLY available for Windows from Ivona… So the only way to hear it is to use the Windows system as the actual TTS source. Either the Windows system builds an MP3 file, and SmartThings tells the Sonos, DLNA, or other device type to play that file, or Windows itself plays the audio.

The event flow that currently happens is as follows:

Event happens in ST. Notification for event is configured to play on Sonos in Living Room.
ST sends TTS request to cloud server, then provides URL for MP3 file created by cloud server to Sonos.
Sonos plays audio “Event happened.”, or whatever the TTS request was for… EG “Good morning, it is 7:32AM and it is raining noodles outside”

What I’m asking about is this:
Event happens in ST. Notification for event is configured to play on Sonos in Living Room.
ST sends TTS request to the MS Speech API on a local Windows system. Speech API generates MP3 file and makes it available via a web server on the Windows system, then provides the URL for that MP3 file to Sonos.
Sonos plays audio “Event happened.”, or whatever the TTS request was for… EG “Good morning, it is 7:32AM and it is raining noodles outside”
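To make the proposed flow a bit more concrete, here is a minimal Python sketch of the "render once, then hand back a LAN URL" step. It assumes some `render` callback exists that can turn text into an audio file on the Windows machine; that callback, the `audio_url_for` name, the cache folder, and the base URL are all placeholders, and the actual call into the speech engine is deliberately left out.

```python
import hashlib
from pathlib import Path
from typing import Callable


def audio_url_for(text: str, cache_dir: Path, base_url: str,
                  render: Callable[[str, Path], None]) -> str:
    """Render `text` to an audio file once, then return the URL a speaker can fetch.

    `render(text, target_path)` stands in for whatever local speech engine
    is used; it is an assumption, not a real API.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the phrase so repeated phrases reuse one file.
    name = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16] + ".mp3"
    target = cache_dir / name
    if not target.exists():
        render(text, target)
    # `base_url` is where a local web server exposes `cache_dir`.
    return f"{base_url.rstrip('/')}/{name}"
```

Because files are keyed by the phrase, a notification like “Event happened.” would only be rendered the first time; later requests would just return the same URL.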

The advantages are huge for this. The response is over the LAN, so playback of audio would be considerably quicker, and you’d get to choose your own voice on the system in question.

The other side of this coin is: A SmartApp that can call MP3 files from a local server for notifications.
If I want to have a professional voice actor record certain notifications for my home, I can then tell the SmartApp to play the file on my local server in my home, whether that be an SMB-style path or a web URL from a local web server (a Pi would be perfect for either of these).
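For the web URL variant, serving a folder of pre-recorded MP3s on the LAN needs very little. A minimal sketch using only Python's standard library follows; the `start_mp3_server` name, the folder, and the port are assumptions for illustration, and this says nothing about how the SmartApp side would call it.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_mp3_server(folder: str, host: str = "0.0.0.0",
                     port: int = 8000) -> ThreadingHTTPServer:
    """Serve the files in `folder` over HTTP in a background thread."""
    # Bind the handler to the chosen folder instead of the current directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=folder)
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With a folder containing, say, `welcome_home.mp3`, the file would then be reachable at `http://<server-ip>:8000/welcome_home.mp3`, which is the kind of URL a notification could point at.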

So I can have Darth Vader greet me when I get home every day because of the audio files I have stored locally and made available to ST.


Hi @Synthesis, if you like the Ivona voices, why not use an Ivona API service? It’s easier. I know a local server is faster to respond, but the Ivona server is fast enough to deliver audio messages. In my tests the online services are not the source of the delay; the bigger delay is in the speaker’s processing. You can try Android DLNA software, and the response is faster than with any DLNA speaker. If the DLNA speakers are on Wi-Fi, they take more time to start playing, so it’s better to connect them directly to a LAN switch. Powerline devices can help with this.