Creating real-time wake-words to trigger overlays in OBS
With the rise of personal assistants, it is getting easier to build voice-activated bots. That got me thinking: why not build something similar for my own live stream, something that triggers a GIF overlay and/or plays a funky jingle whenever a pre-defined trigger word is spoken during the broadcast?
The idea was inspired by Jeff Fritz's live stream: when he says "JavaScript" during the stream, a fun sound plays in the background. That is done entirely manually, by pressing a special button on his Elgato Stream Deck... until now...
A special program runs in the background and transcribes anything said into the microphone. When it hears a wake-word, it triggers specific media to play on a page at https://localhost:5001, driven from a config file containing the wake-word triggers.
Then, inside OBS, use a Browser Source pointing to that endpoint. The effect? Well, given the following configuration:
{
  "javascript": {
    "SoundUri": "https://mp3.com/horse_whinnying.mp3",
    "GifUri": "https://gifs.com/john_travolta.gif"
  }
}
Whenever I say "JavaScript", it triggers the javascript action, showing the John Travolta GIF and playing the whinnying horse sound.
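To make the matching step concrete, here is a minimal Python sketch of how a transcribed sentence could be checked against the trigger configuration. It is not the tool's actual implementation (the tool is a .NET program); the function name `find_triggers` and the whole-word matching rule are assumptions made purely for illustration.

```python
import re

# Trigger table mirroring the configuration above.
TRIGGERS = {
    "javascript": {
        "SoundUri": "https://mp3.com/horse_whinnying.mp3",
        "GifUri": "https://gifs.com/john_travolta.gif",
    }
}


def find_triggers(transcript: str, triggers: dict) -> list[str]:
    """Return the trigger keys that occur as whole words in the transcript.

    Assumption: triggers are single words and matching is case-insensitive.
    """
    # Lower-case and split into words so "JavaScript!" matches "javascript".
    words = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    return [key for key in triggers if key.lower() in words]


if __name__ == "__main__":
    for key in find_triggers("I really love JavaScript!", TRIGGERS):
        media = TRIGGERS[key]
        print(key, media["SoundUri"], media["GifUri"])
```

Matching on whole words means a sentence that only mentions "Java" would not fire the javascript trigger in this sketch.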
Here's a little clip of the outcome:
The source code for this is available on my GitHub repo here
Creating your own triggers
The initial idea was to containerise the little program, but I quickly realised that it is difficult (or nearly impossible) to share a microphone from a Windows host with a Linux-based container. The workaround turned out to be a .NET Global Tool that you can install right now by running the following command:
dotnet tool install -g VoiceTrigger
In the background, the tool uses the Speech APIs from Microsoft Cognitive Services to transcribe voice to text. For this you will need an active Microsoft Cognitive Services subscription. Click here to try for free!
Create a JSON file somewhere, called config.json, with the following:
{
  "MsCog": {
    "SubscriptionId": "< SUBSCRIPTION_ID >",
    "Region": "< REGION_NAME >"
  },
  "Triggers": {
    "foo": {
      "SoundUri": "...",
      "GifUri": "..."
    },
    "bar": {
      "SoundUri": "...",
      "GifUri": "..."
    },
    //...
  }
}
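Before starting the tool, it can help to sanity-check the file. The following is a rough Python sketch, not part of VoiceTrigger itself, that looks for leftover placeholders and empty triggers. The function name `validate_config` and the specific checks are my own assumptions. It works on an already-parsed dictionary, because Python's `json` module rejects comments and trailing commas such as the `//...` line above.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in the config."""
    problems = []

    # The MsCog section needs real values, not the "< ... >" placeholders.
    ms_cog = config.get("MsCog", {})
    for field in ("SubscriptionId", "Region"):
        value = ms_cog.get(field, "")
        if not isinstance(value, str) or not value.strip() or value.strip().startswith("<"):
            problems.append(f"MsCog.{field} is missing or still a placeholder")

    # At least one trigger word should be defined.
    triggers = config.get("Triggers", {})
    if not triggers:
        problems.append("Triggers is empty")

    # Assumption: a trigger is only useful with a sound, a GIF, or both.
    for word, media in triggers.items():
        if not media.get("SoundUri") and not media.get("GifUri"):
            problems.append(f"Trigger '{word}' has neither SoundUri nor GifUri")

    return problems
```

An empty list means none of these checks failed; it says nothing about whether the subscription details are actually valid.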
Now open up the terminal and run the following command:
voicetrigger triggersfile=c:\stream\config.json
The triggersfile argument is the path to the JSON file you created.
You should be presented with a screen like this:
Now listening on: https://localhost:5001
Application started. Press Ctrl+C to shut down.
Using your favourite browser, go to https://localhost:5001 and try saying one of your configured trigger words. You should notice the relevant media play when it's recognised.
Adding an overlay in OBS
On a selected scene, add a Browser Source by clicking on the '+' icon:
In the properties popup window, specify https://localhost:5001 as the URL and click OK.
Bonus: Integrating Elgato Stream Deck
This step is totally optional, but worth it as it automates running the voicetrigger command every time we want the tool to start running.
Using the Stream Deck software, on the deck of your choice, drag an Open action (found under System) to the profile, and specify the voicetrigger command as the App/File setting.
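If you would rather launch the tool through a small wrapper that first checks the config file is where you expect it, something like the Python sketch below could work. This is only an illustration: the helper names `build_command` and `launch` are made up, it assumes Python is installed and `voicetrigger` is on your PATH, and the command line is simply the `triggersfile=` form shown earlier.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(config_path: Path) -> list[str]:
    """Build the argument list for the voicetrigger command."""
    # Fail early with a clear message instead of letting the tool start blind.
    if not config_path.is_file():
        raise FileNotFoundError(f"Trigger config not found: {config_path}")
    return ["voicetrigger", f"triggersfile={config_path}"]


def launch(config_path: Path) -> subprocess.Popen:
    """Start voicetrigger in the background and return the process handle."""
    command = build_command(config_path)
    if shutil.which(command[0]) is None:
        raise RuntimeError("voicetrigger is not on PATH; is the global tool installed?")
    return subprocess.Popen(command)
```

Calling `launch(Path(r"c:\stream\config.json"))` would then start the tool with the example path used above.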
Happy streaming!