Creating real-time wake-words to trigger overlays in OBS

With the rise of personal assistants, it is getting easier to build voice-activated bots. That got me thinking: why not build something similar for my own live stream, something that triggers a GIF overlay and/or plays a funky jingle whenever a pre-defined trigger word is spoken during the broadcast?

This was inspired by Jeff Fritz's live stream: whenever he says "JavaScript", a fun sound plays in the background. That is done entirely by hand, though, by pressing a special button on his Elgato Stream Deck... until now...

A small program runs in the background and transcribes anything said into the microphone. When it recognises a wake-word listed in a config file, it plays the matching media on a page served at https://localhost:5001.

Then, inside OBS, add a Browser Source pointing to that endpoint. The effect? Well, given the following configuration:

{
  "javascript": {
    "SoundUri": "https://mp3.com/horse_whinnying.mp3",
    "GifUri": "https://gifs.com/john_travolta.gif"
  }
}

Whenever I say "JavaScript", it triggers the javascript action, showing the John Travolta GIF and playing the whinnying horse sound.

Here's a little clip of the outcome:

The source code for this is available on my GitHub repo here
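
To make the idea concrete, here is a minimal Python sketch of the matching step: take a chunk of transcribed speech and look up which configured trigger words it contains. This is not the tool's actual code (the real thing is a .NET tool, see the repo). The find_triggers function and the whole-word, case-insensitive matching rule are assumptions made purely for illustration.

```python
import re


def find_triggers(transcript: str, triggers: dict) -> list:
    """Return the configured trigger words that appear as whole words in the transcript.

    Matching is case-insensitive. This rule is an assumption for the sketch,
    not necessarily how the real tool decides.
    """
    words = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    return [name for name in triggers if name.lower() in words]


# Same shape as the configuration shown above.
triggers = {
    "javascript": {
        "SoundUri": "https://mp3.com/horse_whinnying.mp3",
        "GifUri": "https://gifs.com/john_travolta.gif",
    }
}

for name in find_triggers("I really love JavaScript, you know.", triggers):
    media = triggers[name]
    print(f"trigger '{name}': play {media['SoundUri']} and show {media['GifUri']}")
```

Matching on whole words means that saying "Java" on its own would not fire the javascript trigger in this sketch.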

Creating your own triggers

The initial idea was to containerise the little program, but I quickly realised that it is difficult (or nearly impossible) to share a microphone from a Windows host with a Linux-based container. The workaround was to package it as a .NET Global Tool, which you can install right now by running the following command:

dotnet tool install -g VoiceTrigger

In the background, the tool uses the Speech APIs from Microsoft Cognitive Services to transcribe voice to text. For this you will need an active Microsoft Cognitive Services subscription. Click here to try for free!

Create a JSON file somewhere, called config.json, with the following content:

{
  "MsCog":{
    "SubscriptionId":"< SUBSCRIPTION_ID >",
    "Region":"< REGION_NAME >"
  },
  "Triggers": {
    "foo": {
      "SoundUri": "...",
      "GifUri": "..."
    },
    "bar": {
      "SoundUri": "...",
      "GifUri": "..."
    },
    //...
  }
}
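
Before pointing the tool at the file, it can be handy to sanity-check its shape. The Python sketch below is not part of VoiceTrigger. The validate_config helper is hypothetical, and the rule that every trigger needs at least one of SoundUri or GifUri is my assumption based on the "GIF and/or jingle" idea. Note that Python's json module is strict, so a placeholder comment such as the //... above would need removing first.

```python
import json


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []

    # The MsCog section carries the Cognitive Services credentials.
    ms_cog = config.get("MsCog", {})
    for key in ("SubscriptionId", "Region"):
        if not ms_cog.get(key):
            problems.append(f"MsCog.{key} is missing or empty")

    # Each trigger word maps to the media it should play.
    triggers = config.get("Triggers", {})
    if not triggers:
        problems.append("Triggers has no entries")
    for word, media in triggers.items():
        # Assumption: a trigger with no media at all is a mistake.
        if not (media.get("SoundUri") or media.get("GifUri")):
            problems.append(f"Trigger '{word}' has neither SoundUri nor GifUri")

    return problems


sample = json.loads("""
{
  "MsCog": {"SubscriptionId": "<SUBSCRIPTION_ID>", "Region": "<REGION_NAME>"},
  "Triggers": {
    "javascript": {
      "SoundUri": "https://mp3.com/horse_whinnying.mp3",
      "GifUri": "https://gifs.com/john_travolta.gif"
    }
  }
}
""")

for problem in validate_config(sample):
    print(problem)
```

To check a real file, swap the inline sample for json.load on the opened config.json.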

Now open up the terminal and run the following command:

voicetrigger triggersfile=c:\stream\config.json

The triggersfile argument is the path to the JSON file you just created.

You should see output like this:

Now listening on: https://localhost:5001  
Application started. Press Ctrl+C to shut down.  

Using your favourite browser, go to https://localhost:5001 and try saying one of your configured trigger words. You should see and hear the relevant media play when the word is recognised.
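
If the page does not load, a quick scripted check can tell you whether anything is answering on that port at all. The following Python sketch is only a troubleshooting aid. The is_reachable helper is hypothetical, and certificate verification is switched off on the assumption that the tool serves a locally issued development certificate that Python may not trust. Do not reuse that setting for anything other than localhost.

```python
import ssl
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a server answers at the URL, False if the connection fails."""
    # Assumption: localhost uses a development certificate, so skip verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, TLS failure and similar.
        return False


if is_reachable("https://localhost:5001"):
    print("Something is listening on https://localhost:5001")
else:
    print("Nothing answered on https://localhost:5001; is voicetrigger running?")
```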

Adding an overlay in OBS

On a selected scene, add a Browser Source by clicking on the '+' icon:

In the properties popup window, specify https://localhost:5001 as the URL and click OK.

Bonus: Integrating Elgato Stream Deck

This step is totally optional, but worth it, as it automates running the voicetrigger command every time we want the tool to start running.

Using the Stream Deck software, on the deck of your choice, drag an Open action (found under System) to the profile, and specify the voicetrigger command as the App/File setting:
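
If you would rather point the Open action at a single file than at a command with arguments, a tiny launcher script is one option. The Python sketch below is a suggestion, not something the tool ships with. The build_command helper and the CONFIG_PATH constant are hypothetical names, and the path is simply the example location used earlier in this post.

```python
import shutil
import subprocess

# Assumption: the config file lives where the earlier example put it.
CONFIG_PATH = r"c:\stream\config.json"


def build_command(config_path: str) -> list:
    """Build the same command line that was typed by hand earlier."""
    return ["voicetrigger", f"triggersfile={config_path}"]


if __name__ == "__main__":
    command = build_command(CONFIG_PATH)
    if shutil.which(command[0]) is None:
        # The global tool is not on PATH, so there is nothing to launch.
        print("voicetrigger not found; install it with: dotnet tool install -g VoiceTrigger")
    else:
        # Runs until the tool is stopped with Ctrl+C.
        subprocess.run(command, check=False)
```

Whether Stream Deck can open a .py file directly depends on how Python files are associated on your machine.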

Happy streaming!