Michael Sheldon's Stuff » 2017

December 30, 2017

Speech Recognition – Mozilla’s DeepSpeech, GStreamer and IBus

Mike @ 9:13 pm

Recently Mozilla released an open source implementation of Baidu’s DeepSpeech architecture, along with a pre-trained model using data collected as part of their Common Voice project.

In an attempt to make it easier for application developers to start working with the DeepSpeech model I’ve developed a GStreamer plugin, an IBus plugin and created some PPAs. To demonstrate what’s possible here’s a video of the IBus plugin providing speech recognition to any application under Linux:

Video of DeepSpeech IBus Plugin

GStreamer DeepSpeech Plugin

I’ve created a GStreamer element which can be placed into an audio pipeline, it will then report any recognised speech via bus messages. It automatically segments audio based on configurable silence thresholds making it suitable for continuous dictation.

Here’s a couple of example pipelines using gst-launch.

To perform speech recognition on a file, printing all bus messages to the terminal:

gst-launch-1.0 -m filesrc location=/path/to/file.ogg ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink

To perform speech recognition on audio recorded from the default system microphone, with changes to the silence thresholds:

gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink

The source code is available here: https://github.com/Elleo/gst-deepspeech.

IBus Plugin

I’ve also created a proof of concept IBus plugin which allows speech recognition to be used as an input method for virtually any application. It uses the above GStreamer plugin to perform speech recognition and then commits the text to the currently focused input field whenever a bus message is received from the deepspeech element.

It’ll need a lot more work before it’s really useful, especially in terms of adding in various voice editing commands, but hopefully it’ll provide a useful starting point for something more complete.

The source code is available here: https://github.com/Elleo/ibus-deepspeech

PPAs

To make it extra easy to get started playing around with these projects I’ve also created a couple of PPAs for Ubuntu 17.10:

DeepSpeech PPA – This contains packages for libdeepspeech, libdeepspeech-dev, libtensorflow-cc and deepspeech-model (be warned, the model is around 1.3GB).

gst-deepspeech PPA – This contains packages for my GStreamer and IBus plugins (gstreamer1.0-deepspeech and ibus-deepspeech). Please note that you’ll also need the DeepSpeech PPA enabled to fulfil the dependencies of these packages.

I’d love to hear about any projects that find these plugins useful 🙂

Comments (127)