Michael Sheldon's Stuff

Michael Sheldon (mike at mikeasoft dot com)

December 30, 2017

Speech Recognition – Mozilla’s DeepSpeech, GStreamer and IBus
Mike @ 9:13 pm

Recently Mozilla released an open source implementation of Baidu’s DeepSpeech architecture, along with a pre-trained model using data collected as part of their Common Voice project.

To make it easier for application developers to start working with the DeepSpeech model, I’ve developed a GStreamer plugin and an IBus plugin, and created some PPAs. To demonstrate what’s possible, here’s a video of the IBus plugin providing speech recognition to any application under Linux:

Video of DeepSpeech IBus Plugin

GStreamer DeepSpeech Plugin

I’ve created a GStreamer element that can be placed into an audio pipeline; it then reports any recognised speech via bus messages. It automatically segments audio based on configurable silence thresholds, making it suitable for continuous dictation.

Here are a couple of example pipelines using gst-launch.

To perform speech recognition on a file, printing all bus messages to the terminal:

gst-launch-1.0 -m filesrc location=/path/to/file.ogg ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink

To perform speech recognition on audio recorded from the default system microphone, with changes to the silence thresholds:

gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink
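The segmentation behaviour described above can be illustrated with a minimal Python sketch. It is not the gst-deepspeech implementation: it assumes audio arrives as frames of float samples in the range -1.0 to 1.0, and it assumes that a silence threshold is an RMS level below which a frame counts as quiet and that a silence length is a count of consecutive quiet frames. Those interpretations, and the function names rms and segment_on_silence, are hypothetical and may not match how the element treats its silence-threshold and silence-length properties.

```python
from typing import Iterable, Iterator, List, Sequence

# Hypothetical sketch of silence-based segmentation; NOT the gst-deepspeech code.
# Assumes each frame is a short run of float samples in [-1.0, 1.0].
Frame = Sequence[float]


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def segment_on_silence(
    frames: Iterable[Frame],
    silence_threshold: float = 0.1,
    silence_length: int = 5,
) -> Iterator[List[Frame]]:
    """Yield utterances separated by runs of quiet frames.

    A frame is "quiet" when its RMS level is below silence_threshold (assumed
    meaning). An utterance ends after silence_length consecutive quiet frames
    (assumed meaning); shorter pauses stay inside the utterance.
    """
    if silence_length < 1:
        raise ValueError("silence_length must be at least 1")
    utterance: List[Frame] = []
    quiet_run = 0
    for frame in frames:
        if rms(frame) >= silence_threshold:
            # Speech: reset the quiet counter and keep the frame.
            quiet_run = 0
            utterance.append(frame)
        elif utterance:
            # Pause inside an utterance: keep it in case speech resumes.
            quiet_run += 1
            utterance.append(frame)
            if quiet_run >= silence_length:
                # Long enough pause: emit the utterance minus trailing silence.
                yield utterance[:-quiet_run]
                utterance = []
                quiet_run = 0
        # Quiet frames before any speech are simply dropped.
    if utterance:
        # Flush what is left when the stream ends, trimming trailing silence.
        yield utterance[: len(utterance) - quiet_run]
```

In a design like this, each emitted segment would then be passed to the speech model and the resulting text posted as a message. Keeping short pauses inside the current utterance lets a speaker hesitate mid-sentence without the dictation being split.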

The source code is available here: https://github.com/Elleo/gst-deepspeech.

IBus Plugin

I’ve also created a proof-of-concept IBus plugin that allows speech recognition to be used as an input method for virtually any application. It uses the GStreamer plugin above to perform speech recognition and then commits the text to the currently focused input field whenever the deepspeech element posts a bus message.

It’ll need a lot more work before it’s really useful, especially adding various voice editing commands, but hopefully it’ll provide a useful starting point for something more complete.
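The flow from bus message to committed text can be sketched in a few lines of self-contained Python. This is a hypothetical illustration, not code from ibus-deepspeech: messages are modelled as plain dictionaries with source and text keys, and commit_text is a stand-in for the IBus engine call that inserts text into the focused field. The names make_bus_handler and on_message are invented for the example.

```python
from typing import Callable, List, Mapping

# Hypothetical sketch; message shape and names are assumptions, not ibus-deepspeech's.
Message = Mapping[str, str]


def make_bus_handler(commit_text: Callable[[str], None]) -> Callable[[Message], None]:
    """Return a bus-message handler that commits recognised speech as text."""

    def on_message(message: Message) -> None:
        # Ignore messages that don't come from the speech-recognition element.
        if message.get("source") != "deepspeech":
            return
        text = message.get("text", "").strip()
        # Skip empty results (e.g. a segment that contained only noise).
        if text:
            commit_text(text)

    return on_message


# Usage: collect committed text in a list instead of a real input field.
committed: List[str] = []
handler = make_bus_handler(committed.append)
handler({"source": "deepspeech", "text": " hello world "})
handler({"source": "pulsesrc", "text": "ignored"})
print(committed)  # ['hello world']
```

Injecting the commit function keeps the recognition side independent of IBus, so the handler can be exercised without a running input-method framework.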

The source code is available here: https://github.com/Elleo/ibus-deepspeech

PPAs

To make it extra easy to get started playing around with these projects I’ve also created a couple of PPAs for Ubuntu 17.10:

DeepSpeech PPA – This contains packages for libdeepspeech, libdeepspeech-dev, libtensorflow-cc and deepspeech-model (be warned, the model is around 1.3GB).

gst-deepspeech PPA – This contains packages for my GStreamer and IBus plugins (gstreamer1.0-deepspeech and ibus-deepspeech). Please note that you’ll also need the DeepSpeech PPA enabled to fulfil the dependencies of these packages.

I’d love to hear about any projects that find these plugins useful 🙂


12 Comments

  1. I love you.

    Comment by grey — December 31, 2017 @ 4:41 am

  2. This seems really cool. Are there any Python bindings for your STT project? It would be awesome if I could get this to work with Mycroft.

    Comment by lifre — December 31, 2017 @ 6:02 am

  3. There are direct Python bindings for DeepSpeech (simply pip install deepspeech); however, these require you to handle all the audio input and processing yourself, so you might find it easiest to use my GStreamer plugin with the GStreamer Python bindings. I’ve added a quick Python example to the repository, which you can view here:

    https://github.com/Elleo/gst-deepspeech/blob/master/examples/python/print_speech.py

    It takes input from the default microphone and prints text out to the console. Hope that helps 🙂

    Comment by Mike — December 31, 2017 @ 2:54 pm

  4. Fantastic work! I was hoping something like this would be created!

    Comment by en3r0 — December 31, 2017 @ 9:39 pm

  5. This is completely irrelevant to the (fascinating) article, but I wanted to ask if you’ve considered working on TizMee again, but for UBports, postmarketOS and Nemo/Sailfish?

    The TizMee demo seems really useful, and now that Tizen has native versions of Here Maps, WhatsApp, FB, IG etc, TizMee could patch the app gap (which for me is just those applications). It would also have far less overhead than using the Android versions of these apps on a compatibility layer.

    Many thanks!

    Comment by JS — January 2, 2018 @ 11:50 am

  6. […] Speech recognition on Linux? […]

    Pingback by Late Night Linux – Episode 28 – Late Night Linux — January 9, 2018 @ 3:29 am

  7. This is great! Thank you. Speech recognition would fill one of the few gaps that Linux still has. The only reason I’m still using Microsoft is Dragon NaturallySpeaking (DNS): great software, but it does not behave very well in Wine. If anyone starts a crowdfunding campaign to develop DNS-like functionality (i.e. a stand-alone Linux speech recognition application that also allows dictation in other applications), I’d happily and generously support it. Will keep an eye on the progress.

    Comment by Elfons — January 11, 2018 @ 2:03 pm

  8. […] in applications. [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer - Big4All.Org — January 18, 2018 @ 3:06 am

  9. […] [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer – High Tech Newz — January 18, 2018 @ 3:29 am

  10. […] [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer – Hacking Space — January 18, 2018 @ 4:33 am

  11. […] [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer – Central Geek — January 18, 2018 @ 5:23 am

  12. […] in applications. [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition for Linux Gets a Little Closer – Nerd Junkie — January 18, 2018 @ 5:03 pm
