Michael Sheldon's Stuff

Michael Sheldon (mike at mikeasoft dot com)

December 30, 2017

Speech Recognition – Mozilla’s DeepSpeech, GStreamer and IBus
Mike @ 9:13 pm

Recently Mozilla released an open source implementation of Baidu’s DeepSpeech architecture, along with a pre-trained model using data collected as part of their Common Voice project.

In an attempt to make it easier for application developers to start working with the DeepSpeech model, I’ve developed a GStreamer plugin and an IBus plugin, and created some PPAs. To demonstrate what’s possible, here’s a video of the IBus plugin providing speech recognition to any application under Linux:

Video of DeepSpeech IBus Plugin

GStreamer DeepSpeech Plugin

I’ve created a GStreamer element which can be placed into an audio pipeline; it will then report any recognised speech via bus messages. It automatically segments audio based on configurable silence thresholds, making it suitable for continuous dictation.

Here are a couple of example pipelines using gst-launch.

To perform speech recognition on a file, printing all bus messages to the terminal:

gst-launch-1.0 -m filesrc location=/path/to/file.ogg ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink

To perform speech recognition on audio recorded from the default system microphone, with changes to the silence thresholds:

gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink

The source code is available here: https://github.com/Elleo/gst-deepspeech.
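
If you want to use the element from your own code rather than gst-launch, here’s a minimal Python sketch using the GStreamer bindings (a fuller version lives in examples/python/print_speech.py in the repository). It assumes only what the gst-launch output shows: the element posts an element message whose structure is named “deepspeech” and carries the recognised phrase in a “text” field:

#!/usr/bin/env python3
# Minimal sketch: print phrases recognised by the deepspeech element.
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

Gst.init(None)
pipeline = Gst.parse_launch(
    "pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink")

def on_element_message(bus, message):
    # The deepspeech element posts its results as an element message whose
    # structure is named "deepspeech" and contains a "text" field
    structure = message.get_structure()
    if structure and structure.get_name() == "deepspeech":
        print(structure.get_string("text"))

bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message::element", on_element_message)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()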

IBus Plugin

I’ve also created a proof of concept IBus plugin which allows speech recognition to be used as an input method for virtually any application. It uses the above GStreamer plugin to perform speech recognition and then commits the text to the currently focused input field whenever a bus message is received from the deepspeech element.
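
Conceptually the engine is tiny; here’s a simplified Python sketch of the idea (not the actual engine code: in the real plugin the engine is instantiated by an IBus engine factory rather than directly, but the pipeline string below is the one the plugin constructs internally):

import gi
gi.require_version('IBus', '1.0')
gi.require_version('Gst', '1.0')
from gi.repository import IBus, Gst

Gst.init(None)

class DeepSpeechEngine(IBus.Engine):
    def __init__(self):
        super().__init__()
        # The same pipeline the plugin builds internally
        self.pipeline = Gst.parse_launch(
            "pulsesrc ! audioconvert ! audiorate ! audioresample ! "
            "deepspeech silence-length=20 ! fakesink")
        bus = self.pipeline.get_bus()
        bus.add_signal_watch()
        bus.connect("message::element", self.on_speech)
        # Start listening immediately (the real plugin toggles this
        # from a microphone button)
        self.pipeline.set_state(Gst.State.PLAYING)

    def on_speech(self, bus, message):
        structure = message.get_structure()
        if structure and structure.get_name() == "deepspeech":
            # Commit the recognised phrase to the focused input field
            self.commit_text(IBus.Text.new_from_string(structure.get_string("text")))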

It’ll need a lot more work before it’s really useful, especially in terms of adding various voice editing commands, but hopefully it’ll provide a useful starting point for something more complete.

The source code is available here: https://github.com/Elleo/ibus-deepspeech

PPAs

To make it extra easy to get started playing around with these projects I’ve also created a couple of PPAs for Ubuntu 17.10:

DeepSpeech PPA – This contains packages for libdeepspeech, libdeepspeech-dev, libtensorflow-cc and deepspeech-model (be warned, the model is around 1.3GB).

gst-deepspeech PPA – This contains packages for my GStreamer and IBus plugins (gstreamer1.0-deepspeech and ibus-deepspeech). Please note that you’ll also need the DeepSpeech PPA enabled to fulfil the dependencies of these packages.
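
On 17.10, enabling both PPAs and installing everything should just be a matter of (add-apt-repository should fetch the signing keys for you, and the libraries and model should get pulled in as dependencies):

sudo add-apt-repository ppa:michael-sheldon/deepspeech
sudo add-apt-repository ppa:michael-sheldon/gst-deepspeech
sudo apt update
sudo apt install gstreamer1.0-deepspeech ibus-deepspeech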

I’d love to hear about any projects that find these plugins useful 🙂


127 Comments

  1. I love you.

    Comment by grey — December 31, 2017 @ 4:41 am

  2. This seems really cool. Are there any Python bindings for your STT project? It would be awesome if I could get this to work with Mycroft.

    Comment by lifre — December 31, 2017 @ 6:02 am

  3. There are direct Python bindings for deepspeech (simply pip install deepspeech); however, these require you to do all the audio input and processing yourself, so you might find it easiest to use my GStreamer plugin with the GStreamer Python bindings. I’ve added a quick Python example to the repository, which you can view here:

    https://github.com/Elleo/gst-deepspeech/blob/master/examples/python/print_speech.py

    It takes input from the default microphone and prints text out to the console. Hope that helps 🙂

    Comment by Mike — December 31, 2017 @ 2:54 pm

  4. Fantastic work! I was hoping something like this would be created!

    Comment by en3r0 — December 31, 2017 @ 9:39 pm

  5. This is completely irrelevant to the (fascinating) article, but I wanted to ask if you’ve considered working on TizMee again, but for UBports, postmarketOS and Nemo/Sailfish?

    The TizMee demo seems really useful, and now that Tizen has native versions of Here Maps, WhatsApp, FB, IG etc, TizMee could patch the app gap (which for me is just those applications). It would also have far less overhead than using the Android versions of these apps on a compatibility layer.

    Many thanks!

    Comment by JS — January 2, 2018 @ 11:50 am

  6. […] Speech recognition on Linux? […]

    Pingback by Late Night Linux – Episode 28 – Late Night Linux — January 9, 2018 @ 3:29 am

  7. This is great! Thank you. Speech Recognition would fill in one of the few gaps that Linux still has. The only reason I’m still using Microsoft is because of Dragon Naturally Speaking (DNS) – great software but does not behave very well in Wine. If anyone would start a crowdfunding campaign to develop a DNS-like functionality (i.e. a stand-alone linux speech recognition application that also allows dictation in other applications) I’d happily and generously support. Will keep an eye on the progress.

    Comment by Elfons — January 11, 2018 @ 2:03 pm

  8. […] in applications. [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer - Big4All.Org — January 18, 2018 @ 3:06 am

  9. […] [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer – High Tech Newz — January 18, 2018 @ 3:29 am

  10. […] [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer – Hacking Space — January 18, 2018 @ 4:33 am

  11. […] [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer – Central Geek — January 18, 2018 @ 5:23 am

  12. […] in applications. [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition for Linux Gets a Little Closer – Nerd Junkie — January 18, 2018 @ 5:03 pm

  13. Hi, Michael,
    Is it possible to replace the alphabet.txt with UTF-8 Chinese characters (in my case, about 3000 characters) and start to train a Chinese recognizer?
    I did it, but the following error message appeared:
    return self._str_to_label[string]
    KeyError: u' '

    Comment by shinki — January 22, 2018 @ 12:47 pm

  14. Hi shinki,

    You’d probably be best asking that sort of question on the Common Voice forums as that’s more specific to training Deep Speech rather than the plugins I’ve created on top of it: https://discourse.mozilla.org/c/voice

    Comment by Mike — January 22, 2018 @ 1:52 pm

  15. Tried installing this on debian stretch. Debs seem to have installed ok but when I run the command to write out to the console I get this error. What am I missing?

    WARNING: erroneous pipeline: no element “deepspeech”

    Comment by dodddummy — February 1, 2018 @ 3:34 am

Hi dodddummy,

It’s possible that debian might install its GStreamer plugins to a different location; what does ‘dpkg --listfiles gstreamer1.0-plugins-good’ show as the path that plugins (.so files) are being installed into?

    If you could also try deleting your plugin registry and rerunning the command with the environment variable GST_DEBUG=5 set and uploading the output to a pastebin I can take a look and see if there are any errors loading the plugin.

    Comment by Mike — February 1, 2018 @ 11:05 am

  17. Hi Mike,
    /usr/lib/x86_64-linux-gnu/gstreamer-1.0
    /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgst1394.so
    /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstaasink.so

    I’ll delete the plugin directory as you suggest tomorrow.

    Comment by dodddummy — February 1, 2018 @ 12:21 pm

  18. Hi dodddummy,

    Just to be clear, I suggested deleting the plugin *registry* (a single file in your home directory, iirc in ~/.gstreamer-1.0/ which then gets regenerated automatically) not the plugin directory (/usr/lib/x86_64-linux-gnu/gstreamer-1.0). That’s an important difference 😉

    Comment by Mike — February 1, 2018 @ 1:49 pm

  19. Hi Mike,
    I deleted the plugin registry. Was in ~/.cache/gstreamer-1.0 and ran with GST_DEBUG=5 set. Here’s the output: https://pastebin.com/jJZb8v4p

    Comment by dodddummy — February 2, 2018 @ 3:27 am

  20. Hi dodddummy,

That log seems to be truncated; it only goes up to time 0:00:00.006663655 (and that line gets cut short). There should be considerably more in a full log at debug level 5. Perhaps try the Ubuntu pastebin: https://pastebin.ubuntu.com/ in case pastebin.com cuts things short?

    Comment by Mike — February 2, 2018 @ 11:33 am

  21. Great work. Speech recognition is really necessary for Linux to compete with Windows and Android.

Is there any chance of a PPA for Ubuntu 16.04? It is the LTS version and will outlive 17.10.

    Comment by elpidiovaldez — February 4, 2018 @ 2:50 pm

  22. Great work.
I’m a newbie. Could someone give a step-by-step guide on how to install all the IBus plugin and GStreamer DeepSpeech plugin packages on Linux?

    Grateful.

    Comment by jonh — February 13, 2018 @ 9:52 pm

  23. Hi! I tried to install your package by following the instructions given at https://github.com/Elleo/gst-deepspeech/blob/master/INSTALL but it seems there is no configure file in the repository…

    Comment by Mark2 — February 20, 2018 @ 8:50 am

  24. Hi Mark2,

Sorry about that; you’ll need to run ./autogen.sh first to create the configure file. I’ll see about updating the documentation to make that clear.

    Cheers,

    Mike

    Comment by Mike — February 20, 2018 @ 12:55 pm

  25. Hi Mike!

Thanks for the quick response. I progressed a little bit but got stuck after running the make command. I get the error message:
    gstdeepspeech.cc:63:24: fatal error: deepspeech.h: No such file or directory
    I do have Deep Speech installed on my machine (pip install deepspeech), though. Should I link it to the C compiler somehow, or what should be done?

    Thanks!

    Comment by Mark2 — February 21, 2018 @ 6:46 am

  26. Hi Mike,

    I too have the same issue while running the make command,

    gstdeepspeech.cc:63:10: fatal error: deepspeech.h: No such file or directory

Can you please reply with how to fix this?
    Thanks!!

    Comment by Navan — February 21, 2018 @ 2:38 pm

  27. Hi Mark2 and Navan,

    deepspeech.h is part of DeepSpeech’s native-client package (pip will only install the python bindings, not the C development headers), you need to either install it to your system include path (e.g. /usr/include) or tell autotools where to find it by setting the environment variable:

    CPPFLAGS=-I/path/to/native-client/

You’ll also need to point LD_LIBRARY_PATH at the directory containing a compiled copy of libdeepspeech.so.
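
    For example (a sketch, assuming the native client files were unpacked to /path/to/native-client/), a from-source build might look something like:

    export CPPFLAGS=-I/path/to/native-client/
    export LD_LIBRARY_PATH=/path/to/native-client/
    ./autogen.sh && ./configure && make && sudo make install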

    If you’re on an Ubuntu system you can just install libdeepspeech-dev from my PPA.

    Cheers,
    Mike

    Comment by Mike — February 21, 2018 @ 3:05 pm

  28. Hi! I really like the initiative, but the execution leaves something to be desired. I have tried installation from both the PPA and source and I am coming up short. The repository really needs a proper installation guide for a common Linux distro (Ubuntu?), including all dependencies used for the build process itself and step-by-step build instructions. As it stands, I think I will try to use the DeepSpeech API directly instead, as I cannot figure out how to get this installed.

    Comment by Falense — February 22, 2018 @ 6:46 am

  29. To be a bit more specific, the PPA complains about unsigned packages and a missing public key (are the packages not signed?), and the configuration utility required to run autoconf appears not to be bundled with major Linux distros.

    Comment by Falense — February 22, 2018 @ 6:48 am

  30. Hi Falense,

    Did you use add-apt-repository to enable the PPAs? As this should automatically fetch the key for you.

    The following packages will provide everything you need to build: build-essential, pkg-config, libgstreamer1.0-dev, libgstreamer-plugins-base1.0-dev, libdeepspeech-dev (from my DeepSpeech PPA)

    Cheers,
    Mike

    Comment by Mike — February 22, 2018 @ 4:14 pm

  31. Hi Mike,

    Thank you for the suggestion.

add-apt-repository does not work on Linux Mint 18.3. The error message is “Cannot add PPA: This PPA does not support xenial”. Adding the PPA manually gives the error about the missing signing key. Do you have a link to the public key somewhere?

Also, I think you need the packages autoconf, automake and libtool as well (at least on Linux Mint). I got one step further than most here (past the missing headers). It seems I need the DeepSpeech libraries too; I am missing -ldeepspeech, -ldeepspeech_utils and -ltensorflow_cc (even though I include the directory which has deepspeech in LD_LIBRARY_PATH).

    Best

    Comment by Falense — February 23, 2018 @ 2:37 am

  32. Hi Mark2,

I tried installing it from your PPA, but faced the following error:
    GStreamer-WARNING **: Failed to load plugin ‘/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstdeepspeech.so’: libtensorflow_framework.so: cannot open shared object file: No such file or directory
    can you suggest how to go ahead.

    Thanks!

    Comment by Navan — February 23, 2018 @ 5:18 am

  33. Hi Mike,

    can you help with the above error.
    Thanks!

    Comment by Navan — February 23, 2018 @ 5:20 am

  34. Hi Falense,

The signing key is available on the PPA page; just click the “Technical details about this PPA” link to reveal it. I’m afraid I only compiled packages for Artful due to the PPA size restriction preventing me from uploading more than one set (the limit on this PPA has been raised now though, so I’ll see about making some builds for Xenial as well).

    Navan – It’s possible I might need to add that to the package, I’ll investigate this weekend.

    Comment by Mike — February 23, 2018 @ 3:08 pm

  35. Penguins love things this cool. This needs to be integrated into 18.04. Could be a defining feature upgrade for the community

    Comment by Adam — February 25, 2018 @ 3:32 am

  36. I am having a similar issue to dodddummy. Running:

    gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink

    gives:

    WARNING: erroneous pipeline: no element “deepspeech”

    Now I did notice, whilst trying a random GST command to rebuild the cache – which was at both `~/.gstreamer-1.0` and at `~/.cache/gstreamer-1.0` – the following error:

    (gst-plugin-scanner:4459): GStreamer-WARNING **: Failed to load plugin ‘/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstdeepspeech.so’: libtensorflow_framework.so: cannot open shared object file: No such file or directory

    This error does not persist once the scanner has been run once. `libtensorflow_framework.so` is not installed by any package nor present on my system.
    I looked at the tensorflow github page etc. but could not find how to build and install it.
    Exporting GST_DEBUG=5 and rerunning the pipeline produces the following error: http://paste.ubuntu.com/p/vPNMTPvXWw/

    The intended use of this is on a small laptop to help my highly dyslexic daughter with her schoolwork – very excited about the offline nature of this as it is likely that we will be able to use this in school without drama 8)

    Comment by Jasper — March 5, 2018 @ 7:23 am

  37. Hi Jasper,

Apologies, I meant to look into that last weekend but it slipped through the net; I’ll try and find some time tonight to update the packaging to include that missing library.

    Cheers,
    Mike

    Comment by Mike — March 5, 2018 @ 11:06 am

  38. Hi Mark, Jasper,

    I’ve published some updated packages now which include libtensorflow_framework.so, so if you upgrade this should now be working. You will probably need to clear out your gstreamer registry cache again for it to attempt to load the plugin again though.

    Cheers,
    Mike

    Comment by Mike — March 6, 2018 @ 11:27 am

  39. Thanks Mike,
    That did the trick for the gst-launch method.
Now I cannot get the IBus input to work. I can see the Python IBus process spawn when deepspeech is added as an input method, but I cannot figure out how to trigger it. Every now and then it appears to trigger, uses a ton of CPU (locks up the computer) and returns nothing; again, I cannot see the gst pipeline running.

For some reason I had assumed that running the trained model would be relatively light on resources; it seems I was wrong. What sort of spec would you expect to need for this to work smoothly? (The planned target was an ancient Intel Atom laptop…)

    Many thanks 8)

    Comment by Jasper — March 7, 2018 @ 9:03 am

  40. Hi Jasper,

Yes, I’m afraid the computational requirements are still quite high; on my i7 laptop without GPU acceleration it’s slightly slower than real-time. As far as I’m aware no optimisation has been done on this yet, so there’s probably some low-hanging fruit that will make big improvements in future versions of the model.

    Comment by Mike — March 7, 2018 @ 7:27 pm

  41. Hi Mike,

    I tried to install your gstreamer plugin but when I run ./autogen.sh, I get this error:

    ./autogen.sh: 6: ./autogen.sh: autoreconf: not found
    autogen.sh failed

    What did I miss?

    Best,
    Nina

    Comment by Nina — March 27, 2018 @ 9:05 am

  42. Hi Nina,

No problem, you just need to install autoconf (and autotools if you don’t already have it); if you’re on a Debian or Ubuntu based distro the packages are called autoconf and autotools-dev.

    Cheers,
    Mike

    Comment by Mike — March 27, 2018 @ 9:19 am

  43. Hi Mike,

    thanks for the quick response! It worked.
    Now, when executing the make command I ran into the same error as #26:
    gstdeepspeech.cc:63:24: fatal error: deepspeech.h: No such file or directory

    I read your answer above but since I am a beginner I could not exactly figure out what to do next.
    Just to structure my thoughts:

    I have deepspeech running (pip installed).
I understood that I have to install the native client from the deepspeech repository, which is very hard to understand through their confusing readme.
    I need to set CPPFLAGS=-I/path/to/native-client/ after native client is installed and then set LD_LIBRARY_PATH to a compiled version of libdeepspeech.so.

    May you guide me through the installation a bit?

    Thanks in advance!!

    Nina

    Comment by Nina — March 27, 2018 @ 10:11 am

  44. Hi Nina,

    Before we go any further, what distribution are you using? As there are some precompiled packages for Ubuntu which might make things simpler if that’s an option?

    Cheers,
    Mike

    Comment by Mike — March 27, 2018 @ 4:51 pm

  45. Hi Mike,

I’m using Ubuntu 16.04.
    Can I use your precompiled package even if I’m not on 17.10, as you mention above?

    Best,
    Nina

    Comment by Nina — March 29, 2018 @ 7:18 am

  46. Hi,

    I’m getting this error when performing apt-get update after downloading your PPAs:

    W: The repository ‘http://ppa.launchpad.net/michael-sheldon/deepspeech/ubuntu xenial Release’ does not have a Release file.
    N: Data from such a repository can’t be authenticated and is therefore potentially dangerous to use.
    N: See apt-secure(8) manpage for repository creation and user configuration details.
    E: Failed to fetch http://ppa.launchpad.net/michael-sheldon/deepspeech/ubuntu/dists/xenial/main/binary-amd64/Packages 404 Not Found
    E: Some index files failed to download. They have been ignored, or old ones used instead.

    An explanation about how to get your plugins working is highly appreciated!

    Looking forward,
    Nina

    Comment by Nina — March 29, 2018 @ 12:54 pm

  47. Hi Nina,

    You’re getting those errors due to there not being builds for xenial at the moment. Now that there’s more space available in the PPA I’ll see about setting up a xenial build (hopefully tonight, otherwise over the weekend).

    Cheers,
    Mike

    Comment by Mike — March 29, 2018 @ 2:38 pm

  48. Hi Nina,

    Sorry for the delay, the xenial builds are ready in the PPA now, so you should be able to just apt install gstreamer1.0-deepspeech.

    Cheers,
    Mike

    Comment by Mike — April 4, 2018 @ 7:36 pm

  49. Hi Mike,

The Xenial build has failed, so it is not possible to install it.

    Cheers,
    Luuk

    Comment by Luuk — April 9, 2018 @ 6:48 pm

  50. Hi Luuk,

    Thanks for spotting that, the Xenial build is fixed now.

    Cheers,
    Mike

    Comment by Mike — April 9, 2018 @ 11:30 pm

  51. Thanks Mike! Are you planning on providing the ibus plugin for xenial as well?

    Comment by Luuk — April 10, 2018 @ 7:02 pm

  52. Hi Luuk,

    Sure thing, I’ve now built it for xenial as well, so you should be able to just ‘apt install ibus-deepspeech’

    Cheers,
    Mike

    Comment by Mike — April 11, 2018 @ 11:12 pm

  53. Hi Mike,
I get the following error when trying ./autogen.sh; could you please help me with this:

    gstdeepspeech.cc: In function ‘gboolean gst_deepspeech_sink_event(GstPad*, GstObject*, GstEvent*)’:
    gstdeepspeech.cc:357:100: error: ‘gst_buffer_copy_deep’ was not declared in this scope
    epspeech->thread_pool, (gpointer) gst_buffer_copy_deep(deepspeech->buf), NULL);
    ^
    gstdeepspeech.cc: In function ‘GstFlowReturn gst_deepspeech_chain(GstPad*, GstObject*, GstBuffer*)’:
    gstdeepspeech.cc:413:98: error: ‘gst_buffer_copy_deep’ was not declared in this scope
    epspeech->thread_pool, (gpointer) gst_buffer_copy_deep(deepspeech->buf), NULL);
    ^
    Makefile:464: recipe for target ‘libgstdeepspeech_la-gstdeepspeech.lo’ failed
    make[2]: *** [libgstdeepspeech_la-gstdeepspeech.lo] Error 1

    Thank you in advance.

    best,
    Mary

    Comment by Mary — April 17, 2018 @ 7:38 am

  54. Hi Mary,

    What version of GStreamer are you compiling against? It looks like gst_buffer_copy_deep was introduced in 1.6, so if it’s an earlier version I’m afraid you’ll need to upgrade first.

    Cheers,
    Mike

    Comment by Mike — April 17, 2018 @ 10:11 am

  55. Hi Mike,

Thanks a lot! Upgrading solved the issue! But like other people I get this error now:
    WARNING: erroneous pipeline: no element “deepspeech”

    Comment by Mary — April 23, 2018 @ 5:38 am

  56. Hi Mary,

    Did you run ‘sudo make install’ after building the plugin? If so, could you then try removing your gstreamer plugin registry file (most likely in either ~/.gstreamer-1.0/ or ~/.cache/gstreamer-1.0) and run:

    GST_DEBUG=5 gst-inspect-1.0 deepspeech > gst-log.txt 2>&1

    Then upload the gst-log.txt file to https://pastebin.ubuntu.com/

    Thanks,
    Mike

    Comment by Mike — April 23, 2018 @ 11:20 am

  57. Hi Mike,

Thanks for the guidance. I did everything as you suggested, but the problem persisted. Finally, a simple trick worked: just run it in administrative mode! 😉

    sudo gst-launch-1.0 -m filesrc location=/path/to/file.ogg ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink

    Thanks a lot!

    best,
    Mary

    Comment by Mary — April 24, 2018 @ 3:48 am

  58. Hi Mary,

    That’s great; glad to hear that it’s working now. It shouldn’t really need sudo permissions to work, so if you ever figure out what exactly is going on there I’d be very interested to hear about it.

    Cheers,
    Mike

    Comment by Mike — April 24, 2018 @ 12:48 pm

  59. Hi Mike,

    Thanks for your work. I’ve got an application for this: transcribing speech for the deaf in real time. I, too, am getting WARNING: erroneous pipeline: no element “deepspeech”. Running on Ubuntu 17.10.

    Jon

    Comment by Jon — April 30, 2018 @ 1:07 am

  60. Hi Jon,

    Did you compile the plugin yourself or are you using the prepackaged version from the PPA (this should be the simplest way to get things working on an Ubuntu 17.10 system)? Could you follow the steps I outlined above to produce a log file and upload it so that I can diagnose your problem in more detail?

    Thanks,
    Mike

    Comment by Mike — April 30, 2018 @ 11:06 am

  61. Hi Mike,

    I used your PPAs. I produced a log file that was over 23000 lines. Not all of it went into the pastebin, but what I do have is in https://pastebin.ubuntu.com/p/SQXvhVj6Sc/. I’m an absolute newbie at this stuff, so I’ve probably done something stupid, but thank you for your help.

    Jon

    Comment by Jon — May 1, 2018 @ 1:41 am

  62. Hi Jon,

    Unfortunately that cuts off before it gets to any of the relevant details, could you try emailing the log file to me? (mike@mikeasoft.com)

    Thanks!

    Comment by Mike — May 1, 2018 @ 10:26 am

  63. Thanks for already making the PPAs available for 17.10 and 16.04. I’d love to try this out on an Ubuntu 18.04 test installation. Any chance of an 18.04 version of the PPAs, pretty please??!! That should hopefully hold most people for a few months at the very least…

    Thank you very much in advance!

    Looks like a really interesting prospective function for Linux, so thanks for lowering the bar to try this out.

    Comment by Guest — May 5, 2018 @ 10:50 pm

  64. I’m away on holiday at the moment, but when I get back I’ll see about kicking off a build for 18.04 🙂

    Comment by Mike — May 5, 2018 @ 11:11 pm

  65. Thanks, Mike. Enjoy the rest of your holiday! 🙂

    Comment by Guest — May 7, 2018 @ 5:13 pm

  66. Hi Mike,
I’m getting the same error about the missing pipeline element deepspeech:

    Here is the log:
    https://paste.fedoraproject.org/paste/qjzeDqjdL69fJSdlUpuzWw

Probably I did something wrong while compiling deepspeech.

    Do you know where it went wrong?

    Thanks,
    Mirko

    Comment by mirko — June 8, 2018 @ 1:54 pm

  67. Hi Mike,
    So excited by this. Dictation software is massively important for my switch to Linux.

    I’m quite new to Linux (Linux Mint 18.1), and I only understand so much about PPAs so far.
    Having gone to your pages, first https://launchpad.net/~michael-sheldon/+archive/ubuntu/deepspeech and then https://launchpad.net/~michael-sheldon/+archive/ubuntu/gst-deepspeech and having done those two commands in both cases, do I need then to do anything else? Like download or make or build or use elements from the PPA?
    I am also getting the “WARNING: erroneous pipeline: no element “deepspeech””… tending to suggest that “sudo add-apt-repository ppa:michael-sheldon/deepspeech” and then “sudo apt-get update” do not of themselves install the stuff. Or does that perhaps indicate that the latter command failed to download some crucial elements?
    Others talk about “compiling deepspeech”… but maybe that’s if you’re NOT using the PPA route (?). Or do I need to “compile” something? Sorry about the low-level questions… !

    Comment by Mike Rodent — June 9, 2018 @ 6:45 pm

  68. OK OK OK! I get it. That previous post can be scrubbed, although… perhaps it might be nice, just for the avoidance of questions from newbs like me, to put the following lines on your respective ppa pages:

    apt-get install libdeepspeech libdeepspeech-dev libtensorflow-cc deepspeech-model

    and

    apt-get install gstreamer1.0-deepspeech ibus-deepspeech

    (BTW I’m using Linux Mint Cinnamon 18.1 – fork of Ubuntu. My mike is a Buddy USB mike, and I only managed to configure it, to a degree, today for Linux).

    I then tried this (direct microphone capture):
    $ gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink
    … produced a whole load of output and nothing responding to my voice.

    Then, to my UTTER ASTONISHMENT, I tried the other command on a .wav file I had managed to record earlier, using my own voice.

    $ gst-launch-1.0 -m filesrc location=/home/mike/test3.wav ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink

    There were about 100 lines of incomprehensible output… but among this I found a nugget of gold:

    Got message #92 from element “deepspeech0″ (element): deepspeech, timestamp=(guint64)18446744073709551615, stream-time=(guint64)18446744073709551615, running-time=(guint64)18446744073709551615, duration=(guint64)18446744073709551615, text=(string)”time\ u\’mtestingat\ to\ see\ where\ the\ theres\ any\ sign”;
    Got message #93 from element “deepspeech0″ (element): deepspeech, timestamp=(guint64)18446744073709551615, stream-time=(guint64)18446744073709551615, running-time=(guint64)18446744073709551615, duration=(guint64)18446744073709551615, text=(string)”the\ import\ level”;

To make this .wav file I had said “testing it to see whether there’s any sign … of the input level”.

    I’m blown away and take my hat off to you, sir! AND it’s open-source. Despite my cluelessness about Linux I’m a bit of a coder. This opens up fantastic possibilities.

    Comment by Mike — June 9, 2018 @ 8:21 pm

  69. I added repository for Xenial and did:

    apt-get install libdeepspeech libdeepspeech-dev libtensorflow-cc deepspeech-model
    apt-get install gstreamer1.0-deepspeech ibus-deepspeech

    Then I ran the following from a terminal window:

    gst-launch-1.0 -m filesrc location=out.ogg ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink

    The current directory contains speech in ‘out.ogg’

    I get the following error:

    WARNING: erroneous pipeline: no element “deepspeech”

Any ideas? Maybe different directories for installing gstreamer plugins? How is deepspeech installed?

    Comment by paul — June 10, 2018 @ 3:14 pm

  70. Mike… I can’t believe it… I have now got it dictating from the microphone into the Terminal (in Linux)… by going gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink

    Amazing. The accuracy is very, very impressive considering, for example, that there has been no training.

    But in your video you are actually dictating into, as you put it, “any application”. Is this something to do with the IBus plugin? Could you possibly just spell it out in ridiculously simple detail?

    Comment by Mike Rodent — June 11, 2018 @ 1:12 pm

  71. Comment to Paul:

I did pretty much those exact installs (in fact I did “apt-get install libdeepspeech” first, on its own, not because I know something you don’t but because, as a very low-level Linux newb, I didn’t know whether you could put different packages for installation in the same apt-get command).

    I consequently can’t explain why you are getting that message, except to say that I got that message *before* installing the packages.

I also had lots of problems configuring my microphone yesterday (now resolved) … so my first test was in fact carried out (successfully, to my amazement) on a .wav file I had managed to record:
    $ gst-launch-1.0 -m filesrc location=/home/mike/test3.wav ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink

    Comment by Mike Rodent — June 11, 2018 @ 1:19 pm

  72. Comment to Mike:

    I get this result:
    gst-inspect-1.0 deepspeech
    No such element or plugin ‘deepspeech’

    I checked package installation:
    sudo apt-get install gstreamer1.0-deepspeech ibus-deepspeech

    gstreamer1.0-deepspeech is already the newest version (0.1.1-4).
    ibus-deepspeech is already the newest version (0.1.0-5).

Do you see the same on your system? Are you using Ubuntu Xenial?
    regards,
    Paul

    Comment by Paul — June 11, 2018 @ 4:05 pm

  73. @Paul – Yes, I get exactly that. I am using Linux Mint 18.1 (which corresponds to Ubuntu Xenial). Sorry it’s not working for you yet. Mike/Michael S (Sheldon) said he was away in May … I presume he’s back now. No doubt he’ll get around to viewing this thread again in due course.

    @Mike (Sheldon). Trying and struggling with the IBus download. Following instructions in “INSTALL” I run “autoconf configure.ac” … very copious output ending in
    configure.ac:33: error: possibly undefined macro: AM_INIT_AUTOMAKE
    If this token and others are legitimate, please use m4_pattern_allow.
    See the Autoconf documentation.
    configure.ac:41: error: possibly undefined macro: AS_VERSION
    configure.ac:42: error: possibly undefined macro: AS_NANO
    configure.ac:43: error: possibly undefined macro: AM_SANITY_CHECK
    configure.ac:44: error: possibly undefined macro: AM_MAINTAINER_MODE
    configure.ac:45: error: possibly undefined macro: AM_DISABLE_STATIC
    configure.ac:47: error: possibly undefined macro: AM_PROG_CC_C_O
    configure.ac:51: error: possibly undefined macro: AM_PROG_LIBTOOL
    configure.ac:66: error: possibly undefined macro: AM_PATH_PYTHON
    configure.ac:73: error: possibly undefined macro: AM_GLIB_GNU_GETTEXT

    … and no “configure” file created… care to give an opinion what’s going on? PS following my search for a solution I then installed libtool… no joy.

    Comment by Mike Rodent — June 11, 2018 @ 5:17 pm

  74. – slow IBUS progress
    – hope I’m not hogging this thread but having got quite a long way so far my issues may be encountered by future others.
    – managed to get over the previous hurdle by
    apt-get install libtool
    apt-get install automake
autoconf -> YES, created “configure”!
    But it did not execute. So I did:
    apt-get install shtool
    sudo apt-get install autogen
    automake --add-missing
    ./configure -> YES, executed! … but failed at the end:

    “configure: error: Package requirements (ibus-1.0 >= 1.3.0) were not met:

    No package ‘ibus-1.0’ found

    Consider adjusting the PKG_CONFIG_PATH environment variable if you
    installed software in a non-standard prefix.

    Alternatively, you may set the environment variables IBUS_CFLAGS
    and IBUS_LIBS to avoid the need to call pkg-config.
    See the pkg-config man page for more details.”
    – any suggestions?

    Comment by Mike Rodent — June 11, 2018 @ 5:56 pm

  75. PS incidentally, in Program Menu there is now an item “IBus Preferences”. Opened it and not much the wiser. I also got the version of IBus at the command prompt: 1.5.11. Bit puzzled by this failing requirement therefore (see above) … maybe I should tweak the ‘configure’ file in some way?

    Comment by Mike Rodent — June 11, 2018 @ 6:07 pm

  76. Hi Mike,
I managed to get the debs running and the pipeline seems OK.
    https://gist.github.com/mirkobrankovic/8c5dfe6bfd6a7c424f9b5c5a0606526d
    Is this enough to get the mic source with pulsesrc?

Since I’m running this in docker and passing through the audio device, I’m not sure that part is done correctly.
    Will try to test audio within the docker image now.

I just wanted to know if the log lines are correct.

    Thanks,
    Mirko

    Comment by Mirko — June 12, 2018 @ 10:00 am

  77. I just tested audio recording within the docker with:
    arecord -d 5 test-mic.wav
    and then played it with:
    aplay test-mic.wav
    and I can hear the audio, so I’m missing something.

    Comment by Mirko — June 12, 2018 @ 11:34 am

  78. Hi,
I did a quick test to accept udpsrc into the gst pipeline.
    So the source (Janus gateway with a webrtc endpoint using plain rtp_forward) is constantly feeding the port with audio RTP:

    U 192.168.64.39:33207 -> 192.168.64.29:5000
    .d.&d…….H..y..W.
    #
    U 192.168.64.39:33207 -> 192.168.64.29:5000
    .d.’d..@….H..y..W.

    Then I tried to start listener like:

gst-launch-1.0 --gst-debug=5 -m udpsrc port=5000 ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink

    but I get a core dump with this huge log

    https://paste.ubuntu.com/p/GJxSWpBx4c/

I’m probably missing the media descriptions in the pipeline, like:
    ! “application/x-rtp,media=(string)audio, clock-rate=(int)44100, width=16, height=16, encoding-name=(string)L16, encoding-params=(string)1, channels=(int)1, channel-positions=(int)1, payload=(int)96” !
    I know which payload and codec I am sending …

The question is, Mike: do you think something like this can or will work? Am I misconstruing something, or will I need to modify your plugin? I’m not too familiar with gstreamer, but I see how powerful it is now 🙂

    Thanks,
    mirko

    Comment by Mirko — June 14, 2018 @ 7:57 am

  79. This is the log and core dump if I start the udp source later:
    https://paste.ubuntu.com/p/KrGdtGkyvF/

    Comment by Mirko — June 14, 2018 @ 8:04 am

  80. Hi all,

    Sorry, I’ve been away for a couple of weeks and am just now catching up; if anyone is still having issues that they’d like me to look into it’d be great if you could give me a quick summary of the state you’ve reached and any problems you’re still encountering.

    Cheers,
    Mike

    Comment by Mike — June 24, 2018 @ 11:40 am

  81. Hi Mirko,

Yes, you’re correct, this will work, but you need to provide the caps of the audio data you’re receiving from the udpsrc. Here’s a very simple example with a gstreamer server and client (in reality you probably want to add RTP to this):

    Server:

    gst-launch-1.0 pulsesrc ! audioconvert ! audio/x-raw,format=S16LE,channels=1,rate=44100 ! udpsink host=localhost port=5000

    Client:

    gst-launch-1.0 -m udpsrc port=5000 ! 'audio/x-raw,format=S16LE,rate=44100,channels=1' ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink

    Cheers,
    Mike

    Comment by Mike — June 24, 2018 @ 1:44 pm

  82. Try as I might, I can’t get this to work without the ‘pipeline’ error in either debian stretch or ubuntu 18.04. So I’m going to try either ubuntu 16.04 or 17.10. Are there recommendations on which this runs ‘better’ between those two?

    Comment by dodddummy — June 24, 2018 @ 11:14 pm

  83. I just did a clean install of xenial, added the PPAs and installed the packages. I still get the pipeline issue. I’m going on 20 years running Linux as my daily driver. This is bringing back fond memories of the good old days when installations routinely took hours or days.

    Is there some special step I’m missing in this thread?

    Comment by dodddummy — June 25, 2018 @ 1:34 am

  85. Hi Mike,
    I tested it now with a gst client so that I have the same format on both sides:
    https://i.imgur.com/51FUs3x.png
    So the bottom-left udp stream is produced by the bottom-right alsasrc gst command (I can’t get pulsesrc to work) and the top one is the gst pipeline that listens on the specified port and pipes into deepspeech, but there is no output.
    Client is running on my local machine so there should be audio (ngrep is showing packets at least, hopefully readable).
    Note: Server side (accepting side) is in lxd container

    Do you have any clue what might be wrong?
    Thanks,
    mirko

    Comment by Mirko — June 25, 2018 @ 9:59 am

  86. Hi Mike,
Seems like it was the ‘-m’ flag I forgot to add, and now it is working :d yeey
    I’m not sure what the -m flag does 🙂

    Comment by Mirko — June 25, 2018 @ 11:48 am

  87. https://i.imgur.com/gjGuy3h.png

Though my English is not so good, it works; now to see what is wrong with my Opus stream feed :d

    Comment by Mirko — June 25, 2018 @ 8:13 pm

  88. Hi Mike… hope you had a good holiday!

    Do you think you could possibly look at comment 74 above… I’m still getting problems when I run ./configure? NB as stated in comment 74, my Ibus version is printed out as 1.15.11.

    Comment by Mike Rodent — June 26, 2018 @ 11:22 am

  89. whoops… I mean comment 75 for the version

    Comment by Mike Rodent — June 26, 2018 @ 11:23 am

  90. I surrender. I can’t get past WARNING: erroneous pipeline: no element “deepspeech”, although I’ve tried on debian stretch, ubuntu 16.04 and 18.04. I’ll wait until it just works.

    Comment by dodddummy — June 26, 2018 @ 12:32 pm

  91. Hi all,

    dodddummy – I’ve tested with a clean install of 16.04 and 17.10 in a VM (and use 18.04 on my own laptop, but upgraded from previous installs) and haven’t been able to reproduce the problem I’m afraid. So far from the logs people have sent me I haven’t been able to identify anything obviously different or wrong either unfortunately.

    I’ve just had an idea though, it might be that your processor doesn’t support all the extensions that the version of tensorflow I’ve packaged was compiled to use. Could you (and anyone else still experiencing this issue) send me the output of:

    cat /proc/cpuinfo

    Thanks!

Mirko – Awesome, well done, that’s great! The ‘-m’ option tells gst-launch to display bus messages (deepspeech sends detected speech as a bus message). When debugging issues with the stream you might also find it useful to try replacing deepspeech and the fakesink with an autoaudiosink, so you can check that the stream sounds as you’d expect.
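
    For example, the client pipeline from my earlier comment becomes something like:

    gst-launch-1.0 udpsrc port=5000 ! 'audio/x-raw,format=S16LE,rate=44100,channels=1' ! audioconvert ! audioresample ! autoaudiosink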

    Mike – It looks like you’re missing the development package for ibus, on Ubuntu or Debian this would be libibus-1.0-dev, for Fedora I believe it’s called ibus-devel

    Hope that helps!
    Mike

    Comment by Mike — June 26, 2018 @ 12:33 pm

  92. Mike,
    Here is cpuinfo from one machine. This one is running 18.04 but I can provide the same from another machine if this doesn’t point out the issue. All of my computers are pretty old with the newest being an early i5.

    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Core(TM)2 Duo CPU E4700 @ 2.60GHz
    stepping : 11
    microcode : 0xba
    cpu MHz : 1692.105
    cache size : 2048 KB
    physical id : 0
    siblings : 2
    core id : 0
    cpu cores : 2
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 10
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti dtherm
    bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
    bogomips : 5187.34
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:

    processor : 1
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Core(TM)2 Duo CPU E4700 @ 2.60GHz
    stepping : 11
    microcode : 0xba
    cpu MHz : 1784.198
    cache size : 2048 KB
    physical id : 0
    siblings : 2
    core id : 1
    cpu cores : 2
    apicid : 1
    initial apicid : 1
    fpu : yes
    fpu_exception : yes
    cpuid level : 10
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti dtherm
    bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
    bogomips : 5187.34
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:

    Comment by doddddummy — June 26, 2018 @ 9:49 pm

  93. Hi Mike,
    I added libibus-1.0-dev. NB I’m on Linux Mint 18.1 (= equiv to Ubuntu Xenial).
    Now I get the following at the end of the ./config messages:

    checking for python extension module directory… ${exec_prefix}/lib/python2.7/dist-packages
    ./configure: line 17485: AM_GLIB_GNU_GETTEXT: command not found
    checking that generated files are newer than configure… done
    checking that generated files are newer than configure… done
    configure: creating ./config.status
    config.status: error: cannot find input file: `Makefile.in’

    PS even assuming I do get ‘./configure && make && make install’ to play nice, I’m not quite sure from your “INSTALL” file how I then get Ibus to “operate” in conjunction with another app (e.g. xed, LO Writer). Might you be able to spell this out?
    I’m a Linux slightly-beyond-newb, although with a goodish programming background. This “autoconf”/”automake”/”toolchain” stuff is a steep learning curve… Any suggestions what I should do next?

    Comment by Mike Rodent — June 27, 2018 @ 10:55 am

  94. doddddummy – Aha, I think we may have cracked it. As far as I remember (I’ll double-check when I have some time at home), tensorflow is compiled to require the AVX and FMA features on the CPU, which yours are lacking. When I have some free time (things are a bit hectic for me at the moment) I’ll try and figure out a way to catch this and display a more informative error, and then possibly also see if I can make a tensorflow package without these optimisations.

    Mike – It looks like you’re missing libglib2.0-dev (you can see the full list of build-depends here: https://launchpadlibrarian.net/364652351/ibus-deepspeech_0.1.0-5.dsc ). If you’re using a Xenial derived distribution you could probably just use the precompiled PPA packages directly though. When I have a bit of time at home I’ll try and create a quick screencast showing you how to select the deepspeech plugin in the ibus preferences.

    Cheers,
    Mike

    Comment by Mike — June 28, 2018 @ 3:45 pm

  95. Mike,
    Ok. I’ll monitor the thread for an update.

    Comment by dodddummy — June 28, 2018 @ 11:21 pm

  96. Thanks again Mike. I’m probably being very obtuse, but I have in fact installed the PPAs from this page: https://launchpad.net/~michael-sheldon/+archive/ubuntu/gst-deepspeech. I just checked, and these 2 packages (gstreamer1.0-deepspeech and ibus-deepspeech) are indeed installed, in their latest versions, both “Desired i, Status i, no error” (using dpkg -l). I will install those “Build-Depends” in that .dsc file though.

    Comment by Mike Rodent — June 30, 2018 @ 5:59 pm

  97. Hi Mike,

    That .dsc file is an archive. After extracting I found “libglib” in the file …/debian/control, under “Build-Depends”. So I did this:
    sudo apt-get install build-essential libglib2.0-dev libibus-1.0-dev debhelper
    … and checked installed (debhelper v 10.+).
Running ./configure again, I once more got:
    config.status: error: cannot find input file: `Makefile.in’
    If I could “just use” the PPAs that’d be great, but as I say that’s the first thing I did. For good measure I tried doing this again: said everything up-to-date.
    So… at the moment I’m not clear whether everything needed to use Ibus has been installed (although I think so)… and I haven’t a clue how to start dictating into Writer, for example, using Ibus.
    A few more explicit instructions would be great (I’ve looked through most of the downloaded readme and other files and I don’t seem to see anything)…

    Comment by Mike Rodent — July 11, 2018 @ 6:26 am

  98. OK, just found in IBus -> Input Method -> Add that under “English” there is now a “DeepSpeech” item.
    So I added this.
    Then I checked (in Sound) that my microphone was responding to voice… yes.
Then I added another hotkey in IBus -> General -> Next input method (as I’m not sure where the “Super” metakey might reside on my machine): added Ctrl-Space.
    Then I went to Writer (and xed) and tried pressing Ctrl-Space… speaking… gah. Nothing.
    I *seem* to be within touching distance (although this may be a delusion). Any ideas?

    Comment by Mike Rodent — July 11, 2018 @ 6:37 am

  99. Hi Mike,

    Sorry, I’ve been really busy recently so making that screencast for you rather slipped through the net. I don’t have my laptop with me right now so am just working from memory, but I think you probably just need to tick an option to show the plugin UI (it might be in the Advanced tab?), after which you should see a small floating window with a microphone button which you can press to begin dictation (like in the video).

    If you’re not able to find it give me another poke and I’ll make some more precise instructions when I have my laptop to refer to.

    Thanks,
    Mike

    Comment by Mike — July 11, 2018 @ 11:38 am

  100. Success!!!!!!!!!!!!!!!

    In fact on my IBus Advanced tab there is no option along those lines.

    Finally I spotted a tiny icon with a microphone in the top right of my screen (I have a 43″ screen, so something a few pixels across like that tends to get lost!).

    When the OS starts up this appears to be a “blob” or button… when I launch an application which should be Ibus-capable this then changes to a microphone on a lightish grey background. You can toggle this to dark grey background, which presumably is toggling off.

    Again I tried xed, tried Writer… bumped up my microphone input sensitivity to 150%. Tried speaking again in Writer and was about to give up when… a PHRASE was written! It seems to take about 4 seconds to process a shortish phrase… but… YOU’VE DONE IT!

    As you say, “proof of concept”. But the concept has legs, it walks.

    Can I ask how long you’ve been playing around with this? What scope is there, if any, for configuring the DeepSpeech module, e.g. by training to one’s voice? I just downloaded it from github: 56 MB unzipped. No doubt digging into the code here is a daunting task… but wow FOSS offline dictation! In Linux. In any app. Nice work!

    Comment by Mike Rodent — July 11, 2018 @ 5:33 pm

  101. Hey Mike,

    That’s awesome! Well done 🙂

    I should definitely see about making the UI a bit more friendly, currently it uses ibus’s own UI, which is a bit limited in terms of flexibility. Eventually I’d like to have a more fully-fledged user-friendly UI that allows the user to do a lot more (e.g. make corrections, toggle different dictation modes, select different models, etc.)

    In terms of speed this should improve greatly once the DeepSpeech streaming support is completed. Currently the way DeepSpeech works is that you have to provide it with a discrete sample of segmented audio for it to perform inference on. So for example if you spend 2 seconds actually saying a sentence, it’ll record those 2 seconds, then only once you’ve finished speaking will that be sent to DeepSpeech which will spend another couple of seconds performing speech recognition. With streaming inference it’ll be able to continuously provide text output while you’re talking, making it much more instantaneous and a lot nicer for continuous dictation.

    In terms of refining the model, with version 0.1.1 Mozilla have now released the checkpoint files, so it should be possible to retrain the model with some extra data (e.g. your own voice), building upon the existing learning. I haven’t played with this myself yet, so couldn’t say how much time it’d take to retrain optimally but it should definitely be possible. I’d quite like to make some sort of simple UI to guide a user through this process, but I’m pretty short on free time at the moment so it might be a while before I get around to such a thing.

    Cheers,
    Mike

    Comment by Mike — July 13, 2018 @ 2:55 pm

  102. Hello,
    This is a great plugin. I would like to get this enabled on my Debian Stable.
Could we continue the conversation here:
    https://gitter.im/dataassistant-co/ibus-deepspeech

    And go over what is needed from start to get this enabled on gnome shell.

As soon as I have all the steps for enabling this, I can create a git pull request for Readme.txt or provide easy instructions so more people can use this great tool.

    Thanks
    Lucas

    Comment by Lucas — August 12, 2018 @ 4:17 pm

  103. I love you! This is awesome!!

    Comment by Danny — November 24, 2018 @ 11:39 am

  104. Hi, I’m trying hard to get the IBus plugin working;
    could you supply more details on how to activate the plugin?
    To narrow down what’s causing this problem, I have made a short screencast video:
    https://youtu.be/4NQzOlj8Sd8

    thx for your great work so far.
    joe

    Comment by Josef Federl — June 17, 2019 @ 10:49 pm

  105. Hi Josef,

    You need to enable the plugin in your IBUS settings, on Unity and Gnome you can click “Text Entry Settings” on the input switcher, then click the + button and search for “Deepspeech”, or for other environments you can run ibus-setup, change to the “Input Method” tab, click “Add”, select “English” then scroll down to “DeepSpeech” and add it. Also in the “General” tab make sure that “Show property panel” is set to “Always”

    Most desktop environments will have a switch for selecting the active input type in their system tray/notification icon area, you can use that to switch to using DeepSpeech; you’ll know it’s selected because a small floating window will appear with a microphone button, click on that to begin dictation.

    (As a side note, you don’t need to have the gst-launch line running when using the ibus plugin, it’ll construct its own gstreamer pipeline internally)

    Hope that helps,
    Mike

    Comment by Mike — June 18, 2019 @ 11:54 am

  105. A huge thank you for your tips; I followed them so far.
    There still seems to be something wrong / striking.

    https://youtu.be/I1yhs7N_aW4

I would be very glad if you had any further ideas on what to do.

    Comment by Josef Federl — June 18, 2019 @ 9:42 pm

  107. Hi Josef,

    It looks like the plugin is running correctly now, but for some reason ibus isn’t sending input to your applications. Could you check to see if you’ve got the GTK ibus backends installed (ibus-gtk and ibus-gtk3 in Debian/Ubuntu)? You may need to restart applications (or perhaps log out and back in) after installing them if they are missing

    Thanks,
    Mike

    Comment by Mike — June 19, 2019 @ 11:03 am

  108. Hi Mike,
    I have checked if the GTK ibus backends are installed (ibus-gtk and ibus-gtk3 in Debian/Ubuntu)
    https://www.youtube.com/watch?v=uDBG6IdSP9c&feature=youtu.be

    thx joe.

    Comment by Josef Federl — June 19, 2019 @ 11:21 am

  109. Your main PPA doesn’t support bionic.

    Comment by Sarah — June 19, 2019 @ 10:49 pm

  110. Josef – I just noticed in your first video you’re using the tweaked silence-threshold=0.3 parameter, the IBUS plugin uses the default silence-threshold which might be too high for your environment (this results in it never detecting silence and so never sending a segment of audio for processing), try editing /usr/share/ibus-deepspeech/engine.py and change line 45 from:

    self.pipeline = Gst.parse_launch("pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-length=20 ! fakesink")

    to:

    self.pipeline = Gst.parse_launch("pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink")

    Sarah – Yes, apologies I’m afraid I’ve not had time to update the PPA in a little while, I’ll try to find some time to do this soon.

    Comment by Mike — June 20, 2019 @ 9:42 am

  111. Sorry, but nothing has changed; still the same behavior.
    Linux mintlab 4.10.0-38-generic #42~16.04.1-Ubuntu SMP Tue Oct 10 16:32:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

    Comment by Josef Federl — June 20, 2019 @ 7:32 pm

  112. Hi Josef,

Sorry for the delay in getting back to you; I’ve been away for a bit. I’m not sure what else to suggest, I’m afraid; the only thing I can think of to get a little more context on the issue is for you to install another ibus plugin, like the pinyin or chewing one, and confirm whether or not that works correctly.

    Comment by Mike — June 25, 2019 @ 9:46 am

  113. two keyboards -> first: Standard US / second: Korean_Hangul_Letters (IBus issue #2175):
    https://github.com/ibus/ibus/issues/2175

    Hi Mike,

If you have any hints or tips or whatever on how and where to start, that would be amazing.
    I have absolutely no idea, no clue… how to estimate the time necessary.

    Comment by Josef Federl — January 26, 2020 @ 10:33 pm

  114. Hi Josef,

    Sorry, I’m afraid my IBus knowledge isn’t really deep enough to have any ideas on that. Good luck, though!

    Comment by Mike — January 27, 2020 @ 11:16 am

  115. […] in applications. [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related […]

    Pingback by Speech Recognition For Linux Gets A Little Closer - SLG 2020 — September 15, 2020 @ 3:20 am

  116. Hi Mike,
    Thanks for your time.
    I cloned the repository https://github.com/Elleo/gst-deepspeech and downloaded the project, then followed the install instructions and executed the following: ./configure && make && make install, without errors.
    The problem is that when I try to run gst-launch with the deepspeech plugin I get this:
    gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink
    WARNING: erroneous pipeline: no element “deepspeech”
    How do I know if the plugin was installed successfully?

    Comment by Charly — October 3, 2020 @ 5:07 am

  117. Hi Charly,

    If you run `./configure` without specifying a prefix, the plugin will be installed in /usr/local/lib/, which GStreamer won’t look in by default. To install it to /usr/lib/ instead, run `./configure --prefix=/usr`.
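    Alternatively, if you’d rather keep the /usr/local install, you should be able to point GStreamer at it instead (a rough sketch; the exact plugin directory can vary by distribution and architecture):

    export GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.0:$GST_PLUGIN_PATH
    gst-inspect-1.0 deepspeech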

    Hope that helps,
    Mike.

    Comment by Mike — October 3, 2020 @ 12:32 pm

  118. Hi Mike,
    Thanks for your prompt response.
    I followed your recommendation, using `./configure --prefix=/usr` for package configuration.
    Now the libraries are located in /usr/lib/gstreamer-1.0,
    the libraries being libgstdeepspeech.la and libgstdeepspeech.so.
    To verify that the plugin is installed I ran the following GStreamer command-line app:
    gst-inspect-1.0 deepspeech
    but gst-inspect returns the following message:
    No such element or plugin ‘deepspeech’
    I get the same outcome when I run the entire pipeline using gst-launch as previously.
    Thanks again and congratulations on your work.

    Comment by Charly — October 3, 2020 @ 4:10 pm

  119. Do you have libdeepspeech.so in your library path somewhere? Either installed in /usr/lib or added to LD_LIBRARY_PATH?
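    A quick way to check is something like the following (the plugin path matches your install above; the second line’s directory is just a placeholder for wherever libdeepspeech.so actually lives):

    ldd /usr/lib/gstreamer-1.0/libgstdeepspeech.so
    export LD_LIBRARY_PATH=/path/to/deepspeech/lib:$LD_LIBRARY_PATH

    If ldd reports libdeepspeech.so as “not found”, that’s the problem. You may also need to delete the registry cache in ~/.cache/gstreamer-1.0/ so that GStreamer rescans its plugins.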

    Comment by Mike — October 3, 2020 @ 7:01 pm

  120. Hi Mike,
    I wanted to tell you that I was finally able to execute a gst-launch pipeline with the deepspeech plugin successfully: gst-launch-1.0 -m filesrc location=file.wav ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink.
    The plugin is recognizing words from the .wav file successfully!
    My next step would be to include the plugin in C code, using the GStreamer libraries. Is it possible to invoke deepspeech from code and use it as a pipeline element?
    Thanks for your help!

    Comment by Charly — October 6, 2020 @ 7:22 pm

  121. Hi Charly,

    Glad to hear you got it working! I wrote a simple python example a little while back which demonstrates setting up a pipeline and connecting to the bus to receive messages:

    https://github.com/Elleo/gst-deepspeech/blob/master/examples/python/print_speech.py

    Although it’s in Python, the actual GStreamer calls map fairly directly onto the C API. If you have any trouble with it, though, just let me know and I’ll knock up a quick C example for you.

    Cheers,
    Mike

    Comment by Mike — October 7, 2020 @ 8:22 am

  122. Hi Mike,

    Thanks so much for writing these plugins. Wondering if you could help me out with one issue – I have the gst and ibus plugins both compiled and running under Arch Linux, but the problem I’m running into is, well, it won’t stop! That is, once I speak into the microphone, Gstreamer continually sends messages containing the same speech over and over until I just send an interrupt and kill it. As a result, the ibus plugin just outputs the same sentence fragment without end; speaking further just makes the repeated sentence fragment longer and longer. (It happens as well with the print_speech.py example script you provided.)

    So not sure where the problem may be – whether it’s that the gst plugin needs instructions to know where to stop, whether the deepspeech parameters (e.g. silence detection) need adjusting, or whether ibus needs to be told not to output new text unless it’s been altered from what it most recently wrote. Happy to provide any logs or traces that may be helpful; happy as well to play around with the source code as might be helpful.

    Thanks,
    Matt

    Comment by Matt — November 18, 2020 @ 10:17 pm

  123. Hi Matt,

    This is probably due to some recent changes I’ve made to the GStreamer plugin. It now uses the DeepSpeech streaming API and provides two types of message, “intermediate” and “final”: the “intermediate” messages constantly update as new data comes in, and the “final” message is sent when DeepSpeech is confident that a section of text is complete. However, I haven’t updated the IBus plugin or the example recently, so it’s likely displaying all the intermediate messages as well as the final ones, as it doesn’t know to distinguish between the two.

    I’m a bit busy at the moment, so I’m unlikely to have a chance to update these in the near future; however, if you’re able to put together a pull request implementing this, I’ll do my best to make some time to review it.
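    Roughly, the filtering would look something like this (a sketch only; the structure name “final” and the “text” field here are assumptions, so check the element’s source for the exact names):

    #!/usr/bin/env python3
    # Sketch: print only final DeepSpeech results, ignoring the
    # constantly-updating intermediate ones.
    import gi
    gi.require_version('Gst', '1.0')
    from gi.repository import Gst, GLib

    Gst.init(None)
    pipeline = Gst.parse_launch(
        "pulsesrc ! audioconvert ! audiorate ! audioresample ! "
        "deepspeech ! fakesink")

    def on_message(bus, message):
        # Recognition results arrive as element messages on the bus.
        if message.type == Gst.MessageType.ELEMENT:
            s = message.get_structure()
            # Only act on final results; intermediate ones repeat and
            # grow as more audio is processed.
            if s and s.get_name() == 'final':
                print(s.get_string('text'))

    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect('message', on_message)
    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()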

    Cheers,
    Mike

    Comment by Mike — November 23, 2020 @ 11:05 am

  124. Hi Mike,

    Got it – I figured it might have been the intermediate/final sections of the code, as those were the only sections that seemed to post messages in the first place. I had toyed around with it, though, being still unfamiliar with the API, I hadn’t had much success. Thanks for the insight – I’ll see if I can come up with any solutions.

    Best,
    Matt

    Comment by Matt — November 23, 2020 @ 7:52 pm

  125. Hi Mike,

    Thanks for sharing your work.

    Months ago I was testing the GStreamer plugin with the Python example and it worked really well.

    The fact is, I’d like to test the plugin with different pretrained models (DeepSpeech updates them regularly), and a couple of questions came up:

    1 – What changes are required to set up the plugin with a different model?
    2 – Have you tested the GStreamer plugin on embedded systems, like the Raspberry Pi for example?

    Thanks in advance and congratulations on your work.

    Comment by Charly — February 10, 2021 @ 5:16 am

  126. Hi Charly,

    1 – You can set the path to a different speech model and scorer using the “speech-model” and “scorer” properties of the deepspeech element (see the example below). These models need to be compatible with the version of DeepSpeech that the plugin was compiled against (currently 0.7); when I have a bit of free time I’ll see about updating the plugin to work with DeepSpeech 0.9.
    2 – I haven’t tested it, I’m afraid, but I’d expect performance to be pretty much identical to using DeepSpeech directly on those platforms.
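    For example, something along these lines (the file paths are just placeholders for wherever your model and scorer live):

    gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech speech-model=/path/to/model.pbmm scorer=/path/to/model.scorer ! fakesink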

    Cheers,
    Mike

    Comment by Mike — February 12, 2021 @ 11:06 am

  127. Hi dear Michael Sheldon,
    First of all, thank you for making this project.

    I am running Ubuntu 22.04 MATE. I need this tool to use my computer. I tested your DeepSpeech STT, but since I am not a native English speaker the results were awful. Then I found OpenAI Whisper and tested it; it’s awesome. I need a way to integrate it into typing booster. Would you please give an example of doing this via D-Bus and gst-launch or other tools?

    Comment by Atilla Karaca — October 14, 2022 @ 7:55 pm
