Listen to Me
The Blather, FreeSpeech, Palaver, Simon, and Vedics speech recognition programs are ready to respond to voice commands. This sounds good in theory, but there are some pitfalls in practice.
A strong "Start browser!" belted into the microphone will start Firefox – at least, that's what the five leading free speech recognition programs (Blather, FreeSpeech, Palaver, Simon, and Vedics) promise. With that, they want to make input easier and also help disabled individuals better operate the desktop.
Four of these programs, Vedics being the exception, let you decide for yourself which spoken command triggers which action. A "Start browser!" could conceivably be used to open a text editor, which would be confusing, yet possible.
The five applications do not analyze speech patterns themselves; they leave that task to other software. As a rule, that software is PocketSphinx [1] from Carnegie Mellon University (CMU).
The applications generally refer to such helper software as back ends or engines. Blather, FreeSpeech, Palaver, and Vedics are licensed under the GNU GPLv3, whereas Simon still uses the older GPLv2.
Blather [1] is written in Python. To get it to work, you must install PocketSphinx from the package manager, along with the Python bindings for GStreamer and GTK (in Ubuntu, the python-gtk2, python-gst0.10, and pocketsphinx packages). If PocketSphinx isn't part of your distribution, follow the instructions in the "Three-Step Sphinx" box.
Three-Step Sphinx
To begin, install the Bison package and, if it is not already present, Perl. From the web [4], download the sphinxbase, pocketsphinx, and sphinxtrain packages. Unpack them and install each one using the usual three-step procedure in Listing 1, starting with the base package.
Listing 1
Installation Steps
$ ./configure
$ make
$ sudo make install
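The three steps have to be repeated for each of the three packages, so a small loop can save typing. The following is only a sketch under stated assumptions: build_sphinx and run are hypothetical helper names, and the directory names assume the archives were unpacked into sphinxbase, pocketsphinx, and sphinxtrain below a common source directory (real archives usually unpack into versioned directories, so adjust accordingly).

```shell
#!/bin/sh
# Sketch only: helper names and directory layout are assumptions.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_sphinx() {
    src_root="$1"   # directory that contains the three unpacked packages

    # sphinxbase comes first, as the box instructs.
    for pkg in sphinxbase pocketsphinx sphinxtrain; do
        # The subshell keeps each "cd" from affecting the next iteration.
        (
            run cd "$src_root/$pkg" &&
            run ./configure &&
            run make &&
            run sudo make install
        ) || return 1
    done
}
```

Calling DRY_RUN=1 build_sphinx ~/src first shows the commands that would run; without DRY_RUN, the function stops at the first step that fails.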
From Gitorious [2], download the current development version of Blather. After unpacking the archive, rename the file commands.tmp to commands and use a text editor to enter the desired English-language commands. Begin each line with the spoken command in uppercase letters, followed by a colon and the shell command to execute.
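As an illustration, the snippet below writes a two-line commands file. It is a sketch that assumes each line holds the spoken phrase in uppercase, then a colon, then the shell command; write_sample_commands is a hypothetical helper, and the phrases and the programs they start (firefox, gedit) are examples only, not defaults shipped with Blather.

```shell
#!/bin/sh
# Sketch: create a small Blather commands file at the given path.
# The two entries are illustrative; replace them with your own.

write_sample_commands() {
    target="$1"   # path of the commands file to create

    cat > "$target" <<'EOF'
START BROWSER: firefox
OPEN EDITOR: gedit
EOF
}
```

Running write_sample_commands commands in the Blather directory would produce a file that maps the spoken phrase "start browser" to Firefox, under the format assumption above.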
Next, create the ~/.config/blather directory, copy the commands file into it, and run ./Blather.py from the Blather directory. When the program appears to hang, end it with Ctrl+C. Then, upload the ~/.config/blather/sentences.corpus file to the Sphinx Knowledge Base Tools website [3].
After clicking Compile Knowledge Base on the website, save the generated file with the .lm extension under the name lm and the file with the .dic extension under the name dic, both in the ~/.config/blather/language directory. You can then start Blather from its directory with ./Blather.py -i g.
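Renaming and copying the two generated files can also be scripted. The sketch below assumes that both files were saved to a single download directory; install_language_files is a hypothetical helper and not part of Blather, and the default target is the language directory named above.

```shell
#!/bin/sh
# Sketch: copy the generated .lm and .dic files into Blather's language
# directory under the names "lm" and "dic".

install_language_files() {
    download_dir="$1"                                     # where the files were saved
    language_dir="${2:-$HOME/.config/blather/language}"   # target directory

    mkdir -p "$language_dir" || return 1

    for ext in lm dic; do
        # Expand the glob into the positional parameters and use the first match.
        set -- "$download_dir"/*."$ext"
        if [ ! -f "$1" ]; then
            echo "no .$ext file found in $download_dir" >&2
            return 1
        fi
        cp "$1" "$language_dir/$ext" || return 1
    done
}
```

A call such as install_language_files ~/Downloads would leave the files lm and dic in ~/.config/blather/language; if several candidates exist, the first match in glob order is used, so it is safer to keep only one set of files in the download directory.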
The program displays a very simple main window (Figure 1). After you click Listen, it waits for a voice command through the mic. Alternatively, you can switch to Continuous mode, in which the program listens continuously. There are no further functions. Recognition accuracy is marginally acceptable.
Unlike the other four programs, FreeSpeech, also written in Python, is primarily a dictation tool. After startup, it opens a simple text editor in which all the words spoken into the mic are written. Special voice commands allow subsequent editing. For example, the editor clear command deletes all the text recognized so far.
A window that appears after startup shows all the available commands (Figure 2). Here you can modify a command by double-clicking it. As of version 120, FreeSpeech provides the option to control other programs with a virtual keyboard. Click the Send keys button in the text editor and speak the key combination into the mic.
FreeSpeech understands only English words, and even then its recognition accuracy is not particularly good. In our test, the PocketSphinx background process curiously interpreted a clearly spoken "Hello World" as "An over To open" (Figure 3). The second try yielded "An adult wall."
According to the documentation, you can improve the recognition rate by correcting the misrecognized text in the editor and clicking Learn. Unfortunately, our test produced a number of error messages in the process. Controlling other programs didn't work either, and the input ended up garbled in the editor.
To put FreeSpeech into operation, you need to install Python-Gtk2, Python-Xlib, Python-Simplejson, Python-Gstreamer, PocketSphinx, and Sphinxbase from the package manager. In Ubuntu, these are in the python-xlib, python-simplejson, python-gtk2, python-gst0.10, python-pocketsphinx, and gstreamer0.10-pocketsphinx packages. Again, if PocketSphinx isn't in your distribution's repository, follow the instructions in the "Three-Step Sphinx" box.
Download the PocketSphinx archive from the web [4] and unpack it on the hard drive. Open the Makefile from the CMU-Cam_Toolkit_v2/src subdirectory in a text editor and remove the hash mark (#) at the beginning of the following line:
#BYTESWAP_FLAG = -DSLM_SWAP_BYTES
After saving, open a terminal, change to the CMU-Cam_Toolkit_v2/src subdirectory, and execute make install. Then, copy the programs created in the folder to a directory included in the $PATH environment variable, such as /usr/local/bin.
Next, download the FreeSpeech archive from the web [5] (pay attention to the ReleaseDate). Unpack the archive on the hard drive and start the software by running python freespeech.py in the newly created directory.
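If you prefer not to edit the Makefile by hand, sed can remove the hash mark. This is a minimal sketch: enable_byteswap is a hypothetical helper name, and it assumes the line appears in the Makefile exactly as quoted above, apart from optional whitespace.

```shell
#!/bin/sh
# Sketch: uncomment the BYTESWAP_FLAG line in the given Makefile.

enable_byteswap() {
    makefile="$1"

    # Strip the leading '#' from the BYTESWAP_FLAG line only; other comment
    # lines stay untouched. Writing to a temporary file and moving it back
    # avoids relying on the non-portable "sed -i".
    sed 's/^#[[:space:]]*\(BYTESWAP_FLAG[[:space:]]*=\)/\1/' "$makefile" \
        > "$makefile.tmp" &&
        mv "$makefile.tmp" "$makefile"
}
```

After enable_byteswap CMU-Cam_Toolkit_v2/src/Makefile, the make install step described above proceeds as usual.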
© 2024 Linux New Media USA, LLC – Legal Notice