Controlling a computer with speech
Palaver
In contrast to the two previously mentioned programs, the Palaver speech recognition program, also written in Python, has no user interface. Instead, you start and stop the voice input using a freely selectable keyboard shortcut. Palaver then sends the spoken input to Google, which naturally requires a certain amount of trust in the search engine giant (Figure 4).
If you voice the commands loudly and clearly, Palaver recognizes them far more readily than its competitors. To list the English commands Palaver understands, run the ./plugin -l command in the program directory.
The "Open music" command, for example, opens the Music folder in the file manager. Each command comes from a plugin. For example, the FileBrowser plugin is responsible for the open command.
Additional plugins are listed in the catalog on the Palaver homepage [6]. Because the plugins are mainly simple Bash scripts, adding more commands to them is reasonably straightforward.
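To get a feel for what such a script has to do, the following sketch maps a recognized phrase to a folder. It is not Palaver's actual plugin format; the function name target_for_phrase and the phrase list are invented purely for illustration.

```shell
#!/bin/sh
# Illustrative only: NOT Palaver's plugin format.
# Maps a recognized phrase such as "open music" to a folder path.
target_for_phrase() {
    # Lowercase the phrase so "Open music" and "open music" both match
    phrase=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$phrase" in
        "open music")     printf '%s\n' "$HOME/Music" ;;
        "open documents") printf '%s\n' "$HOME/Documents" ;;
        *)                return 1 ;;   # unknown phrase
    esac
}

# A real script would hand the resulting path to a file manager.
target_for_phrase "Open music"
```

Keeping the phrase lookup separate from the action makes it easy to add further phrases without touching the rest of the script.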
To use Palaver, install the Sox, Python-Argparse, Wget, Espeak, Xvkbd, Xautomation, and Zenity packages via the package manager and make sure that Notification-daemon, Notify-osd, or Notifyd is active, depending on the desktop. In Ubuntu, you need the sox , python-argparse , wget , espeak , xvkbd , and xautomation packages.
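Before running the setup, it can help to check that the helper programs are actually on the PATH. The following is a minimal sketch and not part of Palaver: the function name missing_tools is made up, and the binary names are assumptions (most match the package names above, while the xautomation package is assumed to provide the xte program).

```shell
#!/bin/sh
# Hypothetical helper, not part of Palaver: prints every program
# from its argument list that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Binary names are assumptions; xte is assumed to come from xautomation.
missing_tools sox wget espeak xvkbd xte zenity
```

If the call prints nothing, all of the listed programs were found; otherwise, install the packages that provide the names shown.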
Next, download the current development version of Palaver from GitHub [7] using Download ZIP . Unpack the downloaded archive on the hard drive and execute the ./setup command from the Palaver-master directory as the root user.
You can skip the personal details, but do enter the language. Integrate the Default Plugins using Install . Then, create a keyboard shortcut in the system settings that starts the hotkey script in the Palaver directory.
In Ubuntu, open the system settings, choose Keyboard | Shortcuts , select Custom Shortcuts , click the plus sign, enter a name (e.g., Palaver ), and enter the path to the hotkey script (e.g., /home/tim/Palaver-master/hotkey ). After clicking Apply , click the Disabled text and press the keyboard shortcut you want to use to activate the program in the future.
To install an additional plugin later, execute the ./plugin -p <name> command in the Palaver directory, substituting the name of the plugin (e.g., FileBrowser ). If you want to write a plugin, take a look at the Doc folder, where you can find an example to get you going.
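If you want several plugins at once, the call can be repeated in a loop. This sketch assumes only the ./plugin -p <name> syntax described above; the function name plugin_commands and the plugin name ExamplePlugin are placeholders. It only prints the commands so that you can review them before running them from the Palaver directory.

```shell
#!/bin/sh
# Hypothetical helper: prints one "./plugin -p" call per plugin name.
# Nothing is executed; the output is meant to be reviewed first.
plugin_commands() {
    for name in "$@"; do
        printf './plugin -p %s\n' "$name"
    done
}

# FileBrowser is named in the text; ExamplePlugin is a placeholder.
plugin_commands FileBrowser ExamplePlugin
```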
Simon
A classic speech recognition program is Simon [8]. This software is written in C++ and uses the KDE libraries and either PocketSphinx or Julius. The latter is currently maintained by developers at the Nagoya Institute of Technology in Japan [9].
Simon originally emerged out of a research project of the Federal Higher Technical Institute for Educating and Experimenting in Austria. The project has since been taken over by the "Simon Listens" financing organization. The creators also founded an eponymous company offering services related to Simon [10].
After you start Simon, a wizard opens to help you through the most important settings. The wizard assumes that you're already familiar with the Simon working model and terminology. The same is true for the main window (Figure 5), which is anything but intuitive to operate. Beginners therefore can't avoid studying the rather bulky manuals [11].
The application collects all the voice commands for a specific task into a so-called "scenario." For example, you can bundle all the Firefox commands together. Simon provides prepared scenarios for a few of the common use cases.
Furthermore, the software requires the data for the desired language model ahead of time; you can download it from the web. The model informs Simon about the characteristics of the language.
However, the models provided are based strictly on the usual pronunciation of the language. To improve recognition, you can read in all the commands yourself as part of a training session.
Simon operates as a client-server system, so you can hand the speech analysis over to a server (Figure 6). This not only saves processing on the local machine but also lets multiple clients rely on a central server. Simon can also listen on several microphones simultaneously or use a selected one – the first competing one is always used by default. Simon is also the only program that warns you of over- or under-amplification when recording.
In our Ubuntu test, Simon could be installed from the package manager but refused to start up. An openSUSE 13.1 port didn't behave much better: The program silently ignored all speech input and threw numerous error messages. If you want to try Simon for yourself, first install a C++ compiler, CMake, Git, and Gettext. For libraries, you need the development packages for KDE, Qt4, Libattica, Phonon, and Zlib. PocketSphinx, which is easier to install, can also be used as a back end.
If you prefer the Julius engine instead, subsequently install the Hidden Markov Model Toolkit (HTK), which you can get from the homepage after a free registration. Further installation instructions for Julius are in the Simon instructions.
That leaves just Simon itself. In openSUSE, the command on the first line in Listing 2 installs all the required packages as the root user. With the commands that follow, you download the current Simon version, compile it, and install the speech control software.
Listing 2
Downloading and Installing Simon
# zypper in git-core gcc gettext-tools gettext-runtime libkde4-devel libqt4-sql-sqlite libqt4-multimedia libqt4-phonon-devel libattica-devel libattica0 zlib-devel kde-l10n-de qwt6-devel
# git clone git://anongit.kde.org/simon simonsource
# cd simonsource
# ./build.sh