
1. Survey of some existing Speech Recognition Tools

xvoice

Xvoice enables continuous speech dictation and speech control of most X applications. To convert the user's speech into text, it uses the IBM ViaVoice speech recognition engine, which is distributed separately.

In dictation mode, Xvoice passes the recognized text directly to the currently focused X application. In command mode, Xvoice matches the speech against predefined, user-modifiable key sequences or commands. For instance, when commanding the console, "list" can be mapped to "ls -l", so that when the user says "list", the text "ls -l" is sent to the console as if the user had typed it.
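A minimal Python sketch of the two modes described above. The phrase table, the function name `handle_utterance` and the mode strings are hypothetical illustrations, not Xvoice's actual configuration format or API; a real system would also inject the resulting keystrokes into the focused X window, which is omitted here.

```python
# Hypothetical phrase -> key-sequence table for command mode
# (illustrative only; not Xvoice's real configuration format).
COMMANDS = {
    "list": "ls -l",
    "go home": "cd ~",
}


def handle_utterance(text, mode):
    """Return the keystrokes to send to the focused application.

    Dictation mode passes the recognized text through unchanged; command mode
    looks the phrase up in COMMANDS, and an unknown phrase produces nothing.
    """
    if mode == "dictation":
        return text
    if mode == "command":
        return COMMANDS.get(text.strip().lower(), "")
    raise ValueError(f"unknown mode: {mode!r}")


print(handle_utterance("list", "command"))           # prints: ls -l
print(handle_utterance("hello world", "dictation"))  # prints: hello world
```

Keeping dictation and command handling in one function mirrors the single-microphone, two-mode design: only the interpretation of the recognized text changes.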

Visit http://xvoice.sourceforge.net/

CVoiceControl

CVoiceControl is a speech recognition system that lets a user bind spoken commands to Unix commands. It automatically detects speech input from a microphone, performs recognition on this input and, if recognition succeeds, executes the associated Unix command.
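The dispatch step can be sketched in Python as follows. The phrase table, `select_command`, `execute` and the confidence threshold are all hypothetical; in particular, modelling "successful recognition" as a score above a threshold is an assumption made for illustration, not a description of how CVoiceControl decides.

```python
import subprocess

# Hypothetical phrase -> Unix command table (argv lists, run without a shell).
COMMAND_TABLE = {
    "list files": ["ls", "-l"],
    "show date": ["date"],
}

# Assumed confidence threshold standing in for "successful recognition".
MIN_CONFIDENCE = 0.8


def select_command(phrase, confidence, table=COMMAND_TABLE, threshold=MIN_CONFIDENCE):
    """Return the argv for a recognized phrase, or None if nothing should run."""
    if confidence < threshold:
        return None  # recognition not trusted: do nothing
    return table.get(phrase)  # None for phrases with no bound command


def execute(phrase, confidence):
    """Run the bound command and return its exit status (None if nothing ran)."""
    argv = select_command(phrase, confidence)
    if argv is None:
        return None
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    print(select_command("list files", 0.93))  # prints: ['ls', '-l']
    print(select_command("list files", 0.42))  # prints: None
```

Storing commands as argument lists rather than shell strings avoids passing recognizer output through a shell.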

Visit http://www.kiecza.net/daniel/linux/

GerVoice

Gnome Environment Recognition Tools. More information is available on gnomefiles.org.

2. Open Source Speech Recognition Engines

CMU Sphinx

Visit http://cmusphinx.sourceforge.net/html/cmusphinx.php

Julius

Julius is an open-source, large-vocabulary continuous speech recognition (CSR) engine.

Visit http://julius.sourceforge.jp/en/

3. Open Source Speech Processing Libraries & Toolkits

HTK

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models (HMMs). HTK is primarily used for speech recognition research, although it has also been used for numerous other applications, including research into speech synthesis, character recognition and DNA sequencing.
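To illustrate the kind of model HTK manipulates, here is a sketch of the forward algorithm, which computes the probability that an HMM generated a given observation sequence. It is plain Python, not HTK's API: it assumes discrete output symbols and works directly with probabilities, whereas HTK typically uses continuous (Gaussian mixture) output densities and log probabilities to avoid underflow on long utterances. The example model parameters are made up.

```python
def forward(init, trans, emit, observations):
    """Return P(observations | model) for a discrete-output HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n = len(init)
    # alpha[i] = P(o_1..o_t, state at time t = i), starting at t = 1
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        # Sum over all predecessor states, then emit the next symbol.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)


# Usage: a made-up two-state model over three output symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(round(forward(init, trans, emit, [0, 1, 2]), 5))  # prints: 0.03628
```

In recognition, this score is computed for each candidate word or phone model and the highest-scoring model wins; training tools such as those in HTK re-estimate `trans` and `emit` from data.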

Edinburgh Speech Tools

The Edinburgh Speech Tools Library is a collection of C++ classes, functions and related programs for manipulating the kinds of objects used in speech processing.

Snack

The Snack Sound Toolkit is designed to be used with a scripting language such as Tcl/Tk or Python. Using Snack you can create powerful multi-platform audio applications with just a few lines of code.

ESPS

ESPS is a library of speech and signal processing programs (including a well-regarded F0, or pitch, tracker), now available as source code from KTH.
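As a rough illustration of what an F0 tracker estimates, the sketch below picks the strongest autocorrelation peak within a plausible pitch range for a single frame. This simple method is a stand-in chosen for illustration, not the algorithm used by ESPS, and it is far less robust (no voicing decision, no tracking across frames); the 60 to 400 Hz search range is an assumed default.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the F0 (in Hz) of one voiced frame via autocorrelation.

    Searches lags corresponding to f0_max..f0_min and returns sample_rate / best_lag.
    """
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove DC offset
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(x) - 1)
    best_lag, best_r = 0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: summing over fewer terms at longer lags
        # mildly favors the shortest period, reducing octave errors.
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


# Usage: a synthetic 200 Hz sine wave, 0.25 s at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(2000)]
print(estimate_f0(tone, rate))  # prints: 200.0
```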

SRILM: SRI Language Modelling toolkit

SRILM is a toolkit for building and applying statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation.
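A minimal sketch of the simplest statistical language model, a bigram model with add-one (Laplace) smoothing, in plain Python. The function names and the toy corpus are hypothetical and this is not SRILM's interface; SRILM provides higher-order n-grams and much better smoothing and backoff methods (for example Kneser-Ney discounting).

```python
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts, adding <s> and </s> sentence markers."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()  # tokens that can be predicted (every token except <s>)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, word, history_counts, bigram_counts, vocab):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))


model = train_bigram(["i like tea", "i like coffee"])
print(bigram_prob("i", "like", *model))    # 3/7, about 0.4286
print(bigram_prob("like", "tea", *model))  # 2/7, about 0.2857
```

In a recognizer, such probabilities are combined with the acoustic model scores so that likely word sequences are preferred over acoustically similar but improbable ones.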