BYBLOS: The BBN Continuous Speech Recognition System:-
This paper describes BYBLOS, the BBN continuous speech recognition system. The system, designed for large-vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance. Its basic approach is the extensive use of robust context-dependent models of phonetic coarticulation based on Hidden Markov Models (HMMs). The paper describes the components of the BYBLOS system: the signal processing front end, dictionary, phonetic model training system, word model generator, grammar, and decoder. In recognition experiments, it demonstrates consistently high word ...
One of the unique characteristics of our bimodal speech recognition system is its novel strategy for fusing the acoustic and visual features, which takes into account the different sampling rates of the two signals. Compared with acoustic-only recognition, the audio-visual scheme achieves much higher recognition accuracy, especially in the presence of noise.
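The excerpt does not say how the two feature streams are aligned, so the following is only a minimal sketch of one simple way to deal with different sampling rates: linearly interpolate the slower visual stream up to the audio frame count, then concatenate the two vectors frame by frame. The function names `upsample_linear` and `fuse` are hypothetical, and the interpolation step is an assumption, not the authors' fusion strategy.

```python
def upsample_linear(frames, target_len):
    """Linearly interpolate a list of feature vectors to target_len frames."""
    if len(frames) == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    # Map each output index onto a fractional position in the input stream.
    scale = (len(frames) - 1) / (target_len - 1)
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), len(frames) - 2)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


def fuse(audio_frames, visual_frames):
    """Concatenate audio features with visual features upsampled to match.

    Assumption: both streams cover the same time span, so aligning them
    only requires matching the number of frames.
    """
    visual_up = upsample_linear(visual_frames, len(audio_frames))
    return [list(a) + v for a, v in zip(audio_frames, visual_up)]


# Illustrative input: 5 audio frames and 2 visual frames over the same interval.
audio = [[0.0]] * 5
visual = [[0.0], [4.0]]
fused = fuse(audio, visual)
```

Each fused frame holds the audio features followed by the interpolated visual features, so a single model can consume both streams at the audio frame rate.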
New Methods in Continuous Mandarin Speech Recognition:-
We describe new methods for speaker-independent, continuous Mandarin speech recognition based on the IBM HMM-based continuous speech recognition system (1-3). First, we treat tones in Mandarin as attributes of certain phonemes, instead of syllables. Second, instantaneous pitch is treated as a variable in the acoustic feature vector, in the same way as cepstra or energy. Third, by designing a set of word-segmentation rules to convert continuous Chinese text into segmented text, an effective trigram language model is trained (4). By applying these new methods, a speaker-independent, very-large-vocabulary continuous Mandarin dictation system is demonstrated. Decoding results showed that its performance is similar to the best results for US English.
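To illustrate the third point, here is a minimal sketch of trigram estimation over text that has already been segmented into words. The segmentation rules themselves are not given in the abstract, so the example sentences are hypothetical input. The estimate is the plain unsmoothed maximum-likelihood ratio; a real dictation system would add smoothing or back-off, which is left out here.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories over segmented sentences."""
    tri, bi = Counter(), Counter()
    for words in sentences:
        # Pad so the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            bi[history] += 1
            tri[history + (padded[i],)] += 1
    return tri, bi


def trigram_prob(tri, bi, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    history = (w1, w2)
    if bi[history] == 0:
        return 0.0  # unseen history; a real model would back off instead
    return tri[(w1, w2, w3)] / bi[history]


# Hypothetical pre-segmented sentences (one list of words per sentence).
corpus = [["我", "喜欢", "音乐"], ["我", "喜欢", "茶"]]
tri, bi = train_trigram(corpus)
p = trigram_prob(tri, bi, "我", "喜欢", "音乐")
```

Because Chinese text is written without spaces, the counts above only make sense after segmentation, which is why the word-segmentation rules come before language model training.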
Using MLP Features in SRI's Conversational Speech Recognition System:-
We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features. This paper focuses on the challenges arising when incorporating these nonstandard features into a full-scale speech-to-text (STT) system, as used by SRI in the Fall 2004 DARPA STT evaluations. First, we developed a series of time-saving techniques for training feature MLPs on 1800 hours of speech. Second, we investigated which components of a multipass, multi-front-end recognition system are most profitably augmented with MLP features for best overall performance. The final system achieved a 2% absolute (10% relative) WER reduction over a comparable baseline system that did not include Tandem/HATs MLP features.
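The relationship between the absolute and relative figures can be checked with a little arithmetic. Since relative reduction equals absolute reduction divided by baseline WER, a 2% absolute and 10% relative reduction together imply a baseline of roughly 20% WER. The abstract does not state the baseline, and both figures are presumably rounded, so the value below is only an approximation. The function names are hypothetical.

```python
def implied_baseline_wer(absolute_points, relative_percent):
    """Baseline WER (in %) implied by an absolute and a relative reduction.

    relative = absolute / baseline, so baseline = absolute / relative.
    """
    return 100.0 * absolute_points / relative_percent


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Figures from the abstract: 2 points absolute, 10% relative.
baseline = implied_baseline_wer(2.0, 10.0)       # approximate baseline WER in %
improved = baseline - 2.0                        # WER after the 2-point gain
check = relative_reduction(baseline, improved)   # recovers the relative figure
```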
Hidden Markov Models for Speech Recognition System:-
The use of hidden Markov models for speech recognition has become predominant in the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons this method has become so popular are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite...