Grammars, dictionaries, vocabularies

In the speech recognition trade, the term "vocabulary" is often used, incorrectly, to refer to the grammar in use at a given moment by a recognizer. If the grammar is nothing but a word *list*, then sure, it's a vocabulary. But it's a pretty stupid recognizer that can only recognize one item out of a simple list of alternatives.

Normally, spoken language consists of multi-word utterances, not just single-word utterances. A "grammar" is a specification of the acceptable *sequences* of words, including whatever combinatoric possibilities may be found in, for example, a branching network of alternatives. Grammars can be thought of as word lattices or, if you like, as a set of rules, where each rule defines how some category of phrases is composed of its subparts. The subparts can be words, sets of words (i.e., categories like "digit", "color adjective", etc.), or names of other phrase categories, which are in turn defined in terms of their own subparts. Linguists like the rules-based approach, but word networks or lattices are reasonably practical, too.
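To make the rules-based view concrete, here is a minimal sketch in Python. The grammar, its category names ("Command", "Color", "Digit"), and the helper functions are all hypothetical and invented for illustration; they are not taken from any Sprex product. Each rule lists alternative sequences of subparts, and a subpart is either a word or the name of another category.

```python
# Hypothetical toy grammar: each rule maps a phrase category to alternative
# sequences of subparts. A subpart is either a word or another category name.
GRAMMAR = {
    "Command": [["paint", "it", "Color"], ["dial", "Digit", "Digit"]],
    "Color": [["red"], ["green"], ["blue"]],
    "Digit": [["zero"], ["oh"], ["one"], ["two"]],
}


def match(symbol, words, start):
    """Return the set of end positions reachable by matching `symbol`
    against `words`, beginning at index `start`."""
    if symbol not in GRAMMAR:  # a plain word, not a category
        if start < len(words) and words[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for alternative in GRAMMAR[symbol]:
        positions = {start}
        for part in alternative:
            # Advance every surviving position through the next subpart.
            positions = {e for p in positions for e in match(part, words, p)}
        ends |= positions
    return ends


def accepts(category, utterance):
    """True if the whole utterance is an acceptable sequence for `category`."""
    words = utterance.split()
    return len(words) in match(category, words, 0)
```

With these assumed rules, accepts("Command", "paint it red") and accepts("Command", "dial oh two") are true, while accepts("Command", "paint red it") is false. The sketch assumes the rules are not left-recursive; a rule that begins with its own category would loop forever.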

For example, the category of phrases we call Zip Codes could be treated as a list of numbers from 00000 to 99999, with 100,000 elements in the list (a pretty big and inefficient speech recognizer), or as a grammar composed of a simple sequence of 5 elements, each obviously a Digit, one of a list of 10 or 11 words (one for 'zero', another for 'oh'). The grammar approach has a small network made up of just 50 word slots (55 if both 'zero' and 'oh' are allowed), yet it includes everything the 100k list includes. The speech recognizer that uses the grammar network is a lot smaller, faster, and also more accurate than a speech recognizer that uses the full list "vocabulary" approach.
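The arithmetic behind the Zip Code comparison can be checked with a short sketch. The names below (DIGIT_WORDS, zip_grammar, parse_zip) are made up for this illustration, and the grammar is represented simply as a list of 5 slots, each holding the 11 digit words.

```python
from itertools import product

# Spoken forms for each digit; "0" has two ("zero" and "oh"), giving 11 words.
DIGIT_WORDS = {
    "0": ["zero", "oh"], "1": ["one"], "2": ["two"], "3": ["three"],
    "4": ["four"], "5": ["five"], "6": ["six"], "7": ["seven"],
    "8": ["eight"], "9": ["nine"],
}
WORD_TO_DIGIT = {w: d for d, ws in DIGIT_WORDS.items() for w in ws}

ZIP_LENGTH = 5

# Grammar approach: a sequence of 5 Digit slots, each holding the same 11 words.
zip_grammar = [sorted(WORD_TO_DIGIT) for _ in range(ZIP_LENGTH)]
grammar_word_slots = sum(len(slot) for slot in zip_grammar)

# List approach: every Zip Code written out as its own entry.
zip_list = ["".join(digits) for digits in product("0123456789", repeat=ZIP_LENGTH)]


def parse_zip(utterance):
    """Return the 5-digit string if the utterance fits the grammar, else None."""
    words = utterance.split()
    if len(words) != len(zip_grammar):
        return None
    if any(word not in slot for word, slot in zip(words, zip_grammar)):
        return None
    return "".join(WORD_TO_DIGIT[word] for word in words)


print(grammar_word_slots, len(zip_list))
```

Here the grammar holds 55 word slots while the list holds 100,000 entries, and parse_zip("one nine one oh four") yields "19104", which is also an entry in the full list.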

So:

  • A grammar is a specification of the word sequences that are acceptable.
  • A pronouncing dictionary, or in the context of speech recognition, simply a "dictionary", is a list of words along with their pronunciations.
  • A pronunciation for a word is a sequence of phonemes.
  • A phoneme is a word-distinguishing sound unit in the language being spoken (like a letter, but for sounds). Linguists who claim that the phoneme is an obsolete technical term from the 1940s can quit wasting everyone's time with their obfuscation. See "English Vowels: Their Surface Phonology and Phonetic Implementation in Vernacular Dialects", U Penn PhD Thesis, 1991. In short, the list of phonemes in a language or dialect, that is, the minimal list of lexically distinctive sets of phonetically similar sounds, is easy to determine, intersubjectively consistent, and rarely if ever subject to question, particularly when listening to speech in a studied dialect. For the purposes of speech recognition, other categories such as archiphoneme, (deeply) underlying segment, etc., are obtusely abstract or obtusely concrete, and vague overarching terms like "phone" are insufficiently precise. As Steve Young, father of many of the leading techniques in use today, once said, "The only thing Linguistics has contributed to Speech Recognition is the list of Phonemes." Quibbling about the theoretical-linguistic status of the members of that list is a waste of time.
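The definitions above fit together roughly as follows. This is only a sketch: the mini dictionary is hypothetical, and the phoneme symbols are ARPAbet-style labels chosen for illustration, not the symbol set of any particular recognizer. A dictionary maps each word to one or more pronunciations, and each pronunciation is a sequence of phonemes, so a word sequence allowed by a grammar expands into one or more phoneme sequences.

```python
from itertools import product

# Hypothetical mini dictionary. Phoneme symbols are ARPAbet-style, used here
# only for illustration; a word may have more than one pronunciation.
DICTIONARY = {
    "zero": [["Z", "IY", "R", "OW"], ["Z", "IH", "R", "OW"]],
    "oh": [["OW"]],
    "one": [["W", "AH", "N"]],
    "two": [["T", "UW"]],
}


def pronunciations(word_sequence):
    """Return every phoneme sequence for a sequence of dictionary words."""
    per_word = [DICTIONARY[word] for word in word_sequence]
    return [
        # Concatenate the chosen pronunciation of each word, in order.
        [phoneme for pron in choice for phoneme in pron]
        for choice in product(*per_word)
    ]
```

For instance, pronunciations(["two", "oh"]) gives the single sequence T UW OW, while pronunciations(["zero", "one"]) gives two sequences, one per listed pronunciation of "zero".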

Copyright © 1996-2005 Sprex, Inc. All rights reserved.
Date: July 25, 2008