When a phone loop is used as a garbage (out-of-grammar speech) model, it needs a penalty in the lattice; otherwise the recognizer may assign all speech to it. All speech is, of course, composed of phone sequences, and an unconstrained phone sequence will fit an utterance at least as well as, and normally better than, a constrained subset of all possible phone sequences. The recognizer can find not just within-grammar phone sequences but any phone sequence whatsoever. Since most utterances contain some phone-level deletion or modification, the recognizer prefers to transcribe them using the phone loop rather than the expected part of the grammar, which does not allow for deletions and modifications.

To solve this problem, a penalty is attached to the entry link of the phone loop. The recognizer will then avoid aligning speech to the phone loop unless the fit in the phone loop is so much better than in the rest of the grammar that it overcomes the penalty. The penalty quantifies how much better the phone loop must score than the rest of the grammar before the recognizer can return a garbage result. This is the basic idea.

An important refinement comes when we notice that long utterances do not show much effect from this penalty. The entry penalty is paid only once, so its effect is diluted over the many frames of a longer utterance. Even if a long utterance is in the grammar, it may still be better fitted by the phone loop, because of the fit preference for unconstrained phone recognition discussed above. Consequently, we add a loop-back penalty, which effectively increases the total penalty for longer utterances so that this dilution does not occur. The recognizer is then forced to use within-grammar phone sequences to model a longer in-grammar utterance, because the accumulated penalties of repeatedly passing through the loop-back link are so costly that the phone loop is avoided.
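The dilution effect and the loop-back fix can be illustrated with a little arithmetic. The sketch below is a toy model, not recognizer code: the per-frame log scores, the loop-back penalty of -5, and the assumption of 8 frames per phone are all made-up numbers chosen only for illustration (only the entry penalty of -50 echoes the guesstimate used on this page).

```python
def grammar_score(n_frames, per_frame):
    """Total log score of the in-grammar path (no penalties)."""
    return n_frames * per_frame


def phone_loop_score(n_frames, per_frame, entry_penalty,
                     loopback_penalty=0.0, frames_per_phone=8):
    """Total log score of the phone-loop path, including penalties.

    The entry penalty is paid once. The loop-back penalty is paid once
    for every phone after the first, so it grows with utterance length.
    frames_per_phone is a hypothetical average phone duration.
    """
    n_phones = max(1, n_frames // frames_per_phone)
    return (n_frames * per_frame
            + entry_penalty
            + (n_phones - 1) * loopback_penalty)


def winner(n_frames, grammar_pf, loop_pf, entry_penalty, loopback_penalty=0.0):
    """Return which path scores higher for the given toy numbers."""
    g = grammar_score(n_frames, grammar_pf)
    p = phone_loop_score(n_frames, loop_pf, entry_penalty, loopback_penalty)
    return "grammar" if g >= p else "garbage"


# In-grammar speech: the phone loop fits 0.5 better per frame (assumed).
# Short utterance: the entry penalty is enough.
print(winner(40, -5.0, -4.5, entry_penalty=-50.0))
# Long utterance: the one-off entry penalty is diluted.
print(winner(400, -5.0, -4.5, entry_penalty=-50.0))
# Long utterance with a loop-back penalty: the phone loop is avoided again.
print(winner(400, -5.0, -4.5, entry_penalty=-50.0, loopback_penalty=-5.0))
# Truly out-of-grammar speech (grammar fits much worse) is still garbage.
print(winner(400, -7.0, -4.5, entry_penalty=-50.0, loopback_penalty=-5.0))
```

With these assumed numbers, the long in-grammar utterance falls into the phone loop when only the entry penalty is used, and returns to the grammar once the loop-back penalty is added. Real penalty values have to be tuned on real data.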
Speech recognizers often confront word sequences that are not part of the specified grammar: garbage speech.
The way speech recognition works is that the incoming audio stream is compared to all the possible paths through the grammar network, and the words composing the top-scoring path are returned as the recognition result. If garbage speech is the input, this method will still find the best-fitting path, which will be some word sequence in the grammar, and that sequence is different from the garbage word sequence actually spoken!

Therefore it is convenient to have a generic speech model, or "garbage model", which represents an alternative path within the grammar. If incoming speech matches a path through the garbage model best, then the system can infer that the speech does not fit the remainder of the grammar, and take an appropriate action (e.g., ignore the input, or give feedback asking the user to say something in the grammar). The following high-level grammar allows either in-grammar or out-of-grammar speech to be recognized:

    ( $GARBAGE | $GRAMMAR )

Here "|" means "or".

There are several ways of building a generic speech model, $GARBAGE in the above grammar. One that I understand pretty well is a "penalized phone* grammar". "*" here means one or more instances in sequence, and "phone" here means any phoneme in the language. Thus a "phone* grammar" is a grammar which allows any number of any of the phonemes in the language, in any sequence. Since every utterance in the language is composed of a sequence of phones, a phone* grammar is a grammar which will fit any utterance in the language.

In fact, a phone* grammar is so excellent a model of general speech that in general it fits in-grammar, non-garbage utterances even better than the regular word-network grammar! (The word-network grammar is itself decomposed into a restricted set of phones in a restricted set of orders, while phone* allows anything in any order, so an in-grammar utterance will fit the phone* grammar better whenever any substitute phone fits better, since the alternatives are all there in the phone* grammar.)
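The parenthetical argument above can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the per-frame log scores are invented, there are no transition or duration models, and the two helper functions are hypothetical names, not part of any recognizer API. The in-grammar path is scored by a small forced alignment (phones visited in order, each for at least one frame), while the phone* path simply takes the best phone in every frame.

```python
NEG_INF = float("-inf")


def phone_loop_score(frames):
    """Best unconstrained path: pick the best-scoring phone in every frame."""
    return sum(max(frame.values()) for frame in frames)


def forced_alignment_score(frames, phones):
    """Best path that visits `phones` in order, each for at least one frame."""
    # best[j] = best score of any path that is in phones[j] at the current frame
    best = [NEG_INF] * len(phones)
    best[0] = frames[0][phones[0]]
    for frame in frames[1:]:
        new = [NEG_INF] * len(phones)
        for j, phone in enumerate(phones):
            stay = best[j]
            advance = best[j - 1] if j > 0 else NEG_INF
            prev = max(stay, advance)
            if prev > NEG_INF:
                new[j] = prev + frame[phone]
        best = new
    return best[-1]  # the path must end in the last phone


# Invented log scores for "yes" (y eh s) spoken with a reduced vowel,
# so that "ah" fits the middle frames better than "eh".
frames = [
    {"y": -1.0, "eh": -4.0, "s": -6.0, "ah": -4.0},
    {"y": -3.0, "eh": -2.5, "s": -6.0, "ah": -1.5},
    {"y": -5.0, "eh": -2.0, "s": -5.0, "ah": -1.0},
    {"y": -6.0, "eh": -5.0, "s": -1.0, "ah": -5.0},
    {"y": -6.0, "eh": -6.0, "s": -0.5, "ah": -6.0},
]

in_grammar = forced_alignment_score(frames, ["y", "eh", "s"])
unconstrained = phone_loop_score(frames)
print(in_grammar, unconstrained)
```

In this toy setup the unconstrained score can never be lower than the constrained one, because every constrained path is also available to the phone loop, and it is strictly higher here because the substitute phone "ah" fits the middle frames better.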
So an unpenalized phone* garbage model will always win: the best-fitting path through the network will always go through the garbage model, even for in-grammar utterances. For this reason, the phone* grammar is augmented by an entrance penalty. Paths that extend into this garbage model are penalized severely, so that garbage is recognized only when the regular word-network grammar really doesn't fit very well at all.

A sample dictionary text file generated by hand and a lattice text file generated by grapHvite's NetBuilder (and augmented with the penalty score) are appended. The dictionary is required because networks must be specified in terms of "words" rather than in terms of phones, so in this dictionary each word is composed of a single phone. The lattice is simply the looping OR of all the phones (including silence, _SIL), with an initial guesstimate penalty of -50 attached to the entry link of the lattice.

Implementation with grapHvite/HAPI:

Insert the appended lattice as a sublattice into your recognition network; it will represent an alternative path between the initial and final nodes in your top-level lattice. Add the appended dictionary to your dictionary to produce a combined dictionary.

-------- dictionary --------
!PHONE-SET aa ae ah ao aw ax ay b ch d dh eh en er ey f g hh ih iy jh k l m n ng oh ow oy p r s sh sil sp t th uh uw v w y z zh
_AA aa
_AE ae
_AH ah
_AO ao
_AW aw
_AX ax
_AY ay
_B b
_CH ch
_D d
_DH dh
_EH eh
_EN en
_ER er
_EY ey
_F f
_G g
_HH hh
_IH ih
_IY iy
_JH jh
_K k
_L l
_M m
_N n
_NG ng
_OH oh
_OW ow
_OY oy
_P p
_R r
_S s
_SH sh
_SIL sil
_SP sp
_T t
_TH th
_UH uh
_UW uw
_V v
_W w
_Y y
_Z z
_ZH zh

-------- garbage.lat --------
#
# Lattice file generated by grapHvite Version 1.0.1
#
# Main Lattice.
#
# Grammar summary.
N=52 L=94
# Node definitions.
I=0 W=!NULL x=15 y=205
I=1 W=!NULL x=95 y=205
I=2 W=!NULL x=465 y=255
I=3 W=!NULL x=505 y=115
I=4 W=_AA x=185 y=35
I=5 W=_AE x=185 y=65
I=6 W=_AH x=185 y=95
I=7 W=_AO x=185 y=125
I=8 W=_AW x=185 y=155
I=9 W=_AX x=185 y=185
I=10 W=_AY x=185 y=215
I=11 W=_B x=185 y=245
I=12 W=_CH x=185 y=275
I=13 W=_D x=185 y=305
I=14 W=_DH x=185 y=335
I=15 W=_EH x=185 y=365
I=16 W=_EN x=185 y=395
I=17 W=_ER x=185 y=425
I=18 W=_EY x=295 y=35
I=19 W=_F x=295 y=65
I=20 W=_G x=295 y=95
I=21 W=_HH x=295 y=125
I=22 W=_IH x=295 y=155
I=23 W=_IY x=295 y=185
I=24 W=_JH x=295 y=215
I=25 W=_K x=295 y=245
I=26 W=_L x=295 y=275
I=27 W=_M x=295 y=305
I=28 W=_N x=295 y=335
I=29 W=_NG x=295 y=365
I=30 W=_OW x=295 y=395
I=31 W=_OY x=295 y=425
I=32 W=_P x=395 y=35
I=33 W=_R x=395 y=65
I=34 W=_S x=395 y=95
I=35 W=_SH x=395 y=125
I=36 W=_SIL x=395 y=155
I=37 W=_SP x=395 y=185
I=38 W=_T x=395 y=215
I=39 W=_TH x=395 y=245
I=40 W=_UH x=395 y=275
I=41 W=_UW x=395 y=305
I=42 W=_V x=395 y=335
I=43 W=_W x=395 y=365
I=44 W=_Y x=395 y=395
I=45 W=_Z x=395 y=425
I=46 W=_ZH x=395 y=455
I=47 W=!NULL x=255 y=195
I=48 W=!NULL x=245 y=255
I=49 W=!NULL x=355 y=205
I=50 W=!NULL x=345 y=265
I=51 W=!NULL x=515 y=245
# Link definitions.
J=0 S=0 E=1 l=-50.0
J=1 S=1 E=4
J=2 S=1 E=5
J=3 S=1 E=6
J=4 S=1 E=7
J=5 S=1 E=8
J=6 S=1 E=9
J=7 S=1 E=10
J=8 S=1 E=11
J=9 S=1 E=12
J=10 S=1 E=13
J=11 S=1 E=14
J=12 S=1 E=15
J=13 S=1 E=16
J=14 S=1 E=17
J=15 S=1 E=47
J=16 S=2 E=3
J=17 S=3 E=1
J=18 S=3 E=51
J=19 S=4 E=48
J=20 S=5 E=48
J=21 S=6 E=48
J=22 S=7 E=48
J=23 S=8 E=48
J=24 S=9 E=48
J=25 S=10 E=48
J=26 S=11 E=48
J=27 S=12 E=48
J=28 S=13 E=48
J=29 S=14 E=48
J=30 S=15 E=48
J=31 S=16 E=48
J=32 S=17 E=48
J=33 S=18 E=50
J=34 S=19 E=50
J=35 S=20 E=50
J=36 S=21 E=50
J=37 S=22 E=50
J=38 S=23 E=50
J=39 S=24 E=50
J=40 S=25 E=50
J=41 S=26 E=50
J=42 S=27 E=50
J=43 S=28 E=50
J=44 S=29 E=50
J=45 S=30 E=50
J=46 S=31 E=50
J=47 S=32 E=2
J=48 S=33 E=2
J=49 S=34 E=2
J=50 S=35 E=2
J=51 S=36 E=2
J=52 S=37 E=2
J=53 S=38 E=2
J=54 S=39 E=2
J=55 S=40 E=2
J=56 S=41 E=2
J=57 S=42 E=2
J=58 S=43 E=2
J=59 S=44 E=2
J=60 S=45 E=2
J=61 S=46 E=2
J=62 S=47 E=18
J=63 S=47 E=19
J=64 S=47 E=20
J=65 S=47 E=21
J=66 S=47 E=22
J=67 S=47 E=23
J=68 S=47 E=24
J=69 S=47 E=25
J=70 S=47 E=26
J=71 S=47 E=27
J=72 S=47 E=28
J=73 S=47 E=29
J=74 S=47 E=30
J=75 S=47 E=31
J=76 S=47 E=49
J=77 S=48 E=50
J=78 S=49 E=32
J=79 S=49 E=33
J=80 S=49 E=34
J=81 S=49 E=35
J=82 S=49 E=36
J=83 S=49 E=37
J=84 S=49 E=38
J=85 S=49 E=39
J=86 S=49 E=40
J=87 S=49 E=41
J=88 S=49 E=42
J=89 S=49 E=43
J=90 S=49 E=44
J=91 S=49 E=45
J=92 S=49 E=46
J=93 S=50 E=2
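For a different phone set, files of this shape could be generated by a short script rather than by hand. The following is a minimal sketch, assuming the same field names as the listing above (I, W, J, S, E, and l for the link score). It is not grapHvite output: it omits the x/y layout coordinates and the intermediate !NULL fan-out nodes, the function name and the optional loop-back penalty parameter are hypothetical, and whether grapHvite/HAPI accepts the simplified file has not been tested here.

```python
def build_phone_loop(phones, entry_penalty=-50.0, loopback_penalty=None):
    """Return (dictionary_text, lattice_text) for a penalized phone loop."""
    # Dictionary: one single-phone "word" per phone, named like _AA or _SIL.
    dict_lines = ["!PHONE-SET " + " ".join(phones)]
    dict_lines += [f"_{p.upper()} {p}" for p in phones]

    # Nodes 0-3 are !NULL: start, loop start, loop end, exit.
    START, LOOP_START, LOOP_END, EXIT = 0, 1, 2, 3
    nodes = ["!NULL"] * 4 + [f"_{p.upper()}" for p in phones]
    phone_ids = range(4, 4 + len(phones))

    # Each link is (start node, end node, optional score).
    links = [(START, LOOP_START, entry_penalty)]          # penalized entry
    links += [(LOOP_START, i, None) for i in phone_ids]   # fan out to phones
    links += [(i, LOOP_END, None) for i in phone_ids]     # fan back in
    links.append((LOOP_END, LOOP_START, loopback_penalty))  # loop-back link
    links.append((LOOP_END, EXIT, None))                  # leave the loop

    lat_lines = [f"N={len(nodes)} L={len(links)}"]
    lat_lines += [f"I={i} W={w}" for i, w in enumerate(nodes)]
    for j, (start, end, score) in enumerate(links):
        line = f"J={j} S={start} E={end}"
        if score is not None:
            line += f" l={score:.1f}"
        lat_lines.append(line)

    return "\n".join(dict_lines) + "\n", "\n".join(lat_lines) + "\n"


# Tiny three-phone example; a real phone set would be passed in instead.
dictionary_text, lattice_text = build_phone_loop(["aa", "b", "sil"])
print(dictionary_text)
print(lattice_text)
```

Passing a value such as loopback_penalty=-5.0 (an arbitrary illustrative number) attaches a score to the loop-back link as well, which corresponds to the refinement for long utterances described earlier on this page.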
Copyright © 1996-2005
Sprex, Inc.
All rights reserved.