'Open mic' recognition means you don't have to push a button to start
or stop the recognizer. Instead it runs continuously, recognizing and
processing utterances one after the other, with any amount of silence
or other-directed speech in between. Typically a keyword is used to
get the attention of the recognizer, and whatever you say next is
processed with the task grammar.
Here are two ways to build such a keyword-based open mic recognizer.
The first way isn't the best, but it's worth thinking about.
I. First Method
First, build two grammars:
Grammar 1 has a garbage model and silence in a loop, plus a keyword.
Grammar 2 has the rest of the command/control grammar
along with optional silence, in a loop.
Start recognizing with Grammar 1 enabled. Get a traceback of the best
results so far every 100ms (10 frames or observations) or so, and
check to see if the keyword has been recognized. If not, just
continue onward. Once the "silence" model has consumed a long stretch
of audio, reset the memory state of the recognizer, so as to limit
memory creep (produced by the accumulation of recognition "tokens" and
their histories, which are keeping track of nothing useful -- old
utterances and silence).
Once the keyword is recognized, immediately enable Grammar 2 and
disable Grammar 1.
While Grammar 2 is enabled, keep tracing back periodically for
recognition results, and now process the output recognition strings
as they are returned in the traceback. This processing does the
actual application work.
Finally, after a second threshold duration of silence has been
recognized, Grammar 1 is switched back in, since the user isn't
saying anything. (So if you say "Computer!" and then say nothing for
this threshold duration, the system will not respond until you wake
it up by saying "Computer!" again.)
This approach enables the system to continuously recognize without any
push-to-talk requirement.
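The switching logic of the First Method can be sketched as a small state machine. This is only an illustration, not Sprex code: the class name, the labels COMPUTER and SIL, and the threshold values are all assumptions, and each call to on_traceback() stands in for one periodic traceback that returns the labels recognized since the previous poll.

```python
from dataclasses import dataclass, field

KEYWORD = "COMPUTER"  # hypothetical wake word
SIL = "SIL"           # hypothetical label emitted by the silence model


@dataclass
class TwoGrammarController:
    """Tracks which grammar should be active, given one traceback per poll."""
    poll_ms: int = 100            # traceback interval
    sleep_after_ms: int = 2000    # silence before switching back to Grammar 1
    reset_after_ms: int = 10000   # silence before resetting recognizer memory
    active: str = "G1"
    silence_ms: int = 0
    commands: list = field(default_factory=list)
    resets: int = 0

    def on_traceback(self, new_words):
        """new_words: labels recognized since the previous poll."""
        # An empty poll, or one containing only SIL, counts as silence.
        if all(w == SIL for w in new_words):
            self.silence_ms += self.poll_ms
        else:
            self.silence_ms = 0

        if self.active == "G1":
            if KEYWORD in new_words:
                # Keyword found: enable Grammar 2, disable Grammar 1.
                self.active = "G2"
                self.silence_ms = 0
            elif self.silence_ms >= self.reset_after_ms:
                # Long silence: this is where the recognizer's memory
                # state would be reset to limit memory creep.
                self.resets += 1
                self.silence_ms = 0
        else:
            # Grammar 2: pass recognized words on to the application.
            self.commands.extend(w for w in new_words if w != SIL)
            if self.silence_ms >= self.sleep_after_ms:
                # User has gone quiet: switch Grammar 1 back in.
                self.active = "G1"
                self.silence_ms = 0
```

A real program would call the recognizer's own grammar-switch and reset functions at the commented points; here they are only counted and recorded so the logic can be followed.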
Note that there is a potential problem in dealing with the grammar
switch from Grammar 1 into Grammar 2. Some command-control audio
will have already been processed by the system while Grammar 1 is
enabled (since there must be some delay before the look-back results
can be interpreted as having definitely found the keyword, and also
some added delay in the grammar switch itself). This lost audio may
represent important segments of speech, part of a command. Also,
switching from one grammar to the other takes a good deal of complex
logic; it would be nicer to let a single grammar handle the
complexities. So consider the Second Method.
II. Second Method
An improved approach to open mic recognition is to activate BOTH
grammars at once. Consider this:
( < $GARBAGE | $SIL | $KEYWORD [ $SIL ] { $GRAMMAR } > )
<> means one or more repetitions
{} means zero or more repetitions
[] means zero or one repetition
( a | b ) means a or b
This single grammar incorporates both grammars discussed above. Here
the significant constraint is that the keyword must immediately
precede any command/control utterances. The user may say garbage, but
that cannot be mistaken for a legitimate command unless preceded by
the keyword. The user could potentially say the keyword followed by
something outside of the grammar; the command-control grammar would be
bypassed, as allowed for in the grammar above by the {} notation.
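To make the keyword constraint concrete, here is a rough sketch of how an application might pick commands out of the label sequence that this single grammar produces. It assumes the recognizer reports one label per recognized unit; the function name and the labels COMPUTER, SIL and GARBAGE are placeholders, not Sprex output.

```python
def extract_commands(labels, keyword="COMPUTER", sil="SIL", garbage="GARBAGE"):
    """Return the command word lists that directly follow the keyword.

    Mirrors  $KEYWORD [ $SIL ] { $GRAMMAR } : one optional silence after
    the keyword, then zero or more command words. Anything not preceded
    by the keyword is ignored.
    """
    commands = []
    current = None        # None means we are not inside a keyword span
    skipped_sil = False   # True once the optional [ $SIL ] has been used
    for label in labels:
        if label == keyword:
            if current:
                commands.append(current)
            current = []
            skipped_sil = False
        elif current is None:
            continue                      # garbage/silence outside a span
        elif label == sil and not current and not skipped_sil:
            skipped_sil = True            # the optional [ $SIL ]
        elif label in (sil, garbage):
            if current:
                commands.append(current)  # span ended by silence/garbage
            current = None
        else:
            current.append(label)         # a command word
    if current:
        commands.append(current)          # may still be in progress
    return commands
```

A keyword followed directly by garbage yields no command at all, which corresponds to the zero-repetition case of the {} notation.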
To get results, the program must not wait for the completion of
recognition; nor can it use silence detection to automatically end the
input of audio data. Instead the program should call for intermediate
results periodically, trace back these "partial" recognition results,
and process them according to the application's requirements. The
memory creep issue mentioned above can be dealt with in the same way:
keep track of when silence starts being recognized, and once some
threshold duration containing only silence has passed, reinitialize
the memory (hapiRecComplete()). No final results processing occurs.
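The polling scheme described above might look roughly like this. The recognizer interface here (partial_result() returning all labels since the last reset, and reset()) is invented for the sketch; in a real program reset() is where the memory reinitialization call would go. FakeRecognizer only replays a script so the example can run on its own. The sketch also assumes that earlier labels in a partial result do not change between polls.

```python
class FakeRecognizer:
    """Stand-in recognizer (hypothetical): replays scripted partial results."""
    def __init__(self, script):
        self.script = list(script)
        self.history = []
        self.resets = 0

    def partial_result(self):
        # Each poll appends the next scripted chunk to the running traceback.
        if self.script:
            self.history.extend(self.script.pop(0))
        return list(self.history)

    def reset(self):
        self.history.clear()
        self.resets += 1


class SilenceResetPoller:
    """Processes new partial results; resets after a long silence."""
    def __init__(self, recognizer, handle, poll_ms=100,
                 reset_after_ms=5000, sil="SIL"):
        self.recognizer = recognizer
        self.handle = handle              # application callback
        self.poll_ms = poll_ms
        self.reset_after_ms = reset_after_ms
        self.sil = sil
        self.seen = 0                     # labels already processed
        self.silence_ms = 0

    def poll(self):
        labels = self.recognizer.partial_result()
        fresh = labels[self.seen:]        # only what is new since last poll
        self.seen = len(labels)
        non_sil = [l for l in fresh if l != self.sil]
        if non_sil:
            self.handle(non_sil)
            self.silence_ms = 0
        else:
            self.silence_ms += self.poll_ms
            if self.silence_ms >= self.reset_after_ms:
                self.recognizer.reset()   # reclaim token/history memory
                self.seen = 0
                self.silence_ms = 0
```

Calling poll() on a timer (every 100ms or so) replaces any wait for a final result; the handler sees each new label exactly once.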
Also, because of beam-search pruning methods, it is possible for
recognition to fail at some point. Your program must trap that
condition and restart the recognizer when it occurs.
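One way to structure the trap-and-restart logic is sketched below. How a failure is actually reported depends on the recognizer API; the RecognitionFailed exception, the restart() method, and the limit on consecutive failures are all assumptions made for this illustration.

```python
class RecognitionFailed(Exception):
    """Hypothetical signal that the search failed (e.g. after pruning)."""


class FlakyRecognizer:
    """Stand-in recognizer (hypothetical): None in the script means failure."""
    def __init__(self, script):
        self.script = list(script)
        self.restarts = 0

    def partial_result(self):
        item = self.script.pop(0) if self.script else []
        if item is None:
            raise RecognitionFailed("recognition failed")
        return item

    def restart(self):
        self.restarts += 1


def run(recognizer, handle, polls, max_consecutive_failures=3):
    """Poll `polls` times, restarting the recognizer whenever it fails."""
    failures = 0
    for _ in range(polls):
        try:
            labels = recognizer.partial_result()
        except RecognitionFailed:
            failures += 1
            if failures > max_consecutive_failures:
                raise                  # give up: something is badly wrong
            recognizer.restart()       # trap the condition and restart
            continue
        failures = 0
        if labels:
            handle(labels)
```

Capping consecutive failures is a design choice added here so that a persistently failing recognizer does not loop forever; the text above only requires that the failure be trapped and the recognizer restarted.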
Copyright © 1996-2005
Sprex, Inc.
All rights reserved.