Sprex
This is more than one question. First, one must distinguish Computational-Real-Time (CRT) from Impressionistic-Real-Time (IRT). CRT means that the processing time equals 1 times the duration of the speech input. IRT is the user's impression of not having to wait, which depends on intelligent user-interface design, stream-oriented processing, and CRT. If the system keeps the user waiting 20 seconds before responding to a 20-second spoken turn, then even though it is running at CRT, it feels slow, so it is not IRT.

An important influence on IRT is whether the system does file (batch) processing or streaming processing. File processing requires that the speech waveform input be complete before any further processing starts. Streaming processing starts decoding blocks of the waveform as they become available.

IRT can also be improved by various interface-design techniques, such as playing a canned part of a response (one that doesn't depend on the result of the decoding) after the speaker's turn is over, while decoding is still in progress. For example, here is a phrase that can buy two seconds of extra time for a slow decoder: "Thank you. You will now be connected to:"

"How fast?" can only be answered precisely relative to CRT, not IRT, because an interface designer can take a 2*CRT decoder (on given hardware) and make it seem immediately responsive, while one could take a 0.5*CRT decoder (say, on faster hardware) and make it seem slow. Interface design is in the application programmer's hands.

Decoding speed relative to CRT depends on processor speed, grammar complexity, pruning levels, the clarity and speed of the speaker's speech, and so on. For example, two labs in 1999 produced benchmarks like the following.
Systems can trade memory for increased speed by storing large tables of artfully precomputed information. Speaker-dependent (trained) systems can also run faster.
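
The CRT/IRT distinction can be made concrete with a little arithmetic. The following is a minimal Python sketch, not part of any Sprex product: the function names and the latency model are assumptions made for illustration. It assumes that a file (batch) decoder starts only when the turn ends, that an idealized streaming decoder works throughout the turn with no block latency, and that a canned prompt simply hides that many seconds of the remaining decoding time.

```python
def real_time_factor(processing_sec, speech_sec):
    """Processing time as a multiple of speech duration (1.0 means CRT)."""
    if speech_sec <= 0:
        raise ValueError("speech_sec must be positive")
    return processing_sec / speech_sec


def wait_after_turn(speech_sec, rtf, streaming, canned_prompt_sec=0.0):
    """Seconds of dead air the user hears after finishing a spoken turn.

    Simplified model (an assumption, not a measurement):
    - file/batch: decoding starts at end of turn and takes rtf * speech_sec
    - streaming:  decoding runs during the turn, so only the backlog
                  max(0, rtf - 1) * speech_sec remains at end of turn
    - a canned prompt played right after the turn hides that much of the wait
    """
    if streaming:
        remaining = max(0.0, rtf - 1.0) * speech_sec
    else:
        remaining = rtf * speech_sec
    return max(0.0, remaining - canned_prompt_sec)


if __name__ == "__main__":
    # CRT decoder, file processing, 20-second turn: feels slow.
    print(wait_after_turn(20, 1.0, streaming=False))
    # 2*CRT decoder, streaming, 2-second turn, 2-second canned prompt.
    print(wait_after_turn(2, 2.0, streaming=True, canned_prompt_sec=2.0))
    # 0.5*CRT decoder, file processing, 20-second turn: still a long wait.
    print(wait_after_turn(20, 0.5, streaming=False))
```

Under these assumptions, the slower 2*CRT decoder leaves the user with no perceptible wait, while the faster 0.5*CRT decoder in file mode leaves a noticeable pause, which is why "How fast?" has a precise answer only relative to CRT.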
Copyright © 1996-2005
Sprex, Inc.
All rights reserved.