ANSR™ stands for Action-oriented, Network-distributed Speech
Recognition. ANSR™ provides the infrastructure for a distributed
universe of speech-recognition-based dialog systems.
The ANSR™ Developer Package (Sprex part number ANSR-DEV) comprises a
software developer's toolkit, client programs, and a server engine
licensed to handle up to 5 simultaneous conversations.
ANSR™ provides state-of-the-art accuracy, flexibility, and system
performance.
ANSR™ Characteristics
- Action Orientation: ANSR™ grammars let you associate pieces of
code with parts of the grammar, so that when the user says
something, the system carries out the correct action. ANSR™ is
not for dictation, where the grammar must cover every imaginable
sentence of English, but for task-oriented dialogs, where what the
user can say can be fully enumerated, so that the actions the
system should carry out can also be specified concretely.
- Network Distribution: ANSR™ provides an internet architecture
for speech recognition. ANSR™ can be used as a stand-alone module,
with all of its components running on the same CPU, but it can
also be used in a network-distributed way, where the audio
input/output device is located close to the user, while the
recognition process, the agent that carries out the requested
actions, and other processes that play a role in the conversation
may be located elsewhere on the internet.
- Speaker Independence: No training on a particular voice is
required for reasonably accurate results.
- Speaker Adaptation: The system can be made to gradually
improve its accuracy with exposure to your voice.
- Continuous Speech: No pauses are required between words.
- Medium-Sized Vocabulary: Grammars are up to 4000 words,
arranged in a word lattice or network (i.e., a <4000-word active
vocabulary). This is more than sufficient for task-oriented
applications, where you can anticipate the range of things the
user might say in response to a particular prompt.
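To make the Action Orientation item above concrete, here is a minimal Python sketch of a fully enumerated, task-oriented grammar whose sentences are tied to actions. It does not use the ANSR grammar specification language or any Sprex API; the `build_grammar` table, the `dispatch` function, and the home-control phrases are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical illustration only: this is NOT the ANSR grammar language or API.
from typing import Callable, Dict, Set


def set_temperature(state: dict, degrees: int) -> str:
    """Action tied to the 'set the temperature to ...' sentences."""
    state["temperature"] = degrees
    return f"temperature set to {degrees}"


def switch_lights(state: dict, on: bool) -> str:
    """Action tied to the 'turn the lights ...' sentences."""
    state["lights"] = on
    return "lights on" if on else "lights off"


def build_grammar() -> Dict[str, Callable[[dict], str]]:
    """Enumerate every sentence the user may say and attach an action to each."""
    grammar: Dict[str, Callable[[dict], str]] = {
        "turn the lights on": lambda s: switch_lights(s, True),
        "turn the lights off": lambda s: switch_lights(s, False),
    }
    for words, degrees in (("sixty eight", 68), ("seventy", 70), ("seventy two", 72)):
        # The default argument binds the current value of `degrees`.
        grammar[f"set the temperature to {words}"] = (
            lambda s, d=degrees: set_temperature(s, d)
        )
    return grammar


GRAMMAR = build_grammar()


def dispatch(recognized: str, state: dict) -> str:
    """Run the action associated with a recognized sentence, if there is one."""
    action = GRAMMAR.get(recognized.lower().strip())
    if action is None:
        return "not in grammar"
    return action(state)


def active_vocabulary(grammar: Dict[str, Callable[[dict], str]]) -> Set[str]:
    """Distinct words used anywhere in the grammar."""
    return {word for sentence in grammar for word in sentence.split()}
```

Because every sentence is enumerated up front, each recognition result maps to exactly one concrete action, and the active vocabulary of such a grammar can be checked against a size limit before deployment.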
Hardware resource benchmarks
- 3.5 seconds of audio recognized in 1.0 seconds of real time,
- in 8.5 MB of process RAM,
- on a Pentium Pro 200.
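The benchmark figures above imply a real-time factor that can be worked out directly. The short Python sketch below does the arithmetic; the function name `real_time_factor` is ours, and only the two timing figures are taken from the text.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


# Figures from the benchmark: 3.5 s of audio recognized in 1.0 s.
rtf = real_time_factor(processing_seconds=1.0, audio_seconds=3.5)
speedup = 1.0 / rtf

print(f"real-time factor: {rtf:.3f}")
print(f"speed relative to real time: {speedup:.1f}x")
```

A real-time factor of about 0.29 means recognition runs roughly 3.5 times faster than the audio arrives on the stated hardware.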
State-of-the-art acoustic models are included
- Supports "Clean" audio applications (16kHz sample rate) for US English,
UK English, Spanish, Japanese, and German.
- Supports telephony applications (8kHz sample rate) for US English.
- Support for additional languages is possible through our
cooperative development program. Initial funding from a
customer is required to customize the data collection and
language modelling systems for the language. Then with
the cooperation of users, data will gradually be collected,
cleaned, and modeled, eventually leading to full support
for that language. See further discussion
here.
ANSR™ supports a client/server architecture
- Audio I/O and, optionally, parameter extraction are done by the client.
- Resource-intensive core recognition work is done in the server.
- Thin-pipe (modem-bandwidth) data transmission of compressed parameters between client and server (available on request).
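A rough calculation shows why sending compressed parameters rather than raw audio matters on a modem-bandwidth link. The sample rates come from the text above; everything else in this Python sketch is an assumption made for illustration: 16-bit mono samples, a 56 kbit/s modem, and a hypothetical parameter stream of 100 frames per second with 13 coefficients of 8 bits each. None of these figures describe the actual wire format used between client and server.

```python
BITS_PER_SAMPLE = 16            # assumption: 16-bit linear PCM, mono
MODEM_BITS_PER_SECOND = 56_000  # assumption: a 56 kbit/s dial-up modem


def raw_audio_bitrate(sample_rate_hz: int, bits_per_sample: int = BITS_PER_SAMPLE) -> int:
    """Bits per second needed to ship uncompressed audio."""
    return sample_rate_hz * bits_per_sample


def parameter_bitrate(frames_per_second: int = 100,
                      coefficients: int = 13,
                      bits_per_coefficient: int = 8) -> int:
    """Bits per second for a hypothetical compressed parameter stream."""
    return frames_per_second * coefficients * bits_per_coefficient


def fits_modem(bits_per_second: int) -> bool:
    """True if the stream fits within the assumed modem bandwidth."""
    return bits_per_second <= MODEM_BITS_PER_SECOND


for label, rate in (("clean audio, 16 kHz", 16_000), ("telephony, 8 kHz", 8_000)):
    bps = raw_audio_bitrate(rate)
    print(f"{label}: {bps} bit/s raw, fits modem: {fits_modem(bps)}")

params = parameter_bitrate()
print(f"compressed parameters: {params} bit/s, fits modem: {fits_modem(params)}")
```

Under these assumptions, raw audio at either sample rate exceeds the link, while the parameter stream fits with room to spare, which is the motivation for doing parameter extraction on the client.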
Extremely lightweight client devices that are enabled for audio and
networking (the client library is 288 Kbytes) can be used in combination
with ANSR™'s high-end speech recognition technology on shared servers.
ANSR™ will improve itself through user-initiated upload of collected
statistics and data to Sprex's retraining systems, which carry out system
refinement and enable later download of new speaker-adaptive
transforms, returning further improvements in system performance.
ANSR™ Components
- sprecd recognition server, which runs as a daemon or service awaiting audio from clients.
- sprecc recognition client (native), which sends audio and grammar selection to the server.
- sprexlet recognition client (Java applet), which sends audio and grammar selection to the server.
- state-of-the-art acoustic models and a large pronunciation dictionary.
- grammar specification language (BNF with rule-associated actions).
- gxc grammar/action compiler for generating:
  - word lattices used by the speech recognizer.
  - an NLP agent which carries out actions associated with the recognition result.
- A HAPI 2.0.x MVX object library is incorporated into ANSR™, so that the
sprecc client can be modified and recompiled for your needs.
- audiocat for audio playback and recording.
- agent to carry out actions associated with recognition results.
- ddd to log inputs and actions.
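The word lattices that gxc generates for the recognizer can be pictured as a directed acyclic graph whose paths spell out the sentences the grammar allows. The Python sketch below builds such a graph by hand and enumerates its paths; the `LATTICE` structure, the node names, and the example words are hypothetical and say nothing about the real output format of gxc.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical lattice: node -> list of (word, next_node) edges.
# "END" has no outgoing edges and marks a complete sentence.
Lattice = Dict[str, List[Tuple[str, str]]]

LATTICE: Lattice = {
    "START": [("call", "N1"), ("dial", "N1")],
    "N1": [("home", "END"), ("the", "N2")],
    "N2": [("office", "END"), ("operator", "END")],
    "END": [],
}


def sentences(lattice: Lattice, node: str = "START",
              prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Depth-first walk that yields every word sequence from START to END."""
    if node == "END":
        yield " ".join(prefix)
        return
    for word, next_node in lattice[node]:
        yield from sentences(lattice, next_node, prefix + (word,))


def vocabulary(lattice: Lattice) -> Set[str]:
    """Distinct words appearing on any edge of the lattice."""
    return {word for edges in lattice.values() for word, _ in edges}


for sentence in sentences(LATTICE):
    print(sentence)
```

Sharing nodes between alternatives is what keeps the structure compact: six sentences are covered here by six edges' worth of distinct words rather than by six separate word strings.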
Certain "Visible Source" elements of ANSR™ are made available, with
source code included, to qualified developers under a Visible Source
license agreement. Licensees may modify and enhance these elements, with
the modifications and enhancements licensed back to Sprex. They
will then be incorporated into ANSR™ for use by the community of ANSR™
users and developers. Other elements of ANSR™, which embody
proprietary information that cannot be exposed as source code, will be
made available in compiled, object library form only, with appropriate
header files to define the programming interfaces, so that the Visible
Source programs can be recompiled against the libraries after
modification.