Sprex
Technical Description

ANSR™ means Action-oriented, Network-distributed Speech Recognition.

ANSR™ provides the infrastructure for a distributed universe of speech-recognition-based dialog systems.

The ANSR™ Developer Package (Sprex part number ANSR-DEV) comprises a software developer's toolkit, client programs, and a server engine licensed to handle up to 5 simultaneous conversations.

ANSR™ provides state-of-the-art accuracy, flexibility, and system performance.

ANSR™ Characteristics

  • Action Orientation: ANSR™ grammars let you associate pieces of code with parts of the grammar, so that when the user says something, the system carries out the corresponding action. ANSR™ is not intended for dictation, where the grammar must cover every imaginable sentence of English. It is intended for task-oriented dialogs, where what the user can say can be fully enumerated, so that the actions the system should carry out can also be specified concretely.
  • Network Distribution: ANSR™ provides an internet architecture for speech recognition. It can be used as a stand-alone module, with all of its components running on the same CPU. It can also be used in a network-distributed way: the audio input/output device is located close to the user, while the recognition process, the agent that carries out the requested actions, and other processes that play a role in the conversation may be located elsewhere on the internet.
  • Speaker Independence: No training is required on the particular voice for reasonably accurate results.
  • Speaker Adaptation: The system can be made to gradually improve accuracy with exposure to your voice.
  • Continuous Speech: No pauses are required between words.
  • Medium-Sized Vocabulary: Grammars contain up to 4000 words arranged in a word lattice or network; i.e., an active vocabulary size of <4000 words. This is more than sufficient for task-oriented applications, where you can anticipate the range of things the user might say in response to a particular prompt.
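
The action-oriented idea described above can be illustrated with a minimal Python sketch. It does not use the ANSR™ grammar language or any Sprex API; the rule format, the function names (`dispatch`, `vocabulary`), and the example phrases are all hypothetical. It only shows how a fully enumerable grammar lets each recognized sentence map to a concrete action.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical toy rule format: a rule is a sequence of slots, and each slot
# lists the words allowed at that position. This is NOT the ANSR grammar language.
Slots = List[List[str]]
Action = Callable[[List[str]], str]


def set_light(words: List[str]) -> str:
    # Action attached to the "light" rule; the last word carries the new state.
    return f"light:{words[-1]}"


def switch_channel(words: List[str]) -> str:
    # Action attached to the "channel" rule; the last word names the channel.
    return f"channel:{words[-1]}"


RULES: List[Tuple[Slots, Action]] = [
    ([["turn"], ["the"], ["light", "lamp"], ["on", "off"]], set_light),
    ([["switch"], ["to"], ["channel"], ["one", "two", "three"]], switch_channel),
]


def matches(slots: Slots, words: List[str]) -> bool:
    # A sentence matches when every word is allowed in its slot.
    return len(words) == len(slots) and all(w in s for w, s in zip(words, slots))


def dispatch(utterance: str) -> Optional[str]:
    # Run the action of the first matching rule; None means "outside the grammar".
    words = utterance.lower().split()
    for slots, action in RULES:
        if matches(slots, words):
            return action(words)
    return None


def vocabulary() -> Set[str]:
    # The active vocabulary is the set of all words the grammar can produce.
    return {w for slots, _ in RULES for slot in slots for w in slot}
```

Calling `dispatch("turn the lamp off")` runs the action attached to the first rule, while a sentence outside the grammar, such as free dictation, matches nothing and returns `None`.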

Hardware resource benchmarks

  • Speed: 3.5 seconds of audio recognized in 1.0 second of real time.
  • Memory: 8.5 MB of process RAM.
  • Processor: Pentium Pro 200.
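
As a rough reading of the speed figure above, the following Python sketch computes the real-time factor (processing time divided by audio duration). The helper names are hypothetical, and the estimate for other utterance lengths assumes that processing time scales linearly with audio length, which the benchmark itself does not state.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    # Values below 1.0 mean recognition runs faster than real time.
    return processing_seconds / audio_seconds


def estimated_processing_time(utterance_seconds: float, rtf: float) -> float:
    # Assumption: processing time grows linearly with utterance length.
    return utterance_seconds * rtf


# Figures quoted in the benchmark: 3.5 s of audio in 1.0 s of real time.
RTF = real_time_factor(3.5, 1.0)

if __name__ == "__main__":
    print(f"real-time factor: {RTF:.3f}")
    print(f"estimated time for a 2 s utterance: {estimated_processing_time(2.0, RTF):.2f} s")
```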

State-of-the-art acoustic models are included

  • Supports "Clean" audio applications (16kHz sample rate) for US English, UK English, Spanish, Japanese, and German.
  • Supports telephony applications (8kHz sample rate) for US English.
  • Support for additional languages is possible through our cooperative development program. Initial funding from a customer is required to customize the data collection and language modelling systems for the language. Then with the cooperation of users, data will gradually be collected, cleaned, and modeled, eventually leading to full support for that language. See further discussion here.

ANSR™ supports a client/server architecture

  • Audio I/O and, optionally, parameter extraction are done by the client.
  • Resource-intensive core recognition work is done in the server.
  • Thin-pipe (modem-bandwidth) data transmission of compressed parameters between client and server (available on request).
Extremely lightweight client devices (the client library is 288 Kbytes) that are enabled for audio and networking can be used in combination with ANSR™'s high-end speech recognition technology on shared servers.
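
To see why client-side parameter extraction matters on a thin pipe, the following Python sketch compares the bit rate of raw audio with a modem link. The 16-bit mono sample format and the 28.8 kbit/s link speed are assumptions chosen for illustration; neither figure comes from the ANSR™ documentation, and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    # Bit rate of uncompressed PCM audio, in bits per second.
    return sample_rate_hz * bits_per_sample * channels


def required_compression(sample_rate_hz: int, link_bits_per_second: int,
                         bits_per_sample: int = 16) -> float:
    # How many times smaller the stream must be to fit through the link.
    return pcm_bit_rate(sample_rate_hz, bits_per_sample) / link_bits_per_second


# Assumption: a 28.8 kbit/s modem link (not a figure from the product description).
MODEM_BPS = 28_800

if __name__ == "__main__":
    for rate in (16_000, 8_000):  # "clean" and telephony sample rates
        print(rate, pcm_bit_rate(rate), round(required_compression(rate, MODEM_BPS), 2))
```

Under these assumptions, raw 16 kHz audio needs roughly nine times the capacity of the link, so sending compact extracted parameters instead of raw samples is what makes a modem-bandwidth connection workable.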

ANSR™ improves over time through user-initiated uploads of collected statistics and data to Sprex's retraining systems. These systems carry out system refinement and enable later download of new speaker-adaptive transforms, which return further improvements in system performance.

ANSR™ Components

  • sprecd recognition server, runs as a daemon or service awaiting audio from clients.
  • sprecc recognition client (native), sends audio & grammar selection to the server.
  • sprexlet recognition client (Java applet), sends audio & grammar selection to the server.
  • state-of-the-art acoustic models & a large pronunciation dictionary.
  • grammar specification language (BNF with rule-associated actions).
  • gxc grammar/action compiler for generating:
    • word-lattices used by the speech recognizer.
    • an NLP agent which carries out actions associated with the recognition result.
  • A HAPI 2.0.x MVX object library is incorporated into ANSR™, so that the sprecc client can be modified and recompiled for your needs.
  • audiocat for audio playback and recording.
  • agent to carry out actions associated with recognition results.
  • ddd to log inputs and actions.
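
The word lattices mentioned in the component list can be pictured as a directed graph whose edges are labelled with words. The following Python sketch is an illustration only: the data layout, node names, and functions are hypothetical and do not reflect the output format of gxc.

```python
from typing import Dict, List, Tuple

# Hypothetical lattice layout: node -> list of (word, next_node) edges.
Lattice = Dict[str, List[Tuple[str, str]]]

LATTICE: Lattice = {
    "start": [("call", "who"), ("dial", "who")],
    "who": [("home", "end"), ("work", "end"), ("the", "place")],
    "place": [("office", "end")],
    "end": [],
}


def accepts(lattice: Lattice, words: List[str],
            node: str = "start", final: str = "end") -> bool:
    # A word sequence is accepted if it labels a path from start to the final node.
    if not words:
        return node == final
    return any(
        word == words[0] and accepts(lattice, words[1:], nxt, final)
        for word, nxt in lattice[node]
    )


def count_sentences(lattice: Lattice, node: str = "start", final: str = "end") -> int:
    # Number of distinct start-to-final paths in an acyclic lattice.
    if node == final:
        return 1
    return sum(count_sentences(lattice, nxt, final) for _, nxt in lattice[node])
```

In this toy lattice, `accepts(LATTICE, "call the office".split())` holds, while "call office" is rejected because no path spells it out. The recognizer only has to consider sentences that correspond to a path, which is what keeps the search space small for task-oriented dialogs.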
Certain "Visible Source" elements of ANSR™ are made available, with source code included, to qualified developers under a Visible Source license agreement. Such licensees may modify and enhance these elements, with the modifications and enhancements licensed back to Sprex. They will then be incorporated into ANSR™ for use by the community of ANSR™ users and developers. Other elements of ANSR™, which embody proprietary information that cannot be exposed as source code, will be made available in compiled, object library form only, with appropriate header files to define the programming interfaces, so that the Visible Source programs can be recompiled against the libraries after modification.
Copyright © 1996-2005 Sprex, Inc. All rights reserved. Sprex, Speech in the Network, TallyGram and ANSR are trademarks of Sprex, Inc.
All other trademarks belong to their respective owners.
Date: May 17, 2008