Manual for sprecd

SPRECD(1)       Sprecd[tm] 1.4: Man Page Version 9      SPRECD(1)



NAME
       sprecd - SPeech RECognition Daemon

SYNOPSIS
       sprecd  [  -d  dictionary ]  [  -n network ] [ -r recogni-
       tion_type ] [ -o output_type ] [ -p port ] [ -z hz ] [  -v
       verbosity ]  [  -R  report_format ]  [ -BDLiasuxlbVh?c ] [
       audiofile ... ]

DESCRIPTION
       sprecd is a speaker-independent,  low-to-medium-vocabulary
       speech  recognition  daemon  program.   sprecd applies the
       specified type  of  recognition  to  the  specified  audio
       sources using the given grammar network/lattice to specify
       the possible word sequences to look for and the given dic-
       tionary  to specify the pronunciations of all the words in
       the grammar, then prints on stdout the desired form of
       recognition output (the likeliest name(s), with or without
       confidence or acoustic scores).

       If -D is specified, sprecd starts up as a daemon listening
       on the specified port (4102 by default).  When a client
       sends audio and then closes its sending side of the socket,
       sprecd recognizes the audio using the grammar, sends the
       results back across the socket, closes the socket, and
       waits for another connection.
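
       The exchange described above can be sketched from the client
       side.  This is a minimal illustration, not part of the sprecd
       distribution: the function name recognize is hypothetical, and
       it assumes the results come back as plain text, which this page
       does not specify.  The half-close is done with shutdown(), the
       usual way to end the sending direction while still reading the
       reply.

```python
import socket


def recognize(audio: bytes, host: str = "127.0.0.1", port: int = 4102,
              timeout: float = 30.0) -> str:
    """Send one utterance to a listening recognizer and return its reply.

    Hypothetical helper: the reply is assumed to be text.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Send the whole utterance (raw, headerless audio bytes).
        sock.sendall(audio)
        # Half-close: tell the server the audio is complete while
        # keeping the receiving direction open for the results.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the socket: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

       A caller would read a raw audio file into memory and pass the
       bytes to recognize(), one connection per utterance.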


OPTIONS
       -d   dictionary      The named dictionary file should have
       one or more pronunciations for each word or  name  in  the
       network.   If  any  word is found in the network without a
       corresponding entry in the dictionary, an error message is
       printed  to  stderr and sprecd exits with return value -1.
       Default is /usr/share/sprex/ansr/resources/1.dct,  a  list
       of 400 names.

       -n   network      This  file is a compiled grammar network
       representing all the word sequences that could be  spoken.
       Generate  this network/lattice using Entropic's HTK HParse
       program,  or  with   Sprex's   gxc   program.    [Default:
       /usr/share/sprex/ansr/resources/1.lat]

       -r   recognition_type      Use  the  specified recognition
       process type.  Possible values are:

       et     8000Hz triphone EPC models with  HAPI  MVX  decoder
              [default].

       mm     8000Hz  monophone MFCC models with HAPI MVX decoder
              (not included).

       mt     8000Hz triphone MFCC models with HAPI  MVX  decoder
              (not included).

       em     8000Hz  monophone  EPC models with HAPI MVX decoder
              (not included).


       -o  output_type     Hypothesis scoring for N-best results
       is not presently implemented.  When it is, this description
       will be valid: specify N=1..6 for N-best results; specify S
       for inclusion of acoustic scores.  The output will then be
       a list of N names in a column, each preceded, if S is
       specified, by the score for that name.  Note that 1S
       provides a *confidence score* for the 1-best result, while
       2S, 3S, etc., provide *acoustic scores* for each of the N
       best results.

       -p  port     Port number to wait on.  Default port is 4102,
       which after installation you should see in /etc/services.

       -z  hz     hz is the sample rate of audio  data  in  Hertz
       (samples  per  second).   This  value  is by default 8000.
       16000 is also allowed, for  enhanced  performance.   Later
       versions  will  allow  anything  8000  or  above, and will
       ignore spectral information taken from the increased  time
       resolution above either 8000 or 16000Hz.

       -v   verbosity      Verbosity  of debugging output (0=low,
       1=medium or 2=high) [default: 0].

       -R  report_format     report_format is the sum of any
       combination of the following ten values:

       1      Output words string.

       2      Allow word SIL at start/end of words string.

       4      Output symbols string.

       8      Allow symbol SIL at start/end of symbols string.

       16     Spaces between symbols.

       32     Output confidences string.

       64     Output confidence percentage.

       128    Output title strings for the selected
              words/symbols/conf* strings (always output in
              .scwiv files).

       256    Save utterance audio as .sw file and information as
              .scwiv file in utterance dir (utterance filenames
              wrap after 1000 are recorded by a sprecd during a
              day).

       512    Output JDemo-style
              symbols:uttfilefrag:samples:confidencepercent.

       report_format may include a `0x' prefix if base 16
       (hexadecimal) is to be used, or a `0' prefix if base 8
       (octal) is to be used.  Otherwise, report_format is base 10
       (decimal).
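
       As a worked illustration of the arithmetic, the sketch below
       builds a report_format value by combining flags and parses the
       three accepted notations.  The flag names and the function
       parse_report_format are made up for this example; only the
       numeric values and the prefix rules come from the description
       above.

```python
from enum import IntFlag


class ReportFormat(IntFlag):
    """Hypothetical names for the ten documented -R values."""
    WORDS = 1
    WORDS_SIL = 2
    SYMBOLS = 4
    SYMBOLS_SIL = 8
    SYMBOL_SPACES = 16
    CONFIDENCES = 32
    CONFIDENCE_PERCENT = 64
    TITLES = 128
    SAVE_UTTERANCE = 256
    JDEMO = 512


def parse_report_format(text: str) -> ReportFormat:
    """Parse a -R argument: `0x' prefix is hex, `0' prefix is octal,
    anything else is decimal."""
    text = text.strip()
    if text.lower().startswith("0x"):
        value = int(text, 16)
    elif text.startswith("0") and len(text) > 1:
        value = int(text, 8)
    else:
        value = int(text, 10)
    return ReportFormat(value)


# Words string plus confidences string plus confidence percentage.
wanted = (ReportFormat.WORDS | ReportFormat.CONFIDENCES
          | ReportFormat.CONFIDENCE_PERCENT)
print(int(wanted))  # 97, i.e. 1 + 32 + 64
```

       Under these rules the same selection could be written as 97,
       0x61 or 0141 on the command line.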

       -B       Enable   background  model  for  recognizer  [not
       default].

       -D     Daemon mode operation [not default].  Note that if
       you are installing sprecd to respawn automatically using
       the init process in Unix controlled by /etc/inittab, then
       do NOT use -D, because inittab handles the daemonizing of
       processes itself and doesn't like it when sprecd tries to
       become a daemon by itself; also do NOT attach an output
       file to look at stdout/stderr logs from sprecd, since
       inittab doesn't like that either.

       -L      In socket-reading operation, use 127.0.0.1 (local-
       host) as the host [not default].  This makes sprecd invis-
       ible  to  network-distributed clients; a client would only
       be able to connect if it thinks it's on 127.0.0.1  (local-
       host)  rather than on the IP address of the machine itself
       (even though those represent the same actual machine, they
       are different IP addresses and so are considered different
       from the socket-connection perspective).

       -i     Get audio from standard input [not default].

       -a     Get audio live from /dev/audio, iterate  [default].

       -s      Get  audio from client programs over a socket, one
       client connection per utterance.  Use the default or spec-
       ified port number [not default].

       -u     Consider audio data to be mu-law [default].

       -x      Expand mu-law data to linear 16-bit PCM within the
       driver [not default].

       -l     Consider audio data to be  raw  linear  16-bit  PCM
       [default: -u].

       -b     Swap byte order (only if -l is specified) [default:
       no byte swapping].

       -V     Version output [not default].

       -h     Print out help information [not default].

       -?      Print out help information [not default].

       -c     Confirm (try to) run with  default  settings  (will
       fail   unless   input  method  or  files  specified)  [not
       default].

       audiofile     This audio input file should be a raw,
       headerless data file containing a sequence of mu-law
       samples (1-byte companded sample representation) recorded
       at 8000 samples per second (this is the standard telephone
       network digital audio representation).  If "-" is specified
       as the audio file, then the audio will be read from stdin,
       thus allowing insertion of sprecd into a command-line
       pipeline.  If -D is specified, then the audio will instead
       be received on a socket.
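
       For readers who have 16-bit linear recordings and need the
       1-byte mu-law form described above, the following is a minimal
       sketch of the standard G.711 mu-law companding step.  It is a
       generic encoder, not code from sprecd, and the function names
       are chosen for this example.  It does not resample, so the
       input is assumed to be at 8000 samples per second already.

```python
BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that survives adding the bias


def linear_to_mulaw_sample(sample: int) -> int:
    """Compand one signed 16-bit sample to one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment: position of the highest set bit, from 14 down to 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_mulaw(pcm: bytes, byteorder: str = "little") -> bytes:
    """Convert raw 16-bit linear PCM bytes to raw mu-law bytes."""
    if len(pcm) % 2:
        raise ValueError("PCM data must contain whole 16-bit samples")
    out = bytearray()
    for i in range(0, len(pcm), 2):
        sample = int.from_bytes(pcm[i:i + 2], byteorder, signed=True)
        out.append(linear_to_mulaw_sample(sample))
    return bytes(out)
```

       The result can be written to a file as is; no header is added,
       which matches the raw, headerless layout described above.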



EXAMPLES
       To get a list of options and  their  explanation,  use  no
       options:

              example% ./sprecd


       To  recognize  audio files data1.sw, data2.sw against dic-
       tionary dict and network net using standard 8kHz EPC  tri-
       phones using the HAPI MVX decoder:


              example%  ./sprecd  -d  dict  -n  net  -r  et
              data1.sw data2.sw


       To recognize live audio from  the  sound  card  repeatedly
       using dict and net with 8kHz EPC triphones:


              example%  ./sprecd -a -l -z 16000 -d dict -n net -r
              et


       To run a sprecd internet service on port 4103 with dict
       and net handling clean audio (16kHz data, appropriate for
       8kHz EPC triphones) using the HAPI MVX decoder:


              example% ./sprecd -s -l -z 16000 -d dict -n net  -p
              4103 -r et
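
       When sprecd is launched from a program rather than typed at a
       shell, the command lines above can be assembled as argument
       lists.  The helper below is a hypothetical sketch: its name and
       parameters are invented here, and it covers only a few of the
       flags documented on this page (-a, -l, -z, -d, -n, -r).

```python
from typing import List, Optional, Sequence


def sprecd_argv(dictionary: str, network: str,
                audio_files: Sequence[str] = (),
                recognition_type: str = "et",
                sample_rate: Optional[int] = None,
                linear_pcm: bool = False,
                live_audio: bool = False,
                program: str = "./sprecd") -> List[str]:
    """Build an argument list using flags documented in this page."""
    argv = [program]
    if live_audio:
        argv.append("-a")            # audio from /dev/audio
    if linear_pcm:
        argv.append("-l")            # raw linear 16-bit PCM input
    if sample_rate is not None:
        argv += ["-z", str(sample_rate)]
    argv += ["-d", dictionary, "-n", network, "-r", recognition_type]
    argv += list(audio_files)        # audio files come last, as in SYNOPSIS
    return argv


# File-based recognition against dict and net:
print(sprecd_argv("dict", "net", ["data1.sw", "data2.sw"]))
```

       The resulting list can be handed to subprocess.run() with
       capture_output=True to collect what sprecd prints on stdout.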



BUGS
       None known.


AUTHORS
       Tom  Veatch  and  Fred  Kaudel,  Sprex,  Inc.   All rights
       reserved.  sprecd is licensed software, to  be  used  only
       under a license agreement with Sprex.



Product of Sprex, Inc.     27 March 2003                SPRECD(1)

Copyright © 1996-2005 Sprex, Inc. All rights reserved. Sprex, Speech in the Network, TallyGram and ANSR are trademarks of Sprex, Inc.
All other trademarks belong to their respective owners.
Date: August 7, 2008