SPRECD(1) Sprecd[tm] 1.4: Man Page Version 9 SPRECD(1)
NAME
sprecd - SPeech RECognition Daemon
SYNOPSIS
sprecd [ -d dictionary ] [ -n network ] [ -r recognition_type ]
[ -o output_type ] [ -p port ] [ -z hz ] [ -v verbosity ]
[ -R report_format ] [ -BDLiasuxlbVh?c ] [ audiofile ... ]
DESCRIPTION
sprecd is a speaker-independent, low-to-medium-vocabulary
speech recognition daemon program. sprecd applies the
specified type of recognition to the specified audio
sources. The given grammar network/lattice specifies the
possible word sequences to look for, and the given
dictionary specifies the pronunciations of all the words in
the grammar. sprecd then prints on stdout the desired form
of recognition output (the likeliest name(s), with or
without confidence or acoustic scores).
If -D is specified, sprecd starts up as a daemon on the
specified port (default 4102). When audio is sent to it,
followed by a close() on the sending side of the socket,
sprecd recognizes the audio using the grammar, sends the
results back across the socket, closes the socket, and
waits for another connection.
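The exchange described above (send audio, close the sending side, read the results) might look like the following client sketch. It is not part of the sprecd distribution. The function name recognize is hypothetical, the close() on the sending side is interpreted here as a half-close of the write direction, and the result is assumed to be plain text.

```python
import socket


def recognize(audio: bytes, host: str = "127.0.0.1", port: int = 4102,
              timeout: float = 30.0) -> str:
    """Send one utterance to a listening server and return its reply.

    Assumptions (not confirmed by the man page): the server treats a
    half-closed connection as end of audio, and replies with text.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(audio)
        # Half-close: tell the server no more audio is coming, while
        # keeping the receive direction open for the results.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the socket: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

A caller would read a raw audio file into memory and pass the bytes to recognize(), using one connection per utterance.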
OPTIONS
-d dictionary The named dictionary file should have
one or more pronunciations for each word or name in the
network. If any word is found in the network without a
corresponding entry in the dictionary, an error message is
printed to stderr and sprecd exits with return value -1.
Default is /usr/share/sprex/ansr/resources/1.dct, a list
of 400 names.
-n network This file is a compiled grammar network
representing all the word sequences that could be spoken.
Generate this network/lattice using Entropic's HTK HParse
program, or with Sprex's gxc program. [Default:
/usr/share/sprex/ansr/resources/1.lat]
-r recognition_type Use the specified recognition
process type. Possible values are:
et 8000Hz triphone EPC models with HAPI MVX decoder
[default].
mm 8000Hz monophone MFCC models with HAPI MVX decoder
(not included).
mt 8000Hz triphone MFCC models with HAPI MVX decoder
(not included).
em 8000Hz monophone EPC models with HAPI MVX decoder
(not included).
-o output_type Hypothesis scoring for N-best results
is not presently implemented. When it is, this description
will be valid: specify N=1..6 for N-best results;
specify S for inclusion of acoustic scores. The output
will then be a list of N names in a column, each preceded,
if S is specified, by the score for that name. Note that 1S
provides a *confidence score* for the 1-best result, while
2S, 3S, etc., provide *acoustic scores* for each of the N
best results.
-p port Port number to wait on. Default port is
4102, which after installation you should see in
/etc/services.
-z hz hz is the sample rate of audio data in Hertz
(samples per second). This value is by default 8000.
16000 is also allowed, for enhanced performance. Later
versions will allow anything 8000 or above, and will
ignore spectral information taken from the increased time
resolution above either 8000 or 16000Hz.
-v verbosity Verbosity of debugging output (0=low,
1=medium or 2=high) [default: 0].
-R report_format report_format is any combination (by
addition) of the following ten values:
1: output words string
2: allow word SIL at start/end of words string
4: output symbols string
8: allow symbol SIL at start/end of symbols string
16: spaces between symbols
32: output confidences string
64: output confidence percentage
128: output title strings for words/symbols/conf* selected
    strings (always output in .scwiv files)
256: save utterance audio as .sw file and information as
    .scwiv file in utterance dir (utterance filenames wrap
    after 1000 are recorded by a sprecd during a day)
512: output JDemo-style
    symbols:uttfilefrag:samples:confidencepercent
report_format may include a `0x' prefix if base 16
(hexadecimal) is to be used. report_format may include a `0'
prefix if base 8 (octal) is to be used. Otherwise,
report_format is base 10 (decimal).
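Because report_format values combine by addition, they behave like a bitmask. The sketch below shows one way to compose a value and to interpret the `0x' and `0' prefixes described above. The flag names and the function parse_report_format are hypothetical labels for illustration. Only the numeric values and the prefix rules come from this page.

```python
from enum import IntFlag


class ReportFormat(IntFlag):
    # Names are illustrative; only the numbers come from the man page.
    WORDS = 1
    WORDS_SIL = 2
    SYMBOLS = 4
    SYMBOLS_SIL = 8
    SYMBOL_SPACES = 16
    CONFIDENCES = 32
    CONFIDENCE_PERCENT = 64
    TITLES = 128
    SAVE_UTTERANCE = 256
    JDEMO = 512


def parse_report_format(text: str) -> ReportFormat:
    """Interpret a -R argument: `0x' means hex, leading `0' means octal."""
    s = text.strip().lower()
    if s.startswith("0x"):
        value = int(s[2:], 16)
    elif s.startswith("0") and len(s) > 1:
        value = int(s[1:], 8)
    else:
        value = int(s, 10)
    return ReportFormat(value)


# Words string plus confidence percentage: 1 + 64 = 65.
wanted = ReportFormat.WORDS | ReportFormat.CONFIDENCE_PERCENT
```

Under these rules the arguments 65, 0x41 and 0101 all select the same two outputs.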
-B Enable background model for recognizer [not
default].
-D Daemon mode operation [not default]. Note that if
you are installing sprecd to respawn automatically using
the Unix init process controlled by /etc/inittab, then
do NOT use -D, because init handles the daemonizing of
processes itself and does not tolerate sprecd trying to
become a daemon on its own. Also do NOT attach an output
file to capture stdout/stderr logs from sprecd, since
init does not tolerate that either.
-L In socket-reading operation, use 127.0.0.1
(localhost) as the host [not default]. This makes sprecd
invisible to clients on other machines: a client can
connect only if it addresses 127.0.0.1 (localhost) rather
than the IP address of the machine itself (even though
both refer to the same machine, they are different IP
addresses and so are treated as different from the
socket-connection perspective).
-i Get audio from standard input [not default].
-a Get audio live from /dev/audio, iterate [default].
-s Get audio from client programs over a socket, one
client connection per utterance. Use the default or
specified port number [not default].
-u Consider audio data to be mu-law [default].
-x Expand mu-law data to linear 16-bit PCM within the
driver [not default].
-l Consider audio data to be raw linear 16-bit PCM
[default: -u].
-b Swap byte order (only if -l is specified) [default:
no byte swapping].
-V Version output [not default].
-h Print out help information [not default].
-? Print out help information [not default].
-c Confirm (try to) run with default settings (will
fail unless input method or files specified) [not
default].
audiofile This audio input file should be a raw,
headerless data file containing a sequence of mu-law samples
(1-byte companded sample representation) recorded at 8000
samples per second (this is the standard telephone network
digital audio representation). If "-" is specified as the
audio file, then the audio will be read from stdin, thus
allowing insertion of sprecd into a command-line pipeline.
If -D is specified, then the audio will instead be
received on a socket.
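To illustrate the 1-byte companded representation mentioned above, the following sketch converts 16-bit linear PCM samples to mu-law bytes using the commonly used G.711 companding procedure. It is not taken from sprecd. The names lin2ulaw and encode_pcm16le are hypothetical, little-endian input is assumed, and resampling to 8000 samples per second is not handled.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after BIAS


def lin2ulaw(sample: int) -> int:
    """Compress one signed 16-bit sample into one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number (7..0) is set by the highest set bit in bits 14..7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # Four bits just below the leading bit form the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_pcm16le(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit PCM bytes as headerless mu-law bytes."""
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])
    return bytes(lin2ulaw(s) for s in samples)
```

The output has one byte per input sample, so one second of 8000 Hz audio occupies 8000 bytes.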
EXAMPLES
To get a list of options and their explanations, run
sprecd with no options:
example% ./sprecd
To recognize audio files data1.sw and data2.sw against
dictionary dict and network net using standard 8kHz EPC
triphones with the HAPI MVX decoder:
example% ./sprecd -d dict -n net -r et -o 2 data1.sw data2.sw
To recognize live audio from the sound card repeatedly
using dict and net with 8kHz EPC triphones:
example% ./sprecd -a -l -z 16000 -d dict -n net -r et
To run a sprecd internet service on port 4103 with dict
and net handling clean audio (16kHz data, appropriate for
8kHz EPC triphones) using the HAPI MVX decoder:
example% ./sprecd -s -l -z 16000 -d dict -n net -p 4103 -r et
BUGS
None known.
AUTHORS
Tom Veatch and Fred Kaudel, Sprex, Inc. All rights
reserved. sprecd is licensed software, to be used only
under a license agreement with Sprex.
Product of Sprex, Inc. 27 March 2003 SPRECD(1)
Copyright © 1996-2005
Sprex, Inc.
All rights reserved. Sprex, Speech in the Network, TallyGram and ANSR are trademarks of Sprex, Inc.