PHONOLYZE(1) Phonolyze[tm] 1.0: Man Page Version 4 PHONOLYZE(1)
NAME
phonolyze - cut an audio file into phones [and words]
using a transcript.
SYNOPSIS
phonolyze [ -a wavefile ] [ -t scriptfile ] [ -o outprefix ]
[ -w outwords ] [ -p outphones ] [ -d dictfile ]
[ -c configfile ] [ -l ] [ -T tracebits ]
DESCRIPTION
phonolyze [tm] applies a phone-based segmentation algorithm
to the waveform in wavefile and prints out a list of
the time boundaries of the phones (and words) found there.
phonolyze can also generate subsetted dictionaries from
the large default dictionary and a transcript ('phonolyze
-d new.dct -t tpt'; notice that -a wavefile is not
specified in this case).
If any words are missing from the supplied or generated
dictionary, a partial pronunciation dictionary suitable
for editing by a linguist is generated and stored in
/tmp/phonolyze.missing.dct. Fill in the pronunciations
for the unknown words with your own manually-edited phone
spelling for each word. See the section below for more
details on dictionary editing [not yet written]. After
editing this file, merge it with the subsetted dictionary
using cat|sort|uniq (modulo !!PHONES in the first line).
Then you can supply the edited, merged dictionary to
phonolyze as a command line argument with -d.
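The cat|sort|uniq merge described above can be sketched in Python as follows. This is only a sketch under assumptions this page does not confirm: that each dictionary holds one entry per line, and that a header line beginning with !!PHONES, if present, must stay first in the merged result. The function name merge_dictionaries is hypothetical.

```python
def merge_dictionaries(paths):
    """Merge dictionary files into one sorted list of unique lines.

    Assumption: one entry per line; a line starting with "!!PHONES"
    is a header that must remain the first line of the result.
    """
    header = None
    entries = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line.strip():
                    continue  # ignore blank lines
                if line.startswith("!!PHONES"):
                    if header is None:
                        header = line  # keep the first header seen
                    continue
                entries.add(line)  # the set removes duplicates (uniq)
    merged = sorted(entries)  # plain string sort, like sort(1) in the C locale
    if header is not None:
        merged.insert(0, header)
    return merged
```

Calling merge_dictionaries(["new.dct", "/tmp/phonolyze.missing.dct"]) and writing the returned lines to a file would give a dictionary that can then be passed with -d.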
Assuming now that all the words are available in the dictionary,
phonolyze has a phone transcription for the
entire utterance, composed of the concatenation of the
pronunciations provided. Next, phonolyze runs a segmentation
algorithm on the transcription to locate the time
boundaries of each phone and word. The results are
printed to phonolyze.words and phonolyze.phones (or files
named according to the -o, -w, and/or -p command-line
options).
OPTIONS
-a wavefile
(required) Specify the name of a file containing
the audio to be processed. phonolyze handles MS .wav,
NIST Sphere, Sun .AU, and "raw" audio data files (without
a header).
-t scriptfile
(required) Specify a file containing a word-by-word
transcription of the audio file.
The word sequence provided is used to guide the segmentation
algorithm.
-d dictfile
An optional input dictionary file. If not specified,
phonolyze has its own system dictionary which provides
pronunciations for most words. If any words are missing,
they are written to a partial dictionary in
/tmp/phonolyze.missing.dct which can be edited manually by
the user and fed back into phonolyze.
-l
Generate ESPS/Waves label files instead of the default
output format. The default output format is a two-column
ASCII table with phones in column 1, then a tab, then the
end-time of the phone in column 2. Since silences are
also transcribed, and since segments abut, it follows that
the start time of a segment is simply the end of the preceding
segment. Other output-format values are phonolyze
(the special phonolyze output format), lab (an
ESPS/Waves label file), and timit (a TIMIT label
file). [TIMIT output format will be implemented in a later
version. In Version 1.0, only the first two formats will
be provided.]
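Because the default output format described above records only end times, a reader of that file has to reconstruct start times. The following is a minimal sketch of that step, assuming that end times are plain decimal numbers and that the first segment starts at time 0; neither is stated in this page. The function name read_segments and the labels in the comment are hypothetical.

```python
def read_segments(lines):
    """Turn 'label<TAB>end_time' lines into (label, start, end) tuples."""
    segments = []
    start = 0.0  # assumption: the first segment begins at time 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        label, end_text = line.split("\t")
        end = float(end_text)
        segments.append((label, start, end))
        start = end  # segments abut: the next one starts where this one ends
    return segments

# Hypothetical input lines: "sil\t0.25" followed by "k\t0.5"
```

Passing an open phonolyze.phones file object to read_segments would work the same way as passing a list of lines.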
-o outprefix
If this option is specified, then the output words and
phones will be written to outprefix.words and
outprefix.phones.
-w outwords
If this option is specified, then the output words file
will be written to outwords.
-p outphones
If this option is specified, then the output phones file
will be written to outphones.
-T tracebits
tracebits is the bitwise OR (sum) of the following values:
0   minimal output
1   basic output
2   lattice editing trace output
4   text normalization trace output
8   print lattice
16  show the results on stdout
It is typical to run it with -T 0. If -T is not
specified, then all bits are set.
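As an illustration of how a tracebits value is composed from the values listed above, here is a small sketch. The constant names are hypothetical labels for those values and are not part of phonolyze itself.

```python
# Hypothetical names for the trace values listed under -T.
TRACE_BASIC = 1
TRACE_LATTICE_EDIT = 2
TRACE_TEXT_NORM = 4
TRACE_PRINT_LATTICE = 8
TRACE_STDOUT = 16

def tracebits(*flags):
    """Combine trace flags with bitwise OR into one -T argument value."""
    value = 0
    for flag in flags:
        value |= flag
    return value

# Basic output plus results on stdout: 1 | 16 = 17, i.e. "-T 17".
combined = tracebits(TRACE_BASIC, TRACE_STDOUT)
```

Since each value is a distinct power of two, OR-ing and summing give the same result, which is why the text describes the value as a "bitwise OR (sum)".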
Input Data Formats
phonolyze requires 16000 Hz 16-bit linear (PCM) mono audio
data files in MS WAV format, or as raw
data (little-endian signed short integers, without a file
header), or in NIST Sphere data format, or in HTK audio
file format. phonolyze was written with the intention of
also supporting AIFF (Macintosh) audio files, but that is
problematic at present; please contact us if you
would like to see that feature. The file format is
detected by filename extension, except that NIST and MS
WAV files can both be named with the .wav filename extension;
these are disambiguated using the first four bytes
in the file header (NIST and RIFF, respectively). The
default file format is raw, which expects the filename
extension
Use the default if possible, and make sure the audio quality
is good. See SIGNAL QUALITY below.
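To check in advance which kind of .wav file you have, the four-byte test described above can be reproduced outside phonolyze. This is a sketch of that check only, not phonolyze's own code; the function name and the returned strings are hypothetical.

```python
def sniff_wav_container(path):
    """Classify a .wav file by the first four bytes of its header."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"RIFF":
        return "ms-wav"       # MS WAV files start with the RIFF tag
    if magic == b"NIST":
        return "nist-sphere"  # NIST Sphere headers start with "NIST"
    return "unknown"          # neither tag found, e.g. raw data
```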
If you are having file format compatibility difficulties,
let us know at info@sprex.com (ftp us a sample audio file)
and we will figure out how to convert your audio into what
phonolyze needs, probably using the sox command.
SIGNAL QUALITY
phonolyze works badly without good audio quality: 30 dB SNR
and no clipping. With 16-bit samples (where numbers can
range over +-32767), the highest-energy peaks should be
+-10000 or more, and the silences should be < +-75. Displayed
in a waveform editor such as WaveSurfer or
Entropic's ESPS/Waves program, the silence floor is a flat
line (a horizontal line 1 or 2 pixels thick) in the signal
display, as against waveform peaks 200 pixels high.
If the highest-energy peaks reach the very limit of
32767, that is clipping, which is VERY BAD for accuracy.
In a display it will show up as unnaturally flat peaks
abutting a clipped upper or lower limiting value.
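The peak and clipping thresholds above can be checked before running phonolyze. The sketch below uses Python's standard wave module, so it covers only 16-bit MS WAV input, and it does not test the silence floor, since that would require knowing where the silences are. The function name check_levels is hypothetical.

```python
import array
import sys
import wave

def check_levels(path):
    """Return (peak, clipped, loud_enough) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = w.readframes(w.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    clipped = peak >= 32767      # peaks at the limit indicate clipping
    loud_enough = peak >= 10000  # peaks should reach +-10000 or more
    return peak, clipped, loud_enough
```

A file is in the recommended range when clipped is False and loud_enough is True.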
EXAMPLES
To get a list of options and their explanation:
`phonolyze`
To generate a partial dictionary for the words in tpt:
`phonolyze -t tpt -d dct`
If the result is empty then there will be no output and an
error message on stderr asking, Where is the audio file?!
To generate a phone transcription for a transcribed audio
file, storing the result in phonolyze.words and
phonolyze.phones:
`phonolyze -a wav -t tpt -d dct`
To generate ESPS/Waves label files with time boundaries
for words and phones from audiofile according to the transcript
in trans and the dictionary in dict, storing the
output in phonolyze.words and phonolyze.phones:
`phonolyze -l -d dict -t trans -a audiofile`
BUGS
phonolyze doesn't work with aiff files.
AUTHOR
Tom Veatch, Sprex, Inc. All rights reserved. phonolyze
is licensed software; you may not use it except under a
license agreement with Sprex (send email to info@sprex.com
or call toll free 206-367-7741 for licensing information).
Product of Sprex, Inc. 19 March 2003 PHONOLYZE(1)
Copyright © 1996-2005
Sprex, Inc.
All rights reserved.