Manual for phonolyze

PHONOLYZE(1)  Phonolyze[tm] 1.0: Man Page Version 4  PHONOLYZE(1)



NAME
       phonolyze - cut an audio file into phones [and words]
       using a transcript.

SYNOPSIS
       phonolyze [ -a wavefile ] [ -t scriptfile ] [ -o outprefix ]
       [ -w outwords ] [ -p outphones ] [ -d dictfile ]
       [ -c configfile ] [ -l ] [ -T tracebits ]

DESCRIPTION
       phonolyze [tm] applies a phone-based segmentation
       algorithm to the waveform in wavefile and prints out a
       list of the time boundaries of the phones (and words)
       found there.

       phonolyze can also generate subsetted dictionaries from
       the large default dictionary and a transcript ('phonolyze
       -d new.dct -t tpt'; note that -a wavefile is not specified
       in this case).
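
       The following minimal Python sketch shows the subsetting
       step. It assumes the dictionary has already been loaded
       into a mapping from each word to its list of phones
       (this page does not describe phonolyze's on-disk
       dictionary format). It does no text normalization, so
       transcript words must match dictionary keys exactly. The
       name subset_dictionary and the phone symbols are
       hypothetical.

```python
def subset_dictionary(full_dict, transcript):
    """Split a transcript's words into known entries and missing words.

    full_dict: hypothetical in-memory dictionary, word -> list of phones.
    transcript: whitespace-separated words of the utterance.
    Returns (subset, missing): the entries this transcript needs, and the
    words with no entry, in first-seen order without duplicates.
    """
    subset = {}
    missing = []
    for word in transcript.split():
        if word in full_dict:
            subset[word] = full_dict[word]
        elif word not in missing:
            missing.append(word)
    return subset, missing


# Usage with made-up entries:
full = {"the": ["dh", "ax"], "cat": ["k", "ae", "t"], "sat": ["s", "ae", "t"]}
subset, missing = subset_dictionary(full, "the cat sat on the mat")
print(sorted(subset))  # ['cat', 'sat', 'the']
print(missing)         # ['on', 'mat']
```

       The missing list corresponds to the words phonolyze
       reports in /tmp/phonolyze.missing.dct for manual editing.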

       If any words are missing from the supplied or generated
       dictionary, phonolyze generates a partial pronunciation
       dictionary, suitable for editing by a linguist, and stores
       it in /tmp/phonolyze.missing.dct. Fill in the
       pronunciations for the unknown words by manually editing
       a phone spelling for each word. See the section below for
       more details on dictionary editing [not yet written].
       After editing this file, merge it with the subsetted
       dictionary using cat|sort|uniq, keeping the !!PHONES line
       as the first line. Then supply the edited, merged
       dictionary to phonolyze with the -d command-line option.
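
       The cat|sort|uniq merge can be sketched in Python as
       follows. The sketch assumes, as the text above implies,
       that each file may begin with a !!PHONES line that must
       stay first instead of being sorted in with the entries.
       It makes two further assumptions: when both files carry a
       !!PHONES line, the first one seen is kept, and blank lines
       are dropped. Python's sorted() orders by code point, like
       'LC_ALL=C sort', so the result may differ from sort run in
       other locales. The name merge_dictionaries is
       hypothetical.

```python
def merge_dictionaries(subset_lines, missing_lines, header_prefix="!!PHONES"):
    """Merge two dictionary files' lines like cat|sort|uniq.

    The !!PHONES header line is kept first (the first one seen wins);
    all other non-blank lines are de-duplicated and sorted.
    """
    header = None
    entries = set()
    for line in list(subset_lines) + list(missing_lines):
        line = line.rstrip("\n")
        if line.startswith(header_prefix):
            if header is None:
                header = line
            continue
        if line.strip():
            entries.add(line)
    merged = sorted(entries)
    return [header] + merged if header is not None else merged


# Usage with the files named in this page (file objects iterate by line):
# with open("new.dct") as a, open("/tmp/phonolyze.missing.dct") as b:
#     merged = merge_dictionaries(a, b)
```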

       Once all the words are available in the dictionary,
       phonolyze has a phone transcription for the entire
       utterance, composed of the concatenation of the
       pronunciations provided. Next, phonolyze runs a
       segmentation algorithm on the transcription to locate the
       time boundaries of each phone and word. The results are
       printed to phonolyze.words and phonolyze.phones (or to
       files named according to the -o, -w, and/or -p
       command-line options).
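
       This page does not describe the segmentation algorithm
       itself. The step before it, which builds one phone
       transcription by concatenating pronunciations, is simple
       enough to sketch. The Python version below uses the
       hypothetical name phone_transcription and an assumed
       word -> phones mapping. It also records which phones
       belong to which word, which is the bookkeeping needed to
       report word boundaries once phone boundaries are known.
       It does not insert the silence segments that phonolyze
       also transcribes, because the page does not say where
       they go.

```python
def phone_transcription(words, dictionary):
    """Concatenate each word's pronunciation into one phone sequence.

    words: the transcript as a list of words.
    dictionary: hypothetical in-memory form, word -> list of phones.
    Returns (phones, spans), where spans holds (word, first, end) index
    ranges into phones, so phone boundaries can later be mapped to words.
    Raises KeyError listing any words that have no pronunciation.
    """
    missing = [w for w in words if w not in dictionary]
    if missing:
        raise KeyError("not in dictionary: " + " ".join(missing))
    phones = []
    spans = []
    for word in words:
        first = len(phones)
        phones.extend(dictionary[word])
        spans.append((word, first, len(phones)))
    return phones, spans


phones, spans = phone_transcription(["the", "cat"],
                                    {"the": ["dh", "ax"], "cat": ["k", "ae", "t"]})
print(phones)  # ['dh', 'ax', 'k', 'ae', 't']
print(spans)   # [('the', 0, 2), ('cat', 2, 5)]
```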


OPTIONS
       -a wavefile (required, except when only generating a
       dictionary; see DESCRIPTION)

       Specify the name of a file containing the audio to be
       processed. phonolyze handles MS .wav, NIST Sphere, Sun
       .AU, and "raw" audio data files (without a header).

       -t scriptfile (required)

       Specify a file containing a word-by-word transcription of
       the audio file. The word sequence provided is used to
       guide the segmentation algorithm.

       -d dictfile

       An optional input dictionary file. If it is not
       specified, phonolyze uses its own system dictionary, which
       provides pronunciations for most words. If any words are
       missing, they are written to a partial dictionary in
       /tmp/phonolyze.missing.dct, which the user can edit
       manually and feed back into phonolyze.

       -l

       Generate ESPS/Waves label files instead of the default
       output format. The default output format is a two-column
       ASCII table with phones in column 1, then a tab, then the
       end time of the phone in column 2. Silences are also
       transcribed and segments abut, so the start time of a
       segment is simply the end time of the preceding segment.
       Other output-format values are phonolyze (the special
       phonolyze output format), lab (an ESPS/Waves label file),
       and timit (a TIMIT label file). [TIMIT output format will
       be implemented in a later version. In Version 1.0, only
       the first two formats will be provided.]
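
       Because segments abut, the default two-column output is
       enough to recover each segment's start and end times. The
       minimal Python sketch below assumes the first segment
       starts at time 0 and that times are plain decimal numbers
       (the page does not state the unit). The name
       parse_default_output and the "sil" label in the sample are
       hypothetical.

```python
def parse_default_output(text):
    """Convert 'label<TAB>end_time' lines into (label, start, end) tuples.

    Segments abut, so each start is the previous segment's end time.
    Assumption: the first segment starts at 0.
    """
    intervals = []
    start = 0.0
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        label, end_text = line.split("\t")
        end = float(end_text)
        intervals.append((label, start, end))
        start = end
    return intervals


sample = "sil\t0.25\nk\t0.31\nae\t0.42\n"  # made-up values
for label, start, end in parse_default_output(sample):
    print(label, start, end)
```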

       -o outprefix

       If this option is specified, the output words and phones
       are written to outprefix.words and outprefix.phones.

       -w outwords

       If this option is specified, the output words file is
       written to outwords.

       -p outphones

       If this option is specified, the output phones file is
       written to outphones.
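
       The page does not say which option takes effect when -o is
       combined with -w or -p. The Python sketch below, using the
       hypothetical name output_paths, assumes that an explicit
       -w or -p overrides -o for that one file. In every other
       case it falls back to the documented defaults,
       phonolyze.words and phonolyze.phones.

```python
def output_paths(outprefix=None, outwords=None, outphones=None):
    """Resolve the words and phones output file names.

    outprefix, outwords, outphones correspond to -o, -w and -p.
    Assumption: -w / -p take precedence over -o for their own file.
    """
    prefix = outprefix if outprefix is not None else "phonolyze"
    words = outwords if outwords is not None else prefix + ".words"
    phones = outphones if outphones is not None else prefix + ".phones"
    return words, phones


print(output_paths())                           # ('phonolyze.words', 'phonolyze.phones')
print(output_paths("utt1"))                     # ('utt1.words', 'utt1.phones')
print(output_paths("utt1", outphones="p.lab"))  # ('utt1.words', 'p.lab')
```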

       -T tracebits

       tracebits is the bitwise OR (sum) of the following values:

              0   minimal output
              1   basic output
              2   lattice editing trace output
              4   text normalization trace output
              8   print lattice
              16  show the results on stdout

              It is typical to run phonolyze with -T 0. If -T is
              not specified, all bits are set.
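
       The values are distinct powers of two, so tracebits can be
       built and decoded with ordinary bitwise operations. The
       Python sketch below uses enum.IntFlag. The member names
       are invented labels for the values listed above.

```python
from enum import IntFlag


class Trace(IntFlag):
    """Hypothetical names for the -T values (0 = minimal output)."""
    BASIC = 1
    LATTICE_EDITING = 2
    TEXT_NORMALIZATION = 4
    PRINT_LATTICE = 8
    STDOUT_RESULTS = 16


# What phonolyze uses when -T is omitted: every bit set.
ALL_BITS = (Trace.BASIC | Trace.LATTICE_EDITING | Trace.TEXT_NORMALIZATION
            | Trace.PRINT_LATTICE | Trace.STDOUT_RESULTS)


def tracebits(*flags):
    """OR the chosen flags together into the integer passed to -T."""
    value = 0
    for flag in flags:
        value |= flag
    return int(value)


def describe(value):
    """List the names of the bits set in a -T value."""
    return [flag.name for flag in Trace if value & flag]


print(tracebits(Trace.BASIC, Trace.STDOUT_RESULTS))  # 17, i.e. 'phonolyze -T 17 ...'
print(int(ALL_BITS))                                  # 31
print(describe(17))                                   # ['BASIC', 'STDOUT_RESULTS']
```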


INPUT DATA FORMATS
       phonolyze requires 16000 Hz, 16-bit linear (PCM), mono
       audio data files: either MS WAV files, raw data
       (little-endian signed short integers, without a file
       header), NIST Sphere files, or HTK audio files. Phonolyze
       was written with the intention of also supporting aiff
       (Macintosh) audio files, but that is problematic at
       present; please contact us if you would like to see that
       feature. The file format is detected by filename
       extension, except that NIST and MS Wave files can both be
       named with the .wav filename extension; these are
       disambiguated using the first four bytes of the file
       header (NIST and RIFF, respectively). The default file
       format is raw, which expects the filename extension
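
       The .wav rule above can be sketched in Python. The header
       check follows the text: the first four bytes are NIST or
       RIFF. The parameter check uses the standard wave module,
       which reads only MS WAV (RIFF) files. Both function names
       are hypothetical. The extension table for the other
       formats is not reproduced because this page does not list
       it.

```python
import wave


def classify_wav(path):
    """Tell an MS WAV file from a NIST Sphere file sharing the .wav extension."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"RIFF":
        return "mswav"
    if magic == b"NIST":
        return "nist"
    raise ValueError("unrecognized .wav header: %r" % magic)


def check_mswav_params(path):
    """List ways an MS WAV file differs from 16000 Hz, 16-bit, mono PCM."""
    problems = []
    with wave.open(path, "rb") as w:  # raises wave.Error for formats it can't read
        if w.getframerate() != 16000:
            problems.append("sample rate %d Hz, need 16000" % w.getframerate())
        if w.getsampwidth() != 2:
            problems.append("%d-bit samples, need 16" % (8 * w.getsampwidth()))
        if w.getnchannels() != 1:
            problems.append("%d channels, need mono" % w.getnchannels())
    return problems
```

       Call classify_wav first. Run check_mswav_params only when
       classify_wav returns "mswav". An empty list means the file
       already has the sample rate, sample width, and channel
       count described above.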

       Use the default if possible, and make sure the audio
       quality is good. See SIGNAL QUALITY below.

       If you are having file format compatibility difficulties,
       let us know at info@sprex.com (ftp us a sample audio file)
       and we will figure out how to convert your audio into what
       phonolyze needs, probably using the sox command.



SIGNAL QUALITY
       phonolyze works poorly without good audio quality: at
       least 30 dB SNR and no clipping. With 16-bit samples
       (where values range over ±32767), the highest-energy peaks
       should reach ±10000 or more, and silences should stay
       within ±75. In a waveform editor such as WaveSurfer or
       Entropic's ESPS/Waves program, the silence floor appears
       as a flat horizontal line 1 or 2 pixels thick in the
       signal display, compared with waveform peaks 200 pixels
       high. If the highest-energy peaks reach the very limit of
       32767, the signal is clipped, which is VERY BAD for
       accuracy. In a display, clipping shows up as unnaturally
       flat peaks abutting a clipped upper or lower limiting
       value.
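
       Below is a rough Python check of these guidelines on
       16-bit samples. It is a sketch, not a phonolyze feature.
       The caller must specify which sample range is silence.
       The "clipped" test simply looks for samples at the ±32767
       limit. The peak-to-floor ratio in dB is only a crude
       stand-in for a true SNR measurement. read_raw_pcm16le
       decodes the raw format described above (little-endian
       signed shorts, no header). Both function names are
       hypothetical.

```python
import math
import struct


def read_raw_pcm16le(data):
    """Decode raw little-endian signed 16-bit samples (no header)."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[:2 * count]))


def signal_quality_report(samples, silence_start, silence_end):
    """Compare 16-bit samples with the SIGNAL QUALITY guidelines.

    samples[silence_start:silence_end] must be a non-empty stretch the
    caller knows to be silence.
    """
    peak = max(abs(s) for s in samples)
    floor = max(abs(s) for s in samples[silence_start:silence_end])
    ratio_db = math.inf if floor == 0 else 20 * math.log10(peak / floor)
    return {
        "peak": peak,
        "silence_floor": floor,
        "peak_ok": peak >= 10000,    # highest peaks should reach +-10000 or more
        "silence_ok": floor < 75,    # silences should stay within +-75
        "clipped": any(abs(s) >= 32767 for s in samples),  # VERY BAD for accuracy
        "peak_to_floor_db": ratio_db,  # crude proxy, not a true SNR
    }
```

       For a raw file, pass the bytes read from it (opened in
       binary mode) to read_raw_pcm16le. Then pass the resulting
       samples, together with a known silent stretch, to
       signal_quality_report.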


EXAMPLES
       To get a list of options and their explanation:

              `phonolyze`


       To generate a partial dictionary for the words in tpt:

              `phonolyze -t tpt -d dct`

       If the result is empty, there is no output, and an error
       message on stderr asks, "Where is the audio file?!"

       To generate a phone transcription for a transcribed audio
       file, storing the result in phonolyze.words and
       phonolyze.phones:

              `phonolyze -a wav -t tpt -d dct`


       To generate ESPS/Waves label files with time boundaries
       for words and phones from audiofile according to the
       transcript in trans and the dictionary in dict, storing
       the output in phonolyze.words and phonolyze.phones:

              `phonolyze -l -d dict -t trans -a audiofile`




BUGS
       phonolyze doesn't work with aiff files.


AUTHOR
       Tom Veatch, Sprex, Inc. All rights reserved. phonolyze is
       licensed software; you may not use it except under a
       license agreement with Sprex (send email to info@sprex.com
       or call toll free 206-367-7741 for licensing information).



Product of Sprex, Inc.     19 March 2003             PHONOLYZE(1)

Copyright © 1996-2005 Sprex, Inc. All rights reserved.
Date: July 25, 2008