 

LSM™ Manual

| Requirements | How To Use It | Pronunciation | Dictionaries | Phoneme Table | Reference Tone | Saving Money |
 

 

Requirements to Use the LSM

  1. A computer with a sound card and software capable of creating audio files in one of the formats below, at a sample rate of 16000 or 8000 samples per second:

  2. A word processor or text editor that can save files as plain ASCII text (no formatting)

  3. A way to reach this web page:
  4. A way to transfer your files to ftp.sprex.com:

 

 

How To Use the LSM

  1. Register as a demo or regular user (via Free Demo! or Register).

  2. Create a digital audio file containing a recording of the speech audio that you want to lip-synch.

  3. Create a plain ASCII file containing an accurate English transcription of what is said in the audio file, with no formatting or extraneous characters (on Unix, use emacs or vi; on a PC or Mac, save the file as plain text with carriage returns).

  4. Upload the audio and text files to the LSM file server using FTP. Be sure to name the files correctly.

  5. Enter the LSM Console, fill out the form it provides, and click on the start button.

  6. The LSM will find the text file and check it against the LSM's dictionary. (If it doesn't know the pronunciation for some of the words used, it'll fail.)

  7. The LSM will then run. If the sound is okay and the transcription is accurate, it will finish successfully and return the results to you in the way you specify. If it fails, the cause may be poor sound quality, an inaccurate transcription, or a word in the script that the dictionary doesn't cover.
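Before uploading, it can save a failed run to check the two input files locally. The sketch below is a hypothetical helper, not part of the LSM: it assumes the audio is a WAV file (the list of accepted formats is not reproduced here) and checks only what this manual states, namely that the sample rate is 16000 or 8000 and that the transcript is plain ASCII without stray control characters. The function names are illustrative.

```python
import wave

# Sample rates accepted by the LSM, per the Requirements section.
ACCEPTED_RATES = {16000, 8000}


def check_audio(path):
    """Return a list of problems found in a WAV file (empty if none).

    Assumes WAV input; other audio formats would need a different reader.
    """
    problems = []
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        if rate not in ACCEPTED_RATES:
            problems.append(f"sample rate {rate} Hz is not 16000 or 8000")
    return problems


def check_transcript(path):
    """Return a list of problems found in a transcript file (empty if none)."""
    problems = []
    with open(path, "rb") as f:
        data = f.read()
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError as exc:
        problems.append(f"non-ASCII byte at offset {exc.start}")
        return problems
    # Control characters other than tab, CR and LF suggest leftover formatting.
    for i, ch in enumerate(text):
        if ord(ch) < 32 and ch not in "\t\r\n":
            problems.append(f"control character {ord(ch)} at offset {i}")
            break
    if not text.split():
        problems.append("transcript is empty")
    return problems
```

An empty list from both functions means only that these basic checks passed; it says nothing about transcription accuracy.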

Defining Pronunciation

Words are written with letters, but they are spoken with phonemes. So the LSM defines pronunciation using phonemes. A phoneme is one of the set of sounds in a given language that make words different from each other. So "mask" and "mast" are different in their final *phoneme* (as well as the final *letter*, and as well as in their meaning). Theory-agnostic linguists often like to use the word "phone" instead of "phoneme".

To refer to a particular phoneme one needs a label, but unlike with English spelling, there are no universal standards for writing phonemes. The LSM uses one system, about as standard as any other, which you may refer to using the table below.

Dictionary Lookup

The LSM has a pronouncing dictionary of more than 100,000 words, which lists, for each word in the dictionary, the sequence of phonemes that represent its pronunciation.

When you operate the LSM, it looks up every word of the transcription in the main dictionary and in your user account dictionary. Any word that is in neither dictionary must be given a pronunciation, or the LSM won't know what sounds to synchronize to.
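The lookup described above amounts to a membership test over the words of the transcription. In this minimal sketch the two dictionaries are plain Python dicts mapping uppercase words to phoneme lists, and the tokenizer (letters, digits and apostrophes) is an assumption; the LSM's actual word-splitting rules are not documented here. The entries for "mask" and "mast" show two words differing only in their final phoneme.

```python
import re


def find_unknown_words(transcript, main_dict, user_dict):
    """Return transcript words found in neither dictionary, in first-seen order."""
    unknown = []
    seen = set()
    # Keep letters, digits and apostrophes so "don't" and "Tone1000Hz" stay whole.
    for token in re.findall(r"[A-Za-z0-9']+", transcript):
        word = token.upper()
        if word in main_dict or word in user_dict or word in seen:
            continue
        seen.add(word)
        unknown.append(word)
    return unknown


# Tiny illustrative dictionaries (not real LSM data).
main_dict = {
    "THE": ["DH", "AH0"],
    "MASK": ["M", "AE1", "S", "K"],
    "MAST": ["M", "AE1", "S", "T"],
}
user_dict = {"SPREX": ["S", "P", "R", "EH1", "K", "S"]}

print(find_unknown_words("The mask, the mast, the Sprex zorbl.", main_dict, user_dict))
# prints ['ZORBL']
```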

Consequently, the LSM will ask you to give it the phoneme pronunciation for any words that it can't find. These will be entered into your user account dictionary and, unless you specify otherwise, added after review to the main dictionary to improve it for all users. A WWW page will be displayed with each unknown word and an input widget for entering its phonemes. To make it easier to figure out how to write the word using the phoneme labels, the page also lists the ten words in the main dictionary that are alphabetically closest to the unknown word, along with the entire phoneme table and further suggestions.
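One way to produce a list of alphabetically closest words is a binary search into the sorted word list, followed by a window of entries centered on the insertion point. This is an illustrative sketch: how the LSM actually chooses its ten neighbors is not specified, and `nearest_entries` is a hypothetical name.

```python
import bisect


def nearest_entries(word, sorted_words, n=10):
    """Return the n dictionary words nearest to `word` in alphabetical order.

    `sorted_words` must be a sorted list of uppercase words.
    """
    word = word.upper()
    n = min(n, len(sorted_words))
    pos = bisect.bisect_left(sorted_words, word)
    # Center a window of n entries on the insertion point, shifted to stay in range.
    start = max(0, min(pos - n // 2, len(sorted_words) - n))
    return sorted_words[start:start + n]


words = sorted(["MASK", "MASS", "MAST", "MASTER", "MAT", "MATE",
                "MATH", "MAP", "MAPLE", "MARK", "MARS", "MASH"])
print(nearest_entries("masq", words, n=4))
# prints ['MASH', 'MASK', 'MASS', 'MAST']
```

Near the start or end of the dictionary the window slides inward, so the list still contains n words.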

Basically, you should sound out the word just like in elementary school and then look for labels in the phoneme table which have the same sound. It'll be a little difficult at first, just like learning to write was rather an adventure, but quickly you'll get the hang of it. And usually the alphabetically-similar words will have a lot of the right phonemes in them, so you don't have to look too hard.

Phoneme Table

The phoneme labels used by the LSM are exemplified in the following ARPABET table, which is ordered alphabetically by the letters in the phoneme labels.
VOWELS:   
  AA cot   AE cat   AH cut   AO caught AW out   AY kite  EH bet  ER curt 
  EY Kate  IH it    IY eat   OW coat   OY coy   UH cook  UW zoo  (AH0 is schwa)

CONSONANTS:         
  B  bow   CH chew  D  doe   DH thy    F  foe   G  go    HH him  JH judge
  K  kick  L  lull  M  mum   N  non    NG sing  P  pip   R  row  S  sassy
  SH shoe  T  toot  TH thigh V  vim    W  we    Y  you   Z  zoo  ZH asia

STRESS: 
  1 Primary (stressed)       2 Secondary                 0 unstressed
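When entering a pronunciation, it helps to check each label against the table before submitting it. The sketch below assumes pronunciations are typed as space-separated labels and that a stress digit (0, 1 or 2) may follow a vowel label but never a consonant. Whether the LSM requires a stress digit on every vowel is not stated here, so this hypothetical checker treats them as optional.

```python
# Labels from the phoneme table above.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH",
              "K", "L", "M", "N", "NG", "P", "R", "S",
              "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"}
STRESS_MARKS = {"0", "1", "2"}


def parse_pronunciation(text):
    """Split a space-separated phoneme string and check each label.

    Returns a list of (phoneme, stress) pairs; stress is None for consonants
    and for vowels written without a digit. Raises ValueError on a bad label.
    """
    result = []
    for label in text.upper().split():
        base, stress = label, None
        if label[-1] in STRESS_MARKS:
            base, stress = label[:-1], int(label[-1])
        if base in VOWELS:
            result.append((base, stress))
        elif base in CONSONANTS and stress is None:
            result.append((base, None))
        else:
            raise ValueError(f"unknown phoneme label: {label!r}")
    return result


print(parse_pronunciation("m ae1 s k"))
# prints [('M', None), ('AE', 1), ('S', None), ('K', None)]
```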

Reference Tone

It is often convenient to locate the zeroth audio frame not at time 0.00 seconds relative to the beginning of the file, but in the middle of a reference timing tone. A 1000 Hz tone lasting one video frame at 24 frames per second (1/24 sec ≈ 42 ms ≈ 0.042 sec) is often used for this purpose.
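If your recording software cannot generate such a tone, a few lines of code can. This sketch writes a 16-bit mono WAV file containing a 1000 Hz sine wave lasting one frame at 24 frames per second; at 16000 samples per second that is round(16000 / 24) = 667 samples. The amplitude (half of full scale), the function name, and the choice of WAV are assumptions, and you would still need to place the tone in your recording with an audio editor.

```python
import math
import struct
import wave


def write_reference_tone(path, sample_rate=16000, freq=1000.0,
                         fps=24, amplitude=0.5):
    """Write a mono 16-bit WAV holding one frame's worth of sine tone.

    Returns the number of samples written.
    """
    n_samples = round(sample_rate / fps)  # 16000 / 24 rounds to 667 samples
    frames = bytearray()
    for i in range(n_samples):
        value = amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
        frames += struct.pack("<h", int(value * 32767))  # little-endian int16
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(bytes(frames))
    return n_samples
```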

The dictionary has an entry for Tone1000Hz, whose phonemization is IH (as in "bit"), which has been found to synchronize reasonably well to such tones.

If you include in your transcription the "word", Tone1000Hz, then the LSM will use the middle of the segment synchronized to that tone as the reference zero time, which is also the middle of the reference zero frame.
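Re-referencing the timings is simple arithmetic: the zero time is the midpoint of the Tone1000Hz segment, every other time is shifted by that amount, and because that midpoint is also the middle of frame 0, rounding (time × 24) gives a frame number. The (label, start, end) tuple format below is hypothetical and is not the LSM's output format.

```python
def rezero(segments, fps=24):
    """Shift segment times so the middle of the Tone1000Hz segment is time 0.

    segments: list of (label, start_sec, end_sec) tuples (hypothetical format).
    Returns (label, start_sec, end_sec, start_frame) tuples relative to the tone.
    """
    tone = next((s for s in segments if s[0] == "Tone1000Hz"), None)
    if tone is None:
        raise ValueError("no Tone1000Hz segment found")
    zero = (tone[1] + tone[2]) / 2
    out = []
    for label, start, end in segments:
        rel_start, rel_end = start - zero, end - zero
        # Frame k spans (k - 0.5)/fps .. (k + 0.5)/fps, so rounding picks frame k.
        out.append((label, rel_start, rel_end, round(rel_start * fps)))
    return out
```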

If you have multiple Tone1000Hz "words" in your transcription, the LSM will assume that you are doing multiple audio segments, and will create additional new output files, one for each.
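How the transcription is divided among the output files is not spelled out above. One plausible reading, sketched here purely as an assumption, is that each Tone1000Hz begins a new segment that runs until the next tone; `split_at_tones` is a hypothetical helper.

```python
def split_at_tones(words, marker="Tone1000Hz"):
    """Group transcript words into segments, one per reference tone.

    Assumes each tone starts a new segment; any words before the first
    tone form their own leading segment.
    """
    segments = [[]]
    for w in words:
        # Start a new segment at a tone, unless the current one is still empty.
        if w == marker and segments[-1]:
            segments.append([])
        segments[-1].append(w)
    return segments


print(split_at_tones("Tone1000Hz hello world Tone1000Hz goodbye".split()))
# prints [['Tone1000Hz', 'hello', 'world'], ['Tone1000Hz', 'goodbye']]
```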

Saving Money

The LSM pricing system includes the "10-percent rule": if you re-analyze the same audio file with a different text file, the fee for that operation is just 10% of the regular fee. This is good news, and there is a good reason behind it. Only successful synchronization operations are charged at the regular rate; unsuccessful or repeated synchronizations cost only 10 percent of that rate. This cuts your cost for failures and repeats by 90 percent, and it keeps the necessarily higher cost of analyzing the more difficult cases at a level not much above that of the easier cases.
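The effect of the rule on a series of attempts can be worked out directly. This sketch assumes that, for a single audio file, the first successful run is charged the full fee and every other run (failures and later re-runs) is charged 10 percent; `fee` stands for whatever the regular rate for that file is, and the figures are illustrative, not actual LSM prices.

```python
def total_cost(outcomes, fee):
    """Total charge for repeated runs on one audio file under the 10-percent rule.

    outcomes: list of booleans in run order, True for a successful run.
    Assumes the first successful run costs the full fee and all others cost 10%.
    """
    cost = 0.0
    charged_full = False
    for ok in outcomes:
        if ok and not charged_full:
            cost += fee
            charged_full = True
        else:
            cost += 0.10 * fee
    return cost
```

For example, two failed attempts followed by a success at a regular fee of 100 cost 10 + 10 + 100 = 120, compared with 300 if every run were charged in full.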

Usually, if the transcription is fairly accurate and the audio is reasonably clean, a satisfactory synchronization is achieved on the first, or sometimes the second, attempt. Quite difficult cases involving non-linguistic sounds may require three, four, or even more passes if you persist in hunting for a suitable phonemic transcription for them. Actually, such sounds are best synchronized by hand anyway, by examining the audio waveform in a program that can display and label audio signals and perhaps spectrograms. As you gain experience with the system, you'll learn what it can and can't do, and you'll gradually stop pushing it to do what it can't, both because it can't and because continuing to push on it has a non-zero cost.

The reason it's 10 percent and not zero, however, is to motivate users to do it right the first time, or at least after a few repetitions. This prevents gratuitous and wasteful use of the system's computing resources on difficult cases, so that unnecessary costs for extra hardware and added maintenance are not incurred; those costs would have to be shared by all users through higher rates. On the other hand, if you need to adjust the transcription a couple of times to get it right, this doesn't cost much more.

Tip:

The 10-percent rule implies a smart way to use the LSM on difficult files. A file containing several difficult spots should be broken up into multiple files that isolate the problem areas, so that the whole thing won't need to be re-synchronized repeatedly just because one part of it is failing. Instead of modifying one bit of the transcription of the entire file in order to resynchronize a difficult chunk, it is more efficient, more effective, and cheaper to isolate the difficult part in a much smaller file of its own. The repeat work is then done on a much smaller piece of audio, and because the problem area is isolated, you are more likely to be able to understand and fix it.
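Cutting a difficult passage out into its own file can be done in any audio editor; for WAV files it can also be scripted with Python's standard `wave` module, as in this sketch. Times are in seconds and `extract_segment` is a hypothetical helper. The smaller file also needs its own transcript containing only the words spoken in that passage.

```python
import wave


def extract_segment(src_path, dst_path, start_sec, end_sec):
    """Copy the audio between start_sec and end_sec into a new WAV file.

    Returns the number of frames written.
    """
    if not 0 <= start_sec < end_sec:
        raise ValueError("need 0 <= start_sec < end_sec")
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        first = int(start_sec * rate)
        last = min(int(end_sec * rate), src.getnframes())
        src.setpos(first)  # raises wave.Error if start is past the end
        frames = src.readframes(last - first)
        with wave.open(dst_path, "wb") as dst:
            dst.setparams(src.getparams())  # same channels, width and rate
            dst.writeframes(frames)  # frame count in the header is corrected on write
    return last - first
```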

Security:

Whenever you or anyone else uses your password, you will receive an email message, at the address you specified during registration, telling you about this usage. If the user was not you or someone you authorized, get in touch with Sprex, Inc. quickly to protest the charge; we'll investigate and give you a different password.
 

 

Copyright © 1996-2004 Sprex, Inc. All rights reserved.
Modified: November 8, 2004