Design for gxc™

gxc is similar to HParse in that it generates HTK SLF word lattices
from a grammar expressed in a BNF-like formalism.  However, its
expressivity is deliberately limited so that actions can be generated
from the grammar rules, a la yacc.

For an action to be well-defined, it must be able to refer
uniquely to the elements in the rule expansion.  Allowing optional
subsequences, marked with [] or () etc., breaks the mapping to actions:
in a rule of the form a->b[c]d { func($1,$2,$3); }, the optional
element c, corresponding to $2, may not occur, which breaks the
association between the arguments and the action func().  Instead we
must split the rule into two separate alternative expansions:

a->b d   { func2($1,$2); }
  |b c d { func3($1,$2,$3); }

Here the argument handling is well defined.  Thus options are to be
handled this way rather than by allowing skippable subsequences within
a single disjunct in a rule expansion.
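
The splitting step above can be sketched mechanically.  The following
Python fragment is only an illustration, not part of gxc: the
(symbol, is_optional) pair representation and the function name
expand_optionals are assumptions made for the example.

```python
from itertools import product

def expand_optionals(elements):
    """Expand a sequence containing optional elements into full disjuncts.

    `elements` is a list of (symbol, is_optional) pairs.  This input
    representation is hypothetical, chosen only for the illustration.
    """
    # Each element contributes itself, or (if optional) itself or nothing.
    choices = [((sym,), ()) if optional else ((sym,),)
               for sym, optional in elements]
    disjuncts = []
    for combo in product(*choices):
        disjunct = [sym for part in combo for sym in part]
        if disjunct:  # an expansion needs at least one element
            disjuncts.append(disjunct)
    return disjuncts

# a -> b [c] d  becomes  a -> b c d | b d
print(expand_optionals([("b", False), ("c", True), ("d", False)]))
# prints [['b', 'c', 'd'], ['b', 'd']]
```

Each resulting disjunct then receives its own action, so $1, $2, etc.
always refer to elements that are actually present.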

Similarly, Kleene-star and Kleene-plus operators on subsequences,
whether on single tokens as in b*, or on grouped token sequences as in
(bc)+ or (c(de)+)*, create reference trouble.  No complete sentence or
utterance ever written or spoken has required infinite recursion or
iteration to model, so we assert that a fully-expanded list of
disjuncts, each with a different number of copies of the iterated
element, and each allowing unambiguous reference to its component
elements in its associated action specification, is adequate to
represent any practically useful grammar.
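
As a rough illustration of such a fully-expanded list, the Python
sketch below writes out one disjunct per copy count.  It is not gxc
code; the function names and the explicit upper bound max_copies are
assumptions, since the bound is whatever the grammar writer considers
sufficient.

```python
def expand_repetition(group, min_copies, max_copies):
    """Return one disjunct per copy count from min_copies to max_copies."""
    if min_copies < 1 or max_copies < min_copies:
        raise ValueError("need 1 <= min_copies <= max_copies")
    return [list(group) * n for n in range(min_copies, max_copies + 1)]

def format_rule(name, disjuncts):
    """Render disjuncts in BAR-separated, SEMICOLON-terminated rule form."""
    body = "\n\t| ".join(" ".join(d) for d in disjuncts)
    return name + " = " + body + "\n\t;"

# (b c)+ limited to at most three copies (the limit is an assumption)
print(format_rule("$a", expand_repetition(["b", "c"], 1, 3)))
```

For a Kleene-star, the zero-copy case is covered in the enclosing rule
by also listing the expansion without the iterated element, exactly as
for an optional element.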

Thus, for example, instead of the very simple but inadequately
structured grammar "$number = $digit*" for phone number strings, use
the following more structured and fully specified grammar:

TYPE "char *" $digit $AC $N7 $ONE $phone_number
%%
$digit = one {"1"} 
	| two {"2"}
	| three {"3"}
	| four {"4"}
	| five {"5"}
	| six {"6"}
	| seven {"7"}
	| eight {"8"}
	| nine {"9"}
	| oh | zero {"0"}
	;
$AC = $digit $digit $digit { strcat3($1,$2,$3); };
$N7 = $digit $digit $digit $digit $digit $digit $digit
	{ strcat7($1,$2,$3,$4,$5,$6,$7); }
	;
$ONE = one {"1"};
$phone_number = 
	$N7             { call(LOCAL,LOCAL_AREA_CODE,$1); }
	| $AC $N7       { call(LOCAL,$1,$2); }
	| $ONE $AC $N7  { call($1,$2,$3); }
	;
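
To see what the three $phone_number disjuncts compute, the following
Python sketch mirrors the grammar's actions outside of gxc.  It is an
illustration only: the call() stub and the values given to LOCAL and
LOCAL_AREA_CODE are placeholders, since the grammar above does not
define them.

```python
DIGITS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
          "six": "6", "seven": "7", "eight": "8", "nine": "9",
          "oh": "0", "zero": "0"}

LOCAL = "LOCAL"            # placeholder value, not defined by the grammar
LOCAL_AREA_CODE = "000"    # placeholder value, not defined by the grammar

def call(prefix, area_code, number):
    """Stub standing in for the grammar's call() action."""
    return (prefix, area_code, number)

def phone_number(words):
    """Pick the matching $phone_number disjunct by length and apply its action."""
    digits = [DIGITS[w] for w in words]   # KeyError if a word is not a digit
    if len(digits) == 7:                  # $N7
        return call(LOCAL, LOCAL_AREA_CODE, "".join(digits))
    if len(digits) == 10:                 # $AC $N7
        return call(LOCAL, "".join(digits[:3]), "".join(digits[3:]))
    if len(digits) == 11 and words[0] == "one":   # $ONE $AC $N7
        return call("1", "".join(digits[1:4]), "".join(digits[4:]))
    raise ValueError("no disjunct of $phone_number matches")

print(phone_number("five five five one two one two".split()))
# prints ('LOCAL', '000', '5551212')
```

Because every disjunct has a fixed number of elements, each action can
name its arguments by position without ambiguity.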

In our system, then, the grammar specification formalism is fairly
restricted: rules consist of disjuncts, disjuncts consist of
concatenated elements, elements are words or non-terminals.
Optionality is encoded by expanding both alternatives; optional
repetition is encoded by expanding the desired number of alternatives.
Thus we capture (to finite depth only) the expressivity of finite-state
grammars, which Kleene demonstrated to be fully captured by
the three operators: 
   AND (concatenation of conjuncts), 
   OR (branching to alternatives),
   STAR (repetition or looping).

More formally:

A grammar is a list of rules; the final rule is the top-level rule,
and the remaining rules define the non-terminals, each of which is
defined before it is used within any rule (no self-reference or recursion).
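
The ordering constraint can be checked in a single pass over the rule
list.  The Python sketch below assumes a hypothetical in-memory form,
a list of (name, disjuncts) pairs where each disjunct is a list of
elements; it is not gxc's own data structure.

```python
def check_definition_order(rules):
    """Return (rule, element) pairs where a non-terminal is used too early.

    `rules` is a list of (name, disjuncts) pairs in file order; each
    disjunct is a list of elements.  The layout is hypothetical.
    """
    defined = set()
    errors = []
    for name, disjuncts in rules:
        for disjunct in disjuncts:
            for element in disjunct:
                # A rule's own name is not yet defined while its body is
                # scanned, so self-reference is reported as well.
                if element.startswith("$") and element not in defined:
                    errors.append((name, element))
        defined.add(name)
    return errors

rules = [
    ("$digit", [["one"], ["two"]]),
    ("$AC", [["$digit", "$digit", "$digit"]]),
]
print(check_definition_order(rules))   # prints []
```

Since a rule may only mention earlier rules, a grammar that passes
this check cannot contain direct or indirect recursion.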

A rule is a non-terminal followed by EQUALS followed by one or more
BAR-separated disjuncts, and ending with SEMICOLON.  Each disjunct
is an expansion followed by an optional action.
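
A minimal Python sketch of reading one rule in this form follows.  It
is an assumption-laden illustration, not gxc's parser: it splits on
top-level bars by counting braces, so braces inside C string literals
in an action would confuse it.

```python
def parse_rule(text):
    """Parse 'NAME = disjunct | disjunct ... ;' into (name, disjuncts).

    Each disjunct is returned as (expansion, action), where action is the
    text between the braces, or None if the disjunct has no action.
    """
    name, body = text.split("=", 1)
    body = body.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    body = body[:-1]

    # Split on BAR only outside of {...} action bodies.
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    disjuncts = []
    for part in parts:
        part = part.strip()
        action = None
        if part.endswith("}"):
            start = part.index("{")
            action = part[start + 1:-1].strip()
            part = part[:start]
        expansion = part.split()
        if not expansion:
            raise ValueError("an expansion needs at least one element")
        disjuncts.append((expansion, action))
    return name.strip(), disjuncts

print(parse_rule('$ONE = one {"1"};'))
# prints ('$ONE', [(['one'], '"1"')])
```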

An expansion is a concatenation of one or more elements.

An action is C code surrounded by {}, where the value of $1, $2,
etc., is as in yacc: 
	$1 is the value of the first concatenated element;
	$2 is the value of the second, 
	etc.  
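
Because every expansion has a fixed length, it is easy to verify that
an action only refers to elements that exist.  The Python sketch below
illustrates such a check; the function name check_references is
hypothetical and not part of gxc.

```python
import re

def check_references(action, expansion):
    """Return the $n indices used in `action` that `expansion` cannot supply."""
    used = {int(n) for n in re.findall(r"\$(\d+)", action)}
    return sorted(n for n in used if not 1 <= n <= len(expansion))

# b c d supplies $1..$3, so func3 is well defined
print(check_references("func3($1,$2,$3);", ["b", "c", "d"]))   # prints []
# b d supplies only $1 and $2, so $3 is dangling
print(check_references("func($1,$2,$3);", ["b", "d"]))         # prints [3]
```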

$$, known in yacc terminology as the value returned by a
non-terminal's action, is here used only implicitly: the final
expression of the action, call it E, is translated to the
yacc equivalent "$$=E;", or to the C equivalent "return(E);".

The data type returned by a non-terminal's action is to be 
declared in a preamble of lines of the form
TYPE "type string" $A $B $C
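
A preamble line of this form can be read into a table from
non-terminal to type string.  The Python sketch below is illustrative
only and relies on the standard shlex module to keep the quoted type
string together; the function name parse_type_line is an assumption.

```python
import shlex

def parse_type_line(line):
    """Map each non-terminal on a TYPE line to the declared type string."""
    fields = shlex.split(line)   # keeps "char *" together as one field
    if len(fields) < 3 or fields[0] != "TYPE":
        raise ValueError('expected: TYPE "type string" $A $B ...')
    ctype, names = fields[1], fields[2:]
    return {name: ctype for name in names}

print(parse_type_line('TYPE "char *" $digit $AC'))
# prints {'$digit': 'char *', '$AC': 'char *'}
```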

The data type of a literal word as well as of the brackets used is
"char *".

If no action is specified for a disjunct, then the next action in the
rule is used; but if there is no next action within that rule, the
action is replaced by {$$=$1;}.  If the action is a constant (say X),
then it is rewritten as {$$=X;}.
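
These defaulting rules can be sketched as a small normalization pass.
The Python fragment below is an illustration under stated assumptions:
disjuncts are (expansion, action) pairs with None for a missing
action, and a "constant" is taken to mean a double-quoted string or an
integer literal, which is narrower than what gxc may accept.

```python
import re

# Assumption: a constant action is a string literal or an integer literal.
CONSTANT = re.compile(r'^("[^"]*"|-?\d+)$')

def normalize_actions(disjuncts):
    """Fill in missing actions and rewrite constant actions as assignments.

    `disjuncts` is a list of (expansion, action) pairs for one rule, where
    action is the text between the braces or None if absent.
    """
    result = []
    for i, (expansion, action) in enumerate(disjuncts):
        if action is None:
            # Borrow the next specified action within the same rule, if any.
            action = next((a for _, a in disjuncts[i + 1:] if a is not None),
                          None)
        if action is None:
            action = "$$=$1;"
        elif CONSTANT.match(action.strip()):
            action = "$$=" + action.strip() + ";"
        result.append((expansion, action))
    return result

# oh | zero {"0"}  gives both disjuncts the action $$="0";
print(normalize_actions([(["oh"], None), (["zero"], '"0"')]))
# prints [(['oh'], '$$="0";'), (['zero'], '$$="0";')]
```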

Whitespace is ignored except to separate tokens.  

An element is a non-terminal or a token.  

A non-terminal is a $-initial identifier.
Non-terminals must be defined before they are used.

A token is a literal word containing no whitespace.
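
The element definitions above amount to a simple classification by
first character.  The Python sketch below illustrates it; the exact
identifier syntax (letters, digits and underscores, not starting with
a digit) is an assumption borrowed from C, not something the
definitions above specify.

```python
import re

# Assumed identifier syntax, C-style; gxc's actual rule is not specified here.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify(element):
    """Classify a whitespace-free word as 'non-terminal' or 'token'."""
    if not element or any(ch.isspace() for ch in element):
        raise ValueError("an element is a single whitespace-free word")
    if element.startswith("$"):
        if IDENTIFIER.fullmatch(element[1:]):
            return "non-terminal"
        raise ValueError("malformed non-terminal: " + element)
    return "token"

print(classify("$N7"), classify("seven"))   # prints non-terminal token
```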

Copyright © 1996-2005 Sprex, Inc. All rights reserved. Sprex, Speech in the Network, TallyGram and ANSR are trademarks of Sprex, Inc.
All other trademarks belong to their respective owners.
Date: August 27, 2008