The Rule-based Parser of the NLP
Group of the University of Torino
Leonardo Lesmo
Dipartimento di Informatica and
Centro di Scienze Cognitive,
Università di Torino,
Italy
Email: [email protected]
Goals
• Wide-coverage tool
• Domain independence
• Extensibility to semantics
Approach
• Manually developed rules
• Two phases: chunking and subcategorization
• Procedural handling of conjunctions and of the identification of verbal dependents
TULE (Turin University Linguistic Environment)
Processing pipeline, from Text to Parse Tree:
1. TOKENIZER (resource: Token Automaton). Splits the text into words, numbers, punctuation marks. Output: Tokens.
2. DICTIONARY LOOKUP (resources: Morphological dictionary, Suffix tables). Extracts all lexical interpretations of each token. Output: Sets of lexical items.
3. POS TAGGER (resource: Tagging rules). Chooses one lexical interpretation. Output: Lexical items.
4. DEPENDENCY PARSER (resources: Parsing rules, Verbal Caseframes). Establishes the connections between lexical items. Output: Parse Tree.
The grammar
• Rule-based dependency grammar
• Chunking (non-verbal groups) + verbal subcategorization frames
• Output: a projective tree represented as pointers to parents, including some null elements (understood items, e.g. pro-drop, and traces)
Parser Architecture
Input: Lexical Items
1. CHUNKING (resource: Chunking rules). Splits the text into groups of strictly connected words. Output: Chunked text.
2. ANALYSIS OF CONJUNCTIONS (resource: Procedural preference rules 1). Connects chunks linked by conjunctions, to form larger chunks. Output: Chunked text.
3. SEGMENTATION (resource: Procedural preference rules 2). Determines the dependents of verbs.
4. VERBAL ATTACHMENT (resources: Verb classes, Verbal Caseframes, Lexical items). Determines the role (arc labels) of the verbal dependents. Output: Parse Tree.
An example
Example: Slitta a Tirana la decisione sullo stato di emergenza.
(The decision on the state of emergency has been delayed in Tirana)
Lexical Items and Parse Tree Infos ([parent; arc label]):
1     Slitta     (SLITTARE VERB MAIN IND PRES INTRANS 3 SING)            [0;TOP-VERB]
2     a          (A PREP MONO)                                           [1;PREP-RMOD]
3     Tirana     (TIRANA NOUN PROPER F SING ££CITY)                      [2;PREP-ARG]
4     la         (IL ART DEF F SING)                                     [1;VERB-SUBJ]
5     decisione  (DECISIONE NOUN COMMON F SING DECIDERE INTRANS)         [4;DET+DEF-ARG]
6     sullo      ((SU PREP MONO)                                         [5;PREP-RMOD]
6.10              (IL ART DEF M SING))                                   [6;PREP-ARG]
7     stato      (STATO NOUN COMMON M SING)                              [6.10;DET+DEF-ARG]
8     di         (DI PREP MONO)                                          [7;PREP-RMOD]
9     emergenza  (EMERGENZA NOUN COMMON F SING)                          [8;PREP-ARG]
10    .          (#\. PUNCT)                                             [1;END]

Parse tree:
1: Slitta
    Prep-rmod → 2: a
        Prep-arg → 3: Tirana
    Verb-subj → 4: la
        Det+def-arg → 5: decisione
            Prep-rmod → 6: su
                Prep-arg → 6.10: lo
                    Stato di emergenza
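The "pointers to parents" output can be illustrated with a minimal Python sketch. The node IDs, parents and arc labels are copied from the example above; the dictionary layout and the helper names (`children`, `path_to_root`) are assumptions made for illustration, not the parser's actual data structures.

```python
# Each node stores only a pointer to its parent plus the arc label.
# IDs are strings because the second half of "sullo" has the ID "6.10".
# Parent "0" marks the root. Layout and helper names are illustrative.
PARSE = {
    "1": ("Slitta", "0", "TOP-VERB"),
    "2": ("a", "1", "PREP-RMOD"),
    "3": ("Tirana", "2", "PREP-ARG"),
    "4": ("la", "1", "VERB-SUBJ"),
    "5": ("decisione", "4", "DET+DEF-ARG"),
    "6": ("su", "5", "PREP-RMOD"),
    "6.10": ("lo", "6", "PREP-ARG"),
    "7": ("stato", "6.10", "DET+DEF-ARG"),
    "8": ("di", "7", "PREP-RMOD"),
    "9": ("emergenza", "8", "PREP-ARG"),
    "10": (".", "1", "END"),
}


def children(node_id):
    """Recover the children of a node by scanning the parent pointers."""
    return [i for i, (_, parent, _) in PARSE.items() if parent == node_id]


def path_to_root(node_id):
    """Follow parent pointers from a node up to the root, collecting word forms."""
    path = []
    while node_id != "0":
        form, parent, _ = PARSE[node_id]
        path.append(form)
        node_id = parent
    return path


if __name__ == "__main__":
    print(children("1"))
    print(" <- ".join(path_to_root("9")))
```

Storing only parent pointers keeps each word's record flat; the tree shape is recovered on demand, as `children` does here.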
Chunking
Example: Puoi dirmi che spettacoli di cabaret posso vedere domani?
(Can you tell me what cabaret plays I can see tomorrow?)
Puoi/V-modal-2nd-sing-pres dir/V-inf [mi/Pron-1st-dative]Pron
[che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group
posso/V-modal-1st-sing-pres vedere/V-inf [domani/Adv]A-group ?
Chunking Rules
• Chunking rules are grouped in packets.
• Each packet is associated with a lexical category, and describes the "chunkable" possible dependents of words of that category.
• Chunkable means a dependent handled during chunking (e.g. auxiliaries, but not arguments of verbs).
A chunk rule

(NOUN common
   (precedes (ADJ qualif T (#\- #\' #\"))
      (ADJ ((type qualif)
            (agree)))
      ADJC+QUALIF-RMOD))

• NOUN: packet (governing word)
• common: feature (constrains applicability)
• precedes (...): position of the dependent (and possible words separating head from dependent)
• (ADJ ((type qualif) (agree))): category of the possible dependent (and constraints on it)
• ADJC+QUALIF-RMOD: label of the connecting arc
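To make the parts of such a rule concrete, here is a minimal Python sketch of how one might apply it. Everything except the rule's content (NOUN, common, precedes, ADJ, qualif, agree, ADJC+QUALIF-RMOD) is an assumption: the `Word` and `ChunkRule` classes, the `apply_rule` function and the Italian test phrase are invented for illustration, and the sketch only handles an immediately adjacent dependent, ignoring the separating words the real rule allows.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str
    cat: str
    type: str
    gender: str
    number: str


@dataclass
class ChunkRule:
    head_cat: str    # packet (governing word category)
    head_type: str   # feature constraining applicability
    position: str    # "precedes" or "follows": where the dependent sits
    dep_cat: str     # category of the possible dependent
    dep_type: str    # constraint on the dependent
    agree: bool      # require gender/number agreement
    label: str       # label of the connecting arc


RULE = ChunkRule("NOUN", "common", "precedes", "ADJ", "qualif", True,
                 "ADJC+QUALIF-RMOD")


def apply_rule(rule, words):
    """Return (dependent_index, head_index, label) arcs licensed by the rule."""
    arcs = []
    for h, head in enumerate(words):
        if (head.cat, head.type) != (rule.head_cat, rule.head_type):
            continue
        d = h - 1 if rule.position == "precedes" else h + 1
        if not 0 <= d < len(words):
            continue
        dep = words[d]
        if (dep.cat, dep.type) != (rule.dep_cat, rule.dep_type):
            continue
        if rule.agree and (dep.gender, dep.number) != (head.gender, head.number):
            continue
        arcs.append((d, h, rule.label))
    return arcs


# Invented test phrase: "la bella casa" (the beautiful house).
PHRASE = [
    Word("la", "ART", "def", "F", "SING"),
    Word("bella", "ADJ", "qualif", "F", "SING"),
    Word("casa", "NOUN", "common", "F", "SING"),
]

if __name__ == "__main__":
    print(apply_rule(RULE, PHRASE))
```

The agreement check is what blocks an attachment such as a plural adjective to a singular noun.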
Conjunctions
• When a coordinating conjunction is found, all following and preceding chunks are collected.
• All pairs are built, and the best one is chosen according to criteria based on structural similarity and distance.
• Special treatment for verbs.
Example: Ho incontrato Marco e Lucia e li ho salutati
(I met Marco and Lucia and I greeted them)
Ho/V-aux incontrato/V-main
[Marco/Noun-Proper]Noun e/Conj-coord [Lucia/Noun-Proper]Noun
e/Conj-coord [li/Pron-pers]Pron ho/V-aux salutati/V-main
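The pair selection described above can be sketched in Python as follows. The scoring function is a toy assumption (a fixed bonus when the two chunks share a category, minus their distance in chunks); the actual preference rules are not given on the slide. The sketch also treats each conjunction independently and does not merge the conjuncts into a larger chunk, nor does it implement the special treatment of verbs.

```python
# Chunks of "Ho incontrato Marco e Lucia e li ho salutati" as (text, category).
# Grouping the auxiliary with its main verb is a simplification for the sketch.
CHUNKS = [
    ("Ho incontrato", "Verb"),
    ("Marco", "Noun"),
    ("e", "Conj"),
    ("Lucia", "Noun"),
    ("e", "Conj"),
    ("li", "Pron"),
    ("ho salutati", "Verb"),
]


def best_pair(chunks, conj_index, same_cat_bonus=10):
    """Build all (left, right) pairs around a conjunction and return the best.

    Toy score: structural similarity (same category) minus distance.
    """
    best, best_score = None, None
    for left in range(conj_index):
        if chunks[left][1] == "Conj":
            continue
        for right in range(conj_index + 1, len(chunks)):
            if chunks[right][1] == "Conj":
                continue
            similarity = same_cat_bonus if chunks[left][1] == chunks[right][1] else 0
            score = similarity - (right - left)
            if best_score is None or score > best_score:
                best, best_score = (left, right), score
    return best


if __name__ == "__main__":
    for idx in (2, 4):
        left, right = best_pair(CHUNKS, idx)
        print(CHUNKS[left][0], "+", CHUNKS[right][0])
```

With this toy score, the first "e" pairs the two proper nouns and the second pairs the two verbal groups, because category similarity outweighs the larger distance.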
Segmentation
For each verb (going from left to right):
• Look for possible dependents (on its right and left).
• On the left, the search is blocked by the previous verb.
• On the right, some "barriers" are defined to stop the search (for instance, a subordinating conjunction acts as a barrier).
Puoi/V-modal-2nd-sing-pres { dir/V-inf [mi/Pron-1st-dative]Pron
{ [che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group
posso/V-modal-1st-sing-pres { vedere/V-inf [domani/Adv]A-group ? } } } }
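A minimal Python sketch of the search for candidate dependents, under simplifying assumptions: the input is a flat list of items tagged "verb", "chunk" or "barrier", the function name `candidate_dependents` is invented, and the test sentence ("Marco dice che Lucia dorme", Marco says that Lucia sleeps) is made up to show a subordinating conjunction acting as a barrier. The preference rules that later choose among candidates are not modelled.

```python
def candidate_dependents(items):
    """Map each verb index to (left candidates, right candidates).

    Left search: stops at the previous verb.
    Right search: stops at the first barrier.
    Only items tagged "chunk" are collected as candidates.
    """
    result = {}
    for v, (_, kind) in enumerate(items):
        if kind != "verb":
            continue
        left = []
        i = v - 1
        while i >= 0 and items[i][1] != "verb":
            if items[i][1] == "chunk":
                left.append(items[i][0])
            i -= 1
        left.reverse()  # restore sentence order
        right = []
        j = v + 1
        while j < len(items) and items[j][1] != "barrier":
            if items[j][1] == "chunk":
                right.append(items[j][0])
            j += 1
        result[v] = (left, right)
    return result


# Invented example with a subordinating conjunction as barrier.
ITEMS = [
    ("Marco", "chunk"),
    ("dice", "verb"),
    ("che", "barrier"),
    ("Lucia", "chunk"),
    ("dorme", "verb"),
]

if __name__ == "__main__":
    for index, (left, right) in candidate_dependents(ITEMS).items():
        print(ITEMS[index][0], left, right)
```

Here "che" stops the rightward search of "dice", so "Lucia" is offered only to "dorme".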
Verbal Subcategorization
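The subcategorization classes:
[Figure: verbs in the dictionary linked to a hierarchy of subcategorization classes]
Verbs in the dictionary: bisognare (need), camminare (walk), dovere (must), potere (can)
Subcategorization classes: verbs, no-subj-verbs, ssubj-inf-verbs, subj-verbs, empty-modal, modal, obj-verbs, basic-trans, trans, trans-indobj, indobj-verbs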
Example subcategorization class definitions:
(subj-verbs (intrans) (verbs)
   ; *** verbs with a subject. Definition of subject
   (verb-subj ((noun (agree))
               (art (agree))
               (pron (not (word quale) (type relat)) (case lsubj) (agree))
               (adj (type (indef demons deitt interr poss)) (agree))
               (num (agree))
               (prep (word in) (down (cat pron) (type indef)) (agree)))))

(ssubj-inf-verbs () (verbs)
   ; *** verbs with an inf-verb sentential subject
   (verb-subj
      ((verb (mood infinite) (agree)))))

(empty-modal () (no-subj-verbs)
   ; *** modals without subject
   (verb-indcompl-modal
      ((verb (mood infinite)))))
Transformations:
basic class (e.g. trans) → transformed classes (e.g. trans, trans+passivization, trans+infinitivization, trans+prodrop, trans+passivization+infinitivization, ...)
Example transformation:
(infinitivization
   replacing
   (subj-verbs)
   (is-inf-form tr-verb v-casefr)
   (cancel-case s-subj))
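The derivation of transformed classes from a basic class can be sketched in Python as below. Only the `cancel-case s-subj` operation of infinitivization comes from the slide; the case frame contents, the `s-obj` case name and the second transformation (`drop-obj`) are made-up stand-ins used to show how transformations compose and how the `+` names arise. The applicability condition (`is-inf-form ...`) is not modelled.

```python
from itertools import combinations


def cancel_case(case):
    """Build a transformation that removes one case from a case frame."""
    def transform(frame):
        return {c: filler for c, filler in frame.items() if c != case}
    return transform


TRANSFORMATIONS = {
    "infinitivization": cancel_case("s-subj"),  # operation taken from the slide
    "drop-obj": cancel_case("s-obj"),           # hypothetical, for illustration only
}


def derive(base_name, base_frame):
    """Return the base frame plus one derived frame per combination of transformations."""
    derived = {base_name: dict(base_frame)}
    names = list(TRANSFORMATIONS)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            frame = dict(base_frame)
            for name in combo:
                frame = TRANSFORMATIONS[name](frame)
            derived["+".join((base_name,) + combo)] = frame
    return derived


# Toy basic frame: case name -> category of the filler (both hypothetical).
BASE_FRAME = {"s-subj": "noun", "s-obj": "noun"}

if __name__ == "__main__":
    for name, frame in derive("trans", BASE_FRAME).items():
        print(name, sorted(frame))
```

Generating surface frames by composition is one way a small set of base classes could yield a much larger set of derived case frames.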
Some statistics
• Chunking rules
  Total: 295 rules
    Common: 250 rules
    English: 34 rules
    Italian: 7 rules
    Spanish + Catalan: 4 rules
• Base subcategorization
  Total: 118 classes
    Abstract: 21 classes
    plus verbal locutions
    Italian: 40 classes
    English: 1 class
• Derived surface case frames
  2653 case frames
Conclusions
• Test of the parser on other languages, using the same grammar augmented with extra rules (see previous slide)
• Partial use of semantic information (about 400 words classified according to a semantic taxonomy)
• The parser has been used in a project involving spoken and written linguistic interaction with a user. It has been interfaced with a repository of semantic knowledge to build a meaning representation.