Using a Generative Lexicon Resource to
Compute Bridging Anaphora in Italian:
preliminary observations and data
Tommaso Caselli
Istituto di Linguistica Computazionale – ILC-CNR Pisa
Dip. Di Linguistica “T. Bolelli”, Università degli Studi di Pisa
{tommaso (dot) caselli (at) ilc (dot) cnr (dot) it}
CBA 2008, Barcelona, 14 November 2008
Outline

Motivations

Bridging in Italian: corpus-study

Introducing a Different Resource: PAROLE/SIMPLE/CLIPS

Preliminary Experiments and Evaluation

Conclusion & Future Work
Motivations:


Bridging anaphora are a very challenging phenomenon, and their
resolution is essential to improve the performance of many NLP
applications (Q.A., I.R., I.E. and summarizers);
So far, the use of (lexical) resources has concentrated on the
exploitation of semantic relations (meronymy, synonymy, hyponymy ...),
but the results show limitations:
the relation between the bridging anaphor and the anchor is not always
a semantic relation in the classical sense

Relations between words are not randomly created by speakers. This
calls for resources based on strong theoretical frameworks which can
account for the way words combine and are related:
Generative Lexicon (G.L.) & G.L.-based resources
Bridging anaphora: theoretical
assumptions
• it is “a type of indirect textual reference whereby a new referent is
introduced as an anaphoric not of but via the referent of an
antecedent expression” [Kleiber 1999: 339];
• it is a class of inferences required to maintain the coherence of
the discourse (Clark 1977):
Yesterday we went for a pic-nic, but I forgot to put the beers in the fridge.
• they give rise to three kinds of presupposition:
• the Uniqueness Presupposition;
• the Familiarity/Identifiability Presupposition and
• the Inferential Presupposition, i.e. “the [N1] R [N2]”, e.g.:
• N1 [the beers], N2 [a pic-nic], R = is_a_member_of.
Bridging anaphora: theoretical
assumptions (2)
• they are a matter of the local focus of the discourse for the identification
of their antecedents (Sidner 1979, Poesio 2003);
• 3 pragma-cognitive dimensions can be identified for their interpretation
(Korzen 2003):
• Lexical Semantics Dimension;
• Co-textual Dimension (discourse structure);
• Con-textual Dimension (scripts, frames, world knowledge).
Bridging anaphora:
corpus-study
METHODOLOGICAL NOTE
Two-fold corpus study:
1) General corpus-study on Full Definite Noun Phrases (FDNPs) in Italian;
2) A study on those cases of FDNPs which are instances of bridging
anaphora in Italian.
• corpus of seventeen randomly chosen articles from the Italian
financial newspaper “il Sole-24 Ore”, a workpackage of the SI-TAL
project;
• use of processing requirements for the classification both of
FDNPs in general and of bridging anaphors;
• Minimal vs Maximal NP (MUC-7);
• all instances of NPs (pronouns, including zero anaphora, and lexical
expressions), VPs and frames have been considered as probable
anchors;
• pre- and post-nominal modifiers (adjectives, non-finite verb forms,
relative clauses and prepositional phrases) have been considered
as disambiguating clues.
Bridging anaphora:
corpus-study (1)
Full Definite Noun Phrases in Italian:
CLASS             NUMBER OF ITEMS   PERCENTAGE
First Mention           833           58.61%
Direct Anaphora         170           12.03%
Bridging                299           21.17%
Possessives              36            2.54%
Idiom                    25            1.62%
Doubt                    49            3.47%
Total                  1412          100%
Bridging anaphora:
corpus-study (2)
Bridging Anaphora in Italian:
CLASS OF BRIDGING FDNPs   NUMBER OF ITEMS   PERCENTAGE
Lexical                         119           39.79%
Event                            18            6.02%
Rhetorical Relation              27            9.03%
Inferential                     109           36.45%
Discourse Topic                  26            8.69%
Total                           299          100%
LEXICAL SEMANTICS > PRAGMATICS > DISCOURSE STRUCTURE
• 221 anchors are nominal entities & ~70% have lookback ranging 0-2
• 53.84% (119/221) of the anchors are previous Cbs/Cps (Centering Theory)
• 25.33% are proper names
• 34.03% are NPs of postmodifying PPs, i.e. the explicit argument of the
head noun of Löbner’s FC2, e.g.:
4) i due Paesi - i due partner commerciali: I negoziatori dei due Paesi hanno
annunciato che i colloqui “informali” in corso da giovedì scorso nella
capitale Usa hanno portato all' alba di martedì al compromesso[...].
Doppiato questo scoglio [...] i due partner commerciali hanno promesso di
procedere a passo spedito.
A Different Resource:
PAROLE/SIMPLE /CLIPS

As the corpus-study has shown, more than 45% of the relations between
anchor and bridging anaphor are based on relations which are not strictly
lexical.
WHY USE (AGAIN) A LEXICAL RESOURCE?
SIMPLE is based on the Generative Lexicon (Pustejovsky, 1995):
• a formal framework which explains how senses are generated in the lexicon;
• the basic qualia (telic, constitutive, agentive and formal) enable the description of
the meaning of a word and capture orthogonal relations between semantic units;
• the span of semantic relations in the G.L. framework is much wider, and it reduces
the need for world/pragmatic knowledge to explain semantic relations between
words
http://www.ilc.cnr.it/clips/
A Different Resource:
PAROLE/SIMPLE /CLIPS (2)
PAROLE/SIMPLE/CLIPS is the largest computational lexical knowledge base
of the Italian language:
SEMANTICS
Lemmas:            45,437
  verbs             2,830
  common nouns     14,088
  proper nouns        526
  adjectives        1,856
Semantic Units:    57,101
  verbs             5,351
  common nouns     19,123
  proper nouns        873
  adjectives        3,163
A Different Resource:
PAROLE/SIMPLE /CLIPS (3)
[Figure: structure of a SEMANTIC UNIT. Recoverable labels: Ontological Type, Extended Qualia, FEATURES, RELATIONS, Synonymy, Domain, Event Type, Derivation, Regular Polysemy, Semantic Properties.]
A Different Resource:
PAROLE/SIMPLE /CLIPS (4)
A Different Resource:
PAROLE/SIMPLE /CLIPS (5)
Qualia structure:
• the 4 classical qualia have been extended, up to 64 relations
• finer-grained specification of meaning dimensions
• from a single keyword it is possible to retrieve and extract a set of semantic units,
regardless of their semantic type, which creates a rich semantic network in the text
FORMAL: 5 semantic relations
CONSTITUTIVE: 35 semantic relations
AGENTIVE: 10 semantic relations
TELIC: 14 semantic relations
PISTOLA (gun) – ARMA (weapon): SemRel = is_a
MORTE (death) – SUICIDIO (suicide): SemRel = resulting_state
BENZINA (petrol) – PETROLIO (oil): SemRel = derived_from
PROIETTILE (bullet) – COLPIRE (shoot): SemRel = used_for
THESE SEMANTIC RELATIONS ARE TAKEN TO EXPRESS THE R ELEMENT OF
THE INFERENTIAL PRESUPPOSITION TO RESOLVE BRIDGING ANAPHORS
A Different Resource:
PAROLE/SIMPLE /CLIPS (6)
1. i prezzi – al consumatore [the prices – to the consumer]:
INFERENTIAL → indirect_telic + agent_verb
2. il processo – gli imputati [the trial – the defendants]:
INFERENTIAL → member_of
3. essersi sparato – il suicidio [to shoot oneself – the suicide]:
EVENT → resulting_state
4. fatto esplodere – le macerie [exploded – the debris]:
EVENT → result_of
5. condannare – il pubblico ministero [to condemn – the public prosecutor]:
EVENT → relates
6. il voto – l’elezione [the vote – the election]:
RHET. RELATION → purpose
Experiments and evaluation
Experiment: 129 bridging anaphor – anchor pairs were
selected from the corpus-study, corresponding to the following classes:
• Lexical
• Event
• Rhetorical Relations
• Inferential
Anaphoric relations involving named entities (N.E.) have been excluded
Experiments and evaluation (2)
Bridging anaphor: WSD of the bridging anaphor
Anchor: selection of the anchor
SIMPLE:
- automatic retrieval of the semantic relation
- maximum 2 semantic arcs allowed
- direct connection between the 2 SemU
or between the 2 SemType.
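The retrieval step above can be pictured as a bounded search over a graph of semantic relations. The following is a minimal Python sketch under stated assumptions, not the procedure actually used in the experiment: the triples are the example pairs from the qualia slide (in the order shown there), arcs are treated as undirected, one arc is a hypothetical addition so that a two-arc path exists, and nothing here reflects the real PAROLE/SIMPLE/CLIPS data format or access interface.

```python
from collections import deque

# Toy relation triples (SemU, relation, SemU). The first four are the
# example pairs from the slides; they are NOT an extract of the real resource.
TRIPLES = [
    ("pistola", "is_a", "arma"),
    ("morte", "resulting_state", "suicidio"),
    ("benzina", "derived_from", "petrolio"),
    ("proiettile", "used_for", "colpire"),
    ("pistola", "used_for", "colpire"),  # hypothetical arc, for illustration only
]


def build_graph(triples):
    """Build an undirected adjacency map (assumption: arc direction is ignored)."""
    graph = {}
    for source, relation, target in triples:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation, source))
    return graph


def find_relation_path(graph, anaphor, anchor, max_arcs=2):
    """Breadth-first search for a chain of at most max_arcs relations.

    Returns the list of relation labels on the shortest chain found,
    or None when no chain within the limit links the two SemUs.
    """
    if anaphor == anchor:
        return []
    queue = deque([(anaphor, [])])
    visited = {anaphor}
    while queue:
        node, path = queue.popleft()
        if len(path) == max_arcs:
            continue  # arc budget used up, do not expand further
        for relation, neighbour in graph.get(node, []):
            if neighbour in visited:
                continue
            new_path = path + [relation]
            if neighbour == anchor:
                return new_path
            visited.add(neighbour)
            queue.append((neighbour, new_path))
    return None


GRAPH = build_graph(TRIPLES)
print(find_relation_path(GRAPH, "arma", "pistola"))        # one arc
print(find_relation_path(GRAPH, "proiettile", "pistola"))  # two arcs
print(find_relation_path(GRAPH, "proiettile", "arma"))     # needs three arcs
```

With the two-arc limit, the third pair is rejected even though a longer chain exists in the toy graph, which is the kind of cut-off the "maximum 2 semantic arcs" constraint imposes.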
Experiments and evaluation: results
Resource   # Bridging     Lexical        Event         Rhet. Relation   Inferential
SIMPLE     22 (17.05%)    11 (50.00%)    7 (31.82%)    2 (9.09%)        2 (9.09%)
IWN        19 (14.72%)    12 (63.20%)    5 (26.31%)    2 (10.52%)       0
Unsatisfactory results, BUT still better than using IWN
Reason: many of the extended qualia relations have not been introduced into the
resource
The classes of Inferential and Rhetorical Relations are mostly resolved by two types
of qualia: CONSTITUTIVE & TELIC
Conclusion & Future Work
• the use of a GL-based resource can be seen as a way of reducing the
need for extralinguistic knowledge:
the problem of bridging anaphora resolution becomes part of a more
general problem of identifying semantic relations between linguistic
elements.
• a resource with GL qualia relations encoded in it should not be
compared with a world-knowledge database. GL-based relations are
dynamic: they make it possible to discover new relations between lexical
items and can provide an account of the creative use of language;
Conclusion & Future Work (2)
• qualia relations can provide new features for machine learning
approaches;
• GL pattern induction from a corpus-based study can improve the
resource by adding missing relations;
• extensive exploitation of the SemTypes can overcome the need to
introduce single SemUs.
ESPLODERE (explode) - MACERIE (debris)
ESPLODERE Resulting_state SemU maceria SemType Cause_change_of_state
MACERIA result_of SemU esplodere SemType Cause_change_of_state
SemType Cause_change_of_state SemU DETRITO
SemU …………
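One possible reading of the ESPLODERE – MACERIE example is that a relation can be stated once against a SemType instead of against every single SemU. The Python sketch below illustrates that reading only; the dictionaries are toy data rather than the resource's format, the type-level entries are an assumed interpretation of the lines above, and the verb `demolire` is a hypothetical second member of the SemType added for illustration.

```python
# SemType of each verb SemU. "esplodere" comes from the slide;
# "demolire" is a hypothetical entry added for illustration.
SEMTYPE_OF = {
    "esplodere": "Cause_change_of_state",
    "demolire": "Cause_change_of_state",  # hypothetical
}

# Relations whose target is a SemType rather than a single SemU
# (assumed reading of the MACERIA / DETRITO lines on the slide).
TYPE_LEVEL_RELATIONS = {
    "maceria": [("result_of", "Cause_change_of_state")],
    "detrito": [("result_of", "Cause_change_of_state")],
}


def relation_via_semtype(noun, verb):
    """Return the relation linking noun to the SemType of verb, or None."""
    verb_type = SEMTYPE_OF.get(verb)
    if verb_type is None:
        return None
    for relation, target_type in TYPE_LEVEL_RELATIONS.get(noun, []):
        if target_type == verb_type:
            return relation
    return None


print(relation_via_semtype("maceria", "esplodere"))
print(relation_via_semtype("detrito", "demolire"))
print(relation_via_semtype("maceria", "correre"))
```

Under this reading, a verb that was never linked to the noun directly is still reachable as long as its SemType is known, which is the saving the last conclusion bullet points at.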
Thanks
The Model:
2) Bill gave a book to Maria.
The author is very famous.
Main DRS: Bill (Cb), book (Cp), Maria (Cf)
Bill(x1)
book(x2)
Maria(x3)
give(x1,x3,x2)
…………….
DRS 2: author (Cb)
author(y1)
famous(y1)
Lexical Bridging
la pistola (the gun) - l' arma (the weapon): E poi stupisce che nel tamburo
della pistola mancasse un proiettile[…]. Alcune tracce di ruggine, infatti,
farebbero pensare che l' arma fu collocata nella cintura dei pantaloni
almeno 4 o 5 giorni prima del ritrovamento del corpo .
l’esplosivo (the explosive) – la bomba (the bomb): Gli agenti sono risaliti al
furgone utilizzato per trasportare l' esplosivo nel garage e alla persona
che l' aveva affittato , Salameh. Il suo arresto , anche per aver
collaborato alla preparazione della bomba , fu seguito dalla cattura di
Ayyad , un chimico
Event
essersi sparato (to shoot oneself) - il suicidio (the suicide): dopo essersi
sparato una prima volta con la sua “Smith e Wesson” calibro 38,
carica a proiettili “rafforzati” […]. Ciò non esclude automaticamente l'
ipotesi del suicidio , ma avvalora quella di successive manomissioni,
effettuate subito dopo la morte.
rispose (to answer) - le domande (the questions): Nel 1993 , invece ,
rispose positivamente alle domande degli inquirenti perché
ritenne che il clima politico consentisse di parlare liberamente .
Rhetorical Relations
il voto (the vote) - l' elezione (the election): il voto di lista maggioritario
per l' elezione in assemblea dei componenti del cda (che per altro
verranno retribuiti anche in relazione ai risultati ottenuti dalla società)
due elementi (two elements) - il voto... (the vote) / i limiti (the limits).. [i tre
ministri] che hanno voluto introdurre nello statuto due elementi finora
sconosciuti nell' universo italiano delle privatizzazioni: il voto di lista
maggioritario per l' elezione in assemblea dei componenti del cda (che
per altro verranno retribuiti anche in relazione ai risultati ottenuti dalla
società) e, soprattutto, i limiti imposti al tetto azionario che vanno ben
oltre il vincolo del 5 per cento.
Inferential
quattro uomini (four men) - i quattro immigrati (the four immigrants): “Non potrò veder
crescere mio figlio perché quattro uomini hanno deciso di far saltare simboli
americani” [...]. Neppure la chiusura del processo ai quattro immigrati di origine
araba promette però di scrivere la parola fine
il tribunale (the court) - il giudice (the judge): La decisione del tribunale era parsa
scontata e non ha sorpreso neppure Mohammed Salameh, Ahmad Ajaj, Mahmud
Abouhalima e Nidal Ayyad, il gruppo di fondamentalisti islamici sotto processo. “Mi
aspetto il massimo della pena” aveva detto Ajaj poco prima di ascoltare il responso
del giudice [...].
la Cina (China) – Pechino (Beijing): gli Stati Uniti sono parsi più vicini a trovare una
soluzione di compromesso anche sulla controversia con la Cina sui diritti umani. Il
segretario di Stato Warren Christopher avrebbe infatti stabilito che Pechino ha
soddisfatto richieste specifiche alle quali gli Usa.
I terroristi hanno fatto esplodere una potentissima carica di
esplosivo nel garage dei piu' alti grattacieli di New York
: tra le macerie persero la vita sei persone e altre mille rimasero
ferite ,