ANNOTATING EVENT ANAPHORA:
A CASE STUDY
Tommaso Caselli and Irina Prodanof
ILC-CNR, Pisa
LREC-10 – May 19th, Valletta, Malta
Outline
- Motivations
- Coreference annotation in TimeML
- Annotating event anaphora: a preliminary scheme
- Annotation methodology and results
- Lessons learned and future work
Motivations
- Eventualities represent the building blocks of the informative content of a document
- Eventualities give rise to relations which create a rich informative network:
  - temporal relations
  - sharing of participants
  - factivity
  - coreferential relations
- Coreferential relations among eventualities play an important role in facilitating access to content and extracting relevant information
Coref. in TimeML
- TimeML & ISO-TimeML are standards for the annotation of events, temporal expressions and a set of relations between these entities (temporal, subordinating and aspectual relations)
- Main contribution of TimeML: standard definition of event and methodology for its annotation
- It-TimeML: Italian adaptation of TimeML (updated version on request) and part of ISO-TimeML
- It-TimeML is currently used for the creation of the Italian TimeBank (172 news articles from ISST, PAROLE and Web, 67,140 tokens)
Coref. in TimeML (2)
- TimeML tags involved: EVENT and TLINK (temporal link)
- TimeML has no specific link for coreference annotation
  - workaround: use of a special value of the TLINK tag: “identity”
  - “identity” is used to:
    - connect two tokens which are part of a single event instance (e.g. light verbs)
    - mark coreferential relations between events, namely set-subset
Coref. in TimeML (3) – Use of “identity”
fare la spesa [to do the shopping].
<EVENT id="e1">fare</EVENT> la
<EVENT id="e2">spesa</EVENT>
<TLINK lid="l1" eventInstanceID="e1"
relatedToEventInstance="e2"
relType="IDENTITY"/>
Coref. in TimeML – Use of “identity” (3)
La sessione privata servirà a tre [adempimenti]j. Innanzitutto,
all’[approvazione]j della proposta di Abete (ISST sole006).
The private session will be used for three [fulfillments]j. First, the
[approval]j of the proposal of Abete.
La <EVENT id="e1">sessione</EVENT> privata
<EVENT id="e2">servirà</EVENT> a tre
<EVENT id="e3">adempimenti</EVENT>.
<SIGNAL id="s1">Innanzitutto</SIGNAL>, all’
<EVENT id="e4">approvazione</EVENT> della
<EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4"
relatedToEventInstance="e3"
relType="IDENTITY"/>
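As a rough illustration of how such “identity” links could be read back from an annotated text, here is a minimal Python sketch. It assumes the fragment is wrapped in a root element (here called `TimeML`, added only so the string parses as one XML document); the helper name `identity_pairs` is hypothetical and not part of any TimeML tool.

```python
import xml.etree.ElementTree as ET

# The annotated sentence from the slide, wrapped in an assumed <TimeML> root
# so that it forms a single well-formed XML document.
FRAGMENT = """<TimeML>
La <EVENT id="e1">sessione</EVENT> privata
<EVENT id="e2">servirà</EVENT> a tre
<EVENT id="e3">adempimenti</EVENT>.
<SIGNAL id="s1">Innanzitutto</SIGNAL>, all'
<EVENT id="e4">approvazione</EVENT> della
<EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4"
       relatedToEventInstance="e3"
       relType="IDENTITY"/>
</TimeML>"""


def identity_pairs(xml_text):
    """Return (event text, related event text) for each IDENTITY TLINK."""
    root = ET.fromstring(xml_text)
    # Map every event id to the token it annotates.
    events = {ev.get("id"): ev.text for ev in root.iter("EVENT")}
    pairs = []
    for link in root.iter("TLINK"):
        if link.get("relType") == "IDENTITY":
            pairs.append((events[link.get("eventInstanceID")],
                          events[link.get("relatedToEventInstance")]))
    return pairs


print(identity_pairs(FRAGMENT))  # [('approvazione', 'adempimenti')]
```

Note that the link alone does not say whether the pair is a light-verb construction or a set-subset relation; both uses share the same `relType` value.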
Coref. in TimeML (4)
- The use of the value “identity” is not satisfactory since it is NOT homogeneous
- During the (current!) annotation effort for the creation of the Italian TimeBank we have observed that this value could be applied to other cases such as:
  - synonyms
  - hypernyms
  - coreference (strict coreference – same referent in the world)
Event Anaphora
- Previous work: Hasler et al. 2006; Bejan & Harabagiu 2008
- Hasler et al. 2006: only NP coreference (strict definition), detailed guidelines, but NO specifications for the annotation
  - which events? ACE event frames (LIFE, CONFLICT, MOVEMENT, JUSTICE…)
  - TimeML compliant
- Bejan & Harabagiu 2008: event coreference as a side effect of event structure
  - event coreference is considered when two predicates express the same predicate, synonyms or hypernyms, and share the same arguments
  - TimeML compliant
Event Anaphora - Methodology (2)
Our approach:
- no event frames or event templates; all instances of events annotated in the Italian TimeBank (TimeML compliant)
- open-domain text/discourse
- coarse-grained, bottom-up approach in the definition of the annotation scheme
- reduced and limited set of guidelines → active discovery of what is needed through annotation and observations from the data
- event anaphora: strict coreference + indirect coreference
Event Anaphora - Annotation scheme (3)
TAGS        ATTRIBUTES
MARKABLE    ID, POS, DEFINITENESS, CLASS
EMPTY       ID
TOPIC       ID
LINK        ID, ANAPHORTYPE, SRC
<MARKABLE> = <EVENT> BUT extended → includes annotation of pronouns and adverbs.
Event Anaphora - Annotation scheme (4)
<EMPTY> = to annotate cases of zero anaphora and ellipsis (frequent in Italian)
<TOPIC> = to annotate entire portions of text; it provides an anchor for those linguistic entities which can refer to a discourse topic
[“Stiamo ancora parlando, come certamente deve essere, e continueremo a consultarci”]j. James Baker, segretario al Tesoro americano, ha commentato [così]j i risultati dell’assemblea. (ISST els019)
[“We are still talking, as we certainly should be, and we will keep consulting each other”]j. James Baker, the American Treasury Secretary, commented [thus]j on the results of the assembly.
<LINK> = marks up an anaphoric relation. The attribute “anaphorType” specifies the type of anaphoric relation; “src” marks the anchor
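The four tags of the scheme can be pictured as a small data model. The Python sketch below is only an illustration under stated assumptions: the class and field names mirror the tags and attributes listed in the slides, but the `anaphor` field (which element a link belongs to), the `"ADV"` part-of-speech label and the `"direct"` type value are hypothetical, since the slides do not give the concrete encoding or the attribute values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass
class Markable:
    """<MARKABLE>: an event mention, extended to pronouns and adverbs."""
    id: str
    text: str
    pos: str
    definiteness: Optional[str] = None
    event_class: Optional[str] = None  # the CLASS attribute


@dataclass
class Empty:
    """<EMPTY>: a zero anaphor or ellipsis site; it has no surface text."""
    id: str


@dataclass
class Topic:
    """<TOPIC>: a whole portion of text that can serve as an anchor."""
    id: str
    text: str


@dataclass
class Link:
    """<LINK>: an anaphoric relation that points to its anchor via src."""
    id: str
    anaphor: str       # assumption: id of the element that carries the link
    anaphor_type: str  # assumption: value names are not given in the slides
    src: str           # id of the anchor


Element = Union[Markable, Empty, Topic]


def resolve(link: Link, elements: Dict[str, Element]) -> Tuple[str, str]:
    """Return (anaphor text, anchor text); <EMPTY> elements yield ''."""
    anaphor = elements[link.anaphor]
    anchor = elements[link.src]
    return getattr(anaphor, "text", ""), getattr(anchor, "text", "")


# The James Baker example: the adverb "così" refers back to the quoted speech.
topic = Topic("t1", "Stiamo ancora parlando, come certamente deve essere, "
                    "e continueremo a consultarci")
adverb = Markable("m1", "così", pos="ADV")
link = Link("l1", anaphor="m1", anaphor_type="direct", src="t1")
elements = {e.id: e for e in (topic, adverb)}

print(resolve(link, elements))
```

Keeping the anchor as an id in `src` rather than as a nested object means the same <TOPIC> can be the anchor of several links.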
Event Anaphora – Results (5)
- Annotation tool: PALinkA (Orasan, 2003)
- 3 annotators / 1,792 tokens
- no K (kappa) scores
- Low agreement on the identification of anaphora, but relatively good agreement on the anchors
- More specific guidelines and information are needed
- Event anaphora is a widespread phenomenon
Lessons Learned and Future Work
- Event anaphora is a widespread phenomenon which must be addressed in separate tasks
  - relations between full event N, V, PP and Adj
  - no pronominal anaphora
- New annotation scheme:
  - 2 tags: <EVENT> and <AnafLink>
  - different attributes for <EVENT>: FACTIVITY, GENERICITY, POLARITY
  - relations between particular events according to the attributes' values
  - reduced types of anaphora (two values: direct vs. indirect)
- Tracking of the participants: how to?
- Event anaphora annotation as a further link in TimeML or as a separate task which can be built upon the TimeML annotation
- New tool: BAT (thanks to Marc Verhagen)
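To make the proposed two-tag scheme more concrete, the following Python sketch builds a small annotated fragment. It is a guess at one possible encoding, not the authors' specification: the attribute names on <AnafLink> (`anaphor`, `anchor`, `type`), the wrapper element `doc`, and all attribute values for FACTIVITY, GENERICITY and POLARITY are hypothetical, as is the choice of "indirect" for the set-subset pair reused from the earlier example.

```python
import xml.etree.ElementTree as ET

# The slides name only two values for the type of anaphoric relation.
ANAPHOR_TYPES = {"direct", "indirect"}


def make_event(parent, eid, text, factivity, genericity, polarity):
    """Add an <EVENT> carrying the three attributes named in the slides."""
    event = ET.SubElement(parent, "EVENT", {
        "id": eid,
        "FACTIVITY": factivity,    # hypothetical value set
        "GENERICITY": genericity,  # hypothetical value set
        "POLARITY": polarity,      # hypothetical value set
    })
    event.text = text
    return event


def make_anaf_link(parent, lid, anaphor_id, anchor_id, anaphor_type):
    """Add an <AnafLink>; attribute names here are assumptions."""
    if anaphor_type not in ANAPHOR_TYPES:
        raise ValueError(f"unknown anaphor type: {anaphor_type!r}")
    return ET.SubElement(parent, "AnafLink", {
        "id": lid,
        "anaphor": anaphor_id,
        "anchor": anchor_id,
        "type": anaphor_type,
    })


doc = ET.Element("doc")  # hypothetical wrapper element
make_event(doc, "e3", "adempimenti", "FACTUAL", "SPECIFIC", "POS")
make_event(doc, "e4", "approvazione", "FACTUAL", "SPECIFIC", "POS")
anaf = make_anaf_link(doc, "a1", "e4", "e3", "indirect")

print(ET.tostring(doc, encoding="unicode"))
```

Validating the type against a closed set reflects the reduction to two values; any check on which attribute values license a relation between two events would sit in `make_anaf_link`, but the slides do not state those rules.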
Lessons Learned and Future Work Example
Thank you!