ANNOTATING EVENT ANAPHORA: A CASE STUDY
Tommaso Caselli and Irina Prodanof
ILC-CNR, Pisa
[email protected] [email protected]
LREC-10 – May 19th, La Valletta, Malta

Outline
- Motivations
- Coreference annotation in TimeML
- Annotating event anaphora: a preliminary scheme
- Annotation methodology and results
- Lessons learned and future work

Motivations
- Eventualities are the building blocks of the informative content of a document
- Eventualities give rise to relations which create a rich informative network:
  - temporal relations
  - sharing of participants
  - factivity
  - coreferential relations
- Coreferential relations among eventualities play an important role in facilitating access to content and extracting relevant information

Coref. in TimeML
- TimeML & ISO-TimeML are standards for the annotation of events, temporal expressions and a set of relations between these entities (temporal, subordinating and aspectual relations)
- Main contribution of TimeML: a standard definition of event and a methodology for its annotation
- It-TimeML: Italian adaptation of TimeML (updated version on request) and part of ISO-TimeML
- It-TimeML is currently used for the creation of the Italian TimeBank (172 news articles from ISST, PAROLE and Web, 67,140 tokens)

Coref. in TimeML (2)
- TimeML tags involved: EVENT and TLINK (temporal link)
- TimeML has no specific link for coreference annotation
- Workaround: use of a special value of the TLINK tag: “identity”
- “identity” is used to:
  - connect two tokens which are part of a single event instance (e.g. light verbs)
  - mark coreferential relations between events, namely set-subset

Coref. in TimeML (3) – Use of “identity”
fare la spesa [to do shopping].
<EVENT id="e1">fare</EVENT> la <EVENT id="e2">spesa</EVENT>
<TLINK lid="l1" eventInstanceID="e1" relatedToEventInstance="e2" relType="IDENTITY"/>

Coref. in TimeML – Use of “identity” (3)
La sessione privata servira’ a tre [adempimenti]j . Innanzitutto, all’ [approvazione]j della proposta di Abete (ISST sole006).
The private session will be used for three [fulfillments]j . First, the [approval]j of the proposal of Abete.
La <EVENT id="e1">sessione</EVENT> privata <EVENT id="e2">servira’</EVENT> a tre <EVENT id="e3">adempimenti</EVENT>. <SIGNAL id="s1">Innanzitutto</SIGNAL>, all’ <EVENT id="e4">approvazione</EVENT> della <EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4" relatedToEventInstance="e3" relType="IDENTITY"/>

Coref. in TimeML (4)
- The use of the value “identity” is not satisfactory, since it is NOT homogeneous
- During the (current!) annotation effort for the creation of the Italian TimeBank we have observed that this value could be applied to other cases, such as:
  - synonyms
  - hypernyms
  - coreference (strict coreference: same referent in the world)

Event Anaphora
Previous work: Hasler et al. 2006; Bejan & Harabagiu 2008
- Hasler et al. 2006: only NP coreference (strict definition), detailed guidelines, but NO specifications for the annotation; which events?
  - ACE event frame (LIFE, CONFLICT, MOVEMENT, JUSTICE….)
  - TimeML compliant
- Bejan & Harabagiu 2008: event coreference as a side effect of event structure. Event coreference is considered when two predicates express the same predicate, synonyms or hypernyms, and share the same arguments
  - TimeML compliant

Event Anaphora - Methodology (2)
Our approach:
- no event frames nor event templates; all instances of events annotated in the Italian TimeBank (TimeML compliant); open-domain text/discourse
- coarse-grained, bottom-up approach to the definition of the annotation scheme
- reduced and limited set of guidelines
- active discovery of what is needed through annotation and observations from the data
- event anaphora: strict coreference + indirect coreference

Event Anaphora - Annotation scheme (3)
TAGS       ATTRIBUTES
MARKABLE   ID, POS, DEFINITENESS, CLASS
EMPTY      ID
TOPIC      ID
LINK       ID, ANAPHORTYPE, SRC
<MARKABLE> = <EVENT>, but extended: it also includes the annotation of pronouns and adverbs.
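
As a rough illustration of the “identity” workaround shown in the TimeML examples, the following Python sketch reads the annotated fragment and lists the event pairs connected by an IDENTITY TLINK. It is only a sketch under stated assumptions: the <TimeML> wrapper element and the helper name identity_links are hypothetical, added so that the fragment parses as well-formed XML, and the apostrophes are written as plain ASCII.

```python
import xml.etree.ElementTree as ET

# Annotated fragment from the "identity" example. The <TimeML> root element
# is an assumption, added only so that the fragment is well-formed XML.
FRAGMENT = """<TimeML>
La <EVENT id="e1">sessione</EVENT> privata <EVENT id="e2">servira'</EVENT> a tre
<EVENT id="e3">adempimenti</EVENT>. <SIGNAL id="s1">Innanzitutto</SIGNAL>, all'
<EVENT id="e4">approvazione</EVENT> della <EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4" relatedToEventInstance="e3" relType="IDENTITY"/>
</TimeML>"""


def identity_links(xml_text):
    """Return (source text, target text) pairs for every IDENTITY TLINK.

    Hypothetical helper, not part of any TimeML tool.
    """
    root = ET.fromstring(xml_text)
    # Map each event id to the token it annotates.
    events = {e.get("id"): e.text for e in root.iter("EVENT")}
    pairs = []
    for tlink in root.iter("TLINK"):
        if tlink.get("relType") == "IDENTITY":
            source = events[tlink.get("eventInstanceID")]
            target = events[tlink.get("relatedToEventInstance")]
            pairs.append((source, target))
    return pairs


print(identity_links(FRAGMENT))  # [('approvazione', 'adempimenti')]
```

The output carries no information about why the two events are linked: a light-verb pair such as fare / spesa and a set-subset pair such as approvazione / adempimenti come out with the same relType, which is the lack of homogeneity noted above.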
Event Anaphora - Annotation scheme (4)
- <EMPTY> = to annotate cases of zero anaphora and ellipsis (frequent in Italian)
- <TOPIC> = to annotate entire portions of text; it provides an anchor for those linguistic entities which can refer to the discourse topic
“[Stiamo ancora parlando, come certamente deve essere, e continueremo a consultarci]”j . James Baker, segretario al Tesoro americano, ha commentato [cosi’]j i risultati dell’assemblea. (ISST els019)
“[We are still speaking, as it should be, and we will keep consulting]”j . James Baker, the American Treasury Secretary, commented [so]j on the results of the assembly.
- <LINK> = marks up an anaphoric relation. The attribute “anaphorType” specifies the type of anaphoric relation; “src” marks the anchor

Event Anaphora – Results (5)
- Annotation tool: PALinkA (Orasan, 2003)
- 3 annotators / 1,792 tokens
- no K scores
- Low agreement on the identification of anaphora, but relatively good agreement on the anchors
- More specific guidelines and information are needed
- Event anaphora is a widespread phenomenon

Lessons Learned and Future Work
- Event anaphora is a widespread phenomenon which must be addressed in separate tasks
- Relations between full event N, V, PP and Adj; no pronominal anaphora
- New annotation scheme:
  - 2 tags: <EVENT> and <AnafLink>
  - different attributes for <EVENT>: FACTIVITY, GENERICITY, POLARITY
  - relations between particular events according to the attributes' values
  - reduced set of anaphor types (two values: direct vs. indirect)
- Tracking of the participants: how to?
- Event anaphora annotation as a further link in TimeML, or as a separate task which can be built upon the TimeML annotation
- New tool: BAT (thanks to Marc Verhagen)

Lessons Learned and Future Work – Example
[two example slides; figures not preserved]

Thank you!