Annotating Attribution Relations
Towards an Italian Discourse Treebank
Silvia Pareti, Irina Prodanof

Outline
• Introduction
• Related work
• Goal and methodology
• Proposed schema
• Some issues
• Pilot annotation
• Attribution figures
• Conclusion and future work

Introduction
ATTRIBUTION in a text is the ascription, to an entity, of the ownership of an attitude towards some linguistic material, i.e. the text itself, a portion of it, or its semantic content.
Example: Fiona says “This afternoon it will rain”.
Recognising attribution relations is fundamental for Information Extraction, (Multi-Perspective) Question Answering, Opinion Mining, etc.
Sources can differ in bias and reliability, and this deeply affects the way we perceive information.

Introduction: why should we identify the source of a portion of text?
[Figure: ODQA pipeline. Stages: question comprehension, finding text fragments with the answer, answer selection, answer generation. Techniques labelled: NLP techniques, Information Retrieval, Language Generation.]
• show only authoritative answers
• collect different opinions, hearsay
• discard second-hand or anonymous information
• retrieve statements from a specific source over a given time span
• …

Introduction: example
“È meglio vaccinarsi per l’influenza ‘suina’?” (Is it better to get the swine flu vaccine?)
• “The vaccine is useless.” Source: orsetta90 (blogger): not authoritative and not verifiable
• “Everyone should get the vaccine.” Source: Novartis (pharmaceuticals industry): authoritative but biased
• “Only people at higher risk of complications from influenza should get the vaccine.” Source: doctors' association

Related work
Opinion holder identification projects:
• Bethard et al. (2004): consider just opinion propositions (source = agent)
• Kim and Hovy (2005): identify all possible opinion holders, agentive and NPs (no pronouns)
• Stoyanov and Cardie (2006): identify NP sources
• Choi et al. (2006): do not consider implicit or multiple sources, and test their system on the OPQA corpus
Opinion recognition has limited coverage and unsatisfactory precision: 60-70%.

Related work
• PDTB (Prasad et al., 2007): assertions, beliefs, facts, eventualities. Attribution of discourse connectives and their arguments only.
• Opinion Corpus (Wiebe, 2002): speech acts; private states (opinions, beliefs, thoughts, feelings, emotions, goals, evaluations and judgements). Attribution considered as an intra-sentential phenomenon.
• GraphBank (Wolf and Gibson, 2005): attribution included as a directed coherence relation (satellite to nucleus). Attribution of discourse segments.

Goal and methodology
Goal: design an additional level of annotation for attribution for the ISST (Italian Syntactic-Semantic Treebank) corpus.
• more complete and independent analysis of attribution
• development of an annotation schema
• pilot annotation of a portion of the ISST
• partial listing of possible attribution cues
• evaluation

Goal and methodology
Phases: ANALYSIS, SCHEMA DEFINITION, TOOL SELECTION, EVALUATION, ANNOTATION
Activities:
• Scope definition
• Identification of characteristics and issues
• Selection of features to be annotated
• Annotation requirement definition
• Design of the schema
• Matching tool characteristics and annotation requirements
• Setting up the tool
• Evaluation of the schema applicability
• Pilot annotation and detection of issues
• Linguistic resource creation and release

Proposed schema: markables
An attribution relation links the following markables:
• SOURCE(S): noun phrase
• CUE: verb, noun, adjective, preposition, prep. group
• CONTENT(S): word, phrase, clause, sentence, entire article
• (SUPPLEMENT): cue modifier, indirect object, source of source, event specification
Further items: adjective, prep. phrase, graphic marker

Proposed schema: features
Attribution type
• assertion (e.g. dire 'say', osservare 'observe', sostenere 'maintain')
• belief (e.g. credere 'believe', pensare 'think', dubitare 'doubt')
• fact (e.g. ricordare 'remember', sapere 'know', sentire 'hear')
• eventuality (e.g. permettere 'allow', proibire 'forbid')
Source type
• writer
• other (e.g. il presidente 'the president', un uomo 'a man', Maria)
• arbitrary (e.g. uno 'one', la gente 'people', tutti 'everyone')
• mixed
Factuality
• factual
• non-factual
Scopal change
• none
• scopal change

Some issues: Source
• Nested attribution
• Multiple sources
• Source of source
• Pronominal and bridging anaphora

Example (Wiebe, 2002:5, with the addition of brackets):
[Sue said {that Mary believes (that Gore won the election)}].
Sources: [writer] {writer, Sue} (writer, Sue, Mary)

Example (ISST re070):
Blinder, secondo voci riferite dal New York Times, sperava di succedere al presidente Greenspan quando a marzo scadrà la sua nomina.
Blinder, according to rumours reported by the New York Times, hoped to succeed president Greenspan when his term expires in March.

Example (cs.morph020), source types Arbitrary and Other:
Tutti, incluse le autorità, conoscono la loro provenienza, ma nessuno dice e fa nulla per prevenire il massacro di capi selvatici.
Everyone, including the authorities, knows their provenance, but no one says or does anything to prevent the massacre of wild animals.

Example (ISST cs071):
(Ø) Ho saputo della squalifica di Garciano da Maurizio Damilano, vi giuro, non pensavo di arrivare primo.
(I) heard of Garciano's disqualification from Maurizio Damilano; I swear, I didn't think I would come first.

Example (ISST period005):
Poi però, tramite la figlia che sta a Santiago, prima limita la portata del colloquio con Gaston Salvatore (“non è stata una vera intervista, solo una conversazione”), poi smentisce.
Afterwards, however, through the daughter who lives in Santiago, first plays down the significance of the conversation with Gaston Salvatore (“it wasn’t a real interview, just a conversation”), then (she) denies it.

Example (ISST els001):
La Fermenta, a sentire l'arabo, è organizzata in modo che oggi consegue un utile pari al 35 per cento del fatturato. Questo il vero traguardo che dovrà nel tempo raggiungere la Pierrel. Ma come? Con tagli di mano d'opera? Nemmeno per sogno, dice El Sayed.
Fermenta, according to the Arab, is organised so that it currently earns a profit equal to 35 per cent of turnover. This is the real goal that Pierrel will have to reach over time. But how? By cutting the workforce? No way, says El Sayed.

Some issues: Cue
• Type definition
• Multimodal cues
• Scopal change

Example (ISST cs030), labelled Eventuality and Assertion:
"Vi daremo le statistiche alla fine", promettono i generali croati.
“We’ll give you the statistics at the end”, promise the Croatian generals.

Cue verbs by attribution type:
• assertion: affermare, sostenere, osservare
• belief: credere, pensare, dubitare
• facts: ricordare, sapere, osservare
• eventualities: permettere, sostenere, desiderare

Example (ISST re095):
Arlacchi sorride: “Pura paranoia politica. Non ho partecipato ai lavori solo a causa di un impegno privato…”.
Arlacchi smiles: “Pure political paranoia. I did not take part in the proceedings only because of a private engagement…”.

Example (ISST cs060):
"Sì - si adombra Matt - Un ruolo interessante: con Tarantino eravamo a buon punto, poi è arrivato Bruce. I suoi film incassano un po' più dei miei, no? Hanno scelto lui” …
“Yes - Matt darkens - An interesting role: with Tarantino we were well on our way, then Bruce arrived. His films gross a bit more than mine, right? They chose him” …

Example (ISST cs090):
Strano destino, quello di Civitavecchia: finire spesso, troppo spesso, sulle pagine dei giornali per eventi misteriosi, oppure per fatti che nessuno vorrebbe accadessero nella sua città.
A strange fate, that of Civitavecchia: ending up often, too often, in the newspapers because of mysterious events, or because of events that no one would want to happen in their own town.
? = tutti vorrebbero non accadessero (everyone would like them not to happen)

Example (ISST re075):
Se c’è, cioè, una maggioranza in Parlamento in grado di affrontare seriamente una fase di riforme anche elettorali, Ø penso che la legislatura possa utilmente proseguire.
That is, if there is a majority in Parliament able to seriously tackle a phase of reforms, including electoral ones, (I) think that the legislature could usefully continue.

Some issues: Content
• Multiple contents
• Discontinuous spans
• Event anaphora

Example (ISST cs063):
(Ø) Ho detto |che ero dalla sua parte| e |che ritenevo giusta la sua protesta|.
(I) said |that I was on his side| and |that I considered his complaint fair|.

Example (PDTB 0003):
"There's no question that some of those workers and managers contracted asbestos-related diseases," said Darrell Phillips, vice president of human resources for Hollingsworth & Vose. "But you have to recognize that these events took place 35 years ago. It has no bearing on our work force today."

Example (ISST cs039):
“…L’umanità deve proclamare uno storico sciopero ad oltranza fino alla distruzione di tutti gli armamenti nucleari.” Le parole registrate di Gheddafi, …
“…Humanity must proclaim a historic open-ended strike until all nuclear armaments are destroyed.” Gheddafi’s recorded words, …

Pilot annotation
Tool requirements:
• Discontinuous text selection
• Nested selection
• Relations
• Multiple sources/contents
• Pre-defined values selection
Tools: GATE, Knowtator, Annotator, MMAX2, Callisto, …
MMAX2:
• Display customizability
• Ease of setting up a scheme
• Ease of annotation
• XML stand-off output
• Reference to word index
MMAX2 files: Base Data (original text), Scheme (annotation schema), Style (display structure), Customization (preferences), Markable (annotation)

Pilot annotation
Subcorpus:
• 50 articles from the ISST
• balanced
• 37,000 word tokens
• 461 attribution relations

Attribution figures
Markables:
• CUE: 461
• SOURCE: 329
• CONTENT: 468
Source type:
• WRITER: 23
• OTHER: 375
• ARBITRARY: 62
• MIXED: 1
Scopal change:
• NONE: 429
• SCOPAL-CHANGE: 7

Attribution figures
[Figure: attribution type and factuality]

Conclusion and future work
Achievements:
• more complete analysis of attribution
• definition of an annotation schema
• identification of issues and possible solutions
• partial listing of possible attribution cues
• annotation of a portion of the ISST corpus
Future work:
• testing inter-annotator agreement for the proposed schema
• redefinition of problematic or underspecified attributes
• annotation of the whole ISST corpus
• expanding the list of attribution cues
• relation between attribution and discourse connectives, anaphora, …

Conclusion and future work
Uses of the ANNOTATED CORPUS:
• Discourse generation
• Research on journalistic discourse
• Testing algorithms for the recognition of attribution
• Development of corpora in other languages
• Training tools for ODQA/MPQA/IE
• Statistical and combinatory analysis
• …

Thank you
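
As an illustration of the proposed schema, the markables and features of one attribution relation could be represented roughly as follows. This is only a minimal sketch: the class and field names (Markable, AttributionRelation, spans, and so on) are hypothetical and are not the MMAX2 format or the authors' implementation. Markables hold lists of token spans so that discontinuous selections are possible, and sources and contents are lists so that multiple sources or contents can be attached to one cue. The source-type counts are the ones reported in the pilot annotation figures.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Sequence, Tuple

# Hypothetical representation: a span is (start, end) over token indices, end exclusive.
Span = Tuple[int, int]


class AttributionType(Enum):
    ASSERTION = "assertion"
    BELIEF = "belief"
    FACT = "fact"
    EVENTUALITY = "eventuality"


class SourceType(Enum):
    WRITER = "writer"
    OTHER = "other"
    ARBITRARY = "arbitrary"
    MIXED = "mixed"


@dataclass
class Markable:
    # More than one span means a discontinuous markable.
    spans: List[Span]

    def text(self, tokens: Sequence[str]) -> str:
        """Join the tokens covered by all spans, in order."""
        return " ".join(tok for start, end in self.spans for tok in tokens[start:end])


@dataclass
class AttributionRelation:
    cue: Markable
    contents: List[Markable]                 # one cue may introduce several contents
    attribution_type: AttributionType
    source_type: SourceType
    factual: bool                            # factual vs non-factual
    scopal_change: bool                      # none vs scopal change
    sources: List[Markable] = field(default_factory=list)      # empty for a null subject
    supplements: List[Markable] = field(default_factory=list)  # optional markables


# The example from the introduction, tokenised on whitespace.
tokens = 'Fiona says " This afternoon it will rain "'.split()

relation = AttributionRelation(
    cue=Markable([(1, 2)]),
    contents=[Markable([(3, 8)])],
    attribution_type=AttributionType.ASSERTION,
    source_type=SourceType.OTHER,
    factual=True,          # assumed value, chosen only for illustration
    scopal_change=False,
    sources=[Markable([(0, 1)])],
)

# Source-type counts from the pilot annotation figures.
source_type_counts = {
    SourceType.WRITER: 23,
    SourceType.OTHER: 375,
    SourceType.ARBITRARY: 62,
    SourceType.MIXED: 1,
}
total_relations = sum(source_type_counts.values())
```

With these figures, the four source-type counts add up to the 461 attribution relations reported for the pilot subcorpus, which is the kind of consistency check such a representation makes easy to run.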