Annotating Attribution Relations
Towards an Italian Discourse Treebank
Silvia Pareti
Irina Prodanof
Outline
• Introduction
• Related works
• Goal and methodology
• Proposed scheme
• Some issues
• Pilot annotation
• Attribution figures
• Conclusion and future work
Introduction
ATTRIBUTION in a text is the ascription of ownership of an attitude
towards some linguistic material (the text itself, a portion of it,
or its semantic content) to an entity.
Fiona says “This afternoon it will rain”
Recognising attribution relations is fundamental for Information
Extraction, (Multi Perspective) Question Answering, Opinion Mining
etc.
Different sources can differ in bias
and reliability and this deeply
affects the way we perceive
information.
Introduction
Why should we identify the source of a portion of text?
ODQA
• Question comprehension (NLP techniques)
• Finding text fragments with the answer (Information Retrieval)
• Answer selection
• Answer generation (Language Generation)
• visualize only authoritative answers
• collect different opinions, hearsay
• discard second-hand or anonymous information
• retrieve statements from a specific source over a
given time span
•…
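The last use case in the list above, retrieving statements from a specific source over a given time span, could be sketched as follows. This is only an illustration: the `AttributedStatement` record, its field names and the `statements_by_source` helper are hypothetical, and the dates and the second and third statements are invented toy data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class AttributedStatement:
    """Hypothetical record: a piece of content attributed to a source."""
    source: str
    content: str
    stated_on: date


def statements_by_source(
    statements: Iterable[AttributedStatement],
    source: str,
    start: date,
    end: date,
) -> List[AttributedStatement]:
    """Keep statements by `source` with start <= stated_on <= end."""
    return [
        s for s in statements
        if s.source == source and start <= s.stated_on <= end
    ]


# Toy data: only the first content comes from the slides; dates are invented.
corpus = [
    AttributedStatement("Fiona", "This afternoon it will rain", date(2010, 1, 5)),
    AttributedStatement("Fiona", "Tomorrow it will be sunny", date(2010, 3, 1)),
    AttributedStatement("anonymous", "It never rains here", date(2010, 1, 6)),
]

january = statements_by_source(corpus, "Fiona", date(2010, 1, 1), date(2010, 1, 31))
for statement in january:
    print(statement.source, "->", statement.content)
```

Such a filter only becomes possible once source and content have been identified and linked, which is what the attribution annotation is meant to provide.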
Introduction
“È meglio vaccinarsi per l’influenza ‘suina’?”
Is it better to get the swine flu vaccine?
“The vaccine is useless.”
orsetta90, blogger: not an authoritative or verifiable source
“Everyone should get the vaccine.”
Novartis Pharmaceuticals, industry: authoritative but biased
“Only persons having a higher risk of complications from
influenza should get the vaccine.”
Doctors' association
Related works
Opinion holder identification projects
• Bethard et al. (2004): consider only opinion propositions (source = agent)
• Kim and Hovy (2005): identify all possible opinion holders, agentive and NPs (no pronouns)
• Stoyanov and Cardie (2006): identify NP sources
• Choi et al. (2006)
They do not consider implicit or multiple sources and test their system on the OPQA corpus.
Opinion recognition has limited coverage and unsatisfactory precision: 60-70%.
Related works
PDTB (Prasad et al., 2007)
assertions, beliefs, facts, eventualities
Attribution of discourse connectives and their arguments only
Opinion Corpus (Wiebe, 2002)
speech acts
private states: opinions, beliefs, thoughts, feelings, emotions,
goals, evaluations and judgements
Attribution considered as an intra-sentential phenomenon
GraphBank (Wolf and Gibson, 2005)
attribution included as a directed coherence relation (satellite to
nucleus)
Attribution of discourse segments
Goal and methodology
Designing the addition of a level of annotation for attribution to the
ISST (Italian Syntactic - Semantic Treebank) corpus.
• more complete and independent analysis of attribution
• development of an annotation schema
• pilot annotation of a portion of the ISST
• partial listing of possible attribution cues
• evaluation
Goal and methodology
ANALYSIS
• Scope definition
• Identification of characteristics and issues
• Selection of features to be annotated
SCHEMA DEFINITION
• Annotation requirement definition
• Design of the schema
TOOL SELECTION
• Match tool characteristics and annotation requirements
• Setting the tool
EVALUATION
• Evaluation of the schema applicability
• Pilot annotation and detection of issues
ANNOTATION
• Linguistic resource creation and release
Proposed schema
Markables
relation
SOURCE(S): noun phrase, adjective, prep. phrase
CUE: verb, noun, adjective, preposition, prep. group, graphic marker
CONTENT(S): word, phrase, clause, sentence, entire article
(SUPPLEMENT): cue modifier, indirect object, source of source, event specification
Proposed schema
Features
Attribution type
• assertion (e.g. dire, osservare, sostenere)
• belief (e.g. credere, pensare, dubitare)
• fact (e.g. ricordare, sapere, sentire)
• eventuality (e.g. permettere, proibire)
Source type
• writer
• other (e.g. il presidente, un uomo, Maria)
• arbitrary (e.g. uno, la gente, tutti)
• mixed
Factuality
• factual
• non-factual
Scopal change
• none
• scopal change
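The markables and features of the proposed schema can be pictured as a small data structure. The sketch below is one possible encoding, not the format used in the annotation: the class and field names are hypothetical, spans are assumed to be inclusive word-index pairs, and the feature values chosen for the "Fiona says" example are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

Span = Tuple[int, int]  # assumed: inclusive (first, last) word indices


class AttributionType(Enum):
    ASSERTION = "assertion"
    BELIEF = "belief"
    FACT = "fact"
    EVENTUALITY = "eventuality"


class SourceType(Enum):
    WRITER = "writer"
    OTHER = "other"
    ARBITRARY = "arbitrary"
    MIXED = "mixed"


class Factuality(Enum):
    FACTUAL = "factual"
    NON_FACTUAL = "non-factual"


class ScopalChange(Enum):
    NONE = "none"
    SCOPAL_CHANGE = "scopal change"


@dataclass
class AttributionRelation:
    """One relation linking a cue to its content(s) and source(s)."""
    cue: List[Span]
    contents: List[List[Span]]  # each content may be a discontinuous span
    attribution_type: AttributionType
    source_type: SourceType
    factuality: Factuality
    scopal_change: ScopalChange = ScopalChange.NONE
    sources: List[List[Span]] = field(default_factory=list)  # may be empty
    supplements: List[List[Span]] = field(default_factory=list)


def text_of(tokens: List[str], spans: List[Span]) -> str:
    """Join the tokens covered by a (possibly discontinuous) markable."""
    return " ".join(
        tok for first, last in spans for tok in tokens[first:last + 1]
    )


tokens = ["Fiona", "says", "“", "This", "afternoon", "it", "will", "rain", "”"]
relation = AttributionRelation(
    cue=[(1, 1)],
    contents=[[(2, 8)]],
    attribution_type=AttributionType.ASSERTION,
    source_type=SourceType.OTHER,
    factuality=Factuality.FACTUAL,
    sources=[[(0, 0)]],
)
print(text_of(tokens, relation.sources[0]), "|", text_of(tokens, relation.cue))
```

Keeping sources and contents as lists of span lists leaves room for multiple sources, multiple contents and discontinuous spans, all of which come up among the issues discussed in the following slides.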
Some issues
Source
• Nested attribution
• Multiple sources
• Source of source
• Pronominal and bridging anaphora
Some issues
Source
[Sue said {that Mary believes (that Gore won the election)}].
Sources:
[writer]
{writer, Sue}
(writer, Sue, Mary)
(Wiebe, 2002:5 - with the addition of brackets)
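The way nested sources accumulate in the Sue/Mary example can be sketched in a few lines. The function name `nested_sources` is hypothetical; it simply assumes that each level of embedding inherits the sources of the enclosing level and adds its own.

```python
from typing import List


def nested_sources(embedded_sources: List[str]) -> List[List[str]]:
    """Build the source chain for each nesting level.

    `embedded_sources` lists the attributed sources from the outermost
    to the innermost attribution; the writer is always the first source.
    """
    chains = [["writer"]]
    for source in embedded_sources:
        # Each deeper level keeps all enclosing sources and adds one more.
        chains.append(chains[-1] + [source])
    return chains


for chain in nested_sources(["Sue", "Mary"]):
    print(", ".join(chain))
# prints:
# writer
# writer, Sue
# writer, Sue, Mary
```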
Blinder, secondo voci riferite dal New
York Times, sperava di succedere al
presidente Greenspan quando a marzo
scadrà la sua nomina. (ISST re070)
Blinder, according to rumours reported
by the New York Times, hoped to
succeed President Greenspan when
his appointment expires in March.
Some issues
Source
Arbitrary
Other
Tutti, incluse le autorità, conoscono la
loro provenienza, ma nessuno dice e fa
nulla per prevenire il massacro di capi
selvatici. (cs.morph020)
Everyone, including the authorities,
knows their provenance, but no one
says or does anything to prevent the
massacre of wild animals.
Some issues
Source
(Ø) Ho saputo della squalifica di
Garciano da Maurizio Damilano, vi
giuro, non pensavo di arrivare primo.
(ISST cs071)
(I) heard of Garciano's disqualification
from Maurizio Damilano; I swear,
I didn’t think I would come first.
Poi però, tramite la figlia che sta a Santiago,
prima limita la portata del colloquio con
Gaston Salvatore (“non è stata una vera
intervista, solo una conversazione”), poi
smentisce. (ISST period005)
Later, however, through the daughter
who lives in Santiago, (she) first plays down
the importance of the conversation with Gaston
Salvatore (“it wasn’t a real interview, just a
conversation”), then denies it.
Some issues
Source
La Fermenta, a sentire l' arabo, è
organizzata in modo che oggi consegue
un utile pari al 35 per cento del fatturato.
Questo il vero traguardo che dovrà nel
tempo raggiungere la Pierrel. Ma come?
Con tagli di mano d'opera? Nemmeno
per sogno, dice El Sayed. (ISST els001)
Fermenta, according to the Arab, is
organised so that it currently earns a
profit equal to 35 per cent of turnover. This
is the real goal that Pierrel will have to
reach over time. But how?
By cutting the workforce? No way,
says El Sayed.
Some issues
Cue
• Type definition
• Multimodal cues
• Scopal change
Some issues
Cue
"Vi daremo le statistiche alla fine",
promettono i generali croati. (ISST
cs030)
“We’ll give you the statistics at the end”,
promise the Croatian generals.
Eventuality / Assertion

assertion    belief      facts       eventualities
affermare    credere     ricordare   permettere
sostenere    pensare     sapere      sostenere
osservare    dubitare    osservare   desiderare
Some issues
Cue
Arlacchi sorride: “Pura paranoia
politica. Non ho partecipato ai lavori solo
a causa di un impegno privato…”. (ISST
re095)
Arlacchi smiles: “Pure political paranoia.
I didn’t take part in the proceedings only
because of a private engagement…”.
"Sì - si adombra Matt - Un ruolo
interessante: con Tarantino eravamo a
buon punto, poi é arrivato Bruce. I suoi
film incassano un po' più dei miei, no?
Hanno scelto lui” …(ISST cs060)
“Yes - Matt's face darkens - An interesting
role: with Tarantino we were well on the
way, then Bruce arrived. His films gross
a bit more than mine, don't they? They
chose him” …
Some issues
Cue
Strano destino, quello di Civitavecchia:
finire spesso, troppo spesso, sulle
pagine dei giornali per eventi misteriosi,
oppure per fatti che nessuno vorrebbe
accadessero nella sua città. (ISST
cs090)
Strange destiny, that of Civitavecchia:
ending up often, too often, in the news
because of mysterious events, or
because of events that no one would like
to happen in their town.
? = tutti vorrebbero non accadessero (everyone would like them not to happen)
Se c’è, cioè, una maggioranza in Parlamento in grado di affrontare seriamente
una fase di riforme anche elettorali, Ø penso che la legislatura possa utilmente
proseguire. (ISST re075)
If, that is, there is a majority in Parliament able to seriously tackle a phase of
reforms, including electoral ones, (I) think that the legislature could usefully continue.
Some issues
Content
• Multiple contents
• Discontinuous spans
• Event anaphora
Some issues
Content
(Ø) Ho detto |che ero dalla sua parte| e
|che ritenevo giusta la sua protesta|.
(ISST cs063)
(I) said |that I was on his side| and |that I
considered his complaint fair|.
Some issues
Content
"There's no question that some of those workers and
managers contracted asbestos-related diseases,"
said Darrell Phillips, vice president of human
resources for Hollingsworth & Vose.
"But you have to recognize that these events took
place 35 years ago. It has no bearing on our work
force today."
(PDTB 0003)
Some issues
Content
“…L’umanità deve proclamare uno storico sciopero ad
oltranza fino alla distruzione di tutti gli armamenti
nucleari.”
Le parole registrate di Gheddafi, …(ISST cs039)
“…Humanity must proclaim a historic all-out strike until the
destruction of all nuclear weapons.” Gheddafi’s
recorded words,…
Pilot annotation
Tool requirements
• Discontinuous text selection
• Nested selection
• Relations
• Multiple sources/contents
• Pre-defined values selection
• Display customizability
• Ease of setting a scheme
• Ease of annotation
• XML stand-off output
• Reference to word index
Tools
• GATE
• Knowtator
• Annotator
• MMAX2
• Callisto
• …
MMAX2
• Base Data (original text)
• Scheme (annotation schema)
• Style (display structure)
• Customization (preferences)
• Markable (annotation)
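Stand-off output with reference to word indices means that a markable stores pointers into the base data rather than text. As a rough sketch, assuming a span notation of the form `word_3..word_5,word_9` (ranges joined by `..`, discontinuous fragments separated by commas; this notation and the helper name `expand_span` are assumptions to be checked against the MMAX2 documentation), a span could be expanded into word indices like this:

```python
from typing import List


def expand_span(span: str) -> List[int]:
    """Expand an assumed span string into the word indices it covers.

    'word_3..word_5,word_9' -> [3, 4, 5, 9]
    """
    indices: List[int] = []
    for fragment in span.split(","):
        first, _, last = fragment.strip().partition("..")
        start = int(first.rsplit("_", 1)[1])
        # A fragment without '..' covers a single word.
        end = int(last.rsplit("_", 1)[1]) if last else start
        indices.extend(range(start, end + 1))
    return indices


print(expand_span("word_3..word_5,word_9"))  # [3, 4, 5, 9]
```

A comma-separated span of this kind is what would let a single content markable cover a quotation interrupted by its cue, as in the discontinuous span example from the PDTB.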
Subcorpus:
• 50 articles from the ISST
• balanced
• 37,000 word tokens
• 461 attribution relations
Attribution figures
Markables
CUE        461
SOURCE     329
CONTENT    468

Source type
WRITER      23
OTHER      375
ARBITRARY   62
MIXED        1

Scopal change
NONE            429
SCOPAL-CHANGE     7
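The source type figures can be turned into proportions with a few lines of arithmetic. The sketch below reads the figures as ARBITRARY = 62 and MIXED = 1, which is the reading under which the four source types sum to the 461 attribution relations of the subcorpus; the variable names are only illustrative.

```python
# Counts from the pilot annotation (ARBITRARY = 62 and MIXED = 1 assumed).
source_type_counts = {"WRITER": 23, "OTHER": 375, "ARBITRARY": 62, "MIXED": 1}
relations = 461
word_tokens = 37_000
articles = 50

total = sum(source_type_counts.values())

# Share of each source type, as a percentage of all relations.
shares = {name: 100 * count / total for name, count in source_type_counts.items()}

# Density of attribution relations in the subcorpus.
per_thousand_tokens = 1000 * relations / word_tokens
per_article = relations / articles

for name, share in shares.items():
    print(f"{name:<10}{share:5.1f}%")
print(f"relations per 1,000 tokens: {per_thousand_tokens:.1f}")
print(f"relations per article: {per_article:.2f}")
```

On this reading, sources other than the writer account for roughly four fifths of the relations, and the subcorpus contains about nine attribution relations per article.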
Attribution figures
Attribution type and Factuality
Conclusion and future work
Achievements:
• more complete analysis of attribution
• definition of an annotation schema
• identification of issues and possible solutions
• partial listing of possible attribution cues
• annotation of a portion of the ISST corpus
Future work:
• testing inter-annotator agreement for the proposed schema
• redefinition of problematic or underspecified attributes
• annotation of the whole ISST corpus
• expanding the list of attribution cues
• relation between attribution and discourse connectives/ anaphora/ …
Conclusion and future work
ANNOTATED CORPUS
• Discourse generation
• Research on journalistic discourse
• Testing algorithms for the recognition of attribution
• Development of corpora in other languages
• Training tools for ODQA/MPQA/IE
• Statistical and combinatory analysis
• …
Thank you