Dealing with Italian Temporal Expressions: the ITA-Chronos System
Matteo Negri
Fondazione Bruno Kessler - IRST, Trento - Italy
[email protected]
EVALITA 2007 - Evaluation of NLP Tools for Italian
Rome - Italy
September 10, 2007
Outline
• Chronos: a multilingual system for TE recognition/normalization
• System description
• Some examples
• Results at EVALITA 2007
M. Negri
EVALITA’07 - 09/10/2007
Chronos
• Multilingual (ITA/ENG) tool for TE recognition and normalization
according to the TIMEX2 standard
• Approach
– Rule-based system
• ENG-Chronos: 1500 rules
• ITA-Chronos: 981 rules
– Six phases: Preprocessing, Detection, Bracketing, Information Gathering, Anchors Selection, Normalization
• ENG-Chronos participated in TERN-04 with good results on the
“Recognition+Normalization Task”
– Ranked 2nd, with 76% TERN-Value (best system: 78%)
ITA-Chronos: System Architecture
[Figure: system architecture. Plain Text → Detection and Bracketing → Intermediate Annotation → Normalization → Tagged Text.
Components shown: Preprocessing (Tokenization, POS Tagging, Multiwords Recognition); Detection (Basic Tagging Rules); Bracketing (Composition Rules); Information Gathering (Tagging Rules for: SET, Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur, Op, Quant, Val_Ext); Anchors Selection; Attributes Normalization; Dates Normalization.]
STEP1: Preprocessing
• The first phase of the process performs:
– Tokenization
– POS tagging
– Multiwords recognition
• The preprocessed input text is then passed to the TE detection phase,
where around 400 tagging rules are in charge of finding all the TEs it
contains.
STEP2: Detection
• Markable expressions are detected considering the presence of
lexical triggers in the input text
– “anno”, “oggi”, “Venerdì”, “Natale”, “quotidianamente”,
“10/09/2007”, “1982”, etc.
• Basic Tagging Rules
– Regular expressions checking for: word senses, parts of speech,
symbols, or words satisfying specific predicates
PATTERN: t1 t2 t3
    t1 [pos=“E”]           (“E” = preposition)
    t2 [pos=“N”]           (“N” = numeral)
    t3 [pred=TimeUnit-p]   (TimeUnit-p is satisfied by “secondo”, “minuto”, “ora”, “giorno”, “settimana”, “mese”, etc.)
OUTPUT: <TIMEX2>t1 t2 t3</TIMEX2>
Tagging rule matching “Fra tre giorni” (“in three days”)
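The tagging rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual rule engine: the `Token` class, the lemma lookup used for `TimeUnit-p`, the noun tag “S” and the verb tag “V” are all hypothetical; only the “E”/“N” tags, the predicate name, and the output format come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    text: str
    pos: str    # "E" = preposition, "N" = numeral (tags from the slide)
    lemma: str  # assumption: the predicate is checked on lemmas


# Words the slide lists as satisfying TimeUnit-p (the slide's list ends in "etc.").
TIME_UNIT_LEMMAS = {"secondo", "minuto", "ora", "giorno", "settimana", "mese"}


def time_unit_p(tok: Token) -> bool:
    """Hypothetical implementation of the TimeUnit-p predicate."""
    return tok.lemma in TIME_UNIT_LEMMAS


# PATTERN t1 t2 t3: one test per token position.
RULE: List[Callable[[Token], bool]] = [
    lambda t: t.pos == "E",
    lambda t: t.pos == "N",
    time_unit_p,
]


def apply_rule(tokens: List[Token], rule: List[Callable[[Token], bool]]) -> str:
    """Wrap every token window matching the rule in a TIMEX2 tag."""
    out = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + len(rule)]
        if len(window) == len(rule) and all(test(tok) for test, tok in zip(rule, window)):
            out.append("<TIMEX2>" + " ".join(t.text for t in window) + "</TIMEX2>")
            i += len(rule)
        else:
            out.append(tokens[i].text)
            i += 1
    return " ".join(out)


tokens = [
    Token("Fra", "E", "fra"),
    Token("tre", "N", "tre"),
    Token("giorni", "S", "giorno"),       # "S" (noun) is an assumed tag
    Token("partiremo", "V", "partire"),   # "V" (verb) is an assumed tag
]
tagged = apply_rule(tokens, RULE)
print(tagged)
```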
STEP3: Bracketing
• Considers the context surrounding the detected triggers
– “inizio”, “fine”, “prima”, “dopo”, “fa”, “successivo”, “precedente”,
“durante”, “circa”, “almeno”, “3”, “sesto”, etc.
• Composition rules:
– In charge of handling conflicts between possible multiple taggings (e.g.
when a recognized TE contains, overlaps, or is adjacent to one or more
detected TEs)
PATTERN: T-EXP1 T-EXP2
    T-EXP1 [start = n] [end = m]
    T-EXP2 [start = n≤o<m] [end = o<p≤m]
OUTPUT: T-EXP1
    T-EXP1 [start = n] [end = m]
Composition rule for handling inclusions
Example: “Tutta la notte di sabato”
    Candidates: “Tutta la notte”, “la notte”, “la notte di sabato”, “sabato”
    Result: “Tutta la notte di sabato”
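The inclusion rule can be sketched as a filter over candidate spans. This is a minimal sketch, assuming spans are (start, end) token offsets with an exclusive end; the function names are hypothetical, and only the inequality test mirrors the pattern on the slide.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive (assumption)


def includes(outer: Span, inner: Span) -> bool:
    """True if `inner` lies within `outer`: n <= o < m and o < p <= m."""
    n, m = outer
    o, p = inner
    return outer != inner and n <= o < m and o < p <= m


def resolve_inclusions(spans: List[Span]) -> List[Span]:
    """Keep only the spans that are not included in another candidate."""
    unique = sorted(set(spans))
    return [s for s in unique if not any(includes(other, s) for other in unique)]


words = "Tutta la notte di sabato".split()
candidates = [
    (0, 5),  # Tutta la notte di sabato
    (0, 3),  # Tutta la notte
    (1, 3),  # la notte
    (1, 5),  # la notte di sabato
    (4, 5),  # sabato
]
kept = resolve_inclusions(candidates)
kept_text = [" ".join(words[start:end]) for start, end in kept]
print(kept_text)
```

Only the widest span survives here; overlapping or adjacent TEs, which the slide also mentions, would need further rules that this sketch does not cover.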
STEP4: Information gathering
• Goal: mine relevant information for normalization
• Considers triggers+context to assign values to
– TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)
– TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op, Quant)
• This is done by running separate sets of specialized tagging rules
• Such information is stored in the Intermediate Annotation, and input
to the normalization component
Information Gathering: Example
Detected TE: “oltre tre anni dopo” (“more than three years later”)
TIMEX2 attributes
    MOD: “più di”, “circa”, “oltre” …              → MORE_THAN
    SET: “ogni”, “tutti” …
    ANCHOR_DIR: “prima”, “durante”, “dopo” …       → ENDING
TEMPORARY attributes
    type: [T-ABS | T-REL]                          → T-REL
    t-cat: [second, minute, hour, day, …]          → YEAR
    op: [=, +, -]                                  → +
    quant: [n≥0]                                   → 3
    heur: [CR-DATE | PR-DATE]                      → PR-DATE
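The attribute assignment in this example can be sketched as a set of trigger lookups. The sketch below is illustrative only: the dictionaries contain just the triggers needed for “oltre tre anni dopo”, the link between “dopo” and `op` “+” and the rule deriving `type` are assumptions, and `heur` is left out because the slides do not say how it is chosen.

```python
from typing import Dict, Union

# Trigger -> value pairs taken from the worked example on the slide.
MOD_TRIGGERS = {"oltre": "MORE_THAN"}
ANCHOR_DIR_TRIGGERS = {"dopo": "ENDING"}

# Hypothetical lookups, reduced to what this one example needs.
OP_TRIGGERS = {"dopo": "+"}   # assumption: "dopo" also sets the operator
NUMERALS = {"tre": 3}
UNITS = {"anni": "YEAR"}


def gather(te: str) -> Dict[str, Union[str, int]]:
    """Assign TIMEX2 and temporary attributes from the words of a detected TE."""
    attrs: Dict[str, Union[str, int]] = {}
    for word in te.lower().split():
        if word in MOD_TRIGGERS:
            attrs["MOD"] = MOD_TRIGGERS[word]
        if word in ANCHOR_DIR_TRIGGERS:
            attrs["ANCHOR_DIR"] = ANCHOR_DIR_TRIGGERS[word]
        if word in OP_TRIGGERS:
            attrs["op"] = OP_TRIGGERS[word]
        if word in NUMERALS:
            attrs["quant"] = NUMERALS[word]
        if word in UNITS:
            attrs["t-cat"] = UNITS[word]
    # Assumption: a TE carrying an offset operator is treated as relative.
    attrs["type"] = "T-REL" if "op" in attrs else "T-ABS"
    return attrs


attributes = gather("oltre tre anni dopo")
print(attributes)
```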
Intermediate Annotation: Example
Document: adige20041007_id413938
Plain Text:
“…Così il 31 Luglio del 2002, quindi oltre tre anni dopo l’incidente, il giovane venne nuovamente ricoverato e sottoposto ad un intervento che si dimostrerà risolutivo…”
(“…So on 31 July 2002, hence more than three years after the accident, the young man was hospitalized again and underwent an operation that would prove decisive…”)
After Detection and Bracketing, Intermediate Annotation:
…quindi <TIMEX2 MOD=“MORE_THAN” ANCHOR_DIR=“ENDING” type=“T-REL” t-cat=“YEAR” op=“+” quant=“3” heur=“PR-DATE”>oltre tre anni dopo</TIMEX2> l’incidente…
STEP5: Anchors Selection
• Goal: connect each detected T-REL to an appropriate anchor date
– While the meaning of T-ABSs (“13 Marzo 2005”) is context-independent, T-RELs (“tre anni dopo”) can only be interpreted with respect to a reference TE
• The “heur” attribute is used for this purpose
– 2 heuristics:
CR-DATE: connects a T-REL to the document’s creation date (found at the beginning of the doc, or induced from the doc’s name, e.g. “adige20041007_…”)
PR-DATE: connects a T-REL to the nearest detected TE with a compatible granularity (a “t-cat” with at least the same degree of specificity)
Example: for t-cat = “month”, anchors with t-cat “month”, “week” or “day” are compatible; “century” is not
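The two heuristics can be sketched as follows. This is a sketch under assumptions rather than the system's code: the `TE` class, the granularity ordering, and the use of absolute token distance for “nearest” are hypothetical, and the “il secolo scorso” entry is an invented distractor used only to show the granularity check.

```python
from dataclasses import dataclass
from typing import List, Optional

# Granularities ordered from coarse to fine (assumed ordering).
SPECIFICITY = ["century", "year", "month", "week", "day", "hour", "minute", "second"]


@dataclass
class TE:
    text: str
    position: int              # token offset in the document
    t_cat: str
    heur: Optional[str] = None


def compatible(anchor_cat: str, rel_cat: str) -> bool:
    """An anchor is compatible if it is at least as specific as the T-REL."""
    return SPECIFICITY.index(anchor_cat) >= SPECIFICITY.index(rel_cat)


def select_anchor(rel: TE, detected: List[TE], creation_date: TE) -> Optional[TE]:
    """CR-DATE picks the creation date; PR-DATE picks the nearest compatible TE."""
    if rel.heur == "CR-DATE":
        return creation_date
    candidates = [te for te in detected
                  if te is not rel and compatible(te.t_cat, rel.t_cat)]
    if not candidates:
        return None
    # Assumption: "nearest" is measured as absolute token distance.
    return min(candidates, key=lambda te: abs(te.position - rel.position))


creation = TE("2004-10-07", -1, "day")   # induced from "adige20041007_..."
date = TE("31 Luglio del 2002", 2, "day")
rel = TE("oltre tre anni dopo", 7, "year", heur="PR-DATE")
coarse = TE("il secolo scorso", 8, "century")  # hypothetical distractor
anchor = select_anchor(rel, [date, rel, coarse], creation)
print(anchor.text if anchor else None)
```

The distractor is closer to the T-REL than the date, but it is skipped because “century” is less specific than “year”.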
STEP6: Dates Normalization
• Goal: fill the VAL attribute of each detected TE
T-ABSs: regular expressions considering their surface form (“1990s” → “199”)
T-RELs: rewriting rules considering
– the anchor (e.g. “2002”)
– the operator (“OP”) to be applied (e.g. “+”)
– the quantity (“QUANT”) to be added/subtracted (e.g. “3”)
Example: “tre anni dopo”: “2002” “+” “3” → 2005
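Both normalization paths can be sketched with the two examples from the slide. The function names are hypothetical, and the sketch is deliberately restricted to decades and year-level arithmetic; other granularities would need calendar-aware handling that is not shown here.

```python
import re
from typing import Optional


def normalize_decade(te: str) -> Optional[str]:
    """T-ABS by surface form: a decade such as "1990s" yields the VAL "199"."""
    match = re.fullmatch(r"(\d{3})0s", te)
    return match.group(1) if match else None


def normalize_relative_year(anchor_val: str, op: str, quant: int) -> str:
    """T-REL by rewriting: apply OP and QUANT to the year of the anchor."""
    year = int(anchor_val[:4])
    if op == "+":
        return str(year + quant)
    if op == "-":
        return str(year - quant)
    if op == "=":
        return str(year)
    raise ValueError(f"unknown operator: {op!r}")


decade_val = normalize_decade("1990s")
relative_val = normalize_relative_year("2002", "+", 3)  # "tre anni dopo"
print(decade_val, relative_val)
```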
ITA-Chronos at EVALITA 2007
• Results over the EVALITA-07 test set (27’15’’ computation time,
~50 words/sec)
              Value   Precision   Recall   F-Measure
Rec.          85.7    95.7        89.8     92.6
Rec.+Norm.    61.9    68.5        66.3     67.4
• Higher scores on MOD and SET attributes
– Activated by the presence of triggers that are easy to identify
• Lower scores with ANCHOR_VAL and ANCHOR_DIR
– Require the analysis of a larger context, e.g. including verb tense
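As a quick consistency check on the results table, the F-Measure column can be recomputed from Precision and Recall. This assumes F-Measure is the balanced harmonic mean of the two, which the slide does not state explicitly; the recomputed values agree with the reported ones to within 0.1, the difference being attributable to rounding of the published figures.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f_rec = f_measure(95.7, 89.8)        # Rec. row, reported as 92.6
f_rec_norm = f_measure(68.5, 66.3)   # Rec.+Norm. row, reported as 67.4
print(f"{f_rec:.2f} {f_rec_norm:.2f}")
```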
Web Demo
http://www.qallme.itc.it/server/chronos/italian