Dealing with Italian Temporal Expressions: the ITA-Chronos System

Matteo Negri
Fondazione Bruno Kessler - IRST, Trento - Italy
[email protected]

EVALITA 2007 - Evaluation of NLP Tools for Italian
Rome - Italy, September 10, 2007

Outline
• Chronos: a multilingual system for temporal expression (TE) recognition/normalization
• System description
• Some examples
• Results at EVALITA 2007

M. Negri EVALITA’07 - 09/10/2007

Chronos
• Multilingual (ITA/ENG) tool for TE recognition and normalization according to the TIMEX2 standard
• Approach
  – Rule-based system
    • ENG-Chronos: 1500 rules
    • ITA-Chronos: 981 rules
  – Six phases: Preprocessing, Detection, Bracketing, Information Gathering, Anchors Selection, Normalization
• ENG-Chronos participated in TERN-04 with good results on the “Recognition+Normalization Task”
  – Ranked 2nd, with 76% TERN-Value (best system: 78%)

ITA-Chronos: System Architecture
[Figure: processing pipeline from Plain Text, through the Intermediate Annotation, to Tagged Text]
• Preprocessing: Tokenization, POS Tagging, Multiwords Recognition
• Detection and Bracketing
  – Detection: Basic Tagging Rules
  – Bracketing: Composition Rules
• Information Gathering: Tagging Rules for SET, Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur, Op, Quant, Val_Ext
• Normalization: Anchors Selection, Attributes Normalization, Dates Normalization

STEP1: Preprocessing
• The first phase of the process performs:
  – Tokenization
  – POS tagging
  – Multiwords recognition
• The preprocessed input text is then passed to the TE detection phase, where around 400 tagging rules are in charge of finding all the TEs it contains.

STEP2: Detection
• Markable expressions are detected by looking for lexical triggers in the input text
  – “anno”, “oggi”, “Venerdì”, “Natale”, “quotidianamente”, “10/09/2007”, “1982”, etc.
• Basic Tagging Rules
  – Regular expressions checking for: word senses, parts of speech, symbols, or words satisfying specific predicates

  PATTERN  t1 t2 t3
           t1 [pos=“E”]            …“E” = preposition
           t2 [pos=“N”]            …“N” = numeral
           t3 [pred=TimeUnit-p]    …TimeUnit-p satisfied by: “secondo”, “minuto”, “ora”, “giorno”, “settimana”, “mese”, etc.
  OUTPUT   <TIMEX2>t1 t2 t3</TIMEX2>

  Tagging rule matching “Fra tre giorni” (“in three days”)

STEP3: Bracketing
• Considers the context surrounding the detected triggers
  – “inizio”, “fine”, “prima”, “dopo”, “fa”, “successivo”, “precedente”, “durante”, “circa”, “almeno”, “3”, “sesto”, etc.
• Composition rules:
  – In charge of handling conflicts between possible multiple taggings (e.g. when a recognized TE contains, overlaps, or is adjacent to one or more detected TEs)

  PATTERN  T-EXP1 T-EXP2
           T-EXP1 [start = n] [end = m]
           T-EXP2 [start = n≤o<m] [end = o<p≤m]
  OUTPUT   T-EXP1
           T-EXP1 [start = n] [end = m]

  Composition rule for handling inclusions

  Example: “Tutta la notte di sabato” (“all Saturday night”)
    Detected TEs: “Tutta la notte”, “la notte”, “la notte di sabato”, “sabato”
    Output: “Tutta la notte di sabato”

STEP4: Information Gathering
• Goal: mine relevant information for normalization
• Considers triggers+context to assign values to
  – TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)
  – TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op, Quant)
• This is done by running separate sets of specialized tagging rules
• This information is stored in the Intermediate Annotation, which is the input to the normalization component
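
The tagging rule of STEP2 and the inclusion rule of STEP3 can be illustrated with a minimal Python sketch. This is not the ITA-Chronos rule engine, whose rule language is not shown in these slides. The names `Token`, `Span`, `detect`, `resolve_inclusions` and the hand-written `TIME_UNITS` set (a stand-in for the TimeUnit-p predicate, with plural forms added by hand) are all assumptions made for illustration. Spans use half-open token offsets, whereas the slide's rule is written with start/end positions n and m.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the TimeUnit-p predicate (singular and plural forms).
TIME_UNITS = {
    "secondo", "secondi", "minuto", "minuti", "ora", "ore",
    "giorno", "giorni", "settimana", "settimane", "mese", "mesi",
}


@dataclass(frozen=True)
class Token:
    text: str
    pos: str  # "E" = preposition, "N" = numeral, as in the slide


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token
    end: int    # index one past the last token (half-open)


def detect(tokens):
    """Apply the single rule [pos=E] [pos=N] [TimeUnit-p] at every position."""
    spans = []
    for i in range(len(tokens) - 2):
        t1, t2, t3 = tokens[i:i + 3]
        if t1.pos == "E" and t2.pos == "N" and t3.text.lower() in TIME_UNITS:
            spans.append(Span(i, i + 3))
    return spans


def resolve_inclusions(spans):
    """Keep only spans that are not contained in another detected span."""
    unique = list(dict.fromkeys(spans))  # drop exact duplicates, keep order
    return [
        s for s in unique
        if not any(o != s and o.start <= s.start and s.end <= o.end
                   for o in unique)
    ]


if __name__ == "__main__":
    # "Fra tre giorni": preposition, numeral, time unit ("S" is an assumed noun tag).
    tokens = [Token("Fra", "E"), Token("tre", "N"), Token("giorni", "S")]
    print(detect(tokens))

    # "Tutta la notte di sabato": candidate taggings over token offsets 0..5.
    candidates = [Span(0, 3), Span(1, 3), Span(1, 5), Span(4, 5), Span(0, 5)]
    print(resolve_inclusions(candidates))
```

Only the inclusion case is covered here; overlapping and adjacent TEs, which the composition rules also handle, would need their own rules.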

Information Gathering: Example
Detected TE: “oltre tre anni dopo” (“more than three years later”)

TIMEX2 attributes
  MOD: “più di”, “circa”, “oltre” …              → MORE_THAN
  SET: “ogni”, “tutti” …
  ANCHOR_DIR: “prima”, “durante”, “dopo” …       → ENDING

TEMPORARY attributes
  type: [T-ABS | T-REL]                          → T-REL
  t-cat: [second, minute, hour, day, …]          → YEAR
  op: [=, +, -]                                  → +
  quant: [n≥0]                                   → 3
  heur: [CR-DATE | PR-DATE]                      → PR-DATE

Intermediate Annotation: Example
adige20041007_id413938

Plain Text:
“…Così il 31 Luglio del 2002, quindi oltre tre anni dopo l’incidente, il giovane venne nuovamente ricoverato e sottoposto ad un intervento che si dimostrerà risolutivo…”
(“…So on 31 July 2002, hence more than three years after the accident, the young man was hospitalized again and underwent an operation that would prove decisive…”)

Intermediate Annotation (after Detection and Bracketing):
…quindi <TIMEX2 MOD=“MORE_THAN” ANCHOR_DIR=“ENDING” type=“T-REL” t-cat=“YEAR” op=“+” quant=“3” heur=“PR-DATE”>oltre tre anni dopo</TIMEX2> l’incidente…

STEP5: Anchors Selection
• Goal: connect each detected T-REL to an appropriate anchor date
  – While the meaning of T-ABSs (“13 Marzo 2005”) is context-independent, T-RELs (“tre anni dopo”) can only be interpreted with respect to a reference TE
• The “heur” attribute is used for this purpose
  – 2 heuristics:
    • CR-DATE: connects a T-REL to the document’s creation date (found at the beginning of the document, or induced from the document’s name, e.g. “adige20041007_…”)
    • PR-DATE: connects a T-REL to the nearest detected TE with a compatible granularity (a “t-cat” with at least the same degree of specificity)
  – Example: for a T-REL with t-cat = “month”, candidate anchor granularities: “month”, “week”, “day”, “century”

STEP6: Dates Normalization
• Goal: fill the VAL attribute of each detected TE
  – T-ABSs: regular expressions considering their superficial form (“1990s” → “199”)
  – T-RELs: rewriting rules considering
    • the anchor (e.g. “2002”)
    • the operator (“OP”) to be applied (e.g. “+”)
    • the quantity (“QUANT”) to be added/subtracted (e.g. “3”)
  – Example: “tre anni dopo”: “2002” “+” “3” → 2005

ITA-Chronos at EVALITA 2007
• Results over the EVALITA-07 test set (27’15’’ computation time, ~50 words/sec)

               Value   Precision   Recall   F-Measure
  Rec.         85.7    95.7        89.8     92.6
  Rec.+Norm.   61.9    68.5        66.3     67.4

• Higher scores on MOD and SET attributes
  – Activated by the presence of triggers that are easy to identify
• Lower scores with ANCHOR_VAL and ANCHOR_DIR
  – Require the analysis of a larger context, e.g. including verb tense

Web Demo
http://www.qallme.itc.it/server/chronos/italian
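
As a recap of STEP5 and STEP6, the following Python sketch walks the example “oltre tre anni dopo” from anchor selection to a normalized value. It is an illustration under stated assumptions, not the system's actual rewriting rules: the `TE` class, the function names, the coarse-to-fine `GRANULARITY` ordering, and the reading of “nearest” as smallest token distance in either direction are all hypothetical. Only year-level arithmetic is handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of t-cat values, from coarse to fine.
GRANULARITY = ["century", "year", "month", "week", "day",
               "hour", "minute", "second"]


@dataclass
class TE:
    position: int                 # token offset in the document
    type: str                     # "T-ABS" or "T-REL"
    t_cat: str                    # granularity, one of GRANULARITY
    val: Optional[str] = None     # normalized value, if already known
    op: Optional[str] = None      # "+", "-" or "="
    quant: Optional[int] = None
    heur: Optional[str] = None    # "CR-DATE" or "PR-DATE"


def creation_date_from_name(doc_name):
    """Induce a creation date from a name such as 'adige20041007_id413938'."""
    m = re.search(r"(\d{4})(\d{2})(\d{2})", doc_name)
    return f"{m.group(1)}-{m.group(2)}-{m.group(3)}" if m else None


def select_anchor(te, others, creation_date):
    """Return the anchor value chosen by the TE's 'heur' attribute."""
    if te.heur == "CR-DATE":
        return creation_date
    # PR-DATE: nearest TE whose granularity is at least as specific.
    needed = GRANULARITY.index(te.t_cat)
    candidates = [o for o in others
                  if o.val is not None
                  and GRANULARITY.index(o.t_cat) >= needed]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda o: abs(o.position - te.position))
    return nearest.val


def normalize_year(te, anchor_val):
    """Apply op and quant to the year part of the anchor value."""
    year = int(anchor_val[:4])
    if te.op == "+":
        year += te.quant
    elif te.op == "-":
        year -= te.quant
    return f"{year:04d}"


if __name__ == "__main__":
    # "il 31 Luglio del 2002" followed by "oltre tre anni dopo"
    anchor = TE(position=2, type="T-ABS", t_cat="day", val="2002-07-31")
    rel = TE(position=8, type="T-REL", t_cat="year",
             op="+", quant=3, heur="PR-DATE")
    cr_date = creation_date_from_name("adige20041007_id413938")
    anchor_val = select_anchor(rel, [anchor], cr_date)
    print(anchor_val, normalize_year(rel, anchor_val))
```

With the PR-DATE heuristic the anchor is the nearby absolute date rather than the document creation date, which is why the example resolves against 2002 and not 2004.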