The Italian CLIPS Lexicon and its reuse in a bilingual environment
Nilda Ruimy
ILC CNR, Pisa
September 2004

Outline
Part I
• The origin of the CLIPS lexicon
• The PAROLE-SIMPLE model
• General encoding criteria
• Phonological and morphological levels
• Syntactic level: information content
• The semantic lexicon
• Theoretical background: GL theory
• The original Qualia Structure
• The SIMPLE ontology
• The Extended Qualia Structure
• Semantic level: information content
• Predicative structure
• Syntax-semantics mapping
• Encoding methodology
• CLIPS essential features & applications
Part II
• Creating a bilingual resource
• The two scenarios
• Scenario I
• Drawbacks
• Scenario II
• The cognate approach
• The sense indicator approach
• Results
• Concluding remarks

CLIPS: a bit of genealogy
• PAROLE European project: PAROLE lexicons, 12 harmonized lexicons
• SIMPLE European project (Semantic Information for Multifunctional Plurilingual Lexica): SIMPLE lexicons, 12 harmonized lexicons
• Core lexicons: morphology: 20,000 entries; syntax: 20,000 lemmas; semantics: 10,000 senses
• Italy: enlargement of these core lexicons in a national follow-up project
• CLIPS lexicon (XML format): phonology: 374,000 entries; morphology: 49,000 entries; syntax: 55,000 lemmas; semantics: 55,000 senses

The PAROLE-SIMPLE Model
PAROLE-SIMPLE / GENELEX-PAROLE: theoretical model and representational model
• EAGLES recommendations
• Extended GENELEX model
• Results from EU projects:
  • EUROWORDNET
  • ACQUILEX
  • DELIS
  • GENERATIVE LEXICON

The Linguistic Model
• Innovative
• Tackles misrepresented areas of knowledge
• Extendible and multifunctional
• Multilingual perspective
PAROLE-SIMPLE lexicons:
• common EAGLES-conformant model
• common representation language
• common building methodology
Result: REUSABILITY

Representational Model (1)
Entity/Relationship Model: implemented through a DTD that defines:
• the structure of every descriptive element
• the relationships holding among the various descriptive
elements as well as their co-occurrence restrictions
Non-redundant data representation

Representational Model (2)
• specific representational structures for every level of linguistic description
• links among the different levels, although the information encoded at each level is perfectly autonomous

General encoding criteria
• Reduce the lexicographer’s margin of subjectivity by setting precise guidelines for the treatment of particular phenomena
• Base the encoding on corpus data as much as possible
• Find a balance between encoding attested structures / senses only and an exhaustive encoding that also includes rare structures / senses

Splitting entries
• Avoid both redundancy and over-powerful gatherings
• Use criteria strictly relevant to the description level, e.g. at the syntactic level, syntax-driven criteria:
  • arity
  • syntactic function: disporre i libri negli scaffali / disporre di due auto
  • complement optionality: attraversare (la strada) (lit. sense) / attraversare un momento difficile
  • different (non-alternative) realization of complements: Leo evita Lia / L. ha evitato di guardare L., che L. si ferisse
• Encode, at the semantic level, the most common senses distinguished in average-size dictionaries (ca. 150,000 words)

The four-level architecture: the first three levels
• Phonological Unit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• Morphological Unit: PoS & subcat., inflectional paradigm
• Corresp. MrphU-SynU
• Syntactic Unit: syntactic structure 1 and syntactic structure 2, each with (a) head properties and (b) a subcat. frame whose positions carry synt. restr.; a Frameset relates the two structures

Syntactic entry information content
Aumentare: Il governo ha aumentato i prezzi del 3%. I prezzi sono aumentati del 3%
‘to increase: The government has increased the prices by 3%.
Prices have increased by 3%’
• Specific properties of the entry in the syntactic context described: main verb; aux.: avere
• Subcategorization frame (complex synt. entry):
  • MAIN syntactic frame: P0 optional subject NP; P1 oblig. object NP; P2 optional adverbial di_PP
  • RELATED syntactic frame: P0 optional subject NP; P1 optional adverbial di_PP
• Link between syntactic structures: FRAMESET relating systematic frame alternations (decausativization, locative alternation, reciprocal altern., symmetrical altern.):
  • relates main syntactic frame to alternating one
  • relates respective frame positions

The semantic lexicon
Theoretical linguistic background: extended version of Pustejovsky’s Generative Lexicon (GL) theory

Generative Lexicon theory
Lexical meanings of various levels of complexity:
• bambino: HUMAN, age (childhood), sex (male)
• dottore: HUMAN, age (adult), sex (male), function
• giornale (polysemy): 1. printed paper, 2. location, 3. institution, 4. human group
Simplest ones: definable by a taxonomic relation
More complex ones: hypernymic relation not sufficient
Qualia Structure allows:
• to coherently model the pluridimensionality of meaning
• to capture the relationships holding between semantic units
• to represent uniformly semantic units of different degrees of complexity

The Original Qualia structure
Consists of four roles:
• formal role: distinguishes the denoted entity from others
• constitutive role: expresses its components
• agentive role: expresses its coming about
• telic role: specifies its function
Qualia:
  formal       = what is X?
  constitutive = what is X made of?
  agentive     = how does X come about?
  telic        = what is X’s function?
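The four qualia roles and the question each one answers can be sketched as a small data structure. This is only an illustration: the class name `QualiaStructure`, the helper `describe`, and the sample values for "violin" are assumptions made for the example, not part of the CLIPS or SIMPLE specifications.

```python
from dataclasses import dataclass, field

# The four questions, worded as on the slide.
QUALIA_QUESTIONS = {
    "formal": "what is X?",
    "constitutive": "what is X made of?",
    "agentive": "how does X come about?",
    "telic": "what is X's function?",
}


@dataclass
class QualiaStructure:
    """Hypothetical container: one list of values per qualia role."""
    formal: list[str] = field(default_factory=list)
    constitutive: list[str] = field(default_factory=list)
    agentive: list[str] = field(default_factory=list)
    telic: list[str] = field(default_factory=list)

    def filled_roles(self) -> list[str]:
        """Return the names of the roles that carry at least one value."""
        return [role for role in QUALIA_QUESTIONS if getattr(self, role)]


def describe(word: str, qualia: QualiaStructure) -> list[str]:
    """Pair each filled role with the question it answers for `word`."""
    lines = []
    for role in qualia.filled_roles():
        question = QUALIA_QUESTIONS[role].replace("X", word)
        values = ", ".join(getattr(qualia, role))
        lines.append(f"{role}: {question} -> {values}")
    return lines


# Illustrative values only; they are not taken from the CLIPS database.
violin = QualiaStructure(formal=["instrument"], telic=["playing"])
for line in describe("violin", violin):
    print(line)
```

Leaving a role empty is allowed on purpose: simple types need little more than the formal role, while more complex meanings fill several roles.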
The SIMPLE ontology (1)
Lexicon structured on the basis of a type ontology:
• Core Ontology: top level, general types; large consensus; provide essential information; mappable on EuroWordNet ontology
• Recommended Ontology: hierarchically lower and more specific types; provide finer-grained information
• Possible creation of language / application specific types

The SIMPLE ontology (2)
157 language-independent semantic types
• simple types (one-dimensional): can be fully characterized in terms of a hypernymic relation, e.g. Entity > Concrete_entity > Living_entity > Animal > Earth_Animal

The SIMPLE ontology (3)
• unified types (multi-dimensional): can only be defined through the combination of:
  • the relation to their supertype
  • the reference to orthogonal dimensions of meaning
  e.g. Institution: supertype chain Entity > Abstract_Entity; orthogonal dimensions Agentive and Telic

The SIMPLE ontology (4)
SIMPLE ontology: multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations

Semantic types
In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information

Some semantic types for abstract & concrete entities
Figure (type hierarchy). Top-level nodes under TOP: TELIC, AGENTIVE, CONSTITUTIVE, ENTITY. Under ENTITY: Concrete_entity, Abstract_entity, Property, Representation, Event. Further type labels shown: Living_entity, Human, Animal, Vegetal_entity, Artifact, Substance, Location, Food, Material, Artifactual_material, Furniture, Instrument, Clothing, Artwork, Sign, Language, Information, Quality, Quantity, Physical_prop, Psychol_prop, Convention, Cognitive_fact.

Some semantic types for events
Figure (type hierarchy under EVENT). Type labels shown: Phenomenon, Aspectual, State, Act, Relational_state, Non_relational_act, Relational_act, Cause_change, Psych_event, Change, Relational_change, Move, Change_possession, Cause_act, Speech_act,
Creation, Acquire_knowledge, Natural_transition, Change_location

Some semantic types for adjectives
Figure (type hierarchy). Under TOP: Intensional, Extensional. Further type labels shown: Temporal, Modal, Emotive, Emphasizer, Manner, Object_related, Psychological_prop, Relational_prop, Social_prop, Physical_prop, Intensifying_prop, Temporal_prop.

Descriptive elements
• Features: PlusHuman, PlusCollective, ...
• Relations between semantic units: R (<SemU1>, <SemU2>)

Extended Qualia Structure: relations
• Formal: isa, antonym_comp, antonym_grad, mult_opposition
• Constitutive (subgroups CONSTITUTIVE, PROPERTY, LOCATION): made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling
  • LOCATION: is_in, lives_in, typical_location
• Agentive:
  • AGENTIVE: result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source
  • ARTIFACTUAL AGENTIVE: created_by, derived_from
• Telic:
  • INSTRUMENTAL: used_for, used_as, used_by, used_against
  • TELIC: indirect_telic, purpose
  • ACTIVITY: is_the_activity_of, is_the_ability_of, is_the_habit_of
  • DIRECT TELIC: object_of_activity

Example word pairs overlaid on the same relation table:
• disgusto, provare (disgust, feel)
• casa, costruire (house, build)
• mohair, capra (mohair, goat)
• proiettile, colpire (projectile, hit)
• metano, combustibile (methane, fuel)
• bisturi, chirurgo (lancet, surgeon)
• medico, curare (doctor, cure)
• antitarmico, tarma (moth balls, moth)
• fumatore, fumare (smoker, smoke)
• pane, farina (bread, flour)
• senato, senatore (senate, senator)
• manubrio, bicicletta (handlebar, bicycle)
• arancio, arancia (orange tree, orange)
• abbaiare, cane (bark, dog)

Orthogonal dimensions of meaning
Figure: violin placed along orthogonal dimensions of meaning. Labels shown: Formal role (instrument), has_as_part, is_made_of, used_for (playing), Agentive role.

Meaning dimensions expressed by Qualia relations
botte ‘barrel’, traditional dictionary definition: recipiente di legno fatto di doghe arcuate tenute unite da cerchi di ferro che serve per la conservazione e il trasporto di liquidi, specialmente vino (wooden container made of curved staves held together by iron hoops, used for storing and transporting liquids, especially wine)
• recipiente: Formal: isa
• di legno: Constitutive: made_of
• fatto: Agentive: created_by
• di doghe arcuate tenute unite da cerchi di ferro: Constitutive: made_of
• che serve per la conservazione e il trasporto: Telic: used_for
• di liquidi, specialmente vino: Constitutive: contains

Qualia informative power (1)
Within a semantic type population, further clusterings can be made through the is-a relation.
Figure: is-a chains within CONCRETE_ENTITY > ARTIFACT > INSTRUMENT. Words shown: manufatto, arnese, attrezzo, utensile, strumento, macchina, apparecchio, dispositivo, giogo, spalliera, graticola, piano, aratro, citofono, laser.

Qualia informative power (2)
Figure: is-a and used_for links among INSTRUMENT and CONTAINER words. Words shown: utensile, graticola, colabrodo, frusta, posata, forchetta, coltello (used for mangiare); contenitore, pentola, tegame, padella (used for cucinare).

Semantic
level: information content
The four levels of description and their links:
• Phonological Unit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• Morphological Unit: PoS & subcat., inflectional paradigm
• Corresp. MrphU-SynU
• Syntactic Unit: syntactic structure 1 and syntactic structure 2, each with (a) head properties and (b) a subcat. frame whose positions carry synt. restr.; Frameset
• Corresp. SynU-SemU
• Semantic Unit:
  • ontological type, event type, semant. features, semant. relations
  • semant. class, domain, derivation, synonymy
  • Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
  • regular polysemy
  • predicative represent.: predicate, arguments (sem. restr.), type of link

Predicative Representation
Describes the semantic scenario a word sense is involved in
Assigned to predicative semantic units:
• assignment of a lexical predicate
• type of link holding between entry and predicate
• predicate argument structure
• semantic role of arguments
• selection restrictions of arguments
• link semantic arguments / syntactic complements

Assignment of a lexical predicate
• verbs
• predicative nouns: deverbals (costruzione) and collective simple nouns (gruppo), nouns denoting a relation (madre), quantity (bottiglia), part (fetta), unit of measurement (metro), property (bellezza)
• adjectives
• some adverbs (indipendentemente da)

Predicate-semantic unit link
PRED_ACCUSARE is linked to:
• accusare ‘to accuse’: master
• accusa ‘accusation’: process nominalisation
• accusato ‘accused’: patient nominalisation
• accusatore ‘accuser’: agent nominalisation

Semantic arguments: thematic roles
• ProtoAgent: volitional subject of verb: ARG0 of kill
• ProtoPatient: object undergoing an action: ARG1 of kill
• 2ndParticipant: indirect object: ARG2 of give
• SoA (State of Affairs): sentential complement: ARG2 of ask
• Location: ARG2 of put
• Direction: ARG2 of move
• Origin: ARG1 of move
• Kinship: ARG0 of father
• HeadQuantified: ARG0 of metre, bottle
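The predicate-to-semantic-unit links described above can be modelled as simple records grouped under their shared predicate. The sketch below uses the PRED_ACCUSARE family from the slide; the class `PredicateLink` and the function `units_by_predicate` are hypothetical names chosen for illustration and do not reflect the actual CLIPS DTD.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateLink:
    """Hypothetical record: how one semantic unit relates to a lexical predicate."""
    semu: str        # semantic unit (word sense)
    predicate: str   # lexical predicate assigned to it
    link_type: str   # type of link between entry and predicate


# Links as shown on the PRED_ACCUSARE slide.
LINKS = [
    PredicateLink("accusare", "PRED_ACCUSARE", "master"),
    PredicateLink("accusa", "PRED_ACCUSARE", "process nominalisation"),
    PredicateLink("accusato", "PRED_ACCUSARE", "patient nominalisation"),
    PredicateLink("accusatore", "PRED_ACCUSARE", "agent nominalisation"),
]


def units_by_predicate(links):
    """Group semantic units under the predicate they share, keyed by link type."""
    grouped = defaultdict(dict)
    for link in links:
        grouped[link.predicate][link.link_type] = link.semu
    return dict(grouped)


family = units_by_predicate(LINKS)["PRED_ACCUSARE"]
print(family["master"])
print(family["agent nominalisation"])
```

Keying each family by link type makes it easy to move from a verb to its nominalisations, which is the kind of navigation the shared predicate is meant to support.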
Semantic arguments: selectional restrictions
Not proper restrictions, but rather preferences of combinations in prototypical situations.
Expressible through:
• semantic types
• features
• notions (combination of types or type + feature…)
• semantic units
Features, used transversely across semantic types (e.g. PlusEdible), make it possible to capture wider preferences than single semantic types do:
ARG1 eat: [PlusEdible] / ARG1 eat: [FOOD]

Semantic entry information content (1)
Aumento: L’aumento dei prezzi da parte del governo
‘increase: the increase of prices by the government’
ONTOLOGICAL INFO.
• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Eventype: transition
• Domain: general, economics
• Gloss: accrescimento in dimensione o quantità (increase in size or quantity)
EXTENDED QUALIA INFO.
• aumento isa cambiamento
• aumento resulting_state maggiore
• Agentivecause: yes
• Direction: up
• Morphological derivation: Eventverb aumentare
PREDICATIVE REPRESENTATION
• Lexical semantic predicate: PRED_aumentare
• Type of link: event nominalization
• Predicate arg. struct.: range, semantic role & selectional restrictions of args.:
  • Arg0: Protoagent; Human / Institution
  • Arg1: ProtoPatient; Entity
  • Arg2: Quantifier; Amount

Semantic entry information content (2)
vaporizzatore: spruzzare acqua con un vaporizzatore
‘spray: to spray water with a spray’
ONTOLOGICAL INFO.
• Semantic type: Instrument
• Supertype: Artifact
• Eventype: ===
• Domain: general, cleaning, gardening, cosmetics
• Gloss: apparecchio usato per ridurre in minuscole particelle un liquido (device used to reduce a liquid to tiny particles)
EXTENDED QUALIA INFO.
• vaporizzatore isa apparecchio
• vaporizzatore has_as_part pulsante
• vaporizzatore created_by fabbricare
• vaporizzatore used_for atomizzare
• Synonymy: nebulizzatore
• Morphological derivation: Eventverb vaporizzare
PREDICATIVE REPRESENTATION
• Lexical semantic predicate: PRED_vaporizzare
• Type of link: instrument nominalization
• Predicate arg.
struct.: range, semantic role & selectional restrictions of args.:
  • Arg0: Protoagent; Human / Instrument
  • Arg1: ProtoPatient; +liquid
  • Arg2: Location; Concrete_entity

Syntax-semantics mapping (1)
Figure: the Syntactic Unit (syntactic structure 1 and 2, each with head properties and a subcat. frame whose positions carry synt. restr.; Frameset) is linked through Corresp. SynU-SemU (Corresp. Syntax-Semantics) to the Semantic Unit (ontological type, event type, semant. features, semant. relations; semant. class, domain, derivation, synonymy; Extended Qualia Structure with formal, constitutive, agentive and telic roles; regular polysemy; predicative represent. with predicate, arguments, sem. restr. and type of link).

Syntax-semantics mapping (2)
SYNTACTIC LEVEL
• SynU_migliorare ‘to improve’
  • Transitive structure: P0, P1
  • Intransitive structure: P0
  • Frameset
CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME: isomorphic / non-isomorphic
SEMANTIC LEVEL
• SemU1_migliorare: CAUSE_CHANGE_OF_STATE
• SemU2_migliorare: CHANGE_OF_STATE
LINK PREDICATE-SEMANTIC UNIT
SEMANTIC PREDICATE
• PRED_migliorare: ARG0: Agent; ARG1: Patient

Template-driven encoding methodology
• a template is a schema providing, for each semantic type, a set of structured information that is deemed crucial to its definition
• twofold function:
  • interface between ontology and lexicon
  • guide for the lexicographer
• ensures systematicity, consistency and uniformity of representation of the lexical meaning

A template
Fields: SemU; SynU; BC number; Template_Type; Unification_path; Domain; Semantic Class; Gloss; Predicative_Repr.;
Selectional Restr.; Derivation; Formal; Agentive; Constitutive; Telic; Synonymy; Regular Polysemy
Field contents (template for the type [Container]):
  SemU:               SemU identifier
  SynU:               Identifier of the SynU the SemU is related to
  BC number:          Number of the corresponding ItalWordNet base concept
  Template_Type:      [Container]
  Unification_path:   [Concrete_entity | ArtifactAgentive | Telic]
  Domain:             General
  Semantic Class:     Link to the LexiQuest (or any other ontology)
  Gloss:              Lexicographic gloss
  Predicative_Repr.:  Predicate associated to the SemU and its argument structure [container(arg0)]
  Selectional Restr.: Selectional restrictions (Arg0-HeadQuantifier-Substance)
  Derivation:         Derivational relations between SemUs
  Formal:             isa (1, <container> or <hyperonym>)
  Agentive:           created_by (1, <Usem>: [CREATION]) //definitorial//
  Constitutive:       made_of (1, <Usem>) //optional//
                      has_as_part (1, <Usem>) //optional//
                      contains (1, <Usem>)
  Telic:              used_for (1, <contain>) //definitorial//
                      used_for (1, <measure>) //optional//
  Synonymy:           Synonyms of the SemU //optional//
  Regular Polysemy:   [Amount] [Container]

CLIPS’ key features
• The largest electronic, multilevel lexical resource of the Italian language
  • 55,000 words encoded
  • 4 description levels: phonology, morphology, syntax, semantics
• Based on a rich and multifunctional linguistic and representational model shared by 11 other European lexica
• Lexical description conformant to international standards
• Respect of the principles of uniformity, consistency and exhaustivity
• Generic lexicon, large coverage (vocabulary and synt. structures)
• Fine-grained information, highly structured, innovative, most useful for HLT applications
• High level of reusability

Application fields
• surface and deep analysis of texts
• information retrieval
• machine translation
• natural language understanding, etc.
The wealth of information the lexicon contains allows:
• building semantic networks
• extracting the vocabulary of a specific domain
• NP recognition: disambiguating the semantic contribution of some PPs in complex nominals

To lend itself to further uses, a lexicon must have:
• flexible model
• generic database
• uniformly structured data
• precise and explicit linguistic description
Like the PAROLE and SIMPLE lexicons, CLIPS meets these requirements

Creating a bilingual electronic lexical resource
Strategy I:
1) Use CLIPS and the PAROLE-SIMPLE French lexicon
2) Perform a semi-automatic linking of their respective entries

Creating a bilingual electronic lexical resource
Strategy II:
1) Derive, in a semi-automatic way, a semantically annotated French lexicon from CLIPS
2) Use source and derived lexicons as a basis for building a bilingual resource

Strategy I
Figure: an IT-FR & FR-IT bilingual dictionary feeds an ALGORITHM that links CLIPS entries to entries of the PAR-SIMPLE French lex.
• CLIPS word list shown: capo, ufficio, gentile, residenza, tessere, pompa, scrivere, tessuto, vestibolo, testo, amministratore, vincere
• Dictionary entries shown: capo: tête, chef, bout; ufficio: bureau, charge; tête: testa, capo, faccia, cima; bureau: ufficio, scrivania
• CLIPS entries shown: capo_1 (phon, morph, syn, sem), capo_2, ufficio_1
• French entries shown: tête_1 (morph, syn, sem), tête_2, tête_3, bureau_1

Analysis of the inherent properties of the SL & TL senses:
• identity of ontological classification or subsumption relation between the semantic type of the SL & TL senses
• identity of semantic class or subsumption relation between their semantic class
• identity of domain or subsumption relation between their domain info.
• identity / correspondence of semantic features
• identity / correspondence of semantic relations
Analysis of their contextual properties:
• compatibility of syntactic valency
  • function and grammatical instantiation of complements
• compatibility of semantic valency
  • semantic role and semantic restrictions of arguments
cf. Villegas et al. LREC 2000, Athens

Examples of SL / TL entry comparison

evento / évènement
• evento: freedefinition="cio' che e' accaduto o potra' accadere, avvenimento" (what has happened or may happen, occurrence); Semantic type: EVENT; Supertype: ENTITY; Semantic class: EVENT
• évènement: freedefinition="something that happens at a given place and time"; Semantic type: EVENT; Supertype: ----; Semantic class: EVENT

scrivere / écrire
• scrivere: freedefinition="creare qualcosa di scritto" (to create something written); Semantic type: SYMBOLIC_CREATION; Supertype: CREATION; Semantic class: CREATION; Domain: CREATIVE_WRITING
• écrire: freedefinition="create written works & semi"; Semantic type: CREATION; Supertype: ----; Semantic class: CREATION; Domain: ----

pompa / pompe
• pompa: freedefinition="macchina o apparecchio usato per sollevare liquidi o comprimere gas" (machine or device used to lift liquids or compress gases); Semantic type: INSTRUMENT; UnificationPath: ConcreteEntityArtifactagentive-Materialtelic; Semantic class: APPARATUS
• pompe: freedefinition="a device that moves fluid or gas by pressure or suction"; Semantic type: ----; UnificationPath: ----; Semantic class: APPARATUS

testo / texte
• testo_1: Semantic type: INFORMATION; Supertype: REPRESENTATION; Semantic class: ABSTRACT; Domain: MEDIA; Distinctive feature: PLUS_SEMIOTIC
• testo_2: Semantic type: SEMIOTIC_ARTIFACT; UnificationPath: ConcreteEntityArtifactagentive-Telic; Semantic class: ARTIFACT; Domain: MEDIA; Distinctive feature: PLUS_SEMIOTIC
• texte: Semantic type: RELATIONAL_ACT; Supertype: ----; Semantic class: OBJECT; Domain: ----; Distinctive feature: PLUS_SEMIOTIC

vincere / vaincre
• vincere: freedefinition="portare a termine con successo" (to bring to a successful conclusion); Semantic type: RELATIONAL_ACT; Semantic class: ACTIVITY; Rel.Sem: ----; PREDICATE_vincere_1
• vaincre: freedef.="be the winner in contest/competition"; Semantic type:
CAUSE_RELAT.-CHANGE; Semantic class: CHANGE; Rel.Sem: Resulting_action/state: victoire, Agentive_cause: cause; PREDICATE_vaincre_2

Drawbacks of this strategy
• Discrepancy of lexical coverage between the lexicons => method applicable to 10,000 senses only
• SIMPLE-FR does not always encode all information => necessity of manual intervention wherever SL and TL entries have NO corresponding element due to:
  • lack of information
  • encoding error
  • having privileged different although complementary aspects of meaning, e.g. imprigionare: PURPOSE_ACT vs. emprisonner: CAUSE_RELATIONAL_CHANGE

Strategy II – Phase 1: Deriving a FR lexicon from CLIPS
• Feasibility study for deriving a semantically annotated French lexicon using CLIPS lexical knowledge
• Crucial step for deriving the French entries: correctly pair off each FR word sense with the relevant CLIPS semantic unit whose information we ultimately want to assign to the French entry

From CLIPS to a semantically annotated French lexicon:
• cognate approach: exploits the cognateness of Italian and French endings to relate the FR word to the IT CLIPS entry and infer the FR entry
  villaggio: 1. (piccolo centro abitato) village 2. (complesso urbanistico) village
• sense indicator approach: matches onto the CLIPS data the information provided in bilingual dictionaries by sense indicators, in order to identify the relevant CLIPS entry
  capo: 1. (testa) tête; 2. (persona che...) chef ...

The cognate approach
P. Bouillon, B. Cartoni, TIM/ISSCO, ETI, Geneva
IT-FR bilingual dict.:
villaggio: 1. (piccolo centro abitato) village 2.
(complesso urbanistico) village
Condition: a unique French constructed word translates all IT senses
IT–CLIPS:
<SemU id="USem4123villaggio" naming="villaggio" weightvalsemfeaturel=«Geopolitical_Location» […]</SemU>
<SemU id="USemD63504villaggio" naming="villaggio" weightvalsemfeaturel=«Human_group» […]</SemU>
FR–LEX:
<SemU id="USem0001village" naming="village" weightvalsemfeaturel=«Geopolitical_Location» […]</SemU>
<SemU id="USem0002village" naming="village" weightvalsemfeaturel=«Human_group» […]</SemU>

The sense indicator approach
N. Ruimy, ILC-CNR, Pisa
Triples extracted from the bilingual dictionary, followed by analysis & classification of sense indicators:
  IT word     SENSE INDICATOR          FR word
  capo        (persona che…)           chef
  capo        (testa)                  tête
  aspirare    tr. (con un tubo)        aspirer
  aspirare    LING.                    aspirer
  aspirare    tr. (inalare)            aspirer
  aspirare    intr. (avere) prep. a    aspirer à
  avvertire   (avvisare)               prévenir
  avvertire   (percepire)              sentir
  asfalto     (per rivestire)          asphalte
  compagnia   (gruppo)                 compagnie
  compagnia   (presenza)               compagnie
  …

Types of sense indicators (1)
Atkins, Bouillon, 2003
Indicators conveying morphosyntactic information: verb subclass, auxiliary selection, plural form of nouns, typical subject / object, PP type, etc.
COVARE (Italian–French)
A. v.tr.
  1 (di uccelli) [dar calore col proprio corpo alle uova per sviluppare l’embrione] couver
  2 (fig.) [custodire con gelosia] couver
  3 (fig.) [nutrire, alimentare in segreto dentro di sé] nourrir, mijoter; [tramare, macchinare in segreto] couver; [incubare] couver: covare un malanno
B. v.intr. (aus. avere) (fig.) [stare chiuso, nascosto] couver: il fuoco cova sotto la cenere
Callouts on the slide: verbal class (v.tr., v.intr.), typical subj. (di uccelli), auxiliary (aus. avere)

Types of sense indicators (2)
Indicators conveying inferential information:
• synonyms, hypernyms, meronyms
• domain of use
CAPO (Italian–French)
I (persone) 1 [testa] tête 2 (fig.)
[mente, intelligenza] tête 3 [persona investita di comando, di potere] chef
II (animali) 1 (raro) -> testa 2 spec. al plur [ciascun individuo di una specie determinata] têtes, pièces
III (cose) 1 [la parte più grossa e più sporgente di un oggetto] tête 2 [la parte più alta] haut 3 [ciascuna delle due estremità di qlco.] bout, tête 4 [inizio, principio] début 5 [fine, conclusione; sbocco] bout 6 loc. ….. 7 (nei filati) fil 8 [singolo oggetto appartenente ad una serie] pièce 9 (geog.) cap
Callouts on the slide: synonym, hypernym, domain of use

Sense indicators used as search keys for identifying, in CLIPS, the semantic entry relevant to the IT sense of the bilingual pair:
  IT word       SENSE INDICATOR          FR word
  capo          (persona che…)           chef
  capo          (testa)                  tête
  aspirare      tr. (con un tubo)        aspirer
  aspirare      LING.                    aspirer
  aspirare      tr. (inalare)            aspirer
  aspirare      intr. (avere) prep. a    aspirer à
  avvertire     (avvisare)               prévenir
  avvertire     (percepire)              sentir
  asfalto       (per rivestire)          asphalte
  gioielleria   (negozio)                bijouterie
  gioielleria   (arte)                   bijouterie
  …

Using sense indicators
• indicators usable straightforwardly
• indicators to be converted into the descriptive language of CLIPS:
  • analizzatore (chi effettua analisi) analyste ‘who performs analyses’: sem. type of analizzatore belongs to HUMAN hierarchy
  • illuminare (rendere luminoso) illuminer ‘to make luminous’: sem. type of illuminare belongs to causative types hierarchy

Rule types
1) search for a CLIPS entry containing the s.i. as target
  • of the synonymic relation: capo synonym_rel testa
  • of the hypernymic relation: gioielleria isa_rel negozio
  • of any qualia relation
2) search for a CLIPS entry sharing properties with the entry of the s.i.
  • shared hypernym: comunicare (notificare) isa_rel dire
  • shared semantic type: avvertire (percepire) semtype EXP._EVENT
3) search for a CLIPS entry containing information inferred from the s.i.
  • specific type
  conoscere (pron.
(reciprocamente)): reciprocal syn. struct. (specific syntactic structure)
  • specific relation or feature (esp. domain info.)

CLIPS entries identified through the sense indicators:
• capo (persona che…) chef: SemU3615capo, sem. type=Role, where <capo> isa <persona>
• capo (testa) tête: SemU61397capo, sem. type=Body_part, where <capo> synonym <testa>
• aspirare LING. aspirer: SemU79372aspirare, sem. type=Speech_act, where domain: phonetics
• aspirare intr. (avere) prep. a aspirer à: SemU7040aspirare, sem. type=Modal_event, linked to SynUaspirare, intr. pp_a
• asfalto (per rivestire) asphalte: SemU68603asfalto, sem. type=Artifact_Material, where <asfalto> used_for <rivestire>

Cognate approach: results
  Suffix    IT constructed words whose different senses     IT constructed words having
            are translated by a unique FR constructed word  more than one translation
  –aggio    89.9 %                                          10.1 %
  –tà       77.4 %                                          22.6 %
  –zione    80.4 %                                          19.6 %
Recall ratio (FR constructed words sharing the IT CLIPS entries):
  –aggio    99.97 %
  –tà       99.98 %
  –zione    99.98 %
Small percentage of errors due to a different granularity of sense distinctions in CLIPS and in the bilingual dictionary

Sense indicator approach: results
Figure: for each triple Itword (X), sense indicator (A), FRword (Y), the investigated lexical data, the rule type and the rule application order (ranks 1 to 9). Rule types: (1) search for an entry of X containing string A, as target of the syn. rel. or hyper. rel.; (2) search for an entry of X sharing properties with an entry of A;
(3) search for an entry of X containing information inferred from A.
Rules by type: (1) target of syn. rel., target of hyper. rel., target of any qualia relation; (2) shared hypernym, shared semtype; (3) specific semtype, specific domain, specific feat/rel, specific syn. struct.
The higher the rule rank, the more reliable the result
Success rates across the nine rules: 16.6%, 26.8%, 0.92%, 8.9%, 5.8%, 3.9%, 12.3%, 9.2%, 15.4%
Caption: distribution of success rate over the algorithm rules
Recall ratio: 69%

Combining the two methods
• successful handling of 95% of constructed words + 69% of non-constructed words
• constructed words represent 68.2% of the vocabulary
• results may be enhanced by gleaning the most informative sense indicators from different sources

Concluding remarks
• Deriving new lexical resources from existing ones: a worthwhile venture in terms of time and effort
• The derived lexicon building process is simplified and shortened
• Such practice entails coverage and consistency assessment of the source lexical resource
• Source and derived lexicons constitute a most reliable basis for developing a bilingual resource
• The approaches taken are applicable to other language pairs sharing similarities in terms of morphological structure
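The figures reported under "Combining the two methods" can be combined into a single indicative coverage figure. The sketch below is a back-of-the-envelope calculation, not a result stated in the presentation: it assumes that constructed and non-constructed words partition the vocabulary and that each success rate applies uniformly within its class. The function name `combined_recall` is hypothetical.

```python
def combined_recall(share_constructed, recall_constructed, recall_other):
    """Weighted average of two recall rates over a two-class vocabulary."""
    if not 0.0 <= share_constructed <= 1.0:
        raise ValueError("share_constructed must lie in [0, 1]")
    share_other = 1.0 - share_constructed
    return share_constructed * recall_constructed + share_other * recall_other


# Figures from the slide: 68.2% of the vocabulary is constructed words,
# handled at 95%; the remaining words are handled at 69%.
overall = combined_recall(0.682, 0.95, 0.69)
print(f"{overall:.1%}")  # prints 86.7%
```

Under these assumptions the two approaches together would handle roughly 87% of the vocabulary, which is why the cognate approach, covering the larger and easier class, contributes most of the overall figure.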