The Italian CLIPS Lexicon
and its reuse
in a bilingual environment
Nilda Ruimy
ILC CNR, Pisa
September 2004
Outline
Part I
• The origin of the CLIPS lexicon
• The PAROLE-SIMPLE model
• General encoding criteria
• Phonological and morphological levels
• Syntactic level: information content
• The semantic lexicon
• Theoretical background: GL theory
• The original Qualia Structure
• The SIMPLE ontology
• The Extended Qualia Structure
• Semantic level: information content
• Predicative structure
• Syntax-semantics mapping
• Encoding methodology
• CLIPS essential features & applications
Part II
• Creating a bilingual resource
• The two scenarios
• Scenario I
• Drawbacks
• Scenario II
• The cognate approach
• The sense indicator approach
• Results
• Concluding remarks
CLIPS: a bit of genealogy
PAROLE European project → PAROLE lexicons (12 harmonized lexicons)
SIMPLE European project (Semantic Information for Multifunctional Plurilingual Lexica) → SIMPLE lexicons (12 harmonized lexicons)
Size of the core lexicons:
• morphology: 20,000 entries
• syntax: 20,000 lemmas
• semantics: 10,000 senses
Italy: enlargement of these core lexicons in a national follow-up project → CLIPS lexicon (XML format):
• phonology: 374,000 entries
• morphology: 49,000 entries
• syntax: 55,000 lemmas
• semantics: 55,000 senses
The PAROLE-SIMPLE Model
PAROLE-SIMPLE
GENELEX-PAROLE
Theoretical model
Representational Model
•EAGLES recommendations
•Extended GENELEX model
•Results from EU projects:
• EUROWORDNET
• ACQUILEX
• DELIS
• GENERATIVE LEXICON
The Linguistic Model
Innovative
Tackles misrepresented areas of knowledge
Extendible and multifunctional
Multilingual perspective
PAROLE-SIMPLE lexicons
 common EAGLES-conformant model
 common representation language
 common building methodology
REUSABILITY
Representational Model (1)
Entity/Relationship Model:
• implemented through a DTD that defines:
   • the structure of every descriptive element
   • the relationships holding among the various descriptive elements as well as their co-occurrence restrictions
• non-redundant data representation
Representational Model (2)
• specific representational structures for every level of linguistic description;
• links among the different levels, although the information encoded at each level is fully autonomous
General encoding criteria
 Reduce the lexicographer’s margin of subjectivity
by setting precise guidelines for the treatment of
particular phenomena
 Base as much as possible the encoding on corpus
data
 Find a balance between the encoding of attested
structures / senses only and an exhaustive
encoding including rare structures / senses as well
Splitting entries
• Avoid both redundancy and overly broad groupings
• Use criteria strictly relevant to the description level, e.g. at the syntactic level, syntactically driven criteria:
   • arity
   • syntactic function:
     disporre i libri negli scaffali ‘to arrange the books on the shelves’ / disporre di due auto ‘to have two cars at one's disposal’
   • complement optionality:
     attraversare (la strada) ‘to cross (the street)’ (lit. sense) / attraversare un momento difficile ‘to go through a difficult time’
   • different (non-alternative) realization of complements:
     Leo evita Lia ‘Leo avoids Lia’ / L. ha evitato di guardare L., che L. si ferisse ‘L. avoided looking at L., prevented L. from getting hurt’
• Encode, at the semantic level, the most common senses distinguished in average-size dictionaries (ca. 150,000 words)
The four-level architecture
The first three levels
Phonological Unit: stress position, vowel openness, consonant pronunciation
Corresp. PhnU-MrphU
Morphological Unit: PoS & subcat., inflectional paradigm
Corresp. MrphU-SynU
Syntactic Unit:
• syntactic structure 1: a. head properties; b. subcat. frame (position, synt. restr.)
• syntactic structure 2: a. head properties; b. subcat. frame (position, synt. restr.)
• Frameset
Syntactic entry information content
Aumentare: Il governo ha aumentato i prezzi del 3%. I prezzi sono aumentati del 3%
‘to increase: The government has increased the prices by 3%. Prices have increased by 3%’
• Specific properties of the entry in the syntactic context described: main verb; aux.: avere
• Subcategorization frame (complex synt. entry)
MAIN syntactic frame:
   P0   optional   subject     NP
   P1   oblig.     object      NP
   P2   optional   adverbial   di_PP
RELATED syntactic frame:
   P0   optional   subject     NP
   P1   optional   adverbial   di_PP
• Link between syntactic structures
FRAMESET relating systematic frame alternations (decausativization, locative alternation, reciprocal altern., symmetrical altern.):
   • relates main syntactic frame to alternating one
   • relates respective frame positions
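The frameset link between the two frames of aumentare can be sketched as a small data structure. This is only an illustrative sketch: the class names `Position`, `Frame` and `Frameset`, and the pairing of positions (main P1 with related P0, main P2 with related P1), are assumptions read off the two example sentences, not the actual CLIPS encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    name: str          # e.g. "P0"
    optional: bool
    function: str      # subject, object, adverbial
    realization: str   # NP, di_PP


@dataclass
class Frame:
    label: str
    positions: list


@dataclass
class Frameset:
    alternation: str
    main: Frame
    related: Frame
    position_map: dict  # main position name -> related position name (assumed pairing)


main = Frame("main", [
    Position("P0", True, "subject", "NP"),
    Position("P1", False, "object", "NP"),
    Position("P2", True, "adverbial", "di_PP"),
])
related = Frame("related", [
    Position("P0", True, "subject", "NP"),
    Position("P1", True, "adverbial", "di_PP"),
])
aumentare = Frameset("decausativization", main, related, {"P1": "P0", "P2": "P1"})


def related_position(frameset, main_name):
    """Return the related-frame Position paired with a main-frame position, or None."""
    target = frameset.position_map.get(main_name)
    return next((p for p in frameset.related.positions if p.name == target), None)
```

With this pairing, the object of the main frame (i prezzi) surfaces as the subject of the related frame, and the main-frame subject has no counterpart.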
The semantic lexicon
Theoretical linguistic background:
Extended version of
Pustejovsky’s Generative Lexicon (GL) theory
Generative Lexicon theory
• lexical meanings of various levels of complexity:
   bambino ‘child’: HUMAN, age (childhood), sex (male)
   dottore ‘doctor’: HUMAN, age (adult), sex (male), function
   giornale ‘newspaper’: polysemy: 1. printed paper, 2. location, 3. institution, 4. human group
• simplest ones: definable by a taxonomic relation
• more complex ones: hypernymic relation not sufficient
• Qualia Structure makes it possible:
   • to coherently model the multidimensionality of meaning
   • to capture the relationships holding between semantic units
   • to represent uniformly semantic units of different degrees of complexity
The Original Qualia structure
Consists of four roles:
• formal role: distinguishes the denoted entity from others (what is X?)
• constitutive role: expresses its components (what is X made of?)
• agentive role: expresses its coming about (how does X come about?)
• telic role: specifies its function (what is X's function?)
The SIMPLE ontology (1)
Lexicon structured on the basis of a type ontology:
 Core Ontology:
 top level, general types;
 large consensus;
 provide essential information;
 mappable on EuroWordNet ontology
 Recommended Ontology:
 hierarchically lower and more specific types;
 provide finer-grained information
Possible creation of language / application specific types
The SIMPLE ontology (2)
157 language independent semantic types
simple types (one-dimensional) :
can be fully characterized in terms of a hypernymic
relation, e.g.
Entity > Concrete_entity > Living_entity > Animal > Earth_Animal
The SIMPLE ontology (3)
unified types (multi-dimensional):
can only be defined through the combination of:
• the relation to their supertype
• the reference to orthogonal dimensions of meaning
e.g. Institution: supertype Abstract_Entity (itself under Entity), plus the Agentive and Telic dimensions
The SIMPLE ontology (4)
SIMPLE ontology:
a multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations
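The difference between simple and unified types can be illustrated with a toy fragment of the hierarchy. This is a sketch under stated assumptions: the table names `SUPERTYPE` and `DIMENSIONS` and the helper functions are hypothetical, and only the types shown in the examples above are included.

```python
# Supertype links taken from the example chains on the slides.
SUPERTYPE = {
    "Concrete_entity": "Entity",
    "Living_entity": "Concrete_entity",
    "Animal": "Living_entity",
    "Earth_Animal": "Animal",
    "Abstract_Entity": "Entity",
    "Institution": "Abstract_Entity",
}

# Orthogonal dimensions of meaning carried by unified types (hypothetical encoding).
DIMENSIONS = {"Institution": {"Agentive", "Telic"}}


def ancestors(sem_type):
    """Return the chain of supertypes of sem_type, nearest first."""
    chain = []
    while sem_type in SUPERTYPE:
        sem_type = SUPERTYPE[sem_type]
        chain.append(sem_type)
    return chain


def subsumes(general, specific):
    """True if general is specific itself or one of its supertypes."""
    return general == specific or general in ancestors(specific)


def is_unified(sem_type):
    """A type counts as unified here if it refers to at least one orthogonal dimension."""
    return bool(DIMENSIONS.get(sem_type))
```

A simple type such as Earth_Animal is fully described by `ancestors`, whereas Institution additionally needs the entries in `DIMENSIONS`.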
Semantic types
 In the SIMPLE ontology, types are not mere labels
but the repository of a specific set of structured
semantic information
Some semantic types for abstract & concrete entities
[Diagram: type hierarchy. TOP branches into TELIC, AGENTIVE, CONSTITUTIVE and ENTITY; ENTITY branches into Concrete_entity, Property, Abstract_entity, Representation and Event. Types shown below these nodes: Living_entity, Human, Animal, Vegetal_entity, Artifact, Substance, Location, Food, Material, Furniture, Instrument, Clothing, Artwork, Artifactual_material, Sign, Language, Information, Quality, Quantity, Physical_prop, Psychol_prop, Convention, Cognitive_fact, ...]
Some semantic types for events
[Diagram: type hierarchy under EVENT. Types shown: Phenomenon, Aspectual, State, Relational_state, Act, Non_relational_act, Relational_act, Cause_act, Speech_act, Psych_event, Change, Relational_change, Move, Change_possession, Change_location, Natural_transition, Acquire_knowledge, Cause_change, Creation, ...]
Some semantic types for adjectives
[Diagram: type hierarchy. TOP branches into Intensional and Extensional. Types shown: Temporal, Modal, Emotive, Manner, Object_related, Emphasizer, Psychological_prop, Social_prop, Physical_prop, Temporal_prop, Intensifying_prop, Relational_prop]
Descriptive elements
 Features:
PlusHuman, PlusCollective,..
 Relations between semantic units:
R (<SemU1>, <SemU2>)
Extended Qualia Structure
Formal:
• isa, antonym_comp, antonym_grad, mult_opposition
Constitutive:
• CONSTITUTIVE: made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity
• PROPERTY: contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling
• LOCATION: is_in, lives_in, typical_location
Agentive:
• AGENTIVE: result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source
• ARTIFACTUAL AGENTIVE: created_by, derived_from
Telic:
• INDIRECT TELIC: INSTRUMENTAL (used_for, used_as, used_by, used_against), purpose
• ACTIVITY: is_the_activity_of, is_the_ability_of, is_the_habit_of
• DIRECT TELIC: object_of_activity
Extended Qualia Structure: examples
Constitutive:
• made_of: pane, farina (bread, flour)
• has_as_member: senato, senatore (senate, senator)
• is_a_part_of: manubrio, bicicletta (handlebar, bicycle)
• produces: arancio, arancia (orange tree, orange)
• typical_of: abbaiare, cane (bark, dog)
Agentive:
• agentive_experience: disgusto, provare (disgust, feel)
• created_by: casa, costruire (house, build)
• derived_from: mohair, capra (mohair, goat)
Telic:
• used_for: proiettile, colpire (projectile, hit)
• used_as: metano, combustibile (methane, fuel)
• used_by: bisturi, chirurgo (lancet, surgeon)
• used_against: antitarmico, tarma (moth balls, moth)
• is_the_activity_of: medico, curare (doctor, cure)
• is_the_habit_of: fumatore, fumare (smoker, smoke)
Orthogonal dimensions of meaning
[Diagram: the sense of violin placed at the intersection of its meaning dimensions: Formal role (instrument), constitutive relations has_as_part and is_made_of, telic relation used_for (playing), Agentive role]
Meaning dimensions expressed by Qualia relations
botte ‘barrel’: traditional dictionary definition, segmented by qualia relation:
• recipiente ‘container’ → Formal: isa
• di legno ‘of wood’ → Constitutive: made_of
• fatto ‘made’ → Agentive: created_by
• di doghe arcuate tenute unite da cerchi di ferro ‘of curved staves held together by iron hoops’ → Constitutive: made_of
• che serve per la conservazione e il trasporto ‘used for storing and transporting’ → Telic: used_for
• di liquidi, specialmente vino ‘liquids, especially wine’ → Constitutive: contains
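The segmentation of the botte definition can be written down as relation triples grouped by qualia role. The sketch below is illustrative only: the triple layout, the function `by_role` and the target strings (e.g. `fare` for fatto, `doga` for doghe) are assumptions, not CLIPS identifiers.

```python
from collections import defaultdict

# (role, relation, target) triples read off the segmented definition of "botte".
BOTTE = [
    ("formal", "isa", "recipiente"),
    ("constitutive", "made_of", "legno"),
    ("constitutive", "made_of", "doga"),
    ("constitutive", "contains", "liquido"),
    ("agentive", "created_by", "fare"),
    ("telic", "used_for", "conservazione"),
    ("telic", "used_for", "trasporto"),
]


def by_role(triples):
    """Group (relation, target) pairs under their qualia role."""
    grouped = defaultdict(list)
    for role, relation, target in triples:
        grouped[role].append((relation, target))
    return dict(grouped)


qualia = by_role(BOTTE)
```

Each part of the dictionary definition ends up under exactly one of the four roles, which is the point the slide makes about the expressive coverage of qualia relations.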
Qualia informative power (1)
Within a semantic type population, further clusterings can be made through the is-a relation:
[Diagram: semantic types CONCRETE_ENTITY, ARTIFACT and INSTRUMENT; generic lexical units (manufatto, arnese, attrezzo, utensile, strumento, macchina, apparecchio, dispositivo) act as is-a targets that cluster more specific units such as giogo, spalliera, graticola, piano, aratro, citofono, laser]
Qualia informative power (2)
[Diagram: is-a and used_for relations among units of type INSTRUMENT and CONTAINER. Units shown: utensile (INSTRUMENT), graticola, colabrodo, frusta, posata, forchetta, coltello (used for: mangiare); contenitore (CONTAINER), pentola, tegame, padella (used for: cucinare)]
Semantic level: information content
Phonological Unit: stress position, vowel openness, consonant pronunciation
Corresp. PhnU-MrphU
Morphological Unit: PoS & subcat., inflectional paradigm
Corresp. MrphU-SynU
Syntactic Unit: syntactic structure 1, syntactic structure 2 (each with a. head properties; b. subcat. frame: position, synt. restr.), Frameset
Corresp. SynU-SemU
Semantic Unit:
• ontological type, event type, semant. class, domain
• semant. features, semant. relations, derivation, synonymy
• Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
• regular polysemy
• predicative represent.: predicate, type of link, arguments (sem. restr.)
Predicative Representation
Describes the semantic scenario a word sense is involved in
Assigned to predicative semantic units:
• assignment of a lexical predicate
• type of link holding between entry and predicate
• predicate argument structure
• semantic role of arguments
• selection restrictions of arguments
• link semantic arguments / syntactic complements
Assignment of a lexical predicate
A lexical predicate is assigned to:
• verbs;
• predicative nouns: deverbals (costruzione ‘construction’) and collective simple nouns (gruppo ‘group’), nouns denoting a relation (madre ‘mother’), quantity (bottiglia ‘bottle’), part (fetta ‘slice’), unit of measurement (metro ‘metre’), property (bellezza ‘beauty’);
• adjectives;
• some adverbs (indipendentemente da ‘independently of’)
Predicate-semantic unit link
PRED_ACCUSARE is linked to:
• accusare ‘to accuse’: master
• accusa ‘accusation’: process nominalisation
• accusato ‘accused’: patient nominalisation
• accusatore ‘accuser’: agent nominalisation
Semantic arguments: thematic roles
ProtoAgent: volitional subject of verb: ARG0 of kill
ProtoPatient: object undergoing an action: ARG1 of kill
2ndParticipant: indirect object: ARG2 of give
SoA (State of Affairs): sentential complement: ARG2 of ask
Location: ARG2 of put
Direction: ARG2 of move
Origin: ARG1 of move
Kinship: ARG0 of father
HeadQuantified: ARG0 of metre, bottle
Semantic arguments: selectional restrictions
Not restrictions proper, but rather preferences of combination in prototypical situations.
Expressible through:
• semantic types;
• notions (combination of types or type + feature…);
• features;
• semantic units
Features, used transversally across semantic types (e.g. PlusEdible), make it possible to capture broader preferences than single semantic types do:
ARG1 of eat: [PlusEdible] rather than ARG1 of eat: [FOOD]
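The advantage of a feature-based preference over a type-based one can be shown on a toy lexicon. Everything in this sketch is hypothetical: the entries, their type assignments and the function names are invented for illustration and are not taken from CLIPS.

```python
# Hypothetical mini-lexicon: each unit has one semantic type and a set of features.
LEXICON = {
    "pane":  {"type": "Food", "features": {"PlusEdible"}},
    "mela":  {"type": "Vegetal_entity", "features": {"PlusEdible"}},
    "sasso": {"type": "Substance", "features": set()},
}


def satisfies(unit, preference):
    """preference is a pair ("type", name) or ("feature", name)."""
    kind, name = preference
    entry = LEXICON[unit]
    if kind == "type":
        return entry["type"] == name
    return name in entry["features"]


def candidates(preference):
    """Return the units that satisfy the preference, in alphabetical order."""
    return sorted(unit for unit in LEXICON if satisfies(unit, preference))
```

Under these assumptions, the type preference `("type", "Food")` only admits pane, while the feature preference `("feature", "PlusEdible")` also admits mela, which is edible but typed differently.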
Semantic entry information content (1)
Aumento: L’aumento dei prezzi da parte del governo
‘increase: the increase of prices by the government’
ONTOLOGICAL INFO.
• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Event type: transition
• Domain: general, economics
• Gloss: accrescimento in dimensione o quantità ‘growth in size or quantity’
EXTENDED QUALIA INFO.
• aumento isa cambiamento
• aumento resulting_state maggiore
• Agentivecause: yes
• Direction: up
• Morphological derivation: Eventverb aumentare
PREDICATIVE REPRESENTATION
• Lexical semantic predicate: PRED_aumentare
• Type of link: event nominalization
• Predicate arg. struct.: range, semantic role & selectional restrictions of args.:
   Arg0   Protoagent     Human / Institution
   Arg1   ProtoPatient   Entity
   Arg2   Quantifier     Amount
Semantic entry information content (2)
vaporizzatore: spruzzare acqua con un vaporizzatore
‘spray: to spray water with a spray’
ONTOLOGICAL INFO.
• Semantic type: Instrument
• Supertype: Artifact
• Event type: ===
• Domain: general, cleaning, gardening, cosmetics
• Gloss: apparecchio usato per ridurre in minuscole particelle un liquido ‘device used to reduce a liquid to tiny particles’
EXTENDED QUALIA INFO.
• vaporizzatore isa apparecchio
• vaporizzatore has_as_part pulsante
• vaporizzatore created_by fabbricare
• vaporizzatore used_for atomizzare
• Synonymy: nebulizzatore
• Morphological derivation: Eventverb vaporizzare
PREDICATIVE REPRESENTATION
• Lexical semantic predicate: PRED_vaporizzare
• Type of link: instrument nominalization
• Predicate arg. struct.: range, semantic role & selectional restrictions of args.:
   Arg0   Protoagent     Human / Instrument
   Arg1   ProtoPatient   +liquid
   Arg2   Location       Concrete_entity
Syntax-semantics mapping (1)
[Diagram: the Syntactic Unit and Semantic Unit levels of the four-level architecture, with the correspondence between them (Corresp. SynU-SemU) highlighted as the syntax-semantics mapping]
Syntax-semantics mapping (2)
SYNTACTIC LEVEL
SynU_migliorare ‘to improve’
• Transitive structure: P0, P1
• Intransitive structure: P0
• Frameset linking the two structures
SEMANTIC LEVEL
• SemU1_migliorare: CAUSE_CHANGE_OF_STATE
• SemU2_migliorare: CHANGE_OF_STATE
LINK PREDICATE-SEMANTIC UNIT
SEMANTIC PREDICATE: PRED_migliorare
• ARG0: Agent
• ARG1: Patient
CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME
• isomorphic
• non-isomorphic
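A correspondence between syntactic positions and predicate arguments can be sketched as a simple mapping. The sketch rests on two assumptions that the slides do not spell out: that SemU1 pairs with the transitive structure and SemU2 with the intransitive one, and that "isomorphic" means every position Pn is linked to the argument ARGn with the same index.

```python
# PRED_migliorare has ARG0 (Agent) and ARG1 (Patient).
CORRESPONDENCES = {
    "SemU1_migliorare": {
        "structure": "transitive",
        "map": {"P0": "ARG0", "P1": "ARG1"},
    },
    "SemU2_migliorare": {
        "structure": "intransitive",
        # Assumed: the intransitive subject realizes the Patient.
        "map": {"P0": "ARG1"},
    },
}


def is_isomorphic(mapping):
    """True when every position Pn is linked to the argument ARGn with the same index."""
    return all(pos[1:] == arg[3:] for pos, arg in mapping.items())


kinds = {
    semu: "isomorphic" if is_isomorphic(c["map"]) else "non-isomorphic"
    for semu, c in CORRESPONDENCES.items()
}
```

Under these assumptions the causative sense gets an isomorphic correspondence and the inchoative sense a non-isomorphic one.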
Template-driven encoding methodology
• a template is a schema providing, for each semantic type, a set of structured information deemed crucial to its definition
• twofold function:
   • interface between ontology and lexicon
   • guide for the lexicographer
• ensures systematicity, consistency and uniformity in the representation of lexical meaning
A template
SemU:                SemU identifier
SynU:                Identifier of the SynU the SemU is related to
BC number:           Number of the corresponding ItalWordNet base concept
Template_Type:       [Container]
Unification_path:    [Concrete_entity | ArtifactAgentive | Telic]
Domain:              General
Semantic Class:      Link to the LexiQuest (or any other ontology)
Gloss:               Lexicographic gloss
Predicative_Repr.:   Predicate associated to the SemU and its argument structure [container(arg0)]
Selectional Restr.:  Selectional restrictions (Arg0-HeadQuantifier-Substance)
Derivation:          Derivational relations between SemUs
Formal:              isa (1, <container> or <hyperonym>)
Agentive:            created_by (1, <Usem>: [CREATION]) //definitorial//
Constitutive:        made_of (1, <Usem>) //optional//
                     has_as_part (1, <Usem>) //optional//
                     contains (1, <Usem>)
Telic:               used_for (1, <contain>) //definitorial//
                     used_for (1, <measure>) //optional//
Synonymy:            Synonyms of the SemU //optional//
Regular Polysemy:    [Amount] [Container]
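The role of a template as a guide for the lexicographer can be illustrated by a small consistency check. This is a hedged sketch: the dictionary `CONTAINER_TEMPLATE`, the status label "unmarked" for relations that carry no annotation in the template, and the function `check_entry` are assumptions made for illustration; only relations explicitly marked //definitorial// are treated as required.

```python
# (role, relation) -> status, read off the Container template.
CONTAINER_TEMPLATE = {
    ("formal", "isa"): "unmarked",
    ("agentive", "created_by"): "definitorial",
    ("constitutive", "made_of"): "optional",
    ("constitutive", "has_as_part"): "optional",
    ("constitutive", "contains"): "unmarked",
    ("telic", "used_for"): "definitorial",
}


def check_entry(relations, template=CONTAINER_TEMPLATE):
    """relations: iterable of (role, relation, target) triples.

    Returns (missing, unexpected): definitorial relations absent from the entry,
    and relations present in the entry that the template does not foresee.
    """
    present = {(role, rel) for role, rel, _ in relations}
    missing = sorted(key for key, status in template.items()
                     if status == "definitorial" and key not in present)
    unexpected = sorted(present - set(template))
    return missing, unexpected


draft = [
    ("formal", "isa", "recipiente"),
    ("constitutive", "made_of", "legno"),
    ("telic", "used_for", "conservazione"),
]
missing, unexpected = check_entry(draft)
```

For this draft entry the check reports that the definitorial `created_by` relation has not been encoded yet.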
CLIPS’ key features
The largest electronic, multilevel lexical resource of the Italian language
• 55,000 words encoded
• 4 description levels: phonology, morphology, syntax, semantics
• Based on a rich and multifunctional linguistic and representational model shared by 11 other European lexica
• Lexical description conformant to international standards
• Respect of the principles of uniformity, consistency and exhaustivity
• Generic lexicon: large coverage (vocabulary and synt. structures)
• Fine-grained, highly structured, innovative information, most useful for HLT applications
• High level of reusability
Application fields
• surface and deep analysis of texts
• information retrieval
• machine translation
• natural language understanding, etc.
The wealth of information the lexicon contains allows:
• building semantic networks
• extracting the vocabulary of a specific domain
• NP recognition: disambiguating the semantic contribution of some PPs in complex nominals
To lend itself to further uses, a lexicon must have:
• a flexible model
• a generic database
• uniformly structured data
• a precise and explicit linguistic description
Like the PAROLE and SIMPLE lexicons, CLIPS meets these requirements
Creating a bilingual electronic
lexical resource
Strategy I:
1) Use CLIPS and the PAROLE-SIMPLE French lexicon
2) Perform a semi-automatic linking of their respective
entries
Creating a bilingual electronic lexical resource
Strategy II:
1) Derive, in a semi-automatic way, a semantically annotated French lexicon from CLIPS
2) Use source and derived lexicons as a basis for building a bilingual resource
Strategy I:
[Diagram: an ALGORITHM takes as input an IT-FR & FR-IT bilingual dictionary (e.g. capo → tête, chef, bout; ufficio → bureau, charge; tête → testa, capo, faccia, cima; bureau → ufficio, scrivania), the CLIPS entries (capo_1, capo_2, ufficio_1, ..., each with phon, morph, syn and sem information) and the PAROLE-SIMPLE French lexicon entries (tête_1, tête_2, tête_3, bureau_1, ..., each with morph, syn and sem information), and has to decide which Italian sense links to which French sense. Sample Italian words: capo, ufficio, gentile, residenza, tessere, pompa, scrivere, tessuto, vestibolo, testo, amministratore, vincere]
Analysis of the inherent properties of the SL & TL senses:
• identity of ontological classification or subsumption relation between the semantic types of the SL & TL senses
• identity of semantic class or subsumption relation between their semantic classes
• identity of domain or subsumption relation between their domain information
• identity / correspondence of semantic features
• identity / correspondence of semantic relations
Analysis of their contextual properties:
• compatibility of syntactic valency
• function and grammatical instantiation of complements
• compatibility of semantic valency
• semantic role and semantic restrictions of arguments
cf. Villegas et al. LREC 2000, Athens
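The comparison of inherent properties can be sketched as a simple compatibility count. This is not the algorithm of the cited work, which the slides do not describe: the numeric scoring, the function names and the dictionary layout are assumptions; the property values are those shown for scrivere / écrire and testo_1 / texte in the example entries of this presentation.

```python
# Toy supertype table; only the SYMBOLIC_CREATION -> CREATION link is from the slides.
SUPERTYPE = {"SYMBOLIC_CREATION": "CREATION"}


def related(a, b, parents=SUPERTYPE):
    """True for identical values, or when one value subsumes the other."""
    if a is None or b is None:
        return False

    def up(x):
        seen = [x]
        while x in parents:
            x = parents[x]
            seen.append(x)
        return seen

    return b in up(a) or a in up(b)


def inherent_score(sl, tl):
    """Count the inherent properties on which two senses are compatible."""
    score = 0
    for key in ("sem_type", "sem_class", "domain"):
        score += related(sl.get(key), tl.get(key))
    score += len(set(sl.get("features", ())) & set(tl.get("features", ())))
    return score


scrivere = {"sem_type": "SYMBOLIC_CREATION", "sem_class": "CREATION",
            "domain": "CREATIVE_WRITING"}
ecrire = {"sem_type": "CREATION", "sem_class": "CREATION", "domain": None}
testo_1 = {"sem_type": "INFORMATION", "sem_class": "ABSTRACT", "domain": "MEDIA",
           "features": ["PLUS_SEMIOTIC"]}
texte = {"sem_type": "RELATIONAL_ACT", "sem_class": "OBJECT", "domain": None,
         "features": ["PLUS_SEMIOTIC"]}
```

Here scrivere and écrire agree on type (by subsumption) and class, while testo_1 and texte share only a feature; a missing value on either side never counts as a match, which is why incomplete encoding leads to manual intervention.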
Each row gives the CLIPS value | the SIMPLE-FR value.
evento | évènement
   free definition: "ciò che è accaduto o potrà accadere, avvenimento" | "something that happens at a given place and time"
   Semantic type: EVENT | EVENT
   Supertype: ENTITY | ----
   Semantic class: EVENT | EVENT
scrivere | écrire
   free definition: "creare qualcosa di scritto" | "create written works & semi"
   Semantic type: SYMBOLIC_CREATION | CREATION
   Supertype: CREATION | ----
   Semantic class: CREATION | CREATION
   Domain: CREATIVE_WRITING | ----
pompa | pompe
   free definition: "macchina o apparecchio usato per sollevare liquidi o comprimere gas" | "a device that moves fluid or gas by pressure or suction"
   Semantic type: INSTRUMENT | ----
   UnificationPath: ConcreteEntityArtifactagentive-Materialtelic | -----
   Semantic class: APPARATUS | APPARATUS
testo_1 (CLIPS)
   Semantic type: INFORMATION
   Supertype: REPRESENTATION
   Semantic class: ABSTRACT
   Domain: MEDIA
   Distinctive feature: PLUS_SEMIOTIC
testo_2 (CLIPS)
   Semantic type: SEMIOTIC_ARTIFACT
   UnificationPath: ConcreteEntityArtifactagentive-Telic
   Semantic class: ARTIFACT
   Domain: MEDIA
   Distinctive feature: PLUS_SEMIOTIC
texte (SIMPLE-FR)
   Semantic type: RELATIONAL_ACT
   Supertype: ----
   Semantic class: OBJECT
   Domain: ---
   Distinctive feature: PLUS_SEMIOTIC
vincere (CLIPS)
   free definition: "portare a termine con successo"
   Semantic type: RELATIONAL_ACT
   Semantic class: ACTIVITY
   Rel.Sem: ----
   PREDICATE_vincere_1
vaincre (SIMPLE-FR)
   free definition: "be the winner in contest/competition"
   Semantic type: CAUSE_RELAT.-CHANGE
   Semantic class: CHANGE
   Rel.Sem: Resulting_action/state: victoire
   Agentive_cause: cause
   PREDICATE_vaincre_2
Drawbacks of this strategy
• Discrepancy of lexical coverage between the lexicons
   => method applicable to 10,000 senses only
• SIMPLE-FR does not always encode all information
   => manual intervention is needed wherever SL and TL entries have NO corresponding element, due to:
   • lack of information
   • encoding error
   • a focus on different, though complementary, aspects of meaning, e.g.:
     imprigionare: PURPOSE_ACT vs. emprisonner: CAUSE_RELATIONAL_CHANGE
Strategy II – Phase 1:
Deriving a FR lexicon from CLIPS
• Feasibility study for deriving a semantically annotated French lexicon using CLIPS lexical knowledge
• Crucial step for deriving the French entries: correctly pair each French word sense with the relevant CLIPS semantic unit, whose information is ultimately to be assigned to the French entry
Two approaches for deriving a semantically annotated French lexicon from CLIPS:
• cognate approach: exploits the cognateness of Italian and French endings to relate the FR word to the IT CLIPS entry and infer the FR entry
   villaggio: 1. (piccolo centro abitato) village
              2. (complesso urbanistico) village
• sense indicator approach: matches the information provided by sense indicators in bilingual dictionaries against the CLIPS data, in order to identify the relevant CLIPS entry
   capo: 1. (testa) tête; 2. (persona che...) chef ...
The cognate approach
P. Bouillon, B. Cartoni, TIM/ISSCO, ETI, Geneva
IT-FR bilingual dict.:
villaggio: 1. (piccolo centro abitato) village
           2. (complesso urbanistico) village
Condition: a single French constructed word translates all the IT senses
IT–CLIPS entries and the FR–LEX entries derived from them:
<SemU id="USem4123villaggio" naming="villaggio" weightvalsemfeaturel="Geopolitical_Location"> […] </SemU>
   → <SemU id="USem0001village" naming="village" weightvalsemfeaturel="Geopolitical_Location"> […] </SemU>
<SemU id="USemD63504villaggio" naming="villaggio" weightvalsemfeaturel="Human_group"> […] </SemU>
   → <SemU id="USem0002village" naming="village" weightvalsemfeaturel="Human_group"> […] </SemU>
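The condition and the copying step of the cognate approach can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `derive_french_semus`, the dictionary layout of a SemU, the French counterparts in `SUFFIXES` and the identifier scheme for the derived entries are all assumptions.

```python
import copy

# IT suffixes are those evaluated in the slides; the FR counterparts are assumed.
SUFFIXES = {"aggio": "age", "tà": "té", "zione": "tion"}


def derive_french_semus(it_word, translations, clips_semus):
    """Copy the CLIPS SemUs of it_word to its French cognate.

    translations: FR equivalents, one per IT sense, from a bilingual dictionary.
    Returns [] unless a single cognate FR word translates all the IT senses.
    """
    fr_words = set(translations)
    if len(fr_words) != 1:
        return []                      # the senses have different translations
    fr_word = fr_words.pop()
    if not any(it_word.endswith(it) and fr_word.endswith(fr)
               for it, fr in SUFFIXES.items()):
        return []                      # not a cognate constructed word
    derived = []
    for n, semu in enumerate(clips_semus, start=1):
        new = copy.deepcopy(semu)
        new["naming"] = fr_word
        new["id"] = f"USem{n:04d}{fr_word}"   # hypothetical id scheme
        derived.append(new)
    return derived


villaggio = [
    {"id": "USem4123villaggio", "naming": "villaggio", "type": "Geopolitical_Location"},
    {"id": "USemD63504villaggio", "naming": "villaggio", "type": "Human_group"},
]
village = derive_french_semus("villaggio", ["village", "village"], villaggio)
```

A word such as capo, whose senses translate as tête and chef, fails the condition and is left to the sense indicator approach.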
The sense indicator approach
N. Ruimy, ILC-CNR, Pisa
Triples extracted from the bilingual dictionary:
IT word     | SENSE INDICATOR        | FR word
capo        | (persona che…)         | chef
capo        | (testa)                | tête
aspirare    | tr. (con un tubo)      | aspirer
aspirare    | LING.                  | aspirer
aspirare    | tr. (inalare)          | aspirer
aspirare    | intr. (avere) prep. a  | aspirer à
avvertire   | (avvisare)             | prévenir
avvertire   | (percepire)            | sentir
asfalto     | (per rivestire)        | asphalte
compagnia   | (gruppo)               | compagnie
compagnia   | (presenza)             | compagnie
…
Next step: analysis & classification of sense indicators
Types of sense indicators (1)
Atkins, Bouillon, 2003
• indicators conveying morphosyntactic information:
   • verb subclass, auxiliary selection, plural form of nouns, typical subject / object, PP type, etc.
Italian–French
COVARE
A. v.tr.
1 (di uccelli) [dar calore col proprio corpo alle uova per sviluppare l’embrione] couver
2 (fig.) [custodire con gelosia] couver
3 (fig.) [nutrire, alimentare in segreto dentro di sé] nourrir, mijoter
[tramare, macchinare in segreto] couver [incubare] couver: covare un malanno
B. v.intr. (aus. avere) (fig.) [stare chiuso, nascosto] couver: il fuoco cova sotto la cenere
Indicators highlighted in the entry: verbal class (v.tr., v.intr.), typical subj. (di uccelli), auxiliary (aus. avere)
Types of sense indicators (2)
• indicators conveying inferential information:
   • synonyms, hypernyms, meronyms
   • domain of use
Italian–French
CAPO
I (persone)
1 [testa] tête
2 (fig.) [mente, intelligenza] tête
3 [persona investita di comando, di potere] chef
II (animali)
1 (raro) -> testa
2 spec. al plur [ciascun individuo di una specie determinata] têtes, pièces
III (cose)
1 [la parte più grossa e più sporgente di un oggetto] tête
2 [la parte più alta] haut
3 [ciascuna delle due estremità di qlco.] bout, tête
4 [inizio, principio] début
5 [fine, conclusione; sbocco] bout
6 loc. …..
7 (nei filati) fil
8 [singolo oggetto appartenente ad una serie] pièce
9 (geog.) cap
Indicators highlighted in the entry: synonym, hypernym, domain of use
Sense indicators are used as search keys for identifying, in CLIPS, the semantic entry relevant to the IT sense of the bilingual pair (IT word, sense indicator, FR word), e.g.:
gioielleria | (negozio) | bijouterie
gioielleria | (arte)    | bijouterie
Using sense indicators
• indicators usable straightforwardly
• indicators to be converted into the descriptive language of CLIPS:
   • analizzatore (chi effettua analisi) analyste (who performs analyses)
     → sem. type of analizzatore belongs to the HUMAN hierarchy
   • illuminare (rendere luminoso) illuminer (to make luminous)
     → sem. type of illuminare belongs to the causative types hierarchy
Rule types
• search for a CLIPS entry containing the s.i. as target
   • of the synonymic relation: capo synonym_rel → testa
   • of the hypernymic relation: gioielleria isa_rel → negozio
   • of any qualia relation
• search for a CLIPS entry sharing properties with the entry of the s.i.
   • shared hypernym: comunicare (notificare) isa_rel → dire
   • shared semantic type: avvertire (percepire) semtype EXP._EVENT
• search for a CLIPS entry containing information inferred from the s.i.
   • specific type
   • specific relation or feature (esp. domain info.)
   • specific syntactic structure: conoscere (pron. (reciprocamente)) reciprocal syn. struct.
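The first family of rules can be sketched as an ordered cascade over a toy fragment of CLIPS. This is an illustration only: the dictionary layout, the rule list `RULES`, its ordering and the function `find_semu` are assumptions, and reducing the indicator "(persona che…)" to the key word `persona` is assumed to have been done beforehand. The SemU identifiers and relation targets follow the capo examples of this presentation.

```python
# Toy CLIPS fragment for the two senses of "capo".
CLIPS = {
    "SemU3615capo":  {"lemma": "capo", "sem_type": "Role",
                      "isa": ["persona"], "synonym": []},
    "SemU61397capo": {"lemma": "capo", "sem_type": "Body_part",
                      "isa": [], "synonym": ["testa"]},
}

# Rules are tried in this order; the ordering here is illustrative.
RULES = [
    ("target of synonymic relation", lambda entry, si: si in entry["synonym"]),
    ("target of hypernymic relation", lambda entry, si: si in entry["isa"]),
]


def find_semu(it_word, indicator, lexicon=CLIPS):
    """Return (SemU id, rule name) for the first rule selecting exactly one entry."""
    entries = {k: v for k, v in lexicon.items() if v["lemma"] == it_word}
    for name, test in RULES:
        hits = [k for k, entry in entries.items() if test(entry, indicator)]
        if len(hits) == 1:
            return hits[0], name
    return None
```

Requiring exactly one hit keeps an ambiguous indicator from being resolved arbitrarily; such cases fall through to the next rule or remain unresolved.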
CLIPS entries identified through the sense indicators:
• capo (persona che…) chef → SemU3615capo, sem. type=Role, where <capo> isa <persona>
• capo (testa) tête → SemU61397capo, sem. type=Body_part, where <capo> synonym <testa>
• aspirare LING. aspirer → SemU79372aspirare, sem. type=Speech_act, where domain: phonetics
• aspirare intr. (avere) prep. a aspirer à → SemU7040aspirare, sem. type=Modal_event, linked to SynUaspirare, intr. pp_a
• asfalto (per rivestire) asphalte → SemU68603asfalto, sem. type=Artifact_Material, where <asfalto> used_for <rivestire>
Cognate approach: results
IT suffix | IT constructed words whose different senses are translated by a unique FR constructed word | IT constructed words having more than one translation | recall ratio (FR constructed words sharing the IT CLIPS entries)
–aggio    | 89.9 % | 10.1 % | 99.97 %
–tà       | 77.4 % | 22.6 % | 99.98 %
–zione    | 80.4 % | 19.6 % | 99.98 %
Small percentage of errors due to a different granularity of sense distinctions in CLIPS and in the bilingual dictionary
Sense indicator approach: results
Investigated lexical data: ITword (X) – sense indicator (A) – FRword (Y)
Rule types:
• search for an entry of X containing string A: target of syn. rel.; target of hyper. rel.; target of any qualia relation
• search for an entry of X sharing properties with an entry of A: shared hypernym; shared semtype
• search for an entry of X containing information inferred from A: specific semtype; specific domain; specific feat/rel; specific syn. struct.
The higher the rule rank, the more reliable the result
[Chart: distribution of success rate over the nine algorithm rules, with their application order; values shown: 16.6%, 26.8%, 0.92%, 8.9%, 5.8%, 3.9%, 12.3%, 9.2%, 15.4%]
Recall ratio: 69%
Combining the two methods
Successful handling of:
• 95% of constructed words
+
• 69% of non-constructed words
Constructed words represent 68.2% of the vocabulary
Results may be enhanced by gleaning the most informative sense indicators from different sources
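As a back-of-the-envelope check, the three figures above can be combined into a single weighted rate. The overall figure is not stated in the slides; the calculation below assumes that the two success rates apply to complementary, non-overlapping shares of the vocabulary.

```python
constructed_share = 0.682   # share of the vocabulary made of constructed words
cognate_rate = 0.95         # constructed words handled by the cognate approach
indicator_rate = 0.69       # non-constructed words handled via sense indicators

# Weighted average over the two complementary shares of the vocabulary.
overall = (constructed_share * cognate_rate
           + (1 - constructed_share) * indicator_rate)
print(f"{overall:.1%}")  # prints 86.7%
```

Under that assumption, roughly 87% of the vocabulary would be handled by the two methods together.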
Concluding remarks
Deriving new lexical resources from existing ones: a worthwhile
venture in terms of time and effort
Derived lexicon building process is simplified and shortened
Such practice entails coverage and consistency assessment of
the source lexical resource
Source and derived lexicons constitute a most reliable basis for
developing a bilingual resource
Approaches taken applicable to other language pairs sharing
similarities in terms of morphological structure