When GL meets the corpus
A data-driven investigation of semantic
types and coercion phenomena
Elisabetta Jezek (1), Alessandro Lenci (2)
(1) University of Pavia - Department of Linguistics
(2) University of Pisa - Department of Linguistics
GL Workshop 2007
Paris, 10th May
Research goals and methodology

Corpus evidence as the major device to
explore the semantic type system

in line with Corpus Pattern Analysis (CPA)
as proposed by Hanks & Pustejovsky
(2005), Pustejovsky et al. (2004)


here we focus on V-obj combinations
Improve our understanding about



the structure of semantic types
the organisation of the type system
how types behave compositionally
Theoretical framework

Generative Lexicon Theory (Pustejovsky
1995, 2006)


the semantic type system is integrated in a
general theory of argument selection
Generative devices in the lexicon


type system generativity (paradigmatic)
- Qualia Structure (QS) and “dot” operations
create multidimensional lexical types
compositional generativity (syntagmatic)
- compositional operations (coercion, co-composition, etc.) change and create types in
context
Type system in GL
(Pustejovsky 2001, 2006)

Natural types
only formal and constitutive information
(organized taxonomically)
- lion: animate, rock: concrete, water: liquid

Artifactual types
natural types + Telic and/or Agentive quale
- violinist: animate ⊗T play
- beer: liquid ⊗T drink
- knife: concrete ⊗T cut

Complex types (dot types)
composition of types
- libro “book”: physical • information
cf. inherent polysemy
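The three-way distinction above can be sketched as a small data structure. This is only an illustrative encoding, not the authors' formalization: the class name `SemType`, its fields, and the `kind` method are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SemType:
    """Hypothetical encoding of a GL semantic type (illustration only)."""
    components: Tuple[str, ...]      # one element = simple type, several = dot type
    telic: Optional[str] = None      # Telic quale, e.g. "drink"
    agentive: Optional[str] = None   # Agentive quale, e.g. "write"

    def kind(self) -> str:
        # Dot types combine several component types.
        if len(self.components) > 1:
            return "complex"
        # Artifactual types add a Telic and/or Agentive quale to a natural type.
        if self.telic or self.agentive:
            return "artifactual"
        return "natural"


lion = SemType(("animate",))
beer = SemType(("liquid",), telic="drink")
libro = SemType(("physical", "information"))

for name, t in [("lion", lion), ("beer", beer), ("libro", libro)]:
    print(name, t.kind())
```

The encoding classifies a type as complex before checking its qualia, so a dot type that also carries a Telic or Agentive value still counts as complex.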
Compositional operations on types
(Pustejovsky 2006)

Key issue - how the type selected by predicate matches
the type of its arguments

Pure Selection – selecting type is directly satisfied by the
argument type

Accommodation – selecting type is inherited by the
argument type

Type Coercion – selecting type does not directly match
the argument type


Exploitation – selecting type corresponds to a portion of the QS
of the argument. A subcomponent of the argument’s type is
accessed to satisfy the predicate requirements
Introduction – selecting type is richer than the type of its
argument. The argument is wrapped with the type required by
the predicate
A map of compositional operations
on types
Domain-preserving operations
ARGUMENT TYPE (rows) by SELECTING TYPE (columns)
                        Simple (natural)   Unified (artifactual)   Dot (complex)
Simple (natural)        selection          introduction            introduction
Unified (artifactual)   exploitation       selection               introduction
Dot (complex)           exploitation       exploitation            selection

Other compositional operations
- non-domain-preserving coercion (e.g. object → event)
- co-composition
- …
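The domain-preserving part of the map can be read as a simple rule: selection when the two types have the same level of structure, exploitation when the argument type is richer than the selecting type, introduction when it is poorer. The sketch below encodes that reading. The ranking `RICHNESS` and the function `operation` are assumptions introduced for illustration, and accommodation is not modelled.

```python
# Hypothetical ranking of type levels by structural richness (illustration only).
RICHNESS = {"natural": 0, "artifactual": 1, "complex": 2}


def operation(argument_kind: str, selecting_kind: str) -> str:
    """Return the domain-preserving operation for an argument/predicate pair."""
    arg = RICHNESS[argument_kind]
    sel = RICHNESS[selecting_kind]
    if arg == sel:
        return "selection"
    # A richer argument offers a subcomponent to the predicate (exploitation);
    # a poorer argument is wrapped with the required type (introduction).
    return "exploitation" if arg > sel else "introduction"


for arg_kind in RICHNESS:
    row = [operation(arg_kind, sel_kind) for sel_kind in RICHNESS]
    print(f"{arg_kind:12s}", row)
```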
Corpus evidence and GL type system
The use of corpus analysis raises crucial issues
concerning how to properly map the extracted patterns
onto the GL architecture of the lexicon

Observed evidence

a set of predicative pairs Σ = {σ1, …, σn} extracted from
a corpus
- σi = <read, book_obj>, σj = <eat, cake_obj>, etc.
What can we infer from the extracted contexts in Σ
about the semantic type system and the
compositional rules?



what is the type of the argument?
what is the type selected by the predicate?
what is the operation that allowed the predicate and the
argument to compose?
Corpus evidence and GL type system

Distributional Hypothesis



lexical items belonging to the same type are expected to
show similar syntagmatic distributions
differences in combinatorial distributions can be taken as
indicators of differences in type
Incremental data-driven type definition

top-down definition of repository of “shallow types” acting
as a priori constraints on the semantic type system


cf. Brandeis Shallow Ontology (Pustejovsky et al. 2004)
corpus-based definition of fine-grained types
emerging as abstractions over the combinatorial
patterns of lexical items
Brandeis Shallow Ontology
(Pustejovsky et al. 2004)
Corpus evidence and GL compositional
operations

Given GL architecture, we have to assume that each
observed context pair σ has been generated by the
combination of two different factors

(1) the semantic types of the elements of σ
(2) the semantic operations that drove the composition of σ

If σ represents the observational datum, (1) and (2)
are the two hidden parameters that we have to
discover
Key methodological consequence
Any attempt to arrive at a data-driven characterization of the
semantic type system cannot dispense with a careful analysis
of the compositional operations between types
Corpus processing and data extraction



Pilot experiment performed on a 20-million-word
corpus of written Italian
- subset of La Repubblica Corpus (Baroni et al. 2004)
The corpus has been automatically processed with
IDEAL+
- dependency-based parser for Italian (Bartolini et al.
2004)
502,404 V-OBJ pairs (σ) have been automatically
extracted with their frequency in the corpus
Verb + libro “book” as OBJ
leggere (read)
scrivere (write)
presentare (present)
...
How we proceed

The extracted patterns are used to build lexical sets
(LS) (cf. Hanks & Pustejovsky 2005)




nominal LS – the sets of the most “typical” nouns
occurring as OBJ of a given V
verbal LS – the sets of the most “typical” verbs with
which a given N occurs as OBJ
Typicality is measured by the log-likelihood (Dunning
1993) association score between V and N
LSs are used to investigate two separate but related
issues
- what is the type of an argument?
- what is the particular operation that allows an
argument to compose semantically with a certain
predicate?
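The log-likelihood association score used above to measure typicality can be sketched as follows, using the standard G² formulation over a 2x2 contingency table. The function name, its parameters, and the counts in the usage lines are hypothetical; the slides do not specify how the score was implemented.

```python
import math


def log_likelihood(f_vn: int, f_v: int, f_n: int, total: int) -> float:
    """G2 association score for a verb-noun pair.

    f_vn  : frequency of the V-OBJ pair
    f_v   : frequency of the verb in all V-OBJ pairs
    f_n   : frequency of the noun as OBJ in all V-OBJ pairs
    total : total number of V-OBJ pair tokens
    """
    observed = [
        [f_vn, f_v - f_vn],
        [f_n - f_vn, total - f_v - f_n + f_vn],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts, for illustration only.
print(log_likelihood(f_vn=10, f_v=100, f_n=100, total=1000))  # independent pair
print(log_likelihood(f_vn=50, f_v=100, f_n=100, total=1000))  # associated pair
```

When the pair occurs exactly as often as chance predicts, the score is zero; it grows as the observed frequency departs from the expected one.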
Investigating semantic types with LSs


We choose a verb vi that typically
selects for a target type τ
We identify the nominal LS of vi

the set of Ns that co-occur with that verb in
a certain argument position
[Diagram: verb vi selects type τ; the nominal LS of vi contains n1, n2, n3, …]
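As a rough sketch of the procedure, nominal and verbal lexical sets can both be derived from one table of scored V-OBJ pairs. The function `lexical_sets` and its `top_k` parameter are assumptions introduced here; the five scores are LL values taken from the tables in this presentation.

```python
from collections import defaultdict


def lexical_sets(scores, top_k=10):
    """Build nominal LSs (per verb) and verbal LSs (per noun) from pair scores."""
    by_verb = defaultdict(list)
    by_noun = defaultdict(list)
    for (verb, noun), score in scores.items():
        by_verb[verb].append((noun, score))
        by_noun[noun].append((verb, score))

    def rank(items):
        # Keep the top_k items with the highest association score.
        ordered = sorted(items, key=lambda pair: pair[1], reverse=True)
        return [word for word, _ in ordered[:top_k]]

    nominal = {verb: rank(items) for verb, items in by_verb.items()}
    verbal = {noun: rank(items) for noun, items in by_noun.items()}
    return nominal, verbal


scores = {
    ("leggere", "libro"): 225.44,
    ("leggere", "giornale"): 174.98,
    ("leggere", "lettera"): 96.77,
    ("scrivere", "libro"): 369.39,
    ("inviare", "lettera"): 922.22,
}
nominal, verbal = lexical_sets(scores)
print(nominal["leggere"])
print(verbal["libro"])
```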
Investigating semantic types with LSs
The case of “leggere” (read)

leggere “read”


selective environment prima facie fairly well
characterized in terms of its type
 complex functional type selecting for a complex,
dot-argument as its direct object
λy : phys • info λx : e_N [leggere(x, y)]
phys ● info

concrete entities that have an informational
content (e.g. book)
[ phys • info
  QUALIA = [ TELIC    = read(x, y, e1)
             AGENTIVE = write(z, y, e2) ] ]
Top 40 nouns in the LS of leggere
Noun                        LL value
libro “book”                225,44
giornale “newspaper”        174,98
articolo “article”          133,28
lettera “letter”            96,77
romanzo “novel”             76,63
testo “text”                58,34
documento “document”        56,42
intervista “interview”      52,37
comunicato “communiqué”     49,23
dichiarazione “statement”   48,07
pagina “page”               47,76
sceneggiatura “script”      44,17
riga “line”                 42,03
discorso “speech”           41,07
cartella “page”             40,64
messaggio “message”         36,10
relazione “report”          35,14
passo “passage”             34,60
resoconto “report”          30,04
parola “word”               29,71
frase “sentence”            28,75
sentenza “sentence”         25,93
motivazione “justification” 23,39
Freud                       19,96
Financial Times             19,40
omelia “sermon”             16,92
notizia “news”              16,14
saggio “essay”              16,04
missiva “missive”           15,85
telegramma “telegram”       14,97
poesia “poem”               14,77
verdetto “verdict”          14,62
brano “passage”             14,62
nota “note”                 14,51
opera “work”                14,20
Rimbaud                     14,19
sofisma “sophism”           14,19
Tuttosport                  14,19
scritta “writing, notice”   11,75
telex “telex”               11,59
How to proceed

From the fact that a N is included in the nominal lexical set
of leggere, we cannot simply infer that its type is phys ●
info

leggere has the ability not only to combine by pure selection,
but also to coerce the argument type
- cf. leggere Rimbaud “read Rimbaud”
leggere can itself undergo co-composition

In order to find out what the type of a N is, we inspect the
verbal LS of N, i.e. the verbs with which N most frequently co-occurs

[Diagram: verb vi selects type τ; the nominal LS of vi contains n1, n2, n3, …; each noun has its own verbal LS (n1: v1, v2, …; n2: v3, v4, …)]
Top 10 verbs in the LS of nouns
selected by leggere
libro “book”
Verb                        LL value
scrivere “write”            369,39
leggere “read”              225,44
pubblicare “publish”        124,94
presentare “present”        66,11
sfogliare “turn pages”      45,98
dedicare “dedicate”         37,42
riscrivere “rewrite”        25,56
tradurre “translate”        19,82
ristampare “reprint”        17,87
vendere “sell”              17,12

articolo “article”
Verb                        LL value
scrivere “write”            139,79
leggere “read”              133,28
pubblicare “publish”        103,38
inviare “send”              79,18
ricevere “get”              50,49
abrogare “cancel”           46,73
applicare “enforce”         45,56
dedicare “dedicate”         44,40
approvare “approve”         38,07
bocciare “reject”           24,60

romanzo “novel”
Verb                        LL value
scrivere “write”            188,77
leggere “read”              76,63
pubblicare “publish”        52,11
ristampare “reprint”        13,07
concepire “conceive”        11,61
intitolare “give a title”   10,26
pianificare “plan”          8,02
filmare “film”              6,79
comprare “buy”              6,76
finire “finish”             6,28

testo “text”
Verb                        LL value
pubblicare “publish”        63,13
approvare “approve”         61,26
votare “vote”               59,76
leggere “read”              58,34
modificare “modify”         58,01
scrivere “write”            55,01
redigere “write”            48,06
emendare “amend”            30,39
preparare “prepare”         25,37
diffondere “circulate”      22,79
Top 10 verbs in the LS of nouns
selected by leggere
lettera “letter”
Verb                        LL value
inviare “send”              922,22
scrivere “write”            812,93
ricevere “get”              122,51
spedire “send”              104,99
leggere “read”              96,77
mandare “send”              94,39
recapitare “deliver”        87,28
consegnare “deliver”        73,53
pubblicare “publish”        57,60
firmare “sign”              38,14

messaggio “message”
Verb                        LL value
inviare “send”              515,77
lanciare “send”             208,60
mandare “send”              149,36
ricevere “get”              70,27
consegnare “deliver”        68,27
trasmettere “transmit”      52,75
intercettare “intercept”    36,72
leggere “read”              36,10
portare “bring”             24,64
recapitare “deliver”        24,13
Generating types (1)
Specifying the phys ● info type


The type phys ● info does not suffice to account for
the whole syntagmatic distribution
Differences in syntagmatic distribution can be accounted
for in terms of QS specifications
QS can be used to generate more fine-grained types

(1)
libro “book”, articolo “article”, romanzo “novel”
: phys ● info ⊗Telic reading_events {read, reread, …}
  ⊗Agentive writing_events {write, rewrite, …}
  ⊗Agentive publishing_events {publish, print, …}
lettera “letter”, messaggio “message”
: phys ● info ⊗Telic reading_events {read, reread, …}
  ⊗Telic transmission_events {send, circulate, deliver, …}
  ⊗Agentive writing_events {write, compile, …}
  ⊗Agentive publishing_events {publish, …}
testo “text”, articolo “article”
: phys ● info ⊗Telic applying_events {apply, enforce, …}
  ⊗Agentive performative_events {approve, vote, …}
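One simple way to make "differences in syntagmatic distribution" concrete is to measure how much the verbal LSs of two nouns overlap. The sketch below uses Jaccard overlap, which is a choice made here for illustration and not a measure used in the slides; the verb lists are the top-10 verbal LSs reported above.

```python
def jaccard(a, b):
    """Share of verbs common to two lexical sets (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


verbal_ls = {
    "libro": ["scrivere", "leggere", "pubblicare", "presentare", "sfogliare",
              "dedicare", "riscrivere", "tradurre", "ristampare", "vendere"],
    "romanzo": ["scrivere", "leggere", "pubblicare", "ristampare", "concepire",
                "intitolare", "pianificare", "filmare", "comprare", "finire"],
    "lettera": ["inviare", "scrivere", "ricevere", "spedire", "leggere",
                "mandare", "recapitare", "consegnare", "pubblicare", "firmare"],
    "messaggio": ["inviare", "lanciare", "mandare", "ricevere", "consegnare",
                  "trasmettere", "intercettare", "leggere", "portare", "recapitare"],
}

for n1, n2 in [("libro", "romanzo"), ("lettera", "messaggio"), ("libro", "messaggio")]:
    print(n1, n2, round(jaccard(verbal_ls[n1], verbal_ls[n2]), 3))
```

On these lists, lettera and messaggio share their transmission verbs and overlap more with each other than either does with libro, which is in line with the separate QS specifications given in (1).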
Top 10 verbs in the LS of nouns
selected by leggere
giornale “newspaper”
Verb                          LL value
leggere “read”                174,98
scrivere “write”              83,46
stampare “print”              24,75
sfogliare “turn the pages”    20,19
leggiucchiare “read”          14,64
posare “put down”             14,37
querelare “bring an action”   14,37
rileggere “re-read”           11,51
attaccare “attack”            10,32
obbligare “force”             9,85

discorso “speech”
Verb                          LL value
pronunciare “pronounce”       328,26
riprendere “start again”      54,52
fare “make”                   48,35
tenere “give”                 46,97
leggere “read”                41,07
allargare “enlarge”           39,16
riaprire “reopen”             26,70
ascoltare “listen to”         24,11
rivolgere “address”           21,85
concludere “conclude”         16,65

intervista “interview”
Verb                          LL value
rilasciare “give”             294,57
concedere “give”              119,14
leggere “read”                52,37
dare “give”                   23,59
mandare “send”                16,51
pubblicare “publish”          15,65
rileggere “reread”            15,20
realizzare “make”             12,25
raccogliere “collect”         10,58
registrare “record”           9,37

dichiarazione “declaration”
Verb                          LL value
rilasciare “give”             913,25
fare “make”                   84,04
diffondere “make circulate”   63,68
leggere “read”                48,07
presentare “present”          45,91
firmare “sign”                45,78
sottoscrivere “endorse”       35,29
smentire “deny”               31,34
consegnare “deliver”          28,42
interpretare “interpret”      26,14
Generating types (2)
Discovering new types
(2)
intervista “interview”, discorso “speech”
: event ● info ⊗Agentive speech_events {pronounce, address, give a speech, …}
  ⊗Telic listening_events {listen, …}
(3)
giornale “newspaper”
: organization ● (phys ● info)
  ⊗Telic reading_events {read, ...}
  ⊗Agentive publishing_events {publish, print, …}
  ⊗Telic agentive_events {edit, attack, ...}
Conclusions so far …

Variations in the verbal lexical sets can
be an indicator of two main facts



difference in QS specification
difference in type
Our assumptions about what the type
of a N is are largely confirmed and
reflected by its syntagmatic behavior
Predictions about compositional
behavior of types (1)

Complex Nouns

a Complex Noun will compose either by pure selection,
with a dot selecting predicate, or by exploitation with a
natural or artifactual selecting predicate
TYPE SELECTED (N) (rows) by SELECTING TYPE (V) (columns)
              Natural        Artifactual    Complex
Natural       selection      introduction   introduction
Artifactual   exploitation   selection      introduction
Complex       exploitation   exploitation   selection

To test this prediction against the corpus data we use
the verbal LSs of the Ns that we assigned to the
following types

phys * info, event *info and organization * (phys * info)
Type coercion: dot exploitation (1)
phys-selecting verbs with phys*info nouns
libro
bruciare “burn”, portare “carry”
articolo
firmare “sign”, spostare “move”
romanzo
portare “carry”
testo
perdere “lose”, firmare “sign”
lettera
imbucare “post”, conservare “keep”, infilare “put”,
distruggere “destroy”, raccogliere “pick up”,
esibire “exhibit”, ritrovare “find again”, perdere
“lose”
messaggio
bruciare “burn”, firmare “sign”, portare “bring”,
conservare “keep”, infilare “put”
giornale
aprire “open”, posare “put down”, distribuire
“distribute”, mostrare “show”, portare “bring”
Type coercion: dot exploitation (2)
info-selecting verbs with phys*info nouns
libro
amare “love”, citare “quote”, studiare “study”
articolo
approvare “approve”, bocciare “reject”, citare
“quote”, votare “vote”, correggere “correct”,
ignorare “ignore”, commentare “comment”,
conoscere “know”
romanzo
tradurre “translate”
testo
approvare “approve”, votare “vote”, conoscere
“know”, analizzare “analyze”, presentare “present”,
discutere “discuss”, citare “quote”, difendere “defend”
spiegare “explain”, controllare “check”
lettera
censurare “censor”, scorrere “scroll”, riassumere
“summarize”, interpretare “interpret”, esaminare
“examine”, comprendere “understand”, spiegare
“explain”, ricordare “remember”, vedere “see”
messaggio
interpretare “interpret”, citare “quote”, analizzare
“analyze”, capire “understand”, spiegare “explain”,
decifrare “decipher”
Type coercion: dot exploitation (3)
info-selecting verbs with organization*(phys*info)
giornale
criticare “criticize”, censurare “censor”,
commentare “comment”, smentire “deny”
info-selecting verbs with event*info nouns
intervista
commentare “comment”, tradurre “translate”,
citare “quote”, giudicare “judge”, valutare
“evaluate”
discorso
interpretare “interpret”, commentare “comment”,
gradire “like”, contestare “question”,
giudicare “judge”, ripensare “rethink”
dichiarazione
smentire “deny”, interpretare “interpret”,
travisare “misrepresent”, valutare “evaluate”,
calibrare “graduate”
Type coercion: dot exploitation (4)
event-selecting verbs with event*info nouns
discorso
riprendere “start again with”, attendere “wait for”,
concludere “conclude”, terminare “finish”,
improvvisare “improvise”, interrompere
“interrupt”, continuare “go on with”, troncare
“cut”, avviare “start with”, completare “complete”,
cominciare “begin”, iniziare “start”, finire “finish”,
proseguire “go on with”, vedere “see”
intervista
ultimare “finish”, iniziare “start”, interrompere
“stop”, vedere “see”, bloccare “stop”, annunciare
“announce”
organization-selecting verbs with org*(phys*info) nouns
giornale
attaccare “attack”, querelare “prosecute”,
danneggiare “damage”, obbligare “force”, dirigere
“direct”, costringere “force”, lasciare “leave”
Asymmetries in dot exploitations (1)
articolo, testo are more info than phys
articolo
phys
firmare “sign”, spostare “move”
info
approvare “approve”, bocciare “reject”, citare “quote”,
votare “vote”, correggere “correct”, ignorare “ignore”,
commentare “comment”, conoscere “know”
testo
phys
perdere “lose”, firmare “sign”
info
approvare “approve”, votare “vote”, conoscere “know”,
analizzare “analyze”, presentare “present”,
revisionare “amend”, discutere “discuss”, censurare “censor”,
citare “quote”, decifrare “decipher”, difendere “defend”,
spiegare “explain”, controllare “check”
Asymmetries in dot exploitations (2)
articolo, testo are less phys than libro and lettera
libro
phys
bruciare “burn”, mandare “send”, portare “carry”
info
amare “love”, citare “quote”, studiare “study”
lettera
phys
imbucare “post”, conservare “keep”, infilare “put”,
distruggere “destroy”, raccogliere “pick up”, esibire
“exhibit”, ritrovare “find again”, perdere “lose”
info
riassumere “summarize”, interpretare “interpret”,
esaminare “examine”, comprendere “understand”,
spiegare “explain”
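The asymmetry can be given a rough number by counting, for each noun, how many of the verbs listed on the two slides above are info-selecting rather than phys-selecting. This is a crude illustration over verb types, not a measure proposed in the slides; `info_share` is a hypothetical helper, and the counts are tallied by hand from the lists above.

```python
# (phys-selecting verbs, info-selecting verbs) listed for each noun above.
counts = {
    "articolo": (2, 8),
    "testo": (2, 13),
    "libro": (3, 3),
    "lettera": (8, 5),
}


def info_share(phys: int, info: int) -> float:
    """Proportion of listed verbs that select the info component."""
    return info / (phys + info)


ranking = sorted(counts, key=lambda noun: info_share(*counts[noun]), reverse=True)
for noun in ranking:
    print(noun, round(info_share(*counts[noun]), 2))
```

On these counts testo and articolo lean towards info, libro is balanced, and lettera leans towards phys, matching the two slide headings.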
Introduction (human)
libro
accusare “accuse” → the PERSON who wrote the book
testo
difendere “defend”
lettera
condannare “condemn”
Introductions (phys)
intervista
leggere “read”, mandare “send”,
rileggere “reread”, pubblicare “publish”
discorso
leggere “read”
dichiarazione
consegnare “deliver”, leggere “read”,
firmare “sign”
Domain-shifting introductions (event)
libro
terminare “finish”, cominciare “start”
romanzo
finire “finish”, cominciare “start”, aprire
“open”
articolo
concludere “conclude”, iniziare “start”,
cominciare “begin”, terminare “finish”,
chiudere “close”
testo
completare “complete”, finire “finish”
lettera
concludere “conclude”, terminare
“finish”, interrompere “interrupt”, finire
“finish”
messaggio
concludere “conclude”, cominciare “start”,
finire “finish”
Predictions about compositional
behavior of types (2)

Dot selecting predicate

a dot-selecting predicate will compose either by pure
selection, with a matching dot-argument, or by
introduction, with natural and artifactual arguments
TYPE SELECTED (N) (rows) by SELECTING TYPE (V) (columns)
              Natural        Artifactual    Complex
Natural       selection      introduction   introduction
Artifactual   exploitation   selection      introduction
Complex       exploitation   exploitation   selection

To test this prediction against the corpus data we use
the nominal LSs of leggere
Dot-selecting predicate: leggere
selection
leggere un libro “book”, un articolo “article”, un
romanzo “novel”, una lettera “letter”
dot exploitation
leggere un giornale “newspaper”
introduction
phys
leggere la trama “plot”, la musica “music”, un
discorso “speech”
info
leggere la mano “hand”, leggere una lapide
“headstone”, un dispositivo “device”, un
contatore “meter”
phys and info
leggere l’anima “soul”, gli umori “mood”
Accounting for different senses of
leggere
leggere una radiografia “an x-ray”, il
grafico “a graph”, un sintomo “a symptom”,
una favola “a tale” …
Corpus evidence helps us to...



confirm or falsify our assumptions
about what the semantic type of a
given N is
refine the representation of QS
empirically test our assumptions about
compositional operations of coercion
and co-composition.
Concluding remarks

Mutual feeding between corpus data and
models of the lexicon



an architecture of the lexicon like GL can provide the
interpretative key for various corpus data
corpus data can help to anchor the study of lexical
dynamics and architecture on empirical evidence
(eventually enriching the model)
Future research


extend the analysis to other syntagmatic relations
(e.g. subj, modifiers, etc.)
extend the analysis to other semantic types