From Lexicon to Text: Pre-Target Structures Annotation in an L2 Italian Corpus
*Université Paris VIII
°Università di Salerno
Third International Lablita Workshop in Corpus Linguistics
June 2008
- Today we present the first results of a classification of pre-target structures collected during a broader research project on the acquisition of L2 Italian syntax, based on a small test corpus of 23,000 words of written texts.
- The main goal of the project was to compare learners' productions with those of native speakers with respect to the use and frequency of Noun and Verb Phrases in different types of text (Turco in press; Voghera 2008; Policarpi, Rombi, Voghera in press).
Contrasting needs…
Within this context we were looking for a tagging system:
- which allowed a straightforward comparison between non-native and native productions;
- which made minimal use of ad hoc categories for the description of L2 texts, so as to make them truly comparable with those of native speakers;
- which allowed the retrieval of non-native structures.
- Consequently, we adapted a tagging system that was not originally designed for L2 analysis: AN.ANA.S., an XML-based system developed within the Tree-bank Project of the University of Salerno (cf. www.parlaritaliano.it).
- What initially seemed a tentative step turned out to be a challenge: to conceive a system that could preserve the richness of the original annotation while accounting for the specificity of L2 texts.
AN.ANA.S.
AN.ANA.S. (Annotation and Analysis of Syntax) is a syntactic annotation system based on a manual approach (Voghera et al. 2004, 2005).
Its main properties are:
1. alignment of the syntactic annotation with the signal, so as to obtain a multilevel representation;
2. focus on intraclausal relations;
3. flexible structures conceived for the annotation of a wide range of texts: spoken and written, dialogues and monologues.
AN.ANA.S. L2 - TAGSET
- We are going to show you the AN.ANA.S. L2 tagset and its DTD.
- A DTD (Document Type Definition) is a set of markup declarations written in a special-purpose syntax. DTDs are part of the original XML specification: a DTD specifies which elements and attributes may be used in a particular type of XML document and how they may be structured (W3C: World Wide Web Consortium).
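To make the idea of a DTD concrete, here is a minimal sketch in Python. Everything in it is hypothetical and is NOT the actual AN.ANA.S. L2 DTD: the element names, their nesting and the `code` attribute are assumptions. Only the `WF` attribute with the values T and F echoes the WF=T/F flag used in this presentation's annotation remarks. Python's standard library parses a DTD but does not validate against it, so the sketch restates two of its constraints (allowed children and allowed `WF` values) as a hand-written check.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD fragment, NOT the actual AN.ANA.S. L2 DTD: four nested
# levels, each carrying a WF (well-formed) attribute restricted to T or F.
DOCTYPE = """<!DOCTYPE text [
<!ELEMENT text (sentence+)>
<!ELEMENT sentence (clause+)>
<!ELEMENT clause (phrase+)>
<!ELEMENT phrase (#PCDATA)>
<!ATTLIST text WF (T|F) #REQUIRED code CDATA #IMPLIED>
<!ATTLIST sentence WF (T|F) #REQUIRED>
<!ATTLIST clause WF (T|F) #REQUIRED>
<!ATTLIST phrase WF (T|F) #REQUIRED type CDATA #IMPLIED>
]>
"""

# The standard library reads the DTD but does not validate against it, so
# two of its constraints are restated by hand (cardinality such as "+" is
# not checked here).
ALLOWED_CHILDREN = {
    "text": {"sentence"},
    "sentence": {"clause"},
    "clause": {"phrase"},
    "phrase": set(),
}
WF_VALUES = {"T", "F"}


def check(elem, problems=None):
    """Collect violations of the hand-coded constraints under elem."""
    if problems is None:
        problems = []
    if elem.tag not in ALLOWED_CHILDREN:
        problems.append(f"undeclared element <{elem.tag}>")
        return problems
    if elem.get("WF") not in WF_VALUES:
        problems.append(f"<{elem.tag}> has WF={elem.get('WF')!r}, expected T or F")
    for child in elem:
        if child.tag not in ALLOWED_CHILDREN[elem.tag]:
            problems.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
        check(child, problems)
    return problems


SAMPLE = DOCTYPE + """<text code="BEGNARR" WF="T">
  <sentence WF="T">
    <clause WF="T">
      <phrase type="NP" WF="F">la spiaggia più bello del mondo</phrase>
    </clause>
  </sentence>
</text>"""

root = ET.fromstring(SAMPLE)
print(check(root))  # prints []
```

A validating parser (for example the DTD support in lxml) would enforce the full declarations, including the "+" cardinality, without the hand-written check.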
XGATE
- For the annotation and analysis of our corpus we used the XGate application (Cutugno, D’Anna 2006).
- XGate’s main purpose is to make text coding and processing easier and more user-friendly by relying on XML:
– the Editor function allows us to create and modify XML files once a DTD has been defined;
– the Query function allows us to query the databases in order to obtain quantitative results.
AN.ANA.S. L2 allows the retrieval of pre-target structures:
- per level of syntactic encoding:
– Sentence
– Clause
– Phrase
- per level of textual encoding:
– Text
– Paragraph
- Finally, it allows the retrieval of lexical deviations affecting phrase heads.
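Retrieval per level can be pictured as a simple query over the annotated XML. The following is a minimal sketch in Python, not XGate's query language: the element names, the `WF="F"` flag marking a pre-target structure and the `head_WF` attribute used for lexical deviations in phrase heads are all hypothetical assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated fragment. The element names, the WF="T"/"F" flag and
# the head_WF attribute used for lexical deviations are assumptions for this
# sketch, not the actual AN.ANA.S. L2 tagset or XGate's query language.
DOC = """<text code="BEGNARR" WF="T">
  <paragraph WF="T">
    <sentence WF="T">
      <clause WF="T">
        <phrase type="NP" WF="F">la spiaggia più bello del mondo</phrase>
      </clause>
    </sentence>
    <sentence WF="T">
      <clause WF="T">
        <phrase type="NP" WF="T">il sole</phrase>
        <phrase type="VP" WF="T" head_WF="F">sta lucidando</phrase>
      </clause>
    </sentence>
  </paragraph>
</text>"""

LEVELS = ("text", "paragraph", "sentence", "clause", "phrase")


def pre_target_counts(root):
    """Number of elements flagged WF="F" at each level of encoding."""
    # Element.iter(tag) visits root itself as well as all its descendants.
    return {level: sum(1 for e in root.iter(level) if e.get("WF") == "F")
            for level in LEVELS}


def lexical_deviations(root):
    """Text of the phrases whose head is flagged as a lexical deviation."""
    return [p.text for p in root.iter("phrase") if p.get("head_WF") == "F"]


root = ET.fromstring(DOC)
print(pre_target_counts(root))
# prints {'text': 0, 'paragraph': 0, 'sentence': 0, 'clause': 0, 'phrase': 1}
print(lexical_deviations(root))  # prints ['sta lucidando']
```

Keeping the lexical flag separate from `WF` lets a grammatically well-formed phrase (such as *sta lucidando*) still be retrieved as a lexical deviation.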
Interlanguage
- The present research has been developed within the framework of the functional approach applied to L2 studies (Huebner 1983; Long/Sato 1984; Givón 1984; Tomlin 1984; Sato 1990; Dittmar 1992; Perdue 1990, 1993; Klein/Perdue 1992; Giacalone Ramat/Crocco 1995) and from the perspective of Interlanguage (IL) (Selinker 1972, 1992).
- IL is considered as a series of grammars developed by the language learner at different points in the L2 acquisition process: “a separate linguistic system based on the observable output which results from a learner’s attempted production of a target language norm” (Selinker 1972: 214).
- IL grammar can be «systematic», «permeable», «transitional» and «discrete» (Selinker 1972; Adjemian 1976; Selinker 1992; Perdue 1993, etc.).
- L2 acquisition is systematic and, to a large extent, universal: it reflects how cognitive mechanisms control acquisition, irrespective of the learners' personal background, their mother tongue, or the setting in which they learn.
- ILs at different stages of acquisition/learning present systematic linguistic features that can be described in terms of pre-target structures.
Pre-target Structures (1)
1) Structures perceived as ungrammatical with respect to the target language
ho nato 5 marzo
(I) have born 5th March
instead of
sono nato il 5 marzo
‘I was born on 5th March’
mi ha piaciuto
me has liked
instead of
mi è piaciuto
‘I liked’
me va matto
me goes mad
instead of
vado matto
‘I go mad’
in la casa
[BEGNARR]
instead of
nella casa
‘in the house’
Pre-target Structures (2)
2) Structures that match the grammar of the target language but do not fit the context and/or do not convey the intended meaning
Sono molto buono
I am very good
instead of
Sto molto bene
I am very well
Vorrei alti gradi all’Università
I’d like high degrees at the University
alti gradi (high degrees) instead of voti alti (high marks)
[BEGNARR]
Un soggiorno ammobiliato e comodo e non più grande.
a living room furnished is(?) comfortable and not more big.
Una cucina pratica e l’appartamento è non più caro.
a functional kitchen room and the flat is not more expensive
più grande (more big) instead of troppo grande (too big)
più caro (more expensive) instead of troppo caro (too expensive)
[BEGDESC]
The Corpus
- We have tagged a test corpus of 23,000 written words.
- The quantitative analysis we present here is based on about 6,000 words per learning level:
– Beginner: 6,179
– Intermediate: 6,136
– Advanced: 6,151
- Collection of narrative, descriptive and argumentative compositions: home-written productions
- Subjects: 50 undergraduate students taking optional Italian language courses as part of a combined honours degree (Greenwich University, London)
- Different mother tongues (L1s)… English, Spanish, French, Portuguese, Greek, etc.
- …and English as a second language
Some Remarks
As is well known, annotation is not a straightforward process.
1) A pre-target structure may have different scopes, that is, it may involve different levels of codification. There are many clearly defined cases of pre-target structures, but there are also many others where a structure necessarily has to be tagged at more than one level.
ex. il giardino dovere Ø grande e balla con molta fiori
the garden must-INF be-Ø big and dance-3PERS/SING PRES SIMP with many-FEM/SING flowers-FEM/PLUR
[BEGNARR]
2) We believe that there is not just one grammar of the target language. We have kept away from the tempting prescriptive approach, typically based on discrete and decontextualized features of language. Besides, as far as Italian is concerned, we must take into account the deep diatopic differences that can interfere with the learning process (Dal Negro/Molinelli 2002; Lepschy 2005).
ex. Ci stavano delle differenze
There-EXIST stay-3PERS-PLUR/IMPERF some differences
[ADVNARR]
ci stavano (there stayed) instead of (???) c’erano (there were)
Levels of pre-target structures
TEXT LEVEL
Incipit of a letter
Tanti saluti da Pheonix.
Many regards from Phoenix
Visitavamo nostri amici per una settimana in albergo.
We used to visit the-Ø our friends for a week at the-Ø hotel
[BEGNARR]
PARAGRAPH LEVEL
In la casa le camere dovere grande e splendidamente decorate con i bagni. Che chiamo un bellissimo e speciale casa abitare in*.
in the house the rooms must BE-Ø big and splendidly decorated with the bathrooms. That (I) call a beautiful-MASC and special house-FEM to live in.
*instead of
quello che io considero essere una casa bellissima e speciale in cui abitare
what I consider to be a beautiful and special house to live in
[BEGDESC]
SENTENCE LEVEL
Non potrei nuotare e la mia mamma ha
deserta insegnarla.
(I) could not swim and the my mother has
deserted*-FEM teach her (PRON-COREF of
“swim”)
*instead of
e la mia mamma ha desiderato insegnarmelo (?)
and my mother wished to teach it to me (?)
[BEGNARR]
CLAUSE LEVEL
Il giorni più belle è mia sorella giornata del matrimonio*
the-MASC/SING most beautiful-FEM/PLUR day-MASC/PLUR is my sister
day of the wedding
*instead of
la giornata del matrimonio di mia sorella
my sister’s wedding day
[BEGDESC]
Tutta la classe hanno mostrato i loro piatti e ha giudicato dalla mia preside
The whole class have shown their dishes and has judged by my headmaster
*instead of
…è (stata) giudicata dalla mia preside
…it is/has been judged by my headmaster
[INTNARR]
PHRASE LEVEL
la spiaggia più bello del mondo
the-FEM/SING beach-FEM/SING most beautiful-MASC/SING in the world
[BEGNARR]
Sono scrittura voi dall’Italia*
(I) am writing-NOUN you-PERS/PRON/PLUR from Italy
*instead of
Vi sto scrivendo dall’Italia
I’m writing to you from Italy
[BEGNARR]
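The agreement error in *la spiaggia più bello* can be detected mechanically once each word carries gender and number features. The sketch below is a toy illustration only: the mini-lexicon is hand-written and hypothetical (a real system would take features from a morphological analyser), and the caller must say which word is the head noun.

```python
# Toy lexicon with hand-assigned (gender, number) features. A real system
# would obtain these from a morphological analyser; the entries here are
# illustrative only.
FEATURES = {
    "la": ("FEM", "SING"),
    "il": ("MASC", "SING"),
    "spiaggia": ("FEM", "SING"),
    "giardino": ("MASC", "SING"),
    "bella": ("FEM", "SING"),
    "bello": ("MASC", "SING"),
}


def agreement_mismatches(words, head):
    """List (word, expected, found) for words that disagree with the head noun.

    Words absent from FEATURES (such as 'più') are skipped, and the head
    noun has to be supplied by the caller.
    """
    expected = FEATURES[head]
    return [(w, expected, FEATURES[w]) for w in words
            if w != head and w in FEATURES and FEATURES[w] != expected]


print(agreement_mismatches(["la", "spiaggia", "più", "bello"], head="spiaggia"))
# prints [('bello', ('FEM', 'SING'), ('MASC', 'SING'))]
print(agreement_mismatches(["la", "spiaggia", "più", "bella"], head="spiaggia"))
# prints []
```

Each modifier inside the phrase is one more agreement decision, which gives a rough idea of how many choices a learner has to get right at this level.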
LEXICON
Il sole sta lucidando* e fa caldo
The sun is polishing and it is warm
[BEGNARR]
*instead of BRILLARE = TO SHINE
Ho picchiato una macchina parcheggiata
(I) beat* -1PERS/SING PRES PERF a
parked car down
[BEGNARR]
*instead of TAMPONARE (?)= TO HIT
What data and what for?
- We present here the first quantitative data on pre-target structures per linguistic level and per learning level.
- Since we have been working on a test corpus, we do not claim statistical significance for our data. Rather, we present a trend analysis of the relative distribution of different pre-target structures.
How many pre-target structures?
- We found 1,271 pre-target structures, with the following distribution across the three learning levels:
[Figure: Pre-Target Structures per Learning Level (pie chart of BEG, INT and ADV; shares shown: 51%, 32% and 17%, the advanced level having the lowest share)]
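A trend analysis of relative distributions rests on simple arithmetic, sketched below in Python. It uses only figures reported in this presentation (the word counts per learning level and the 1,271 pre-target structures) and assumes, as a labelled assumption, that the 1,271 structures come from the three sub-corpora totalling 6,179 + 6,136 + 6,151 words. Per-level counts of structures are not reproduced here, so the per-level shares are computed only for the word counts.

```python
# Word counts per learning level, as reported for the quantitative analysis.
WORDS = {"BEG": 6179, "INT": 6136, "ADV": 6151}
TOTAL_PRE_TARGET = 1271  # assumed to come from these three sub-corpora


def relative_distribution(counts):
    """Percentage share of each key in a dict of raw counts."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}


def rate_per_thousand(n_structures, n_words):
    """Pre-target structures per 1,000 words of text."""
    return 1000 * n_structures / n_words


total_words = sum(WORDS.values())
print(total_words)  # prints 18466
print(round(rate_per_thousand(TOTAL_PRE_TARGET, total_words), 1))  # prints 68.8
# The three sub-corpora are nearly the same size, so each level's share of
# the pre-target structures is not distorted by differences in text length.
print({k: round(v, 1) for k, v in relative_distribution(WORDS).items()})
# prints {'BEG': 33.5, 'INT': 33.2, 'ADV': 33.3}
```

Because each learning level contributes roughly a third of the words, raw percentages across levels can be compared directly; with unbalanced sub-corpora, rates per 1,000 words would be the safer basis.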
[Figure: Pre-Target Structures per Linguistic Level (pie chart of TEXT, PARAGRAPH, SENTENCE, CLAUSE and PHRASE; shares shown: 51%, 16%, 12%, 11% and 2%)]
Pre-Target Structures per Linguistic Level & per Learning Level
Even though advanced learners produce the fewest pre-target structures, the types are basically the same.
[Figure: bar chart of pre-target structures per linguistic level (TEXT, PARAGRAPH, SENTENCE, CLAUSE, PHRASE) for BEG, INT and ADV learners; y-axis from 0 to 60]
LEXICON
- Pre-target structures affect the lexical level in 9% of cases. As expected, the number at the advanced level is nearly half that at the beginner level, although there is no linear progression from beginners to advanced learners.
[Figure: bar chart of lexical pre-target structures (LEXICON) for BEG, INT and ADV; y-axis from 0% to 12%]
- We can observe that pre-target structures are similarly distributed across the three learning levels: from the very beginning, learners seem to have relatively good control of the higher levels of textual and syntactic planning, i.e. the sentence and clause levels.
- On the contrary, all learners seem to perform less well at the phrase level.
- This supports results from studies on other languages (e.g. English: Kroll 1990), where it has been found that learners may exhibit varying degrees of control over different areas of writing: “We cannot predict students’ ability to perform in one area on the basis of their performance in the other area” (Kroll 1990: 150).
Hypotheses
H1: We find fewer pre-target structures at the textual level because textual competence is related to higher education and strictly dependent on the type of writing assessment. Learners were culturally relatively homogeneous.
- Considering the type of text and the learners we have taken into account, the text level can be seen as the least “marked” level for learning, for three reasons:
1. Cultural reason: common literacy tradition
2. Linguistic reason: common textual-literary tradition and typological proximity
3. Psychological reason: transferability and learner’s perception
– (Eckman 1977, 1985; Kellerman 1979)
Hypotheses
H2: In Italian it is at the phrase level that most of the choices related to grammatical categories must be made: gender, number, definiteness, case or preposition choice…
… all this unavoidably leads to the production of a higher number of deviations from the target structures.
- Our data support the idea that, at the first stages, linguistic learning goes from the top planning levels to the bottom ones. This means that learners take advantage of the textual frame to offset the deficits that may affect lower levels (e.g. the phrase).
- An ill-structured phrase receives significance from a well-formed textual structure. Let’s look at the following example…
Example
Abbiamo parlati tutto il giorno e la notte, e da
allora, noi amore l’altro* e ci sposiamo.
We have spoken-PLUR all day and all night long, and
since then, we love-NOUN the other and we get
married
*instead of
ci amiamo
we love each other
[INTNARR]
- On the contrary, a well-formed phrase loses significance if inserted in an ill-formed textual structure.
Incipit of a letter
Tanti saluti da Pheonix.
Many regards from Phoenix
Visitavamo nostri amici per una settimana in
albergo.
We used to visit the-Ø our friends for a week at the-Ø hotel
[BEGNARR]
Final remarks and comments
- As often happens, research opens up new questions and insights:
as far as annotation is concerned, AN.ANA.S. L2 has done a pretty good job. However, we need to formalize the possibility of tagging a structure at more than one level. If this becomes feasible, we will be able to distinguish deviant structures with a local and/or a global scope.
Local and global are not mutually exclusive.
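One way to formalize multi-level tagging is to let each pre-target structure carry a set of level tags instead of a single one. The Python sketch below is a hypothetical data model, not part of AN.ANA.S. L2: the class and function names are invented for illustration, and the level sets assigned to the examples are assumptions (the examples themselves are taken from this presentation, where they are described as involving more than one level). It deliberately does not try to classify scope as local or global, since that distinction still has to be defined.

```python
from dataclasses import dataclass

LEVELS = {"LEXICON", "PHRASE", "CLAUSE", "SENTENCE", "PARAGRAPH", "TEXT"}


@dataclass
class PreTarget:
    """A pre-target structure that may be tagged at several levels at once."""
    form: str
    levels: frozenset
    code: str = ""  # text code such as BEGNARR or BEGDESC

    def __post_init__(self):
        self.levels = frozenset(self.levels)
        if not self.levels:
            raise ValueError(f"{self.form!r} needs at least one level")
        unknown = self.levels - LEVELS
        if unknown:
            raise ValueError(f"unknown level(s) for {self.form!r}: {sorted(unknown)}")


def tagged_at_several_levels(items):
    """Keep only the structures that carry more than one level tag."""
    return [p for p in items if len(p.levels) > 1]


# Level assignments below are illustrative assumptions.
items = [
    PreTarget("in la casa", {"PHRASE"}, "BEGNARR"),
    PreTarget("il giardino dovere Ø grande e balla con molta fiori",
              {"PHRASE", "CLAUSE"}, "BEGNARR"),
    PreTarget("Vorrei avere Ø mio ufficio nella mia casa, in modo da potrei studiare",
              {"PHRASE", "CLAUSE"}, "BEGDESC"),
]

for p in tagged_at_several_levels(items):
    print(p.code, sorted(p.levels))
# prints BEGNARR ['CLAUSE', 'PHRASE']
# prints BEGDESC ['CLAUSE', 'PHRASE']
```

In an XML encoding, the same idea could be expressed with a space-separated attribute (for instance an NMTOKENS attribute such as levels="PHRASE CLAUSE" declared in the DTD); that attribute name is likewise hypothetical.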
- From a linguistic viewpoint, even though we know that both top-down and bottom-up strategies are at work in language learning (among others, Selinker et al. 2004), it could be interesting to explore this top-down frequency pattern of pre-target structures:
– by analyzing pre-target structures across different text types;
– by moving from a test corpus to a larger corpus;
– by comparing spoken data with written data;
– by comparing the acquisitional stages in L2 speaking and writing (e.g. the L2 Italian oral descriptions of Progetto Pavia).
Last but not least
- Perhaps we could have better entitled our contribution:
From Text to Lexicon
rather than
From Lexicon to Text
Bibliography
Adjemian, C. (1976). “On the nature of interlanguage systems”. Language Learning, 26(2), 297-320.
Cutugno, D’Anna (2006). “Limiti e complessità del recupero delle informazioni da tree-bank sintattiche”. Atti del convegno della SLI, Vercelli, September 2006.
Dal Negro, S., Molinelli, P. (2002). Comunicare nella torre di Babele. Repertori plurilingui in Italia oggi. Roma: Carocci.
Dittmar, N. (1992). “Grammaticalization in second language acquisition”. Studies in Second Language Acquisition, 14, 249-257.
Eckman, F. R. (1977). “Markedness and the contrastive analysis hypothesis”. Language Learning, 27, 315-330.
Eckman, F. R. (1985). “Some theoretical and pedagogical implications of the markedness differential hypothesis”. Studies in Second Language Acquisition, 7, 289-307.
Givón, T. (1984). On Understanding Grammar. New York: Academic Press.
Huebner, T. (1983). A Longitudinal Analysis of the Acquisition of English. Ann Arbor, MI: Karoma.
Kellerman, E. (1979). “Transfer and non-transfer: where we are now”. Studies in Second Language Acquisition, 2, 37-57.
Klein, W., Perdue, C. (1992). Utterance Structure: Developing Grammars Again. Amsterdam: Benjamins.
Kroll, B. (1990). Second Language Writing: Research Insights for the Classroom. Cambridge: Cambridge University Press.
Lepschy, G. (2005). “Lo standard”. In Lepschy, A. L., Tamponi, A. R. (eds.), Prospettive dell’italiano come lingua straniera. Perugia: Guerra, 15-21.
Long, M. H., Sato, C. J. (1984). “Methodological issues in interlanguage studies: an interactionist perspective”. In Davies, A., Criper, C., Howatt, A. P. R. (eds.), Interlanguage (pp. 253-280). Edinburgh: Edinburgh University Press.
Perdue, C. (1990). “Complexification of the simple clause in the narrative discourse of adult language learners”. Linguistics, 28, 983-1009.
Perdue, C. (1993). Adult Language Acquisition: Cross-Linguistic Perspectives. Cambridge: Cambridge University Press.
Policarpi, Rombi, Voghera (in press). Classi lessicali e strategie sintattiche: nomi e verbi in sincronia e diacronia. Accepted at the SILFI 2008 Conference.
Sato, C. J. (1990). “Origins of complex syntax in interlanguage development”. Studies in Second Language Acquisition, 10, 371-395.
Selinker, L. (1972). “Interlanguage”. International Review of Applied Linguistics, 10, 209-231.
Selinker, L. (1992). Rediscovering Interlanguage. New York: Longman.
Selinker, L. et al. (2004). “Linguistic structure with processing in second language research: is a ‘unified theory’ possible?”. Second Language Research, 20, 77-94.
Tomlin, R. S. (1984). “The treatment of foreground-background in the on-line descriptive discourse of second language learners”. Studies in Second Language Research, 9, 49-83.
Turco, G. (in press). “Complessità sintattica nell’italiano scritto L2”.
Voghera, M., Cutugno, F. (2004). “AN.ANA.S.: Analisi sintattica e annotazione XML a contatto”. In Albano Leoni, F., Cutugno, F., Pettorino, M., Savy, R. (eds.), Il parlato italiano. Atti del Convegno Nazionale. Napoli: D'Auria Editore, M03.
Voghera, M., Basile, G., Cutugno, F., Fiorentino, G. (2005). “Sintassi in AN.ANA.S.”. In Albano Leoni, F., Giordano, R. (eds.), Italiano Parlato. Analisi di un dialogo. Napoli: Liguori, 187-209.
Voghera, M. (2008). “La grammatica nei testi”. In Lepschy, A. L., Ledgeway, A. (eds.), Didattica della lingua italiana: testo e contesto.
W3C: World Wide Web Consortium, http://www.w3.org/TR/REC-xml (accessed 27 May 2008).
www.parlaritaliano.it
Errors at the phrase level
- Errors concerning the head lexeme or properties of the head lexeme
- Errors concerning agreement and/or word order within the phrase
Examples
NP -> la bella colore
VP -> ha <deserta> insegnarla = desiderato
valency errors…
PP
Errors at the clause level
- Government of the subordinating connective: in modo da potrei instead of in modo da poter
- In-between phrases
– Per proteggere la macchina della pioggia
- Verb valency:
– Ho voglia un campo di calcio (intended: Voglio un campo di calcio; nominal transposition)
– Non piace essere disordinata
– [not it likes to be messy]
– La Ø più importante è che ….
– [the-FEM most important Ø is that …]
Errors at the sentence level
- In-between clauses: lack of subordinators/coordinators
– Ex.: In la casa le camere dovere grande e splendidamente decorate con i bagni. Che chiamo un bellissimo e speciale casa abitare in. (transl. a special house to live in ?)
– Ex.: Il giorno più bello della mia vita è il giorno <0> ho incontrato il mo ragazzo
– Vorrei un giardino <in modo da potrei> godere più meglio: + INF
– Gradisco fare il giardinaggio <in modo da mi assicurerò> che ho lotti dei fiori in esso <0> <da potrei godere> più meglio la sera
Errors at the paragraph level
Errors at the text level
- Copy p. 11 ???
- These errors can be further classified into the traditional categories used by Ellis…
- Some examples
APPENDIX
- Pre-target structures can be deviant structures with different scopes, i.e. they must be considered deviant at a local or a global level.
Local vs. Global
- Usually a deviant structure is considered to have a local scope when…. + example
– Ho nato 5 marzo 1983
– [(I) have born 5th March 1983]
Usually a deviant structure is considered to have a global scope when…. + example
But in real linguistic parole,
local and global are not mutually exclusive:
example of a local deviation reflected at the global level.
- The same pre-target structure may involve different levels of linguistic codification: from lexicon to text.
- Example
Text tagged with XGate
some comments on recursion and dependency levels….
DISCARDED
– Ex. Abbiamo bevuto molte tutti il tempo, era molto brilliante. Abbiamo le fotografie molte.
[BEGNARR]
– We have drunk a lot for a long time, it was very brilliant. We have the photos-MASC/PLUR much-FEM/PLUR
Remarks
- Doubt 1: the pre-target structure involves more than one linguistic level
EX. Vorrei avere Ø mio ufficio nella mia casa, in modo da potrei studiare*.
[(I) like-1PERS/SING/COND to have my office in my house so to (I) can-1PERS/SING/COND study] ???
*in modo da poter studiare / in modo che possa studiare
[BEGDESC]
Doubt 2: in some cases, which linguistic level should be tagged as WF=F?
Ex. ho deciso di andare al Casino giocare e scommettere al Roulette
[(I) decide-PRES PERF go-INF to the Casino play-INF and bet on the Roulette]
[INTNARR]
Remarks
- Doubt 3: whether a case should be tagged as well-formed or not (WF=T/F)
Abbiamo bevuto molte tutti il tempo, era molto brilliante. Abbiamo le fotografie molte.
We have drunk a lot for a long time, it was very brilliant. We have the photos-MASC/PLUR much-FEM/PLUR
[BEGNARR]
Remarks
- Nonetheless, whenever feasible, we try to preserve the correctness of the clause as much as possible.