HUMANITIES COMPUTING D: LEXICOGRAPHY & COMPUTERS
  Electronic dictionaries
  WordNet

ELECTRONIC DICTIONARIES
  Computational tools are no longer used only to produce paper dictionaries, but also to develop new kinds of dictionaries that support new forms of search.

DICTIONARIES FOR ENGLISH IN ELECTRONIC FORM
  Oxford English Dictionary, second edition
  Oxford Talking Dictionary
  Concise Oxford Dictionary
  Learner dictionaries:
    Longman Dictionary of Contemporary English (LDOCE)
    Collins COBUILD English Dictionary

CONCISE OXFORD DICTIONARY
  SEARCH:
    Headword search (with *)
    Hypertext search
    Full text search (also of phrases / groups)
  FILTERS: etymology, phrasal verbs, suffixes

COLLINS: COBUILD
  Available from: http://www.biblio.unitn.it/BancheDati/BancheDati.asp

ELECTRONIC DICTIONARIES FOR ITALIAN
  Il VELI
  Zanichelli: CD-ROM Multilingue, Scaffale Elettronico
  Devoto-Oli
  Garzanti: IPA, "talks"

DEVOTO-OLI

EXAMPLE: DEVOTO-OLI
  Normal search:
    Citation forms (incremental)
    Hyperlinks
    Definition / inflection
    Synonyms / antonyms
  Advanced search
  Not available: pronunciation; citations?
  Limited: historical information

DEVOTO-OLI: SYNONYMS AND ANTONYMS

EXAMPLE: ZINGARELLI INTERATTIVO

MRDs
  Important distinction:
    Dictionaries that can be consulted electronically
    MACHINE READABLE dictionaries
    MACHINE TRACTABLE dictionaries
  Particularly useful: dictionaries created for EFL:
    LDOCE
    COBUILD
  More ambitious project: ODE in XML

EXAMPLE: ODE on CD-ROM (in XML)
  An example of a lexicographic database in XML (= extremely machine tractable)

ODE IN XML: OVERVIEW

ODE IN XML: FORMAT OF THE ENTRIES
  <se>
    <cn>815750</cn>
    <hg> <hw>stock</hw> </hg>
    <s1>
      <ps>noun</ps>
      <s2 num="1">
        <df>the goods or merchandise kept on the premises of a shop or warehouse and available for sale or distribution:</df>
        <ex>the store has a very low turnover of stock</ex>
      </s2>
      <s2 num="2"> …… </s2>
    </s1>
    <s1>
      <ps>adjective</ps>
      …..
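
To make "machine tractable" concrete, here is a minimal sketch of how an entry in the format shown above could be read with Python's standard library. The XML string is a cut-down, well-formed copy of the slide's excerpt with the elided parts simply left out; the function name `senses` and the tuple layout are assumptions made for this illustration, not part of the ODE distribution.

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed copy of the slide's entry; elided parts are omitted.
ENTRY = """
<se>
  <cn>815750</cn>
  <hg><hw>stock</hw></hg>
  <s1>
    <ps>noun</ps>
    <s2 num="1">
      <df>the goods or merchandise kept on the premises of a shop or warehouse and available for sale or distribution:</df>
      <ex>the store has a very low turnover of stock</ex>
    </s2>
  </s1>
</se>
"""


def senses(entry_xml):
    """Return one (headword, part of speech, sense number, definition, examples) tuple per <s2>."""
    root = ET.fromstring(entry_xml)
    headword = root.findtext("hg/hw")
    rows = []
    for s1 in root.findall("s1"):          # one <s1> block per part of speech
        pos = s1.findtext("ps")
        for s2 in s1.findall("s2"):        # one <s2> block per numbered sense
            examples = [ex.text for ex in s2.findall("ex")]
            rows.append((headword, pos, s2.get("num"), s2.findtext("df"), examples))
    return rows


rows = senses(ENTRY)
for row in rows:
    print(row)
```

Because every piece of information sits in its own element, a query such as "all noun senses with at least one example" reduces to a filter over these tuples.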

ODE IN XML: NLP INFORMATION
  <nlp>
  </nlp>
  <sup>merchandise</sup>
  <ss>Commerce</ss>
  <morph id="01">
    <mu sy="NN"> <inf>stock</inf> <ph>stQk</ph> </mu>
    <mu sy="NNS"> <ph>stQks</ph> </mu>
  </morph>

ELDIT (Elektronisches Lern(er)wörterbuch Deutsch-Italienisch – Dizionario elettronico per apprendenti italiano-tedesco)
  An example of a dictionary:
    Designed for learning
    Born in electronic form
  Lecture on ELDIT: 14/5

WordNet

SEMANTICS & LEXICON: A SUMMARY
  WORD-FORMS: "eat", "eats", "ate", "eaten"
  LEXEMES: EAT-LEX-1
  SENSES: eat0600, eat0700

THE ORGANIZATION OF THE LEXICON
  WORD-FORMS: "stock"
  LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3
  SENSES: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000

SYNONYMY
  WORD-FORMS: "cheap", "inexpensive"
  LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, …., INEXP-LEX-3
  SENSES: cheap0100, ……, cheapXXXX, inexp0900, inexpYYYY

WORDNET
  A lexical database created at Princeton
  Information about a variety of SEMANTIC RELATIONS
  Three sub-databases (supported by psychological research as early as (Fillenbaum and Jones, 1965)):
    NOUNS
    VERBS
    ADJECTIVES and ADVERBS
  Each database is organized around SYNSETS
  Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/

SYNSETS
  Senses (or "lexicalized concepts") are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET
  E.g., {chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug} (gloss: person who is gullible and easy to take advantage of)

THE NOUN DATABASE
  About 90,000 forms, 116,000 senses
  Relations:
    hypernym     breakfast -> meal
    hyponym      meal -> lunch
    has-member   faculty -> professor
    member-of    copilot -> crew
    has-part     table -> leg
    part-of      course -> meal
    antonym      leader -> follower

HYPERNYMY
  2 senses of robin

  Sense 1
  robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
    => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
      => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
        => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
          => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
            => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
              => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
                => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                  => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                    => living thing, animate thing -- (a living (or once living) entity)
                      => object, physical object --
                        => entity, physical thing --

MERONYMY
  wn beak -holon

  Holonyms of noun beak
  1 of 3 senses of beak

  Sense 2
  beak, bill, neb, nib
    PART OF: bird

VERBS
  About 10,000 forms, 20,000 senses
  Relations between verb meanings:
    hypernym   fly -> travel
    troponym   walk -> stroll
    entails    snore -> sleep
    antonym    increase -> decrease

RELATIONS BETWEEN VERB MEANINGS
  V1 ENTAILS V2 when "Someone V1" (logically) entails "Someone V2"
    e.g., snore entails sleep
  TROPONYMY holds when "to do V1" is "to do V2 in some manner"
    e.g., limp is a troponym of walk

ADJECTIVES & ADVERBS
  About 20,000 adjective forms, 30,000 senses
  4,000 adverbs, 5,600 senses
  Relations:
    antonym (adjective)   heavy <-> light
    antonym (adverb)      quickly <-> slowly

HOW TO USE IT
  Online: http://cogsci.princeton.edu/cgi-bin/webwn
  Or download it and run it from the command line:
    Get synonyms:   wn bank -synsn
    Get hypernyms:  wn robin -hypen
    Get antonyms (also for adjectives and verbs):  wn right -antsa

THE LIMITS OF WORDNET
  Coverage
    Words not in WordNet: crocidolite, spinoff (spin-off)
    Context-dependent senses: slump, crash, bust (all synonyms in the WSJ corpus)
  The structure of WordNet
    Missing information: MERONYMY
    Some information is encoded in complex ways (room, wall, floor)
  But: WordNet is a MOVING TARGET!

MERONYMY IN WORDNET: AN EXPERIMENT
  100 bridging descriptions (BDs) in a mereological relation
  A script looked for a direct link in WordNet (1.7) between one of the senses of the BD and one of the senses of any of the previous NPs
  Results: in only 6 cases does WordNet contain a direct lexical relation between a BD and one of the CFs

  Example: John looked at the HOUSE. The WALL was crumbling.
  [Diagram: IS-A and PART-OF links among ARTIFACT, HOUSING, BUILDING, HOUSE, HOME, ROOM, WALL, FLOOR]

SOLUTION: LEXICAL ACQUISITION
  Partial (add information to WordNet, especially for specialized domains)
  Total (build a new lexicon from scratch)

READINGS
  Jackson, ch. 6.7
  Marello, ch. 5.5
  C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998, ch. 1
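
As a footnote to the meronymy experiment, the sketch below shows why a lookup restricted to direct links can miss a part-whole relation that is only reachable through a chain. The relation tables are hypothetical toy data invented for this illustration; they are not WordNet's actual contents and not the script used in the experiment. The function names are likewise assumptions.

```python
from collections import deque

# Hypothetical toy relations for illustration only, NOT taken from WordNet.
PART_OF = {"wall": {"room"}, "floor": {"room"}, "room": {"building"}}
IS_A = {"house": {"building"}}


def has_direct_link(part, whole):
    """True only if the table lists `part` as an immediate part of `whole`."""
    return whole in PART_OF.get(part, set())


def ancestors(word):
    """All words reachable from `word` by following IS-A links upwards."""
    seen, queue = set(), deque([word])
    while queue:
        for parent in IS_A.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def has_indirect_link(part, whole):
    """True if a PART-OF chain from `part` reaches `whole` or one of its IS-A ancestors."""
    targets = {whole} | ancestors(whole)
    seen, queue = {part}, deque([part])
    while queue:
        for container in PART_OF.get(queue.popleft(), ()):
            if container in targets:
                return True
            if container not in seen:
                seen.add(container)
                queue.append(container)
    return False


print(has_direct_link("wall", "house"))    # direct lookup only
print(has_indirect_link("wall", "house"))  # follows PART-OF and IS-A chains
```

In this toy table the direct lookup fails for wall and house, while the chained search succeeds via room and building. The cost of chaining is that longer paths can also connect words that are not really in a part-whole relation.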