Digital Italian
An overview of Italian corpora

A linguistic corpus
A body of texts / transcripts collected for linguistic purposes: computerized, representative of the variety studied, balanced, annotated.

Annotation
Linguistic annotation can be useful or restrictive.
Extra-linguistic annotation is useful for sociolinguistic research.

Italian corpora
General / Specialized
Written / Spoken
Diachronic / Synchronic

General corpora - Written Italian
Corpus e lessico di frequenza dell’italiano scritto (COLFIS)
Corpus di riferimento dell’italiano scritto / Corpus dinamico dell’italiano scritto (CORIS/CODIS)

COLFIS - structure
COLFIS (over three and a half million words)
Newspapers: Il Corriere della Sera, La Repubblica, La Stampa. Economy, news of local interest, society, crime news, internal / external affairs, science, show biz and sports.
Periodicals: other, arts, science and technology, cars and boats, children and youngsters, home and hobby, women’s magazines, photo love stories, general information, society, radio and television, sport, travel and ecology.
Books: other, arts, children, SF, detective and spy stories, hobby and travel, classics, modern narrative, romance, essays, natural and exact sciences, human and social sciences, theatre and poetry.

CORIS/CODIS - structure
CORIS / CODIS (one hundred million words)
Press: newspaper, periodical, supplement. National, local / specialist, non-specialist / connotated, non-connotated.
Fiction: novels, short stories. Italian, foreign / for adults, for children / crime, adventure, SF, women's literature.
Academic Prose: human sciences, natural sciences, physics, experimental sciences. Books, reviews / scientific, popular / history, philosophy, arts, literary criticism, law, economy, biology, etc.
Legal and Administrative Prose: legal, bureaucratic, administrative. Books, reviews.
Miscellanea: books on religion, travel, cookery, hobbies, etc. Books, reviews.
Ephemera: letters, leaflets, instructions. Private, public / printed form, electronic form.

General corpora - Spoken Italian
Lessico di frequenza dell’italiano parlato (LIP) -> Bancadati dell’italiano parlato (BADIP)
Archivio delle varietà dell’italiano parlato (AVIP)
LABLITA
Spoken and written Italian: Corpora e lessici dell’italiano parlato e scritto (CLIPS)

CLIPS (the spoken corpus)
Radio and television speech: entertainment, informative transmissions, cultural and educational transmissions, commercials.
Field recordings: map task dialogues and spot-the-difference game.
Readings: readings by the speakers themselves or by professional dubbing actors.
Telephone speech: conversations between a fake tour operator and three hundred people.

Specialized corpora
Corpus di italiano televisivo (CIT)
La Repubblica

CIT - structure
Current affairs: studio broadcast, on-field broadcast.
Entertainment (games, talk shows, varieties): text.
Commercials: text, slogans.
Sports news: commentaries, play-by-play, studio broadcast, on-field broadcast, text.
Newscast: headlines, studio broadcast, on-field broadcast.

La Repubblica - structure
Year: 1985 - 2000
Genre: News, Comment
Topic: Religion, Culture, Economics, Education, News, Politics, Science, Society, Sport, Weather, Unclassified

Thank you!
Anne-Marie OBRETIN
MRes in European Languages and Cultures
University of Exeter
[email protected]