Knowledge engineering techniques for the creation of a semantic digital edition of Saussure's manuscripts

Gilles Falquet, Luka Nerima, Massimo Brero
Université de Genève - CUI
Fribourg Workshop – 27.02.2014

1. Storage, visualization, annotation, and transcription of manuscripts from Ferdinand de Saussure
2. Digital scholarly publishing of manuscripts

A system for the visualization, annotation, and transcription of manuscripts from Ferdinand de Saussure

• Swiss linguist (1857 – 1913)
• Famous for modern linguistics, structuralism, and the Cours de linguistique générale
• Very few publications in his lifetime, but 15'000 sheets of paper given to libraries (Harvard, Paris, Geneva)

Aims of the project

A usable tool for researchers, supporting
1. Visualization
2. Annotation
3. Transcription
of manuscripts from F. de Saussure

Typical (human) task: Reconstructing the reading order

Main concepts

• Transcription
• Transcription element
• Annotation zone
• Transcription element zone
• Pictures
• Covered surface
• Writing surface

Data/Knowledge Model

Represent
• basic metadata about manuscripts
  • location, date, image file, ...
• (scientific) transcriptions
• annotations
• semantic annotations

Available on the semantic web
• expressed in RDF/S
• stored in an RDF triple store

From classification numbers to URIs

Semantic web => universal identification (URI)
• library classification number → URI

Example (BGE)
• Shelf mark: Ms. fr. 3951/10, f. 28
• File name: ms_fr_03951_10_f028v_029.tif

URI:
• x:ms_fr
• x:ms_fr_03951
• x:ms_fr_03951_10
• x:ms_fr_03951_10_f028v_029
• x:ms_fr_03951_10_f028v_029-DOT-jp2
• x:ms_fr_03951_10_f028v_029_Z_001
• x:ms_fr_03951_10_f028v_029_Z_001_annot_001
• x:ms_fr_03951_10_f028v_029_Z_001_Shape_001

Data Model

System / User Interface

Manuscript visualization

IIP Image server
• Tiles

Creating Annotations (texts or concepts)

Navigation in the corpus

Full text search

System Architecture

• Image import
• Web server / Front end (REST)
• Back end
• Storage control (updates, authentication)

Example: Inserting a new annotation
• Insert request sent to the RDF server

Usability Testing Methodology

• 14 users (linguists, librarians, ...)
• 13 tasks (4 scenarios)
  • find a manuscript, create an annotation, ...
• Measurements:
  • number of completed tasks
  • time to complete each task
  • user satisfaction
    • System Usability Scale (SUS) questionnaire

Results

[Charts: task completion by task; task completion by user. Values shown on the charts: 100%, 85%, 50%.]

Satisfaction evaluation

[Charts: SUS scores (by question); SUS scores (by user). Value shown: 68.]

Demo Site

fds.unige.ch/iipmooviewer/homepage.php

Digital scholarly publishing of manuscripts

A knowledge representation and management model ... and a system for the digital edition of large corpora of original works

Context and goals

Digital Critical Edition – current state
• based on paper critical edition
• DCE of Nietzsche, Peirce, Wittgenstein
• other obstacles:
  • no scientific catalogue

Digital edition of Saussure’s manuscripts project
• to provide a cooperative edition platform for the next 20 years
• to use computers as convergence and mediation tools
• the scientific catalogue and the critical edition will be the outputs

Digital editions as knowledge networks

[Diagram, built up over three slides]
• Transcriptions
• Manuscripts
• Articles/Monographs
• terminologies
• ontologies
• added in successive builds: Semantic indexes, Alignment, Inferred relations

Knowledge modeling challenge

To represent the current state of our knowledge about the manuscripts:

different types of resources
• direct transcriptions
• scholarly transcriptions
• related terminologies, ontologies, dictionaries
• annotations
• ...

and resource interconnections
• semantic indexes
• text alignments / ontology alignments

Operations

[Diagram]
• Transcriptions
• MLU extraction (multiword lexical units)
• Handwriting recognition
• Semantic indexing
• Manuscripts
• ontologies
• Ontology Alignment
• Articles/Monographs

Operations

Alignment operations: finding correspondences between elements of different resources, aligning ontologies, aligning texts at the sentence or term level.

Enrichment operations: creating new resources that describe an existing one, adding transcriptions to manuscript pictures, extracting collocations from texts, creating a semantic index.

• Specific to each type of resource
• Based on OCR, NLP, AI algorithms

Challenge: define a minimal and expressive set of operations

System/Workbench for linguists/knowledge engineers

• Transcription acquisition
  • crowdsourcing
• Indexing
  • word spotting, handwriting recognition?
• Knowledge network operations
  • NLP techniques for multiword lexical unit extraction
  • terminology extraction
  • semantic indexing
  • resource alignment (existing ontologies, terminologies, ...)
  • define operation workflows
  • define virtual (hyper) document generation

Thank you

Questions?
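
Supplementary sketch: URI derivation

The identifier scheme on the "From classification numbers to URIs" slide can be illustrated with a short Python sketch. It is not the project's code: the function names, the regular expression, and the three-digit zero padding are assumptions read off the single example on the slide, and the expansion of the x: prefix is not given there. The slide lists a .tif file name but a -DOT-jp2 URI; the sketch simply replaces the dot in whatever image file name it is given.

```python
import re
from pathlib import PurePosixPath

PREFIX = "x:"  # prefix as written on the slide; its namespace is not stated


def uri_hierarchy(image_file_name: str) -> list[str]:
    """Return URIs from the collection level down to the image file.

    Assumes names shaped like ms_fr_<volume>_<part>_<folio>.<ext>,
    which is inferred from the one example on the slide.
    """
    path = PurePosixPath(image_file_name)
    stem, ext = path.stem, path.suffix.lstrip(".")
    match = re.fullmatch(r"(ms_fr)_(\d+)_(\d+)_(f\w+)", stem)
    if match is None:
        raise ValueError(f"unexpected file name: {image_file_name!r}")
    uris, parts = [], []
    for part in match.groups():
        # Each level extends the previous one by one name component.
        parts.append(part)
        uris.append(PREFIX + "_".join(parts))
    # The image resource keeps its extension, with the dot spelled out.
    uris.append(f"{PREFIX}{stem}-DOT-{ext}")
    return uris


def zone_uri(sheet_uri: str, number: int) -> str:
    """URI of a zone on a sheet (hypothetical helper)."""
    return f"{sheet_uri}_Z_{number:03d}"


def annotation_uri(zone: str, number: int) -> str:
    """URI of an annotation attached to a zone (hypothetical helper)."""
    return f"{zone}_annot_{number:03d}"


def shape_uri(zone: str, number: int) -> str:
    """URI of a shape delimiting a zone (hypothetical helper)."""
    return f"{zone}_Shape_{number:03d}"


if __name__ == "__main__":
    levels = uri_hierarchy("ms_fr_03951_10_f028v_029.jp2")
    zone = zone_uri(levels[3], 1)
    for uri in levels + [zone, annotation_uri(zone, 1), shape_uri(zone, 1)]:
        print(uri)
```

Run as a script, this prints the eight URIs in the order they appear on the slide. Building each identifier by extending its parent keeps the containment hierarchy (collection, volume, part, sheet, zone) readable from the URI itself.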