Networked Knowledge Organization Systems and Information Discovery
Douglas Tudhope
5th ISKO-Italy 2011, Venice

This presentation
• Overview of NKOS activities
• Examples from recent NKOS-related research at Glamorgan on cross search of different archaeological datasets and reports: the STAR and STELLAR projects
• Discuss issues of KOS - Ontology connections as part of this

Acknowledgements
Research team members and collaborators
– Ceri Binding (University of Glamorgan)
– Andreas Vlachidis (University of Glamorgan)
– Keith May, English Heritage (EH)
– Stuart Jeffrey, Julian Richards, Archaeology Data Service (ADS)
  Archaeology Department, University of York

NKOS: Networked Knowledge Organization Systems/Services
Informal network for enabling knowledge organization systems (KOS), such as classification systems, thesauri, gazetteers, and ontologies, as networked interactive information services to support the description and retrieval of diverse information resources through the Internet
– Listserv hosted by OCLC
– NKOS website http://nkos.slis.kent.edu/

NKOS: Networked Knowledge Organization Systems/Services
Two ongoing series of NKOS workshops
• JCDL (and CENDI) Conference workshops in USA
  origin: 1997 workshop at ACM Digital Libraries Conference
  9th Joint NKOS/CENDI workshop 2009
• ECDL Conferences in Europe
  9th European NKOS workshop 2010
plus Dublin Core NKOS sessions 2005, 2008, 2010
– Special issues in JoDI (2001, 2004), NRHM (2006, issue 1)
– JISC Reviews on Terminology Services 2006 and Terminology Registries 2009
– See details on NKOS website http://nkos.slis.kent.edu/
– ECDL NKOS workshops http://hypermedia.research.glam.ac.uk/kos/nkos/

Longstanding agenda: KOS integration into DL services
from Linda Hill 2002
Research Agenda KOS/DL
• Taxonomy of KOS - KOS types linked to DL service protocols
• Registries of KOS and KOS-level metadata to represent them
• XML/RDF KOS representations - customisable
• Core set of relationship types across all KOS
• General KOS service protocol (terminology services) from which protocols for specific types of KOS can be derived
• Robust linking model in which DL entities (collections, objects, and services) can refer to KOS entities (concepts, labels, and relationships)
• Visualization tools that fully use and display the rich semantics embedded in KOS
Still relevant to new trends in semantic web, linked data, registries, tagging

NKOS: Forthcoming / Ongoing
– DCMI/NKOS Task Group to develop Dublin Core Application Profile for KOS resources
– Workshop at DC 2010 Pittsburgh
Activities include
– Develop a functional requirements specification
– Develop a simple domain model
– Develop metadata terms for KOS
– Develop corresponding Dublin Core application profile
– Revise and finalize KOS Type vocabulary
– Task Group official webpage: http://dublincore.org/groups/nkos/
  Working wiki: http://www.metadataetc.org/wiki/dcmi-nkos/doku.php

NKOS: Forthcoming / Ongoing
• Special NKOS session at ISKO-UK 2011 conference: "What role can KOS play in information retrieval applications?"
  Session 4 at 2nd biennial conference of UK Chapter of ISKO, 4-5th July, London
  http://www.iskouk.org/conf2011/programme.htm
• Forthcoming Special KOS issue of ASIST Bulletin

NKOS: Forthcoming / Ongoing
– 10th European NKOS workshop at TPDL 2011 Berlin, 28 (pm) and 29 (am) September 2011
– CFP expected mid April - general topics at http://www.comp.glam.ac.uk/pages/research/hypermedia/nkos/nkos2011/
Initially suggested topics include (related to agenda of ISO 25964 Part 2 standard)
• Relation between KOS and formal ontologies
  Relationship of domain thesauri to upper ontologies such as CIDOC CRM
• From KOS to formal ontologies and back?
  Repurposing and reengineering of KOS (for other usage scenarios than indexing) and also from "enriched" ontologies to the originating and contributing KOS.
also
• Management and integration of multiple vocabulary types
• SKOS extensions – for mapping and vocabulary integration, additional KOS types, etc.
• Library Linked Data: Linking KOS data on the web.

Information Discovery
• Literal string match (eg Google) is good for some kinds of searches: specific concrete topics where all we want are some relevant results and we do not care how many we miss!
• Google is less good at more conceptual (re)search topics where it is important to be sure nothing important has been missed, eg medical, legal, scholarly research
• Searching data and documents is a recent general research focus, variously termed eScience, Digital Humanities, Cyberinfrastructure
  – data.gov.uk is a recent initiative for government data

Words are tricky!
"When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean--neither more nor less." (Lewis Carroll)
• Various potential problems with literal string search
  • Different words mean same thing
  • Same word means different things
• Trivial spelling differences can affect results, as can a particular choice of synonym or a slightly different perspective in choice of concept
  – How to address this issue?

NKOS
• Bridging some aspects of Information Science and Semantic Web
• Part of a general move towards a (more) machine understandable Web

Machine readable vs machine understandable
What we say to the machine:
<h1>The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
What the machine understands:
<h1>asd plu bg ith mys</h1>
<ul>
<li>jvfr: 0007158440</li>
<li>vuyrok: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>

(More) machine understandable
What we say to the machine:
<h1>Title:The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
What the machine understands:
<h1>asd plu bg ith mys</h1>
<ul>
<li>jvfr: 0007158440</li>
<li>vuyrok: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
[Slide annotations: Book, ID, Author, Publisher - conceptual structure (ontology); Theodor Geisel - vocabularies for terminology and knowledge organization]

Knowledge Organization Systems
• Knowledge Organization Systems, eg classifications, thesauri and ontologies, help semantic interoperability
• Reduce ambiguity by defining terms and providing synonyms
• Organise concepts via semantic relationships
• Concept expansion of Rubbish Pit (as tag cloud or as ranked list) using STAR semantic services
  http://hypermedia.research.glam.ac.uk/resources/terminology/
  Eg Midden (refuse heap) is a useful alternative search term to Rubbish
[Figure: EH Monuments Type Thesaurus]
[Figure: STAR Semantic Terminology Services - concept expansion (as web service): midden]

STAR: Semantic Technologies for Archaeological Resources
• AHRC funded project(s) with English Heritage and the ADS
• Currently excavation datasets are isolated
  Different datasets with different structures and vocabularies
• Currently no connection with grey literature excavation reports
  ADS OASIS Grey Literature Library (unpublished reports)
  OASIS: Online AccesS to the Index of archaeological investigationS
Aim:
• Cross search at a conceptual level archaeological datasets with associated grey literature
• http://hypermedia.research.glam.ac.uk/kos/STELLAR/

STAR: Semantic Technologies for Archaeological Resources
• Need for integrating conceptual framework and terminology control via thesauri and glossaries
• EH (Keith May) designed an ontology describing the archaeological process

The archaeological process
• Events in the past have results in the present
• Events in the present and events in the past are related by the place in which they occur and the physical remains in that place
• Activities in the present investigate the remains of the past (affecting them in the process)

Broader conceptual framework (ontology)
EH extension of CIDOC Conceptual Reference Model (CRM)
explicit modelling of archaeological events – complicated!
CRM is event-based and chains of relationships connect major entities

STELLAR
12 month AHRC funded project
– Hypermedia Research Unit, University of Glamorgan
– Archaeology Data Service, University of York
– English Heritage Centre for Archaeology, Portsmouth
Builds on previous 3 year AHRC funded STAR Project
http://hypermedia.research.glam.ac.uk/kos/STELLAR/

STELLAR aims
• Make it easier to map and extract datasets to CIDOC CRM in a consistent manner
• Generalise the data extraction tools produced by STAR so third party data providers can use them
• Develop guidelines for mapping and extraction of archaeological datasets into RDF/XML conforming to CIDOC CRM-EH ontology
• Develop guidelines and tools for generating corresponding Linked Data

STELLAR background
• In practice mapping to CRM has tended to require specialist knowledge of the ontology and been resource intensive
• Given the wide scope of the CRM, it is possible to make multiple valid mappings depending on the intended purpose and focus of the mappings
• STELLAR tools convert archaeological data to CRM/RDF in a consistent manner, without requiring detailed knowledge of the underlying ontology
• User chooses a template for a particular data pattern and supplies the corresponding input from their database
• STELLAR templates for
  – CRM-EH archaeological extension to the CIDOC CRM
  – Some more general CIDOC CRM templates conforming to the CLAROS Project format
  – SKOSifying a glossary/thesaurus connected with the dataset to allow controlled data items to be linked via SKOS

Thesaurus – Ontology interoperability?
• What options?
• Most formal ontologies lack vocabulary - draw on a thesaurus (or other KOS?)
• However mapping is problematic: thesauri and ontologies designed for different purposes and use cases tend to differ
• Can we describe purpose of a type of KOS/ontology?
• What is relationship to application entities?
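The template idea from the STELLAR background slide (choose a template for a data pattern, supply rows from a database) can be illustrated with a minimal Python sketch for the "SKOSifying a glossary" case. This is not the STELLAR tooling: the template text, the `skosify` function, the example.org base URI and the glossary rows are all assumptions made for illustration. Only the SKOS namespace and the terms `skos:Concept`, `skos:prefLabel` and `skos:inScheme` are taken from the SKOS standard.

```python
from string import Template

# Hypothetical template for one glossary entry (not an actual STELLAR template).
SKOS_CONCEPT_TEMPLATE = Template(
    "<$base$id> a skos:Concept ;\n"
    '    skos:prefLabel "$label"@en ;\n'
    "    skos:inScheme <$base> .\n"
)

PREFIXES = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


def escape_literal(text):
    """Escape backslashes and double quotes so the label is a valid Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def skosify(rows, base):
    """Fill the template once per input row and return one Turtle document."""
    blocks = [
        SKOS_CONCEPT_TEMPLATE.substitute(
            base=base, id=row["id"], label=escape_literal(row["label"])
        )
        for row in rows
    ]
    return PREFIXES + "\n".join(blocks)


# Illustrative input, standing in for rows exported from a user's database.
ROWS = [
    {"id": "midden", "label": "Midden"},
    {"id": "rubbish_pit", "label": "Rubbish Pit"},
]

turtle = skosify(ROWS, "http://example.org/glossary/")
print(turtle)
```

The point of the sketch is that the person supplying the rows needs to know only the column names the template expects, not the vocabulary it writes out.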
Taxonomy of Knowledge Organisation Systems
Gail Hodge
Term Lists
– Authority Files, Glossaries, Gazetteers, Dictionaries
Classification and Categorization
– Subject Headings
– Classification Schemes and Taxonomies, eg DDC, scientific taxonomies
Relationship Schemes
– Thesauri
– Semantic Networks (eg WordNet)
– (Ontologies)
http://www.clir.org/pubs/abstract/pub91abst.html

Types of Knowledge Organisation System (KOS)
from Zeng & Salaba: FRBR Workshop, OCLC 2005
Relationship Groups: Ontologies, Semantic networks, Thesauri
Classification & Categorization: Classification schemes, Taxonomies, Categorization schemes, Subject Headings
Term Lists: Synonym Rings, Authority Files, Glossaries/Dictionaries, Gazetteers, Pick lists
Natural language - Controlled language

Dagobert Soergel 2001
Underlying characteristics for defining elements in a Taxonomy of KOS
Potential Facets in Classification of KOS?
• Entities covered
• Information given
• Arrangement
• Purpose for which designed

Sue Ellen Wright (Terminology – NPL)
ISKO 2006 keynote, Terminology Summer School
Potential for faceting
• Communities of Practice
• Systematic resources
• Non-systematic resources
• Technology orientation
• Degrees of indeterminacy
• Language & knowledge-oriented standards
• Standards bodies

Semiotic Triangle (Ogden and Richards, 1923)
reproduced in Campbell et al. 1998, Representing Thoughts, Words, and Things in the UMLS
• Needs to be problematised
• Only indirect link via an interpreter

Semiotic Triangle: (AI) Ontology tends to be …
• Instance of scientific concept
• Fact in a ‘possible world’ - part of the ontology?

Semiotic Triangle: information retrieval (subject) KOS tends to be
• Probable relevance – aboutness - outside the scope of a thesaurus
• Inter/Intra indexer consistency? (eg Bates 1986)

Rationale for draft template of (some) KOS characteristics
• Not exhaustive/complete - for exploration
  – Other characteristics to be included
  – Some characteristics to be omitted
• For types of KOS, rather than a specific instance
• Tentative facets (a subset)
  Partly chosen to help make distinctions between some common types of KOS
• Begin to consider KOS purposes and contexts of use - how might we describe purpose?

Factors governing types of KOS: Template (draft)
Entities
– Concepts, terms, strings
– Atomic - Composite (attributes)
– Enumerative - Synthetic
– Low – medium - high degree precombination (coordination in KOS itself)
– Size: small – large
– Depth: small – medium - large
Relationships (internal)
– Types / expressivity of relationships: low (core set) – medium – high (definable)
– concept-concept, concept-term, term-term
– monohierarchies - polyhierarchies
– Formality: low – medium – high
Typical application to objects in domain of interest
– Metadata element: subject, various elements, general
– Granularity of application objects: unstructured - complex
– Relationship applying concepts to objects in domain: about (fuzzy), instance
– Exhaustivity: low - high
– Specificity: low - high
– Coordination: low - high
– Expressivity and formality of relationships in coordination (synthesis rules)

Factors governing types of KOS: Thesaurus
(same template as above)

Factors governing types of KOS: Formal Ontology
(same template as above)

Thesaurus – Ontology interoperability
• What options? eg
  – Publishing a thesaurus using SKOS
  – Reengineering a thesaurus as an ontology (and vice versa)
  – Extend/map an ontology class with a thesaurus (hierarchy)?
  – Complementary use of a thesaurus and an ontology
• For STAR, had thought to extend some CRM classes with thesaurus hierarchies
  But the thesauri were designed for slightly different purposes than CRM-EH and are not a clean fit, even though CRM has E55 Type (weaker Is-Type_Of relationship than Instance)
  So decided to use them together, with in practice an informal mapping between thesauri and CRM classes for NLP work
Perhaps
• A data controlled type as BOTH an instance of Ontology class and as a SKOS concept? - use ontology for inferencing and thesaurus for retrieval purposes?
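The closing "Perhaps" option above (one controlled type treated both as an instance of an ontology class and as a SKOS concept) can be made concrete with a small Python sketch over plain triples. Everything here is illustrative: the `ex:` identifiers, the `ex:has_type` property (standing in for whatever typing property the ontology provides), the pit / rubbish pit hierarchy and the two context records are assumptions, not STAR data. Only E55 Type and the SKOS terms come from the slides.

```python
# Triples as (subject, predicate, object) tuples; all identifiers are illustrative.
TRIPLES = {
    # Each controlled type is declared twice: as an ontology instance and as a SKOS concept.
    ("ex:pit", "rdf:type", "crm:E55_Type"),
    ("ex:pit", "rdf:type", "skos:Concept"),
    ("ex:rubbish_pit", "rdf:type", "crm:E55_Type"),
    ("ex:rubbish_pit", "rdf:type", "skos:Concept"),
    # Thesaurus hierarchy (assumed for the example).
    ("ex:rubbish_pit", "skos:broader", "ex:pit"),
    # Data records pointing at the controlled types via a hypothetical typing property.
    ("ex:context_1", "ex:has_type", "ex:rubbish_pit"),
    ("ex:context_2", "ex:has_type", "ex:pit"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def expand(concept):
    """Thesaurus side: the concept plus every concept narrower than it, transitively."""
    found, todo = set(), [concept]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(subjects("skos:broader", current))
    return found


def search(concept):
    """Retrieval: records typed by the concept or by any narrower concept."""
    hits = set()
    for narrower in expand(concept):
        hits |= subjects("ex:has_type", narrower)
    return hits


def is_ontology_type(item):
    """Ontology side: is the item asserted to be an instance of E55 Type?"""
    return "crm:E55_Type" in objects(item, "rdf:type")


results = sorted(search("ex:pit"))
print(results)
```

In this sketch the `skos:broader` links serve only query expansion, while the `rdf:type` assertions are what an ontology-based reasoner would use, which is one way to read the "ontology for inferencing, thesaurus for retrieval" suggestion.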
STAR general architecture
• Windows applications
• Browser components
• Full text search
• Browse concept space
• Navigate via expansion
• Cross search archaeological datasets
[Diagram labels: STAR client applications; STAR web services; STAR datasets (expressed in terms of CRM): EH Thesauri and CRM ontology, Grey literature indexing (CRM), Archaeological Datasets (CRM)]

Natural Language Processing (NLP) of archaeological grey literature
• Extract key concepts in same semantic representation as for data
• Allows unified searching of different datasets and grey literature in terms of same underlying conceptual structure
“ditch containing prehistoric pottery dating to the Late Bronze Age”

STAR Demonstrator – search for a conceptual pattern
An Internet Archaeology publication on one of the (Silchester Roman) datasets we used in STAR discusses the finding of a coin within a hearth. Does the same thing occur in any of the grey literature reports?
Requires comparison of extracted data with NLP indexing in terms of the ontology.

STAR Demonstrator – search for a conceptual pattern
Research paper reports finding a coin in hearth – exist elsewhere?

Wider implications - reuse of data
• Expose (invisible) datasets for wider analysis and reuse
• Meta studies comparing different excavation projects
• Connect datasets and wider grey literature – look for wider patterns
• Connect interpretations with underlying data
• Open up a broader range of research questions that might be answered when we connect currently isolated excavation datasets
• Allow different communities to share data and expertise

References
Campbell K., Oliver D., Spackman K., Shortliffe E. 1998. Representing Thoughts, Words, and Things in the UMLS. Journal of the American Medical Informatics Association, 5 (5), 421-431.
Hodge G. 2000. Taxonomy of Knowledge Organization Systems. http://nkos.slis.kent.edu/KOS_taxonomy.htm
Soergel D. 2001a. The representation of Knowledge Organization Structure (KOS) data: a multiplicity of standards. JCDL 2001 NKOS Workshop, Roanoke. http://www.clis.umd.edu/faculty/soergel/SoergelNKOS2001KOSStandards.PDF
Wright S. 2005. ISO TC 37 Standards: Basic Principles of Terminology. NKOS JCDL 2005 Workshop, Denver. http://nkos.slis.kent.edu/2005workshop/TC37.ppt

Contact
Douglas Tudhope
[email protected]
University of Glamorgan
KOS research http://hypermedia.research.glam.ac.uk/kos/