Networked Knowledge Organization
Systems and Information Discovery
Douglas Tudhope
5th ISKO-Italy 2011, Venice
This presentation
• Overview of NKOS activities
• Examples from recent NKOS-related research at Glamorgan
on cross search of different archaeological datasets and reports
- STAR and STELLAR projects
• Discuss issues of KOS - Ontology connections as part of this
Acknowledgements
Research team members and collaborators
– Ceri Binding (University of Glamorgan)
– Andreas Vlachidis (University of Glamorgan)
– Keith May, English Heritage (EH)
– Stuart Jeffrey, Julian Richards,
Archaeology Data Service (ADS),
Archaeology Department, University of York
NKOS: Networked Knowledge Organization
Systems/Services
Informal network for enabling knowledge organization systems (KOS),
such as classification systems, thesauri, gazetteers, and ontologies,
as networked interactive information services
to support the description and retrieval of diverse information
resources through the Internet
– Listserv hosted by OCLC
– NKOS website http://nkos.slis.kent.edu/
NKOS: Networked Knowledge Organization
Systems/Services
Two ongoing series of NKOS workshops
9 JCDL (and CENDI) Conference workshops in USA
origin 1997 workshop at ACM Digital Libraries Conference
9th Joint NKOS/CENDI workshop 2009
9 ECDL Conferences in Europe
9th European NKOS workshop 2010
plus
Dublin Core NKOS sessions 2005, 2008, 2010
– Special issues in JoDI (2001, 2004), NRHM (2006, issue 1)
– JISC Reviews on Terminology Services 2006
and Terminology Registries 2009
– See details on NKOS website http://nkos.slis.kent.edu/
– ECDL NKOS workshops
http://hypermedia.research.glam.ac.uk/kos/nkos/
Longstanding agenda:
KOS integration into DL services
from Linda Hill 2002 Research Agenda KOS/DL
Taxonomy of KOS - KOS types linked to DL service protocols
Registries of KOS and KOS-level metadata to represent them
XML/RDF KOS representations - customisable
Core set of relationship types across all KOS
General KOS service protocol (terminology services)
from which protocols for specific types of KOS can be derived
Robust linking model in which DL entities (collections, objects, and
services) can refer to KOS entities (concepts, labels, and relationships)
Visualization tools that fully use and display the rich semantics embedded
in KOS
Still relevant to new trends in semantic web, linked data, registries, tagging
NKOS: Forthcoming / Ongoing
– DCMI/NKOS Task Group
to develop Dublin Core Application Profile for KOS resources
– Workshop at DC 2010 Pittsburgh
Activities include
– Develop a functional requirements specification
– Develop a simple domain model
– Develop metadata terms for KOS
– Develop corresponding Dublin Core application profile
– Revise and finalize KOS Type vocabulary
– Task Group official webpage: http://dublincore.org/groups/nkos/
Working wiki: http://www.metadataetc.org/wiki/dcmi-nkos/doku.php
NKOS: Forthcoming / Ongoing
• Special NKOS session at ISKO-UK 2011 conference:
"What role can KOS play in information retrieval applications?"
Session 4 at 2nd biennial conference of UK Chapter of ISKO
4th-5th July, London
http://www.iskouk.org/conf2011/programme.htm
• Forthcoming Special KOS issue of ASIST Bulletin
NKOS: Forthcoming / Ongoing
– 10th European NKOS workshop at TPDL 2011
Berlin 28 (pm) and 29 (am) September 2011
– CFP expected mid April - general topics at
http://www.comp.glam.ac.uk/pages/research/hypermedia/nkos/nkos2011/
Initially suggested topics include (related to agenda of ISO 25964 Part 2 standard)
• Relation between KOS and formal ontologies
Relationship of domain thesauri to upper ontologies such as CIDOC CRM
• From KOS to formal ontologies and back?
Repurposing and reengineering of KOS (for usage scenarios other than indexing) and also
from “enriched” ontologies back to the originating and contributing KOS.
also
• Management and integration of multiple vocabulary types
• SKOS extensions – for mapping and vocabulary integration, additional KOS types, etc.
• Library Linked Data: Linking KOS data on the web.
Information Discovery
• Literal string match (eg Google) is good for some kinds of searches:
specific concrete topics
where all we want are some relevant results
- we do not care how many we miss!
• Google is less good at more conceptual (re)search topics
where it is important to be sure we have not missed anything important
eg medical, legal, scholarly research
• Searching data and documents is a recent general research focus
variously termed ... eScience, Digital Humanities, Cyberinfrastructure
- data.gov.uk is a recent initiative for government data
Words are tricky!
"When I use a word," Humpty Dumpty said in rather a scornful tone,
"it means just what I choose it to mean--neither more nor less." (Lewis Carroll)
• Various potential problems with literal string search
• Different words mean the same thing
• The same word means different things
• Trivial spelling differences can affect results
or a particular choice of synonym
or a slightly different perspective in choice of concept
- How to address this issue?
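A minimal sketch of the problem, assuming a few invented report titles and a hand-written synonym ring (neither is data from any project): a literal string match finds only the documents that happen to use the searcher's word, while a lookup through the synonym ring also finds the others.

```python
# Toy documents; the titles are invented for illustration.
DOCS = {
    1: "Rubbish pit with medieval pottery",
    2: "Midden deposits near the hearth",
    3: "Refuse heap cut by a later ditch",
}

# Hypothetical synonym ring: every term in a set is treated as equivalent.
SYNONYM_RINGS = [
    {"rubbish pit", "midden", "refuse heap"},
]


def literal_search(query):
    """Return ids of documents containing the query string (case-insensitive)."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in DOCS.items() if q in text.lower())


def expanded_search(query):
    """Search with the query plus any terms that share a synonym ring with it."""
    q = query.lower()
    terms = {q}
    for ring in SYNONYM_RINGS:
        if q in ring:
            terms |= ring
    hits = set()
    for term in terms:
        hits.update(literal_search(term))
    return sorted(hits)
```

With this toy data, `literal_search("midden")` matches only the second title, whereas `expanded_search("midden")` matches all three.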
NKOS
• Bridging some aspects of Information Science and Semantic Web
• Part of a general move towards
a (more) machine understandable Web
Machine readable vs machine understandable
What we say to the machine:
<h1>The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
What the machine understands:
<h1>asd plu bg ith mys</h1>
<ul>
<li>jvfr: 0007158440</li>
<li>vuyrok: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
(More) machine understandable
What we say to the machine:
<h1>Title:The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
What the machine understands:
<h1>asd plu bg ith mys</h1>
<ul>
<li>jvfr: 0007158440</li>
<li>vuyrok: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
(More) machine understandable
Conceptual structure (ontology): Book, ID, Author, Publisher
What we say to the machine:
<h1>Title:The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
What the machine understands:
<h1>asd plu bg ith mys</h1>
<ul>
<li>jvfr: 0007158440</li>
<li>vuyrok: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
(More) machine understandable
Conceptual structure (ontology): Book, ID, Author, Publisher
Vocabularies for terminology and knowledge organization: eg Theodor Geisel
What we say to the machine:
<h1>Title:The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
What the machine understands:
<h1>asd plu bg ith mys</h1>
<ul>
<li>jvfr: 0007158440</li>
<li>vuyrok: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>
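One way to picture the "more machine understandable" version is to state the same book record as explicit statements with named properties, and to let a name vocabulary link Dr. Seuss to Theodor Geisel. This is only a sketch: the identifiers `book:0007158440` and `person:geisel` and the property names are assumptions made up for the illustration, not part of any standard.

```python
# Book record as (subject, property, value) statements.
# The identifiers and property names are invented for this sketch.
STATEMENTS = [
    ("book:0007158440", "type", "Book"),
    ("book:0007158440", "title", "The Cat in the Hat"),
    ("book:0007158440", "author", "person:geisel"),
    ("book:0007158440", "publisher", "Collins"),
    # Name vocabulary: one person, two names.
    ("person:geisel", "preferred_name", "Theodor Geisel"),
    ("person:geisel", "alternative_name", "Dr. Seuss"),
]


def values(subject, prop):
    """Return all values stated for a subject and property."""
    return [v for s, p, v in STATEMENTS if s == subject and p == prop]


def books_by(name):
    """Find titles of books whose author is known under the given name."""
    people = {
        s for s, p, v in STATEMENTS
        if p in ("preferred_name", "alternative_name") and v == name
    }
    books = [s for s, p, v in STATEMENTS if p == "author" and v in people]
    return [title for book in books for title in values(book, "title")]
```

Because the author is an identified entity rather than a string, a search under either name reaches the same book.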
Knowledge Organization Systems
• Knowledge Organization Systems
eg classifications, thesauri and ontologies
help semantic interoperability
• Reduce ambiguity by defining terms
and providing synonyms
• Organise concepts via semantic relationships
EH Monuments
Type Thesaurus
Knowledge Organization Systems
• Knowledge Organization Systems
- classifications, thesauri and ontologies
help semantic interoperability
• Concept expansion of Rubbish Pit
(as tag cloud or as ranked list)
using STAR semantic services
http://hypermedia.research.glam.ac.uk/resources/terminology/
Eg Midden (refuse heap) is a useful alternative
search term to Rubbish Pit
EH Monuments
Type Thesaurus
STAR Semantic Terminology Services
- concept expansion (as web service)  midden
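Concept expansion can be sketched as a walk outwards from a starting concept over thesaurus relationships, ranking the concepts reached by how many steps away they are. The thesaurus fragment below is a hypothetical toy, not the content of the EH Monument Types Thesaurus, and counting steps uniformly is an assumption; it is not a description of how the STAR services rank their results.

```python
from collections import deque

# Hypothetical thesaurus fragment: concept -> list of (relationship, concept).
# BT = broader term, NT = narrower term, RT = related term.
THESAURUS = {
    "rubbish pit": [("BT", "pit"), ("RT", "midden")],
    "midden": [("RT", "rubbish pit"), ("NT", "shell midden")],
    "pit": [("NT", "rubbish pit"), ("NT", "storage pit")],
    "shell midden": [("BT", "midden")],
    "storage pit": [("BT", "pit")],
}


def expand(concept, max_steps=2):
    """Return (concept, steps) pairs reachable within max_steps, nearest first."""
    seen = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if seen[current] == max_steps:
            continue  # do not walk further than the step limit
        for _relationship, neighbour in THESAURUS.get(current, []):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    ranked = sorted(seen.items(), key=lambda item: (item[1], item[0]))
    return [pair for pair in ranked if pair[0] != concept]
```

The ranked list could feed either presentation mentioned above: step counts map naturally to font sizes in a tag cloud or to positions in a ranked list.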
STAR
Semantic Technologies for Archaeological Resources
• AHRC funded project(s) with English Heritage and the ADS
• Currently excavation datasets are isolated
Different datasets with different structures and vocabularies
• Currently no connection with grey literature excavation reports
ADS OASIS Grey Literature Library (unpublished reports)
OASIS: Online AccesS to the Index of archaeological investigationS
Aim:
• Cross search at a conceptual level
archaeological datasets with associated grey literature
• http://hypermedia.research.glam.ac.uk/kos/STELLAR/
STAR
Semantic Technologies for Archaeological Resources
• Need for integrating conceptual framework
and terminology control via thesauri and glossaries
• EH (Keith May) designed an ontology
describing the archaeological process
The archaeological process
Events in the past have results in the present
• Events in the present and events in the past,
related by the place in which they occur
and the physical remains in that place
• Activities in the present investigate the remains of the past
(affecting them in the process)
Broader conceptual framework (ontology)
EH extension of CIDOC Conceptual Reference Model (CRM)
explicit modelling of archaeological events – complicated!
CRM is event-based and chains of relationships connect major entities
STELLAR
• 12 month AHRC funded project
– Hypermedia Research Unit, University of Glamorgan
– Archaeology Data Service, University of York
– English Heritage Centre for Archaeology, Portsmouth
• Builds on previous 3 year AHRC funded STAR Project
• http://hypermedia.research.glam.ac.uk/kos/STELLAR/
STELLAR aims
• Make it easier to map and extract datasets to CIDOC CRM
in a consistent manner
• Generalise the data extraction tools produced by STAR
so third party data providers can use them
• Develop guidelines for mapping and extraction of archaeological
datasets into RDF/XML conforming to CIDOC CRM-EH ontology
• Develop guidelines and tools for generating corresponding Linked Data
STELLAR background
• In practice mapping to CRM has tended to require specialist knowledge of the ontology
and been resource intensive
• Given the wide scope of the CRM, it is possible to make multiple valid mappings
depending on the intended purpose and focus of the mappings
• STELLAR tools convert archaeological data to CRM/RDF in a consistent manner,
without requiring detailed knowledge of the underlying ontology
• User chooses a template for a particular data pattern
and supplies the corresponding input from their database
• STELLAR templates for
– CRM-EH archaeological extension to the CIDOC CRM
– Some more general CIDOC CRM templates conforming to the CLAROS Project format
– SKOSifying a glossary/thesaurus connected with the dataset
to allow controlled data items to be linked via SKOS.
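The template idea can be sketched in a few lines: a fixed pattern for one kind of data is filled in once per database row, so the person supplying the rows never has to write RDF by hand. Everything named here is an assumption for illustration: the `ex:` properties, the example.org base URI and the column names are placeholders, not STELLAR, CRM or CRM-EH identifiers, and the real templates may be organised quite differently.

```python
from string import Template

# Hypothetical template for one data pattern: a find of some type, located in a context.
# The ex: names and the base URI are placeholders, not CRM-EH identifiers.
FIND_TEMPLATE = Template(
    "<$base/find/$find_id> a ex:Find ;\n"
    "    ex:hasType <$base/type/$find_type> ;\n"
    "    ex:foundIn <$base/context/$context_id> .\n"
)


def rows_to_rdf(rows, base="http://example.org/site"):
    """Fill the template once per database row and join the results."""
    header = "@prefix ex: <http://example.org/schema#> .\n\n"
    body = "\n".join(FIND_TEMPLATE.substitute(base=base, **row) for row in rows)
    return header + body


# Rows as they might come out of a database query (invented values).
ROWS = [
    {"find_id": "101", "find_type": "coin", "context_id": "5"},
    {"find_id": "102", "find_type": "pottery", "context_id": "7"},
]
```

Calling `rows_to_rdf(ROWS)` yields one Turtle-style description per row; consistency comes from the template, not from each user's knowledge of the ontology.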
Thesaurus – Ontology interoperability?
• What options?
• Most formal ontologies lack vocabulary
- draw on a thesaurus (or other KOS?)
• However, mapping is problematic:
thesauri and ontologies are designed for different purposes, and their use cases
tend to differ
• Can we describe the purpose of a type of KOS/ontology?
• What is the relationship to application entities?
Taxonomy of Knowledge Organisation Systems
Gail Hodge
Term Lists
- Authority Files, Glossaries, Gazetteers, Dictionaries
Classification and Categorization
- Subject Headings
- Classification Schemes and Taxonomies
  eg DDC, scientific taxonomies
Relationship Schemes
- Thesauri
- Semantic Networks (eg WordNet)
- (Ontologies)
http://www.clir.org/pubs/abstract/pub91abst.html
Types of Knowledge Organisation System (KOS)
from Zeng & Salaba: FRBR Workshop, OCLC 2005
(figure: KOS types arranged along an axis from natural language to controlled language)
Term Lists: Pick lists, Gazetteers, Glossaries/Dictionaries, Authority Files, Synonym Rings
Classification & Categorization: Subject Headings, Categorization schemes, Taxonomies, Classification schemes
Relationship Groups: Thesauri, Semantic networks, Ontologies
Dagobert Soergel 2001
Underlying characteristics for defining elements in
a Taxonomy of KOS
Potential Facets in Classification of KOS?
• Entities covered
• Information given
• Arrangement
• Purpose for which designed
Sue Ellen Wright (Terminology – NPL)
ISKO 2006 keynote, Terminology Summer School
Potential for faceting
• Communities of Practice
• Systematic resources
• Non-systematic resources
• Technology orientation
• Degrees of indeterminacy
• Language & knowledge-oriented standards
• Standards bodies
Semiotic Triangle (Ogden and Richards, 1923)
reproduced in Campbell et al. 1998,
Representing Thoughts, Words, and Things in the UMLS
Needs to be problematised
Only indirect link via an interpreter
Semiotic Triangle (Ogden and Richards, 1923)
reproduced in Campbell et al. 1998,
Representing Thoughts, Words, and Things in the UMLS
(AI) Ontology tends to be …
Instance of scientific concept
Fact in a ‘possible world’
- part of the ontology?
Semiotic Triangle (Ogden and Richards, 1923)
reproduced in Campbell et al. 1998,
Representing Thoughts, Words, and Things in the UMLS
Information retrieval (subject) KOS tends to be …
Probable relevance – aboutness
- outside the scope of a thesaurus
Inter/intra indexer consistency? (eg Bates 1986)
Rationale for draft template
of (some) KOS characteristics
• Not exhaustive/complete - for exploration
– other characteristics to be included
– Some characteristics to be omitted
• for types of KOS, rather than a specific instance
• Tentative facets (a subset)
Partly chosen to help make distinctions
between some common types of KOS
• Begin to consider KOS purposes and contexts of use
- how we might describe purpose?
Factors governing types of KOS
Template (draft)
Entities
Concepts, terms, strings,
Atomic - Composite (attributes)
Enumerative - Synthetic
Low – medium - high degree precombination (coordination in KOS itself)
Size: small – large
Depth: small – medium - large
Relationships (internal)
Types / expressivity of relationships:
low (core set) – medium – high (definable)
concept-concept, concept-term, term-term
monohierarchies - polyhierarchies
Formality: low – medium – high
Typical application to objects in domain of interest
Metadata element: subject, various elements, general
Granularity of application objects: unstructured - complex
Relationship applying concepts to objects in domain
about (fuzzy), instance
Exhaustivity: low - high
Specificity: low - high
Coordination: low - high
expressivity and formality of relationships in coordination (synthesis rules)
Factors governing types of KOS
Thesaurus
(same draft template as above, considered for the thesaurus as a type of KOS)
Factors governing types of KOS
Formal Ontology
(same draft template as above, considered for the formal ontology as a type of KOS)
Thesaurus – Ontology interoperability
• What options?
eg
• Publishing a thesaurus using SKOS
• Reengineering a thesaurus as an ontology (and vice versa)
extend/map an ontology class with a thesaurus (hierarchy)?
• Complementary use of a thesaurus and an ontology
• For STAR, we had thought to extend some CRM classes with thesaurus hierarchies
But the thesauri were designed for slightly different purposes than CRM-EH and are not a clean fit,
even though CRM has E55 Type (a weaker Is-Type_Of relationship than Instance)
So we decided to use them together, with (in practice) an informal mapping
between thesauri and CRM classes for NLP work
Perhaps
• A controlled type in the data as BOTH an instance of an ontology class and a SKOS concept?
- use the ontology for inferencing and the thesaurus for retrieval purposes?
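The last option can be sketched as one resource carrying two types at once. The statements below are plain tuples with prefixed names: `skos:Concept`, `skos:prefLabel`, `skos:altLabel` and `skos:broader` are SKOS terms, `crm:E55_Type` stands for the CRM class E55 Type mentioned above (the exact prefix and spelling depend on the CRM encoding used, so treat it as an assumption), and the `ex:` identifiers are invented.

```python
def dual_typed_concept(concept_id, pref_label, alt_labels=(), broader=None):
    """Describe one controlled type as both a SKOS concept and a CRM type."""
    statements = [
        (concept_id, "rdf:type", "skos:Concept"),  # thesaurus view, for retrieval
        (concept_id, "rdf:type", "crm:E55_Type"),  # ontology view, for inferencing
        (concept_id, "skos:prefLabel", pref_label),
    ]
    statements += [(concept_id, "skos:altLabel", label) for label in alt_labels]
    if broader is not None:
        statements.append((concept_id, "skos:broader", broader))
    return statements


def search_labels(statements):
    """Collect every label that a retrieval system could match against."""
    return sorted(
        obj for _subject, predicate, obj in statements
        if predicate in ("skos:prefLabel", "skos:altLabel")
    )


# Invented example: ex:midden and ex:deposit are placeholder identifiers.
MIDDEN = dual_typed_concept(
    "ex:midden", "midden", alt_labels=["refuse heap"], broader="ex:deposit"
)
```

A retrieval component would read only the SKOS statements (labels, broader), while a reasoner would read only the class membership; neither needs to know about the other view.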
STAR general architecture
STAR client applications
• Windows applications
• Browser components
• Full text search
• Browse concept space
• Navigate via expansion
• Cross search archaeological datasets
STAR web services
STAR datasets (expressed in terms of CRM)
• EH Thesauri and CRM ontology
• Grey literature indexing (CRM)
• Archaeological Datasets (CRM)
Natural Language Processing (NLP)
of archaeological grey literature
Extract key concepts in same semantic representation as for data.
Allows unified searching of different datasets and grey literature
in terms of same underlying conceptual structure
“ditch containing prehistoric pottery dating to the Late Bronze Age”
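As a rough illustration of what "extracting key concepts" means for the example phrase, the sketch below looks up words and multi-word terms in a small vocabulary and labels each match with a concept category. It is a plain longest-match lookup under stated assumptions, not the project's NLP pipeline; the vocabulary and the category names (Context, Find, Period) are invented for the example.

```python
import re

# Hypothetical vocabulary: term -> concept category.
VOCAB = {
    "ditch": "Context",
    "pottery": "Find",
    "prehistoric": "Period",
    "late bronze age": "Period",
}
MAX_WORDS = max(len(term.split()) for term in VOCAB)


def annotate(text):
    """Return (term, category) pairs in order of appearance, longest match first."""
    words = re.findall(r"[a-z]+", text.lower())
    annotations = []
    i = 0
    while i < len(words):
        for n in range(MAX_WORDS, 0, -1):  # try the longest phrase first
            phrase = " ".join(words[i:i + n])
            if phrase in VOCAB:
                annotations.append((phrase, VOCAB[phrase]))
                i += n
                break
        else:
            i += 1  # no vocabulary term starts at this word
    return annotations
```

Applied to the phrase above, this labels "ditch" as a Context, "pottery" as a Find, and "prehistoric" and "late bronze age" as Periods, which is the kind of output that can then be expressed in the same conceptual structure as the excavation data.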
STAR Demonstrator – search for a conceptual pattern
An Internet Archaeology publication on one of the (Silchester
Roman) datasets we used in STAR discusses the finding of a coin
within a hearth.
-- does the same thing occur in any of the grey literature reports?
Requires comparison of extracted data with NLP indexing in terms
of the ontology.
STAR Demonstrator – search for a conceptual pattern
Research paper reports finding a coin in a hearth – does this occur elsewhere?
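Once datasets and NLP-indexed reports share one representation, the coin-in-hearth question becomes a pattern match over statements. A minimal sketch, assuming invented records and relation names (`has_type`, `found_in`) rather than the CRM-EH properties the demonstrator actually uses:

```python
# Each statement: (source, subject, relation, object). All records are invented.
STATEMENTS = [
    ("dataset:A", "find1", "has_type", "coin"),
    ("dataset:A", "find1", "found_in", "ctx1"),
    ("dataset:A", "ctx1", "has_type", "hearth"),
    ("report:12", "find7", "has_type", "coin"),
    ("report:12", "find7", "found_in", "ctx9"),
    ("report:12", "ctx9", "has_type", "hearth"),
    ("report:15", "find8", "has_type", "coin"),
    ("report:15", "find8", "found_in", "ctx3"),
    ("report:15", "ctx3", "has_type", "ditch"),
]


def find_pattern(find_type, context_type):
    """Return the sources where a find of find_type lies in a context of context_type."""
    # Index the type of every entity, per source.
    typed = {
        (source, subject): obj
        for source, subject, relation, obj in STATEMENTS
        if relation == "has_type"
    }
    sources = set()
    for source, subject, relation, obj in STATEMENTS:
        if relation != "found_in":
            continue
        if (typed.get((source, subject)) == find_type
                and typed.get((source, obj)) == context_type):
            sources.add(source)
    return sorted(sources)
```

Here `find_pattern("coin", "hearth")` returns both the excavation dataset and one grey literature report, while the report with a coin in a ditch is excluded.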
Wider implications - reuse of data
• Expose (invisible) datasets for wider analysis and reuse
• Meta studies comparing different excavation projects
• Connect datasets and wider grey literature – look for wider patterns
• Connect interpretations with underlying data
• Open up a broader range of research questions that might be
answered when we connect currently isolated excavation datasets
• Allow different communities to share data and expertise
References
Campbell K., Oliver D., Spackman K., Shortliffe E. 1998. Representing Thoughts, Words, and Things in the UMLS. Journal of the American Medical Informatics Association, 5 (5), 421-431.
Hodge G. 2000. Taxonomy of Knowledge Organization Systems. http://nkos.slis.kent.edu/KOS_taxonomy.htm
Soergel D. 2001a. The representation of Knowledge Organization Structure (KOS) data: a multiplicity of standards. JCDL 2001 NKOS Workshop, Roanoke. http://www.clis.umd.edu/faculty/soergel/SoergelNKOS2001KOSStandards.PDF
Wright S. 2005. ISO TC 37 Standards: Basic Principles of Terminology. NKOS JCDL 2005 Workshop, Denver. http://nkos.slis.kent.edu/2005workshop/TC37.ppt
Contact
Douglas Tudhope
[email protected]
University of Glamorgan KOS research
http://hypermedia.research.glam.ac.uk/kos/