ALCTS Subject Analysis Committee
Subcommittee on Semantic Interoperability
Annotated Bibliography
Note: Text often recorded as it appears in the article.
The
ADEPT is
being developed as an integrated learning environment based on ADL geospatial
digital library technology. It is currently used to teach Physical Geography to
undergraduate students at
ADEPT will provide gazetteer,
thesaurus, and geo-ontology services. The gazetteer will be built from the ADL
project gazetteer and serve as an index supporting transformations between
named places and geographic coordinates. Thesauri provide a basis for resolving
semantic inconsistencies, for example between alternative names for geographic
feature types. They will build a set of core thesauri covering geographic
representations of regions in space and space relations with objects.
Geo-ontology services: the vocabularies used to describe geographic features
and phenomena vary by discipline. By knowing which ontologies are used in
different contexts, and by mapping between them, it is possible to make
appropriate semantic correlations between different information sources. They
will build 1) a set of domain-specific ontologies for geospatial information;
and 2) a set of domain-independent ontologies supporting system, syntactic, and
structural interoperability.
American
Library Association. "Subject data in the metadata record", Division
of Association for Libraries and Technical Services, Cataloging and
Classification Section, Subcommittee on Metadata and Subject Analysis (1999). <http://www.ala.org/ala/alctscontent/catalogingsection/catcommittees/subjectanalysis/metadataandsubje/subjectdata.htm>
(
Ardö, Anders. Browsing Engineering Resources on the Web: a
General Knowledge Organization Scheme (Dewey) vs. a Special Scheme (EI). <http://staff.oclc.org/~vizine/traugott/OCLC_NetLab_ISKO6.html>
(
Published
Ardö, Anders, Godby, Jean, Houghton, Andrew, Koch,
Traugott, Reighart, Ray, Thompson, Roger and Vizine-Goetz, Diane.
"Browsing Engineering Resources on the Web" in Dynamism and
Stability in Knowledge Organization: Proceedings of the Sixth ISKO Conference, 10-13
July, 2000: 385-390 (2000)
The goal of the DESIRE II project
is to explore automated methods for gathering and organizing Web resources to
improve resource discovery on the Internet. Researchers at NetLab and OCLC
provided searching and browsing of a test collection of engineering documents
on the Web. The goal of the project is to explore simple methods of automatic
classification to provide subject browsing of a robot-generated engineering
index. At NetLab the documents were automatically classified and organized
using an engineering-specific scheme, the Engineering Index (Ei) Thesaurus and
Classification; at OCLC the Dewey Decimal Classification (DDC), a general
knowledge organization scheme, was used. The enhanced DDC database includes
several mechanisms for incorporating new terminology. Scorpion is used to do
automatic class number assignment. WordSmith software was used to create a
small, high-quality topical vocabulary that is suitable as an index or browse
display and that can supplement the subject indexes provided by the Ei
Thesaurus or the DDC.
Ardö, Anders, Berggren, Marten, Koch, Traugott, and
Kringstad, Reidun. Nordic Interconnected Subject-based Information Gateways
(NISBIG). Final report (2002). <http://www.lub.lu.se/nisbig/slutrapport.html>
(
Project final report addresses
all types of metadata including subject access for use in a quality-controlled
subject gateway. It discusses problems and limitations, and recommends pursuing
Renardus, IMesh Toolkit, etc. Subject gateways were developed in order to
support discovery and retrieval of Internet resources as well as to integrate
Internet resources with "traditional" library resources. Apart from gaining experience with content
and metadata profiling and classification mapping for cross-browsing, the main
technical goal of the project was to explore the applicability of the
LDAP-based Isaac Network software developed by the US Internet Scout Project to
provide cross-searching between the three Nordic subject gateways involved and
other gateways joining the Isaac Network.
Baker, Thomas and Dekkers, Makx. "Identifying Metadata
Elements with URIs: the CORES Resolution." D-Lib Magazine, 9, no.
7/8 (July/August 2003). <http://www.dlib.org/dlib/july03/baker/07baker.html>
(2003).
At a meeting organized by the
CORES Project (Information Society Technologies Programme, European Union),
several organizations regarded as maintenance authorities for metadata elements
achieved consensus on a resolution to assign Uniform Resource Identifiers
(URIs) to metadata elements as a useful first step towards the development of
mapping infrastructures and interoperability services. The maintainers of GILS, ONIX, MARC 21, CERIF,
DOI, IEEE/LOM, and Dublin Core reported on their implementations of the
resolution and highlighted issues of relevance to establishing good-practice
conventions for declaring, identifying, and maintaining metadata elements more
generally. In November 2002, they committed to implementing the agreement to
define URI assignment mechanisms, assign URIs to elements, and formulate
policies for the persistence of those URIs.
Baker, Thomas. "What Terms Does Your Metadata Use?
Application Profiles as Machine-Understandable Narratives." Journal of
Digital Information, 2, no. 2 (
Rachel Heery and Manjula Patel have defined
application profiles as 'schemas which consist of data elements drawn from one
or more namespaces, combined together by implementers, and optimized for a
particular application.' By definition, such profiles depend for their elements
on namespaces. Namespaces, in this context, are element sets maintained as
stable points of reference. They serve to 'identify the management authority
for an element, support definition of unique identifiers for elements, [and]
uniquely define particular data element sets or vocabularies.' The
registry prototyped in the DESIRE Project focused on the disclosure of
information about the authoritative use of metadata -- element definitions,
usage notes, allowed schemes, and mappings to other namespaces -- and explored
typical user queries. The SCHEMAS registry builds on the DESIRE experience.
Bates, Marcia. After the Dot-bomb: Getting Web
Information Retrieval Right this Time. 2002 <http://www.firstmonday.org/issues/issue7_7/bates/>
(
The author proposes using systems
already designed for information retrieval, e.g. faceted classification and
information resources thesauri, which have an internal structure, concept
clusters, etc. The long-term solution for indexing the Web is probably overlapping
methods of classifying and indexing knowledge. She disapproves of the use of
the word "ontology" since it refers to the philosophical issues
surrounding the nature of being.
Bates, Marcia. Task Force Recommendation 2.3 Research
and Design Review: Improving User Access to Library Catalog and Portal
Information : Final Report. (2003).
Selections:
It is
recommended that with regard to access vocabulary:
• A cluster vocabulary be created, based on the searcher
vocabulary developed by Sara Knapp (1993, 2000), if she and her publisher
agree.
Bates Task Force 2.3 Review 51
• For the price of a share of the maintenance of the
database, libraries and commercial firms may subscribe to the searcher
vocabulary database, and install it in their catalogs, portals, and websites.
• With experience, other types of
clusters are added--for names, works, geographical locations, etc.
• Access to catalogs and portal
information should be available both directly through and around the vocabulary
database. In this way, searchers may choose to use the database or not, and, if
they do choose it, they do not have to enter and exit a separate database (a
violation of the ever-present Principle of Least Effort).
• Institutional users may link
the searcher vocabulary with their own controlled vocabulary. As a result,
users of these sites may input their search term(s), be shown a cluster of
terms, including “legitimate” controlled terms, and use the clusters as a basis
for selecting terms for either controlled vocabulary or keyword searching.
• With this vocabulary as a core,
one or two lexicographers are hired cooperatively to maintain the searcher
vocabulary, adding popular new terms as they come along, and adding terms found
by cooperating organizations in “zero hit” searches. As changes are made in the
vocabulary, rather than in millions of individual cataloging records, cultural
and research changes can be accommodated much more rapidly and cheaply.
• These vocabularies become part of a "Vocabulary
Headquarters" (VHQ) website, supported by the library community or
organizations therein.
It is recommended that with regard to bibliographic
families:
• Preliminary agreement be gained on what shall constitute
bibliographic families at the work level, probably based on the work of
Tillett, Smiraglia, Hickey, and others. It may be found that work-sets, as
described by Hickey et al. should also be considered.
• As these bibliographic families probably follow the
Bradford Distribution, there will be some few that are very large, and many
that are very small or singletons. As the larger families are much more likely
to cause difficulties for searchers, and as they are also often around
canonical works that attract a great deal of research and cultural interest,
the larger families should be grouped first.
• At first on an experimental basis, individual libraries
or other institutions offer each to do the work to collect just one large
family (from records already created at the individual level). The results of
these experiences are shared at conferences and other meetings.
• Based on these experiences, criteria are finalized for
the creation of bibliographic families. Libraries may acquire the cataloging
information for the families in a manner similar to the currently existing
cooperative cataloging arrangements.
• Further experience will also provide enlightenment
regarding just how far down the chain of family size the cooperative effort
should go.
• Eventually, with further technological advances, it
becomes possible that whenever a searcher happens on a record that is part of a
bibliographic family, the searcher may click on a “related records” link and
see displayed on the screen the progenitor record plus links to all the
different types of bibliographically related records arrayed around the core
record.
It is recommended that with regard to staging of
access to records:
• Libraries and other information institutions take as an
objective the approach of providing staged access to information that drops
down into the information in a
• Current cooperation with publishers can be extended,
including use of book flap and contents information that is already in
electronic form for catalog records.
• The online bookstore, amazon.com, contains within it many
of the design features that have been recommended by catalog and database user
studies over the years. Amazon.com can be seen as a source of ideas and prior
testing of design features.
Becker, Hans J. "Cultural Heritage Projects:
Renardus". Paper presented at TEL Milestone Conference,
Goals: a) to improve access to
existing academic subject gateway services in Europe; b) to develop a 'broker'
service that will allow integrated searching and browsing of distributed
resource collections; c) to develop models for sharing metadata, agreement on
technical solutions and other standards. Subject gateway definition:
"quality controlled subject gateways and resource discovery broker
systems, target audience is predominantly higher education and academic
research communities across Europe: a) selection and collection development
(human intellectual effort, certain policy with regard to collection
development, documented selection criteria); b) Collection management
(maintaining or improving the level of quality of the collection, certain
policy with regard to maintenance); c) Resource description (all selected
resources are described according to a fixed and documented metadata set,
metadata are structured in well-defined semantic fields to enable structured
searching); d) Subject classification (all resources are indexed according to a
subject classification scheme in order to enable subject browsing)." Various
aspects of the project will be addressed by different groups in areas called
'work packages'.
Beghtol, Clare. "The Iter Bibliography: International
Standard Subject Access to Medieval and Renaissance materials (400-1700)."
Paper presented at IFLA Satellite
Meeting: Subject Retrieval in a Networked World, OCLC,
The Iter Bibliography contains
unique provisions for subject analysis and access. It uses a combination of
multiple LCSH headings and multiple DDC notations for subject specification in
order to incorporate the strengths of each system, and it also provides
uncontrolled keywords to cater for terms that would likely be used by
Medieval and Renaissance scholars.
Bird, Steven and Simons,
This paper describes a new
digital infrastructure for language resource discovery, based on the Open
Archives Initiative, and called OLAC - Open Language Archives Community. The
OLAC Metadata Set and the associated controlled vocabularies facilitate
consistent description and focused searching.
Brickley, Dan and Miller, Libby. Imesh Tk: Subject
Gateway Review Plan, 2000. <http://www.ilrt.bris.ac.uk/discovery/2000/07/itk-sgr/>
(
The objective of the Subject
Gateway Review is to ensure that the IMesh Tk architectural and technical
strategies are well-grounded in the documented needs and practical requirements
of the internet cataloging community as they stand now, with a view to the next
2-3 years. The Review will be responsible for producing scope and
prioritization guidelines and a literature review. The relationship between
XML-based metadata systems, notably RDF and other traditions such as LDAP and
Z39.50 is not yet clear. XML's popularity stems in large part from its
cross-domain generality: XML representations of white pages data, bibliographic
metadata, structured documents etc. can (to some extent) exploit common tools
and software components. One issue that the Subject Gateway Review will need to
address is the distinction between data-format based interfaces and
API/protocol interfaces. The latter addresses the possibility of tools such as
on-the-fly adaptors that translate (say) Z39.50 queries into LDAP queries or
vice versa, while the former addresses the need for common data formats/information
models for data exchange. Need to address: Do gateway managers prefer
query-time protocol mapping to scenarios in which they 'batch convert' (given
some standard data format, e.g. some flavor of qualified Dublin Core) records
to make them available in multiple search protocols?
Buchel, Olha and Coleman, Anita. "How Can
Classificatory Structures be Used to Improve Science Education?" Library
Resources & Technical Services, 47, no. 1 (2003): 4-15
The Alexandria Digital Earth
Prototype (ADEPT) project provides the test bed for instructional materials and
user analyses. ADEPT is supported by the National Science Foundation Digital
Libraries Initiative, Phase 2 and is a successor to the Alexandria Digital
Library (ADL) project. http://www.alexandria.ucsb.edu
Buckland, Michael, and others. "Mapping Entry
Vocabulary to Unfamiliar Metadata Vocabularies." D-Lib Magazine, 5,
no. 1 (January 1999). <http://www.dlib.org/dlib/january99/buckland/01buckland.html>
(
Proposes an entry module to help the user get
started. Mapping entry vocabulary modules use classification clustering,
exploit the combination of linguistic analysis with statistical methods, and are
based on searching fragments within the metadata and databases, performing
statistical and linguistic analysis, and presenting the user with a familiar term.
There is always one additional
vocabulary in play - the user's.
The network environment is leading to an
increasing number of heterogeneous repositories, using diverse metadata
vocabularies (categorization codes, classification numbers, index and thesaurus
terms). This is creating more and more unfamiliar sets of terms users must
employ to access Internet resources. It has been argued that the most
cost-effective single investment for improving effectiveness in the searching
of repositories would be technology to assist the searcher in coping with
unfamiliar metadata vocabularies.
A DDC number is a word/meaning.
The Relative index provides the English to DDC number translation. What is now
needed is a natural language index ("ordinary English") to the Relative
Index and/or DDC numbers. The Entry Vocabulary Module helps the searcher be
more effective and, thereby, provides a value-added enhancement.
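The idea of an Entry Vocabulary Module can be illustrated with a minimal sketch. It assumes a small set of already-classified records and ranks class numbers purely by how often ordinary words co-occur with them; the records, the stopword list, and the function names are hypothetical and are not taken from the article, which also describes linguistic analysis that is not attempted here.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (title, assigned class number).
RECORDS = [
    ("introduction to volcanoes and earthquakes", "551.2"),
    ("volcanoes of the world", "551.21"),
    ("earthquakes and seismic risk", "551.22"),
    ("cooking with herbs", "641.6"),
]

# Illustrative stopword list, an assumption of this sketch.
STOPWORDS = {"to", "and", "of", "the", "with"}


def build_index(records):
    """Count how often each word co-occurs with each class number."""
    index = defaultdict(Counter)
    for title, number in records:
        for word in set(title.lower().split()) - STOPWORDS:
            index[word][number] += 1
    return index


def suggest(index, query, limit=3):
    """Rank class numbers by summed co-occurrence counts for the query words."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, {}))
    # Highest score first; ties are broken by class number for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [number for number, _ in ranked[:limit]]
```

A searcher's own word, such as "volcanoes", is thereby turned into candidate class numbers without the searcher needing to know the scheme.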
Research has focused on:
development of tools to support the creation of Entry Vocabulary Modules;
creation of a set of prototype Entry Vocabulary Modules for a challenging range
of examples, including subdomains; deployment; use of natural language
processing techniques in addition to statistical term co-occurrence;
recommendations for the improvement of metadata documentation for numeric
databases.
Prototype available at: http://www.sims.berkeley.edu/research/metadata/oasis.html
Chan, Lois Mai and Zeng, Marcia Lei. "Ensuring
Interoperability among Subject Vocabularies and Knowledge Organization Schemes:
a Methodological Analysis." Paper presented at the 68th IFLA Council and
General Conference,
The ideal approach would be to
provide "one-stop" seamless searching instead of requiring the user to
search individual databases or collections separately. To enable such an
approach, it is important to render the different knowledge organization
systems, such as controlled vocabularies and classification schemes,
interoperable within a single search apparatus. A number of projects are trying
to achieve interoperability between and among different subject vocabularies
(including both controlled and uncontrolled vocabularies) and knowledge
organization systems. They include efforts at establishing interoperability
among vocabularies in the same language or in different languages, among
different classification schemes, and between controlled vocabularies and
classification schemes.
Chan, Lois (2000) "Exploiting LCSH, LCC, and DDC to
retrieve networked resources: issues and challenges", Library of Congress
(2002) <http://www.loc.gov/catdir/bibcontrol/chan.html>
(
Vocabulary control for improved precision and recall and
structured organization for efficient shelf location and browsing have
contributed to effective subject access to library materials. The question is
whether existing tools can continue to function satisfactorily in dealing with
web resources. To meet the challenges of web resources, certain operational
requirements must be taken into consideration, the most important being the
ability to handle a large volume of resources efficiently and interoperability
across different information environments and among a variety of retrieval
models. Schemes that are scalable in semantics and flexible in syntax,
structure, and application are more likely to be capable of meeting the
requirements of a diversity of information retrieval environments and the needs
of different user communities.
Chan, Lois Mai, and others. "A Faceted Approach to
Subject Data in the
For the Dublin Core metadata
record, a new approach to subject vocabulary was investigated. Faceted
Application of Subject Terminology (FAST) is based on the existing vocabulary
in Library of Congress Subject Headings. It is applied in a simpler
syntax. In FAST, non-topical (geographic, chronological, and form) data are
separate from topical data and placed in different elements provided in the
Dublin Core metadata record.
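The separation of facets can be sketched roughly as follows. The sketch assumes a subject string coded with MARC-style subfield markers ($a and $x topical, $z geographic, $y chronological, $v form); that input form, the facet names, and the function name are assumptions made for illustration and do not describe how FAST headings are actually derived.

```python
# Assumed mapping from MARC-style subfield codes to facet names.
FACET_BY_SUBFIELD = {
    "a": "topical",
    "x": "topical",
    "z": "geographic",
    "y": "chronological",
    "v": "form",
}


def split_facets(heading):
    """Split a '$'-coded subject string into separate facet lists."""
    facets = {"topical": [], "geographic": [], "chronological": [], "form": []}
    for part in heading.split("$"):
        part = part.strip()
        if not part:
            continue
        code, value = part[0], part[1:].strip()
        facet = FACET_BY_SUBFIELD.get(code)
        # Unknown codes and empty values are skipped rather than guessed at.
        if facet and value:
            facets[facet].append(value)
    return facets
```

Each resulting list could then be written to whichever metadata element an application assigns to that facet, instead of being kept in one pre-coordinated string.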
Chan, Lois Mai, Lin, Xia, Zeng, Marcia (1999).
"Structural and Multilingual Approaches to Subject Access on the
Web." Paper presented at the 65th
IFLA Council and General Conference,
A report in three parts.
Using hierarchical or classification-based
formats to organize web resources should have important advantages, among which
are improved subject browsing facilities, potential multi-lingual access and
improved interoperability with other services. In the web environment, subject data
often are separate from or reside outside the resources themselves. They can be
stored in interfaces that link subject data to the resources but do not affect
them otherwise. The advantage of "linking-to" rather than
"storing-with" is flexibility. Desirable characteristics: a)
intuitive, logical and easy to use … with expressive captions; b) flexible,
adjustable, and expandable; c) useful in a wide range of settings; d)
relatively easy to maintain and revise.
Part II. Knowledge Class.
The purpose of this research
project is to create and test a device called "Knowledge Class",
designed for customizing knowledge organization and access, to supplement and
complement existing devices for Web users. Knowledge Class contains two basic
components: a) an organizing framework, and b) an interface for access to and
retrieval of web resources. I) The organizing framework is a classified
mini-thesaurus, consisting of a hierarchically structured collection of terms
on a specific topic or discipline of interest or concern to an individual user.
The user can initiate searches by selecting the display terms or by using
pre-stored search strategies, which often contain synonyms and can also connect
to sites previously discovered by clicking on links with pre-stored URLs.
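The organizing framework described above can be pictured as a small tree of display terms, each carrying a stored search and stored links. The following is only a sketch under that reading; the class name, its fields, the query syntax, and the sample data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One display term in a hypothetical classified mini-thesaurus."""
    term: str
    synonyms: list = field(default_factory=list)  # used in the stored search
    urls: list = field(default_factory=list)      # previously discovered sites
    children: list = field(default_factory=list)  # narrower terms

    def search_strategy(self):
        """Return a pre-stored query: the term OR'ed with its synonyms."""
        return " OR ".join(f'"{t}"' for t in [self.term, *self.synonyms])

    def find(self, term):
        """Depth-first, case-insensitive lookup of a display term."""
        if self.term.lower() == term.lower():
            return self
        for child in self.children:
            hit = child.find(term)
            if hit is not None:
                return hit
        return None


# Illustrative data only.
ROOT = Node(
    "Gardening",
    children=[
        Node("Roses", synonyms=["Rosa"], urls=["https://example.org/roses"]),
        Node("Composting"),
    ],
)
```

Selecting the display term "Roses" would then either run the stored query returned by `search_strategy()` or follow one of the stored URLs.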
Part III.
Multilingual approach to subject
access.
Multilingual processing has
emerged as a key issue in the evolution of search engine technologies. Major
search engines have developed new services functioning as regional search guides
in these areas: a) domain filtering, b) domain direction, c) mirror sites, d)
language specific search, e) multilingual search, f) regional interfaces, g)
localized subject directories.
The road towards a fully
functional cross-lingual subject access is both optimistic and sophisticated.
Many other technical issues as well as social and cultural issues also need to
be addressed. These include character encoding support, user interface
linguistic translation, support of culture-specific data formats (date,
currency, etc.), user interface graphical modification (color, images), foreign
products support (e.g. databases), and operating system compatibility. In
summary, there has been an increasing need for effective mechanisms to organize
web resources for exploration, discovery, and retrieval.
Cherry, Steven M. "Weaving a Web of Ideas." IEEE
Spectrum, Sept. 2002.
Software agents, robots, were not
successful in dealing with semantics, with multiple meanings of words. The
Semantic Web idea instead suggests that Web pages should contain their own
semantics. Successful search engines have developed sophisticated methods of
delivering documents. The Semantic Web aims to get to the information in the
documents by using an ontology - a collection of related RDF statements, which
together specify a variety of relationships among data elements and ways of
making logical inferences among them. It addresses syntax, which is the set of
rules or patterns according to which words are combined into sentences.
Semantics is the meaningfulness of the terms - how the terms relate to real
things. Search engines have room for
improvement. One method (
Clark, Judith. "Subject Portals." Ariadne,
29 (
The author describes a 3-year
project to develop a set of subject portals or hubs, part of the Development
Programme of the Distributed National Electronic Resource (DNER), funded by the
JISC. The project aims to enhance resource discovery by developing a series of
portals focused on the requirements of end-users located in a variety of
learning environments within higher education sectors. The first phase of the
project (2000-2001) was to build a Z39.50 cross search prototype at three RDN
hubs, SOSIG, EEVL, and BIOME. The second phase adds HUMBUL and PSIgate. Sites
are selected on the basis of selection criteria, cataloged following consistent
practices, and analyzed by people with expertise in the relevant subject
discipline. Links are checked daily in an automated process and all entries are
updated regularly by subject specialists. These are classified using an
appropriate controlled vocabulary.
RDN portals (http://www.rdn.ac.uk/projects/) are
primarily concerned with technologies that broker subject-oriented access to
resources. Effective cross-searching depends on consistent metadata standards.
Z39.50 is the standard that has been adopted for preliminary cross-search
functionality. Further functionality is being developed using RSS (Rich Site
Summary) and OAI (Open Archives Initiative). Other standards that
underpin the portals include Dublin Core and a variety of subject-specific
thesauri such as the CAB Thesaurus and MeSH.
Clavel-Merrin, Genevieve. "Multilingual Access to
Subjects: the MACS Prototype." Paper presented at TEL Milestone
Conference,
National and other libraries have
invested heavily in encyclopedic subject heading languages that offer a
complementary access to their collections. The tasks of creation, management
and maintenance of these subject heading languages require significant
resources, and rely generally on co-operation so that this approach is
naturally considered as a way to extend access to users from other linguistic
areas. Therefore, the CoBRA+ Working Group on Multilingual Subject Access
conducted a feasibility study between Autumn 1997 and February 1999 on linking
headings between the three Subject Heading Languages (SHLs) used in the
Bibliothèque Nationale, Die Deutsche Bibliothek, the Swiss National Library and
the British Library. The SHLs used were RAMEAU, SWD/RSWK and LCSH. As a result
the MACS (Multilingual Access to Subjects) project was set up to develop a
prototype system testing the recommendations and findings of the feasibility
study.
Clavel-Merrin, Genevieve. "The Need for Co-operation
in Creating and Maintaining Multilingual Subject Authority Files." Paper presented at the 65th IFLA Council and
General Conference,
In 1997, the Conference of
European National Librarians (CENL) asked Computerized Bibliographic Record
Actions (CoBRA+) to consider the problem of multilingual subject access to
bibliographic databases and conduct a pilot study in French, German and
English. The aim of the study was to establish equivalents between RAMEAU,
SWD/RSWK and LCSH: 1) establish a methodology for the selection and linking of
headings, 2) link headings and analyze the results in the selected subject
areas, 3) see the practical applications of these linked headings by indexing a
test group of titles, 4) compare the indexing of titles in other subject
fields. The study did confirm the following: 1) the number of headings and
subdivisions which may be combined and the complexity of the strings which may
result vary from language to language, 2) the number of strings that may be
applied to a document also varies according to the different rules applied.
CORES - A Forum on Shared Metadata Vocabularies. <http://www.ukoln.ac.uk/metadata/cores/>
(
The CORES project is funded within
the Information Society Technologies (IST) Programme, managed by the
Information Society Directorate-General of the European Commission. The central
objective of the CORES project is to encourage the sharing of metadata
semantics. CORES will address the need to reach consensus on a data model for
declaring semantics of metadata terms in a machine-readable way. Consensus on
the ground rules for declaring standard definitions of terms, as well as local
usage and adaptations, will enable the diversity of existing standards to
"play together" in an integrated, machine-understandable Semantic Web
environment. In order to achieve this level of interoperability, CORES will
support applications re-using and adapting terms maintained by key
organizations and standardization initiatives.
For more detailed information,
see the CORES website: http://www.cores-eu.net/
Day, Michael. "Metadata in Support of Subject Gateway
Services and Digital Preservation." Draft version of paper presented at
Electronic Resources: Definition, Selection and Cataloguing,
This paper provides an introduction to two of the
metadata-related projects in which UKOLN has been a partner. It first describes
the development of services known as quality controlled subject gateways and
looks in more detail at the Resource Discovery Network and the EU Renardus
project. It then provides an outline of recent preservation metadata
initiatives and describes the way the OAIS model has been used in the Cedars
project.
DESIRE Information Gateways Handbook
(2000). <http://www.desire.org/handbook> (
This is a thorough guide to
creating a high quality portal or gateway on the Internet. Section 2 of the
handbook covers important decisions to be made when setting up a new gateway
(such as choosing a metadata format, designing a user interface, writing a
selection policy) but also covers issues such as cataloging and resource
discovery. Subject gateways should aim to guarantee high quality
resources and facilitate subject-based access to the collection. Information
gateways are characterized by their creation of third-party metadata
records - individual descriptions of Internet resources held in a database that
have separate fields for different attributes of the resources, such as title,
author, URL, etc. Cataloging rules or guidelines specify how
the content of a metadata format is entered,
and will often cover additional features such as classification, subject
analysis and authority control. Once a metadata format is selected, a metadata
content standard needs to be selected or developed to address dates, language
codes, name authority files, and subject information. Classification
schemes, keywords and thesauri are central features of the formal resource descriptions
provided by a gateway service. Browsing
(through a directory-like structure) is usually based on subject classification
schemes or thesauri. Classification schemes differ from other subject indexing
systems, such as subject headings and thesauri, by trying to create collections
of related resources in a hierarchical structure. Cross-browsing two or more
gateways is useful but difficult. Mapping methods can be used, e.g. in DESIRE II,
and have been tested by ROADS. "As
with cross-browsing using classification schemes, cross-searching only becomes
possible if either of the different catalogs use the same vocabulary or if a
mapping has been done between two or more different schemes." Gateways
need to address the language needs of their audiences. Users may want to search
a multilingual collection by using queries in one language or to retrieve
documents in a number of specific languages, preferably also via an interface
in the language of their choice. There are two issues: the storing, processing,
and presentation of information in many languages; and multilingual search and
retrieval. Each chapter includes a bibliography.
Dhamankar,
R., Lee, Y., Doan, A., Halevy, A., & Domingos, P. (2004). “iMAP:
Discovering complex semantic matches between database schemas.” in SIGMOD
'04: Proceedings of the 2004 ACM SIGMOD international conference on management
of data,
Doerr, M. "Semantic problems of thesauri
mapping." Journal of Digital
Information, vol. 1, no. 8 (
With networked information access to heterogeneous data
sources, the problem of terminology provision and interoperability of
controlled vocabulary schemes such as thesauri becomes increasingly urgent.
Solutions are needed to improve the performance of full-text retrieval systems
and to guide the design of controlled terminology schemes for use in structured
data, including metadata. Thesauri are created in different languages, with
different scope and points of view and at different levels of abstraction and
detail, to accommodate access to a specific group of collections. In any wider
search accessing distributed collections, the user would like to start with
familiar terminology and let the system find out the correspondences to other
terminologies in order to retrieve equivalent results from all addressed
collections. This paper investigates possible semantic differences that may
hinder the unambiguous mapping and transition from one thesaurus to another.
Dunsire, Gordon. "Joined up Indexes: Interoperability
Issues in Z39.50 Networks." Paper presented at the 68th IFLA Council and
General Conference,
The paper discusses issues in the
interoperability of indexes to metadata records in distributed information
retrieval networks, based on the findings of Cooperative Academic Information
Retrieval Network for Scotland (CAIRNS) and Scottish Collections Network
Extension (SCONE) projects. The two have evolved services which together
provide user-driven collection identification and selection mechanisms and the
ability to cross-search related metadata for item discovery and access. The
CAIRNS Cataloguing Issues Working Group identified a number of factors
affecting cross-searching of metadata indexes for authors, titles, subjects and
control numbers, including local cataloging policies, content standards, and
index structures. The
Duval, Erik, Hodgins, Wayne, Sutton, Stuart, and Weibel,
Stuart L. "Metadata Principles and Practicalities." D-Lib Magazine,
8, no. 4 (April 2002). <http://www.dlib.org/dlib/april02/weibel/04weibel.html>
(
The focus of the article is
metadata in general, but some information is apropos to subject analysis. The
use of controlled vocabularies is another important approach to refinement that
improves the precision for descriptions and leverages the substantial
intellectual investment made by many domains to improve subject access to
resources. The Dewey Decimal Classification System, for example, affords a
multilingual classification system long used in traditional library
environments that can be applied to electronic resources as well. There are
hundreds of domain-specific thesauri and classification systems, as well, that
can be imported into the Web metadata architecture to support subject
descriptions. Specifying the use of a particular vocabulary in a given
collection of metadata will allow applications to provide more coherent search
and browsing facilities. It is essential to adopt metadata architectures that
respect linguistic and cultural diversity. However, unless such resources can
be made available to users in their native languages, in appropriate character
sets, and with metadata appropriate to management of the resources, the Web
will fail to achieve its potential as a global information system.
By elucidating shared principles
and practicalities of metadata, the authors hope to raise the level of
understanding among our respective (and shared) constituents. The ideas in this
paper are divided into two categories: a) Principles,
and b) Practicalities.
Eden, Brad. "Metadata and its Application." Library
technology reports, 38, no. 5 (Sept./Oct. 2002): p. 1-77.
This report is a guide to current
metadata standards and their application.
Major standards are included. The report examines: which metadata is
suitable for certain libraries, linking initiatives and how they relate to
metadata, how to use metadata to build an enriched library catalog, how
metadata assists in natural language recognition technology.
Creating metadata is important
because metadata facilitates the discovery of relevant information and
resources. Metadata helps identify resources, distinguish among dissimilar
resources, bring similar resources together, allow resources to be found by
relevant criteria and give location information. Metadata promotes
interoperability if accompanied by careful mapping of data elements and
crosswalking of standards. Interoperability allows multiple systems to exchange
data with minimal loss of content and functionality, regardless of different
hardware and software platforms, data structures, and interfaces. The use of
metadata allows resources to be searched seamlessly across networks through
crosswalks and shared transfer protocols. Metadata ensures resources will be
accessible into the future, can provide persistent and unique digital identification,
can track rights and reproduction information, and organize information.
Problems with polysemy (words with multiple meanings), ambiguity of meaning,
and synonymy can all be alleviated by the proper application of metadata,
either manually or through selected harvesting. Interoperability has become the
key shared focus if multiple metadata standards are to survive.
One of the core ideas behind the
Semantic Web is the creation of machine-processable relationships between
resource identifiers (URIs). Two often discussed ways of representing those
relationships are RDF and Topic Maps. A topic is simply a representation of any
subject or concept of interest; it is the 'proxy' of that subject in the topic
map. Topics have characteristics: names of different types, roles played by the
topic in associations with other topics, occurrences, which are resources
pertinent to the topic, also of different types. Topic characteristics can be asserted as
being valid within a "scope" which acts as a context for assertions.
Topics in a Topic Map each play an identified "role". Topic Maps tend
to start with the 'abstract' and optionally extend to include concrete
resources, whereas RDF tends to start with defining relationships between
concrete resources and optionally building abstract conceptual links between
those relationships.
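The contrast described above can be illustrated with a minimal Python sketch. It is only an approximation of the Topic Maps model, not an implementation of any Topic Maps or RDF library; the class names, the example topics, the `composerOf` predicate and the example.org URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Proxy for one subject; names carry a scope, occurrences carry a type."""
    topic_id: str
    names: list = field(default_factory=list)        # (name, scope) pairs
    occurrences: list = field(default_factory=list)  # (type, resource URI) pairs


@dataclass
class Association:
    """Typed relationship in which each member topic plays a named role."""
    assoc_type: str
    roles: dict  # role name -> topic_id


def names_in_scope(topic, scope):
    """Return the names asserted as valid within the given scope."""
    return [name for name, s in topic.names if s == scope]


def to_triples(assoc, predicate_by_roles):
    """Flatten a binary association into an RDF-style (subject, predicate, object)."""
    (role_a, topic_a), (role_b, topic_b) = sorted(assoc.roles.items())
    return [(topic_a, predicate_by_roles[(role_a, role_b)], topic_b)]


# Hypothetical example data
puccini = Topic("puccini",
                names=[("Giacomo Puccini", "en"), ("Puccini, Giacomo", "sort")],
                occurrences=[("biography", "http://example.org/puccini-bio")])
tosca = Topic("tosca", names=[("Tosca", "en")])
composed = Association("composed-by", {"composer": "puccini", "work": "tosca"})

print(names_in_scope(puccini, "en"))
print(to_triples(composed, {("composer", "work"): "composerOf"}))
```

The topic map side keeps roles and scopes as first-class information; flattening to a triple discards them unless they are modelled separately, which is one way to read the difference in starting points noted above.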
Frâncu,
Summary: The article describes
the research done over a bibliographic database in order to show what impact
the specificity of the knowledge organizing tools may have on information
retrieval. For this purpose two multilingual Universal Decimal Classification
(UDC) based thesauri having different degrees of specificity are considered.
Issues of harmonizing a classificatory structure with a thesaurus structure are
introduced, and significant aspects of information retrieval in a multilingual
environment are examined.
Franklin, Rosemary Aud. "Re-inventing Subject Access for
the Semantic Web". Online Information Review, 27, no. 2 (2003):
94-101.
Second generation web
research is beginning to model subject access with library science principles
of bibliographic control and cataloging. Harnessing the Web and organizing the
intellectual content with standards and controlled vocabulary provides precise
search and retrieval capability, increasing relevance and efficient use of
technology. Current research points to a type of structure based on a system of
faceted classification. This system allows the semantic and syntactic
relationships to be defined. Controlled vocabulary can be assigned, not in a
hierarchical structure, but rather as descriptive facets of relating concepts.
Garrison, William A. "Retrieval Issues for the
The Colorado Digitization Project
(CDP) is a collaborative initiative involving
Geisselmann,
Friedrich. CARMEN. WP12: Cross concordances of classifications and thesauri,
2004. <http://www.bibliothek.uni-regensburg.de/projects/carmen12/index.html.en>
(Jan. 2005)
The goal is to allow an integrated search for subject
aspects in distributed data holdings with different intentional emphases taking
into account the conceptual differences of the applied thesauri and classifications
by cross concordances.
Godby, Carol Jean and Reighart, Ray. "Terminology
Identification in a Collection of Web Resources." Journal of Internet
Cataloging, 4, no. 1/2 (2001): 49-65.
The primary goal of OCLC's
WordSmith project was to obtain subject terminology directly from raw text. The
hypothesis was that reliable subject terms can be automatically collected,
re-used, and organized into thesaurus-like objects that enhance access to
Internet material that is too time-consuming to catalog by hand.
Godby, C. Jean. The WordSmith Indexing System. <http://www.oclc.org/research/publications/arr/1998/godby_reighart/wordsmith.htm>
(Dec. 27, 1999).
The OCLC WordSmith indexing
system uses the results of research in computational linguistics to implement a
series of largely statistical filters to identify descriptive vocabulary in
collections of English-language text of arbitrary subjects.
Godby, Carol Jean and Stuler, Jay. "The Library of
Congress Classification as a Knowledge Base for Automatic Subject
Categorization." Paper
presented at the IFLA Satellite Meeting: Subject Retrieval in a Networked
Environment,
This paper describes a set of
experiments in adapting a subset of the Library of Congress Classification for
use as a database for automatic classification.
A high degree of concept integrity was obtained when subject headings
were mapped from OCLC's WorldCat database and filtered using the log-likelihood
statistic. The project had three goals: 1) to adapt the LCC for use as a
knowledge base for automatically classifying full text, 2) to exploit the LCC's
structure for online subject-oriented browsing, and 3) to make the results of
the work freely available to the library community.
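The annotation above mentions filtering with the log-likelihood statistic. As a rough illustration only, the Python sketch below computes the standard log-likelihood ratio (G2) for a 2x2 table of co-occurrence counts and keeps a subject heading for a class only when the association is positive and strong. The function names, the counts and the 10.83 cut-off (the chi-square critical value for p = 0.001 with one degree of freedom) are assumptions, not details taken from the paper.

```python
import math


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G2) for a 2x2 contingency table of observed counts.

    k11: records with the heading, in the class
    k12: records with the heading, outside the class
    k21: records without the heading, in the class
    k22: records without the heading, outside the class
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def keep_heading(k11, k12, k21, k22, threshold=10.83):
    """Keep the heading if it co-occurs with the class more than chance predicts."""
    total = k11 + k12 + k21 + k22
    expected_11 = (k11 + k12) * (k11 + k21) / total
    return k11 > expected_11 and log_likelihood(k11, k12, k21, k22) > threshold


print(keep_heading(50, 5, 5, 940))    # strongly associated (invented counts)
print(keep_heading(10, 10, 10, 10))   # independent (invented counts)
```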
Hardin, Chris. "3 questions: Semantic Interoperability
Defined." ITBusinessEdge, (June 16, 2005). <http://www.dlib.org/dlib/april02/weibel/04weibel.html>
(July 18, 2005).
An example from the business arena for the need for semantic
interoperability among records.
Harken, S.
(2005). SAC subcommittee on semantic interoperability: Introduction/criteria
[draft], 2005. <http://www.und.nodak.edu/dept/library/Departments/abc/SACSEM-Criteria.htm>
(Sept. 25, 2005)
Heery, Rachel, Carpenter, Leona, Day, Michael.
"Renardus Project Developments and the Wider Digital Library
Context." D-Lib Magazine, 7, no. 4 (April 2001). <http://www.dlib.org/dlib/april01/heery/04heery.html>
(Aug. 8, 2002)
A subject gateway provides a
search service to high quality web resources selected from a particular subject
area. This work was informed by earlier modeling work carried out in the
context of Moving to Distributed Environments for Library Services (MODELS). It
is hoped that results of the Renardus work will feed back to the ongoing
development of the MODELS application framework, and also to the IMesh Toolkit
project. The IMesh Toolkit project is providing subject gateway developers with
a systems framework for an extendable set of interoperable tools and
components.
Enhanced subject access is
considered a key difference offered by subject gateways, and an important part
of the Renardus service will be its attempt to provide some kind of subject
directory browsing service across the participating gateways. In order to
achieve this, a classification scheme has been chosen to act as an 'interlingua'
within the Renardus pilot. The scheme chosen is the Dewey Decimal Classification
(DDC). Gateways participating in the Renardus system will be invited to map DDC
terms to the subject terms used in their own browse hierarchies. In order to
facilitate this process, the project established a small working group to
prepare guidelines for this work. In addition, the software tool developed as
part of the German CARMEN project has been adapted to facilitate the relevant
workflow. The Renardus browse system will link directly into the subject
hierarchies of individual gateways. If a part of an individual gateway's browse
structure has been mapped to a given DDC term, the gateway's name is visible and
becomes a hyperlink to the relevant part of the local browse structure. Renardus
relates to work currently taking place within the UK HILT project, which is
studying the problem of cross-searching and browsing by subject across a range
of communities, services, and service or resource types. HILT will assist with
consensus building on best practice in the short to medium term as
regards working with existing or new subject schemes and thesauri. Renardus
will feed back experience to Networked Knowledge Organization Systems/Services
(NKOS), a loose coalition of people and organizations concerned with the use of
knowledge organization systems such as classification systems, thesauri,
gazetteers, and ontologies, to support description and retrieval of resources
via the Web.
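The 'interlingua' idea described above can be sketched in a few lines of Python: each gateway maps nodes of its own browse hierarchy to DDC numbers, and the broker inverts those mappings so that one DDC class lists every gateway with a link into its local structure. This is an assumed simplification, not the Renardus software; the gateway names and URLs are hypothetical.

```python
from collections import defaultdict

# Hypothetical mappings: (gateway name, local browse node URL) -> DDC class number
LOCAL_TO_DDC = {
    ("GatewayA", "http://gateway-a.example/browse/maths"): "510",
    ("GatewayB", "http://gateway-b.example/klass/mathematik"): "510",
    ("GatewayB", "http://gateway-b.example/klass/physik"): "530",
}


def build_ddc_index(mappings):
    """Invert the local mappings so each DDC class lists the gateways mapped to it."""
    index = defaultdict(list)
    for (gateway, url), ddc in mappings.items():
        index[ddc].append((gateway, url))
    return index


def links_for(index, ddc):
    """Return (gateway, link) pairs to show on the browse page for one DDC class."""
    return sorted(index.get(ddc, []))


index = build_ddc_index(LOCAL_TO_DDC)
for gateway, url in links_for(index, "510"):
    print(gateway, url)
```

A class with no mapped gateways simply yields an empty list, which matches the behaviour described above where a gateway's name appears only if part of its structure has been mapped.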
A draft Renardus application
profile has been agreed upon to form the basic metadata schema. Definitions of
the semantics of these elements are based, where possible, on the Dublin Core
Metadata Element Set. There is the possibility of expanding the scope of the
Renardus search service to the end-user. One proposal suggests that it would be
possible to combine a brokered gateway service with Web indexes based on
harvesting techniques. Within Renardus they intend to explore the possible
benefits of collaborative cataloging for creating metadata about web resources.
There may no longer be a need to duplicate metadata describing the same
resource in so many locations; rather, original metadata will be created and
further enhancements to that metadata will be linked to an original
authoritative metadata instance. One possible methodology to achieve this is to
use XML/RDF annotations. Within Renardus they may explore linking local
metadata enhancements to metadata residing in a central 'union catalog'.
Heery, Rachel and
Wagner, Harry (2002). "A Metadata
Registry for the Semantic Web." D-Lib Magazine, v. 8, no. 5 (May
2002). <http://www.dlib.org/dlib/may02/wagner/05wagner.html>
(May 17, 2002).
The article primarily deals with
schema registries. Registries essentially provide an index of terms. RDF
provides the basis for declaring the schema in use. Work is underway to add
richness and fullness to the schema language: a) Web Ontology Group, and b)
Ontology Inference Layer (OIL) <http://www.ontoknowledge.org/oil>. The Dublin Core Metadata Initiative (DCMI)
has defined a relatively small set of data elements (referred to within the
DCMI as the DCMI vocabulary or DCMI terms) for use in describing Internet
resources as well as to provide a base-line element set for interoperability
between richer vocabularies. The aim was to enable registration, discovery, and
navigation of semantics as defined by DCMI. Two of several goals: 1) automating
identification of relationships between terms in vocabularies, 2) being
multilingual. Several prototypes were tried, including one using the Extensible Open RDF
Toolkit (EOR) for database management and Extensible Stylesheet Language
Transformation (XSLT) for the user interface. In a multilingual registry the schema language
must always be identified when registering a schema; this helps enable discovery
and navigation. The multilingual interface is accomplished using an XSLT 'translate'
stylesheet. (Relational databases don't support good performance.)
Himanka,
J. and Kautto, V. Translation of the
Finnish abridged edition of UDC into general Finnish subject headings.
International Classification, 19,
no. 3 (1992): 131-4+.
HILT Project Overview. <http://hilt.cdlr.strath.ac.uk/About-HILT/overview.html>
(March 26, 2002).
The project is jointly funded by
the RSLP and the JISC. The purpose of the first-year of the project was to
study and report on the problem of cross-searching and browsing by subject
across a range of communities, services, and service or resource types. Phase
II aims to move the findings of Phase I into a "Pilot Project" stage.
The project encompasses partners and stakeholders from a wide range of
communities including archives, museums and libraries, amongst others.
Hudon, Michele. "Multilingual Thesaurus Construction:
Integrating the View of Different Cultures in One Gateway to Knowledge and
Concepts," in Knowledge Organization, v. 24, no. 2 (1997): 84-91.
Focuses on the social/political
aspects of treating multiple languages in egalitarian fashion, along with the
technical implications.
Hunter, Jane. "MetaNet - a Metadata Term Thesaurus to
Enable Semantic Interoperability between Metadata Domains," JoDI.
v. 1, no. 8 (Feb. 2001). <http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Hunter/>
(Feb. 17, 2005)
Abstract
Metadata interoperability is a fundamental requirement for
access to information within networked knowledge organization systems. The
Harmony international digital library project has developed a common underlying
data model (the ABC model) to enable the scalable mapping of metadata
descriptions across domains and media types. The ABC model provides a set of
basic building blocks for metadata modeling and recognizes the importance of
'events' to describe unambiguously metadata for objects with a complex history.
To test and evaluate the interoperability capabilities of this model, we
applied it to some real multimedia examples and analysed the results of mapping
from the ABC model to various different metadata domains using XSLT. This work
revealed serious limitations in the ability of XSLT to support flexible dynamic
semantic mapping. To overcome this, we developed MetaNet, a metadata term
thesaurus which provides the additional semantic knowledge that is non-existent
within declarative XML-encoded metadata descriptions. This paper describes
MetaNet, its RDF Schema representation and a hybrid mapping approach which
combines the structural and syntactic mapping capabilities of XSLT with the
semantic knowledge of MetaNet, to enable flexible and dynamic mapping among
metadata standards.
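The role a term thesaurus plays in such a mapping can be suggested by a small Python sketch: an element name is carried over unchanged when the target schema has it, and otherwise the thesaurus is consulted for an equivalent that the target does have. This is not MetaNet or XSLT; the equivalence groups and element names are hypothetical and only flat equivalence is modelled.

```python
# Hypothetical thesaurus: each set groups metadata terms treated as equivalent
EQUIVALENTS = [
    {"creator", "author"},
    {"title", "name"},
    {"date", "created"},
]


def map_record(record, target_elements):
    """Rename a record's elements to those of the target schema where possible.

    Elements with no equivalent in the target schema are dropped.
    """
    mapped = {}
    for element, value in record.items():
        if element in target_elements:
            mapped[element] = value  # already a target element
            continue
        for group in EQUIVALENTS:
            if element in group:
                candidates = sorted(group & target_elements)
                if candidates:
                    mapped[candidates[0]] = value
                break
    return mapped


source = {"author": "Hunter, Jane", "name": "MetaNet", "pages": 12}
print(map_record(source, {"creator", "title", "date"}))
```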
Huxley, Lesly, Carpenter, Leona, Peereboom, Marianne. Collaborative
Systems and Tools: Renardus Case Study. (2002) Abstract, <http://www.internet-librarian.com/presentations/huxley.pdf>
No longer available.
Renardus builds on existing
trends towards greater collaboration, standardization, and interoperability
between information services. The ability to cross-search and particularly to
cross-browse participating gateways' records led to development of tools to
support the integration and 'sensible' presentation of records from a wide
range of services, each using unrelated classification systems and data models,
providing interfaces and data in different languages, based on different
technical solutions.
IFLA. Classification
and Indexing Section, Division of Bibliographic Control. Newsletter, 27
(May 2003).
Sect. 2.2 Changing Roles of
Subject Access Tools describes several projects: a) FAST, faceted Library of
Congress Subject Headings; b) UDC implementation (UK) - role of classification
in information retrieval systems to serve as an underlying knowledge structure
to provide systematic subject organizations and thus complement the search
using natural language terms; c) SWD/RSWK (SZ) after 5 years. Dewey Decimal
Classification is being translated into German and is being used for the ePrint
IFLA.
Classification and Indexing Section. Working Group on Multilingual Thesauri. Guidelines for Multilingual Thesauri. http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf
(Apr. 20, 2005)
The IFLA Working Group on Guidelines for Multilingual Thesauri started
to prepare this document in 2002. The objective of the document is to
add to the existing Guidelines for Multilingual Thesauri as worded in
the ISO standard for multilingual thesauri (ISO 5964-1985) or in
handbooks on thesaurus building, such as Aitchison et al. (2000). The
general principles for the building of monolingual thesauri are
assumed.
There are three approaches in the development of multilingual thesauri:
1. building a new thesaurus from the bottom up
   a) starting with one language and adding another language or languages
   b) starting with more than one language simultaneously
2. combining existing thesauri
   a) merging two or more existing thesauri into one new (multilingual)
      information retrieval language to be used in indexing and retrieval
   b) linking existing thesauri and subject heading languages to each
      other; using the existing thesauri and/or subject heading languages
      both in indexing and retrieval
3. translating a thesaurus into one or more other languages.
IFLA. Section on Classification and Indexing, Division of
Bibliographic Control. Newsletter, 24 (Dec. 2001).
Czechia
More detailed subject access to documents
has become a vital need in the online environment,
where the best solution seems to be a combination of keywords with a controlled
vocabulary. Merging many external documents into the database of the Union
Catalogue gives rise to discrepancies between index terms (lexical units),
application syntax and hierarchical structure of the original indexing systems.
Subject authority file: a) an
integrated indexing and retrieval tool, in which the verbal terms of a
thesaurus (controlled vocabulary) are combined with equivalent notations of a
classification scheme (e.g. UDC); it enables subject access to documents either
via verbal terms (searching) or through the classification notation (browsing);
b) application of this integrated tool in the online (Web) environment may support
automatic indexing and classification of web resources; in this case it would be
very useful to apply verbal expressions and UDC notations that
reflect real situations. Since subject access depends on national languages
… it was difficult to find and apply any international recipe. After much
debate the LCSH system was finally chosen. However, it was considered useful at
that time to meet local needs and requirements as well, so some modifications
of LCSH were formulated, such as: direct form of geographical subdivisions; form
subdivisions made into separate headings; more frequent use of generic headings for classes of
persons or types of corporate bodies; etc.
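The integrated tool described above (verbal terms paired with classification notations, searched by term and browsed by notation) might be sketched as follows. This is an assumed toy model in Python, not the Czech authority file; the three records are illustrative, and treating a notation prefix as a hierarchy ignores UDC auxiliaries and compound notations.

```python
# Illustrative authority records pairing a verbal term with a UDC notation
AUTHORITY = [
    {"term": "Mathematics", "udc": "51"},
    {"term": "Algebra", "udc": "512"},
    {"term": "Physics", "udc": "53"},
]


def search_by_term(records, word):
    """Searching: find records whose verbal term contains the query word."""
    word = word.lower()
    return [r for r in records if word in r["term"].lower()]


def browse_by_notation(records, prefix):
    """Browsing: list (notation, term) pairs at or below a notation, in shelf order."""
    return sorted((r["udc"], r["term"]) for r in records
                  if r["udc"].startswith(prefix))


print(search_by_term(AUTHORITY, "alg"))
print(browse_by_notation(AUTHORITY, "51"))
```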
RAMEAU is not the subject
authority file of the Bibliotheque nationale de France, but the common French
indexing language. We are classifying our RAMEAU subject headings in about
sixty broad subject fields, named RAMEAU Domains, which are more or less
arranged on the basis of DDC numbers. This work is partly done thanks to an
automatic mapping between call numbers and subject indexing. It will allow us to
propose thematic views of RAMEAU and to provide consistent files of headings
for our multilingual subject access project MACS.
Royal Library in
IMesh Toolkit, 2002. <http://www.imesh.org/toolkit> (Aug.
6, 2002).
The IMesh Toolkit project evolved
out of discussions within the IMesh community which was set up to encourage
international collaboration amongst subject gateways. The project will build on
existing subject gateway software to develop a configurable, reusable, and
extensible toolkit for subject gateway providers.
The project plan covers: a) manual
selection, description and classification; b) a structured record format; c)
some search and retrieve protocol; d) a mechanism for routing queries between
gateways. For this reason, the interviews in the subject gateways review are
restricted to the needs of quality-controlled subject gateways as defined by Renardus.
Such a gateway is a subject-based resource discovery guide which provides
links to information resources (documents, collections, sites or services),
predominantly accessible via the Internet, and applies a documented set of
quality measures to support systematic resource discovery. It is also managed:
resources are collected by humans according to documented selection and maintenance
criteria, with a fixed metadata set and controlled subject classification. Renardus
will eventually be a broker system for simultaneous access to
quality-controlled subject gateways and other Internet-based, distributed
services.
Current and Possible Future
Technologies and Standards: a) Z39.50 is the protocol of choice for the
majority of the services; b) Whois++ is a very simple search and retrieval
protocol which provides a profile and a protocol at once; c) LDAP is the
Lightweight Directory Access Protocol. XML offers the possibility of combining
QSBIG records with other non-QSBIG sources. XML is not sufficient on its own;
analogously to Z39.50 requiring a profile for interoperability, XML requires an
agreed-upon syntax. Some form of DC/RDF/XML protocol was strongly
supported in the Renardus survey. SOAP
is a remote procedure call proposal which uses XML and HTTP as the carrying
mechanism. Queries are couched in XML and results are received in XML.
IMesh Toolkit.
The IMesh Toolkit project evolved
out of discussions aimed at encouraging international collaboration amongst
subject gateways and subject-based resource discovery services. To include:
Resource collection, cataloging, management and discovery (e.g. academic
guides, virtual libraries and subject gateways); sharing technical, marketing,
standards and cataloging effort, investigating cross-searching, cross-browsing
and developing standards for related software and information issues.
IMesh
Toolkit: an architecture and toolkit for distributed subject gateways. <http://www.imesh.org/toolkit> (Aug.
6, 2002).
The project will build on existing
subject gateway software to develop a configurable, reusable and extensible
toolkit for subject gateway providers.
IMesh
Toolkit: subject gateway requirements. <http://www.imesh.org/toolkit/work/requirements>
(Aug. 6, 2002).
The objective of this work
package is to ensure that the IMesh toolkit architectural and technical
strategies are well-grounded in documented needs and practical requirements of
subject gateways.
IMesh
toolkit: General architectural overview of the IMesh Toolkit. <http://www.imesh.org/toolkit/work/architecture/notes.php3>
(Aug. 9, 2002).
Focuses on discussion of how to
achieve interoperability for the IMesh toolkit, particularly with regard to
architecture and functionality of query languages, etc.
IMesh
toolkit: architecture. <http://www.imesh.org/toolkit/work/architecture>
(Aug. 6, 2002).
Architectural diagram.
Information and Documentation - a Reference Ontology for
the Interchange of Cultural Heritage Information (2002). ISO/CD
21127. <http://www.niso.org/international/SC4/n491.pdf>
(Oct. 27, 2002).
The primary purpose of ISO 21127 is
to offer a conceptual basis for the mediation of information between cultural
heritage organizations such as museums, libraries, and archives. The standard
aims to provide a common reference point against which divergent and
incompatible sources of information can be compared and, ultimately,
harmonized. It is designed to be explanatory and extensible rather than
prescriptive and restrictive. Consequently, the model has been formulated as an
object-oriented semantic model, which can easily be converted into other
object-oriented models. All cross-references and inheritance of properties are
explicitly resolved. The exchange of information relevant to museum collections
with libraries and archives falls within the scope of the standard.
ISO 2788-1986. Documentation - Guidelines for the Establishment
and Development of Monolingual Thesauri. <http://www.nlc-bnc.ca/iso/tc46sc9/standard/2788e.htm>
(July 1, 2002).
Iyer, H.,
& Giguere, M. D. “Towards designing an expert system to map mathematics
classificatory structures”. Knowledge Organization, 22, no. 3-4 (1995), 141-147.
Janee, Greg, Satoshi Ikeda, Linda L. Hill. The ADL
Thesaurus Protocol. 2003. <http://alexandria.sdc.ucsb.edu/~gjanee/thesaurus/specification.html>
(April 9, 2003).
The document describes an XML-
and HTTP-based protocol for accessing thesauri: structured, controlled
vocabularies of words and phrases that represent conceptual categories. The
protocol is intended to allow programmatic clients to easily access and
utilize existing thesauri, and thus the services offered by the protocol are
oriented around querying thesauri and navigating within thesauri. The protocol
does not support creation, maintenance, or sharing of thesauri, or mapping
between thesauri. Its basic unit is the term, which represents a
conceptual category and may have a scope note. Terms may be preferred or
nonpreferred. It includes the reciprocal term relations of narrower, broader,
related, use (use instead) and used-for. Eight XML formats are used. The
hierarchy feature describes the hierarchy of terms above (broader) or below
(narrower) including the starting term itself. Operators include "equals",
"contains-all-words", "contains-any-word",
"matches-regexp" (a Perl-like regular expression). The protocol
provides five independent, stateless services which are invoked over the HTTP
protocol.
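The four query operators and the hierarchy feature described above can be imitated over an in-memory thesaurus, as in the Python sketch below. It does not implement the ADL protocol's XML formats or HTTP services; the function names, the sample terms and the exact matching semantics (case-insensitive word matching, regular expressions applied with a search rather than a full match) are assumptions.

```python
import re

# Hypothetical thesaurus fragment: term -> narrower terms
NARROWER = {
    "hydrographic features": ["lakes", "streams"],
    "streams": ["rivers", "creeks"],
}


def all_terms():
    """Collect every term that appears as a parent or a child."""
    terms = set(NARROWER)
    for children in NARROWER.values():
        terms.update(children)
    return sorted(terms)


def match(term, operator, text):
    """Apply one of the four operators named in the annotation to a single term."""
    words = set(term.lower().split())
    query_words = text.lower().split()
    if operator == "equals":
        return term.lower() == text.lower()
    if operator == "contains-all-words":
        return all(w in words for w in query_words)
    if operator == "contains-any-word":
        return any(w in words for w in query_words)
    if operator == "matches-regexp":
        return re.search(text, term) is not None
    raise ValueError("unknown operator: " + operator)


def query(operator, text):
    """Return all terms satisfying the operator, in alphabetical order."""
    return [t for t in all_terms() if match(t, operator, text)]


def narrower_hierarchy(term):
    """Return the starting term followed by everything below it, depth-first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(narrower_hierarchy(child))
    return result


print(query("contains-any-word", "features lakes"))
print(query("matches-regexp", "^r"))
print(narrower_hierarchy("hydrographic features"))
```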
Koch, Traugott and Neuroth, Heike. Classification
Mapping for Cross-browsing in the European Subject Gateway Broker Renardus. Presentation
at the NKOS workshop at JDL, June 28, 2001. <http://www.lub.lu.se/tk/renardus/NKOS01-pres.htm>
(Nov. 7, 2002).
Koch, Traugott (2001). Controlled Vocabularies, Thesauri
and Classification Systems Available in the WWW. DC Subject, 2001. <http://www.lub.lu.se/metadata/subject-help.html>
(July 29, 2002).
Lists a large number available on
the web.
Koch, Traugott (2000). Quality-controlled Subject
Gateways on the Internet, 2000. <http://www.lub.lu.se/tk/demos/Sgin.html>
(Aug. 8, 2002).
This paper summarizes the DESIRE
approach, software solutions, cooperative subject gateway projects, broker
architectures, metadata mapping and cross-searching, browsing structure in a
subject gateway, and classification mapping and cross-browsing problems and
issues. "Quality-controlled subject gateways" are Internet services
which apply a rich set of quality measures to support systematic resource
discovery. Considerable manual effort is used to secure a selection of
resources which meet quality criteria and to display a rich description of
these resources with standards-based metadata. Regular checking and updating
ensure good collection management. A main goal is to provide a high quality of
subject access through indexing resources using controlled vocabularies and by offering
a deep classification structure for advanced searching and browsing.
Koch, Traugott.
"The Renardus Broker: a '
Koch, Traugott, Neuroth, Heike, and Day, Michael.
"Renardus: Cross-browsing European Subject Gateways via a Common
Classification System (DDC)." Paper
delivered at the IFLA satellite meeting: Subject Retrieval in a Networked
Environment, OCLC,
The paper presents the approach
and first results of the classification mapping process in the EU project
Renardus. The outcome is a cross-browsing feature based on the Dewey Decimal
Classification (DDC) and improved subject searching across distributed and
heterogeneous European subject gateways. The project aims to develop a
Web-based service to enable searching and browsing across a range of
distributed European-based information services designed for the academic and
research communities - and in particular those services known as subject
gateways. Predecessor projects like the EU project DESIRE have already
developed solutions for the description of individual resources and for
automatic classification at the level of an individual subject gateway using
established classification systems. Renardus intends to develop a service that
can cross-search and cross-browse a number of distributed subject gateways
through the use of a common metadata profile and by mapping all locally used
classification schemes to a common scheme.
Kriewel, Sascha, and others. "DAFFODIL - Strategic
Support for User-oriented Access to Heterogeneous Digital Libraries." D-Lib
Magazine, 10, no. 6 (June 2004) <http://www.dlib.org/dlib/june04/kriewel/06kriewel.html>
(June 2004).
DAFFODIL (Distributed Agents for
User-Friendly Access to Digital Libraries) is a search system for digital
libraries aiming at strategic support during the information search process. It
is a system for integrated search with the heterogeneous digital libraries of a
scientific community with merging of results. It combines browsing and
searching strategies in a natural way. It uses a classification tool which
provides users with access to a hierarchical, topic oriented representation of
the search domain. It allows the browsing of classification schemes like the
ACM Computing Classification system. The thesaurus tool can be used to get more
general or more specific terms (hypernyms or hyponyms), or semantic definitions
for a search term. Subject specific and web-based thesauri are used for finding
related terms. The resulting terms can then be used in other tools for further
queries.
Kuhr, Patricia S. "Putting the World Back Together:
Mapping Multiple Vocabularies into a Single Thesaurus." Paper delivered at
the IFLA satellite meeting: Subject Retrieval in a Networked Environment, OCLC,
This paper describes an ongoing
project by the H.W. Wilson Company in which the subject headings contained in
twelve controlled vocabularies covering multiple disciplines from the
humanities to the sciences and including law and education among others are
being collapsed into a single vocabulary and reference structure.
Kunz, Martin. "Subject Retrieval in Distributed
Resources: a Short Review of Recent Developments." Paper presented at the 68th IFLA Council and
General Conference, Aug. 18-24, 2002. <http://www.ifla.org/IV/ifla68/papers/007-122e.pdf>
(Oct. 27, 2002).
Subject searching across
distributed resources is a current challenge when carrying out online searches
for bibliographic data. The construction of portals for comparable sources is
only the first step; the subsequent navigation of disparate search interfaces
still presents problems. Both broad and specialist vocabularies exist. If
retrieval is to be improved, there must be some adaptation of these differing
resources. There are techniques for
relating various subject terminologies, but they have their problems and
limitations. Whether you call it a cross-concordance or a crosswalk, it is
about creating links between equivalent terms describing similar concepts in
two (or more) thesauri AND it is about affiliation of documentary languages.
New developments in MACS, CARMEN, and Economics cross-concordance are
discussed.
One part of the CARMEN Project
concerns itself with the association of the thesaurus of the Informationszentrum
Sozialwissenschaften (IZT) with the SWD. Starting from alphabetical lists which
contain the keyword material from a specific subject area, the relationships
between the two thesauri are determined intellectually and recorded in a link
management system.
The aim of MACS is to study the
links between the three extensive subject heading authority files - LCSH,
RAMEAU, and SWD. The immediate objective is to indicate in each authority file
the equivalent preferred descriptors of the other authority files for a few
chosen subject areas. The process being developed for MACS will not affect the
structure of the individual national authority files. It uses
intellectually-determined equivalencies to link the content of the
bibliographic databases which use a controlled vocabulary to describe their
content and present it in an ordered, structured way. MACS is based on the
assumption that the user accesses the results of intellectually assigned subject
descriptions via a thesaurus. Thesauri can distinguish to a better and more
comprehensive degree between material to be indexed than a method based on
syntactical indexing.
Kwasnik, Barbara H. and Rubin, Victoria L. "Stretching
Conceptual Structures in Classifications across Languages and Cultures," Cataloging
& Classification Quarterly, v. 37, no. 1 / 2 (2003): 33-47.
Summary: The authors describe the
difficulties of translating classifications from a source language and culture
to another language and culture. To demonstrate these problems, kinship terms
and concepts from native speakers of fourteen languages were collected and
analyzed to find differences between their terms and structure and those used
in English. At issue are vocabulary, syntax, and semantics. In harmonizing
classification schemes across languages and cultures, one must address the way
these terms are bound up in knowledge representations.
Landry, Patrice (2000).
"The MACS Project: Multilingual Access to Subjects (LCSH, RAMEAU,
SWD)." Paper presented at the 66th IFLA Council and General Conference,
A report on the progress of the
project during the previous year. Based on the final report of the CoBRA+
working group on multilingual subject access, the importance of co-operation in
the quest for multilingual subject access was stressed. The goal is to allow the user to conduct a
subject search in catalogs in their preferred language. The link management
software should have a file management and a maintenance structure that allows
data to be easily added and amended. The prototype should provide for any user
the possibility to choose a source language and one or more target catalogues. The
Link Management Interface should only be accessed by the partner libraries to
add and to manage the links between the different subject heading lists. The
Search Results screen shows which links have been made to a particular subject
heading in the focus subject heading list. The View Link function is primarily
an editorial function. From this screen, a term (authority) or the link can be
modified. The Search Interface was designed to give the library users the
possibility of using their preferred subject heading list and doing their
search in the catalogs of one or many libraries. The Browse button will show
all the headings where a particular heading term is used and the links to these
headings. The library user can access the full bibliographic record by clicking
on the title. The interface will retrieve the bibliographic record in the
selected library and will display the record in the bibliographic format used
by that library.
Lauser, Boris, and others. "A Comprehensive Framework
for Building Multilingual Domain Ontologies: Creating a Prototype Biosecurity
Ontology." Paper presented at Proceedings of the International Conference
on Dublin Core and Metadata for e-Communities, 2002: 113-123 <http://www.bncf.net/dc2002/program/ft/paper13.pdf>
or <http://www.bncf.net/dc2002/program/papers.html>
(2002)
This paper presents ongoing work in
establishing a multilingual domain ontology for a biosecurity portal. The
project is embedded into the bigger context of the Agricultural Ontology
Service (AOS) project of the FAO. The paper focuses on introducing a
comprehensive, reusable framework for the process of semi-automatically
supported ontology evolvement. An extendable layered ontology modeling approach
will address multilinguality issues. In the context of the AOS, an ontology is
a system of terms, the definition of these terms and the specification of
relationships between the terms. It extends the approach of classical thesauri
by providing the opportunity of creating an infinite number of different
semantic relationships. Semantic robustness towards representational changes,
as well as multilingualism, are crucial for the development of the domain
ontology. Therefore, they distinguish between terms, and the concepts these
terms represent. These are called
Lexical Entries with two attributes, the concept it refers to and its language.
RDFS (http://www.w3.org/TR/rdf-schema/#intro) is used to define vocabularies of
resources and relationships amongst them. Using several tools a list of terms
is developed. This is combined with terms in AGROVOC, a multilingual
agricultural thesaurus, in which all terms have also been converted to
concepts. Hence automated and manual processes have been used to create a
single ontology which is reviewed by specialists.
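The separation of terms from concepts described above can be sketched in Python as follows. This is a minimal illustration, assuming a lexical entry is just a term plus the two attributes named in the summary (the concept it refers to and its language). The class name, field names, concept identifier, and sample terms are hypothetical and do not come from the AOS project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    term: str        # the word or phrase as written
    concept_id: str  # identifier of the concept the term refers to
    language: str    # language code of the term


# Hypothetical entries; the concept identifier is invented for illustration.
ENTRIES = [
    LexicalEntry("quarantine", "c_0001", "en"),
    LexicalEntry("quarantaine", "c_0001", "fr"),
    LexicalEntry("cuarentena", "c_0001", "es"),
]


def terms_for(concept_id, language, entries):
    """Return all terms that label the given concept in the given language."""
    return [e.term for e in entries
            if e.concept_id == concept_id and e.language == language]


def translate(term, source_lang, target_lang, entries):
    """Find target-language terms that share a concept with the source term."""
    concept_ids = {e.concept_id for e in entries
                   if e.term == term and e.language == source_lang}
    return sorted(e.term for e in entries
                  if e.concept_id in concept_ids and e.language == target_lang)
```

Because relationships would attach to concept identifiers rather than to strings, renaming or adding a term in one language leaves the concept structure untouched.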
Lee, Jonghoon, Dubin, David S., Kurtz, Michael J.
"Co-occurrence Evidence for Subject Vocabulary Reconciliation in ADS
Databases," in ASP Conference Series, vol. 172, Astronomical Data
Analysis Software and Systems, VIII, 1999. <http://monet.astro.uiuc.edu/adass98/Proceedings/leej/>
(Sept. 27, 2002)
Reports on a project to reconcile
heterogeneous indexing vocabularies in the NASA Astrophysics Data System (ADS)
which mixes controlled vocabularies and keywords. The mixture of different descriptor
vocabularies in ADS defeats the standardization goal, and the merging of the
abstract and key word indexes limits the search precision function of the
subject indexing. Descriptors representing identical concepts can stand in
several different relationships to each other.
A project at the
Lee, Maria, Stewart Baillie, Jon Dell'Oro (1999).
"TML: a Thesaural Markup Language." Paper presented at Proceedings of
the 4th Australasian Document Computing Symposium,
Thesauri are used to provide
controlled vocabularies for resource classification. Their use can greatly
assist document discovery because thesauri mandate a consistent shared
terminology for describing documents. A particular thesaurus classifies documents
according to an information community’s needs. As a result, there are many
different thesaural schemas. This has led to a proliferation of schema-specific
thesaural systems. In their research, they exploit schematic regularities to
design a generic thesaural ontology and specify it as a markup language. The
language provides a common representational framework in which to encode the
idiosyncrasies of specific thesauri. This approach has several advantages: it
offers consistent syntax and semantics in which to express thesauri; it allows
general purpose thesaural applications to leverage many thesauri; and it
supports a single thesaural user interface by which information communities can
consistently organize, store and retrieve electronic documents.
An ontology, in computer science, has come to denote an
explicitly specified conceptualization of part of the world. In software, an
ontology is implemented as a data structure. What distinguishes the ontology
from the data structure is semantics: that it talks about something in the
world. An ontology provides users with a representation which is essential to
effective communication and coordination.
The general thesaural ontology gives us a conceptual
representation of thesauri. A thesaural markup language (TML) manifests this as
a grammar in which to express the content and structure of specific thesauri.
TML is specified as an XML schema which defines the permitted markup element
types and embedding structure. The TML syntax consists of the element names and
structure.
TML provides a way to represent task-domain specific
thesauri and make them available to a document management system. In order to
demonstrate this generality, the authors developed a Thesaural Explorer
application. The Explorer reads a thesaurus from its TML file, presents it
graphically, and supports browser style term navigation. The user selects a
thesaurus to explore and then can navigate the structure along inter-term
relations by clicking on terms or using various look up tables such as ordered
lists by class, term alphabetic, and browsing history.
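To make the idea of reading a thesaurus from a markup file and navigating along inter-term relations more concrete, here is a small Python sketch. The element and attribute names in the sample markup are hypothetical and are NOT the actual TML schema, which is not reproduced in this summary; the functions are likewise illustrative and not part of the authors' Thesaural Explorer.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; not the real TML element set.
SAMPLE = """
<thesaurus name="demo">
  <term id="vehicles"><label>Vehicles</label></term>
  <term id="cars">
    <label>Cars</label>
    <relation type="BT" target="vehicles"/>
  </term>
  <term id="automobiles">
    <label>Automobiles</label>
    <relation type="USE" target="cars"/>
  </term>
</thesaurus>
"""


def load_thesaurus(xml_text):
    """Parse the markup into {term_id: {"label": ..., "relations": [...]}}."""
    root = ET.fromstring(xml_text.strip())
    terms = {}
    for term in root.findall("term"):
        relations = [(r.get("type"), r.get("target"))
                     for r in term.findall("relation")]
        terms[term.get("id")] = {
            "label": term.findtext("label"),
            "relations": relations,
        }
    return terms


def follow(terms, term_id, relation_type):
    """Return labels of terms reached from term_id via one relation type."""
    return [terms[target]["label"]
            for rel, target in terms[term_id]["relations"]
            if rel == relation_type]


thesaurus = load_thesaurus(SAMPLE)
```

A browsing interface could call `follow` each time the user clicks a term, using "BT" for broader terms and "USE" for preferred-term references in this invented markup.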
Library of Congress Portals Applications Issues
Group. "List of Portal Application Functionalities for the Library of
Congress", 2003. <http://www.loc.gov/catdir/lcpaig/>
; <http://www.loc.gov/catdir/lcpaig/portalfunctioanlitieslist4publiccomment1st-03.pdf>
(June 3, 2003).
The list represents
the results of market analysis to study portal functionality of particular
products. Functionalities include: a) general requirements, b) client
requirements, c) searching and search results, d) knowledge database, e) patron
authentication, and f) portal administration and vendor support. One aspect of
a portal is its database and the subject metadata used within it and its
maintenance. LCPAIG focused its explorations and testing on portals as tools
for organized knowledge discovery rather than as enterprise interfaces. Portals
may be characterized by their ability to: a) assist users in identifying and
selecting appropriate target resources, b) help users determine the target
resources most useful to their research by providing effective search
interfaces and an architecture that supports groupings and rich descriptions of
resources, c) provide federated searching and information retrieval of
descriptive metadata from multiple, diverse target resources, including but not
limited to commercial or licensed electronic resources, databases, Web pages,
and library catalogs, d) integrate and manage search results, e) save and
export search results, f) link search results to full-text or other content
delivery options, g) manage access to target resources and portal functionalities
for authenticated users.
Several relevant
points: The vendor must maintain descriptive metadata and configuration
information for core target databases, including target title or name, subject
terms, etc. Ability to locally define and configure composite search qualifier
groupings, e.g. name/author. Ability for user to search descriptive metadata in
multiple metadata forms. Ability for user to search by specific fields in
advanced searches. Should support keyword and browse searches, including: a)
ability to browse a list of targets, b) ability to search target descriptions
by keyword, c) ability to present different views of targets (e.g. by subject,
user group, etc.), d) ability to browse target resources in hierarchical
displays, e) ability to browse a composite list of target resources (aggregated
databases), f) ability to present different views of the target resources.
Ability to integrate metadata for target resources from more than one source.
Lin, Dekang and Pantel, Patrick (2001). "Induction of
Semantic Classes from Natural Language Text", in KDD-2001, Proceedings
of the seventh ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, Aug. 26-29, 2001, San Francisco, Calif., 317-322. <http://www.acm.org/sigkdd/kdd2001>
(Oct. 26, 2002).
Lovins, Daniel. Thesaurus Design for Semantic
Information Management. A day-long seminar led by Prof. Bella
Hass Weinberg in
Published: Cataloging and Classification Quarterly,
34, no. 4 (2003) <http://catalogingandclassificationquarterly.com/ccq36nr1news.html>
Bella suggested that
"semantic information management", really just means vocabulary
control; that ontology usually just means classification scheme, but sometimes
gets used as a synonym for thesaurus, and that taxonomy is just a synonym for
classification. Subject headings lists, such as LCSH are essential tools for
managing information in a print environment, while true thesauri are often more
useful in the online environment (where they can be viewed hierarchically or
combined in Boolean searches). Thesauri often run into the problem of needing to
distinguish homographs. The problem in the selection of thesaurus terms is
largely one of determining a set of appropriate lexemes, that is, the smallest
units of lexicon that can be understood on their own terms. Synonymy is a
common problem, though easily managed, e.g. Cancer, see Neoplasm. Other
problems: having to choose between singular and plural, parts of speech, etc.
MACS: Multilingual Access to Subjects, 2002.
<http://infolab.kub.nl/prj/macs>
(Mar. 26, 2002).
MACS aims to provide multilingual
subject access to library catalogues. It enables users to simultaneously search
the catalogues of the project's partner libraries in the language of their
choice (English, French, German). Partners are: Swiss National Library (SNL),
Bibliothèque nationale de France (BnF), British Library (BL), and Die Deutsche
Bibliothek (DDB). The project runs under the auspices of the Conference of European
National Librarians (CENL). This multilingual search is made possible thanks to
the equivalence links created between the three indexing languages used in
these libraries: SWD, RAMEAU, LCSH. Topics (headings) from the three lists are
analyzed to determine whether they are exact or partial matches, of a simple or
complex nature. The end result is neither a translation nor a new thesaurus but
a mapping of existing and widely used indexing languages.
MACS (Multilingual Access to Subjects) Project, report for
2000-2001. <http://infolab.kub.nl/prj/macs/pub/MACSreport3.pdf>
(Aug. 7, 2002)
MACS is a cooperative Conference
of European National Librarians (CENL) project to develop a prototype system for
providing multilingual subject access searching between the catalogs of the
partner libraries to: 1) research the technical and organizational issues
involved in managing a working system for creating and maintaining links
between the three subject headings lists (SHL), and 2) demonstrate the
effectiveness of the linked SHLs for retrieving results for the end-user. The
CoBRA study group defined a specific approach to mapping headings based on a
number of core principles including: 1) all SHLs are equal, 2) headings are
only mapped to equivalent headings judged to be synonymous in meaning, 3)
hierarchical structures and thesaural relationships are not mapped or
reproduced as part of the process of linking individual headings, 4) only
headings at the authority level are linked, 5) where an equivalence cannot be
found a proposed heading should stand alone in the system to represent the
concept (for future possible mapping). Items are cataloged in the local
library's language and SHL. Hierarchical navigation is only possible within
each SHL, so it is envisaged that searches are refined by the user in his own
language until the required concept is identified and then expanded for
linguistic equivalences and documents in other libraries. Two interfaces are
proposed: 1) a Link Management Interface to support management of the links,
their creation, and maintenance; 2) a User Search Interface to support end user
searching and links to the partners' catalogs. Partners share equal
responsibility for authorization of links and validation of links proposed to
their own SHL. MACS is to be an external link database, with each SHL remaining
independent and linked to other SHLs only through MACS.
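The external link database described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each link is modeled as a record tying together headings judged equivalent, each subject heading list stays untouched, and a heading without an equivalent stands alone. The data structure and function are hypothetical, and the sample headings are illustrative only and have not been checked against the authority files or the MACS database.

```python
# Each link is an external record; no SHL is modified by adding a link.
# Sample headings are illustrative and unverified.
LINKS = [
    {"LCSH": "Mountaineering", "RAMEAU": "Alpinisme", "SWD": "Bergsteigen"},
    # A heading with no equivalent yet stands alone for possible future mapping.
    {"LCSH": "Example heading without equivalents"},
]


def equivalents(shl, heading, links):
    """Return {other_shl: heading} for the link that contains the heading."""
    for link in links:
        if link.get(shl) == heading:
            return {other: h for other, h in link.items() if other != shl}
    return {}
```

A search interface could refine the query within the user's own list first, then call something like `equivalents("RAMEAU", "Alpinisme", LINKS)` to expand the chosen heading into the other partners' catalogs.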
Mai, Jens-Erik. "The Future of General
Classification." Cataloging
& Classification Quarterly, v. 37, no. 1 / 2 (2003): 3-31.
Summary: Discusses problems
related to accessing multiple collections using a single retrieval language.
Surveys the concepts of interoperability and switching language. Finds that
mapping between more indexing languages will always be an approximation.
The paper treats the issues
related to subject representation and focuses on the use of general
classification schemes for accessing documents across domains and collections.
The goal of interoperability is to build coherent services for users, from
components that are technically different and managed by different organizations.
This requires agreements on three levels: technical, content, and
organizational. The problem in using switching languages lies in mapping the meaning
of words in the context of the language. Mapping will always be an approximation
due to pre-coordination, hierarchical structure, and the absence of concepts to
match.
Maniez, Jacques. "Database Merging and the
Compatibility of Indexing Languages," in Knowledge Organization,
24, no.4 (1997): 213-224.
This article contains succinct
and critical descriptions of concordance tables, switching languages, and
reference languages, and their usability in the harmonization of information
languages.
McKiernan, Gerry. Beyond Bookmarks: Schemes for
Organizing the Web, 2001. <http://www.public.iastate.edu/~CYBERSTACKS/CTW.htm>
(Aug. 6, 2002).
Schemes for Organizing the Web is
a clearinghouse of World Wide Web sites that have applied or adopted standard
classification schemes or controlled vocabularies to organize or provide
enhanced access to Internet resources. Covers classification systems:
alphabetic, numeric, alphanumeric; and controlled vocabularies.
Medical Subject Authority in OCLC: Background and Resources. Informal
discussion during ALA Midwinter 2002, January 18, 2002. <http://corc.oclc.org/WebZ/XpathfinderQuery?sessionid=0:term=3049:xid=LTM>
(March 26, 2002).
An OCLC pathfinder listing
resources dealing with inclusion of medical subject heading authority records
in OCLC services.
MetaSearch Initiative. <http://www.niso.org/committees/MetaSearch-info.html>
(May 10, 2003).
Metasearch, parallel search,
federated search, broadcast search, cross-database search, search portal have
become commonplace in the information community's vocabulary. They speak to a
common theme of allowing search and retrieval to span multiple databases,
sources, platforms, protocols, and vendors at once.
One-search access to multiple
resources holds the promise of enabling libraries to offer portal environments
so their users can enjoy the same easy searching found in web-based services
like Google.
Michel,
Shows associative, equivalence,
and hierarchical relationships.
Miles, Alistair and Brickley, Dan. SKOS Core Guide. <http://www.w3.org/TR/swbp-skos-core-guide/>
(Aug. 25, 2005).
SKOS stands for
Simple Knowledge Organisation System. The name SKOS was chosen to emphasise the
goal of providing a simple yet powerful framework for expressing knowledge
organisation systems in a machine-understandable way.
A 'concept scheme' is
defined here as: a set of concepts, optionally including statements about
semantic relationships between those concepts. Thesauri, classification
schemes, subject heading lists, taxonomies, terminologies, glossaries and other
types of controlled vocabulary are all examples of concept schemes.
SKOS Core provides a model for
expressing the basic structure and content of concept schemes (thesauri,
classification schemes, subject heading lists, taxonomies, terminologies,
glossaries and other types of controlled vocabulary).
The SKOS Core Vocabulary is an
application of the Resource Description Framework (RDF),
that can be used to express a concept scheme as an RDF graph. Using RDF allows
data to be linked to and/or merged with other RDF data by semantic web
applications.
This document is a guide to using
the SKOS Core Vocabulary, for readers who already have a basic understanding of
RDF concepts.
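A concept scheme expressed as an RDF graph can be pictured as a set of subject-predicate-object triples. The Python sketch below is a simplified illustration that uses plain tuples rather than an RDF library and does not distinguish literals from URIs. The SKOS property names (prefLabel, altLabel, broader) and the class Concept come from the SKOS Core Vocabulary; the example.org namespace, the concepts, and the helper functions are assumptions made for illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/scheme#"  # hypothetical namespace for the example

# A tiny concept scheme as a set of (subject, predicate, object) triples.
graph = {
    (EX + "animals", RDF_TYPE, SKOS + "Concept"),
    (EX + "animals", SKOS + "prefLabel", "animals"),
    (EX + "mammals", RDF_TYPE, SKOS + "Concept"),
    (EX + "mammals", SKOS + "prefLabel", "mammals"),
    (EX + "mammals", SKOS + "broader", EX + "animals"),
}

# Triples from a second, hypothetical source describing the same concept.
other_graph = {
    (EX + "animals", SKOS + "altLabel", "fauna"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def broader_labels(triples, concept):
    """Return preferred labels of the concepts that are broader than concept."""
    labels = []
    for parent in objects(triples, concept, SKOS + "broader"):
        labels.extend(objects(triples, parent, SKOS + "prefLabel"))
    return labels


# Merging RDF data amounts to taking the union of the two triple sets.
merged = graph | other_graph
```

Because both sources use the same URI for the concept, the merged set carries the preferred label from one and the alternative label from the other without any further reconciliation step.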
See also Quick Guide to Publishing a Thesaurus on the
Semantic Web http://www.w3.org/TR/swbp-thesaurus-pubguide/
See also the SKOS Core
Vocabulary Specification http://www.w3.org/TR/swbp-skos-core-spec
Miller, Libby, Brickley, Dan and Hamilton, Martin. Imesh
Tk: Subject Gateway Review Literature Review, 2002. <http://www.ilrt.bris.ac.uk/discovery/2000/09/imesh/>
(Aug. 6, 2002)
The goal of the literature review
is: a) to try to define the scope of the IMesh Toolkit, b) its purpose -
improve speed of searching, c) enable cross-searching more easily between
gateways, or enable portalization of gateways, d) draw together existing
research, e) summarize current and possible future technologies, f) form
preliminary conclusions about possible architectures which could be used in IMesh
Toolkit.
Miller, Ken and Matthews, Brian. "Having the Right
Connections: the LIMBER Project." JoDi, 1, no. 8 (Aug. 2001). <http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Miller/>
(Aug. 2, 2002).
Cross-discipline interoperability
will be provided via a uniform metadata description. In addition, the provision
of multilingual user interfaces and the controlled vocabulary of a
multi-lingual thesaurus will make these datasets globally accessible in a range
of end-user natural languages. LIMBER will use the multi-lingual European
Language Social Science Thesaurus (ELSST) derived and translated from HASSET.
Tools developed in LIMBER will work with any thesaurus marked up in the LIMBER
RDF format, and the semi-automatic indexing tool will apply keywords from these
thesauri to any metadata record marked up in either XML or RDF. LIMBER will
still be able to provide multi-lingual interfaces to thesaurus-aided searching
across domains, using thesauri conforming to the LIMBER RDF schema and
retrieving metadata mapped to the Dublin Core with assigned keywords translated
back to the user's native language, the underlying metadata having been
semi-automatically indexed by terms from the conforming thesauri. The project
plans to develop a high-level object-oriented conceptual model that could be
translated in whichever format becomes internationally accepted. All screens
and drop-down menus will be available in German, French, Spanish and English to
begin with, but defined in a standard format that can easily be translated to
other languages in the future. LIMBER is designed as three stand-alone
products: 1) multi-lingual thesaurus management tool, 2) user browsing
interface, 3) semi-automatic indexing tool.
Miller, Joseph.
"An Overview of Subject Cataloging and the Absence of a Code."
Presented at ARLIS/NA Annual Conference,
Subject cataloging deals with
what a book or other library item is about, and the purpose of subject
cataloging is to list under one uniform word or phrase all the materials on a
given topic that a library has in its collection. A subject heading is that
uniform word or phrase used in the library catalog to express a topic. The use
of authorized words or phrases only, with cross-reference from unauthorized
synonyms, is the essence of bibliographic control in subject cataloging.
Miller, Paul. "I Say What I Mean, but Do I Mean What I
Say?" Ariadne, 23 (2000).
<http://www.ariadne.ac.uk/issue23/metadata/>
(Aug. 7, 2002).
Addresses: 1) issues surrounding
the use of controlled vocabulary, 2) recent MODELS 11 workshop, 3) some
recommendations for future work. First, there is a need for some mechanism for
querying multiple resources simultaneously. Second, there is a need for some
commonality of content or description across information resources being made
available for searching. To ensure common meanings across applications and
between users and between applications, the normal solution is to impose a
degree of control upon the terms used by both parties. At its most basic, this
control will involve no more than defining a list of words, from which
application and user have to select. In more complex instances, fully formed
thesauri may be employed, rich with hierarchy, synonyms, and relationships. In
an uncontrolled environment, users will consistently either use the wrong terms
or use right terms in wrong contexts. In the same uncontrolled environment,
creators will potentially use terms inconsistently. Terminology tools are: controlled vocabularies
(created manually or generated automatically by harvesting keywords), alphanumeric
classification schema, and thesauri. Thesauri follow the structural guidelines
in ISO 2788 or ISO 5964, and include synonyms, complex hierarchies, scope notes,
and inter-relationships (equivalence, hierarchy, association). MODELS 11's aim was to explore the value
and practicality of creating a single high-level thesaurus. There is a need to
study user behavior with respect to terminology.
Milstead, Jessica. Report on the Workshop on Electronic
Thesauri, November 4-5, 1999. Presented at NISO/APA/ASI/ALCTS. <http://www.niwo.org/news/events_workshops/thes99rprt.html>
(March 26, 2002). No longer available.
The definition of "thesaurus" for purposes of this meeting was
broader than that of the present standard for thesauri ANSI/NISO Z39.19-1993
(R1998). The meeting considered
vocabularies that meet two basic criteria: 1) use to facilitate analysis of
texts and their subsequent retrieval (or retrieval of the information which
they contain); 2) and inclusion of a rich set of semantic relationships among
their constituent terms. The scope included: standard thesauri, subject
headings lists, semantic networks, and taxonomies (Internet directories). It
excluded: simple term lists without equivalence relationships and dictionaries.
They identified 4 key issues:
1) the need for (and feasibility
of developing) a standard that speaks to criteria and/or methods for generating
thesauri by machine-aided or automatic means
2) the need for (and feasibility
of developing) a standard set of tools which show semantic relationships among
terms, as aids to text and information analysis and retrieval
3) the need for (and feasibility
of developing) a standard structure that supports a variety of electronic
thesaurus displays
4) the need for (and feasibility
of developing) a standard that supports interoperability protocols, structures,
and/or semantics applicable to thesauri.
Mongin, Larry, Yueyu Fu, Javed Mostafa. "Open Archives
Data Service Prototype and Automated Subject Indexing Using D-lib Archive
Content as a Testbed", D-Lib Magazine, 9, no. 12 (Dec. 2003). <http://www.dlib.org/dlib/december03/mongin/12mongin.htm>
(Dec. 18, 2003).
The Indiana University School of
Library and Information Science's laboratory has as its purpose to work in areas
of information retrieval and information visualization. They decided to use
OAI-PMH as a resource discovery tool. Since the D-Lib metadata file does not
contain a subject term, they decided to use IR algorithms to generate them.
After running the Java program that computed subject terms, they read each
article to make a judgment on whether the computed subject terms were relevant
to that article. The criterion was not whether the program selected the best
subject terms for that text, but rather whether the term generally reflected
the semantic meaning of the article. The resulting scores varied from
70-95%.
Murata, Masaki, and others. "Meaning Sort - Three
Examples: Dictionary Construction, Tagged Corpus Construction, and Information
Presentation System," ArXiv, 12 March 2001 <http://arxiv.org/abs/cs/0103012>
(Feb. 17, 2005)
It is often useful to sort words
into an order that reflects relations among their meanings as obtained by using
a thesaurus. In this paper, the authors introduce a method of arranging words
semantically by using several types of "is-a" thesauri and a
multi-dimensional thesaurus.
Murray-Rust, Peter and West, Lesley. Terminology in a
Global Context: VHG and XML. Part II, 2002. <http://www.vhg.org/uk.pub/vhgnews2.html>
(March 26, 2002). No longer available.
The aim of this article is to set out the
technical aspects of VHG.
XML is ideally suited to
delivering terminology over the web. Thus, in the spirit of XML, a simple
subset of ISO FDIS 12620 data categories is chosen to represent the communality
of the semantics of a majority of web-based glossaries. VHG is a platform- and convention-independent
specification. We put a high value on interoperability and achieve this by
reliance on several current W3C initiatives in XML. Semantics are added through a mechanism which
would link any tags starting with <VHG: to the semantics in the Uniform
Resource Locator (URL). This distinguishes the VHG approach, so that when
someone encounters a VHG glossary it is self-identifying and can be processed
with VHG-compliant software. In a related manner, a document can link to a
number of glossaries simultaneously. It might use absolute URLs or it might use
a namespace mechanism. An element in a document linked to any number of
glossaries may provide complementary or even conflicting views. In the spirit
of the WWW, the reader of the document resolves the appropriate ontology.
National
Library of Medicine. (2005). Fact sheet: UMLS metathesaurus, 2005. <http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html>
(Jan. 7, 2005)
Neuroth, Heike and Koch, Traugott. Cross-browsing and
Cross-searching in a Distributed Network of Subject Gateways: Architecture,
Data Model, and Classification, 2001. <http://www.stk.cz/elag2001/Papers/HeikeNeuroth/HeikeNeuroth.html>
(Aug. 8, 2002).
The aim of the Renardus project
is to provide users with integrated access by searching or browsing, through a
single interface, to partners' quality-controlled subject gateways. Further
goals are to develop and define organizational models, business models,
technical solutions and metadata standards (Renardus Application Profile,
Renardus Namespaces, Renardus Collection Level Description). The following
elements can be used to define a quality-controlled subject gateway: Selection
and collection development, Collection management, Creation, Resource description
and metadata, Subject access, Search and browse access, Standards, Value-adding
features. Each participating partner is responsible for mapping its metadata
format to the common Renardus metadata format, derived from Dublin Core. A
generic normalization toolkit with Z39.50 configuration files and a conversion
script were provided. Each participant set up a Renardus server with their
content normalized to the Renardus data model. A set of screens was built for
the user interface: Homepage, Advanced Search screen, Index scan window,
Advanced search page after index scan, Browse by subject screen, (Preliminary)
Result screen, Sorted result screen, Participating gateways screen and Help
(index) screen. In order to accomplish subject browsing, the various systems
will be mapped to a common classification system. The Renardus service will
give access to resources from all kinds of subjects, published world-wide and
in many languages and it is intended to be offered to an international
multi-disciplinary community of users. Dewey Decimal Classification was chosen
because of: online availability and tools, global usage, suitability of the
classification system and its functionality, frequency and character of the
updates, Research and methodological development efforts.
Neuroth, Heike. Metadata
issues: Renardus. Presented at Cultural Heritage Projects Concertation
Event, June 30, 2000, Bundesamtsgebaude Wien. <http://www.cscaustria.at/events/documents/renardus.ppt>
(Aug. 6, 2002).
The
data model is mostly Dublin Core compatible, with some Renardus-specific
extensions. The definition of a Renardus Schema is in progress. The project still needs to
address how to handle multilinguality.
Neuroth, Heike and Koch, Traugott. "Metadata Mapping
and Application Profiles: Approaches to Providing the Cross-searching of
Heterogeneous Resources in the EU project Renardus", 2001. <http://www.lub.lu.se/~traugott/drafts/DC2001-neuroth.pdf>
(Nov. 7, 2002).
The paper presents the approach
and results of a mapping process to define a common metadata format for
cross-searching distributed and heterogeneous subject gateways in the
EU project Renardus. The outcome is a well-defined data model
with semantic and syntactical definitions of each metadata element. It results
in richer and semantically controlled cross-searching. The metadata elements
are mainly based on Dublin Core. The aim
of Renardus is to provide users with integrated access, through a single
interface, to high-quality Internet resources.
It is also to provide high quality subject access through indexing
resources using controlled vocabularies and by offering a deep classification
structure for advanced searching and browsing. All gateways participating in
Renardus apply resource descriptions and subject classification to all their
records. Participants have agreed to use a core set of metadata elements and
qualifiers: Title, Creator, Description, Subject, Identifier, Language, and
Type; plus Country. Further, they focused on the following characteristics for
each metadata element: semantic definition, syntactic definition, associated
qualifiers, cataloging rules, namespace definition, repeatability of elements,
form of obligation, language qualifiers.
For Subject, Renardus has four different namespaces, and the project will
develop a cross-browsing structure based on Dewey Decimal Classification with
added European-specific captions.
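The mapping of a partner's local metadata format to the common core set can be sketched as a simple crosswalk. This is only an illustration: the element names come from the list above, but the local field names (`name`, `author`, and so on) and the `normalize` helper are hypothetical and do not reproduce the Renardus normalization toolkit.

```python
# Core elements named in the annotation above.
RENARDUS_CORE = ("Title", "Creator", "Description", "Subject",
                 "Identifier", "Language", "Type", "Country")

# Hypothetical crosswalk for one partner gateway.
LOCAL_TO_COMMON = {
    "name": "Title",
    "author": "Creator",
    "abstract": "Description",
    "keywords": "Subject",
    "url": "Identifier",
    "lang": "Language",
}


def normalize(record, crosswalk):
    """Map a local record to the common elements; report unmapped fields."""
    common = {element: [] for element in RENARDUS_CORE}
    unmapped = {}
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = values      # kept so nothing is silently lost
        else:
            common[target].extend(values)  # elements are treated as repeatable
    return common, unmapped
```

Returning the unmapped fields separately makes gaps in a partner's crosswalk visible instead of dropping data during normalization.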
Nicholson, Dennis. "Subject-based Interoperability:
Issues from the High Level Thesaurus (HILT) Project." Paper presented at 68th IFLA Council and
General Conference,
HILT Phase 2 will create a pilot
terminologies mapping service or terminologies route map (TeRM), with a specific focus on current
concerns in the developing Distributed National Electronic Resource (DNER),
covering primarily higher education. HILT Phase I discovered that the various
service providers use a range of subject schemes (LCSH, UNESCO, DDC, AAT,
MeSH). If cross-searching and browsing is to function coherently for users of
the Information Environment (IE), these (multiple, varied) subject schemes must
be mapped to one another, perhaps using a common 'spine' such as DDC with
international and multi-lingual application and the potential to facilitate
machine to machine interworking. The terminologies must be disambiguated, then
translated into the service-assigned terms the users need to cross-search and
browse the group of services of relevance to their query. The aim of HILT Phase
II is to build and evaluate a pilot service that will mediate as a DNER shared
service in the IE. The pilot TeRM would be built using commercially available
Wordmap software (http://www.wordmap.com;
examples at http://www.oingo.com or http://vivisimo.com).
The initial illustrative TeRM would be based on the RDN (http://www.rdn.ac.uk/cgi-bin/browse)
terminologies available as part of the Wordmap taxonomies set, which include,
in particular, a set of terms used by general Internet users, and on selective
subsets of LCSH, DDC, UNESCO, and AAT. At issue is whether a
spine such as DDC should be used as the scheme to which everything else is mapped, and whether it is better
to adopt (or adapt) an existing scheme or create a new one. The aim is to utilize
'native subject schemes' for the collections in the environments in which users use them,
and to use the pilot TeRM to 'disambiguate' user terms and resolve
differences between schemes. TeRM supports creation, editing, display, and user
[user interface], staff, and system interaction with a terminologies map showing
terms in use and inter-relationships. It interacts with users and systems to
establish term and service context of search (e.g. archives only), provides
synonyms, broader, narrower, related terms, other contexts and service-set
navigational aids for cross-searching and browsing as required. See: http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html
Nicholson, Dennis,
and others. HILT: High-Level
Thesaurus Project: Final Report to RSLP & JISC, December 2001. <http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html>
(Oct. 29, 2002).
There is evidence of growing
agreement that interoperability in respect of subject schemes in a distributed
environment is recognized as an issue and that a standards-based approach is
the answer, but no evidence to suggest that one particular scheme or single
approach will provide the answer. There is very little information available on
the needs and behavior of users as regards subject searching in a distributed
environment. It is suggested that a mix of controlled vocabularies and free
text in searching gives the best results and is preferred by users. HILT's
recommended option: map LCSH, AAT, UNESCO, and UDC to DDC. Set up a mapping service,
ideally with international participation and support, and gradually build
towards a complete mapping of LCSH, UNESCO, UDC, and AAT to a DDC backbone.
Conclusion: the best way forward for HILT was a pilot mapping service as described
in option 5.2. The pilot should: have a strong user focus, determine reliable
costs, include cost benefits, involve international players, look at how best
to integrate semantic web and artificial intelligence developments, involve a
broad range of target services, use existing machine-readable mappings wherever
possible, be closely linked to a cross-sectoral and cross-domain task force,
use contexts, relationships, clustering, etc., and look at user terminology as
against DDC as the central spine to which other schemes were to be mapped. DDC
by itself is not a solution, but DDC mapped to more specific subject schemes was
judged worth a pilot project.
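The idea of a DDC backbone can be sketched in a few lines: each scheme is mapped only to the spine, and translation between any two schemes goes through the shared class number. The term-to-number pairs below are invented for illustration and are not actual HILT mapping data; the function names are likewise hypothetical.

```python
# Illustrative mappings from each scheme's terms to a DDC class number.
TO_SPINE = {
    "LCSH": {"Medicine": "610", "Cooking": "641.5"},
    "MeSH": {"Medicine": "610", "Cooking": "641.5"},
    "UNESCO": {"Medical sciences": "610"},
}


def to_spine(term, scheme):
    """Return the spine number for a term, or None if it is unmapped."""
    return TO_SPINE.get(scheme, {}).get(term)


def translate(term, source, target):
    """Translate a term between schemes by way of the shared spine number."""
    number = to_spine(term, source)
    if number is None:
        return []
    return sorted(t for t, n in TO_SPINE.get(target, {}).items() if n == number)


def cross_search_terms(term, source):
    """Collect the equivalent terms to send to every other scheme."""
    return {scheme: translate(term, source, scheme)
            for scheme in TO_SPINE if scheme != source}
```

With a spine, adding a new scheme requires one mapping (to DDC) instead of one mapping per existing scheme, which is the practical argument for a backbone; the cost is that distinctions finer than the spine's classes are lost in translation.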
Nicholson, Dennis. "HILT High Level Thesaurus Project
: Interoperability and Cross-searching Distributed Services." Presented at
Satellite Meeting: Subject Retrieval in a Networked World, OCLC,
Published.
In C. C. Chen (Ed.), Global digital library development in the new
millennium: Fertile ground for distributed cross-disciplinary collaboration, 2001.
Presentation describes background
of HILT and the HILT Stakeholder Survey.
Nicholson, Dennis. HILT: High Level Thesaurus Project :
Investigating the Problems of Cross-searching Distributed Services by Subject
in the
NISO. Developing the Next Generation of Standards for
Controlled Vocabularies and Thesauri, (2005) <http://www.niso.org/committees/MT-info.html>
(Feb. 15, 2005)
NISO Z39.19-1993. American National Standards Institute. Guidelines
for the Construction, Format, and Management of Monolingual Thesauri, 1993.
<http://www.niso.org/standards/resources/Z39-19.html>
(July 1, 2002).
Abstract.
A thesaurus is a controlled
vocabulary arranged in a known order and structured so that equivalence,
homographic, hierarchical, and associative relationships among terms are
displayed clearly and identified by standardized relationship indicators that
are employed reciprocally. The primary purposes of a thesaurus are a) to
facilitate retrieval of documents, and b) to achieve consistency in the
indexing of written or otherwise recorded documents and other items, mainly for
post-coordinate information storage and retrieval systems. This standard
provides guidelines for constructing monolingual thesauri: formulating the
descriptors, establishing relationships among terms, and effectively presenting
the information in print and on a screen. It also includes thesaurus
maintenance procedures and recommended features of thesaurus management
systems.
The goals of the project were to:
1) use information visualization to help searchers understand and explore
information spaces, and 2) use the metadata in library records to accomplish
this end; specifically to explore the use of a classification system. One
approach is cluster-based spaces, which use clustering to coalesce
documents/topics, multidimensional scaling techniques to create space and
spatial metaphors to show relationships. Users infer the semantics of the space
from the characteristics of the clusters.
Olson,
Tony. “Integrating LCSH and MeSH in information systems.” Subject retrieval
in a networked environment: Papers presented at an IFLA satellite meeting
sponsored by the IFLA section on classification and indexing & IFLA section
on information technology,
Olson, Tony. "The Integration of Information Languages
and Interoperability." Presented at "Real World Steps to
Interoperability in Libraries", ALCTS/LITA Authority Control in the Online
Environment Interest Group,
There are two types of indexing
languages: 1) information languages, and 2) natural languages. Information
languages include: classification systems (e.g. DDC), controlled vocabularies
(e.g. thesauri like AAT), and subject headings lists (e.g. LCSH). Issues
regarding controlled vocabularies are discussed. By their very nature different
controlled vocabularies are incompatible. While controlled vocabularies promote
consistency within the systems for which they are designed, they tend to reduce
intersystem and database compatibility. Major problems are: 1) conflicts
between cross references in one vocabulary and established headings in the
other vocabularies; 2) no references or links between corresponding headings
from different vocabularies; 3) differences in syntax in the construction of
subject heading strings; 4) although a substantial majority of the
correspondences between terms in different vocabularies may be one-to-one, there
is a significant number of correspondences that are not; 5) differences in
semantic relationships between vocabularies, which in turn also lead to
one-to-many correspondences; 6) identical headings in different vocabularies
can cause the retrieval of duplicate entries.
Methods and projects undertaken
in an effort to integrate various information languages include: 1) Mapping to
a larger metathesaurus, e.g. Unified Medical Language System (UMLS), which
integrates over 60 biomedical vocabularies and classifications and links many
different names for the same concepts and the H.W. Wilson mapping of 12
different Wilson vocabularies; 2) Multilingual Access to Subjects (MACS)
Project is an example of integrating multiple subject languages by providing
links between equivalent subject headings; 3) Another method of integration is
to use a reference language. In this case terms from various information
languages are mapped to a term (or classification number) in a single
particular information language (called a reference language); 4) The High
Level Thesaurus (HILT) Project studied the problems of
incompatibility among various information languages utilized by various
libraries and information centers. One
of the recommendations was to set up a mapping service that would eventually
carry out a mapping of LCSH, the UNESCO thesaurus, AAT, and UDC to a DDC backbone
as the reference language; 5) in the Renardus Project, local classification
schemes that are used in subject gateways are mapped to DDC; 6) The LCSH/MeSH
mapping project at Northwestern University is another approach to the
integration of controlled vocabularies.
In the LCSH/MESH mapping project,
instead of creating a separate database that contains the linking data, the
data is entered into the authority records of the vocabularies being mapped.
The LCSH/MESH mapping project at
Another aspect is differing semantic
relationships in different vocabularies. It was decided to map at similar
levels and use each vocabulary's structure to trace relationships. Two issues
still exist: 1) broader/narrower term relationships are not explicit in MeSH but
are implicit in the category (tree) numbers. A program was written to take data in
072 fields and put it in 550 fields of authority records; 2) the syndetic
structure of LCSH is not complete (especially as distributed), containing only
narrower term references and not explicit broader term references. Another
problem is the syntactical differences between subject heading strings in the
various vocabularies, or cases where no heading string exists.
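How broader terms can be read off tree numbers is easy to show in miniature. The sketch below is not the program mentioned in the annotation; it only assumes that a tree number is a dotted string whose parent is obtained by dropping the last segment. The headings and tree numbers in `SAMPLE` are invented, and writing the result into 550 fields is left out.

```python
def parent_tree_number(tree_number):
    """Drop the last dotted segment; top-level numbers have no parent."""
    head, sep, _tail = tree_number.rpartition(".")
    return head if sep else None


def broader_headings(heading, tree_numbers):
    """Return the headings one level above `heading` in every tree it occupies."""
    # Reverse index: tree number -> heading.
    by_number = {number: name
                 for name, numbers in tree_numbers.items()
                 for number in numbers}
    result = []
    for number in tree_numbers.get(heading, []):
        name = by_number.get(parent_tree_number(number))
        if name is not None and name not in result:
            result.append(name)
    return result


# Invented sample data: a heading may sit in more than one tree.
SAMPLE = {
    "Top heading": ["Z99"],
    "Other top heading": ["Z98"],
    "Middle heading": ["Z99.100"],
    "Leaf heading": ["Z99.100.200", "Z98.300"],
}
```

Because a heading can carry several tree numbers, it can have several broader headings, which is one reason the derived references may not line up one-to-one with another vocabulary's hierarchy.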
Olson, Tony and Strawn, Gary. "Mapping the LCSH and
MeSH Systems." Information Technology and Libraries, 16, no. 1
(March 1997): 5-19.
In an effort to resolve problems
of two subject systems in one online catalog, this project maps the LCSH and
MeSH vocabularies. The two systems are integrated by a) mapping terms and
headings from one system to corresponding headings in the other system; b)
adding the mapping data to authority records; c) enhancing the library
management system software so that mapping data in authority records can be
used to develop syndetic structures that relate the systems smoothly and
consistently, while enhancing subject retrieval.
Open
Metadata Registry. <http://avalon.ulis.ac.jp/~sugimoto/RPs/dc2001.pdf> (Feb. 17, 2005)
Open Metadata Registry has much in common with the SCHEMAS project. It
will be used to promote the discovery and reuse of semantics within existing
vocabularies and the creation of new vocabularies. It will register
vocabularies relating to the Dublin Core Metadata Initiative.
Park, J., and
Ram, S. “Information systems interoperability: What lies beneath?” ACM Transactions on Information
Systems, 22, no. 4 (2004): 595-632.
Parsons,
J., and Wand, Y. “Choosing classes in
conceptual modeling.” Communications of the ACM, 40 (1997): 63-69.
Patton,
Glenn. "International Efforts to Improve Interoperability". Presented
at "Real World Steps to Interoperability in Libraries", ALCTS/LITA
Authority Control in the Online Environment Interest Group,
RDF Topicmaps : Theory. OCLC Research. <http://topicmap.oclc.org:5000/theory.html>
(Oct. 31, 2002).
The goal of the Topicmaps is to
bootstrap the efforts to meld natural-language-processing technologies with
Semantic Web development. It is comprised of: 1) the noun phrase extractor, 2)
noun phrase filter, and 3) relationship generator, wherein the goal was to
identify simple, thesaurus-like relations such as "broader-than"
using only a list of words as input.
Renardus. <http://www.renardus.org> (March 26,
2002).
Renardus is a collaborative
project that aims to improve academic users' access to a range of existing
Internet-based information services across
Renardus Project Deliverables (2000?)
This project deliverable intends
to ensure that any chosen broker architecture for Renardus is based on existing
models and/or emerging developments. It provides an extensive and comprehensive
review of 18 existing broker models that have been developed for a variety of
existing services, projects, or initiatives.
Renardus
Project deliverable: specification of functional requirements for the broker
system. <http://www.renardus.org/about_us/deliverables/d1_3/titlePage.html>
(Aug. 7, 2002).
Evaluation
report of existing broker models. <http://www.renardus.org/about_us/deliverables/d_1/D1_1summ.html>
(Aug. 7, 2002).
Specification
of functional requirements for the broker system. <http://www.renardus.org/about_us/deliverables/d1_3/D1_3bsumm.html>
(Aug. 7, 2002).
Data
model: requirements and specification. <http://www.renardus.org/about_us/deliverables/d6_4/D6_4summ.html>
(Aug. 7, 2002).
Resnik, Philip (1995). "Disambiguating Noun Groupings
with Respect to WordNet Senses." ArXiv, (Nov. 29, 1995). <http://xxx.lanl.gov/abs/cmp-lg/9511006>
(Feb. 17, 2005).
In word groupings within online
thesauri, one is interested in the relationships among word senses, not just
words. The paper presents a method for automatic sense disambiguation of nouns
appearing within sets of related nouns - the kind of data one finds in online
thesauri or as the output of distributional clustering algorithms.
Report of the SAC Subcommittee on Subject Reference
Structures in Automated Systems: Recommendations for Providing Access to,
Display of, Navigation within and among, and Modifications of Existing Practice
Regarding Subject Reference Structures in Automated Systems. 2003.
<http://www.ala.org/ala/alctscontent/catalogingsection/catcommittees/subjectanalysis/subjectreference/subjectreference.htm>
(Feb. 17, 2005).
The subcommittee concentrated on
maximizing the use of existing subject reference structures in automated
systems. The recommendations are divided in four sections: access to reference
structures, display of reference structures, navigation among and within
reference structures, and changes to the policies and practices that govern
creation of the authority records that underlie these reference structures in
automated systems.
Resource Discovery Network. "Renardus Project";
"Subject Portals Development Project". (2002) <http://rdn.ac.uk/projects/> (Completed
projects) (Sept. 25, 2005)
Resource Organisation and Discovery in Subject-based
Services: ROADS, 2000. <http://www.ukoln.ac.uk/metadata/roads/>
(Aug. 6, 2002)
The overall object of the ROADS
projects was to design and implement a user-oriented resource discovery system.
It investigated the creation, collection, and distribution of resource
descriptions, to provide a transparent means of searching for, and using
resources. The object was not to create an individual and idiosyncratic system
but to draw on, and help create, standards of good practice which can be widely
adopted by subject communities to aid and automate the process of resource
organization and discovery. See http://www.ukoln.ac.uk/roads/
ROADS: Interoperability and Metadata, 1998.
<http://www.ukoln.ac.uk/metadata/roads/interoperability/inter-meta.html>
(Aug. 7, 2002).
ROADS began work in a context
where interoperability is becoming increasingly important as a means to
integrate the wide range of information services. Users require distributed
information services to interwork in terms of search, location and delivery.
Semantic interoperability: Users will be searching a variety of indexes
constructed from a number of different underlying database structures.
Effective searching across services requires that semantically equivalent
fields in these indexes are mapped to each other. In addition semantics in the
search (client) must be managed so that they match the semantics in the indexes
(targets). Z39.50 allows indexes to be mapped to standard sets of attributes,
hiding the underlying structure of the target database. A common indexing
protocol enables routing of queries to the most appropriate database via a mesh
of centroids or index summaries. Resource Description Framework (RDF) aims to
provide a framework for expressing machine-readable metadata about resources.
It is designed to enable different applications to interoperate by using a
common data model. RDF uses Extensible Markup Language (XML) as the encoding
syntax.
Russell, Rosemary and Day, Michael. Automated and Manual
Approaches to the Provision of Thesauri and Subject Vocabularies, 2001.
<http://www.ukoln.ac.uk/metadata/hilt/interfaces/>
(June 11, 2002); final report <http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html>
(Feb. 25, 2003).
The term thesaurus is used in
different contexts to describe tools that fulfill different functions. From an
information science point of view, thesauri were originally developed as tools
to allow terminology control of detailed subject indexing of printed documents.
What distinguishes thesauri from some other subject vocabulary types is that
they show relationships between concepts. Relationships commonly expressed in
thesauri include hierarchy, equivalence (synonymy), and association or
relatedness. These relationships are generally represented by the notation BT
(broader term), NT (narrower term), SY (synonymy), and RT (associated or
related term).
In addition to thesauri, there is
a range of other types of controlled subject terminologies (or vocabularies).
One can either browse alphabetical lists or the hierarchy of subject terms that
may be hyperlinked, or one can search terms and if a non-preferred term is
used, the user will be taken to the preferred term.
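The relationship notation and the redirect from a non-preferred term to its preferred term can be sketched as a small data structure. The terms and the `lookup` helper are hypothetical, chosen only to show the BT, NT, RT, and SY relationships described above.

```python
# Invented sample thesaurus; each preferred term lists its relationships.
THESAURUS = {
    "Automobiles": {"BT": ["Motor vehicles"], "NT": ["Sports cars"],
                    "RT": ["Roads"], "SY": ["Cars"]},
    "Motor vehicles": {"BT": [], "NT": ["Automobiles"], "RT": [], "SY": []},
    "Sports cars": {"BT": ["Automobiles"], "NT": [], "RT": [], "SY": []},
    "Roads": {"BT": [], "NT": [], "RT": ["Automobiles"], "SY": []},
}


def build_use_index(thesaurus):
    """Map every preferred and non-preferred term to its preferred term."""
    index = {}
    for preferred, relations in thesaurus.items():
        index[preferred.lower()] = preferred
        for synonym in relations.get("SY", []):
            index[synonym.lower()] = preferred
    return index


def lookup(term, thesaurus):
    """Resolve a search term and return the preferred term with its relations."""
    preferred = build_use_index(thesaurus).get(term.lower())
    if preferred is None:
        return None
    entry = thesaurus[preferred]
    return {"preferred": preferred, "BT": entry["BT"],
            "NT": entry["NT"], "RT": entry["RT"]}
```

A search for "cars" is taken to "Automobiles", and the returned BT, NT, and RT lists are what a browsing interface would offer as hyperlinks.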
SCHEMAS Project
<http://reg.ukoln.ac.uk/registry/jsp/sforum.jsp>
Currently in development by the
United Kingdom Office for Library and Information Networking (UKOLN). Its goal
is the development of a comprehensive database of RDF schemas, application
profiles, and related semantics that have been used by programs under the IST
Program and other related European initiatives. The SCHEMAS database will be
used to promote the reuse and interoperability of semantics for existing and
new projects. It will register RDF schemas and namespaces used by projects
within the European Union.
SCHEMAS Registry, 2002. <http://www.schemas-forum.org/registry/>
(Aug. 6, 2002).
One important focus of the
SCHEMAS Project (to provide standards for metadata schema designers) is
provision of a registry of metadata schemas. The registry itself will serve as
a good-practice example of registry use and benefits. Workpackage6 aims to
promote the deployment of metadata registries defined with the Resource
Description Framework (RDF), promote standards and methods for creating and
processing schemas in multiple languages and writing systems, encourage re-use
and adaptation of global metadata elements in local schemas, formulate and
disseminate good-practice guidelines and investigate the process for managing
the evolution of multilingual registries.
Sheikholeslami, Gholamhosein and Chang, Wendy and Zhang,
Aidong. "SemQuery: Semantic Clustering and Querying on Heterogeneous Features
for Visual Data." IEEE transactions on knowledge and data engineering, v.
14, no. 5 (2002): 988-1002.
The effectiveness of
content-based image retrieval can be enhanced using heterogeneous features
embedded in the images. However, since the features in text, color, and shape
are generated using different computation methods, and thus may require
different similarity measurements, and integration of the retrievals on
heterogeneous features is a nontrivial task. In this paper the authors present
a semantics-based clustering and indexing approach, termed SemQuery, to support
visual queries on heterogeneous features of images.
Slater, Jenny. References - Taxonomies and thesauri.
CETIS, Metadata Special Interest Group, 2002. <http://cetis-metadata.lboro.ac.uk/vocab-ref.htm>
(July 29, 2002).
Lists a large number of taxonomies and thesauri available on
the web.
Stoklasova, Bohdana, Marie Balikova, Ludmila Celbova.
"The Relationship between Subject Gateways and National Bibliographies in
International Context". Paper delivered at World Library and
Information Congress, 69th IFLA General Conference and Council,
The paper examines the
relationship between subject gateways and national bibliographies together with
general principles of universal bibliographic control in the broader context of
the need for integration of heterogeneous information sources. The paper gives
examples from the
Subcommittee on Subject Relationships/Reference Structure. Report
to the ALCTS/CCS Subject Analysis Committee. Appendix A, 1996. <http://www.ala.org/alcts/organiztion/ccs/sac/appendxa.html>
(March 26, 2002).
The charge was to investigate: 1)
the kinds of relationships that exist between subjects, the display of which
are likely to be useful to catalog users; 2) how these relationships are or
could be recorded in authorities and classification formats; 3) options for how
these relationships should be presented to users of online and print catalogs,
indexes, etc.
One conclusion was there is a
need for BT and NT and related browsing or exploding. Because the Library of
Congress distributes only the broader code, OPACs can display only
broader-to-narrower references. However, Gary Strawn has demonstrated that
systems can be programmed to generate narrower-to-broader references without
anyone having to add "narrower" 5XX fields to the authority records.
Non-specific "see also" relationships can be generated by coding the
byte used for reference relationship coding as "n". Indexing databases
often use an alphabetical browsing list which then displays broader, narrower,
and related terms for a chosen subject. In addition an "explode"
function employs these term relationships along with several others (synonym,
abbreviation and language equivalent) to automatically retrieve all records
bearing on the chosen term or related terms.
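Generating the reciprocal references by program, and using them for an "explode" search, can be sketched as follows. This is a minimal illustration and not Strawn's implementation; it assumes each record lists only its broader terms, and the sample terms are invented.

```python
from collections import defaultdict

# Invented sample: each term records only its broader terms.
BROADER = {
    "Physics": ["Science"],
    "Chemistry": ["Science"],
    "Optics": ["Physics"],
    "Science": [],
}


def invert_broader(broader):
    """Derive narrower-term references from broader-term references."""
    narrower = defaultdict(set)
    for term, parents in broader.items():
        for parent in parents:
            narrower[parent].add(term)
    return narrower


def explode(term, narrower):
    """Return the term plus every term below it in the hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in narrower.get(current, ()):
            if child not in seen:  # guards against cycles in the data
                seen.add(child)
                stack.append(child)
    return seen
```

Because the reciprocal links are computed, nobody has to key them into the authority records, and an exploded search on a broad term can retrieve records indexed under any of its descendants.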
Subject Gateways, 1999. <http://www.desire.org/html/subjectgateways/subjectgateways.html>
(Aug. 7, 2002).
What is a subject gateway?
"Subject gateways are online services and sites that provide searchable
and browsable catalogues of internet based resources. Subject gateways will
typically focus on a related set of academic subject areas." Many of the
activities and research projects within DESIRE are focused on developing the
ideas behind this definition of a subject gateway, as well as developing
methodology and tools that provide the functionality needed for a subject
gateway to function.
Sugimoto, Shigeo, and others. "Developing Community-oriented
Metadata Vocabularies: Some Case Studies." Paper presented at
International Symposium on Digital Libraries and Knowledge Communities in
Networked Information Society (DLKC'04), 2004.
<http://www.kc.tsukuba.ac.jp/dlkc/>
;
<http://www.kc.tsukuba.ac.jp/dlkc/e-proceedings/papers/dlkc04pp128.pdf>
(Sept. 13, 2004).
This paper presents two case
studies which include the development of domain-specific subject vocabularies -
a core subject vocabulary for a subject gateway for library and
library-and-information science (LIS) resources, and subject vocabularies of a
portal service for a regional community. These case studies show that small
subject vocabularies are useful for these community-oriented services, and that
maintenance is a crucial issue for the development and use of the
vocabularies. In order to build a
community-oriented information environment in the Internet, we have to solve
two contradictory requirements for metadata schemas - specialization (or
localization) in a community and interoperability among communities.
Metadata has been widely
recognized as a key component for the Web and digital libraries. Local or
domain-specific communities would need to define metadata schemas and
controlled vocabularies in accordance with their requirements when
those requirements are difficult to satisfy using only the schemas and vocabularies defined for the
global communities. On the other hand, community-oriented specialization of
schemas and vocabularies would raise interoperability issues for
cross-community use of metadata and information resources. In addition,
long-term maintenance of the schemas and vocabularies is a crucial aspect for
the communities. Thus, we need to
satisfy the contradictory requirements to metadata in order to create a
community-oriented information environment.
The rest of the paper presents
two case studies of metadata centered research and a conceptual
model of metadata schema for
interoperability. The first case study is a development of a subject vocabulary
for the ULIS-DL metadata database that has about 27000 records of Simple Dublin
Core metadata for Web resources published by and/or useful for libraries and
library and information science (LIS) institutions. They developed XML-based
software to create a subject directory from the metadata database using the
core vocabulary which is encoded in Web Ontology Language (OWL). The second case study is a development of a
set of vocabularies for an information navigation service named Digital Okayama
Dai-Hyakka (Digital Encyclopedia of Okayama). Its metadata schema is defined
based on Simple Dublin Core and it uses a subject vocabulary designed for the
local community.
A major issue to enhance the
usability of ULIS-DL has been (semi-)automatic creation of a directory style
interface for navigating users to appropriate resources in addition to the
text-based retrieval function. A subject vocabulary is required to create the
directory interface.
Based on the experiences in
IPL-Asia, we have defined the following guidelines to build subject
vocabularies for community-oriented metadata: 1) create a core subject
vocabulary which should be a reasonably small set of subject terms; 2) create
subject vocabularies by tailoring the core vocabulary and associating
appropriate expressions to every subject term in order to present the subject
terms in accordance with the properties of users, i.e., age range and language;
3) encode the vocabularies in an ontology description language such as XML
Topic Maps and OWL. This encoding is essential not only for automatic creation
of subject directories from metadata records but also for interoperability of
the subject vocabularies and for long-term maintenance of the subject
vocabularies.
Discussion on Subject Vocabulary
Maintenance:
In the preliminary study, the
authors built the subject term vocabulary for IPL-Asia using XML Topic Maps in
which each subject term is defined as a topic and associated with multiple
presentation labels in the CJK languages. They applied the multi-lingual
subject vocabulary to the IPL-Asia metadata and the DODH metadata in order to
build subject-based directories of the resources. This experimental study,
which is a straightforward approach, has shown the feasibility of building a
user interface that has multiple presentation modes.
From this study, they learned
that ontology description languages such as XML Topic Maps and OWL are useful
not only for encoding the vocabulary in a machine understandable form but also
for maintaining the vocabulary over the long term. Vocabulary maintenance is a
crucial issue even if OKV is a small set of terms since it evolves over time, for
example, evolution of subject terms and subject groups, and update of
presentation labels. XML-based encoding is not a panacea but will help to
decrease the cost of maintenance.
It is unrealistic to assume that a
single metadata element set will meet the needs of all domains and
purposes. It is also impractical to
develop metadata sets application by application: the result would be expensive
and chaotic, and interoperability would be non-existent. On the other hand, it
is desirable for application developers to use established metadata schemas and
adapt them in accordance with local requirements.
Dublin Core Metadata defines the
vocabulary of metadata, i.e., terms and their meanings, but in general does not
specify the encoding or syntactic characteristics. Those requirements can be
defined independently of the vocabulary definitions. A description of these
application-specific syntactic features is called an application profile. Any
application can have its own application profile, which specifies the set of metadata
vocabulary terms used in the application as well as the syntactic or structural
features of the particular application. The application profile could be used
to define a mapping from the application’s scheme to one or more global schemes,
which is crucial for interoperability. A
conceptual model of metadata schema for interoperability was defined. Metadata
for an application is composed of three layers:
(1) Layer 1 - Semantic Definition
Layer: Definition of terms used in the schema.
In general, two types of metadata
terms are included in the metadata vocabulary - property vocabulary and value
vocabulary. A property vocabulary, or in other words element vocabulary, is a
set of property terms, for example, elements and element refinement qualifiers
of DCMES. A value vocabulary is a set of value terms, for example, encoding
schemes.
(2) Layer 2 - Structural
Constraints Definition Layer: Definition of syntactic features. A set of terms
used in the schema and structural constraints applied to each term should be
included in a definition. Application profiles are given in this layer.
(3) Layer 3 - Implementation
Dependent Syntax Definition Layer: Definition of syntax of metadata in an
implementation.
In addition to these definitions, each application schema developer would provide guidelines for creating metadata. A metadata schema registry is a key software tool to enhance interoperability of metadata schemas expressed in all layers. Metadata schema registries are useful to store and provide all types of metadata vocabularies, i.e., application profiles, subject terms and other vocabularies.
Svenonius,
E. The intellectual foundation of information organization.
Taylor, Mike. Zthes: a Z39.50 Profile for Thesaurus
Navigation. Ver. 4.0, 2000. <http://www.lcweb.loc.gov/z3950/agency/profiles/zthes-04.html>
(March 26, 2002).
This document describes an abstract model for representing
and searching thesauri - semantic hierarchies of terms as described in ISO 2788
- and specifies how this model may be
implemented using the Z39.50 protocol. It also suggests how the model may be
implemented using other protocols and formats.
This profile is laid out in two main sections. The first is
concerned solely with the abstract representation of thesaurus terms and how
they may be searched; and the second with the implementation of these abstract
concepts in Z39.50: how thesaurus terms are encoded in the GRS-1 record
structure, how searches are encoded in the type-1 query, etc. It is intended
that the abstract model described here is sufficiently general that it can also
be implemented by protocols and data formats other than Z39.50. This profile
does not mandate any relationship between a thesaurus and any other database.
The model is that terms from any thesaurus database may be used to search any
other database (called a target database). This profile represents a
thesaurus as a database of inter-linked terms. If multiple thesauri are to be
supported by a single server, then they must be presented as separate
databases.
Tennant, R. (2004). Metadata's bitter harvest.
Library Journal (1976), 129, no. 12 (?? 2004), 32.
Tennis,
Joseph T. “Layers of meaning: Disentangling subject access interoperability.”
Advances in Classification Research, 12
(2004)
Therond,
Daniel. "Www.European-Heritage.Net:
The European Heritage Network". Cultivate Interactive, issue 2, no.
16 (Oct. 2000). <http://www.cultivate-int.org/issue2/herein/>
(Aug. 7, 2002).
The European Information network
on cultural heritage policies (HEREIN Project) recommended setting up a
permanent information system for authorities, professionals, researchers and
training specialists. The aim of the project was to convert the Council of
Europe's paper databank on architectural and archaeological heritage into a
system a) with fast, easy access via the Internet, and b) which correspondents
in member countries would be able to update easily by email.
Tillett, Barbara. "A Virtual International Authority
File." Presentation
to the
Giornata di studio sul controllo di autorità nel Servizio
Bibliotecario Nazionale
Nov. 22,
2002. <http://www.iccu.sbn.it/TillettAF.ppt>
(April 1, 2003).
Objectives: a) facilitate sharing
to reduce cataloguing costs to libraries, museums, archives, rights management
agencies, etc. b) simplify creation and maintenance of authority records
internationally, c) enable users to access information in the language, script,
form they prefer.
Authority control virtues: a) “Precision”
in searching, b) syndetic structure of references to help navigate (the variant
forms of name/title/subject/etc.), c) displays to collocate works, d) links to
forms used in particular resources, e) bring library catalogues into the mix of
tools available on the Web.
There are a number of projects that facilitate or
incorporate aspects of authority control on an international scale: EU:
AUTHOR Project, LEAF, <indecs>, INTERPARTY, HKCAN, IFLA: MLAR, GARR,
FRANAR, Dublin Core “Agents”, DELOS/NSF Working Group “Actors/Roles”, EAC
(Encoded Archival Context), CORC/Connexion, Unicode/Multiple Scripts, NACO/SACO
for AACR2 and LCSH. There is an
increased need for interoperability, exemplified by efforts to map different
communication formats with Z39.50 protocols and to create crosswalks to the “MARCs”,
XML, and ONIX. The Virtual International Authority File (VIAF) supports IFLA UBC
authority principles. Each country is responsible for authority headings for
its own personal and corporate authors. National authority records are
available for everyone to use. The same form and structure would be used
worldwide.
VIAF proposes using programs to
facilitate authority work. They would automatically check headings against the
existing local authority file and, if no match is found, automatically check
against the “virtual” international authority file. They would display found matches
for editing or reference and insert authorized forms into the local authority
record for future linking. The author would like to test using
unique, persistent record control numbers such as the International Standard
Authority Number or the International Standard Authority Data Number and see if
that works, or possibly use the number assigned to an information package for an
entity under OAI (Open Archives Initiative) protocols. There are many models
that can be envisioned for a virtual international authority file to help with
cataloging. Among them are: a) a distributed system with the independent
National Bibliographic Agencies (NBAs) being searchable using the next
generation of Z39.50 protocols; b) a linked model that would use a search
protocol, such as Z39.50, going to any one of the linked authority files (LEAF
is testing this model); c) a centralized model that uses Open Archives
Initiative protocols to harvest the metadata from authority files of the
National Bibliographic Agencies on one or more servers; or d) providing a
centralized link, where one authority file is viewed as the central point to
which all others are linked.
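The heading check described in this annotation (local authority file first, then the virtual international file, with matches copied back for future linking) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: both files are modelled as plain dictionaries, and the function names and the normalization rule are hypothetical rather than part of any VIAF specification.

```python
from typing import Dict, Optional, Tuple


def normalize(heading: str) -> str:
    """Crude matching key: lowercase and collapse whitespace (an assumption)."""
    return " ".join(heading.lower().split())


def check_heading(
    heading: str,
    local_file: Dict[str, str],
    virtual_file: Dict[str, str],
) -> Tuple[Optional[str], Optional[str]]:
    """Look up a heading locally first, then in the virtual international file.

    Both files map a normalized variant form to an authorized form.
    Returns (authorized_form, source), where source is "local", "virtual",
    or None when nothing matched.
    """
    key = normalize(heading)
    if key in local_file:
        return local_file[key], "local"
    if key in virtual_file:
        authorized = virtual_file[key]
        # Copy the match into the local file so later checks link locally.
        local_file[key] = authorized
        return authorized, "virtual"
    return None, None
```

In practice a cataloger would review the displayed match before it is inserted; the sketch inserts it directly only to keep the control flow visible.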
Tudhope, Douglas, Alani, Harith, Jones, Christopher.
"Augmenting Thesaurus Relationships: Possibilities for Retrieval," JoDI,
1, no. 8 (Feb. 5, 2001). <http://jodi.ecs.soton.ac.uk/Articles/v01/08/Tudhope/>
(June 27, 2002).
The paper discusses the
augmentation of thesaurus relationships. First the authors discussed a case
study that explored the retrieval potential of an augmented set of thesaurus
relationships by specializing standard relationships into richer subtypes, in
particular hierarchical geographical containment and the associative
relationship. Various attempts to build taxonomies of thesaurus relationships
are discussed. They concluded by discussing the feasibility of hierarchically
augmenting the core set of thesaurus relationships, particularly the associative
relationship. They discussed the possibility of enriching the specification and
semantics of Related Term (RT) relationships, while maintaining compatibility
with traditional thesauri via a limited hierarchical extension of the
associative relationships. They first illustrated how hierarchical spatial
relationships can be used to provide more flexible retrieval for queries
incorporating place names in applications employing online gazetteers and geographical
thesauri. The work described was part of a larger project, Ontologically
Augmented Spatial Information System (OASIS). Another aim was to explore the
potential of reasoning over the semantic relationships in thesauri to assist
retrieval. The three main types of thesaurus relationship are: a) equivalence
(equivalent terms), b) hierarchical (broader/narrower terms: BT/NTs), and
c) associative (related terms: RTs).
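As a rough illustration of how the three relationship types could support retrieval over place names, the following Python sketch expands a query term through equivalence and narrower-term (geographical containment) links, with associative links as an option. The mini-thesaurus, its terms, and the function name are hypothetical; this is not the OASIS implementation.

```python
from collections import deque

# Hypothetical mini-thesaurus: each term lists its narrower terms (NT).
NARROWER = {
    "Europe": ["United Kingdom"],
    "United Kingdom": ["Scotland", "Wales"],
    "Scotland": ["Glasgow"],
}

# Equivalence relationships: non-preferred form -> preferred term.
USE = {"UK": "United Kingdom"}

# Associative (RT) relationships, kept separate from the hierarchy.
RELATED = {"Glasgow": ["River Clyde"]}


def expand(term, max_depth=2, include_related=False):
    """Return the preferred term plus narrower terms within max_depth NT steps."""
    start = USE.get(term, term)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in NARROWER.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    if include_related:
        # Add the related terms of everything reached through the hierarchy.
        for reached in list(seen):
            seen.update(RELATED.get(reached, []))
    return seen
```

Limiting the depth is one simple way to keep an expanded query from drifting too far from the original place name.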
UK Interoperability Focus is
hosted by UKOLN. This post is responsible for exploring, publicizing and
mobilizing the benefits and practice of effective interoperability across
diverse information sectors. Interoperability is a broad term, encompassing
many of the issues impinging upon the effectiveness with which diverse
information resources might fruitfully co-exist. The issues are many and may be
defined as:
1) Technical Interoperability:
consideration of technical issues includes ensuring an involvement in the
continued development of communication, transport, storage and representation
standards such as Z39.50, ISO-ILL, XML, etc. <technical architecture>
2) Semantic Interoperability: …
individual resources - each internally constructed in their own semantically
consistent fashion - are made available through gateways or catalogs. Almost
inevitably these discrete resources use different terms to describe similar
concepts, or even identical terms to mean very different things. The
development and distributed use of thesauri such as those from Getty is worthy
of further consideration.
3) Political / Human
Interoperability: there are implications for the organizations concerned, who
may see it as a loss of control or ownership. Staff may need extensive training
or retraining to ensure effective long-term use of any service.
4) Inter-community
Interoperability: between institutions
5) International
Interoperability: existing issues magnified with varied languages, differences
in technical approach, working practices, etc.
Unified Medical Language System (UMLS). <http://www.nlm.nih.gov/research/umls/>
(March 26, 2002).
NLM's Unified Medical Language
System (UMLS) project develops and distributes multi-purpose, electronic
"knowledge sources" and associated lexical programs. The
Metathesaurus provides a uniform, integrated distribution format for more than
100 biomedical and health-related vocabularies, classifications, and coding
systems (some in multiple languages) and links many different names for the
same concepts. System developers can use the UMLS products to enhance their
applications. There are three UMLS Knowledge Sources: the Metathesaurus®, the Semantic Network, and the SPECIALIST lexicon. They are distributed
with flexible lexical tools and the MetamorphoSys install and customization
program.
Van de Sompel, Herbert, Jeffrey A. Young, Thomas B.
Hickey. "Using the OAI-PMH … Differently," in D-Lib Magazine,
9, no. 7/8 (July 3, 2003). <http://www.dlib.org/dlib/july03/young/07young.html>
(July 23, 2003).
The Open Archives Initiative's Protocol for
Metadata Harvesting (OAI-PMH) was created to facilitate discovery of
distributed resources. The OAI-PMH achieves this by providing a simple, yet
powerful framework for metadata harvesting. The OAI-PMH has been widely
accepted, and until recently, it has mainly been applied to make Dublin Core
metadata about scholarly objects contained in distributed repositories
searchable through a single user interface. Initially, the descriptive metadata
provided by OAI-PMH repositories was to a large extent limited to the mandatory
unqualified Dublin Core, but an evolution towards the provision of more
extensive descriptive metadata, such as MARC21, is becoming apparent. Metadata
records in the OAI-PMH are any data that can be validated against a W3C XML
Schema. Therefore, the OAI-PMH can be a medium for incremental, data-sensitive
exchange of any form of semi-structured data. The metadata contained in OAI-PMH
repositories is typically gathered by harvesters that process it and make it
searchable through a user interface. In these uses of the OAI-PMH, repositories
are never directly accessed by end-users; the "customers" of the
repositories are robots. A section of the article describes an approach to
overlay OAI-PMH repositories with an interface allowing users to directly
navigate the repository content. The authors also show how this approach has
been used to make the GSAFD Thesaurus, the OpenURL Registry and the XTCat
Thesis Catalog user-accessible.
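For readers unfamiliar with the harvesting side of the protocol, a harvester's request is an ordinary HTTP URL carrying an OAI-PMH verb and arguments. The Python sketch below builds a ListRecords request; the helper name and the repository base URL in the usage note are hypothetical, while `verb`, `metadataPrefix`, `set`, and `from` are arguments defined by OAI-PMH.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    if from_date:
        # "from" supports incremental harvesting of records changed since a date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("http://example.org/oai", from_date="2003-07-01")` asks a placeholder repository for unqualified Dublin Core records changed since that date.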
Veen, Theo van and Robina Clayphan. "Metadata in the Context of the European
Library Project." In Proceedings of the International Conference
on Dublin Core and Metadata for e-Communities, 2002: 19-26 <http://www.bncf.net/dc2002/program/papers.html>
The European Library, sponsored by
the European Commission, brings together 10 major European national libraries
and library organizations to investigate the technical and policy issues
involved in sharing digital resources.
Vizine-Goetz, Diane. "Dewey in CORC: Classification in
Metadata and Pathfinders." Journal of Internet Cataloging, 4, no. 1
/ 2 (2001): 67-80.
The Cooperative Online Resource
Catalog (CORC) project provided an opportunity for OCLC research and Dewey
editors to explore the potential of the Dewey Decimal Classification (DDC)
system for organizing electronic resources. The mapped vocabulary was used in
the following ways: 1) to improve access to Dewey by expanding the indexing
vocabulary; 2) to assist in the assignment of subject elements during metadata
creation; 3) to provide supplemental terminology for automated classification;
4) to provide alternative access mechanisms for views to resources in the CORC
database.
Vizine-Goetz,
D., Hickey, C., Houghton, A. H., & Thompson, R. (2004). “Vocabulary mapping
for terminology services.” Journal of Digital Information, 4, no. 4 (2004)
Vizine-Goetz, Diane. "Terminology Services."
Presentation. 2003. <http://www.oclc.org/research/projects/mswitch/>
; <http://www.oclc.org/research/projects/mswitch/4_termservs.shtm>
(Feb. 18, 2005).
Discusses research at OCLC to add
value to metadata. Metadata Switch is a project comprising a set of activities:
harvesting metadata, merging metadata from different sources, schema
transformation, terminology and name authority services, and enrichment or
augmentation of records with various types of data. DDC, Thesaurus of ERIC descriptors, GSAFD
genre terms, MeSH, LCSH, and LCSHAC were converted to a common content model
and linked using intellectual and automated mapping techniques.
Wagner, Harry R. "The EOR toolkit: an Open Source
Solution for RDF Metadata," Information Technology and Libraries,
21, no. 1 (March 2002): 27-31.
RDF provides solutions that will
enable a significantly higher degree of reliability, relevance, and accuracy
for applications and services focused on resource discovery and management of
Web sites and other Internet resources. Through its use of
machine-understandable semantics, RDF enables the automated discovery,
management, and exchange of metadata. It significantly improves resource
discovery by enabling a finer degree of granularity and improved precision. In
addition to facilitating the creation of new resource descriptions, RDF builds
on the established work of various resource communities by enabling the
interoperability of existing metadata vocabularies within those communities.
EOR is one of a large and growing number of open source applications that
are being used to develop applications and services focused on the discovery,
management, integration, and navigation of electronic resources. <http://eor.dublincore.org>
Wake, Susannah and Nicholson, Dennis. "HILT -
High-Level Thesaurus Project: Building Consensus for Interoperable Subject
Access across Communities." D-Lib Magazine, 7, no. 9 (Sept. 2001).
<http://www.dlib.org/dlib/september01/wake/09wake.html>
(Oct. 26, 2002).
The article provides an overview
of the work carried out by the HILT Project <http://hilt.cdlr.strath.ac.uk> in
making recommendations towards interoperable subject access, or cross-searching
and browsing distributed services amongst the archives, libraries, museums and
electronic services sectors. The article discusses the consensus achieved at
the June 19, 2001 HILT Workshop. The best way forward for HILT was judged to be a pilot
mapping service, combined to an extent with a terminologies task force. The
service envisaged would map key schemes like LCSH, UNESCO, DDC, Universal
Decimal Classification, Art and Architecture Thesaurus, and possibly user and
regional terminologies, and local adaptations of standard schemes. Users would
be able to: a) input the term or terms that describe their problem using the
terminology that is most meaningful to them; b) specify their query more
closely if necessary by specifying a context; and c) obtain a list of
equivalent or near-equivalent terms with which they could then cross-search or
cross-browse the various services.
Wake, Susannah (2001). HILT: High-Level Thesaurus
Project. Paper presented at IFLA Satellite Meeting: Subject Retrieval in a
Networked World, OCLC,
Presentation
gives background of HILT and summarizes the work of the June 2001 HILT
Workshop.
Whitehead,
C. “Mapping LCSH into thesauri: The AAT model”. In T. Peterson, & P. Moholt
(Eds.), Beyond the book: Extending MARC for subject access.
Willpower Information. Publications on Thesaurus
Construction and Use. <http://www.willpower.demon.co.uk/thesbibl.htm>
(July 1, 2002).
This is a list of printed and
electronic publications about the principles of constructing and using
information retrieval thesauri. It is not a list of existing thesauri, although
some thesauri have been included when they are good examples or illustrate the
results of different approaches to thesaurus construction. References to lists
of thesauri and systems that provide for thesaurus use by combining terms from
multiple facets in search interfaces are given at the end.
WordNet : a Lexical Database of the English Language, 2001.
<http://www.cogsci.princeton.edu/~wn/>
(Aug. 6, 2002).
WordNet is an online lexical
reference system whose design is inspired by current psycholinguistic theories
of human lexical memory. English nouns, verbs, adjectives and adverbs are
organized into synonym sets, each representing one underlying lexical concept.
Different relations link the synonym sets. Developed by the Cognitive Science
Laboratory at
Xiaoming Liu, [OAI-Implementers] "Dublin Core XML and
OAI," March 29, 2002, personal email to listserv. <http://arc.cs.edu/edu>
Xiaoming Liu's work on ARC, building
on Open Archives Initiative work, includes a subject file from various schemas.
Young, Iain. "Da Chanan / Two Languages:
Creating Bi-lingual Name Authorities."
Paper presented at the 68th IFLA Council and General Conference,
The issue is how to create
standard name authorities in a bi-lingual environment. Using as a specific
example the project undertaken by the Scottish Poetry Library to create name
authorities for Gaelic poets, some with Gaelic and English forms of their names,
the paper examines the issues raised.
Zeng,
M. L., & Chan, L. M. (2004). “Trends and issues in establishing
interoperability among knowledge organization systems.” Journal of the American Society
for Information Science and Technology, 55 (5), 377-395.