ALCTS Subject Analysis Committee
Subcommittee on Semantic Interoperability
Subject Semantic
Interoperability: Final Report
Report of the Subcommittee on Semantic
Interoperability to the ALCTS Subject Analysis Committee
Submitted by Chair, Shelby E. Harken,
With assistance of Subcommittee
members: Bonnie A. Dede, University of Michigan; Lois M. Chan, University of
Kentucky; Anton J. Olson, Northwestern University; Ruth A. Bogan, Rutgers
University; Shannon L. Hoffmann, Brigham Young University; Daniel Solomon
Lovins, Yale University Libraries; and input from Subcommittee members: Diane
Dates Casey, Governors State University; Rebecca J. Dean, OCLC; Giles Stewart
Martin, OCLC; Lynn M. El-Hoshy, Library of
Congress; Mary C. Lasater, Vanderbilt University; and non-member Assist.
Prof. Joseph Tennis,
Introduction
An information system managing its own subject access for a single
resource can relatively easily produce a successful database. However, there is
an increasing need to access multiple resources in multiple languages or with
multiple thesauri or controlled vocabularies. To a point, multiple controlled vocabularies
and knowledge organization systems can be made to interoperate. However,
without appropriate design, the resulting search results will be ‘non-semantic’
and of little value to users. Given that converging information systems — with
their idiosyncratic histories and social functions — are likely to produce
overlaps, seams, and gaps in the composite whole, the Subject Analysis
Committee formed the Subcommittee on Semantic Interoperability to investigate
what techniques are currently being employed by developers to minimize loss of
meaning and create true semantic interoperability.
Work of the Subcommittee
Charge:
Survey the current state of international interoperability
projects which focus on subject and/or classification data. Produce a document
outlining "best practices" at a level of generality that is both
flexible enough to be measured against a variety of actual projects and
specific enough to be made operational in current or proposed projects.
Specific tasks include, but are not necessarily limited to:
a) an inventory of known semantic interoperability projects, with descriptions;
b) an evaluation of selected projects in terms of those projects' stated
objectives; c) an investigation of the various concepts involved in the
harmonization of indexing languages.
To carry
out its charge the Subcommittee undertook a number of tasks which have resulted
in the documents appended to this report.
"The ability of two or more systems or components to
exchange or harmonize cognate subject vocabularies and/or knowledge organization
schemes to be used for the purposes of effective and efficient resource
discovery without significant loss of lexical or connotative meaning and
without special effort by the user."
Based on
the above list of tasks and accomplishments, the Subcommittee believes that it
has met its Charge with one exception, i.e., the development of a "Best Practices"
document. The reasons for this are discussed below in the Subcommittee's
Findings.
Subcommittee Findings
1. The 37 projects in Appendix C fall into two broad categories.
i. Production projects, such as the H.W. Wilson Megathesaurus and the MACS
(Multilingual Access to Subjects) Project. The goal of these projects is to
develop a product or system that can be used by a large number of users in a
setting in which semantic interoperability is needed. Most of the 21 projects
in this category are still in development, and a few have ceased or become
inactive.
ii. Research/demonstration projects, such as the DARPA Unfamiliar Metadata
Project and the HILT (High Level Thesaurus) Project. There are 14 projects in
this category, most of which are completed or have become inactive due to a
lack of funding. It is possible that a few of the active research projects
might evolve into working production systems.
Note that the Subcommittee was unable to classify two of
the projects as production or research. Furthermore, lack of documentation made
it difficult to determine if some of the projects had been permanently or
temporarily suspended.
2. After reviewing the literature and
examining various projects, the Subcommittee decided that a best practices document for semantic interoperability was
premature. There were several
reasons for this.
· The Subcommittee was unable to find any existing tool that could be used to
evaluate a semantic interoperability project. Consequently, the Subcommittee
would have to first develop an evaluation tool.
· Once developed, the tool could be employed to evaluate selected projects in
order to identify successful methods and models. The methods and models would
form the basis of a best practices document. The Subcommittee's three-year term
proved insufficient to accomplish three major tasks: the development of a tool;
the project evaluations; and the analysis of successful projects.
· Finally, even with a completed evaluation tool, there were still only a few
semantic interoperability (SI) projects in full production, and these had not
been in production long enough to yield much analyzable data about how
successful they were in meeting their goals and objectives.
3. For its evaluation tool the Subcommittee
developed a Checklist (Appendix E) comprising a
number of questions to evaluate projects. To test its viability, the
Subcommittee used the Checklist to
evaluate seven projects (Appendices F1-F7). Based on these evaluations, the Subcommittee found that the Checklist
could serve as a useful evaluation tool.
4. Many of the questions in the Checklist
are the same ones that developers of SI projects need to answer as they design their projects. Therefore, the
Subcommittee also concluded that the
Checklist could serve as a guide to developers of SI projects.
5. The Subcommittee has been able to
identify a few semantic interoperability projects that are in full production. Some examples are listed below.
AGROVOC Thesaurus {Food and Agriculture Organization of the United Nations}
Art and Architecture Thesaurus (AAT) {Getty Research
Institute}
Bilingual Subject Access {Library & Archives of
Classification Web {Library of Congress}
H. W. Wilson Megathesaurus
Renardus {Renardus Consortium}
Unified Medical Language System (UMLS) {National Library of
Medicine}
WebDewey {OCLC}
These projects share the following attributes:
· A well-developed master plan for life-cycle management and data migration
· Reliance on international standards
· A viable business model which provides ongoing financial support for the project
· Adequate staff, computer software and hardware to support the project
Conclusion
The need
for improved semantic interoperability between and among vocabularies and
knowledge organization schemes is undeniable and growing in importance. There
is an ever-increasing need for an environment in which multiple portals can be
accessed via subject metadata, using software that is neutral, available
ubiquitously or directly to the user, and able to be copied by libraries for
use in their own environments. In order to develop or improve a
knowledge organization system including emerging options in semantic
interoperability, scholars and practitioners need to be able to evaluate a wide
variety of projects and stay current with the professional literature.
Based on
its findings, the Subcommittee concludes that the development of a successful
subject semantic interoperability project is a long and difficult process. It
requires a substantial investment of financial, human and computer resources.
The Subcommittee recommends using the information and tools in this report and
its appendices to assist in developing a successful project incorporating
subject semantic interoperability. Finally, the Subcommittee concludes that
since this field of endeavor is still relatively young and immature, it is too
early to generate a set of Best Practices that could be used in developing a
successful project. We are past the theoretical and basic research phase and
into the development phase. Even though there are some successful projects in
full production, more projects need to reach maturity and much more research
needs to be done.
Appendices
A. Background Information and Literature Review
B. Glossary
C. Project Inventory
D. Criteria for Evaluating and Developing Subject Semantic Interoperability
E. Checklist for Evaluating and Developing Subject Semantic Interoperability Projects
F. Project Evaluations
F1. ADL Thesaurus Protocol
F2. Library & Archives of
F3. H.W. Wilson Megathesaurus
F4. HILT
F5. MACS
F6. RDN Subject Portals
F7. UMLS (National Library of Medicine)
G. Program Summary
H. Annotated Bibliography
A. Background Information and Literature Review
Most online library systems worldwide utilize some type of
controlled vocabulary, and in many cases multiple vocabularies. From a
librarian's point of view, keyword searching on the Internet has its
limitations. Yet online catalogs exist in the Internet environment along with
other remotely accessible databases, which may utilize their own controlled
vocabularies. Consequently, during an information-seeking experience, users may
be presented with a myriad of
thesauri and other controlled vocabularies. These same problems were identified
by Marcia Bates in her report to the Library of Congress[1].
Although some Internet search engines function fairly well, the Subcommittee
felt it needed to limit its focus to environments using some type of structured
subject-based metadata or embedded metatags, rather than random or weighted
keywords.
In the ALCTS report "Subject data in the metadata record"[2], the functional
requirements for subject access to Internet resources include: a) assisting
searchers in identifying the most efficient paths for resource discovery and
retrieval; b) helping users focus their searches; c) enabling optimal recall;
d) enabling optimal precision; e) assisting searchers in developing alternative
search strategies; and f) providing all of the above in the most efficient,
effective and economical manner.
In a networked environment, interoperability among disparate
systems is necessary to allow users to search among resources from multiple
sources generated and organized according to different standards and
approaches. Lois Chan in her paper for the Bicentennial Conference on
Bibliographic Control for the New Millennium 2000[3]
summarized the interoperability requirements as follows: a) interoperability
among different systems, metadata standards, and languages; b) flexibility and
adaptability to different information communities, not only different types of
libraries, but also other communities such as museums, archives, corporate
information systems, etc.; c) extensibility and scalability to accommodate the
need for different degrees of depth and different subject domains; d)
simplicity in application, i.e. easy to use and to comprehend; e) versatility,
i.e. the ability to perform different functions; and f) amenability to computer
application.
Doerr (2001)[4]
notes that terminological resources are increasingly important for information
retrieval in the networked environment, for retrieving documents by querying
databases, and for using metadata employing controlled vocabularies. There is a
growing interest in developing automated intermediaries to negotiate the
differences between controlled vocabulary schemes so that a user can use a
familiar set of terms to search collections using other vocabulary schemes.
Hunter (2001)[5]
points out that networked knowledge organization systems typically contain
objects of mixed media types which are described using a multitude of diverse
metadata schemas. Hence machine understanding of metadata descriptions which
conform to schemas from different domains is a fundamental requirement for
access. Yet, problems arise from the differences in terminological semantics
and hierarchical relationships within various subject schemes.
Bella Hass-Weinberg[6]
in Thesaurus Design for Semantic Information Management suggested that
"semantic information management" really just means vocabulary control;
that ontology usually just means classification scheme, but sometimes is used
as a synonym for thesaurus; and that taxonomy is just a synonym for
classification. Subject heading lists, such as LCSH, are essential tools for
managing information in a print environment, while true thesauri are often more
useful in the online environment (where they can be viewed hierarchically or
combined in Boolean searches). Thesauri often run into the problem of needing
to distinguish homographs. The problem in the selection of thesaurus terms is
largely one of determining a set of appropriate lexemes, that is, the smallest
units of lexicon that can be understood on their own terms. Synonymy is a
common problem, though easily managed with a cross-reference, e.g., "Cancer,
see Neoplasm." Other problems include having to choose between singular and
plural forms, parts of speech, etc.
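As a rough illustration of how see references and homograph qualifiers of this kind might be handled in software, the following Python sketch resolves a user's entry term to preferred terms. The vocabulary entries and the function name `resolve` are hypothetical and are not drawn from any actual thesaurus.

```python
# Hypothetical miniature vocabulary; every entry is illustrative only.
PREFERRED_TERMS = {"Neoplasm", "Mercury (Planet)", "Mercury (Element)"}

# Entry terms (synonyms, plural variants) that point to one preferred term,
# in the manner of "Cancer, see Neoplasm".
SEE_REFERENCES = {
    "cancer": "Neoplasm",
    "cancers": "Neoplasm",
    "neoplasms": "Neoplasm",
}

# Homographs: one spelling, several qualified preferred terms.
HOMOGRAPHS = {
    "mercury": ["Mercury (Element)", "Mercury (Planet)"],
}


def resolve(entry):
    """Return the list of preferred terms matching a user's entry term."""
    key = entry.strip().lower()
    if key in HOMOGRAPHS:
        # Ambiguous spelling: return every qualified sense.
        return list(HOMOGRAPHS[key])
    if key in SEE_REFERENCES:
        return [SEE_REFERENCES[key]]
    # Fall back to a case-insensitive match on the preferred terms.
    return sorted(t for t in PREFERRED_TERMS if t.lower() == key)
```

In this sketch an unknown entry returns an empty list, and a homograph returns all of its qualified senses so that the searcher can choose among them.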
A subject portal connects users to a site focusing on a
particular subject, with access to high-quality information resources, allowing
aggregated cross-searching, streamlined account management, user profiling, or
additional services.[7] However, the user has to know to go to the
portal. The number of subject portals is growing.
Renardus is an example of a subject gateway/portal project
with a goal of providing users with integrated access by searching or browsing,
through a single interface, to partners' quality-controlled subject gateways.
Further goals are to develop and define organizational models, business models,
technical solutions and metadata standards (Renardus Application Profile,
Renardus Namespaces, Renardus Collection Level Description). The following
elements can be used to define a quality-controlled subject gateway: a)
selection and collection development, b) collection management, c) creation, d)
resource description and metadata, e) subject access, f) search and browse
access, g) standards, h) value-adding features. Each participating partner is
responsible for mapping its metadata format to the common Renardus metadata
format, derived from Dublin Core. A generic normalization toolkit with Z39.50
configuration files and a conversion script were provided. Each participant set
up a Renardus server with their content normalized to the Renardus data model.
A set of screens was built for the user interface: a) homepage, b) advanced
search screen, c) index scan window, d) advanced search page after index scan,
e) browse by subject screen, f) (preliminary) result screen, g) sorted result
screen, h) participating gateways screen, and i) help (index) screen. In order
to accomplish subject browsing, the various systems are mapped to a common
classification system. The Renardus service provides access to resources from
all kinds of subjects, published world-wide and in many languages and it is
intended to be offered to an international multi-disciplinary community of
users. The Dewey Decimal Classification
and Relative Index (DDC) was chosen because of online availability and
tools, global usage, suitability of the classification system and its
functionality, frequency and character of the updates, research and
methodological development efforts.[8]
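The mapping of each partner's metadata format to a common format derived from Dublin Core can be pictured as a crosswalk. The Python sketch below is only an illustration under stated assumptions: the gateway names and local field names are invented, and the actual Renardus data model and normalization toolkit are not reproduced here. The target names (`title`, `creator`, `subject`, `identifier`, `language`) are standard Dublin Core elements.

```python
# Hypothetical crosswalks: local field name -> Dublin Core element.
# Gateway names and local field names are assumptions made for this sketch.
CROSSWALKS = {
    "gateway_a": {
        "name": "title",
        "author": "creator",
        "url": "identifier",
        "ddc": "subject",
    },
    "gateway_b": {
        "Titel": "title",
        "Verfasser": "creator",
        "URL": "identifier",
        "Sprache": "language",
    },
}


def normalize(record, gateway):
    """Convert one local record into a dict keyed by Dublin Core elements."""
    crosswalk = CROSSWALKS[gateway]
    normalized = {}
    for local_field, value in record.items():
        target = crosswalk.get(local_field)
        if target is None:
            # Local fields without a mapping are dropped in this sketch.
            continue
        # Dublin Core elements are repeatable, so values are kept in lists.
        normalized.setdefault(target, []).append(value)
    return normalized


sample = normalize(
    {"name": "Example resource", "author": "A. Author", "internal_id": 17},
    "gateway_a",
)
```

Here `sample` holds only the `title` and `creator` elements, because `internal_id` has no entry in the hypothetical crosswalk for `gateway_a`.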
About the same time the SAC Subcommittee on Semantic
Interoperability was formed, NISO decided Z39.19 Guidelines for the
Construction, Format, and Management of Monolingual Thesauri needed
changing to meet the needs of the changing information environment. Their
rationale included, "Developers of Internet and Intranet-accessible Web
pages, databases, and information systems need better metadata to support
non-expert information searches, and metadata developers are recognizing the value
of incorporating high-quality, interoperable controlled vocabularies and
taxonomies into their schemes."[9]
Literature Review
Some
researchers have been making close examinations of individual projects, while
others focus mainly on theoretical issues. Recent noteworthy articles of both
types in the library and information science domain include those by Chan &
Zeng[10], Tennis[11], and
Zeng & Chan[12];
while those in the computer science and database design domain include
Dhamankar,
The work
of Chan and Zeng is particularly useful for breaking down the many variables
that make up subject semantic interoperability. One major variable involves the
selection of data types, systems, or standards, which are to be made interoperable.
There are projects, for example, that harmonize different controlled
vocabularies in the same language, e.g., Northwestern University’s mapping of
LCSH and MeSH[16],
the Wilson Megathesaurus[17],
and CARMEN’s integration of multiple German thesauri; projects that aggregate
subject vocabularies from among different languages and classification
systems, e.g., the Unified Medical Language System (UMLS)[18],
the High Level Thesaurus[19] & [20], and the DARPA Unfamiliar Metadata
Project[21]; projects
that map a controlled vocabulary to a universal classification system such as
OCLC’s correlation of LCSH with DDC[22],
and the mapping of UDC to General Finnish Subject Headings[23];
and projects that harmonize heterogeneous classification schemes such as
the American Mathematical Society’s mapping of Mathematics Subject
Classification to Schedule 510 of the DDC[24].
Some
interoperability variables are more methodological in nature. Following the
work of Chan and Zeng[25], these
may be sorted into six categories: (1) “Derivation/Modeling,” where a
relatively simple vocabulary is derived from a more complicated pre-existing
source, the way Faceted Application of Subject Terminology (FAST) is extracted
from LCSH, for example; (2) “Translation/Adaptation” (e.g., the Bibliothèque
Nationale’s Rameau system, generated through translation and adaptation of LCSH
and Canadian Subject Headings (CSH)); (3) “Satellite and Leaf Node Linking,”
where specialized thesauri (such as The Legislative Indexing Vocabulary
(LIV), Thesaurus for Graphic Materials, Global Legal Information
Network (GLIN)) are treated as
satellites of a larger entity (LCSH) or conceptualized as leaves (specialized
thesauri) attached to a tree structure (the larger thesaurus or vocabulary
list); (4) “Direct mapping,” where equivalences between differently sourced
terms and classification numbers are established, usually requiring intensive
intellectual effort; (5) linking through a “temporary union list”; and (6)
linking through a “thesaurus server protocol,” as with the Alexandria Digital
Library project.
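To make the "direct mapping" category concrete, the following Python sketch stores a handful of equivalences between subject headings and DDC class numbers and indexes them in both directions. The pairs are simplified illustrations chosen for this sketch, not data from any of the projects cited, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative heading-to-class equivalences (simplified; not project data).
DIRECT_MAPPINGS = [
    ("Mathematics", "510"),
    ("Algebra", "512"),
    ("Arithmetic", "513"),
    ("Geometry", "516"),
]


def build_indexes(pairs):
    """Index the equivalences in both directions."""
    heading_to_class = defaultdict(list)
    class_to_heading = defaultdict(list)
    for heading, notation in pairs:
        heading_to_class[heading].append(notation)
        class_to_heading[notation].append(heading)
    return dict(heading_to_class), dict(class_to_heading)


HEADING_TO_CLASS, CLASS_TO_HEADING = build_indexes(DIRECT_MAPPINGS)


def classes_for(heading):
    """Class numbers mapped to a subject heading (empty list if unmapped)."""
    return HEADING_TO_CLASS.get(heading, [])


def headings_for(notation):
    """Subject headings mapped to a class number (empty list if unmapped)."""
    return CLASS_TO_HEADING.get(notation, [])
```

Both lookups return lists, so the same structure can hold one-to-many relationships where an intellectual review establishes them.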
Other
variables discussed in the literature include: How are interoperable links stored
and managed? Do they rely on authority records, concordance tables, a central
switching language, semantic networks, lexical databases, semantic layers[26],
or some other structure? How are data and metadata in general
stored? That is to say, are they gathered into a union catalog (e.g.,
American Memory Project, NSDL), or do they live in a distributed system? How
are data structured? For example, do they rely on XML, MARC, Dublin Core,
and/or other metadata standards?
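One of the storage options named above, a central switching language, can be sketched as follows. Each vocabulary links its terms to a neutral concept identifier, so a term is switched between any two vocabularies by way of that identifier rather than through a separate table for every pair. The identifiers and term pairs in this Python sketch are illustrative assumptions, not data from an actual project.

```python
# Each vocabulary maps its own terms to a neutral concept identifier.
# Identifiers ("C001", ...) and term pairs are invented for this sketch.
TO_SWITCH = {
    "LCSH": {"Libraries": "C001", "Museums": "C002"},
    "RAMEAU": {"Bibliothèques": "C001", "Musées": "C002"},
    "SWD": {"Bibliothek": "C001"},
}

# Reverse direction: concept identifier -> term, per vocabulary.
FROM_SWITCH = {
    vocab: {concept: term for term, concept in terms.items()}
    for vocab, terms in TO_SWITCH.items()
}


def switch(term, source, target):
    """Translate a term from one vocabulary to another via the hub."""
    concept = TO_SWITCH.get(source, {}).get(term)
    if concept is None:
        return None  # the term is not linked to the switching language
    return FROM_SWITCH.get(target, {}).get(concept)


def pairwise_tables(n):
    """Number of concordance tables needed to link n vocabularies pairwise."""
    return n * (n - 1) // 2
```

With a hub, adding a vocabulary requires one new set of links; with pairwise concordance tables the number of tables grows as `pairwise_tables(n)`, for example 10 tables for five vocabularies.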
Yet
another set of variables involves difference in degree of granularity, and
logical structure. In the chapter “Compatibility and Convertibility” (pp.
179-216) of his Vocabulary Control for Information Retrieval, W.F.
Lancaster points out several difficulties with which anyone attempting semantic
interoperability (or “vocabulary reconciliation”, as he puts it) must contend:
How to reconcile vocabularies which have different degrees of specificity,
different degrees of pre-coordination, overlap in subject matter, and different
arrangements of hierarchy[27].
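The difficulty of differing specificity can be illustrated with a small Python sketch. When the target vocabulary has no equivalent for a specific source term, one possible fallback (an assumption of this sketch, not a method attributed to Lancaster) is to climb the source vocabulary's broader-term chain until a mapped term is found, and to label the result as a broader match rather than an exact one. All terms shown are hypothetical.

```python
# Source vocabulary hierarchy: term -> its broader term (hypothetical data).
BROADER_TERM = {
    "Siamese cat": "Cats",
    "Cats": "Mammals",
}

# Known equivalents in a less specific target vocabulary (hypothetical data).
TARGET_EQUIVALENTS = {
    "Cats": "Felidae",
    "Mammals": "Mammalia",
}


def map_term(term):
    """Return (target_term, match_type), or None if nothing can be mapped."""
    current = term
    seen = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in seen:
        seen.add(current)
        if current in TARGET_EQUIVALENTS:
            match_type = "exact" if current == term else "broader"
            return TARGET_EQUIVALENTS[current], match_type
        current = BROADER_TERM.get(current)
    return None
```

Recording the match type lets a retrieval system tell users when a search has been widened, which is one way to limit silent loss of meaning.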
Vizine-Goetz, et al.[28] paraphrases
B. Glossary
| Terms | Definitions |
| classification scheme | The terms classification scheme, taxonomy, and categorization scheme are often used interchangeably. Though there may be subtle differences from example to example, in general these types of KOSs provide ways to separate entities into buckets or relatively broad topic levels. Some examples provide a hierarchical arrangement of numeric or alphabetic notation to represent broad topics. These types of knowledge organization systems may not follow the strict rules for hierarchy required in the ANSI/NISO Thesaurus Standard (Z39.19) (NISO), and often lack the explicit relationships presented in a thesaurus.[33] |
| concept map | A diagram showing the relationships between concepts. Concepts are connected with labeled arrows, in a downward-branching hierarchical structure. The relationship between concepts is articulated in linking phrases, e.g., "gives rise to", "results in", "is required by," or "contributes to".[34] |
| concordance table | Also called a correspondence table. Methodologically, a concordance table describes the way in which terms in multiple vocabularies are related.[35] |
| controlled vocabulary | A subset of a language, consisting of pre-selected words and phrases designated as index terms. In a controlled vocabulary, each subject is represented by one valid term only; and, conversely, each term represents only one subject. References are made from equivalent or synonymous terms not selected as valid index terms. Homographs are disambiguated. In addition, a controlled vocabulary contains links among hierarchically or otherwise related terms. Examples of controlled vocabularies include Library of Congress Subject Headings, Thesaurus of ERIC Descriptors, and Medical Subject Headings. The term "controlled vocabulary" is often used in a broad sense to include scheme-based classification data, which also manifest rigorous structures and embody relationships among concepts.[36] |
| cross-domain search | A search of multiple resources from different domains through a single interface, using a single query. |
| crosswalk | A program or algorithm to map elements in different metadata schemes. An example is the Dublin Core/MARC/GILS Crosswalk designed by the Library of Congress.[37] |
| descriptors | Terms used in indexes, abstracts, or other databases/periodical indexes to describe the subjects of an article. |
| dictionary | Alphabetical lists of terms and their definitions that provide variant senses for each term, where applicable. They are more general in scope than a glossary. While a dictionary may also provide synonyms and, through the definitions, related terms, there is no explicit hierarchical structure or attempt to group terms by concept.[38] |
| gazetteer | A dictionary of place names. Traditional gazetteers have been published as books or they appear as indexes to atlases.[39] |
| glossary | A list of terms, usually with definitions. The terms may be from a specific subject field or those used in a particular work. The terms are defined within that specific environment and rarely have variant meanings provided. Examples include the EPA Terms of the Environment.[40] |
| harmonization | The process of making disparate entities or systems work together. Its purpose is to resolve conflicts and to remove obstacles by overcoming idiosyncrasies of individual systems. Within the context of subject access, harmonization implies efforts to make terms from different controlled vocabularies work together for the benefit of improving retrieval results. Differences may occur in semantics and/or syntax, and among multiple languages. Harmonization provides the ability to accommodate two or more different systems, schemes, or standards to facilitate searching across databases. Methods of harmonization include linking and mapping.[41] |
| interoperability | The ability of two or more systems or components to exchange information and use the exchanged information without special effort on the part of either system.[42] |
| knowledge organization system | A general term referring to the tools that present the organized interpretation of knowledge structures; includes authority files, classification systems, concept spaces, dictionaries, gazetteers, glossaries, ontologies, subject heading sets, thesauri; often called KOS, sometimes, knowledge organization scheme.[43] |
| KOS | See knowledge organization system. |
| link | A mechanism for associating equivalent or associated terms. |
| mapping | A special form of linking, with efforts to identify equivalence or establish one-to-one and, in some instances, one-to-many relationships. Mapping facilitates automatic switching between systems or languages. Recent developments include efforts to match elements in the MARC record with those in other metadata records and efforts to identify equivalent terms among different controlled vocabularies or different languages. Examples of mapping of subject entries include the Omni File (based on the indexes to individual WILSONLINE databases) and MACS (Multi-lingual Access to Subject headings), a European project on multilingual access to subject authority files and data to develop a prototype for the mapping of subject entries based on three controlled vocabularies: Library of Congress Subject Headings (LCSH), RAMEAU, and Schlagwortnormdatei (SWD).[44] |
| metathesaurus | A "thesaurus of thesauri," serving as a framework within which diverse controlled vocabularies are harmonized for the purpose of facilitating cross-file searching. An example is the UMLS (Unified Medical Language System) Metathesaurus developed and maintained by the National Library of Medicine, in which "alternate names [from different source vocabularies] for the same concept (synonyms, lexical variants, and translations) are linked together. Each Metathesaurus concept has attributes that help to define its meaning, e.g., the semantic type(s) or categories to which it belongs, its position in the hierarchical contexts from various source vocabularies, and, for many concepts, a definition." (National Library of Medicine 1999).[45] |
| networked knowledge organization system | An interactive information device aimed at supporting the description and retrieval of heterogeneous information resources on the internet; sometimes NKOS.[46] |
| NKOS | See networked knowledge organization system |
| ontology | A knowledge representation format. That is, an ontology is a shared understanding of the structure of a domain of interest. Ontologies make it easy both for humans to compile and maintain a body of knowledge, and for computer programs to use this knowledge to intelligently manipulate data. An ontology organizes all data using the concepts of class, object, and relationship. Classes are organized into a hierarchy, ordered by subclass, called a taxonomy. A well-known taxonomy is the biological taxonomy of all living things, in which living things are sub-classed into their kingdom: plant or animal. Plants and animals are further classified into phylum, etc. An ontology extends a taxonomy by including relationships among objects and classes, which can represent properties and values. To continue the biological example, there is a relationship "number of limbs" between certain classes of animals and integers. Many taxonomies have been developed to organize knowledge in particular areas.[47] |
| ontology mapping | The process of ontology mapping concerns how classes from one ontology can be mapped to classes of another taxonomy in an automated way.[48] |
| query term | The word or term with which a user begins a search. |
| semantic interoperability | The ability of two or more systems or components to exchange or harmonize cognate subject vocabularies and/or knowledge organization schemes to be used for the purpose of effective and efficient resource discovery without significant loss of lexical or connotative meaning and without special effort by the user. |
| semantic network | A type of KOS that structures concepts and terms not as hierarchies but as a network or a web; concepts are thought of as nodes with various relationships branching out from them; the relationships generally go beyond the standard BT, NT and RT and may include specific whole-part relationships, cause-effect, parent-child, etc. Examples of semantic networks include |
| subject authority file | An internal tool for catalog or database management. It contains authority records and provides documentation of a body or list of authorized and authoritative indexing terms in the context and framework of its vocabulary.[50] |
| subject authority record | A record of a subject heading that shows its established form, cites the authorities consulted in determining the choice and form of the heading, and indicates the cross-references made to and from the heading.[51] |
| subject headings | A set of controlled terms to represent the subjects of items in a collection. Subject heading lists can be extensive, covering a broad range of subjects. In use, subject headings tend to be pre-coordinated, with rules for how subject headings can be joined to provide more specific concepts. Examples include the Medical Subject Headings (MeSH) and the Library of Congress Subject Headings (LCSH).[52] |
| switching language | Intermediary terms that serve as a mechanism for moving between vocabularies; unlike links, which are internal, switching language is external to records for the terms being associated |
| taxonomy | A hierarchical data structure or a type of classification schema made up of classes, where a child of a taxonomy node represents a more restricted, smaller, subclass than its parent.[53] |
| term list | A list of words or phrases, often with definitions; examples include authority files, glossaries, gazetteers, and dictionaries.[54] |
| thesaurus | A type of |
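The distinction the glossary draws between a taxonomy (classes ordered by subclass) and an ontology (a taxonomy extended with relationships such as "number of limbs") can be sketched in Python. The class below is a hypothetical illustration: the names are invented, and the choice to let subclasses inherit property values from their parents is an assumption of this sketch rather than something the glossary states.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    """A class in a toy ontology: a taxonomy node plus property values."""
    name: str
    parent: Optional["OntologyClass"] = None
    properties: dict = field(default_factory=dict)

    def is_subclass_of(self, other):
        """True if `other` is this class or one of its ancestors."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def get_property(self, key):
        """Look up a property here, then on ancestors (sketch assumption)."""
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        return None


# Taxonomy part: living things are sub-classed into plant or animal.
living_thing = OntologyClass("Living thing")
plant = OntologyClass("Plant", parent=living_thing)
animal = OntologyClass("Animal", parent=living_thing)

# Ontology part: a "number of limbs" relationship to an integer value.
insect = OntologyClass("Insect", parent=animal, properties={"number_of_limbs": 6})
beetle = OntologyClass("Beetle", parent=insect)
```

The `parent` links alone form the taxonomy; the `properties` dictionary carries the additional relationships that, in the glossary's terms, turn a taxonomy into an ontology.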
C. Project Inventory
Using the
definition of semantic interoperability developed by the Subcommittee, 37
projects were identified. The projects, along with information about them, are
listed below alphabetically by name. As can be seen from the list, the amount
of information that the Subcommittee was able to find varied from extensive for
some projects to very little for others. Minimally, for each project the
Subcommittee attempted to provide contact information, a URL, and/or a
citation, so that a reader of this report could be directed to additional
sources of information about a particular project. The Subcommittee's term
ended at the 2005 ALA Annual Conference, so this list has not been updated
since June 2005. Since then, information about some of these projects may have
changed, and some new projects may have begun. The Subcommittee attempted to be
as comprehensive as possible and include all known major SI projects in the
List, but of course some projects may have been overlooked. The Subcommittee
would especially like to acknowledge the work of Marcia Lei Zeng and Lois Mai
Chan, whose list of 18 SI projects[56]
(with descriptions) was the starting point for the Subcommittee's list.
| Name | ADL Thesaurus Protocol |
| Institution or agency | |
| URL | project site at http://alexandria.sdc.ucsb.edu/~gjanee/thesaurus/; demonstrator page at http://www.comp.glam.ac.uk/%7Efacet/formats/skos/skos_search.htm |
| Contact information | Linda Hill, Ph.D. UC |
| Project type | Production |
| Project dates | |
| Status of project | Current, with demonstrator project available for public viewing |
| Languages | |
| Knowledge organization systems (KOS) | Thesauri |
| Subject Coverage | General |
| Description | Protocol for exchange of thesaurus information. Thesaurus data exchange tool. The Thesaurus Protocol is based on the ANSI/NISO (1993) Z39.19 thesaurus model and supports downloading, querying, and navigating thesauri. |
| Methodology | In 2001-2002, the ADL Implementation team developed a Thesaurus Service Protocol. It is a lightweight, stateless, XML- and HTTP-based protocol designed to support searching and retrieval of thesaurus data. All that is required for its use is the development of a thesaurus server that can accept the specified XML-encoded queries and return the specified standard reports. The demonstrator system loads a thesaurus of choice (from a proffered list). The thesaurus can then be searched by keyword. Displays of results take several formats: alphabetical list of retrieved terms with USE references, hierarchical display, and scope notes. |
| User interface | The Thesaurus Protocol is based on the ANSI/NISO (1993) Z39.19 thesaurus model and supports downloading, querying, and navigating thesauri. |
| Relevant standards | XML, XML Schemas, HTTP, ANSI/NISO Z39.19-1993 (thesaurus structure), XPATH, SKOS |
| Notes | |
| Citation | ADL Thesaurus Protocol cited in recent articles in Cataloging and Classification Quarterly (vol. 37 no 3-4 2004). Janée, G, Ikeda, S. & Hill, L.L. (2002). The ADL Thesaurus Protocol. Binding, Ceri and Douglas Tudhope. Zeng, M. & Chan, L.M. (2004). Trends and issues in establishing interoperability among knowledge organization systems. Journal of the American Society for Information Science, 55(5), 377-395. |
|
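The Methodology entry for the ADL Thesaurus Protocol describes a stateless exchange of XML-encoded queries and reports over HTTP. The following Python sketch illustrates only that general pattern. The element and attribute names (`query`, `keyword`, `report`, `term`, `name`, `use`) and the sample response are hypothetical and are not taken from the actual protocol schema; the HTTP transport is left out so that the sketch runs without a server.

```python
import xml.etree.ElementTree as ET


def build_query(keyword: str) -> str:
    """Encode a keyword search as a small XML document (hypothetical schema)."""
    root = ET.Element("query")
    ET.SubElement(root, "keyword").text = keyword
    return ET.tostring(root, encoding="unicode")


def parse_report(xml_text: str) -> list:
    """Extract terms and their USE references from a report (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    results = []
    for term in root.findall("term"):
        # A missing "use" attribute is reported as None.
        results.append({"name": term.get("name"), "use": term.get("use")})
    return results


# Hand-written sample standing in for a thesaurus server's response.
SAMPLE_REPORT = '<report><term name="Creeks" use="Streams"/><term name="Streams"/></report>'

query = build_query("stream")
terms = parse_report(SAMPLE_REPORT)
```

Because each request carries everything the server needs, a client of this kind keeps no session state between queries, which is the property the entry calls "stateless".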
Project
Name |
AGROVOC |
|
Institution
or Agency |
Food and
Agriculture Organization of the United Nations |
|
URL |
http://www.fao.org/agrovoc/ |
|
Contact
Information |
|
|
Project
Type |
Production |
|
Project
Dates |
|
|
Project
Status |
Operational |
|
Languages |
Multilingual:
Arabic, Chinese, Czech, English, French, Portuguese, Spanish |
|
Knowledge
Organization Systems |
Thesaurus |
|
Subject
Coverage |
Agriculture |
|
Description |
Multilingual
agricultural thesaurus. |
|
Methodology |
|
|
User
Interface |
A user
selects one of the languages and submits a string in that language to the
AGROVOC database. The result is a list of terms and phrases that begin with
the string. On the same page is a thesaural display of the first term in the
list, and a list of equivalent terms in the other languages with links to
thesaural displays of the term in these languages. A user can select other terms
from the list. |
|
Relevant
Standards |
|
|
Notes |
|
|
Citation |
|
|
Project
Name |
Art &
Architecture Thesaurus (AAT) |
|
Institution
or Agency |
Getty
Research Institute |
|
URL |
http://www.getty.edu/research/conducting_research/vocabularies/aat/ |
|
Contact
Information |
Getty
Research Institute (310)
440-7335 griweb@getty.edu |
|
Project
Type |
Production |
|
Project
Dates |
|
|
Project
Status |
Operational |
|
Languages |
Multilingual |
|
Knowledge
Organization Systems |
Thesaurus |
|
Subject
Coverage |
Art,
Architecture and Material Culture |
|
Description |
The AAT
is one of three Getty vocabularies which provide terminology and other
information about the objects, artists, concepts, and places important to
various disciplines that specialize in art, architecture and material
culture. |
|
Methodology |
The AAT
is a structured vocabulary containing terms and other information about
concepts. Terms for any concept may include the plural form of the term,
singular form, natural order, inverted order, spelling variants, various
forms of speech, equivalent terms in various languages and synonyms of
different etymological roots. Among these terms one is flagged as the
preferred term or descriptor for the concept. |
|
User
Interface |
Online
public access catalogs and/or the Getty Web Site |
|
Relevant
Standards |
MARC 21,
XML |
|
Notes |
The other
two Getty vocabularies are: the Thesaurus of Geographic Names (TGN), which
contains names and other information about places; and the Union List of
Artist Names, which contains names and other information about artists. |
|
Citation |
|
|
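The AAT Methodology entry describes a concept as a cluster of terms (singular and plural forms, variants, equivalents in other languages), one of which is flagged as the preferred term. A minimal Python sketch of such a structure follows. The class names, field names, and sample terms are illustrative assumptions, not the Getty data model or actual AAT records.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str
    language: str = "en"
    preferred: bool = False


@dataclass
class Concept:
    concept_id: str
    terms: list = field(default_factory=list)

    def descriptor(self) -> Term:
        """Return the single term flagged as preferred."""
        flagged = [t for t in self.terms if t.preferred]
        if len(flagged) != 1:
            raise ValueError("a concept needs exactly one preferred term")
        return flagged[0]

    def matches(self, text: str) -> bool:
        """True if any term of the concept equals the text, ignoring case."""
        return any(t.text.lower() == text.lower() for t in self.terms)


# Illustrative sample only; not an actual AAT record.
chair = Concept("c1", [
    Term("chairs", preferred=True),
    Term("chair"),
    Term("chaises", language="fr"),
])
```

Grouping variants under one concept lets a search on any variant lead to the same descriptor.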
Project
Name |
BUBL |
|
Institution
or Agency |
Centre
for Digital Library Service, |
|
URL |
http://bubl.ac.uk/ |
|
Contact
Information |
BUBL
Information Service Centre
for Digital Library Service Department
of Computer and Information Sciences 0141 548
4752 bubl@bubl.ac.uk |
|
Project
Type |
Production |
|
Project
Dates |
1990- |
|
Project
Status |
Operational |
|
Languages |
English |
|
Knowledge
Organization Systems |
Subject
heading list and classification system BUBL subject tree Dewey Decimal Classification (DDC) |
|
Subject
Coverage |
General |
|
Description |
BUBL is
an Internet-based information service for the |
|
Methodology |
|
|
User
Interface |
A user
can browse for subjects through the BUBL subject tree; browse through the DDC
hierarchy; or search by author, title, subject, DDC, or resource type. |
|
Relevant
Standards |
|
|
Notes |
|
|
Citation |
|
|
Project
Name |
CAMed |
|
Institution
or Agency |
|
|
URL |
http://circe.slis.kent.edu/mzeng/tmshome.html |
|
Contact
Information |
Marcia
Lei Zeng mzeng@kent.edu |
|
Project
Type |
Research/prototype |
|
Project
Dates |
|
|
Project
Status |
Current? |
|
Languages |
Multilingual:
English, French |
|
Knowledge
Organization Systems |
Thesauri AcuBase Thesaurus AMED Thesaurus JICST MiliMedicalThesaurus |
|
Subject
Coverage |
Complementary
and Alternative Medicine |
|
Description |
An
integrated thesaurus management and cross-thesaurus search system for
complementary and alternative medicine ( |
|
Methodology |
Four
thesauri in the areas of |
|
User
Interface |
The
cross-thesaurus search function allows a user to enter a term and search all
or any of the thesauri in this repository.
Software matches the query against the thesauri and gives back all
fully- or partially-matched thesaurus entries. When a term is selected from the search
results, a user can see the details of a thesaurus term entry (including the
broader, narrower, and related terms, as well as non-preferred terms) and continue selecting among the term
displays. The term search eventually enables a direct search in four sample bibliographic databases that have been integrated in the prototype. The term search function also extends to
the full-text searching of all resources in the CAMed website. |
|
Relevant
Standards |
|
|
Notes |
|
|
Citations |
Zeng, M.
& Chen, Y. (2003). Features of an integrated thesaurus management and
search system for the networked environment. In I.C. McIlwaine (Ed.), Subject
retrieval in a networked environment. Proceedings of an IFLA satellite
meeting held in Zeng
& Chan (2004). |
|
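The CAMed User Interface entry describes matching a query against all or any of the thesauri in the repository and returning fully or partially matched entries. The Python sketch below shows one simple way such matching could work. The function name, the substring rule for partial matches, and the sample thesauri are assumptions made for illustration; the source does not describe the prototype's actual matching algorithm.

```python
def cross_search(query, thesauri, selected=None):
    """Return (thesaurus, term, match_type) tuples for full or partial matches.

    `selected` limits the search to the named thesauri; None searches all.
    """
    q = query.lower()
    hits = []
    for name, terms in thesauri.items():
        if selected is not None and name not in selected:
            continue
        for term in terms:
            t = term.lower()
            if t == q:
                hits.append((name, term, "full"))
            elif q in t:
                # Assumed rule: a partial match contains the query string.
                hits.append((name, term, "partial"))
    return hits


# Hypothetical sample data, not drawn from the CAMed thesauri.
SAMPLE_THESAURI = {
    "Thesaurus A": ["Acupuncture", "Acupuncture points"],
    "Thesaurus B": ["Electroacupuncture", "Massage"],
}

all_hits = cross_search("acupuncture", SAMPLE_THESAURI)
only_b = cross_search("acupuncture", SAMPLE_THESAURI, selected={"Thesaurus B"})
```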
Project
Name |
CARMEN
(Content Analysis, Retrieval and Metadata: Effective Networking) |
|
Institution
or Agency |
|
|
URL |
http://www.bibliothek.uni-regensburg.de/projects/carmen12/index.html.en |
|
Contact
Information |
Dr.
Friedrich Geisselmann Universitätsbibliothek
Regensburg 93042 friedrich.geisselmann@bibliothek.uni-regensburg.de |
|
Project
Type |
Research/prototype |
|
Project
Dates |
|
|
Project
Status |
Current? |
|
Languages |
Multilingual:
English, German |
|
Knowledge
Organization Systems |
Thesauri,
classification systems and subject headings lists Informationszentrum Sozialwissenschaften
(IZT) {Thesaurus} German Institute for Educational
Research Thesaurus Schlagwortnormdatei (SWD) {Subject
heading list} Dewey Decimal Classification (DDC) Regensburger Verbund Klassifikation (RVK) Mathematics Subject Classification (MSC) Physics and Astronomy Classification
Scheme (PACS) |
|
Subject
Coverage |
Social
Sciences, Mathematics, Physics, Astronomy |
|
Description |
The goal
is to provide an integrated subject search in distributed databases
representing different disciplines, taking into account the conceptual
differences of the applied thesauri and classifications by cross
concordances. |
|
Methodology |
Starting
from alphabetical lists which contain descriptors from a specific subject
area, the relationships between IZT, the German Institute for Educational
Research Thesaurus and SWD are determined intellectually. After the
relationships have been established, they are recorded in a link management
system. |
|
User
Interface |
|
|
Relevant
Standards |
|
|
Notes |
|
|
Citations |
Kunz, M.
(2002). Sachliche Suche in verteilten Ressourcen: Ein kurzer Überblick über
neuere Entwicklungen [Subject retrieval in distributed resources: a short review
of recent developments]. Paper presented at the 68th IFLA Council and General
Conference, Zeng
& Chan (2004). |
|
Name |
Classification
Web |
|
Institution
or agency |
Library
of Congress |
|
URL |
http://classweb.loc.gov/ |
|
Contact
information |
Cheryl C.
Cook Product
Coordinator Library
of Congress Cataloging
Distribution Service ccoo@loc.gov |
|
Project
type |
Production |
|
Project
dates |
|
|
Status of
project |
Current,
in production |
|
Language |
English |
|
Knowledge
organization systems ( |
Classification
system, Subject heading list Library of Congress Classification (LCC) Library of Congress Subject Headings
(LCSH) {Subject heading list} |
|
Subject
Coverage |
General |
|
Description |
This
project links LCC numbers to LCSH headings and vice versa. |
|
Methodology |
LCC
numbers are added to LCSH authority records; and LCSH headings are added to
LCC authority records. |
|
User
interface |
In
Classification Web users can move across the |
|
Relevant
Standards |
MARC 21 |
|
Notes |
|
|
Citation |
Zeng and
Chan (2004). |
|
Name |
Czech
National Subject Gateway Project and Uniform Information Gateway |
|
Institution
or agency |
National
Library of the |
|
URL |
Uniform
Information Gateway: http://www.jib.cz ;
User interface: |
|
Contact
information |
|
|
Project
type |
Production |
|
Project
dates |
2nd
version released March 2003 |
|
Status of
project |
Current |
|
Language |
Czech |
|
Knowledge
organization systems ( |
|
|
Subject
Coverage |
General |
|
Description |
The
Czechs explored building subject portals for online resources and existing
bibliographic records. They surveyed the field of national bibliographical
agencies to see what sources they use for subject terminology. Like other
similar projects, this is an attempt to achieve interoperability through
control of descriptive cataloging. |
|
Methodology |
Mapping
was done intellectually at the level of main classes and principal
subdivisions; to reach the highest possible accuracy in the mapping process, it
was necessary to use common auxiliary subdivisions. The system contains four
authority files: geographic, chronological, genre/form, and topical. Heterogeneous
information resources are categorized by subject using the Conspectus method.
The scheme consists of mapping DDC and UDC. Topical authority terms
contain English equivalents. |
|
User
interface |
Aleph
interface allows a user to search subject authority records or conspectus
records in a number of languages. |
|
Relevant
standards |
|
|
Notes |
|
|
Citations |
Stoklasova,
Bohdana, Marie Balikova and Ludmila Celbova. The relationship between subject
gateways and national bibliographies in international context. Paper
presented at 69th IFLA General Conference and Council, 1-9 August
2003, http://www.ifla.org/IV/ifla69/papers/054e-Stoklasova_Balikova_Celbova.pdf |
|
Project
Name |
DARPA
Unfamiliar Metadata Project |
|
Institution
or Agency |
|
|
URL |
http://metadata.sims.berkeley.edu/GrantSupported/unfamiliar.html |
|
Contact
Information |
Michael
Buckland Professor
Emeritus South
Hall 203A (510)
642-3159 buckland@sims.berkeley.edu |
|
Project
Type |
Research/prototype |
|
Project
Dates |
|
|
Project
Status |
Complete? |
|
Languages |
Multilingual:
English, French, German, Russian, Spanish |
|
Knowledge
Organization Systems |
Thesauri
and classification systems INSPEC Thesaurus Medical Subject Headings (MeSH)?
World Intellectual Property Organization International Patent
Classification Library of Congress Classification in the
Physical Sciences Standard Industrial Classification |
|
Subject
Coverage |
Biotechnology,
Physical Sciences, Technology |
|
Description |
"The
objective of this project is to link ordinary language queries to unfamiliar
indexes and classifications." |
|
Methodology |
|
|
User
Interface |
Entry
Vocabulary Modules are built to respond adaptively to a searcher's query
posed in ordinary language. A searcher can enter an ordinary language query
to a particular database, and the searcher will be presented with a ranked
list of terms from the database's vocabulary. The searcher can then use these
terms to perform a search of the database. |
|
Relevant
Standards |
|
|
Notes |
This
project was carried out under the auspices of the Metadata Research Program
of the School of Information Management & Systems, |
|
Citation |
Buckland,
M., Chen, A., Chen, H., Kim, Y., Lam, B., Larson, R., Norgard, B., &
Purat, J. (1999). Mapping entry vocabulary to unfamiliar metadata
vocabularies, D-Lib Magazine [Online], 5(1). Available:
http://www.dlib.org/dlib/january99/buckland/01buckland.html Zeng
& Chan (2004). |
|
Project
Name |
DESIRE |
|
Institution
or Agency |
DESIRE
Consortium |
|
URL |
http://www.desire.org/ |
|
Contact
Information |
Tracy
Hooper DESIRE
Project Manager Institute
for Learning and Research Technology 44 117
928 7197 t.a.hooper@bristol.ac.uk |
|
Project
Type |
Research |
|
Project
Dates |
1998-2000 |
|
Project
Status |
Complete? |
|
Languages |
|
|
Knowledge
Organization Systems |
Subject
gateways |
|
Subject
Coverage |
|
|
Description |
The
Project's focus was on enhancing existing European information networks for
research users across |
|
Methodology |
The
Project participants proposed a representation of the conceptual
relationships typical of controlled vocabularies using the Resource
Description Framework (RDF). It was hoped that such an approach would enable
the use of generic RDF tools as a basis for mapping between subject
vocabularies. The Project report included a proposal for a RDF/XML thesaurus
schema that attempted to demonstrate how the RDF data model could represent a
web of inter-related concepts and terms from more than one thesaurus. Registries
were developed for metadata application profiles (http://desire.ukoln.ac.uk/registry/ra.php3);
and metadata
terminology (http://desire.ukoln.ac.uk/registry/element.php3) |
|
User
Interface |
|
|
Relevant
Standards |
|
|
Notes |
During
the second phase of the Project (DESIRE II) some background work was
conducted on subject vocabularies in order to support the development of
interoperable subject gateways, especially with regard to multilinguality and
the mapping of different vocabularies. |
|
Citation |
|
|
Name |
The FACET
Project |
|
Institution
or agency |
Hypermedia Research Unit School of Computing Pontypridd CD37 1DL |
|
URL |
http://www.comp.glam.ac.uk/~FACET/default.asp |
|
Contact
information |
Douglas
Tudhope (dstudhope@glam.ac.uk) Daniel
Cunliffe (djcunlif@glam.ac.uk) |
|
Project type |
Demonstration |
|
Project
dates |
Initial
funding covered three year period, 2001-2003 |
|
Status of
project |
Current
with demonstrators available for public viewing |
|
Languages |
English |
|
Knowledge
organization systems ( |
Thesauri;
faceted thesauri |
|
Subject Coverage |
not
subject specific; uses thesaurus terms and data from AAT as demonstration |
|
Description |
The
objective of the FACET Project research has been to: “Develop and evaluate retrieval tools based
on a matching function incorporating thesaurus semantic closeness measures.”
The FACET Project attempts to find a way to present thesaurus data to a
searcher, to allow the user to search for appropriate resources from
displayed thesaurus terms and to provide the searcher with behind the scenes
expansion of a search based on concepts of the semantic relationships among
thesaurus terms. One premise of the
project is the value of the facet analysis model of thesaurus building.
Demonstrators for the FACET Project make use of the Art and Architecture
Thesaurus, as an example of a faceted thesaurus. |
|
Methodology |
The FACET
system architecture comprises client and web browser interfaces, utilities
that interact with data objects, and an SQL server database that serves the
thesaurus information. In a recent (2004) publication, the developers of
FACET state that their intention is to “move toward and open (Web service)
platform … and build on a general
programmatic |
|
User
interface |
Several
web based search and display interfaces are proposed in the demonstrators |
|
Relevant
standards |
XML; the
developers have recently acknowledged the need for a standardized
protocol for the presentation or representation of thesaurus data; they
mention the ADL protocol as a step in the right direction. |
|
Notes |
In short,
the project attempts to present thesaurus data in a meaningful way to
searchers, to propose expanded searching options by suggesting terms in
context, and to allow searchers to use the discovered terms in a query of
resources. Initial
funding from Engineering and Physical Sciences Research Council (EPSRC), a |
|
Citation |
Tudhope, Binding,
Ceri and Douglas Tudhope. |
|
Project
Name |
Finnish
Project |
|
Institution
or Agency |
|
|
URL |
|
|
Contact
Information |
|
|
Project
Type |
Research |
|
Project
Dates |
|
|
Project
Status |
Prototype;
research |
|
Languages |
Multilingual:
English, Finnish |
|
Knowledge
Organization Systems |
Subject
heading list and classification system General Finnish Subject Headings (GFSH)
{Subject heading list} Universal Decimal Classification (UDC) |
|
Subject
Coverage |
General |
|
Description |
This
project converts assigned class numbers based on the Finnish abridged edition
of UDC into GFSH headings. |
|
Methodology |
A
dictionary was created that maps UDC numbers to GFSH headings. The dictionary
was mechanically applied to convert the bibliographic databases. |
|
User
Interface |
|
|
Relevant
Standards |
|
|
Notes |
|
|
Citation |
Himanka,
J. & Kautto, V. (1992). Translation of the Finnish abridged edition of
UDC into General Finnish Subject Headings. International Classification, 19,
131-134. Zeng
& Chan (2004). |
|
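The Finnish project's Methodology entry describes a dictionary that maps UDC numbers to GFSH headings and is applied mechanically to bibliographic records. A minimal Python sketch of that conversion step follows. The record layout, the function name, and the English placeholder headings are assumptions for illustration; they are not actual GFSH headings or the project's data format.

```python
# Placeholder mapping: UDC class numbers to illustrative headings.
UDC_TO_HEADINGS = {
    "51": ["Mathematics"],
    "53": ["Physics"],
}


def convert(records, mapping):
    """Add subject headings to each record based on its UDC numbers.

    Numbers missing from the mapping are skipped; duplicates are dropped.
    """
    converted = []
    for rec in records:
        headings = []
        for number in rec["udc"]:
            for heading in mapping.get(number, []):
                if heading not in headings:
                    headings.append(heading)
        converted.append({**rec, "subjects": headings})
    return converted


sample_records = [
    {"title": "Sample title 1", "udc": ["51", "53"]},
    {"title": "Sample title 2", "udc": ["999"]},
]
result = convert(sample_records, UDC_TO_HEADINGS)
```

A mechanical conversion of this kind is only as good as the dictionary: records whose class numbers have no entry receive no headings, as the second sample record shows.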
Project
Name |
HEREIN
(The European Information Network on Cultural Heritage) Thesaurus |
|
Institution
or Agency |
European
Heritage Network, Council of |
|
URL |
http://www.european-heritage.net/sdx/herein/ |
|
Contact
Information |
|
|
Project
Type |
Production |
|
Project
Dates |
|
|
Project
Status |
In
development |
|
Languages |
Multilingual:
English, French, Spanish |
|
Knowledge
Organization Systems |
Thesaurus |
|
Subject
Coverage |
Cultural
heritage |
|
Description |
This
multilingual thesaurus is attached to the HEREIN Project. It intends to offer
a terminological standard for national policies dealing with architectural
and archaeological heritage. |
|
Methodology |
Most of
the terms in the thesaurus come from reports on cultural heritage policy in |
|
User
Interface |
Through
the Project Web site, a user can either search for a specific term, or browse
through the hierarchical classes. |
|
Relevant
Standards |
|
|
Notes |
|
|
Citation |
Thérond,
D. (2000). European-Heritage Net: The European Heritage Network. Cultivate
Interactive, [Online] 2 Available: http://www.cultivate-int.org/issue2/herein/ Zeng and
Chan (2004). |
|
Name |
HILT
(High Level Thesaurus Project) |
|
Institution
or agency |
Funded by
JISC (Joint Information Systems Committee) |
|
URL |
http://hilt.cdlr.strath.ac.uk |
|
Contact
information |
Dennis
Nicholson Director
of Research Centre
for Digital Library Research c/o
Andersonian Library 44 (0)
141 548 2102 d.m.nicholson@strath.ac.uk |
|
Project
type |
Pilot
Project |
|
Project
dates |
2000- |
|
Status of
project |
Current |
|
Language |
Multilingual |
|
Knowledge
organization systems ( |
Thesauri,
classification systems, subject heading lists Art and Architecture Thesaurus (AAT) Dewey Decimal Classification (DDC) Library of Congress Subject Headings
(LCSH) UNESCO Thesaurus RDN terminologies Wordmap taxonomies set |
|
Subject
Coverage |
General
and special |
|
Description |
The pilot
project (Phase II) will develop an online terminologies route map (or TeRM)
that will map subject schemes to user terminologies and to each other. |
|
Methodology |
|
|
User
interface |
|
|
Relevant
Standards |
|
|
Notes |
Phase I
investigated the problem of searching and browsing across a number of
distributed services using different indexing vocabularies and attempted to
derive a set of recommendations to help facilitate cross-searching and
browsing by subject between communities, services and initiatives. The
results of these investigations led to HILT Phase II, the Pilot Project
described above. |
|
Citation |
Nicholson,
D. & Wake, S. (2003). HILT: Subject retrieval in a distributed
environment. In I.C. McIlwaine (Ed.), Subject retrieval in a networked
environment. Proceedings of an IFLA satellite meeting held in Zeng
& Chan (2004). |
|
Name |
H.W. Wilson
Megathesaurus for Omnifile Project |
|
Institution
or agency |
H.W.
Wilson |
|
URL |
|
|
Contact
information |
|
|
Project
type |
Production |
|
Project
dates |
|
|
Status of
project |
Active,
in production |
|
Language |
English |
|
Knowledge
organization systems ( |
Thesauri |
|
Subject
Coverage |
General |
|
Description |
H.W.
Wilson has developed a “megathesaurus” that gathers the vocabulary for all
its indexes for inclusion in its Omnifile product. The Omnifile product now
includes six of the 11 |
|
Methodology |
Concepts
merge into single terms, while the megathesaurus retains the terminology used
in the separate indexes. The individual database products use the same terms
as always; in the Omnifile product, the megathesaurus equivalent appears. |
|
User
interface |
Web,
specifically, "WilsonWeb". Megathesaurus is largely invisible to
the user. |
|
Relevant
standards |
Unknown |
|
Notes |
|
|
Citations |
Kuhr,
P.S. (2003) Putting the world back together: mapping multiple vocabularies
into a single thesaurus. In I.C. McIlwaine (Ed.), Subject retrieval in a
networked environment. Proceedings of an IFLA satellite meeting held in Milstead,
Jessica. “Cross file searching: how vendors help--and don’t help--improve
compatibility.” Searcher, vol. 7, no. 5 (May 1999) |
|
Project
Name |
IMesh |
|
Institution
or Agency |
UKOLN:
the |
|
URL |
http://www.imesh.org |
|
Contact
Information |
UKOLN c/o The
Library BA2 7AY 44 1225
38658 imesh-toolkit@imesh.org |
|
Project
Type |
Production |
|
Project
Dates |
Sept.
1999 - July 2003 |
|
Project
Status |
|
|
Languages |
|
|
Knowledge
Organization Systems |
Subject
gateways |
|
Subject
Coverage |
|
|
Description |
The
Project will build on existing subject software to develop a configurable,
reusable and extensible toolkit for subject gateway providers. |
|
Methodology |
Components
evolve independently but rely on each other to accomplish larger tasks. To
achieve interoperability the goal is for components to be able to call on one
another efficiently and conveniently. |
|
User
Interface |
|
|
Relevant
Standards |
RDF, SQL |
|
Notes |
NSF/JISC
International Libraries Initiative. |
|
Citation |
|
|
Project
Name |
LCSH/MeSH
Mapping Project |
|
Institution
or Agency |
Northwestern University Libraries |
|
URL |
http://www.library.northwestern.edu/public/lcshmesh/ |
|
Contact
Information |
Tony
Olson Catalog
Librarian Galter
Health Sciences Library Northwestern
University (312)
503-8125 ajolson@northwestern.edu |
|
Project
Type |
Production |
|
Project
Dates |
1990- |
|
Project
Status |
Active,
in development |
|
Languages |
English |
|
Knowledge
Organization Systems |
Subject
heading lists Library of Congress Subject Headings
(LCSH) Medical Subject Headings (MeSH) Thesaurus |
|
Subject
coverage |
General
and medicine |
|
Description |
The goal
of this project is to integrate LCSH and MeSH in online catalogs. |
|
Methodology |
Corresponding
established headings in LCSH and MeSH are mapped, and the mapping data is
entered into 7XX linking fields of LCSH and MeSH MARC 21 authority records.
The data in these fields can be used to generate equivalent term references
in an online catalog. The mapping data is continually updated to take into
account changes in the two |
|
User
Interface |
In online
public access catalogs, see also
references will be provided between equivalent LCSH and MeSH headings. |
|
Relevant
Standards |
MARC 21 |
|
Notes |
The
project is still in development because most library management systems do
not yet index 7XX fields in authority records, and consequently do not supply
linking references between equivalent LCSH and MeSH headings. The
mapping data is available for use in other interoperability projects. Files
of enhanced LCSH and MeSH authority records with the mapping data can be
downloaded from the Northwestern public http site above. |
|
Citation |
Olson, T.
& Strawn, G. "Mapping the
LCSH and MeSH Systems." Information Technology and Libraries, Zeng
& Chan (2004). |
|
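The LCSH/MeSH Methodology entry describes storing mapping data in 7XX linking fields of authority records and using those fields to generate equivalent-term references in a catalog. The Python sketch below illustrates the idea with plain dictionaries rather than real MARC 21 records. The record layout, the use of tag 750, and the Cancer/Neoplasms pairing are illustrative assumptions and are not taken from the project's files.

```python
# Simplified stand-ins for authority records; only the parts needed
# for the illustration are modeled.
AUTHORITY_RECORDS = [
    {"scheme": "LCSH", "heading": "Cancer",
     "links": [{"tag": "750", "scheme": "MeSH", "heading": "Neoplasms"}]},
    {"scheme": "MeSH", "heading": "Neoplasms",
     "links": [{"tag": "750", "scheme": "LCSH", "heading": "Cancer"}]},
]


def build_references(records):
    """Map each (scheme, heading) to the equivalent headings in its 7XX links."""
    refs = {}
    for rec in records:
        key = (rec["scheme"], rec["heading"])
        for link in rec["links"]:
            if link["tag"].startswith("7"):
                refs.setdefault(key, []).append((link["scheme"], link["heading"]))
    return refs


def see_also(scheme, heading, refs):
    """Format the references a catalog could display for one heading."""
    return [f"{heading} ({scheme}) see also {h} ({s})"
            for s, h in refs.get((scheme, heading), [])]


references = build_references(AUTHORITY_RECORDS)
```

As the Notes entry points out, references like these can only appear in a catalog whose library management system indexes the 7XX fields.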
Name |
LEAF (Linking and Exploring Authority
Files) |
|
Institution
or agency |
multiple
European institutions; Dept. of Manuscripts,
Staatsbibliothek zu Berlin Preussischer Kulturbesitz; |
|
URL |
http://www.crxnet.com/leaf/ |
|
Contact
information |
Name:
WEBER, Jutta (Dr) |
|
Project
type |
Research/prototype |
|
Project
dates |
2001-2004
(Fifth Framework Programme) |
|
Status of
project |
Completed |
|
Languages |
Multilingual |
|
Knowledge
information systems ( |
Name
authority files |
|
Subject
Coverage |
General |
|
Description |
Utility
for creating universal name authority file. [From the
web site of the Fifth Framework Programme] The beneficial potential of
authority information is presently only partly utilised by cultural heritage
organisations: libraries, archives, museums etc. are independently working
with them without jointly exploiting this valuable resource. Public users are
not involved in this scenario; neighbouring work in the commercial sector is
not integrated. LEAF proposes a model for harvesting existing
authority data and person name/corporate body information in a multilingual
environment. Via user queries the LEAF system will
automatically and dynamically create a common name authority file with links
to organisations that provide information about a person or corporate body
and/or items connected to them. The LEAF model will be
applicable to all projects and co-operations that are dealing with cultural
heritage data in all kinds of institutions by making authority information
available to everyone involved. The project results will be implemented by
extending an existing, fully functional, international online Search and
Retrieval service network of OPACs that provides information about modern
manuscripts and letters, the MALVINE project. |
|
Methodology |
LEAF
develops a model architecture for establishing links between distributed
authority records and providing access to them. The system allows uploads of
the distributed authorities to the central system and automatically links
those authorities concerning the same entity. Information which is retrieved
as a result of a query will be stored in a pan-European "Central Name
Authority File". This file will grow with each query and at the same
time will reflect what data records are relevant to the LEAF users. Libraries
and archives wanting to improve authority information will thus be able to
prioritise their editing work. Registered users will be able to post
annotations to particular data records in the LEAF system, to search for
annotations, and to download records in various formats. The local
authority data that is uploaded to the central LEAF system is originally
encoded in different formats. In order to be able to compare individual
records and thus make them available for further operations one common
exchange format needed to be identified into which all records, independently
of their native format, can be converted. LEAF has adapted EAC for this
purpose. The conversion module of the central LEAF system consists of data
conversion routines for each local data structure which convert the uploaded
or harvested local records into EAC XML and the different character sets into
Unicode (UTF-8). The converted data are then further processed in the LEAF
system. In addition to the converted form records are saved in their local
formats as provided by the LEAF Data Providers. |
|
User
interface |
None
found ( |
|
Relevant
standards |
XML, EAC |
|
Notes |
Most
recent newsletter is 11/03; link to
MALVINE yields a blank page (2004/12/31); most
scheduled documentation of last 2 years not delivered online, including a
final report. |
|
Citations |
Kaiser,
Max; Hans-Jorg Lieder, Kurt Majcen and Heribert Vallant. New ways of sharing
and using authority information: the LEAF Project. D-Lib Magazine, vol. 9,
no. 11 (Nov. 2003), http://www.dlib.org/dlib/november03/lieder/11lieder.html |
|
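The LEAF Methodology entry describes converting locally formatted authority records into one common exchange format and then automatically linking records that concern the same entity. The Python sketch below illustrates those two steps with plain dictionaries instead of EAC XML. The two local formats, the field names, and the matching rule (same name and same dates) are assumptions for illustration; the source does not specify how LEAF decides that two records describe the same entity.

```python
def from_format_a(rec):
    """Convert a record from hypothetical local format A."""
    return {"name": rec["NAME"], "dates": rec.get("DATES", ""), "source": "A"}


def from_format_b(rec):
    """Convert a record from hypothetical local format B."""
    name = f'{rec["surname"]}, {rec["forename"]}'
    return {"name": name, "dates": rec.get("life", ""), "source": "B"}


def link_key(rec):
    """Assumed matching rule: same name (case-insensitive) and same dates."""
    return (rec["name"].casefold(), rec["dates"])


def link_records(common_records):
    """Group converted records that appear to describe the same entity."""
    groups = {}
    for rec in common_records:
        groups.setdefault(link_key(rec), []).append(rec)
    return groups


local_a = [{"NAME": "Goethe, Johann Wolfgang von", "DATES": "1749-1832"}]
local_b = [{"surname": "Goethe", "forename": "Johann Wolfgang von",
            "life": "1749-1832"}]

common = [from_format_a(r) for r in local_a] + [from_format_b(r) for r in local_b]
linked = link_records(common)
```

Converting first and comparing afterwards means each local format needs only one conversion routine, rather than a comparison routine for every pair of formats.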
Project
Name |
Library
& Archives of |
|
Institution
or Agency |
Library
& Archives of |
|
URL |
http://www.collectionscanada.ca/csh/s23-120-e.html
(link to information about CSH and relation to RVM) |
|
Contact
Information |
|
|
Project
Type |
Production |
|
Project
Dates |
|
|
Project
Status |
Operational |
|
Languages |
Multilingual:
English, French |
|
Knowledge
Organization Systems |
Subject
heading lists Canadian Subject Headings (CSH) {Subject
heading list} Répertoire de vedettes-matières (RVM)
{Subject heading list} Library of Congress Subject Headings
(LCSH) {Subject heading list} |
|
Subject
Coverage |
General |
|
Description |
To
support the bilingual cataloging policy of the Library & Archives of |
|
Methodology |
Equivalent
RVM and LCSH headings are entered into 7XX fields of CSH MARC21 authority
records. The equivalent term references displayed in the online catalog are
generated from these 7XX fields. |
|
User
Interface |
Online
public access catalog |
|
Relevant
Standards |
MARC 21 |
|
Notes |
URL for
AMICUS: http://www.collectionscanada.ca/amicus/index-e.html |
|
Citation |
Armstrong,
Pam (2003). "Navigating bilingual subject headings in AMICUS."
Presented at the program, Getting the
Most Out of Subject References in the Online Catalog: Better Than It Used to
Be? American Library Association Annual Conference, |
|
Project
Name |
LIMBER
(Language Independent Metadata Browsing of European Organizations) |
|
Institution
or Agency |
LIMBER Consortium |
|
URL |
http://www.limber.rl.ac.uk/ |
|
Contact
Information |
Michael
Wilson Project
Manager m.d.wilson@rl.ac.uk |
|
Project
Type |
Production,
Development |
|
Project
Dates |
1999-2001 |
|
Project
Status |
Complete |
|
Languages |
Multilingual:
English, French, German, Spanish |
|
Knowledge
Organization Systems |
Thesaurus: ELSST |
|
Subject
Coverage |
Social
Sciences |
|
Description |
The goal
of the LIMBER Project is to develop tools to support multilingual access to
data distributed across the world wide web by using metadata and a
multilingual thesaurus of terms in a restricted vocabulary. |
|
Methodology |
LIMBER is
using W3C's RDF language as the
technology to define metadata and the multilingual thesaurus, and FortH's SIS
multilingual thesaurus management system as the base technology for the
multilingual thesaurus server. The LIMBER tools will be generic, but they
will be demonstrated by enhancing the existing NESSTAR data access system with
multilingual capability, for the domain of social science. Another project, FASTER, is enhancing the categories of
data that NESSTAR can retrieve. LIMBER is using the UK Data Archive's
Hasset thesaurus of terms in social science as the starting point for a
multilingual thesaurus for social science in English, French, Spanish and
German. LIMBER is advancing the DDI metadata format for
social science data to support multilingual access as a demonstration of
multilingual access in the social science domain. |
|
User
Interface |
Web
Interface |
|
Relevant
Standards |
RDF, DDI |
|
Notes |
LIMBER is
an EU IST programme funded research
and development project. |
|
Citation |
Miller,
Ken and Brian Mathews. Having the right connections: the LIMBER Project.
Journal of Digital Information, vol. 1, no. 8 ( |
|
Project
Name |
MACS
(Multilingual Access to Subjects) |
|
Institution
or Agency |
Conference
of European National Librarians. Project partners are: the Swiss National Library
(SNL), Bibliothèque nationale de France (BnF), the British Library (BL), and
Die Deutsche Bibliothek (DDB) |
|
URL |
https://ilmacs.uvt.nl/pub/ |
|
Contact
Information |
Patrice
Landry |
|
Project
Type |
Production |
|
Project
Dates |
|
|
Project
Status |
In
development |
|
Languages |
Multilingual:
English, French, German |
|
Knowledge
Organization Systems |
Subject
headings lists: Schlagwortnormdatei (SWD); Répertoire d'autorité-matière
encyclopédique et alphabétique unifié (RAMEAU); Library of Congress Subject Headings
(LCSH) |
|
Subject
Coverage |
General |
|
Description |
MACS aims
to provide multilingual subject access to library catalogues. MACS enables
users to simultaneously search the catalogues of the project's partner
libraries in the language of their choice (English, French, German). |
|
Methodology |
Equivalence
links are created between the three subject headings lists used in the
partner libraries' catalogs. The links are stored in the MACS Links Database.
There are two interfaces to the Database. (1) The Search Interface
allows users to browse headings and retrieve bibliographic records by using
the links established between the concepts. The Search Interface uses the
Z39.50 protocol. (2) The Link Management Interface enables the creation and
management of links between headings from the subject headings lists. |
|
User Interface |
Online
Public Access Catalog |
|
Relevant
Standards |
NISO
Z39.50 |
|
Notes |
The
headings from the three lists are analyzed to determine whether they are
exact or partial matches, of a simple or complex nature. The end result is
neither a translation nor a new thesaurus but a mapping of existing and
widely used |
|
Citation |
Freyre,
E. & Naudi, M. (2003). MACS: Subject access across languages and
networks. In I.C. McIlwaine (Ed.), Subject retrieval in a networked
environment. Proceedings of an IFLA satellite meeting held in Zeng
& Chan (2004). |
|
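The link-based approach described for MACS can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the MACS Links Database itself: the class and method names are hypothetical, the sample headings are illustrative rather than taken from MACS, and the Z39.50 search against each partner catalogue is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heading:
    scheme: str  # "LCSH", "RAMEAU", or "SWD"
    label: str


@dataclass(frozen=True)
class Link:
    headings: frozenset  # the headings judged equivalent
    match: str           # "exact" or "partial"


class LinksDatabase:
    """Hypothetical in-memory stand-in for a store of equivalence links."""

    def __init__(self):
        self._by_heading = {}

    def add_link(self, headings, match="exact"):
        link = Link(frozenset(headings), match)
        for h in link.headings:
            self._by_heading.setdefault(h, []).append(link)
        return link

    def equivalents(self, heading, exact_only=False):
        """Return the headings in the other lists linked to `heading`."""
        found = set()
        for link in self._by_heading.get(heading, []):
            if exact_only and link.match != "exact":
                continue
            found |= link.headings - {heading}
        return found


# Illustrative headings only.
db = LinksDatabase()
db.add_link(
    [
        Heading("LCSH", "Mountaineering"),
        Heading("RAMEAU", "Alpinisme"),
        Heading("SWD", "Bergsteigen"),
    ],
    match="exact",
)
targets = db.equivalents(Heading("RAMEAU", "Alpinisme"))
```

A user searching in French would then have the query expanded to the linked LCSH and SWD headings, each of which is sent to the catalogue that uses that list. The original headings are left untouched; only the links are new.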
Project
Name |
Merimee |
|
Institution
or Agency |
|
|
URL |
|
|
Contact
Information |
|
|
Project
Type |
|
|
Project
Dates |
|
|
Project
Status |
Operational? |
|
Languages |
Multilingual:
English, French |
|
Knowledge
Organization Systems |
Thesauri: Le thesaurus de l'architecture; Art and Architecture Thesaurus (AAT); English Heritage Thesaurus |
|
Subject
Coverage |
Cultural
heritage, art, architecture |
|
Description |
For the
purpose of indexing complexes, buildings and built structures, Le thesaurus de
l'architecture was created and mapped to AAT and the English Heritage
Thesaurus. |
|
Methodology |
When
mapping from Le thesaurus de l'architecture to the other thesauri, the Boolean
operators "AND" and "OR" are used to indicate equivalence,
in addition to the two equivalence types, exact and partial. |
|
User
Interface |
|
|
Relevant
Standards |
|
|
Notes |
|
|
Citation |
Doerr, M.
(2001). Semantic problems of thesaurus mapping. Journal of Digital Information,
[Online], 1 (8). Available:
http://jodi.ecs.soton.ac.uk/Articles/v01/io8/Doerr#Nr.52
Zeng and Chan (2004). |
|
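The use of Boolean "AND" and "OR" in the Merimee mappings can be illustrated with a small expression evaluator. This is a sketch under assumptions, not the project's implementation: the class names are hypothetical and the terms are deliberate placeholders rather than real entries from any of the three thesauri. It assumes that "AND" means a target record must carry all of the listed terms and "OR" means any one of them suffices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str


@dataclass(frozen=True)
class And:
    parts: tuple  # every part must match


@dataclass(frozen=True)
class Or:
    parts: tuple  # at least one part must match


def matches(expr, index_terms):
    """Check whether a record indexed with `index_terms` satisfies `expr`."""
    if isinstance(expr, Term):
        return expr.label in index_terms
    if isinstance(expr, And):
        return all(matches(p, index_terms) for p in expr.parts)
    if isinstance(expr, Or):
        return any(matches(p, index_terms) for p in expr.parts)
    raise TypeError(f"unsupported mapping expression: {expr!r}")


# Placeholder mapping from source-thesaurus terms to target-thesaurus expressions.
MAPPING = {
    "source term A": And((Term("target term X"), Term("target term Y"))),
    "source term B": Or((Term("target term X"), Term("target term Z"))),
}
```

For example, a record indexed only with "target term X" would satisfy the mapping for "source term B" but not the one for "source term A", which also requires "target term Y".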
Project
Name |
MSC and
Schedule 510 in DDC |
|
Institution
or Agency |
University
at |
|
URL |
|
|
Contact
Information |
Iyer
Hemalata University
at hi651@albany.edu |
|
Project
Type |
Research/prototype |