ALCTS Subject Analysis Committee

Subcommittee on Semantic Interoperability

 

Subject Semantic Interoperability: Final Report

 

Report of the Subcommittee on Semantic Interoperability to the ALCTS Subject Analysis Committee

Submitted by Chair, Shelby E. Harken, University of North Dakota, March 2006

 

With the assistance of Subcommittee members: Bonnie A. Dede, University of Michigan; Lois M. Chan, University of Kentucky; Anton J. Olson, Northwestern University; Ruth A. Bogan, Rutgers University; Shannon L. Hoffmann, Brigham Young University; Daniel Solomon Lovins, Yale University Libraries; and input from Subcommittee members: Diane Dates Casey, Governors State University; Rebecca J. Dean, OCLC; Giles Stewart Martin, OCLC; Lynn M. El-Hoshy, Library of Congress; Mary C. Lasater, Vanderbilt University; and non-member Assist. Prof. Joseph Tennis, University of British Columbia

 

Introduction

 

An information system managing its own subject access for a single resource can relatively easily produce a successful database. However, there is an increasing need to access multiple resources in multiple languages or with multiple thesauri or controlled vocabularies. To a point, multiple controlled vocabularies and knowledge organization systems can be made to interoperate. However, without appropriate design, the resulting search results will be ‘non-semantic’ and of little value to users. Given that converging information systems — with their idiosyncratic histories and social functions — are likely to produce overlaps, seams, and gaps in the composite whole, the Subject Analysis Committee formed the Subcommittee on Semantic Interoperability to investigate what techniques are currently being employed by developers to minimize loss of meaning and create true semantic interoperability.

 

Work of the Subcommittee

 

Charge:

Survey the current state of international interoperability projects which focus on subject and/or classification data. Produce a document outlining "best practices" at a level of generality that is both flexible enough to be measured against a variety of actual projects and specific enough to be made operational in current or proposed projects.

 

Specific tasks include, but are not necessarily limited to: a) an inventory of known semantic interoperability projects, with descriptions; b) an evaluation of selected projects in terms of those projects' stated objectives; c) an investigation of the various concepts involved in the harmonization of indexing languages.

 

To carry out its charge the Subcommittee undertook a number of tasks which have resulted in the documents appended to this report.

 

  • In order to survey the current state of international interoperability projects, the Subcommittee began with an extensive literature review. This literature review along with other background information is included in Appendix A.

 

  • Based on the literature review and to help guide its work, the Subcommittee developed a glossary of terms used in discussing various semantic interoperability (SI) projects (Appendix B). During its investigations and discussions the Subcommittee developed a working definition of subject semantic interoperability which is given below:

"The ability of two or more systems or components to exchange or harmonize cognate subject vocabularies and/or knowledge organization schemes to be used for the purposes of effective and efficient resource discovery without significant loss of lexical or connotative meaning and without special effort by the user."

 

  • Using the above definition, the Subcommittee identified 37 SI projects, which were then compiled into a list that included descriptions of the projects (Appendix C).

 

  • The results of investigating the various concepts and issues involved in subject semantic interoperability were used to formulate criteria for evaluating and developing SI projects (Appendix D).

 

  • From the criteria in Appendix D, the Subcommittee developed a Checklist which could be used to evaluate or design a SI project (Appendix E).

 

  • Using the Checklist, members of the Subcommittee evaluated seven of the SI projects that had been identified and described in its list of projects. The project evaluations are Appendices F1-F7.

 

  • The Chair of the Subcommittee served on the ALCTS Metadata Enrichment Task Force. The Subcommittee and Task Force presented a joint program at the American Library Association 2004 Annual Conference entitled: "Enriching Subject Access". For more information about the Task Force and a description of the Program, see Appendix G.

 

  • The Subcommittee compiled an annotated bibliography (Appendix H) which included the sources cited in the Literature Review and various other appendices, as well as other background readings not cited elsewhere in this report or its appendices. The Subcommittee believes that this bibliography could aid others in the investigation of the concepts and various projects involved in semantic interoperability.

 

Based on the above list of tasks and accomplishments, the Subcommittee believes that it has met its Charge with one exception, i.e., the development of a "Best Practices" document. The reasons for this are discussed below in the Subcommittee's Findings.

 

Subcommittee Findings

 

1.         The 37 projects in Appendix C fall into two broad categories.

i.          Production projects, such as the H.W. Wilson Megathesaurus and the MACS (Multilingual Access to Subjects) Project. The goal of these projects is to develop a product or system that can be used by a large number of users in a setting in which semantic interoperability is needed. Most of the 21 projects in this category are still in development, and a few have ceased or become inactive.

ii.         Research/demonstration projects, such as the DARPA Unfamiliar Metadata Project and the HILT (High Level Thesaurus) Project. There are 14 projects in this category, most of which are completed or have become inactive due to a lack of funding. It is possible that a few of the active research projects might evolve into working production systems.

Note that the Subcommittee was unable to classify two of the projects as production or research. Furthermore, lack of documentation made it difficult to determine if some of the projects had been permanently or temporarily suspended.

 

2.         After reviewing the literature and examining various projects, the Subcommittee decided that a best practices document for semantic interoperability was premature. There were several reasons for this.

·         The Subcommittee was unable to find any existing tool that could be used to evaluate a semantic interoperability project. Consequently, the Subcommittee would have to first develop an evaluation tool.

·         Once developed, the tool could be employed to evaluate selected projects in order to identify successful methods and models. The methods and models would form the basis of a best practices document. The Subcommittee's three-year term proved insufficient to accomplish three major tasks: the development of a tool, the project evaluations, and the analysis of successful projects.

·         Finally, even with a completed evaluation tool, there were still only a few SI projects in full production, and these had not been in production long enough to yield much analyzable data about how successful they were in meeting their goals and objectives.

 

3.         For its evaluation tool the Subcommittee developed a Checklist (Appendix E) comprising a number of questions to evaluate projects. To test its viability, the Subcommittee used the Checklist to evaluate seven projects (Appendices F1-F7). Based on these evaluations, the Subcommittee found that the Checklist could serve as a useful evaluation tool.

 

4.         Many of the questions in the Checklist are the same ones that developers of SI projects need to answer as they design their projects. Therefore, the Subcommittee also concluded that the Checklist could serve as a guide to developers of SI projects.

 

5.         The Subcommittee has been able to identify a few semantic interoperability projects that are in full production. Some examples are listed below.

AGROVOC Thesaurus {Food and Agriculture Organization}

Art and Architecture Thesaurus (AAT) {Getty Research Institute}

Bilingual Subject Access {Library & Archives of Canada}

Classification Web {Library of Congress}

H. W. Wilson Megathesaurus

Renardus {Renardus Consortium}

Unified Medical Language System (UMLS) {National Library of Medicine}

WebDewey {OCLC}

These projects share the following attributes:

·         A well developed master plan for life-cycle management and data migration

·         Reliance on international standards

·         A viable business model which provides ongoing financial support for the project

·         Adequate staff, computer software and hardware to support the project

 

Conclusion

 

The need for improved semantic interoperability between and among vocabularies and knowledge organization schemes is undeniable and growing in importance. There is an ever-increasing need for an environment in which multiple portals can be accessed through subject metadata, using neutral software that is ubiquitously or directly available to the user and that libraries can copy for use in their own environments. In order to develop or improve a knowledge organization system, including emerging options in semantic interoperability, scholars and practitioners need to be able to evaluate a wide variety of projects and stay current with the professional literature.

 

Based on its findings, the Subcommittee concludes that the development of a successful subject semantic interoperability project is a long and difficult process. It requires a substantial investment of financial, human and computer resources. The Subcommittee recommends using the information and tools in this report and its appendices to assist in developing a successful project incorporating subject semantic interoperability. Finally, the Subcommittee concludes that, since this field of endeavor is still relatively young and immature, it is too early to generate a set of Best Practices that could be used in developing a successful project. We are past the theoretical and basic research phase and into the development phase. Even though there are some successful projects in full production, more projects need to reach maturity and much more research needs to be done.

 

Appendices

 

A.         Background Information and Literature Review

B.         Glossary

C.         Project Inventory

D.         Criteria for Evaluating and Developing Subject Semantic Interoperability

E.         Checklist for Evaluating and Developing Subject Semantic Interoperability Projects

F.         Project Evaluations

            F1.        ADL Thesaurus Protocol

            F2.        Library & Archives of Canada Bilingual Cataloging

            F3.        H.W. Wilson Megathesaurus

            F4.        HILT

            F5.        MACS

            F6.        RDN Subject Portals

            F7.        UMLS (National Library of Medicine)

G.         Program Summary

H.         Annotated Bibliography


A.         Background Information and Literature Review

 

Most online library systems worldwide utilize some type of controlled vocabulary, and in many cases multiple vocabularies. From a librarian's point of view, keyword searching on the Internet has its limitations. Yet online catalogs exist in the Internet environment along with other remotely accessible databases, which may utilize their own controlled vocabularies. Consequently, during an information-seeking experience, users may be presented with a myriad of thesauri and other controlled vocabularies. These same problems were identified by Marcia Bates in her report to the Library of Congress[1]. Although some Internet search engines function fairly well, the Subcommittee felt it needed to limit its focus to environments using some type of structured subject-based metadata or embedded metatags, rather than random or weighted keywords.

 

The ALCTS report "Subject data in the metadata record"[2] lists the following functional requirements for subject access to Internet resources: a) to assist searchers in identifying the most efficient paths for resource discovery and retrieval; b) to help users focus their searches; c) to enable optimal recall; d) to enable optimal precision; e) to assist searchers in developing alternative search strategies; f) to provide all of the above in the most efficient, effective and economical manner.

 

In a networked environment, interoperability among disparate systems is necessary to allow users to search among resources from multiple sources generated and organized according to different standards and approaches. Lois Chan, in her paper for the Bicentennial Conference on Bibliographic Control for the New Millennium (2000)[3], summarized the interoperability requirements as follows: a) interoperability among different systems, metadata standards, and languages; b) flexibility and adaptability to different information communities, not only different types of libraries, but also other communities such as museums, archives, corporate information systems, etc.; c) extensibility and scalability to accommodate the need for different degrees of depth and different subject domains; d) simplicity in application, i.e., easy to use and to comprehend; e) versatility, i.e., the ability to perform different functions; and f) amenability to computer application.

 

Doerr (2001)[4] notes that terminological resources are increasingly important for information retrieval in the networked environment, for retrieving documents by querying databases, and for using metadata employing controlled vocabularies. There is a growing interest in developing automated intermediaries to negotiate the differences between controlled vocabulary schemes so that a user can use a familiar set of terms to search collections using other vocabulary schemes.

 

Hunter (2001)[5] points out that networked knowledge organization systems typically contain objects of mixed media types which are described using a multitude of diverse metadata schemas. Hence machine understanding of metadata descriptions which conform to schemas from different domains is a fundamental requirement for access. Yet, problems arise from the differences in terminological semantics and hierarchical relationships within various subject schemes.

 

Bella Hass Weinberg[6], in Thesaurus Design for Semantic Information Management, suggested that "semantic information management" really just means vocabulary control; that ontology usually just means classification scheme but is sometimes used as a synonym for thesaurus; and that taxonomy is just a synonym for classification. Subject heading lists, such as LCSH, are essential tools for managing information in a print environment, while true thesauri are often more useful in the online environment (where they can be viewed hierarchically or combined in Boolean searches). Thesauri often run into the problem of needing to distinguish homographs. The problem in the selection of thesaurus terms is largely one of determining a set of appropriate lexemes, that is, the smallest units of lexicon that can be understood on their own terms. Synonymy is a common problem, though easily managed, e.g., Cancer, see Neoplasm. Other problems include having to choose between singular and plural forms, parts of speech, etc.

 

A subject portal connects users to a site focusing on a particular subject, with access to high-quality information resources, allowing aggregated cross-searching, streamlined account management, user profiling, or additional services.[7]  However, the user has to know to go to the portal. The number of subject portals is growing.

 

Renardus is an example of a subject gateway/portal project with the goal of providing users with integrated access, by searching or browsing through a single interface, to partners' quality-controlled subject gateways. Further goals are to develop and define organizational models, business models, technical solutions and metadata standards (Renardus Application Profile, Renardus Namespaces, Renardus Collection Level Description). The following elements can be used to define a quality-controlled subject gateway: a) selection and collection development, b) collection management, c) creation, d) resource description and metadata, e) subject access, f) search and browse access, g) standards, h) value-adding features. Each participating partner is responsible for mapping its metadata format to the common Renardus metadata format, derived from Dublin Core. A generic normalization toolkit with Z39.50 configuration files and a conversion script were provided. Each participant set up a Renardus server with its content normalized to the Renardus data model. A set of screens was built for the user interface: a) homepage, b) advanced search screen, c) index scan window, d) advanced search page after index scan, e) browse by subject screen, f) (preliminary) result screen, g) sorted result screen, h) participating gateways screen, and i) help (index) screen. In order to accomplish subject browsing, the various systems are mapped to a common classification system. The Renardus service provides access to resources on all kinds of subjects, published worldwide and in many languages, and it is intended to be offered to an international multi-disciplinary community of users. The Dewey Decimal Classification and Relative Index (DDC) was chosen because of its online availability and tools, global usage, suitability of the classification system and its functionality, frequency and character of the updates, and research and methodological development efforts.[8]

 

About the same time the SAC Subcommittee on Semantic Interoperability was formed, NISO decided Z39.19 Guidelines for the Construction, Format, and Management of Monolingual Thesauri needed changing to meet the needs of the changing information environment. Their rationale included, "Developers of Internet and Intranet-accessible Web pages, databases, and information systems need better metadata to support non-expert information searches, and metadata developers are recognizing the value of incorporating high-quality, interoperable controlled vocabularies and taxonomies into their schemes."[9]

 

Literature Review

 

Some researchers have been making close examinations of individual projects, while others focus mainly on theoretical issues. Recent noteworthy articles of both types in the library and information science domain include those by Chan & Zeng[10], Tennis[11], and Zeng & Chan[12], while those in the computer science and database design domain include Dhamankar, et al.[13], Park & Ram[14], and Parsons & Wand[15].

 

The work of Chan and Zeng is particularly useful for breaking down the many variables that make up subject semantic interoperability. One major variable involves the selection of data types, systems, or standards that are to be made interoperable. There are projects, for example, that harmonize different controlled vocabularies in the same language, e.g., Northwestern University’s mapping of LCSH and MeSH[16], the Wilson Megathesaurus[17], and CARMEN’s integration of multiple German thesauri; projects that aggregate subject vocabularies from among different languages and classification systems, e.g., the Unified Medical Language System (UMLS)[18], the High Level Thesaurus (HILT)[19] &[20], and the DARPA Unfamiliar Metadata Project[21]; projects that map a controlled vocabulary to a universal classification system, such as OCLC’s correlation of LCSH with DDC[22] and the mapping of UDC to General Finnish Subject Headings[23]; and projects that harmonize heterogeneous classification schemes, such as the American Mathematical Society’s mapping of Mathematics Subject Classification to Schedule 510 of the DDC[24].

 

Some interoperability variables are more methodological in nature. Following the work of Chan and Zeng[25], these may be sorted into six categories: (1) “Derivation/Modeling,” where a relatively simple vocabulary is derived from a more complicated pre-existing source, the way Faceted Application of Subject Terminology (FAST) is extracted from LCSH, for example; (2) “Translation/Adaptation” (e.g., the Bibliothèque Nationale’s Rameau system, generated through translation and adaptation of LCSH and Canadian Subject Headings (CSH)); (3) “Satellite and Leaf Node Linking,” where specialized thesauri (such as The Legislative Indexing Vocabulary (LIV), Thesaurus for Graphic Materials, Global Legal Information Network (GLIN)) are treated as satellites of a larger entity (LCSH) or conceptualized as leaves (specialized thesauri) attached to a tree structure (the larger thesaurus or vocabulary list); (4) “Direct mapping,” where equivalences between differently sourced terms and classification numbers are established, usually requiring intensive intellectual effort; (5) linking through a “temporary union list”; and (6) linking through a “thesaurus server protocol,” as with the Alexandria Digital Library project.
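
The "direct mapping" category above can be illustrated with a minimal Python sketch. It assumes a hand-built table of equivalences from one vocabulary to another; the table name DIRECT_MAP, the function names, and the sample term pairs are hypothetical illustrations and are not drawn from any of the projects cited in this report.

```python
from typing import Dict, List

# Hypothetical direct mapping from a source vocabulary to a target vocabulary.
# The pairs are illustrative only; a real mapping is built by intellectual review.
DIRECT_MAP: Dict[str, List[str]] = {
    "Cancer": ["Neoplasms"],            # one-to-one equivalence
    "Heart": ["Heart", "Myocardium"],   # one-to-many relationship
}


def translate_term(term: str, mapping: Dict[str, List[str]]) -> List[str]:
    """Return the target terms mapped to `term`, or an empty list if unmapped."""
    return list(mapping.get(term, []))


def translate_query(terms: List[str], mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Translate each query term, keeping unmapped terms visible as empty lists."""
    return {term: translate_term(term, mapping) for term in terms}


if __name__ == "__main__":
    print(translate_query(["Cancer", "Heart", "Gardening"], DIRECT_MAP))
```

Returning an empty list for an unmapped term, rather than silently dropping it, keeps the gaps in a mapping visible to whoever maintains it.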

 

Other variables discussed in the literature include: How are interoperable links stored and managed? Do they rely on authority records, concordance tables, a central switching language, semantic networks, lexical databases, semantic layers[26], or some other structure? How are data and metadata in general stored? That is to say, are they being gathered into a union catalog (e.g., American Memory Project, NSDL), or do they live in a distributed system? How are data structured? For example, do they rely on XML, MARC, Dublin Core, and/or other metadata standards?
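
One of the storage options named above, a central switching language, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the vocabulary names, the hub codes (H001, H002), and the function switch are all hypothetical, and the sample terms are illustrative rather than taken from any real vocabulary.

```python
from typing import Dict, Optional

# Each vocabulary is mapped once to a neutral hub notation (the switching language).
# Vocabulary names, hub codes, and terms are hypothetical.
TO_HUB: Dict[str, Dict[str, str]] = {
    "vocab_en": {"Cancer": "H001", "Music": "H002"},
    "vocab_fr": {"Cancer": "H001", "Musique": "H002"},
    "vocab_de": {"Krebs": "H001", "Musik": "H002"},
}


def switch(term: str, source: str, target: str,
           to_hub: Dict[str, Dict[str, str]] = TO_HUB) -> Optional[str]:
    """Translate `term` from `source` to `target` by way of the hub notation."""
    hub_code = to_hub.get(source, {}).get(term)
    if hub_code is None:
        return None  # the source term has no hub equivalent
    for candidate, code in to_hub.get(target, {}).items():
        if code == hub_code:
            return candidate
    return None  # the target vocabulary has no term for this hub code


if __name__ == "__main__":
    print(switch("Musique", "vocab_fr", "vocab_de"))
```

The design trade-off is that each vocabulary needs only one mapping to the hub, instead of a separate concordance for every pair of vocabularies, at the cost of any meaning lost in passing through the intermediary.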

 

Yet another set of variables involves differences in degree of granularity and in logical structure. In the chapter “Compatibility and Convertibility” (pp. 179-216) of his Vocabulary Control for Information Retrieval, F.W. Lancaster points out several difficulties with which anyone attempting semantic interoperability (or “vocabulary reconciliation,” as he puts it) must contend: how to reconcile vocabularies that have different degrees of specificity, different degrees of pre-coordination, overlap in subject matter, and different arrangements of hierarchy[27]. Vizine-Goetz, et al.[28] paraphrase Lancaster’s observations and add to them the more recently discussed problems of common versus scientific names from Doerr[29], Olson[30] and “differences in meaning resulting from different classifications of terms[31] &[32].” In an automated environment there is also the problem of different methods and standards for encoding and preserving metadata.
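
The problem of differing degrees of specificity can be made concrete with a small Python sketch in which each link records how the target term relates to the source term. The class Mapping, the relation labels ("exact", "broader", "narrower"), the preference order, and the sample terms are assumptions made for illustration; they are not prescribed by Lancaster or by any project described in this report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    relation: str  # how the target relates to the source: "exact", "broader", "narrower"


# Hypothetical links between a more specific and a less specific vocabulary.
MAPPINGS: List[Mapping] = [
    Mapping("Lung cancer", "Neoplasms", "broader"),
    Mapping("Cancer", "Neoplasms", "exact"),
    Mapping("Cancer", "Carcinoma", "narrower"),
]

# Assumed preference: exact matches first, then broader terms (favoring recall),
# then narrower terms.
PREFERENCE = {"exact": 0, "broader": 1, "narrower": 2}


def best_targets(term: str, mappings: List[Mapping] = MAPPINGS) -> List[Mapping]:
    """Return the links for `term` that have the most preferred relation type."""
    candidates = [m for m in mappings if m.source == term]
    if not candidates:
        return []
    best = min(PREFERENCE[m.relation] for m in candidates)
    return [m for m in candidates if PREFERENCE[m.relation] == best]


if __name__ == "__main__":
    for link in best_targets("Lung cancer"):
        print(link.source, "->", link.target, f"({link.relation})")
```

Recording the relation type lets a system tell the user when a search has been widened to a broader term instead of presenting the result as an exact equivalent.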

 

 

 

 

 

 

 

 

 

 

 


B.      Glossary

 

 

Terms

Definitions

classification scheme

The terms classification scheme, taxonomy, and categorization scheme are often used interchangeably. Though there may be subtle differences from example to example, in general these types of KOSs provide ways to separate entities into buckets or relatively broad topic levels. Some examples provide a hierarchical arrangement of numeric or alphabetic notation to represent broad topics. These types of knowledge organization systems may not follow the strict rules for hierarchy required in the ANSI/NISO Thesaurus Standard (Z39.19) (NISO), and often lack the explicit relationships presented in a thesaurus.[33]

concept map

A diagram showing the relationships between concepts. Concepts are connected with labeled arrows, in a downward-branching hierarchical structure. The relationship between concepts is articulated in linking phrases, e.g., "gives rise to", "results in", "is required by," or "contributes to".[34]

concordance table

Also called a correspondence table. Methodologically, a concordance table describes the way in which terms in multiple vocabularies are related.[35]

controlled vocabulary

A subset of a language, consisting of pre-selected words and phrases designated as index terms. In a controlled vocabulary, each subject is represented by one valid term only; and, conversely, each term represents only one subject. References are made from equivalent or synonymous terms not selected as valid index terms. Homographs are disambiguated. In addition, a controlled vocabulary contains links among hierarchically or otherwise related terms. Examples of controlled vocabularies include Library of Congress Subject Headings, Thesaurus of ERIC Descriptors, and Medical Subject Headings. The term "controlled vocabulary" is often used in a broad sense to include scheme-based classification data, which also manifest rigorous structures and embody relationships among concepts.[36]

cross-domain search

A search of multiple resources from different domains through a single interface, using a single query.

crosswalk

A program or algorithm to map elements in different metadata schemes. An example is the Dublin Core/MARC/GILS Crosswalk designed by the Library of Congress.[37]

descriptors

Terms used in indexes, abstracts, or other databases/periodical indexes to describe the subjects of an article.

dictionary

Alphabetical lists of terms and their definitions that provide variant senses for each term, where applicable. They are more general in scope than a glossary. While a dictionary may also provide synonyms and through the definitions, related terms, there is no explicit hierarchical structure or attempt to group terms by concept.[38]

gazetteer

A dictionary of place names. Traditional gazetteers have been published as books or they appear as indexes to atlases.[39]

glossary

A list of terms, usually with definitions. The terms may be from a specific subject field or those used in a particular work. The terms are defined within that specific environment and rarely have variant meanings provided. Examples include the EPA Terms of the Environment.[40]

harmonization

The process of making disparate entities or systems work together. Its purpose is to resolve conflicts and to remove obstacles by overcoming idiosyncrasies of individual systems. Within the context of subject access, harmonization implies efforts to make terms from different controlled vocabularies work together for the benefit of improving retrieval results. Differences may occur in semantics and/or syntax, and among multiple languages. Harmonization provides the ability to accommodate two or more different systems, schemes, or standards to facilitate searching across databases. Methods of harmonization include linking and mapping.[41]

interoperability

The ability of two or more systems or components to exchange information and use the exchanged information without special effort on the part of either system.[42]

knowledge organization system

A general term referring to the tools that present the organized interpretation of knowledge structures; includes authority files, classification systems, concept spaces, dictionaries, gazetteers, glossaries, ontologies, subject heading sets, thesauri; often called KOS, sometimes, knowledge organization scheme.[43]

KOS

See knowledge organization system.

link

A mechanism for associating equivalent or associated terms.

mapping

A special form of linking, with efforts to identify equivalence or establish one-to-one and, in some instances, one-to-many relationships. Mapping facilitates automatic switching between systems or languages. Recent developments include efforts to match elements in the MARC record with those in other metadata records and efforts to identify equivalent terms among different controlled vocabularies or different languages. Examples of mapping of subject entries include the Omni File (based on the indexes to individual WILSONLINE databases) and MACS (Multilingual Access to Subjects), a European project on multilingual access to subject authority files and data to develop a prototype for the mapping of subject entries based on three controlled vocabularies: Library of Congress Subject Headings (LCSH), RAMEAU, and Schlagwortnormdatei (SWD).[44]

metathesaurus

A "thesaurus of thesauri," serving as a framework within which diverse controlled vocabularies are harmonized for the purpose of facilitating cross-file searching. An example is the UMLS (Unified Medical Language System) Metathesaurus developed and maintained by the National Library of Medicine, in which "alternate names [from different source vocabularies] for the same concept (synonyms, lexical variants, and translations) are linked together. Each Metathesaurus concept has attributes that help to define its meaning, e.g., the semantic type(s) or categories to which it belongs, its position in the hierarchical contexts from various source vocabularies, and, for many concepts, a definition." (National Library of Medicine 1999).[45]

networked knowledge organization system

An interactive information device aimed at supporting the description and retrieval of heterogeneous information resources on the internet; sometimes NKOS.[46]

NKOS

See networked knowledge organization system.

ontology

A knowledge representation format. That is, an ontology is a shared understanding of the structure of a domain of interest. Ontologies make it easy both for humans to compile and maintain a body of knowledge, and for computer programs to use this knowledge to intelligently manipulate data. An ontology organizes all data using the concepts of class, object, and relationship. Classes are organized into a hierarchy, ordered by subclass, called a taxonomy. A well-known taxonomy is the biological taxonomy of all living things, in which living things are sub-classed into their kingdom: plant or animal. Plants and animals are further classified into phylum, etc. An ontology extends a taxonomy by including relationships among objects and classes, which can represent properties and values. To continue the biological example, there is a relationship "number of limbs" between certain classes of animals and integers. Many taxonomies have been developed to organize knowledge in particular areas.[47]

ontology mapping

The process of ontology mapping concerns how classes from one ontology can be mapped to classes of another taxonomy in an automated way.[48]

query term

The word or term with which a user begins a search.

semantic interoperability

The ability of two or more systems or components to exchange or harmonize cognate subject vocabularies and/or knowledge organization schemes to be used for the purpose of effective and efficient resource discovery without significant loss of lexical or connotative meaning and without special effort by the user.

 

semantic network

A type of KOS that structures concepts and terms not as hierarchies but as a network or a web; concepts are thought of as nodes with various relationships branching out from them; the relationships generally go beyond the standard BT, NT and RT and may include specific whole-part relationships, cause-effect, parent-child, etc. Examples of semantic networks include Princeton’s WordNet, which is now used in a variety of search engines, and the Unified Medical Language System (UMLS) Semantic Network.[49]

subject authority file

An internal tool for catalog or database management. It contains authority records and provides documentation of a body or list of authorized and authoritative indexing terms in the context and framework of its vocabulary.[50]

subject authority record

A record of a subject heading that shows its established form, cites the authorities consulted in determining the choice and form of the heading, and indicates the cross-references made to and from the heading.[51]

subject headings

A set of controlled terms to represent the subjects of items in a collection. Subject heading lists can be extensive, covering a broad range of subjects. In use, subject headings tend to be pre-coordinated, with rules for how subject headings can be joined to provide more specific concepts. Examples include the Medical Subject Headings (MeSH) and the Library of Congress Subject Headings (LCSH).[52]

switching language

Intermediary terms that serve as a mechanism for moving between vocabularies; unlike links, which are internal, a switching language is external to the records for the terms being associated.

taxonomy

A hierarchical data structure or a type of classification schema made up of classes, where a child of a taxonomy node represents a more restricted, smaller, subclass than its parent.[53]

term list

A list of words or phrases, often with definitions; examples include authority files, glossaries, gazetteers, and dictionaries.[54]

thesaurus

A type of KOS, which is based on concepts that show relationships between terms. Relationships commonly expressed in a thesaurus include hierarchy, equivalence, and associative (or related). These relationships are generally represented by the notation BT (broader term), NT (narrower term), SY (synonym), and RT (associative or related).[55]

 

 

 

 

 

 

 


 

C.                Project Inventory

 

Using the definition of semantic interoperability developed by the Subcommittee, 37 projects were identified. The projects, along with information about them, are listed below alphabetically by name. As can be seen from the list, the amount of information that the Subcommittee was able to find varied from extensive for some projects to very little for others. Minimally, for each project the Subcommittee attempted to provide contact information, a URL, and/or a citation, so that a reader of this report could be directed to additional sources of information about a particular project. The Subcommittee's term ended at the 2005 ALA Annual Conference, so this list has not been updated since June 2005. Since then, information about some of these projects may have changed, and some new projects may have begun. The Subcommittee attempted to be as comprehensive as possible and include all known major SI projects in the List, but of course some projects may have been overlooked. The Subcommittee would especially like to acknowledge the work of Marcia Lei Zeng and Lois Mai Chan, whose list of 18 SI projects[56] (with descriptions) was the starting point for the Subcommittee's list.

 

 

 

Name

ADL Thesaurus Protocol

Institution or agency

University of California, Santa Barbara

URL

project site at http://alexandria.sdc.ucsb.edu/~gjanee/thesaurus/

demonstrator page at http://www.comp.glam.ac.uk/%7Efacet/formats/skos/skos_search.htm

Contact information

Linda Hill, Ph.D.

Alexandria Digital Library Project

UC Santa Barbara

Santa Barbara, California 93106

lhill@alexandria.ucsb.edu

Project type

Production

Project dates

 

Status of project

Current with demonstrator project available for public viewing

Languages

 

Knowledge organization systems (KOS)

Thesauri

Subject Coverage

General

Description

Protocol for exchange of thesaurus information. Thesaurus data exchange tool.

The Thesaurus Protocol is based on the ANSI/NISO (1993) Z39.19 thesaurus model and supports downloading, querying, and navigating thesauri.

Methodology

In 2001-2002, the ADL Implementation team developed a Thesaurus Service Protocol.  It is a lightweight, stateless, XML- and HTTP-based protocol designed to support searching and retrieval of thesaurus data. All that is required for its use is the development of a thesaurus server that can accept the specified XML-encoded queries and return the specified standard reports.  The demonstrator system loads a thesaurus of choice (from a proffered list). The thesaurus can then be searched by keyword. Displays of results take several formats--alphabetical list of retrieved terms with USE references, hierarchical display, scope notes.

User interface

The Thesaurus Protocol is based on the ANSI/NISO (1993) Z39.19 thesaurus model and supports downloading, querying, and navigating thesauri.

Relevant standards

XML, XML Schemas,  HTTP, ANSI/NISO Z39.19-1993 (thesaurus structure), XPATH, SKOS

Notes

 

Citation

ADL Thesaurus Protocol cited in recent articles in Cataloging and Classification Quarterly (vol. 37 no 3-4 2004)

Janée, G, Ikeda, S. & Hill, L.L. (2002). The ADL Thesaurus Protocol. Alexandria Digital Library Project. Available: http://www.alexandria.ucsb.edu/thesaurus/protocol/specification.html

Binding, Ceri and Douglas Tudhope. KOS at your service: programmatic access to knowledge organization systems. Journal of digital information: vol. 4, issue 4, art. 265 (Feb. 5, 2004)

Zeng, M. & Chan, L.M. (2004). Trends and issues in establishing interoperability among knowledge organization systems. Journal of the American Society for Information Science and Technology, 55(5), 377-395.

 

 

 

Project Name

AGROVOC

Institution or Agency

Food and Agriculture Organization of the United Nations

URL

http://www.fao.org/agrovoc/

Contact Information

 

Project Type

Production

Project Dates

 

Project Status

Operational

Languages

Multilingual: Arabic, Chinese, Czech, English, French, Portuguese, Spanish

Knowledge Organization Systems (KOS)

Thesaurus

Subject Coverage

Agriculture

Description

Multilingual agricultural thesaurus.

Methodology

 

User Interface

A user selects one of the languages and submits a string in that language to the AGROVOC database. The result is a list of terms and phrases that begin with the string. On the same page is a thesaural display of the first term in the list, and a list of equivalent terms in the other languages with links to thesaural displays of the term in these languages. A user can select other terms from the list.

Relevant Standards

 

Notes

 

Citation

 

 

 

 

Project Name

Art & Architecture Thesaurus (AAT)

Institution or Agency

Getty Research Institute

URL

http://www.getty.edu/research/conducting_research/vocabularies/aat/

Contact Information

Getty Research Institute

1200 Getty Center Drive, Suite 1100

Los Angeles, CA 90049-1688

(310) 440-7335

griweb@getty.edu

Project Type

Production

Project Dates

 

Project Status

Operational

Languages

Multilingual

Knowledge Organization Systems (KOS)

Thesaurus

Subject Coverage

Art, Architecture and Material Culture

Description

The AAT is one of three Getty vocabularies which provide terminology and other information about the objects, artists, concepts, and places important to various disciplines that specialize in art, architecture and material culture.

Methodology

The AAT is a structured vocabulary containing terms and other information about concepts. Terms for any concept may include the plural form of the term, singular form, natural order, inverted order, spelling variants, various forms of speech, equivalent terms in various languages and synonyms of different etymological roots. Among these terms one is flagged as the preferred term or descriptor for the concept.

User Interface

Online public access catalogs and/or the Getty Web Site

Relevant Standards

MARC 21, XML

Notes

The other two Getty vocabularies are: the Thesaurus of Geographic Names (TGN), which contains names and other information about places; and the Union List of Artist Names, which contains names and other information about artists.

Citation

 

 

 

 

Project Name

BUBL

Institution or Agency

Centre for Digital Library Research, University of Strathclyde

URL

http://bubl.ac.uk/

Contact Information

BUBL Information Service

Centre for Digital Library Research

Department of Computer and Information Sciences

University of Strathclyde

Livingstone Tower

26 Richmond Street

Glasgow G1 1XH

U.K.

0141 548 4752

bubl@bubl.ac.uk

Project Type

Production

Project Dates

1990-

Project Status

Operational

Languages

English

Knowledge Organization Systems (KOS)

Subject heading list and classification system

    BUBL subject tree

    Dewey Decimal Classification (DDC)

Subject Coverage

General

Description

BUBL is an Internet-based information service for the UK higher education community. BUBL LINK is a catalogue of selected Internet resources covering all academic subject areas.

Methodology

 

User Interface

A user can browse for subjects through the BUBL subject tree; browse through the DDC hierarchy; or search by author, title, subject, DDC, or resource type.

Relevant Standards

 

Notes

 

Citation

 

 

 

 

Project Name

CAMed

Institution or Agency

Columbia University and Kent State University

URL

http://circe.slis.kent.edu/mzeng/tmshome.html

Contact Information

Marcia Lei Zeng

School of Library and Information Science

Kent State University

Kent, OH 44242-0001

mzeng@kent.edu

Project Type

Research/prototype

Project Dates

 

Project Status

Current?

Languages

Multilingual: English, French

Knowledge Organization Systems (KOS)

Thesauri

    AcuBase Thesaurus

    AMED Thesaurus

    JICST

    MiliMedicalThesaurus

Subject Coverage

Complementary and Alternative Medicine

Description

An integrated thesaurus management and cross-thesaurus search system for complementary and alternative medicine (CAM).

Methodology

Four thesauri in the areas of CAM were normalized and stored in a thesaurus repository. This system allows a database manager to manage and edit his thesaurus in his local office through a Web interface, while the thesauri are deposited and hosted on a server at Kent State University.

User Interface

The cross-thesaurus search function allows a user to enter a term and search all or any of the thesauri in this repository. Software matches the query against the thesauri and gives back all fully- or partially-matched thesaurus entries. When a term is selected from the search results, a user can see the details of a thesaurus term entry (including the broader, narrower, and related terms, as well as non-preferred terms) and continue selecting among the displayed terms. The term search ultimately enables a direct search in four sample bibliographic databases that have been integrated into the prototype. The term search function also extends to full-text searching of all resources on the CAMed website.

Relevant Standards

 

Notes

 

Citations

Zeng, M. & Chen, Y. (2003). Features of an integrated thesaurus management and search system for the networked environment. In I.C. McIlwaine (Ed.), Subject retrieval in a networked environment. Proceedings of an IFLA satellite meeting held in Dublin, Ohio, 14-16 August 2001 (pp. 122-128). Munchen: K.G. Saur.

Zeng & Chan (2004).

 

 

 

Project Name

CARMEN (Content Analysis, Retrieval and Metadata: Effective Networking)

Institution or Agency

 

URL

http://www.bibliothek.uni-regensburg.de/projects/carmen12/index.html.en

Contact Information

Dr. Friedrich Geisselmann

Universitätsbibliothek Regensburg

93042 Regensburg Germany

friedrich.geisselmann@bibliothek.uni-regensburg.de

Project Type

Research/prototype

Project Dates

 

Project Status

Current?

Languages

Multilingual: English, German

Knowledge Organization Systems (KOS)

Thesauri, classification systems and subject headings lists

    Informationszentrum Sozialwissenschaften (IZT) {Thesaurus}

    German Institute for Educational Research Thesaurus

    Schlagwortnormdatei (SWD) {Subject heading list}

    Dewey Decimal Classification (DDC)

    Regensburger Verbund Klassifikation (RVK)

    Mathematics Subject Classification (MSC)

    Physics and Astronomy Classification Scheme (PACS)

Subject Coverage

Social Sciences, Mathematics, Physics, Astronomy

Description

The goal is to provide an integrated subject search in distributed databases representing different disciplines, taking into account the conceptual differences of the applied thesauri and classifications by cross concordances.

Methodology

Starting from alphabetical lists which contain descriptors from a specific subject area, the relationships between IZT, the German Institute for Educational Research Thesaurus and SWD are determined intellectually. After the relationships have been established, they are recorded in a link management system.
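The idea of recording intellectually determined relationships in a link management system can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the row layout, the relation labels ("exact", "broader"), the function names, and the sample German terms are hypothetical and are not taken from the CARMEN concordances themselves.

```python
from collections import defaultdict

# Hypothetical cross-concordance rows:
# (source vocabulary, source term, relation, target vocabulary, target term).
# The relation says how the TARGET term relates to the source term.
CONCORDANCE = [
    ("SWD", "Jugendarbeitslosigkeit", "exact", "IZT", "Jugendarbeitslosigkeit"),
    ("SWD", "Berufsausbildung", "broader", "IZT", "Bildung"),
    ("SWD", "Berufsausbildung", "exact", "IZT", "Berufsbildung"),
]


def build_links(rows):
    """Index the concordance rows by (source vocabulary, term, target vocabulary)."""
    links = defaultdict(list)
    for src_vocab, src_term, relation, tgt_vocab, tgt_term in rows:
        links[(src_vocab, src_term, tgt_vocab)].append((relation, tgt_term))
    return links


def translate(links, src_vocab, term, tgt_vocab, allowed=("exact",)):
    """Return target-vocabulary terms linked to `term` by an allowed relation."""
    candidates = links.get((src_vocab, term, tgt_vocab), [])
    return [target for relation, target in candidates if relation in allowed]


links = build_links(CONCORDANCE)
print(translate(links, "SWD", "Berufsausbildung", "IZT"))
print(translate(links, "SWD", "Berufsausbildung", "IZT", allowed=("exact", "broader")))
```

Keeping the relation type on each link lets a search system decide, per query, whether to accept only exact equivalents or to widen the search to broader terms.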

User Interface

 

Relevant Standards

 

Notes

 

Citations

Kunz, M. (2002). Sachliche Suche in verteilten Ressourcen: Ein kurzer Überblick über neuere Entwicklungen [Subject retrieval in distributed resources: a short review of recent developments]. Paper presented at the 68th IFLA Council and General Conference, Aug. 18-24, 2002, Glasgow, UK. Available: http://www.ifla.org/IV/ifla68/papers/007-122g.pdf  English translation available: http://www.ifla.org/IV/ifla68/papers/007-122g.pdf

Zeng & Chan (2004).

 

 

 

Name

Classification Web

Institution or agency

Library of Congress

URL

http://classweb.loc.gov/

Contact information

Cheryl C. Cook

Product Coordinator

Library of Congress

Cataloging Distribution Service

Washington, DC 20541-4912

ccoo@loc.gov

Project type

Production

Project dates

 

Status of project

Current, in production

Language

English

Knowledge organization systems (KOS)

Classification system, Subject heading list

    Library of Congress Classification (LCC)

    Library of Congress Subject Headings (LCSH) {Subject heading list}

Subject Coverage

General

Description

This project links LCC numbers to LCSH headings and vice versa.

Methodology

LCC numbers are added to LCSH authority records; and LCSH headings are added to LCC authority records.

User interface

In Classification Web, users can move across the KOS through the links that have been established.

Relevant Standards

MARC 21

Notes

 

Citation

Zeng and Chan (2004).

 

 

 

Name

Czech National Subject Gateway Project and Uniform Information Gateway

Institution or agency

National Library of the Czech Republic

URL

Uniform Information Gateway: http://www.jib.cz ; User interface:

 

Contact information

 

Project type

Production

Project dates

2nd version released March 2003

Status of project

Current

Language

Czech

Knowledge organization systems (KOS)

 

Subject Coverage

General

Description

The Czechs explored building subject portals for online resources and existing bibliographic records. They surveyed the field of national bibliographical agencies to see what sources they use for subject terminology. Like other similar projects, this is an attempt to achieve interoperability through control of descriptive cataloging.

Methodology

Mapping was done intellectually at the level of main classes and principal subdivisions; to reach the highest possible accuracy in the mapping process, it was necessary to use common auxiliary subdivisions. The system contains four authority files: geographic, chronological, genre/form, and topical. Heterogeneous information resources are categorized by subject using the Conspectus method. The scheme consists of a mapping of DDC and UDC. Topical authority terms include English equivalents.

User interface

The Aleph interface allows users to search subject authority records or Conspectus records in a number of languages.

Relevant standards

 

Notes

 

Citations

Stoklasova, Bohdana, Marie Balikova and Ludmila Celbova. The relationship between subject gateways and national bibliographies in international context. Paper presented at 69th IFLA General Conference and Council, 1-9 August 2003, Berlin.

http://www.ifla.org/IV/ifla69/papers/054e-Stoklasova_Balikova_Celbova.pdf

 

 

 

Project Name

DARPA Unfamiliar Metadata Project

Institution or Agency

University of California Berkeley

URL

http://metadata.sims.berkeley.edu/GrantSupported/unfamiliar.html

Contact Information

Michael Buckland

Professor Emeritus

School of Information Management and Systems

University of California, Berkeley

South Hall 203A

Berkeley, CA 94720-4600

(510) 642-3159

buckland@sims.berkeley.edu

Project Type

Research/prototype

Project Dates

 

Project Status

Complete?

Languages

Multilingual: English, French, German, Russian, Spanish

Knowledge Organization Systems (KOS)

Thesauri and classification systems

    INSPEC Thesaurus

    Medical Subject Headings (MeSH)?

    U.S. Patent and Trademark Office Patent Classification

    World Intellectual Property Organization International Patent Classification

    Library of Congress Classification in the Physical Sciences

    Standard Industrial Classification

Subject Coverage

Biotechnology, Physical Sciences, Technology

Description

"The objective of this project is to link ordinary language queries to unfamiliar indexes and classifications."

Methodology

 

User Interface

Entry Vocabulary Modules are built to respond adaptively to a searcher's query posed in ordinary language. A searcher can enter an ordinary language query to a particular database, and the searcher will be presented with a ranked list of terms from the database's vocabulary. The searcher can then use these terms to perform a search of the database.
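The behaviour described above, in which ordinary-language words lead to a ranked list of controlled terms, can be sketched as follows. This is a deliberately simplified stand-in that ranks terms by raw co-occurrence counts; it is not the project's own statistical method, and the training records, function names, and index terms are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (ordinary-language text, index terms assigned to it).
TRAINING = [
    ("laser cooling of atoms", ["Laser cooling", "Atomic physics"]),
    ("laser welding of steel", ["Laser welding"]),
    ("cooling systems for lasers", ["Laser cooling"]),
]


def build_association(training_records):
    """Count how often each word co-occurs with each assigned index term."""
    assoc = defaultdict(Counter)
    for text, index_terms in training_records:
        for word in set(text.lower().split()):
            for term in index_terms:
                assoc[word][term] += 1
    return assoc


def suggest_terms(assoc, query, limit=5):
    """Rank index terms by summed co-occurrence with the query words."""
    scores = Counter()
    for word in set(query.lower().split()):
        scores.update(assoc.get(word, {}))
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]


assoc = build_association(TRAINING)
print(suggest_terms(assoc, "laser cooling"))
# ['Laser cooling', 'Atomic physics', 'Laser welding']
```

A production module would presumably use a weighting that discounts very common words; raw counts are used here only to keep the sketch short.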

Relevant Standards

 

Notes

This project was carried out under the auspices of the Metadata Research Program of the School of Information Management & Systems, University of California, Berkeley (http://metadata.sims.berkeley.edu). Two later projects build on the work of the Unfamiliar Metadata Project: the DARPA TIDES Project, Translingual Information Management Using Domain Ontologies; and the Seamless Searching of Numeric and Textual Resources, funded by the Institute of Museum and Library Services.

Citation

Buckland, M., Chen, A., Chen, H., Kim, Y., Lam, B., Larson, R., Norgard, B., & Purat, J. (1999). Mapping entry vocabulary to unfamiliar metadata vocabularies, D-Lib Magazine [Online], 5(1). Available: http://www.dlib.org/dlib/january99/buckland/01buckland.html

Zeng & Chan (2004).

 

 

 

Project Name

DESIRE

Institution or Agency

DESIRE Consortium

URL

http://www.desire.org/

Contact Information

Tracy Hooper

DESIRE Project Manager

Institute for Learning and Research Technology

University of Bristol

8-10 Berkeley Square

Bristol BS8 1HH UK

44 117 928 7197

t.a.hooper@bristol.ac.uk

Project Type

Research

Project Dates

1998-2000

Project Status

Complete?

Languages

 

Knowledge Organization Systems (KOS)

Subject gateways

Subject Coverage

 

Description

The Project's focus was on enhancing existing European information networks for research users across Europe through research and development in three main areas: caching, resource discovery and directory services. The Project proposed development and support of subject gateway services, facilitating access to high-quality internet resources and development of services that would allow cross-browsing and cross-searching across gateways.

Methodology

The Project participants proposed a representation of the conceptual relationships typical of controlled vocabularies using the Resource Description Framework (RDF). It was hoped that such an approach would enable the use of generic RDF tools as a basis for mapping between subject vocabularies. The Project report included a proposal for a RDF/XML thesaurus schema that attempted to demonstrate how the RDF data model could represent a web of inter-related concepts and terms from more than one thesaurus.

 

Registries were developed for metadata application profiles (http://desire.ukoln.ac.uk/registry/ra.php3); and

metadata terminology (http://desire.ukoln.ac.uk/registry/element.php3)

User Interface

 

Relevant Standards

 

Notes

During the second phase of the Project (DESIRE II) some background work was conducted on subject vocabularies in order to support the development of interoperable subject gateways, especially with regard to multilinguality and the mapping of different vocabularies.

Citation

 

 

 

 

Name

The FACET Project

Institution or agency

Hypermedia Research Unit

School of Computing

University of Glamorgan

Pontypridd CF37 1DL

Wales, UK

URL

http://www.comp.glam.ac.uk/~FACET/default.asp

Contact information

Douglas Tudhope (dstudhope@glam.ac.uk)

Daniel Cunliffe (djcunlif@glam.ac.uk)

Project type

Demonstration

Project dates

Initial funding covered three year period, 2001-2003

Status of project

Current with demonstrators available for public viewing

Languages

English

Knowledge organization systems (KOS)

Thesauri; faceted thesauri

Subject Coverage

not subject specific; uses thesaurus terms and data from AAT as demonstration

Description

The objective of the FACET Project research has been to:  “Develop and evaluate retrieval tools based on a matching function incorporating thesaurus semantic closeness measures.” The FACET Project attempts to find a way to present thesaurus data to a searcher, to allow the user to search for appropriate resources from displayed thesaurus terms and to provide the searcher with behind the scenes expansion of a search based on concepts of the semantic relationships among thesaurus terms.  One premise of the project is the value of the facet analysis model of thesaurus building. Demonstrators for the FACET Project make use of the Art and Architecture Thesaurus, as an example of a faceted thesaurus.
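One common way to realise "behind the scenes" expansion by semantic closeness is to treat the thesaurus as a graph and collect every term reachable within a cost budget. The sketch below assumes that approach; the per-relationship costs, the function name, and the thesaurus fragment are hypothetical illustrations and do not reproduce FACET's actual matching function or real AAT data.

```python
import heapq

# Hypothetical traversal costs per relationship type (lower means closer).
COST = {"BT": 1.0, "NT": 1.0, "RT": 2.0}

# Illustrative thesaurus fragment: term -> list of (relationship, related term).
THESAURUS = {
    "chairs": [("BT", "seating furniture"), ("NT", "armchairs"), ("RT", "stools")],
    "armchairs": [("BT", "chairs")],
    "seating furniture": [("NT", "chairs"), ("NT", "stools"), ("BT", "furniture")],
    "stools": [("BT", "seating furniture"), ("RT", "chairs")],
    "furniture": [("NT", "seating furniture")],
}


def expand(thesaurus, start, max_cost):
    """Return {term: cheapest traversal cost} for terms within max_cost of start."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, term = heapq.heappop(queue)
        if cost > best.get(term, float("inf")):
            continue  # a cheaper route to this term was already processed
        for relation, neighbour in thesaurus.get(term, []):
            new_cost = cost + COST[relation]
            if new_cost <= max_cost and new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


print(expand(THESAURUS, "chairs", max_cost=2.0))
```

The returned costs can double as ranking weights, so that resources indexed with nearer terms are listed ahead of those indexed with more distant ones.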

Methodology

The FACET system architecture comprises client and web browser interfaces, utilities that interact with data objects, and an SQL server database that serves the thesaurus information. In a recent (2004) publication, the developers of FACET state that their intention is to “move toward an open (Web service) platform …  and build on a general programmatic KOS interface …  rather than the custom API employed in the Web demonstrator.”

User interface

Several web-based search and display interfaces are proposed in the demonstrators.

Relevant standards

XML; the developers have recently acknowledged the need for a standardized protocol for the representation of thesaurus data; they mention the ADL protocol as a step in the right direction.

Notes

In short, the project attempts to present thesaurus data in a meaningful way to searchers, to propose expanded searching options by suggesting terms in context, and to allow searchers to use the discovered terms in a query of resources.

Initial funding from Engineering and Physical Sciences Research Council (EPSRC), a UK government funding agency for research and training in engineering and the physical sciences (http://www.epsrc.ac.uk/default.htm)

Citation

Tudhope, Douglas, et al. Compound descriptors in context: a matching function for classifications and thesauri.  http://www.glam.ac.uk/soc/research/hypermedia/publications/jcdl02.pdf

Binding, Ceri and Douglas Tudhope. KOS at your service: programmatic access to knowledge organization systems. Journal of digital information, vol. 4, issue 4, article no. 265 (2004-02-05) http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Binding/

 

 

 

Project Name

Finnish Project

Institution or Agency

 

URL

 

Contact Information

 

Project Type

Research

Project Dates

 

Project Status

Prototype; research

Languages

Multilingual: English, Finnish

Knowledge Organization Systems (KOS)

Subject heading list and classification system

    General Finnish Subject Headings (GFSH) {Subject heading list}

    Universal Decimal Classification (UDC)

Subject Coverage

General

Description

This project converts assigned class numbers based on the Finnish abridged edition of UDC into GFSH headings.

Methodology

A dictionary was created that maps UDC numbers to GFSH headings. The dictionary was mechanically applied to convert the bibliographic databases.
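A mechanical conversion of this kind amounts to a dictionary lookup applied to every record. The sketch below assumes exact-match lookup and a simple record layout; the dictionary entries, field names, and function name are hypothetical and are not drawn from the Finnish project's data.

```python
# Hypothetical dictionary from UDC class numbers to subject headings.
UDC_TO_GFSH = {
    "51": ["matematiikka"],
    "53": ["fysiikka"],
    "54": ["kemia"],
}


def convert_record(record, mapping):
    """Return a copy of the record with headings derived from its UDC numbers.

    Numbers that have no dictionary entry are reported separately so they
    can be reviewed by hand instead of being dropped silently.
    """
    headings, unmapped = [], []
    for number in record["udc"]:
        if number in mapping:
            for heading in mapping[number]:
                if heading not in headings:  # avoid duplicate headings
                    headings.append(heading)
        else:
            unmapped.append(number)
    return {**record, "gfsh": headings, "unmapped_udc": unmapped}


record = {"id": 1, "udc": ["51", "53", "999"]}
print(convert_record(record, UDC_TO_GFSH))
```

Reporting unmapped numbers is a design choice of this sketch; the source does not say how the project handled class numbers missing from its dictionary.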

User Interface

 

Relevant Standards

 

Notes

 

Citation

Himanka, J. & Kautto, V. (1992). Translation of the Finnish abridged edition of UDC into General Finnish Subject Headings. International Classification, 19, 131-134.

Zeng & Chan (2004).

 

 

 

Project Name

HEREIN (The European Information Network on Cultural Heritage) Thesaurus

Institution or Agency

European Heritage Network, Council of Europe

URL

http://www.european-heritage.net/sdx/herein/

Contact Information

 

Project Type

Production

Project Dates

 

Project Status

In development

Languages

Multilingual: English, French, Spanish

Knowledge Organization Systems (KOS)

Thesaurus

Subject Coverage

Cultural heritage

Description

This multilingual thesaurus is attached to the HEREIN Project. It is intended to offer a terminological standard for national policies dealing with architectural and archaeological heritage.

Methodology

Most of the terms in the thesaurus come from reports on cultural heritage policy in Europe, supplemented with additional terms issued from specialized documentary sources. Teams from Spain, France and the UK created separate lists of terms in their own languages. The three teams then compared their lists so as to obtain a pool of words with linguistic equivalencies in the three languages.

User Interface

Through the Project Web site, a user can either search for a specific term, or browse through the hierarchical classes.

Relevant Standards

 

Notes

 

Citation

Thérond, D. (2000). European-Heritage Net: The European Heritage Network. Cultivate Interactive, [Online] 2 Available: http://www.cultivate-int.org/issue2/herein/

Zeng and Chan (2004).

 

 

 

Name

HILT (High Level Thesaurus Project)

Institution or agency

Funded by JISC (Joint Information Systems Committee)

URL

http://hilt.cdlr.strath.ac.uk

Contact information

Dennis Nicholson

Director of Research

Centre for Digital Library Research

c/o Andersonian Library

University of Strathclyde

101 St. James Road

Glasgow G4 0NS

44 (0) 141 548 2102

d.m.nicholson@strath.ac.uk

Project type

Pilot Project

Project dates

2000-

Status of project

Current

Language

Multilingual

Knowledge organization systems (KOS)

Thesauri, classification systems, subject heading lists

    Art and Architecture Thesaurus (AAT)

    Dewey Decimal Classification (DDC)

    Library of Congress Subject Headings (LCSH)

    UNESCO Thesaurus

    RDN terminologies

    Wordmap taxonomies set

Subject Coverage

General and special

Description

The pilot project (Phase II) will develop an online terminologies route map (or TeRM) that will map subject schemes to user terminologies and to each other.

Methodology

 

User interface

 

Relevant Standards

 

Notes

Phase I investigated the problem of searching and browsing across a number of distributed services using different indexing vocabularies and attempted to derive a set of recommendations to help facilitate cross-searching and browsing by subject between communities, services and initiatives. The results of these investigations led to HILT Phase II, the Pilot Project described above.

Citation

Nicholson, D. & Wake, S. (2003). HILT: Subject retrieval in a distributed environment. In I.C. McIlwaine (Ed.), Subject retrieval in a networked environment. Proceedings of an IFLA satellite meeting held in Dublin, Ohio, 14-16 August 2001 (pp. 61-67). Munchen: K.G. Saur.

Zeng & Chan (2004).

 

 

 

Name

H.W. Wilson Megathesaurus for Omnifile Project

Institution or agency

H.W. Wilson

URL

www.hwwilson.com/Databases/omnifile.cfm

Contact information

 

Project type

Production

Project dates

 

Status of project

Active, in production

Language

English

Knowledge organization systems (KOS)

Thesauri

Subject Coverage

General

Description

Merges KOS of different structural types

H.W. Wilson has developed a “megathesaurus” that gathers the vocabulary for all its indexes for inclusion in its Omnifile product. The Omnifile product now includes six of the 11 Wilson periodical files, plus all of the full text from the remaining five files. Eventually Omnifile will probably include all their files, but this may take some time, since the remaining five are very specialized. Files covering non-periodical material use different indexing vocabularies and do not form part of the Omnifile product.

Methodology

Concepts merge into single terms, while the megathesaurus retains the terminology used in the separate indexes. The individual database products use the same terms as always; in the Omnifile product, the megathesaurus equivalent appears. Wilson has changed the vocabularies for individual products where conflict between indexes used to exist. Homographs (two words that look the same though they are not necessarily pronounced the same) are clarified by means of devices such as qualifiers, and cases where a term was used differently in two indexes (e.g., “writing” as composition versus learning to write) have been resolved. Names used as subject descriptors appear uniformly across all files; only styling rules are applied to author names.
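The two-way relationship described above, in which each index keeps its own term while the merged product shows a single megathesaurus term, can be pictured as a pair of lookup tables. The sketch below is only an illustration: the index names, terms, qualifiers, and function name are hypothetical and are not Wilson's data or software.

```python
from collections import defaultdict

# Hypothetical rows: (index name, term as used in that index, megathesaurus term).
SOURCE_TERMS = [
    ("Index A", "Writing", "Writing (Composition)"),
    ("Index B", "Writing", "Writing (Handwriting)"),
    ("Index A", "Motion pictures", "Motion pictures"),
    ("Index C", "Films", "Motion pictures"),
]


def build_megathesaurus(rows):
    """Build lookups in both directions between index terms and merged terms."""
    to_mega = {}                    # (index, local term) -> megathesaurus term
    from_mega = defaultdict(list)   # megathesaurus term -> [(index, local term)]
    for index, term, mega in rows:
        to_mega[(index, term)] = mega
        from_mega[mega].append((index, term))
    return to_mega, dict(from_mega)


to_mega, from_mega = build_megathesaurus(SOURCE_TERMS)
print(to_mega[("Index C", "Films")])   # term shown in the merged product
print(from_mega["Motion pictures"])    # terminology retained per index
```

Keying the first table on the index as well as the term is what allows a homograph such as the sample "Writing" to resolve to different qualified terms.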

User interface

Web, specifically, "WilsonWeb". Megathesaurus is largely invisible to the user.

Relevant standards

Unknown

Notes

 

Citations

Kuhr, P.S. (2003) Putting the world back together: mapping multiple vocabularies into a single thesaurus. In I.C. McIlwaine (Ed.), Subject retrieval in a networked environment. Proceedings of an IFLA satellite meeting held in Dublin, Ohio, 14-16 August 2001 (pp. 33-42). Munchen: K.G. Saur.

Milstead, Jessica. “Cross file searching: how vendors help--and don’t help--improve compatibility.” Searcher, vol. 7, no. 5 (May 1999)

 

 

 

Project Name

IMesh

Institution or Agency

UKOLN: the UK Office for Library and Information Networking

URL

http://www.imesh.org

Contact Information

UKOLN

c/o The Library

University of Bath

Bath

BA2 7AY

44 1225 38658

imesh-toolkit@imesh.org

Project Type

Production

Project Dates

Sept. 1999 - July 2003

Project Status

 

Languages

 

Knowledge Organization Systems (KOS)

Subject gateways

Subject Coverage

 

Description

The Project will build on existing subject software to develop a configurable, reusable and extensible toolkit for subject gateway providers.

Methodology

Components evolve independently but rely on each other to accomplish larger tasks. To achieve interoperability the goal is for components to be able to call on one another efficiently and conveniently.

User Interface

 

Relevant Standards

RDF, SQL

Notes

NSF/JISC International Libraries Initiative.

Citation

 

 

 

 

Project Name

LCSH/MeSH Mapping Project

Institution or Agency

Northwestern University Libraries

URL

http://www.library.northwestern.edu/public/lcshmesh/

Contact Information

Tony Olson

Catalog Librarian

Galter Health Sciences Library

Northwestern University

303 East Chicago Ave

Chicago, IL 60611

(312) 503-8125

ajolson@northwestern.edu

Project Type

Production

Project Dates

1990-

Project Status

Active, in development

Languages

English

Knowledge Organization Systems (KOS)

Subject heading lists

    Library of Congress Subject Headings (LCSH)

    Medical Subject Headings (MeSH) Thesaurus

Subject coverage

General and medicine

Description

The goal of this project is to integrate LCSH and MeSH in online catalogs.

Methodology

Corresponding established headings in LCSH and MeSH are mapped, and the mapping data is entered into 7XX linking fields of LCSH and MeSH MARC 21 authority records. The data in these fields can be used to generate equivalent term references in an online catalog. The mapping data is continually updated to take into account changes in the two KOS.
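How linking-field data might be turned into equivalent-term references can be sketched with simplified records. The dictionary layout below is an assumption made for readability and is not a real MARC parser; the tags 150 (established topical heading) and 750 (linking entry for a topical term) stand in for the 7XX fields mentioned above, and the sample heading pair is illustrative, not taken from the project's files.

```python
# Simplified, hypothetical authority records. Each has a source scheme,
# an established heading ("150"), and zero or more linking entries ("750").
AUTHORITY_RECORDS = [
    {"scheme": "LCSH", "150": "Tumors",
     "750": [{"scheme": "MeSH", "heading": "Neoplasms"}]},
    {"scheme": "MeSH", "150": "Neoplasms",
     "750": [{"scheme": "LCSH", "heading": "Tumors"}]},
    {"scheme": "LCSH", "150": "Heart", "750": []},
]


def equivalent_term_references(records):
    """Generate one display reference per linking entry found in the records."""
    references = []
    for record in records:
        for link in record.get("750", []):
            references.append(
                f'{record["150"]} ({record["scheme"]}) '
                f'see also {link["heading"]} ({link["scheme"]})'
            )
    return references


for line in equivalent_term_references(AUTHORITY_RECORDS):
    print(line)
```

Because the references are generated from the authority records at display time, a change to the mapping data needs to be made only once, in the linking field.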

User Interface

In online public access catalogs, "see also" references will be provided between equivalent LCSH and MeSH headings.

Relevant Standards

MARC 21

Notes

The project is still in development because most library management systems do not yet index 7XX fields in authority records, and consequently do not supply linking references between equivalent LCSH and MeSH headings.

The mapping data is available for use in other interoperability projects. Files of enhanced LCSH and MeSH authority records with the mapping data can be downloaded from the Northwestern public http site above.

Citation

Olson, T. & Strawn, G.  "Mapping the LCSH and MeSH Systems."  Information Technology and Libraries, 16(1) March 1997: p. 5-19.

Zeng & Chan (2004).

 

 

 

Name

LEAF (Linking and Exploring Authority Files)

Institution or agency

Multiple European institutions; Dept. of Manuscripts, Staatsbibliothek zu Berlin Preussischer Kulturbesitz

URL

http://www.crxnet.com/leaf/

Contact information

Name: WEBER, Jutta (Dr)
Tel: +49-30-2662416
Fax: +49-30-2663007
Email: jutta.weber@sbb.spk-berlin.de

Project type

Research/prototype

Project dates

2001-2004 (Fifth Framework Programme)

Status of project

Completed

Languages

Multilingual

Knowledge Organization Systems (KOS)

Name authority files

Subject Coverage

General

Description

Utility for creating a universal name authority file.

[From the web site of the Fifth Framework Programme] The beneficial potential of authority information is presently only partly utilised by cultural heritage organisations: libraries, archives, museums etc. are independently working with them without jointly exploiting this valuable resource. Public users are not involved in this scenario, and neighbouring work in the commercial sector is not integrated. LEAF proposes a model for harvesting existing authority data and person name/corporate body information in a multilingual environment. Via user queries the LEAF system will automatically and dynamically create a common name authority file with links to organisations that provide information about a person or corporate body and/or items connected to them. The LEAF model will be applicable to all projects and co-operations that are dealing with cultural heritage data in all kinds of institutions by making authority information available to everyone involved. The project results will be implemented by extending an existing, fully functional, international online Search and Retrieval service network of OPACs that provides information about modern manuscripts and letters, the MALVINE project.

Methodology

LEAF develops a model architecture for establishing links between distributed authority records and providing access to them. The system allows uploads of the distributed authorities to the central system and automatically links those authorities concerning the same entity. Information which is retrieved as a result of a query will be stored in a pan-European "Central Name Authority File". This file will grow with each query and at the same time will reflect what data records are relevant to the LEAF users. Libraries and archives wanting to improve authority information will thus be able to prioritise their editing work. Registered users will be able to post annotations to particular data records in the LEAF system, to search for annotations, and to download records in various formats.

The local authority data uploaded to the central LEAF system is originally encoded in different formats. To compare individual records and make them available for further operations, a common exchange format had to be identified into which all records, regardless of their native format, could be converted. LEAF has adapted EAC for this purpose. The conversion module of the central LEAF system consists of data conversion routines for each local data structure; these convert the uploaded or harvested local records into EAC XML and the different character sets into Unicode (UTF-8). The converted data are then further processed in the LEAF system. In addition to the converted form, records are saved in their local formats as provided by the LEAF Data Providers.
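The conversion step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the local record layout, the provider labels, and the XML element and attribute names (authorityRecord, nameEntry, provider) are placeholders and do not reproduce the EAC schema or the LEAF conversion routines. The sketch shows two providers supplying the same name in different character sets, each decoded to Unicode and wrapped in a common XML form that is serialized as UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical local records: raw name bytes plus the character set that
# each (made-up) data provider uses.
LOCAL_RECORDS = [
    {"provider": "A", "charset": "latin-1", "name": "M\u00fcller, Hans".encode("latin-1")},
    {"provider": "B", "charset": "utf-8", "name": "M\u00fcller, Hans".encode("utf-8")},
]


def to_common(record):
    """Decode provider bytes to Unicode and wrap them in a common XML element.

    Element and attribute names are placeholders, not the EAC schema.
    """
    name = record["name"].decode(record["charset"])
    root = ET.Element("authorityRecord", provider=record["provider"])
    ET.SubElement(root, "nameEntry").text = name
    return root


def serialize(element):
    """Serialize an element to UTF-8 encoded XML bytes."""
    return ET.tostring(element, encoding="utf-8")


converted = [to_common(r) for r in LOCAL_RECORDS]
names = [e.findtext("nameEntry") for e in converted]
# After conversion the two records carry the same Unicode name string.
print(names[0] == names[1])
```

Once both records are in the same form and character set, their name strings can be compared directly, which is the precondition for the automatic linking of authorities described in the preceding paragraph.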

User interface

None found (12/31/2004)

Relevant standards

XML, EAC

Notes

Most recent newsletter is dated 11/03.

Link to MALVINE yields a blank page (12/31/2004).

Most documentation scheduled for the last two years, including a final report, has not been delivered online.

Citations

Kaiser, Max; Hans-Jörg Lieder, Kurt Majcen and Heribert Vallant. New ways of sharing and using authority information: the LEAF Project. D-Lib Magazine, vol. 9, no. 11 (Nov. 2003), http://www.dlib.org/dlib/november03/lieder/11lieder.html

 

 

 

Project Name

Library & Archives of Canada Bilingual Cataloguing

Institution or Agency

Library & Archives of Canada

URL

http://www.collectionscanada.ca/csh/s23-120-e.html (link to information about CSH and relation to RVM)

Contact Information

 

Project Type

Production

Project Dates

 

Project Status

Operational

Languages

Multilingual: English, French

Knowledge Organization Systems (KOS)

Subject heading lists

    Canadian Subject Headings (CSH) {Subject heading list}

    Répertoire de vedettes-matières (RVM) {Subject heading list}

    Library of Congress Subject Headings (LCSH) {Subject heading list}

Subject Coverage

General

Description

To support the bilingual cataloging policy of the Library & Archives of Canada (L&AC), all publications cataloged by the L&AC are assigned subject headings in both official languages, English and French. References between equivalent CSH and RVM headings are displayed in the L&AC's online public access catalog, AMICUS.

Methodology

Equivalent RVM and LCSH headings are entered into 7XX fields of CSH MARC 21 authority records. The equivalent term references displayed in the online catalog are generated from these 7XX fields.

User Interface

Online public access catalog

Relevant Standards

MARC 21

Notes

URL for AMICUS: http://www.collectionscanada.ca/amicus/index-e.html

Citation

Armstrong, Pam (2003). "Navigating bilingual subject headings in AMICUS." Presented at the program, Getting the Most Out of Subject References in the Online Catalog: Better Than It Used to Be? American Library Association Annual Conference, June 21, 2003, Toronto, Ontario.

 

 

 

Project Name

LIMBER (Language Independent Metadata Browsing of European Organizations)

Institution or Agency

LIMBER Consortium

URL

http://www.limber.rl.ac.uk/

Contact Information

Michael Wilson

Project Manager

m.d.wilson@rl.ac.uk

Project Type

Production, Development

Project Dates

1999-2001

Project Status

Complete

Languages

Multilingual: English, French, German, Spanish

Knowledge Organization Systems (KOS)

Thesaurus:  ELSST

Subject Coverage

Social Sciences

Description

The goal of the LIMBER Project is to develop tools to support multilingual access to data distributed across the world wide web by using metadata and a multilingual thesaurus of terms in a restricted vocabulary.

Methodology

LIMBER is using W3C's RDF language as the technology to define metadata and the multilingual thesaurus, and FORTH's SIS multilingual thesaurus management system as the base technology for the multilingual thesaurus server. The LIMBER tools will be generic, but they will be demonstrated by enhancing the existing NESSTAR data access system with multilingual capability for the domain of social science. Another project, FASTER, is enhancing the categories of data that NESSTAR can retrieve. LIMBER is using the UK Data Archive's HASSET thesaurus of social science terms as the starting point for a multilingual thesaurus for social science in English, French, Spanish and German. LIMBER is also advancing the DDI metadata format for social science data to support multilingual access, as a demonstration in the social science domain.

User Interface

Web Interface

Relevant Standards

RDF, DDI

Notes

LIMBER is an EU IST programme funded research and development project.

Citation

Miller, Ken and Brian Mathews. Having the right connections: the LIMBER Project. Journal of Digital Information, vol. 1, no. 8 (Feb. 5, 2001), http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Miller/

 

 

 

Project Name

MACS (Multilingual Access to Subjects)

Institution or Agency

Conference of European National Librarians. Project partners are: the Swiss National Library (SNL), Bibliothèque nationale de France (BnF), the British Library (BL), and Die Deutsche Bibliothek (DDB)

URL

https://ilmacs.uvt.nl/pub/

Contact Information

Patrice Landry
MACS Project Leader
Chef du Catalogage matières
Bibliothèque nationale Suisse
Hallwylstrasse 15
3003
Berne
Suisse
Tel.: +41 31 324 06 25
Fax: +41 31 322 84 63
E-mail: patrice.landry@slb.admin.ch

Project Type

Production

Project Dates

 

Project Status

In development

Languages

Multilingual: English, French, German

Knowledge Organization Systems (KOS)

Subject headings lists

    Schlagwortnormdatei (SWD)

    Répertoire d'autorité-matière encyclopédique et alphabétique unifié (RAMEAU)

    Library of Congress Subject Headings (LCSH)

Subject Coverage

General

Description

MACS aims to provide multilingual subject access to library catalogues. MACS enables users to simultaneously search the catalogues of the project's partner libraries in the language of their choice (English, French, German).

Methodology

Equivalence links are created between the three subject headings lists used in the partner libraries' catalogs. The links are stored in the MACS Links Database. There are two interfaces to the Database. (1) The Search Interface allows users to browse headings and retrieve bibliographic records by using the links established between the concepts; it uses the Z39.50 protocol. (2) The Link Management Interface enables the creation and management of links between headings from the subject headings lists.
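A minimal sketch of the link lookup described above might look like the following. It assumes a simplified in-memory stand-in for the Links Database in which each link groups the headings judged equivalent across the three lists; the data structure, the function names, and the sample headings are illustrative assumptions, not the MACS data model, and the Z39.50 search itself is not shown.

```python
# Each link groups headings judged equivalent across the three lists.
# The sample headings are illustrative only.
LINKS = [
    {"LCSH": ["Libraries"], "RAMEAU": ["Bibliothèques"], "SWD": ["Bibliothek"]},
]


def find_link(links, scheme, heading):
    """Return the first link that contains the heading in the given list."""
    for link in links:
        if heading in link.get(scheme, []):
            return link
    return None


def expand_query(links, scheme, heading):
    """Map a heading in one list to the heading(s) to search in each catalogue.

    If no link exists, fall back to the original heading in its own list.
    """
    link = find_link(links, scheme, heading)
    if link is None:
        return {scheme: [heading]}
    return {s: list(h) for s, h in link.items()}


expanded = expand_query(LINKS, "SWD", "Bibliothek")
print(expanded["LCSH"])
```

Storing lists of headings per scheme, rather than a single heading, leaves room for the partial and complex matches mentioned in the Notes below, where one heading may correspond to several headings in another list.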

User Interface

Online Public Access Catalog

Relevant Standards

NISO Z39.50

Notes

The headings from the three lists are analyzed to determine whether they are exact or partial matches, of a simple or complex nature. The end result is neither a translation nor a new thesaurus but a mapping of existing and widely used KOS.

Citation

Freyre, E. & Naudi, M. (2003). MACS: Subject access across languages and networks. In I.C. McIlwaine (Ed.), Subject retrieval in a networked environment. Proceedings of an IFLA satellite meeting held in Dublin, Ohio, 14-16 August 2001 (pp. 3-10). München: K.G. Saur.

Zeng & Chan (2004).

 

 

 

Project Name

Merimee

Institution or Agency

 

URL

 

Contact Information

 

Project Type

 

Project Dates

 

Project Status

Operational?

Languages

Multilingual: English, French

Knowledge Organization Systems (KOS)

Thesauri

    Le thesaurus de l'architecture

    Art and Architecture Thesaurus (AAT)

    English Heritage Thesaurus

Subject Coverage

Cultural heritage, art, architecture

Description

For the purpose of indexing complexes, buildings and built structures, Le thesaurus de l'architecture was created and mapped to AAT and the English Heritage Thesaurus.

Methodology

When mapping from Le thesaurus de l'architecture to the other thesauri, the Boolean operators "AND" and "OR" are used to express compound equivalences, in addition to the two basic equivalence types, exact and partial.
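One way to picture such Boolean mappings is as small expressions over target terms. The sketch below is an assumption-laden illustration, not the project's actual representation: the tuple encoding, the function name, and every term are placeholders, and the exact/partial equivalence type is left out for brevity. It checks whether a record indexed with target-thesaurus terms satisfies the mapping for a given source term.

```python
# A mapping target is either a single term (str) or a tuple of the form
# ("AND" | "OR", operand, operand, ...). All terms below are placeholders.
MAPPINGS = {
    "source term 1": "target term A",
    "source term 2": ("AND", "target term B", "target term C"),
    "source term 3": ("OR", "target term D", "target term E"),
}


def matches(target, index_terms):
    """Return True if a record's index terms satisfy the mapping target."""
    if isinstance(target, str):
        return target in index_terms
    operator, *operands = target
    results = [matches(operand, index_terms) for operand in operands]
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {operator}")


record_terms = {"target term B", "target term C"}
print(matches(MAPPINGS["source term 2"], record_terms))
```

Because operands may themselves be tuples, the same function also handles nested combinations such as an "AND" whose operand is an "OR" expression.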

User Interface

 

Relevant Standards

 

Notes

 

Citation

Doerr, M. (2001). Semantic problems of thesaurus mapping. Journal of Digital Information, [Online], 1 (8). Available: http://jodi.ecs.soton.ac.uk/Articles/v01/io8/Doerr#Nr.52

Zeng and Chan (2004).

 

 

 

Project Name

MSC and Schedule 510 in DDC

Institution or Agency

University at Albany, State University of New York

URL

 

Contact Information

Hemalata Iyer

School of Information Science and Policy

University at Albany, State University of New York

hi651@albany.edu

Project Type

Research/prototype