ALCTS Subject Analysis Committee

Subcommittee on Semantic Interoperability

 

Annotated Bibliography

 

 

Note: Annotation text is often recorded as it appears in the source article.

 

 

The Alexandria Digital Earth Modeling System (ADEPT) : Towards a Distributed Digital Model of the Earth in Support of Learning. <http://www.alexandria.ucsb.edu/adept/proposal.pdf> (2003)

ADEPT is being developed as an integrated learning environment based on Alexandria Digital Library (ADL) geospatial digital library technology. It is currently used to teach physical geography to undergraduate students at the University of California, Santa Barbara.

 

ADEPT will provide gazetteer, thesaurus, and geo-ontology services. The gazetteer will be built from the ADL project gazetteer and will serve as an index supporting transformations between named places and geographic coordinates. Thesauri provide a basis for resolving semantic inconsistencies, for example between alternative names for geographic feature types; the project will build a set of core thesauri covering geographic representations of regions in space and spatial relations among objects. Geo-ontology services address the fact that the vocabularies used to describe geographic features and phenomena vary by discipline. By knowing which ontologies are used in different contexts, and by mapping between them, it is possible to make appropriate semantic correlations between different information sources. The project will build 1) a set of domain-specific ontologies for geospatial information; and 2) a set of domain-independent ontologies supporting system, syntactic, and structural interoperability.

 

American Library Association, Association for Library Collections & Technical Services, Cataloging and Classification Section, Subcommittee on Metadata and Subject Analysis. "Subject Data in the Metadata Record" (1999). <http://www.ala.org/ala/alctscontent/catalogingsection/catcommittees/subjectanalysis/metadataandsubje/subjectdata.htm> (Sept. 19, 2005)

           

Ardö, Anders. "Browsing Engineering Resources on the Web: a General Knowledge Organization Scheme (Dewey) vs. a Special Scheme (EI)." <http://staff.oclc.org/~vizine/traugott/OCLC_NetLab_ISKO6.html> (Aug. 8, 2002); no longer available.

Published

Ardö, Anders, Godby, Jean, Houghton, Andrew, Koch, Traugott, Reighart, Ray, Thompson, Roger and Vizine-Goetz, Diane. "Browsing Engineering Resources on the Web" in Dynamism and Stability in Knowledge Organization: Proceedings of the Sixth ISKO Conference, 10-13 July, 2000: 385-390 (2000)

The goal of the DESIRE II project is to explore automated methods for gathering and organizing Web resources to improve resource discovery on the Internet. Researchers at NetLab and OCLC provided searching and browsing of a test collection of engineering documents on the Web, exploring simple methods of automatic classification to provide subject browsing of a robot-generated engineering index. At NetLab the documents were automatically classified and organized using an engineering-specific scheme, the Engineering Index (Ei) Thesaurus and Classification; at OCLC the Dewey Decimal Classification (DDC), a general knowledge organization scheme, was used. The enhanced DDC database includes several mechanisms for incorporating new terminology, and the Scorpion software was used for automatic assignment of class numbers. WordSmith software was used to create a small set of high-quality topical vocabulary that is suitable for an index or browse display and can supplement the subject indexes provided by the Ei Thesaurus or the DDC.

 

Ardö, Anders, Berggren, Marten, Koch, Traugott, and Kringstad, Reidun. Nordic Interconnected Subject-based Information Gateways (NISBIG): Final Report (2002). <http://www.lub.lu.se/nisbig/slutrapport.html> (Oct. 8, 2002).

The project final report addresses all types of metadata, including subject access, for use in a quality-controlled subject gateway. It discusses problems and limitations and recommends pursuing Renardus, the IMesh Toolkit, etc. Subject gateways were developed to support discovery and retrieval of Internet resources as well as to integrate Internet resources with "traditional" library resources. Apart from gaining experience with content and metadata profiling and classification mapping for cross-browsing, the main technical goal of the project was to explore the applicability of the LDAP-based Isaac Network software, developed by the US Internet Scout Project, to provide cross-searching among the three participating Nordic subject gateways and other gateways joining the Isaac Network.

 

 

Baker, Thomas and Dekkers, Makx. "Identifying Metadata Elements with URIs : the CORES Resolution." D-Lib Magazine, 9, no. 7/8 (July/August 2003). <http://www.dlib.org/dlib/july03/baker/07baker.html> (2003).

At a meeting organized by the CORES Project (Information Society Technologies Programme, European Union), several organizations regarded as maintenance authorities for metadata elements achieved consensus on a resolution to assign Uniform Resource Identifiers (URIs) to metadata elements as a useful first step towards the development of mapping infrastructures and interoperability services.  The maintainers of GILS, ONIX, MARC 21, CERIF, DOI, IEEE/LOM, and Dublin Core reported on their implementations of the resolution and highlighted issues of relevance to establishing good-practice conventions for declaring, identifying, and maintaining metadata elements more generally. In November 2002, they committed to implementing the agreement to define URI assignment mechanisms, assign URIs to elements, and formulate policies for the persistence of those URIs.

 

Baker, Thomas. "What Terms Does Your Metadata Use? Application Profiles as Machine-Understandable Narratives." Journal of Digital Information, 2, no. 2 (Nov. 6, 2001). <http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Baker/>  (Aug. 6, 2002).

Rachel Heery and Manjula Patel have defined application profiles as 'schemas which consist of data elements drawn from one or more namespaces, combined together by implementers, and optimized for a particular application.' By definition, such profiles depend for their elements on namespaces. Namespaces, in this context, are element sets maintained as stable points of reference. They serve to 'identify the management authority for an element, support definition of unique identifiers for elements, [and] uniquely define particular data element sets or vocabularies.' The registry prototyped in the DESIRE Project focused on the disclosure of information about the authoritative use of metadata (element definitions, usage notes, allowed schemes, and mappings to other namespaces) and explored typical user queries. The SCHEMAS registry builds on the DESIRE experience.

 

Bates, Marcia. "After the Dot-Bomb: Getting Web Information Retrieval Right This Time." First Monday, 7, no. 7 (2002). <http://www.firstmonday.org/issues/issue7_7/bates/> (Sept. 28, 2002).

The author proposes using systems already designed for information retrieval, e.g. faceted classification and information resources thesauri, which have internal structure, concept clusters, etc. The long-term solution for indexing the Web is probably a set of overlapping methods of classifying and indexing knowledge. She disapproves of the use of the word "ontology," since it refers to the philosophical issues surrounding the nature of being.

 

Bates, Marcia. Task Force Recommendation 2.3 Research and Design Review: Improving User Access to Library Catalog and Portal Information : Final Report. (2003).

Selections:

It is recommended that with regard to access vocabulary:

• A cluster vocabulary be created, based on the searcher vocabulary developed by Sara Knapp (1993, 2000), if she and her publisher agree.

Bates Task Force 2.3 Review 51

• For the price of a share of the maintenance of the database, libraries and commercial firms may subscribe to the searcher vocabulary database, and install it in their catalogs, portals, and websites.

• With experience, other types of clusters are added--for names, works, geographical locations, etc.

• Access to catalogs and portal information should be available both directly through and around the vocabulary database. In this way, searchers may choose to use the database or not, and, if they do choose it, they do not have to enter and exit a separate database (a violation of the ever-present Principle of Least Effort).

• Institutional users may link the searcher vocabulary with their own controlled vocabulary. As a result, users of these sites may input their search term(s), be shown a cluster of terms, including “legitimate” controlled terms, and use the clusters as a basis for selecting terms for either controlled vocabulary or keyword searching.

• With this vocabulary as a core, one or two lexicographers are hired cooperatively to maintain the searcher vocabulary, adding popular new terms as they come along, and adding terms found by cooperating organizations in “zero hit” searches. As changes are made in the vocabulary, rather than in millions of individual cataloging records, cultural and research changes can be accommodated much more rapidly and cheaply.

• These vocabularies become part of a "Vocabulary Headquarters" (VHQ) website, supported by the library community or organizations therein.

It is recommended that with regard to bibliographic families:

• Preliminary agreement be gained on what shall constitute bibliographic families at the work level, probably based on the work of Tillett, Smiraglia, Hickey, and others. It may be found that work-sets, as described by Hickey et al. should also be considered.

• As these bibliographic families probably follow the Bradford Distribution, there will be some few that are very large, and many that are very small or singletons. As the larger families are much more likely to cause difficulties for searchers, and as they are also often around canonical works that attract a great deal of research and cultural interest, the larger families should be grouped first.

• At first on an experimental basis, individual libraries or other institutions offer each to do the work to collect just one large family (from records already created at the individual level). The results of these experiences are shared at conferences and other meetings.

• Based on these experiences, criteria are finalized for the creation of bibliographic families. Libraries may acquire the cataloging information for the families in a manner similar to the currently existing cooperative cataloging arrangements.

• Further experience will also provide enlightenment regarding just how far down the chain of family size the cooperative effort should go.

• Eventually, with further technological advances, it becomes possible that whenever a searcher happens on a record that is part of a bibliographic family, the searcher may click on a “related records” link and see displayed on the screen the progenitor record plus links to all the different types of bibliographically related records arrayed around the core record.

It is recommended that with regard to staging of access to records:

• Libraries and other information institutions take as an objective the approach of providing staged access to information that drops down into the information in a 1:30 ratio. For example, in a catalog a book has a title of a few words, and an abstract of about 30 times the number of words in the title. With this ratio specifically in mind, the effectiveness of catalogs so designed can be tested.

• Current cooperation with publishers can be extended, including use of book flap and contents information that is already in electronic form for catalog records.

• The online bookstore, amazon.com, contains within it many of the design features that have been recommended by catalog and database user studies over the years. Amazon.com can be seen as a source of ideas and prior testing of design features.

 

 

Becker, Hans J. "Cultural Heritage Projects: Renardus". Paper presented at TEL Milestone Conference, April 29-30, 2002, Frankfurt am Main, Germany. <http://www.europeanlibrary.org/ppt/tel_milconf_presentation_becker.ppt> (Aug. 7, 2002).

Goals: a) to improve access to existing academic subject gateway services in Europe; b) to develop a 'broker' service that will allow integrated searching and browsing of distributed resource collections; c) to develop models for sharing metadata and to reach agreement on technical solutions and other standards. Subject gateways are defined as "quality controlled subject gateways and resource discovery broker systems"; the target audience is predominantly the higher education and academic research communities across Europe. Their characteristics are: a) selection and collection development (human intellectual effort, a defined collection development policy, documented selection criteria); b) collection management (maintaining or improving the quality of the collection, a defined maintenance policy); c) resource description (all selected resources are described according to a fixed and documented metadata set, with metadata structured in well-defined semantic fields to enable structured searching); d) subject classification (all resources are indexed according to a subject classification scheme to enable subject browsing). Various aspects of the project are addressed by different groups in areas called 'work packages'.

 

 

Beghtol, Clare. "The Iter Bibliography: International Standard Subject Access to Medieval and Renaissance Materials (400-1700)." Paper presented at IFLA Satellite Meeting: Subject Retrieval in a Networked World, OCLC, Dublin, Ohio, Aug. 14-16, 2001. <http://www.library.utoronto.ca/iter/ifla.htm> (Oct. 26, 2002). No longer available.

The Iter Bibliography contains unique provisions for subject analysis and access. It uses a combination of multiple LCSH headings and multiple DDC notations for subject specification in order to incorporate the strengths of each system, and it also provides uncontrolled keywords to cater for terms likely to be used by Medieval and Renaissance scholars.

 

Bird, Steven and Simons, Gary. "The OLAC Metadata Set and Controlled Vocabularies."  ArXiv, May 21, 2001. <http://arXiv.org/abs/cs/0105030> (Feb. 16, 2005)

This paper describes a new digital infrastructure for language resource discovery, based on the Open Archives Initiative, and called OLAC - Open Language Archives Community. The OLAC Metadata Set and the associated controlled vocabularies facilitate consistent description and focused searching.

 

Brickley, Dan and Miller, Libby. Imesh Tk: Subject Gateway Review Plan, 2000. <http://www.ilrt.bris.ac.uk/discovery/2000/07/itk-sgr/> (Aug. 7, 2002)

The objective of the Subject Gateway Review is to ensure that the IMesh Tk architectural and technical strategies are well grounded in the documented needs and practical requirements of the Internet cataloging community as they stand now, with a view to the next 2-3 years. The Review will be responsible for producing scope and prioritization guidelines and a literature review. The relationship between XML-based metadata systems, notably RDF, and other traditions such as LDAP and Z39.50 is not yet clear. XML's popularity stems in large part from its cross-domain generality: XML representations of white pages data, bibliographic metadata, structured documents, etc. can (to some extent) exploit common tools and software components. One issue the Subject Gateway Review will need to address is the distinction between data-format-based interfaces and API/protocol interfaces. The latter raises the possibility of tools such as on-the-fly adaptors that translate (say) Z39.50 queries into LDAP queries or vice versa, while the former addresses the need for common data formats and information models for data exchange. An open question: do gateway managers prefer query-time protocol mapping to scenarios in which they 'batch convert' records (given some standard data format, e.g. some flavor of qualified Dublin Core) to make them available in multiple search protocols?

 

Buchel, Olha and Coleman, Anita. "How Can Classificatory Structures be Used to Improve Science Education?" Library Resources & Technical Services, 47, no. 1 (2003): 4-15

The Alexandria Digital Earth Prototype (ADEPT) project provides the test bed for instructional materials and user analyses. ADEPT is supported by the National Science Foundation Digital Libraries Initiative, Phase 2, and is a successor to the Alexandria Digital Library (ADL) project. http://www.alexandria.ucsb.edu

 

 

Buckland, Michael, and others. "Mapping Entry Vocabulary to Unfamiliar Metadata Vocabularies." D-Lib Magazine, 5, no. 1 (January 1999). <http://www.dlib.org/dlib/january99/buckland/01buckland.html> (Feb. 16, 2005)

Proposes an Entry Vocabulary Module to help the user get started. Entry Vocabulary Modules use classification clustering and combine linguistic analysis with statistical methods: they search fragments within the metadata and databases, perform statistical and linguistic analysis, and present the user with a familiar term.

 

There is always one additional vocabulary in play: the user's.

 

The network environment is leading to an increasing number of heterogeneous repositories using diverse metadata vocabularies (categorization codes, classification numbers, index and thesaurus terms). This is creating more and more unfamiliar sets of terms that users must employ to access Internet resources. It has been argued that the most cost-effective single investment for improving the effectiveness of repository searching would be technology that helps the searcher cope with unfamiliar metadata vocabularies.

 

A DDC number is, in effect, a word with a meaning. The Relative Index provides the English-to-DDC-number translation. What is now needed is a natural-language index ("ordinary English") to the Relative Index and/or DDC numbers. The Entry Vocabulary Module helps the searcher be more effective and thereby provides a value-added enhancement.

 

Research has focused on: development of tools to support the creation of Entry Vocabulary Modules; creation of a set of prototype Entry Vocabulary Modules for a challenging range of examples, including subdomains; deployment; use of natural language processing techniques in addition to statistical term co-occurrence; recommendations for the improvement of metadata documentation for numeric databases.

 

Prototype available at: http://www.sims.berkeley.edu/research/metadata/oasis.html
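The statistical side of an Entry Vocabulary Module can be illustrated with a minimal sketch. It assumes a training set of records that pair free-text titles with assigned metadata terms, and it scores each metadata term by a simple conditional co-occurrence ratio; the Berkeley modules may well use a different association statistic, and the function names and toy records here are hypothetical.

```python
from collections import Counter, defaultdict


def build_evm(records):
    """Count how often each title word co-occurs with each assigned metadata term.

    records: iterable of (title_text, [metadata_terms]) pairs (hypothetical format).
    """
    word_term = defaultdict(Counter)  # word -> Counter of metadata terms
    word_count = Counter()            # word -> number of records containing it
    for text, terms in records:
        for word in set(text.lower().split()):
            word_count[word] += 1
            for term in terms:
                word_term[word][term] += 1
    return word_term, word_count


def suggest_terms(query, word_term, word_count, top_n=3):
    """Rank metadata terms for a user's ordinary-language query.

    Each query word adds, for every term, the share of records containing
    that word which were assigned the term (an estimate of P(term | word)).
    """
    scores = Counter()
    for word in set(query.lower().split()):
        if word_count[word] == 0:  # unseen word: no evidence either way
            continue
        for term, n in word_term[word].items():
            scores[term] += n / word_count[word]
    return [term for term, _ in scores.most_common(top_n)]


# Toy training records, invented for illustration.
records = [
    ("wind turbine blade design", ["Wind power"]),
    ("offshore wind farms", ["Wind power"]),
    ("solar cell efficiency", ["Solar energy"]),
    ("turbine noise measurement", ["Wind power", "Acoustics"]),
]
word_term, word_count = build_evm(records)
print(suggest_terms("wind turbine noise", word_term, word_count))
# prints ['Wind power', 'Acoustics']
```

Here "Wind power" outranks "Acoustics" because all three query words co-occur with it in the toy records; a real module would be trained on a large set of records from the target database.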

 

Chan, Lois Mai and Zeng, Marcia Lei. "Ensuring Interoperability among Subject Vocabularies and Knowledge Organization Schemes: a Methodological Analysis." Paper presented at the 68th IFLA Council and General Conference, Glasgow, Scotland, Aug. 18-24, 2002. <http://www.ifla.org/IV/ifla68/prog02.htm> <http://www.ifla.org/IV/ifla68/papers/008-122e.pdf> (2002)

The ideal approach would be to provide "one-stop," seamless searching instead of requiring the user to search individual databases or collections separately. To enable such an approach, it is important to render the different knowledge organization systems, such as controlled vocabularies and classification schemes, interoperable within a single search apparatus. A number of projects are trying to achieve interoperability between and among different subject vocabularies (including both controlled and uncontrolled vocabularies) and knowledge organization systems. They include efforts at establishing interoperability among vocabularies in the same language or in different languages, among different classification schemes, and between controlled vocabularies and classification schemes.

 

Chan, Lois Mai (2000). "Exploiting LCSH, LCC, and DDC to Retrieve Networked Resources: Issues and Challenges." Library of Congress (2002). <http://www.loc.gov/catdir/bibcontrol/chan.html> (Sept. 25, 2005)

Vocabulary control for improved precision and recall and structured organization for efficient shelf location and browsing have contributed to effective subject access to library materials. The question is whether existing tools can continue to function satisfactorily in dealing with web resources. To meet the challenges of web resources, certain operational requirements must be taken into consideration, the most important being the ability to handle a large volume of resources efficiently and interoperability across different information environments and among a variety of retrieval models. Schemes that are scalable in semantics and flexible in syntax, structure, and application are more likely to be capable of meeting the requirements of a diversity of information retrieval environments and the needs of different user communities.

 

 

Chan, Lois Mai, and others. "A Faceted Approach to Subject Data in the Dublin Core Metadata Record." Journal of Internet Cataloging, 4, no. 1 / 2 (2001): 35-47. 

For the Dublin Core metadata record, a new approach to subject vocabulary was investigated. Faceted Application of Subject Terminology (FAST) is based on the existing vocabulary in Library of Congress Subject Headings but uses a simpler syntax. In FAST, non-topical (geographic, chronological, and form) data are separated from topical data and placed in different elements provided in the Dublin Core metadata record.

 

Chan, Lois Mai, Lin, Xia, Zeng, Marcia (1999). "Structural and Multilingual Approaches to Subject Access on the Web."  Paper presented at the 65th IFLA Council and General Conference, Bangkok, Thailand, Aug. 20-28, 1999.  <http://www.ifla.org/IV/ifla65/papers/012-117e.htm> (Aug. 7, 2002)

A report in three parts.

 

Part I. Structural approaches to organizing web resources.

Using hierarchical or classification-based formats to organize web resources should have important advantages, among which are improved subject browsing facilities, potential multilingual access and improved interoperability with other services. In the web environment, subject data are often separate from or reside outside the resources themselves. They can be stored in interfaces that link subject data to the resources but do not otherwise affect them. The advantage of "linking-to" rather than "storing-with" is flexibility. Desirable characteristics: a) intuitive, logical and easy to use … with expressive captions; b) flexible, adjustable, and expandable; c) useful in a wide range of settings; d) relatively easy to maintain and revise.

 

Part II. Knowledge Class.

The purpose of this research project is to create and test a device called "Knowledge Class," designed for customizing knowledge organization and access, to supplement and complement existing devices for Web users. Knowledge Class contains two basic components: a) an organizing framework, and b) an interface for access to and retrieval of web resources. The organizing framework is a classified mini-thesaurus, consisting of a hierarchically structured collection of terms on a specific topic or discipline of interest or concern to an individual user. The user can initiate searches by selecting the display terms or by using pre-stored search strategies, which often contain synonyms, and can also connect to previously discovered sites by clicking on links with pre-stored URLs.

 

Part III. Multilingual approach to subject access.

Multilingual processing has emerged as a key issue in the evolution of search engine technologies. Major search engines have developed new services that function as regional search guides in these areas: a) domain filtering, b) domain direction, c) mirror sites, d) language-specific search, e) multilingual search, f) regional interfaces, g) localized subject directories.

 

The road towards a fully functional cross-lingual subject access is both optimistic and sophisticated. Many other technical issues, as well as social and cultural issues, also need to be addressed. These include character encoding support, user interface linguistic translation, support of culture-specific data formats (date, currency, etc.), user interface graphical modification (color, images), foreign products support (e.g. databases), and operating system compatibility. In summary, there has been an increasing need for effective mechanisms to organize web resources for exploration, discovery, and retrieval.

 

Cherry, Steven M. "Weaving a Web of Ideas." IEEE Spectrum, Sept. 2002.

Software agents (robots) have not been successful in dealing with semantics, that is, with the multiple meanings of words. The Semantic Web idea instead suggests that Web pages should contain their own semantics. Successful search engines have developed sophisticated methods of delivering documents; the Semantic Web aims to get at the information in the documents by using an ontology: a collection of related RDF statements that together specify a variety of relationships among data elements and ways of making logical inferences among them. It addresses syntax, the set of rules or patterns according to which words are combined into sentences; semantics is the meaningfulness of the terms, that is, how the terms relate to real things. Search engines have room for improvement. One method (Palo Alto Research Center) is scatter/gather, which takes a random collection of documents and gathers them into clusters, each denoted by a single topic word. The user then picks several of the clusters, and the software rescatters and reclusters them until the user gets a desirable set. Another method (Autonomy) uses Bayesian networks, a pattern-matching engine that distinguishes different meanings of the same term and so "understands" them as different concepts.
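The scatter/gather interaction can be sketched as follows. This is a simplified stand-in, not the PARC implementation: it clusters by word-overlap (Jaccard) similarity around farthest-first seed documents and names each cluster with a single distinctive word, whereas the PARC system used its own, faster clustering methods. All function names and the sample documents are hypothetical.

```python
from collections import Counter


def words(doc):
    return set(doc.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def scatter(docs, k):
    """Split docs into at most k clusters around farthest-first seed documents."""
    if not docs:
        return []
    sets = [words(d) for d in docs]
    seeds = [0]
    while len(seeds) < min(k, len(docs)):
        # Next seed: the document least similar to every seed chosen so far.
        seeds.append(min(range(len(docs)),
                         key=lambda i: max(jaccard(sets[i], sets[s]) for s in seeds)))
    groups = {s: [] for s in seeds}
    for i, ws in enumerate(sets):
        # Assign each document to its most similar seed.
        best = max(seeds, key=lambda s: jaccard(ws, sets[s]))
        groups[best].append(docs[i])
    return list(groups.values())


def label(cluster, docs):
    """Name a cluster by the word most concentrated in it relative to all docs."""
    total = Counter(w for d in docs for w in words(d))
    inside = Counter(w for d in cluster for w in words(d))
    # Highest in-cluster share first, then highest in-cluster count, then alphabetical.
    return min(inside, key=lambda w: (-inside[w] / total[w], -inside[w], w))


def gather(clusters, chosen, k):
    """Pool the clusters the user selected and scatter them again."""
    pooled = [d for i in chosen for d in clusters[i]]
    return scatter(pooled, k)


docs = ["jaguar car engine", "jaguar car price",
        "jaguar cat habitat", "jaguar cat diet"]
clusters = scatter(docs, 2)
print([label(c, docs) for c in clusters])  # prints ['car', 'cat']
print(gather(clusters, [1], 2))            # the user picks the "cat" cluster
```

The labelling rule favors words concentrated in one cluster, so the word "jaguar", which appears everywhere, is never chosen as a topic word.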

 

Clark, Judith. "Subject Portals." Ariadne, 29 (Oct. 2, 2001).  <http://www.ariadne.ac.uk/issue29/clark/> (Jan. 21, 2003).

The author describes a 3-year project to develop a set of subject portals or hubs, part of the Development Programme of the Distributed National Electronic Resource (DNER), funded by the JISC. The project aims to enhance resource discovery by developing a series of portals focused on the requirements of end-users located in a variety of learning environments within the higher education sector. The first phase of the project (2000-2001) was to build a Z39.50 cross-search prototype at three RDN hubs: SOSIG, EEVL, and BIOME. The second phase adds HUMBUL and PSIgate. Sites are selected on the basis of selection criteria, cataloged following consistent practices, and analyzed by people with expertise in the relevant subject discipline. Links are checked daily in an automated process, and all entries are updated regularly by subject specialists. Entries are classified using an appropriate controlled vocabulary.

 

RDN portals (http://www.rdn.ac.uk/projects/) are primarily concerned with technologies that broker subject-oriented access to resources. Effective cross-searching depends on consistent metadata standards. Z39.50 is the standard that has been adopted for preliminary cross-search functionality. Further functionality is being developed using RSS (Rich Site Summary) and OAI (Open Archives Initiative). Other standards that underpin the portals include, notably, Dublin Core and a variety of subject-specific thesauri such as the CAB Thesaurus and MeSH.

 

Clavel-Merrin, Genevieve. "Multilingual Access to Subjects: the MACS Prototype." Paper presented at TEL Milestone Conference, April 29-30, 2002, Frankfurt am Main, Germany.  <http://www.europeanlibrary.org/doc/tel_milconf_presentation_clavel.doc> (Oct. 8, 2002)

National and other libraries have invested heavily in encyclopedic subject heading languages that offer complementary access to their collections. Creating, managing and maintaining these subject heading languages requires significant resources and generally relies on co-operation, so co-operation is naturally considered as a way to extend access to users from other linguistic areas. Therefore, the CoBRA+ Working Group on Multilingual Subject Access conducted a feasibility study between Autumn 1997 and February 1999 on linking headings between the three Subject Heading Languages (SHLs) used in the Bibliothèque Nationale, Die Deutsche Bibliothek, the Swiss National Library and the British Library. The SHLs used were RAMEAU, SWD/RSWK and LCSH. As a result, the MACS (Multilingual Access to Subjects) project was set up to develop a prototype system testing the recommendations and findings of the feasibility study.
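To make the idea of linked headings concrete, here is a minimal sketch in which each link groups headings judged equivalent across LCSH, RAMEAU and SWD, and a query heading in one language is expanded to its linked headings before each catalog is searched. The data layout, function names, headings and record titles are assumptions for illustration only; they are not drawn from MACS link data, and the prototype's actual design may differ.

```python
# Hypothetical links between equivalent headings in three subject heading
# languages; these rows are illustrative, not taken from the MACS project.
LINKS = [
    {"LCSH": "Cats", "RAMEAU": "Chats", "SWD": "Katze"},
    {"LCSH": "Dogs", "RAMEAU": "Chiens", "SWD": "Hund"},
]


def equivalents(heading, scheme, links=LINKS):
    """Return the headings linked to `heading` in the other schemes."""
    for link in links:
        if link.get(scheme) == heading:
            return {s: h for s, h in link.items() if s != scheme}
    return {}


def multilingual_search(heading, scheme, catalogs, links=LINKS):
    """Search each catalog with the heading its own SHL links to the query.

    catalogs maps a scheme name to a list of (title, [headings]) records.
    """
    wanted = {scheme: heading, **equivalents(heading, scheme, links)}
    hits = []
    for s, records in catalogs.items():
        target = wanted.get(s)
        if target is None:  # no linked heading in this language
            continue
        hits.extend(title for title, heads in records if target in heads)
    return hits


# Invented catalog records, one catalog per subject heading language.
catalogs = {
    "LCSH": [("Cat care basics", ["Cats"]), ("Dog training", ["Dogs"])],
    "RAMEAU": [("Le chat domestique", ["Chats"])],
    "SWD": [("Die Hauskatze", ["Katze"])],
}
print(multilingual_search("Chats", "RAMEAU", catalogs))
```

A heading with no link returns only records from its own catalog, which mirrors the partial coverage that any set of linked headings will have.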

 

Clavel-Merrin, Genevieve. "The Need for Co-operation in Creating and Maintaining Multilingual Subject Authority Files."  Paper presented at the 65th IFLA Council and General Conference, Bangkok, Thailand, Aug. 20-28, 1999. <http://www.ifla.org/IV/ifla65/papers/080-155e.htm> (Aug. 7, 2002).

In 1997, the Conference of European National Librarians (CENL) asked CoBRA+ (Computerized Bibliographic Record Actions) to consider the problem of multilingual subject access to bibliographic databases and to conduct a pilot study in French, German and English. The aim of the study was to establish equivalents between RAMEAU, SWD/RSWK and LCSH: 1) establish a methodology for the selection and linking of headings, 2) link headings and analyze the results in the selected subject areas, 3) examine the practical applications of these linked headings by indexing a test group of titles, 4) compare the indexing of titles in other subject fields. The study confirmed the following: 1) the number of headings and subdivisions which may be combined, and the complexity of the resulting strings, varies from language to language; 2) the number of strings that may be applied to a document also varies according to the different rules applied.

 

CORES - A Forum on Shared Metadata Vocabularies. <http://www.ukoln.ac.uk/metadata/cores/> (Aug. 6, 2002)

The CORES project is funded within the Information Society Technologies (IST) Programme, managed by the Information Society Directorate-General of the European Commission. The central objective of the CORES project is to encourage the sharing of metadata semantics. CORES will address the need to reach consensus on a data model for declaring the semantics of metadata terms in a machine-readable way. Consensus on the ground rules for declaring standard definitions of terms, as well as local usage and adaptations, will enable the diversity of existing standards to "play together" in an integrated, machine-understandable Semantic Web environment. To achieve this level of interoperability, CORES will support applications re-using and adapting terms maintained by key organizations and standardization initiatives.

For more detailed information, see the CORES website: http://www.cores-eu.net/

 

Day, Michael. "Metadata in Support of Subject Gateway Services and Digital Preservation." Draft version of paper presented at Electronic Resources: Definition, Selection and Cataloguing, Rome, Italy, Nov. 2001. <http://www.ukoln.ac.uk/metadata/presentations/rome-2001/paper.html> (Aug. 8, 2002).

This paper provides an introduction to two of the metadata-related projects in which UKOLN has been a partner. It first describes the development of services known as quality controlled subject gateways and looks in more detail at the Resource Discovery Network and the EU Renardus project. It then provides an outline of recent preservation metadata initiatives and describes the way the OAIS model has been used in the Cedars project.

 

DESIRE Information Gateways Handbook (2000).  <http://www.desire.org/handbook> (Aug. 7, 2002).

This is a thorough guide to creating a high-quality portal or gateway on the Internet. Section 2 of the handbook covers important decisions to be made when setting up a new gateway (such as choosing a metadata format, designing a user interface, and writing a selection policy) and also covers issues such as cataloging and resource discovery. Subject gateways should aim to guarantee high-quality resources and facilitate subject-based access to the collection. Information gateways are characterized by their creation of third-party metadata records: individual descriptions of Internet resources, held in a database, with separate fields for different attributes of the resources, such as title, author, URL, etc. The role of cataloging rules or guidelines is to specify how the content of a metadata format is entered, and they will often cover additional features such as classification, subject analysis and authority control. Once a metadata format is selected, a metadata content standard needs to be selected or developed to address dates, language codes, name authority files, and subject information. The use of classification schemes, keywords and thesauri is a central feature of the formal resource descriptions provided by a gateway service. Browsing (through a directory-like structure) is usually based on subject classification schemes or thesauri. Classification schemes differ from other subject indexing systems, such as subject headings and thesauri, by trying to create collections of related resources in a hierarchical structure. Cross-browsing two or more gateways is useful but difficult. Mapping methods can be used, for example in DESIRE II; mapping has also been tested by ROADS. "As with cross-browsing using classification schemes, cross-searching only becomes possible if either of the different catalogs use the same vocabulary or if a mapping has been done between two or more different schemes."
Gateways need to address the language needs of their audiences. Users may want to search a multilingual collection by using queries in one language or to retrieve documents in a number of specific languages, preferably also via an interface in the language of their choice. There are two issues: the storing, processing, and presentation of information in many languages; and multilingual search and retrieval. Each chapter includes a bibliography.

 

Dhamankar, R., Lee, Y., Doan, A., Halevy, A., & Domingos, P. (2004). "iMAP: Discovering complex semantic matches between database schemas." In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, 2004, p. 383-394.

 

Doerr, M. "Semantic problems of thesauri mapping." Journal of Digital Information, vol. 1, no. 8 (Mar. 26, 2001) <http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Doerr/> (Sept. 25, 2005)

 

With networked information access to heterogeneous data sources, the problem of terminology provision and interoperability of controlled vocabulary schemes such as thesauri becomes increasingly urgent. Solutions are needed to improve the performance of full-text retrieval systems and to guide the design of controlled terminology schemes for use in structured data, including metadata. Thesauri are created in different languages, with different scope and points of view and at different levels of abstraction and detail, to accommodate access to a specific group of collections. In any wider search accessing distributed collections, the user would like to start with familiar terminology and let the system find the correspondences to other terminologies in order to retrieve equivalent results from all addressed collections. This paper investigates possible semantic differences that may hinder the unambiguous mapping and transition from one thesaurus to another.

 

 

Dunsire, Gordon. "Joined up Indexes: Interoperability Issues in Z39.50 Networks." Paper presented at the 68th IFLA Council and General Conference, Glasgow, Scotland, Aug. 18-24, 2002. <http://www.ifla.org/IV/ifla68/prog02.htm> <http://www.ifla.org/IV/ifla68/papers/022-144e.pdf> (Jan. 18, 2003).

The paper discusses issues in the interoperability of indexes to metadata records in distributed information retrieval networks, based on the findings of Cooperative Academic Information Retrieval Network for Scotland (CAIRNS) and Scottish Collections Network Extension (SCONE) projects. The two have evolved services which together provide user-driven collection identification and selection mechanisms and the ability to cross-search related metadata for item discovery and access. The CAIRNS Cataloguing Issues Working Group identified a number of factors affecting cross-searching of metadata indexes for authors, titles, subjects and control numbers, including local cataloging policies, content standards, and index structures. The SCONE project has identified issues in subject indexing at the collection level, in particular the relationship between collections with specific subject content and general collections for which Conspectus-type subject strength mappings are appropriate.

 

Duval, Erik, Hodgins, Wayne, Sutton, Stuart, and Weibel, Stuart L. "Metadata Principles and Practicalities." D-Lib Magazine, 8, no. 4 (April 2002). <http://www.dlib.org/dlib/april02/weibel/04weibel.html> (May 1, 2002).

The focus of the article is metadata in general, but some information is apropos to subject analysis. The use of controlled vocabularies is another important approach to refinement that improves the precision of descriptions and leverages the substantial intellectual investment made by many domains to improve subject access to resources. The Dewey Decimal Classification, for example, is a multilingual classification system long used in traditional library environments that can be applied to electronic resources as well. There are also hundreds of domain-specific thesauri and classification systems that can be imported into the Web metadata architecture to support subject descriptions. Specifying the use of a particular vocabulary in a given collection of metadata will allow applications to provide more coherent search and browsing facilities. It is essential to adopt metadata architectures that respect linguistic and cultural diversity: unless such resources can be made available to users in their native languages, in appropriate character sets, and with metadata appropriate to management of the resources, the Web will fail to achieve its potential as a global information system.

 

By elucidating shared principles and practicalities of metadata, the authors hope to raise the level of understanding among their respective (and shared) constituents. The ideas in the paper are divided into two categories: a) Principles, and b) Practicalities.

 

Eden, Brad. "Metadata and its Application." Library Technology Reports, 38, no. 5 (Sept./Oct. 2002): p. 1-77.

This report is a guide to current metadata standards and their application. Major standards are included. The report examines which metadata standards are suitable for particular libraries, linking initiatives and how they relate to metadata, how to use metadata to build an enriched library catalog, and how metadata assists in natural language recognition technology.

 

Creating metadata is important because metadata facilitates the discovery of relevant information and resources. Metadata helps identify resources, distinguish among dissimilar resources, bring similar resources together, allow resources to be found by relevant criteria, and give location information. Metadata promotes interoperability if accompanied by careful mapping of data elements and crosswalking of standards. Interoperability allows multiple systems to exchange data with minimal loss of content and functionality, regardless of differences in hardware and software platforms, data structures, and interfaces. The use of metadata allows resources to be searched seamlessly across networks through crosswalks and shared transfer protocols. Metadata ensures that resources will be accessible into the future, can provide persistent and unique digital identification, can track rights and reproduction information, and can organize information. Problems with polysemy (words with multiple meanings), ambiguity of meaning, and synonymy can all be alleviated by the proper application of metadata, either manually or through selected harvesting. Interoperability has become the key shared focus if multiple metadata standards are to survive.

 

Fitch, Kent. Taking RDF and Topic Maps Seriously. <http://ausweb.scu.edu.au/aw02/papers/refereed/fitch2/> (July 18, 2002)

One of the core ideas behind the Semantic Web is the creation of machine-processable relationships between resource identifiers (URIs). Two often-discussed ways of representing those relationships are RDF and Topic Maps. A topic is simply a representation of any subject or concept of interest; it is the 'proxy' of that subject in the topic map. Topics have characteristics: names (of different types); the roles played by the topic in associations with other topics; and occurrences, which are resources pertinent to the topic (also of different types). Topic characteristics can be asserted as being valid within a "scope", which acts as a context for assertions. Topics participating in an association each play an identified "role". Topic Maps tend to start with the 'abstract' and optionally extend to include concrete resources, whereas RDF tends to start with defining relationships between concrete resources and optionally building abstract conceptual links between those relationships.
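To make the topic map vocabulary above concrete, the following is a minimal Python sketch of the data model as the annotation describes it: topics with typed names and occurrences, associations whose members each play a role, and a scope attached to assertions. All class and function names, and the Puccini/Tosca sample data, are hypothetical illustrations; this is not the XTM syntax or any Topic Maps engine's API. It assumes the common convention that an assertion with an empty (unconstrained) scope is valid in every context.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified topic map model (not XTM or any engine's API).

@dataclass(frozen=True)
class Name:
    value: str
    name_type: str = "default"
    scope: frozenset = frozenset()  # empty scope = valid in every context

@dataclass(frozen=True)
class Occurrence:
    resource: str  # address of a resource pertinent to the topic
    occ_type: str
    scope: frozenset = frozenset()

@dataclass
class Topic:
    topic_id: str  # the topic is the "proxy" for a subject
    names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)

@dataclass(frozen=True)
class Association:
    assoc_type: str
    members: tuple  # (role, topic_id) pairs: each member plays a role
    scope: frozenset = frozenset()

def in_scope(assertion_scope: frozenset, context: str) -> bool:
    """An assertion applies if its scope is unconstrained or includes the context."""
    return not assertion_scope or context in assertion_scope

def names_in(topic: Topic, context: str) -> list:
    """Names of a topic that are valid in the given context."""
    return [n.value for n in topic.names if in_scope(n.scope, context)]

def roles_played(topic_id: str, associations: list, context=None) -> list:
    """(association type, role) for every association the topic takes part in."""
    return [
        (a.assoc_type, role)
        for a in associations
        if context is None or in_scope(a.scope, context)
        for role, member in a.members
        if member == topic_id
    ]

# Hypothetical sample data.
puccini = Topic(
    "puccini",
    names=[
        Name("Giacomo Puccini"),
        Name("Puccini, Giacomo", "sort", frozenset({"catalogue"})),
    ],
    occurrences=[Occurrence("https://example.org/puccini-biography", "biography")],
)
tosca = Topic("tosca", names=[Name("Tosca")])
associations = [
    Association("composed-by", (("composer", "puccini"), ("work", "tosca"))),
]
```

Here `names_in(puccini, "catalogue")` returns both names, while any other context yields only the unscoped one, and `roles_played("tosca", associations)` reports the "work" role. In this model the roles belong to the association rather than to the topic itself, and concrete resources enter only through occurrences.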

 

Frâncu, Victoria. "The Impact of Specificity on the Retrieval Power of a UDC-based Multilingual Thesaurus." Cataloging & Classification Quarterly, v. 37, no. 1/2 (2003): p. 49-64.

Summary: The article describes research conducted on a bibliographic database in order to show what impact the specificity of knowledge organization tools may have on information retrieval. For this purpose, two multilingual thesauri based on the Universal Decimal Classification (UDC), with different degrees of specificity, are considered. Issues of harmonizing a classificatory structure with a thesaurus structure are introduced, and significant aspects of information retrieval in a multilingual environment are examined.

 

Franklin, Rosemary Aud. "Re-inventing Subject Access for the Semantic Web." Online Information Review, 27, no. 2 (2003): 94-101.

Second-generation web research is beginning to model subject access on library science principles of bibliographic control and cataloging. Harnessing the Web and organizing the intellectual content with standards and controlled vocabulary provides precise search and retrieval capability, increasing relevance and efficient use of technology. Current research points to a type of structure based on a system of faceted classification. This system allows the semantic and syntactic relationships to be defined. Controlled vocabulary can be assigned, not in a hierarchical structure, but rather as descriptive facets of related concepts.

 

Garrison, William A. "Retrieval Issues for the Colorado Digitization Project's Heritage Database." D-Lib Magazine, 7, no. 10 (Oct. 2001). <http://www.dlib.org/dlib/october01/garrison/10garrison.html> (Oct. 26, 2002).

The Colorado Digitization Project (CDP) is a collaborative initiative involving Colorado's archives, historical societies, libraries and museums. The project is creating a union catalog of metadata records and has developed tools for the creators of metadata records, the assignment of subject headings, and the use of name headings. The CDP is also investigating the use of Dewey Decimal Classification numbers through WebDewey to allow linkage of general subject terms and highly specialized subject terms within a subject browse feature of the union catalog.

 

Geisselmann, Friedrich. CARMEN. WP12: Cross concordances of classifications and thesauri, 2004. <http://www.bibliothek.uni-regensburg.de/projects/carmen12/index.html.en> (Jan. 2005)

The goal is to allow an integrated search for subject aspects in distributed data holdings with different intentional emphases, taking into account, by means of cross-concordances, the conceptual differences between the thesauri and classifications applied.

 

Godby, Carol Jean and Reighart, Ray. "Terminology Identification in a Collection of Web Resources." Journal of Internet Cataloging, 4, no. 1/2 (2001): 49-65.

The primary goal of OCLC's WordSmith project was to obtain subject terminology directly from raw text. The hypothesis was that reliable subject terms can be automatically collected, re-used, and organized into thesaurus-like objects that enhance access to Internet material that is too time-consuming to catalog by hand.

 

Godby, C. Jean. The WordSmith Indexing System. <http://www.oclc.org/research/publications/arr/1998/godby_reighart/wordsmith.htm> (Dec. 27, 1999).

The OCLC WordSmith indexing system uses the results of research in computational linguistics to implement a series of largely statistical filters to identify descriptive vocabulary in collections of English-language text of arbitrary subjects.

 

Godby, Carol Jean and Stuler, Jay. "The Library of Congress Classification as a Knowledge Base for Automatic Subject Categorization." Paper presented at the IFLA Satellite Meeting: Subject Retrieval in a Networked Environment, Dublin, Ohio, Aug. 2001. <http://staff.oclc.org/~godby/auto_class/godby-ifla.html> (Oct. 26, 2002)

This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic. The project had three goals: 1) to adapt the LCC for use as a knowledge base for automatically classifying full text, 2) to exploit the LCC's structure for online subject-oriented browsing, and 3) to make the results of the work freely available to the library community.
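The annotation mentions filtering with the log-likelihood statistic. As a minimal sketch, assuming the widely used G² (Dunning-style) log-likelihood ratio over a 2×2 table of co-occurrence counts between one subject heading and one LCC class, such a filter could look like the Python below. The paper's actual counting scheme and cut-off are not given here; the function names and the default threshold of 10.83 (the chi-square critical value for one degree of freedom at p = 0.001) are illustrative assumptions.

```python
import math

def log_likelihood_g2(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 = 2 * sum(O * ln(O / E)) over a 2x2 contingency table.

    k11: records with both the subject heading and the class
    k12: records with the heading but not the class
    k21: records with the class but not the heading
    k22: records with neither
    """
    n = k11 + k12 + k21 + k22
    if n == 0:
        return 0.0
    observed = [[k11, k12], [k21, k22]]
    row_totals = [k11 + k12, k21 + k22]
    col_totals = [k11 + k21, k12 + k22]
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # the term 0 * ln(0) is taken to be 0
            expected = row_totals[i] * col_totals[j] / n
            total += o * math.log(o / expected)
    return 2.0 * total

def keep_association(k11: int, k12: int, k21: int, k22: int,
                     threshold: float = 10.83) -> bool:
    """Keep a heading-class pair only if it co-occurs MORE often than chance
    and its G^2 score reaches the (hypothetical) significance threshold."""
    n = k11 + k12 + k21 + k22
    if n == 0:
        return False
    expected_k11 = (k11 + k12) * (k11 + k21) / n
    return k11 > expected_k11 and log_likelihood_g2(k11, k12, k21, k22) >= threshold
```

The extra `k11 > expected_k11` check is there because G² is large for both unusually frequent and unusually rare co-occurrence; only the former suggests that a heading belongs with a class.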

 

Hardin, Chris. "3 questions: Semantic Interoperability Defined." ITBusinessEdge, (June 16, 2005). <http://www.dlib.org/dlib/april02/weibel/04weibel.html> (July 18, 2005).

An example from the business arena of the need for semantic interoperability among records.

 

Harken, S. (2005). SAC subcommittee on semantic interoperability: Introduction/criteria [draft], 2005. <http://www.und.nodak.edu/dept/library/Departments/abc/SACSEM-Criteria.htm> (Sept. 25, 2005)

 

Heery, Rachel, Carpenter, Leona, Day, Michael. "Renardus Project Developments and the Wider Digital Library Context." D-Lib Magazine, 7, no. 4 (April 2001). <http://www.dlib.org/dlib/april01/heery/04heery.html> (Aug. 8, 2002)

A subject gateway provides a search service to high-quality web resources selected from a particular subject area. This work was informed by earlier modeling work carried out in the context of Moving to Distributed Environments for Library Services (MODELS). It is hoped that results of the Renardus work will feed back to the ongoing development of the MODELS application framework, and also to the IMesh Toolkit project. The IMesh Toolkit project is providing subject gateway developers with a systems framework for an extendable set of interoperable tools and components.

 

Enhanced subject access is considered a key difference offered by subject gateways, and an important part of the Renardus service will be its attempt to provide some kind of subject directory browsing service across the participating gateways. To achieve this, a classification scheme has been chosen to act as an 'interlingua' within the Renardus pilot: the Dewey Decimal Classification (DDC). Gateways participating in the Renardus system will be invited to map DDC terms to the subject terms used in their own browse hierarchies. To facilitate this process, the project established a small working group to prepare guidelines for this work. In addition, the software tool developed as part of the German CARMEN project has been adapted to support the relevant workflow. The Renardus browse system will link directly into the subject hierarchies of individual gateways: if part of an individual gateway's browse structure has been mapped to a given DDC term, the gateway's name is visible and becomes a hyperlink to the relevant part of the local browse structure. This work relates to work currently taking place within the UK HILT project, which is studying the problem of cross-searching and browsing by subject across a range of communities, services, and service or resource types. HILT will assist with consensus building on best practice in the short- to medium-term perspective as regards working with existing or new subject schemes and thesauri. Renardus will feed back experience to Network Knowledge Organization Systems/Services (NKOS), a loose coalition of people and organizations concerned with the use of knowledge organization systems such as classification systems, thesauri, gazetteers, and ontologies to support description and retrieval of resources via the Web.
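The interlingua idea above can be sketched in a few lines of Python: each gateway maps categories in its own browse hierarchy to DDC classes, and the broker inverts those mappings so that a DDC browse page can list every gateway with a mapped category, each linked to the relevant part of its local browse structure. This is an illustrative sketch only, not Renardus software; the gateway names, category labels, and URLs are hypothetical, and a production mapping would likely need more information (for example, how closely a local category matches a DDC class), which is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LocalCategory:
    gateway: str  # participating gateway (hypothetical names below)
    label: str    # category label in the gateway's own browse hierarchy
    url: str      # link into the gateway's local browse structure

def build_ddc_index(mappings):
    """Invert (local category, DDC class) pairs into DDC class -> local categories."""
    index = defaultdict(list)
    for category, ddc_class in mappings:
        index[ddc_class].append(category)
    return dict(index)

def browse(index, ddc_class):
    """Gateway names and links shown on the broker's page for one DDC class."""
    return [(c.gateway, c.url) for c in index.get(ddc_class, [])]

# Hypothetical mappings supplied by two gateways with different local schemes.
MAPPINGS = [
    (LocalCategory("GatewayA", "Mathematics", "https://gateway-a.example.org/browse/maths"), "510"),
    (LocalCategory("GatewayB", "Mathematik", "https://gateway-b.example.org/browse/mathematik"), "510"),
    (LocalCategory("GatewayB", "Chemie", "https://gateway-b.example.org/browse/chemie"), "540"),
]
index = build_ddc_index(MAPPINGS)
```

With this data, `browse(index, "510")` lists both gateways, even though one labels the class in English and the other in German, while an unmapped class such as 620 yields an empty list, so no gateway link is shown.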

 

A draft Renardus application profile has been agreed upon to form the basic metadata schema. Definitions of the semantics of these elements are based, where possible, on the Dublin Core Metadata Element Set. There is the possibility of expanding the scope of the Renardus search service to the end-user. One proposal suggests that it would be possible to combine a brokered gateway service with Web indexes based on harvesting techniques. Within Renardus, they intend to explore the possible benefits of collaborative cataloging for creating metadata about web resources. There may no longer be a need to duplicate metadata describing the same resource in so many locations; rather, original metadata will be created, and further enhancements to that metadata will be linked to an original authoritative metadata instance. One possible methodology to achieve this is to use XML/RDF annotations. Within Renardus, they may explore linking local metadata enhancements to metadata residing in a central 'union catalog'.

 

Heery, Rachel and Wagner, Harry (2002). "A Metadata Registry for the Semantic Web." D-Lib Magazine, v. 8, no. 5 (May 2002). <http://www.dlib.org/dlib/may02/wagner/05wagner.html> (May 17, 2002).

The article primarily deals with schema registries. Registries essentially provide an index of terms. RDF provides the basis for declaring the schema in use. Work is underway to add richness and fullness to the schema language by a) the Web Ontology Working Group, and b) the Ontology Inference Layer (OIL) <http://www.ontoknowledge.org/oil>. The Dublin Core Metadata Initiative (DCMI) has defined a relatively small set of data elements (referred to within the DCMI as the DCMI vocabulary or DCMI terms) for use in describing Internet resources, as well as to provide a baseline element set for interoperability between richer vocabularies. The aim was to enable registration, discovery, and navigation of semantics as defined by DCMI. Two of several goals were 1) to automate identification of relationships between terms in vocabularies, and 2) to be multilingual. The project tried several prototypes, including one using the Extensible Open RDF Toolkit (EOR) for database management and Extensible Stylesheet Language Transformations (XSLT) for the user interface. The language of a schema must always be identified when registering it; this helps enable discovery and navigation. The multilingual interface is accomplished using an XSLT 'translate' stylesheet. (Relational databases don't support good performance.)

 

Himanka, J. and Kautto, V. "Translation of the Finnish abridged edition of UDC into general Finnish subject headings." International Classification, 19, no. 3 (1992): 131-4+.

 

HILT Project Overview. <http://hilt.cdlr.strath.ac.uk/About-HILT/overview.html> (March 26, 2002).

The project is jointly funded by the RSLP and the JISC. The purpose of the first year of the project was to study and report on the problem of cross-searching and browsing by subject across a range of communities, services, and service or resource types. Phase II aims to move the findings of Phase I into a "Pilot Project" stage. The project encompasses partners and stakeholders from a wide range of communities, including archives, museums, and libraries, amongst others.

 

Hudon, Michele. "Multilingual Thesaurus Construction: Integrating the View of Different Cultures in One Gateway to Knowledge and Concepts," in Knowledge Organization, v. 24, no. 2 (1997): 84-91.

Focuses on the social/political aspects of treating multiple languages in an egalitarian fashion, along with the technical implications.

 

Hunter, Jane. "MetaNet - a Metadata Term Thesaurus to Enable Semantic Interoperability between Metadata Domains," JoDI. v. 1, no. 8 (Feb. 2001). <http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Hunter/> (Feb. 17, 2005)

Abstract

Metadata interoperability is a fundamental requirement for access to information within networked knowledge organization systems. The Harmony international digital library project has developed a common underlying data model (the ABC model) to enable the scalable mapping of metadata descriptions across domains and media types. The ABC model provides a set of basic building blocks for metadata modeling and recognizes the importance of 'events' to describe unambiguously metadata for objects with a complex history. To test and evaluate the interoperability capabilities of this model, we applied it to some real multimedia examples and analysed the results of mapping from the ABC model to various different metadata domains using XSLT. This work revealed serious limitations in the ability of XSLT to support flexible dynamic semantic mapping. To overcome this, we developed MetaNet, a metadata term thesaurus which provides the additional semantic knowledge that is non-existent within declarative XML-encoded metadata descriptions. This paper describes MetaNet, its RDF Schema representation and a hybrid mapping approach which combines the structural and syntactic mapping capabilities of XSLT with the semantic knowledge of MetaNet, to enable flexible and dynamic mapping among metadata standards.

 

Huxley, Lesly, Carpenter, Leona, Peereboom, Marianne. Collaborative Systems and Tools: Renardus Case Study. (2002). Abstract. <http://www.internet-librarian.com/presentations/huxley.pdf> No longer available.

Renardus builds on existing trends towards greater collaboration, standardization, and interoperability between information services. The ability to cross-search and particularly to cross-browse participating gateways' records led to development of tools to support the integration and 'sensible' presentation of records from a wide range of services, each using unrelated classification systems and data models, providing interfaces and data in different languages, based on different technical solutions.

 

IFLA. Classification and Indexing Section, Division of Bibliographic Control. Newsletter, 27 (May 2003).

Sect. 2.2, Changing Roles of Subject Access Tools, describes several projects: a) FAST, faceted Library of Congress Subject Headings; b) UDC implementation (UK): the role of classification in information retrieval systems, serving as an underlying knowledge structure to provide systematic subject organization and thus complement searching with natural language terms; c) SWD/RSWK (SZ) after 5 years. The Dewey Decimal Classification is being translated into German and is being used for the ePrint UK project. A subject indexing and classification project at the National Library of the Czech Republic involves subject categorization of heterogeneous information using the Conspectus method, based on intellectual mapping of DDC and UDC notations. The authority file contains four types of files: a) geographic, b) chronological, c) genre/form, d) topical.

 

IFLA. Classification and Indexing Section. Working Group on Multilingual Thesauri. Guidelines for Multilingual Thesauri. http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf (Apr. 20, 2005)

The IFLA Working Group on Guidelines for Multilingual Thesauri started to prepare this document in 2002. The objective of the document is to add to the existing Guidelines for Multilingual Thesauri as worded in the ISO standard for multilingual thesauri (ISO 5964-1985) or in handbooks on thesaurus building, such as Aitchison et al. (2000). The general principles for the building of monolingual thesauri are assumed.

 

There are three approaches in the development of multilingual thesauri:

1. building a new thesaurus from the bottom up
   - starting with one language and adding another language or languages
   - starting with more than one language simultaneously
2. combining existing thesauri
   - merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
   - linking existing thesauri and subject heading languages to each other, using the existing thesauri and/or subject heading languages both in indexing and retrieval
3. translating a thesaurus into one or more other languages.

 

IFLA. Section on Classification and Indexing, Division of Bibliographic Control. Newsletter, 24 (Dec. 2001).

Czechia

More detailed subject access to documents, in order to retrieve specific pieces of information, has become a vital need in the online environment, where the best solution seems to be a combination of keywords with a controlled vocabulary. Merging many external documents into the Union Catalogue database gives rise to discrepancies between index terms (lexical units), application syntax, and the hierarchical structures of the original indexing systems.

Subject authority file: a) an integrated indexing and retrieval tool, in which the verbal terms of a thesaurus (controlled vocabulary) are combined with equivalent notations of a classification scheme (e.g. UDC); it enables subject access to documents either via verbal terms (searching) or through the classification notation (browsing); b) application of this integrated tool in the online (Web) environment may support automatic indexing and classification of web resources; in this case it would be very useful to apply verbal expressions and UDC notations that reflect real situations. Since subject access depends on national languages … it was difficult to find and apply any international recipe. After much debate, the LCSH system was finally chosen. However, it was considered useful at that time to meet local needs and requirements as well, so some modifications of LCSH were formulated, such as: direct form of geographical subdivisions; form subdivisions made into separate headings; more frequent use of generic headings for classes of persons or types of corporate bodies; etc.

 

France

 

RAMEAU is not only the subject authority file of the Bibliothèque nationale de France, but also the common French indexing language. We are classifying our RAMEAU subject headings into about sixty broad subject fields, named RAMEAU Domains, which are more or less arranged on the basis of DDC numbers. This work is partly done thanks to an automatic mapping between call numbers and subject indexing. It will allow us to propose thematic views of RAMEAU and to provide consistent files of headings for our multilingual subject access project MACS.

 

The Royal Library in Sweden is mapping Swedish subject headings to LCSH.

 

IMesh Toolkit, 2002. <http://www.imesh.org/toolkit> (Aug. 6, 2002).

The IMesh Toolkit project evolved out of discussions within the IMesh community, which was set up to encourage international collaboration amongst subject gateways. The project will build on existing subject gateway software to develop a configurable, reusable, and extensible toolkit for subject gateway providers.

 

The project plan covers: a) manual selection, description and classification; b) a structured record format; c) some search and retrieval protocol; and d) a mechanism for routing queries between gateways. For this reason, interviews in the subject gateways review are restricted to the needs of gateways matching the Renardus definition of a quality-controlled subject gateway: a subject-based resource discovery guide which provides links to information resources (documents, collections, sites or services), predominantly accessible via the Internet, and applies a documented set of quality measures to support systematic resource discovery. Such a gateway is managed and collected by humans according to documented selection and maintenance criteria, with a fixed metadata set and controlled subject classification. The eventual result will be a broker system for simultaneous access to quality-controlled subject gateways and other Internet-based, distributed services.

 

Current and Possible Future Technologies and Standards: a) Z39.50 is the protocol of choice for the majority of the services; b) Whois++ is a very simple search and retrieval protocol which provides a profile and a protocol at once; c) LDAP is the Lightweight Directory Access Protocol. XML offers the possibility of combining QSBIG records with other non-QSBIG sources. XML is not sufficient on its own: just as Z39.50 requires a profile for interoperability, XML requires an agreed-upon syntax. Some form of DC/RDF/XML protocol was strongly supported in the Renardus survey. SOAP is a remote procedure call proposal which uses XML and HTTP as the carrying mechanism: queries are couched in XML and results are received in XML.

 

IMesh Toolkit.

The IMesh Toolkit project evolved out of discussions aimed at encouraging international collaboration amongst subject gateways and subject-based resource discovery services. Areas of interest include: resource collection, cataloging, management and discovery (e.g. academic guides, virtual libraries and subject gateways); sharing technical, marketing, standards and cataloging effort; investigating cross-searching and cross-browsing; and developing standards for related software and information issues.

 

IMesh Toolkit: an architecture and toolkit for distributed subject gateways. <http://www.imesh.org/toolkit> (Aug. 6, 2002).

The project will build on existing subject gateway software to develop a configurable, reusable and extensible toolkit for subject gateway providers.

 

IMesh Toolkit: subject gateway requirements. <http://www.imesh.org/toolkit/work/requirements> (Aug. 6, 2002).

The objective of this work package is to ensure that the IMesh toolkit architectural and technical strategies are well-grounded in documented needs and practical requirements of subject gateways.

 

IMesh Toolkit: General architectural overview of the IMesh Toolkit. <http://www.imesh.org/toolkit/work/architecture/notes.php3> (Aug. 9, 2002).

Focuses on how to achieve interoperability for the IMesh Toolkit, particularly with regard to architecture and the functionality of query languages, etc.

 

IMesh toolkit: architecture. <http://www.imesh.org/toolkit.work/architecture> (Aug. 6, 2002).

Architectural diagram.

 

Information and Documentation - a Reference Ontology for the Interchange of Cultural Heritage Information (2002). ISO/CD 21127. <http://www.niso.org/international/SC4/n491.pdf> (Oct. 27, 2002).

The primary purpose of ISO 21127 is to offer a conceptual basis for the mediation of information between cultural heritage organizations such as museums, libraries, and archives. The standard aims to provide a common reference point against which divergent and incompatible sources of information can be compared and, ultimately, harmonized. It is designed to be explanatory and extensible rather than prescriptive and restrictive. Consequently, the model has been formulated as an object-oriented semantic model, which can easily be converted into other object-oriented models. All cross-references and inheritance of properties are explicitly resolved. The exchange of information relevant to museum collections with libraries and archives falls within the scope of the standard.

 

ISO 2788-1986. Documentation - Guidelines for the Establishment and Development of Monolingual Thesauri. <http://www.nlc-bnc.ca/iso/tc46sc9/standard/2788e.htm> (July 1, 2002).

 

Iyer, H., & Giguere, M. D. "Towards designing an expert system to map mathematics classificatory structures." Knowledge Organization, 22, no. 3-4 (1995), 141-147.

 

Janee, Greg, Satoshi Ikeda, Linda L. Hill. The ADL Thesaurus Protocol. 2003. <http://alexandria.sdc.ucsb.edu/~gjanee/thesaurus/specification.html> (April 9, 2003).

The document describes an XML- and HTTP-based protocol for accessing thesauri: structured, controlled vocabularies of words and phrases that represent conceptual categories. The protocol is intended to allow programmatic clients to easily access and utilize existing thesauri, and thus the services offered by the protocol are oriented around querying thesauri and navigating within thesauri. The protocol does not support creation, maintenance, or sharing of thesauri, or mapping between thesauri. Its basic unit is the term, which represents a conceptual category and may have a scope note. Terms may be preferred or nonpreferred. The protocol includes the reciprocal term relations of narrower, broader, related, use (use instead) and used-for. Eight XML formats are used. The hierarchy feature describes the hierarchy of terms above (broader) or below (narrower), including the starting term itself. Operators include "equals", "contains-all-words", "contains-any-word", and "matches-regexp" (a Perl-like regular expression). The protocol provides five independent, stateless services, which are invoked over HTTP.
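The term model and query operators described above can be illustrated with a small in-memory sketch in Python. It is not the ADL protocol itself (which exchanges XML over HTTP); the class and method names and the sample feature-type terms are hypothetical, and so are the operator semantics chosen here: case-insensitive comparison for "equals" and the word operators, and Python's `re.search` standing in for the Perl-like "matches-regexp".

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

@dataclass
class Term:
    name: str
    preferred: bool = True
    scope_note: str = ""
    broader: list[str] = field(default_factory=list)
    narrower: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)
    use: str | None = None  # nonpreferred term: the preferred term to use instead
    used_for: list[str] = field(default_factory=list)

def _words(text: str) -> set[str]:
    return set(text.lower().split())

class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def add(self, name: str, preferred: bool = True, scope_note: str = "") -> None:
        self.terms[name] = Term(name, preferred, scope_note)

    # Each relation is stored reciprocally on both terms.
    def relate_broader(self, narrower: str, broader: str) -> None:
        self.terms[narrower].broader.append(broader)
        self.terms[broader].narrower.append(narrower)

    def relate_related(self, a: str, b: str) -> None:
        self.terms[a].related.append(b)
        self.terms[b].related.append(a)

    def relate_use(self, nonpreferred: str, preferred: str) -> None:
        self.terms[nonpreferred].use = preferred
        self.terms[preferred].used_for.append(nonpreferred)

    def query(self, operator: str, text: str) -> list[str]:
        matchers = {
            "equals": lambda name: name.lower() == text.lower(),
            "contains-all-words": lambda name: _words(text) <= _words(name),
            "contains-any-word": lambda name: bool(_words(text) & _words(name)),
            "matches-regexp": lambda name: re.search(text, name) is not None,
        }
        match = matchers[operator]  # KeyError for an unknown operator
        return sorted(name for name in self.terms if match(name))

    def hierarchy(self, start: str, direction: str = "broader") -> list[str]:
        """Start term plus every term reachable via broader (or narrower) links."""
        seen: set[str] = set()
        order: list[str] = []
        stack = [start]
        while stack:
            name = stack.pop()
            if name in seen:
                continue
            seen.add(name)
            order.append(name)
            stack.extend(getattr(self.terms[name], direction))
        return order

# Hypothetical sample vocabulary.
th = Thesaurus()
for name in ["hydrographic features", "lakes", "crater lakes", "streams", "wetlands"]:
    th.add(name)
th.add("rivers", preferred=False)
th.relate_broader("lakes", "hydrographic features")
th.relate_broader("streams", "hydrographic features")
th.relate_broader("crater lakes", "lakes")
th.relate_related("lakes", "wetlands")
th.relate_use("rivers", "streams")
```

For example, `th.query("contains-any-word", "hydrographic lakes")` matches three terms, and `th.hierarchy("crater lakes")` walks upward to "hydrographic features". A real client would instead send the corresponding requests to one of the protocol's HTTP services and parse the XML responses.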

 

Koch, Traugott and Neuroth, Heike. Classification Mapping for Cross-browsing in the European Subject Gateway Broker Renardus. Presentation at the NKOS workshop at JCDL, June 28, 2001. <http://www.lub.lu.se/tk/renardus/NKOS01-pres.htm> (Nov. 7, 2002).

 

Koch, Traugott (2001). Controlled Vocabularies, Thesauri and Classification Systems Available in the WWW. DC Subject, 2001. <http://www.lub.lu.se/metadata/subject-help.html> (July 29, 2002).

Lists a large number of controlled vocabularies, thesauri, and classification systems available on the Web.

 

Koch, Traugott (2000). Quality-controlled Subject Gateways on the Internet, 2000. <http://www.lub.lu.se/tk/demos/Sgin.html> (Aug. 8, 2002).

This paper summarizes the DESIRE approach, software solutions, cooperative subject gateway projects, broker architectures, metadata mapping and cross-searching, browsing structure in a subject gateway, and classification mapping and cross-browsing problems and issues. "Quality-controlled subject gateways" are Internet services which apply a rich set of quality measures to support systematic resource discovery. Considerable manual effort is used to secure a selection of resources which meet quality criteria and to display a rich description of these resources with standards-based metadata. Regular checking and updating ensure good collection management. A main goal is to provide a high quality of subject access through indexing resources using controlled vocabularies and by offering a deep classification structure for advanced searching and browsing.

 

Koch, Traugott.  "The Renardus Broker: a 'Meta Subject Gateway'."  Presentation at ELAG 2001. <http://www.lub.lu.se/tk/renardus/tokyoren.html> (Nov. 7, 2002).

 

Koch, Traugott, Neuroth, Heike, and Day, Michael. "Renardus: Cross-browsing European Subject Gateways via a Common Classification System (DDC)."  Paper delivered at the IFLA satellite meeting: Subject Retrieval in a Networked Environment, OCLC, Dublin, Ohio, USA, 14-16 August 2001. <http://www.ukoln.ac.uk/metadata/renardus/papers/ifla-satellite/ifla-satellite.html> (June 11, 2002).

The paper presents the approach and first results of the classification mapping process in the EU project Renardus. The outcome is a cross-browsing feature based on the Dewey Decimal Classification (DDC) and improved subject searching across distributed and heterogeneous European subject gateways. The project aims to develop a Web-based service to enable searching and browsing across a range of distributed European-based information services designed for the academic and research communities - and in particular those services known as subject gateways. Predecessor projects like the EU project DESIRE have already developed solutions for the description of individual resources and for automatic classification at the level of an individual subject gateway using established classification systems. Renardus intends to develop a service that can cross-search and cross-browse a number of distributed subject gateways through the use of a common metadata profile and by mapping all locally used classification schemes to a common scheme.
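The cross-browsing idea, where each gateway's local classes are mapped to a common DDC class so that resources from all gateways can be browsed under one scheme, can be sketched as below. The gateway names, local class codes, titles, and mappings are invented for illustration; Renardus's actual mapping data and mapping relationship types are not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping table: (gateway, local class) -> common DDC class.
LOCAL_TO_DDC = {
    ("gateway_a", "ENG-CIV"): "624",   # civil engineering
    ("gateway_b", "T4.2"): "624",
    ("gateway_b", "T5"): "621.3",      # electrical engineering
}

# Hypothetical resources as (gateway, local class, title).
RESOURCES = [
    ("gateway_a", "ENG-CIV", "Bridge design handbook"),
    ("gateway_b", "T4.2", "Tunnel engineering portal"),
    ("gateway_b", "T5", "Electrical engineering resources"),
    ("gateway_b", "X9", "Unmapped local class"),
]


def build_ddc_browse_tree(resources, mapping):
    """Group resources from all gateways under their common DDC class.

    Resources whose local class has no mapping are returned separately.
    """
    tree, unmapped = defaultdict(list), []
    for gateway, local_class, title in resources:
        ddc = mapping.get((gateway, local_class))
        if ddc is None:
            unmapped.append((gateway, local_class, title))
        else:
            tree[ddc].append((gateway, title))
    return dict(tree), unmapped


tree, unmapped = build_ddc_browse_tree(RESOURCES, LOCAL_TO_DDC)
print(tree["624"])
# → [('gateway_a', 'Bridge design handbook'), ('gateway_b', 'Tunnel engineering portal')]
```

Keeping unmapped resources visible, rather than dropping them, reflects that mapping coverage of local schemes is rarely complete.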

 

Kriewel, Sascha, and others. "DAFFODIL - Strategic Support for User-oriented Access to Heterogeneous Digital Libraries." D-Lib Magazine, 10, no. 6 (June 2004) <http://www.dlib.org/dlib/june04/kriewel/06kriewel.html> (June 2004).

DAFFODIL (Distributed Agents for User-Friendly Access to Digital Libraries) is a search system for digital libraries aiming at strategic support during the information search process. It is a system for integrated searching of the heterogeneous digital libraries of a scientific community, with merging of results. It combines browsing and searching strategies in a natural way. It uses a classification tool which provides users with access to a hierarchical, topic-oriented representation of the search domain. It allows the browsing of classification schemes like the ACM Computing Classification System. The thesaurus tool can be used to get more general or more specific terms (hypernyms or hyponyms), or semantic definitions for a search term. Subject-specific and web-based thesauri are used for finding related terms. The resulting terms can then be used in other tools for further queries.

 

Kuhr, Patricia S. "Putting the World Back Together: Mapping Multiple Vocabularies into a Single Thesaurus." Paper delivered at the IFLA satellite meeting: Subject Retrieval in a Networked Environment, OCLC, Dublin, Ohio, USA, 14-16 August 2001.

This paper describes an ongoing project by the H.W. Wilson Company in which the subject headings from twelve controlled vocabularies, covering disciplines from the humanities to the sciences (including law and education, among others), are being collapsed into a single vocabulary and reference structure. Wilson decided on a megathesaurus format and automatic switching.

 

Kunz, Martin. "Subject Retrieval in Distributed Resources: a Short Review of Recent Developments."  Paper presented at the 68th IFLA Council and General Conference, Aug. 18-24, 2002. <http://www.ifla.org/IV/ifla68/papers/007-122e.pdf> (Oct. 27, 2002).

Subject searching across distributed resources is a current challenge when carrying out online searches for bibliographic data. The construction of portals for comparable sources is only the first step; the subsequent navigation of disparate search interfaces still presents problems. Both broad and specialist vocabularies exist. If retrieval is to be improved, there must be some adaptation of these differing resources. There are techniques for relating various subject terminologies, but they have their problems and limitations. Whether you call it a cross-concordance or a crosswalk, it is about creating links between equivalent terms describing similar concepts in two (or more) thesauri, and it is also about the affiliation of documentary languages. New developments in MACS, CARMEN, and the Economics cross-concordance are discussed.

 

One part of the CARMEN Project is concerned with associating the thesaurus of the Informationszentrum Sozialwissenschaften (IZ) with the SWD. Starting from alphabetical lists which contain the keyword material from a specific subject area, the relationships between the two thesauri are determined intellectually and recorded in a link management system.

 

The aim of MACS is to study the links between three extensive subject heading authority files: LCSH, RAMEAU, and SWD. The immediate objective is to indicate in each authority file the equivalent preferred descriptors of the other authority files for a few chosen subject areas. The process being developed for MACS will not affect the structure of the individual national authority files. It uses intellectually determined equivalences to link the content of bibliographic databases that use a controlled vocabulary to describe their content and present it in an ordered, structured way. MACS is based on the assumption that users access the results of intellectually assigned subject descriptions via a thesaurus. Thesauri can distinguish between materials to be indexed more precisely and comprehensively than a method based on syntactical indexing.

 

Kwasnik, Barbara H. and Rubin, Victoria L. "Stretching Conceptual Structures in Classifications across Languages and Cultures," Cataloging & Classification Quarterly, v. 37, no. 1 / 2 (2003):  33-47.

Summary: The authors describe the difficulties of translating classifications from a source language and culture to another language and culture. To demonstrate these problems, kinship terms and concepts from native speakers of fourteen languages were collected and analyzed to find differences between their terms and structure and those used in English. At issue are vocabulary, syntax, and semantics. In harmonizing classification schemes across languages and cultures, one must address the way these terms are bound up in knowledge representations.

 

Lancaster, F. Wilfrid. Vocabulary control for information retrieval. 2nd ed. Arlington, Va.: Information Resources Press, 1986.

 

Landry, Patrice (2000).  "The MACS Project: Multilingual Access to Subjects (LCSH, RAMEAU, SWD)." Paper presented at the 66th IFLA Council and General Conference, Jerusalem, Aug. 13-18, 2000.   <http://www.ifla.org/IV/ifla66/papers/165-181e.pdf> (Aug. 7, 2002).

A report on the progress of the project during the previous year. Drawing on the final report of the CoBRA+ working group on multilingual subject access, the report stressed the importance of co-operation in the quest for multilingual subject access. The goal is to allow users to conduct a subject search in catalogs in their preferred language. The link management software should have a file management and maintenance structure that allows data to be easily added and amended. The prototype should give any user the possibility to choose a source language and one or more target catalogues. The Link Management Interface should only be accessed by the partner libraries to add and manage the links between the different subject heading lists. The Search Results screen shows which links have been made to a particular subject heading in the focus subject heading list. The View Link function is primarily an editorial function. From this screen, a term (authority) or the link can be modified. The Search Interface was designed to give library users the possibility of using their preferred subject heading list and doing their search in the catalogs of one or many libraries. The Browse button will show all the headings where a particular heading term is used and the links to these headings. The library user can access the full bibliographic record by clicking on the title. The interface will retrieve the bibliographic record in the selected library and will display the record in the bibliographic format used by that library.

 

Lauser, Boris, and others. "A Comprehensive Framework for Building Multilingual Domain Ontologies: Creating a Prototype Biosecurity Ontology." Paper presented at Proceedings of the International Conference on Dublin Core and Metadata for e-Communities, 2002: 113-123  <http://www.bncf.net/dc2002/program/ft/paper13.pdf> or  <http://www.bncf.net/dc2002/program/papers.html> (2002)

This paper presents ongoing work in establishing a multilingual domain ontology for a biosecurity portal. The project is embedded in the larger context of the Agricultural Ontology Service (AOS) project of the Food and Agriculture Organization (FAO). The paper focuses on introducing a comprehensive, reusable framework for the process of semi-automatically supported ontology evolution. An extendable, layered ontology modeling approach will address multilinguality issues. In the context of the AOS, an ontology is a system of terms, the definitions of these terms, and the specification of relationships between the terms. It extends the approach of classical thesauri by providing the opportunity to create an unlimited number of different semantic relationships. Semantic robustness towards representational changes, as well as multilingualism, is crucial for the development of the domain ontology. Therefore, the authors distinguish between terms and the concepts these terms represent. Terms are modeled as lexical entries with two attributes: the concept each refers to and its language. RDFS (http://www.w3.org/TR/rdf-schema/#intro) is used to define vocabularies of resources and the relationships among them. Using several tools, a list of terms is developed. This is combined with terms from AGROVOC, a multilingual agricultural thesaurus in which all terms have also been converted to concepts. Thus, automated and manual processes have been used to create a single ontology, which is reviewed by specialists.
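The distinction between terms and the concepts they represent, with each lexical entry carrying a concept reference and a language, can be modeled roughly as follows. The `LexicalEntry` class, the concept identifiers, and the labels are hypothetical; this is not the AOS RDFS schema itself, only an illustration of why concept-based linking supports multilinguality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """A term string tied to one concept and one language (hypothetical model)."""
    label: str
    concept: str   # concept identifier, e.g. an RDF resource URI in a real system
    language: str  # language tag such as "en", "fr", "es"


ENTRIES = [
    LexicalEntry("pest", "c_pest", "en"),
    LexicalEntry("ravageur", "c_pest", "fr"),
    LexicalEntry("plaga", "c_pest", "es"),
    LexicalEntry("quarantine", "c_quarantine", "en"),
]


def translate(label, source_lang, target_lang, entries=ENTRIES):
    """Find target-language terms by going through the shared concept, not word-to-word."""
    concepts = {e.concept for e in entries
                if e.label == label and e.language == source_lang}
    return sorted(e.label for e in entries
                  if e.concept in concepts and e.language == target_lang)


print(translate("pest", "en", "fr"))  # → ['ravageur']
```

Adding a new language then only requires new lexical entries pointing at existing concepts; the relationships between concepts stay untouched.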

 

Lee, Jonghoon, Dubin, David S., Kurtz, Michael J. "Co-occurrence Evidence for Subject Vocabulary Reconciliation in ADS Databases," in ASP Conference Series, vol. 172, Astronomical Data Analysis Software and Systems, VIII, 1999. <http://monet.astro.uiuc.edu/adass98/Proceedings/leej/> (Sept. 27, 2002)

Reports on a project to reconcile heterogeneous indexing vocabularies in the NASA Astrophysics Data System (ADS), which mixes controlled vocabularies and keywords. The mixture of different descriptor vocabularies in ADS defeats the standardization goal, and the merging of the abstract and keyword indexes limits the search precision of the subject indexing. Descriptors representing identical concepts can stand in several different relationships to each other. A project at the University of Illinois investigates sources of evidence to support the automatic and/or computer-assisted reconciliation of the heterogeneous indexing in ADS. Two sources of evidence have been investigated: 1) lexical resemblance between descriptors, and 2) consistent assignment of descriptors from different vocabularies. The consistent assignment of two or more terms from different vocabularies to the same documents suggests some kind of semantic relationship among the terms. The activation of terms at input is spread through the network to the connected documents and from there to the output terms. Assessing multiple occurrences, applying weighting, and setting a cut-off produced fairly good matches.
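The core intuition, that descriptors from different vocabularies consistently assigned to the same documents are probably related, can be illustrated with a simple count-based score and a cut-off. This is a simplified stand-in for the spreading-activation network described above, not the project's method: the score (co-occurrence count divided by the number of documents carrying the source descriptor), the 0.5 cut-off, and the sample data are all assumptions.

```python
from collections import Counter

# Hypothetical documents, each indexed with descriptors from two vocabularies.
DOCS = [
    {"vocab_a": {"galaxies: clusters"}, "vocab_b": {"clusters of galaxies"}},
    {"vocab_a": {"galaxies: clusters"}, "vocab_b": {"clusters of galaxies", "cosmology"}},
    {"vocab_a": {"galaxies: clusters"}, "vocab_b": {"dark matter"}},
    {"vocab_a": {"stars: formation"}, "vocab_b": {"star formation"}},
]


def candidate_matches(docs, cutoff=0.5):
    """Score vocab_a -> vocab_b descriptor pairs by how consistently they co-occur.

    score = (documents carrying both descriptors) / (documents carrying the vocab_a one);
    pairs scoring below the cut-off are discarded.
    """
    source_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        for a in doc["vocab_a"]:
            source_counts[a] += 1
            for b in doc["vocab_b"]:
                pair_counts[(a, b)] += 1
    scores = {pair: n / source_counts[pair[0]] for pair, n in pair_counts.items()}
    return {pair: s for pair, s in scores.items() if s >= cutoff}


print(sorted(candidate_matches(DOCS)))
# → [('galaxies: clusters', 'clusters of galaxies'), ('stars: formation', 'star formation')]
```

A single co-occurrence (as with "stars: formation") already scores 1.0 here, which is one reason a real system would combine this evidence with lexical resemblance or a minimum document count.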

 

Lee, Maria, Stewart Baillie, Jon Dell'Oro (1999). "TML: a Thesaural Markup Language." Paper presented at Proceedings of the 4th Australasian Document Computing Symposium, Coffs Harbour, Australia, Dec. 3, 1999. <http://www.ted.cmis.csiro.au/omt/tml.pdf>  (April 7, 2003).

Thesauri are used to provide controlled vocabularies for resource classification. Their use can greatly assist document discovery because thesauri mandate a consistent shared terminology for describing documents. A particular thesaurus classifies documents according to an information community’s needs. As a result, there are many different thesaural schemas. This has led to a proliferation of schema-specific thesaural systems. In their research, they exploit schematic regularities to design a generic thesaural ontology and specify it as a markup language. The language provides a common representational framework in which to encode the idiosyncrasies of specific thesauri. This approach has several advantages: it offers consistent syntax and semantics in which to express thesauri; it allows general purpose thesaural applications to leverage many thesauri; and it supports a single thesaural user interface by which information communities can consistently organize, store and retrieve electronic documents.

 

An ontology, in computer science, has come to denote an explicitly specified conceptualization of part of the world. In software, an ontology is implemented as a data structure. What distinguishes the ontology from the data structure is semantics: that it talks about something in the world. An ontology provides users with a representation which is essential to effective communication and coordination.

 

The general thesaural ontology gives us a conceptual representation of thesauri. A thesaural markup language (TML) manifests this as a grammar in which to express the content and structure of specific thesauri. TML is specified as an XML schema which defines the permitted markup element types and embedding structure. The TML syntax consists of the element names and structure.

TML provides a way to represent task-domain specific thesauri and make them available to a document management system. In order to demonstrate this generality, the authors developed a Thesaural Explorer application. The Explorer reads a thesaurus from its TML file, presents it graphically, and supports browser style term navigation. The user selects a thesaurus to explore and then can navigate the structure along inter-term relations by clicking on terms or using various look up tables such as ordered lists by class, term alphabetic, and browsing history.
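As a rough illustration of encoding a thesaurus as XML and reading it back for inter-term navigation, the sketch below parses a small document with Python's standard `xml.etree.ElementTree`. The element and attribute names (`thesaurus`, `term`, `label`, `bt`, `nt`, `rt`, `ref`) are hypothetical and are not taken from the actual TML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; element names are illustrative, not the real TML schema.
SAMPLE = """
<thesaurus name="demo">
  <term id="t1"><label>Vehicles</label><nt ref="t2"/></term>
  <term id="t2"><label>Bicycles</label><bt ref="t1"/><rt ref="t3"/></term>
  <term id="t3"><label>Cycling</label><rt ref="t2"/></term>
</thesaurus>
"""


def load_thesaurus(xml_text):
    """Return {label: {relation: [related labels]}} for broader, narrower, related links."""
    root = ET.fromstring(xml_text.strip())
    labels = {t.get("id"): t.findtext("label") for t in root.iter("term")}
    result = {}
    for t in root.iter("term"):
        relations = {}
        for rel in ("bt", "nt", "rt"):
            refs = [labels[e.get("ref")] for e in t.findall(rel)]
            if refs:
                relations[rel] = refs
        result[labels[t.get("id")]] = relations
    return result


thesaurus = load_thesaurus(SAMPLE)
print(thesaurus["Bicycles"])  # → {'bt': ['Vehicles'], 'rt': ['Cycling']}
```

Resolving `ref` identifiers to labels up front is what lets a browser-style explorer follow relations by clicking from term to term.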

 

Library of Congress Portals Applications Issues Group. "List of Portal Application Functionalities for the Library of Congress", 2003. <http://www.loc.gov/catdir/lcpaig/> ; <http://www.loc.gov/catdir/lcpaig/portalfunctioanlitieslist4publiccomment1st-03.pdf> (June 3, 2003).

The list represents the results of market analysis to study portal functionality of particular products. Functionalities include: a) general requirements, b) client requirements, c) searching and search results, d) knowledge database, e) patron authentication, and f) portal administration and vendor support. One aspect of a portal is its database and the subject metadata used within it and its maintenance. LCPAIG focused its explorations and testing on portals as tools for organized knowledge discovery rather than as enterprise interfaces. Portals may be characterized by their ability to: a) assist users in identifying and selecting appropriate target resources, b) help users determine the target resources most useful to their research by providing effective search interfaces and an architecture that supports groupings and rich descriptions of resources, c) provide federated searching and information retrieval of descriptive metadata from multiple, diverse target resources, including but not limited to commercial or licensed electronic resources, databases, Web pages, and library catalogs, d) integrate and manage search results, e) save and export search results, f) link search results to full-text or other content delivery options, g) manage access to target resources and portal functionalities for authenticated users.

 

Several relevant points: The vendor must maintain descriptive metadata and configuration information for core target databases, including target title or name, subject terms, etc. Ability to locally define and configure composite search qualifier groupings, e.g. name/author. Ability for user to search descriptive metadata in multiple metadata forms. Ability for user to search by specific fields in advanced searches. Should support keyword and browse searches, including: a) ability to browse a list of targets, b) ability to search target descriptions by keyword, c) ability to present different views of targets (e.g. by subject, user group, etc.), d) ability to browse target resources in hierarchical displays, e) ability to browse a composite list of target resources (aggregated databases), f) ability to present different views of the target resources. Ability to integrate metadata for target resources from more than one source.

 

Lin, Dekang and Pantel, Patrick (2001). "Induction of Semantic Classes from Natural Language Text", in KDD-2001, Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 26-29, 2001, San Francisco, Calif., 317-322. <http://www.acm.org/sigkdd/kdd2001> (Oct. 26, 2002).

 

 

Lovins, Daniel. Thesaurus Design for Semantic Information Management. A day-long seminar led by Prof. Bella Hass Weinberg in New York, April 16, 2002, email (May 6, 2002)

Published: Cataloging and Classification Quarterly, 34, no. 4 (2003) <http://catalogingandclassificationquarterly.com/ccq36nr1news.html>

Bella suggested that "semantic information management" really just means vocabulary control; that ontology usually just means classification scheme, but sometimes gets used as a synonym for thesaurus; and that taxonomy is just a synonym for classification. Subject headings lists, such as LCSH, are essential tools for managing information in a print environment, while true thesauri are often more useful in the online environment (where they can be viewed hierarchically or combined in Boolean searches). Thesauri often run into the problem of needing to distinguish homographs. The problem in the selection of thesaurus terms is largely one of determining a set of appropriate lexemes, that is, the smallest units of lexicon that can be understood on their own terms. Synonymy is a common problem, though easily managed, e.g. Cancer, see Neoplasm. Other problems include having to choose between singular and plural forms, parts of speech, etc.
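The "Cancer, see Neoplasm" example is equivalence control: a nonpreferred entry term points to the preferred term, while homographs are kept apart with parenthetical qualifiers. A minimal sketch of both follows; the entries, qualifiers, and function names are hypothetical examples, not drawn from any particular thesaurus.

```python
# Nonpreferred entry term (lowercased) -> preferred term ("see" / USE references).
SEE_REFERENCES = {
    "cancer": "Neoplasm",
    "tumors": "Neoplasm",
}

# Preferred terms; the homograph "Mercury" is split by qualifiers.
PREFERRED_TERMS = {"Neoplasm", "Mercury (Chemical element)", "Mercury (Planet)"}


def resolve(term):
    """Return the preferred form of a term, or None if the vocabulary does not know it."""
    if term in PREFERRED_TERMS:
        return term
    return SEE_REFERENCES.get(term.lower())


def homographs(word):
    """List qualified preferred terms that share the same base word."""
    return sorted(t for t in PREFERRED_TERMS
                  if t.split(" (")[0].lower() == word.lower())


print(resolve("Cancer"))      # → Neoplasm
print(homographs("mercury"))  # → ['Mercury (Chemical element)', 'Mercury (Planet)']
```

Synonyms collapse to a single answer, whereas a bare homograph returns several candidates, which is why homographs need an explicit choice from the user or indexer.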

 

 

MACS: Multilingual Access to Subjects, 2002. <http://infolab.kub.nl/prj/macs> (Mar. 26, 2002).

MACS aims to provide multilingual subject access to library catalogues. It enables users to simultaneously search the catalogues of the project's partner libraries in the language of their choice (English, French, German). Partners are: Swiss National Library (SNL), Bibliothèque nationale de France (BnF), British Library (BL), and Die Deutsche Bibliothek (DDB); the project runs under the auspices of the Conference of European National Librarians (CENL). This multilingual search is made possible by the equivalence links created between the three indexing languages used in these libraries: SWD, RAMEAU, and LCSH. Topics (headings) from the three lists are analyzed to determine whether they are exact or partial matches, and whether they are of a simple or complex nature. The end result is neither a translation nor a new thesaurus but a mapping of existing and widely used indexing languages.

 

MACS (Multilingual Access to Subjects) Project, report for 2000-2001. <http://infolab.kub.nl/prj/macs/pub/MACSreport3.pdf> (Aug. 7, 2002)

MACS is a cooperative Conference of European National Librarians (CENL) project to develop a prototype system for providing multilingual subject access searching across the catalogs of the partner libraries, in order to: 1) research the technical and organizational issues involved in managing a working system for creating and maintaining links between the three subject heading lists (SHLs), and 2) demonstrate the effectiveness of the linked SHLs for retrieving results for the end user. The CoBRA study group defined a specific approach to mapping headings based on a number of core principles, including: 1) all SHLs are equal, 2) headings are only mapped to equivalent headings judged to be synonymous in meaning, 3) hierarchical structures and thesaural relationships are not mapped or reproduced as part of the process of linking individual headings, 4) only headings at the authority level are linked, 5) where an equivalence cannot be found, a proposed heading should stand alone in the system to represent the concept (for possible future mapping). Items are cataloged in the local library's language and SHL. Hierarchical navigation is only possible within each SHL, so it is envisaged that users refine their searches in their own language until the required concept is identified, and then expand them to linguistic equivalents and documents in other libraries. Two interfaces are proposed: 1) a Link Management Interface to support the management of links, including their creation and maintenance; 2) a User Search Interface to support end-user searching and links to the partners' catalogs. Partners share equal responsibility for authorizing links and validating links proposed to their own SHL. MACS is to be an external link database, with each SHL remaining independent and linked to other SHLs only through MACS.
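Several of the CoBRA principles above (an external link database, all SHLs treated as equal, no hierarchy copied, and unmatched headings left standing alone) can be sketched as a small data structure. The `LinkDatabase` class, its methods, the "exact"/"partial" labels, and the sample headings are illustrative assumptions, not MACS's actual data model or link content.

```python
from dataclasses import dataclass, field


@dataclass
class LinkDatabase:
    """External store of equivalence links; the SHLs themselves are never modified."""
    # Each link maps an SHL name to one authority heading; no SHL is privileged.
    links: list[dict] = field(default_factory=list)

    def add_link(self, headings: dict[str, str], match: str = "exact") -> None:
        """Record authority headings judged equivalent; match is 'exact' or 'partial'."""
        if match not in ("exact", "partial"):
            raise ValueError("match must be 'exact' or 'partial'")
        self.links.append({"headings": dict(headings), "match": match})

    def equivalents(self, shl: str, heading: str) -> dict[str, str]:
        """Return the headings linked to `heading` in every other SHL."""
        for link in self.links:
            if link["headings"].get(shl) == heading:
                return {name: h for name, h in link["headings"].items() if name != shl}
        # No equivalence recorded: the heading stands alone until a link is proposed.
        return {}


db = LinkDatabase()
db.add_link({"LCSH": "Soccer", "RAMEAU": "Football", "SWD": "Fußball"})
print(db.equivalents("SWD", "Fußball"))  # → {'LCSH': 'Soccer', 'RAMEAU': 'Football'}
```

Because lookups start from any SHL and return the other two, no list acts as the pivot, and broader/narrower structure stays inside each SHL rather than in the link store.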

 

Mai, Jens-Erik. "The Future of General Classification."  Cataloging & Classification Quarterly, v. 37, no. 1 / 2 (2003): 3-31.

Summary: Discusses problems related to accessing multiple collections using a single retrieval language. Surveys the concepts of interoperability and switching language. Finds that mapping between indexing languages will always be an approximation.

 

The paper treats the issues related to subject representation and focuses on the use of general classification schemes for accessing documents across domains and collections. The goal of interoperability is to build coherent services for users from components that are technically different and managed by different organizations. This requires agreements on three levels: technical, content, and organizational. The problem in using switching languages lies in mapping the meaning of words in the context of each language. Mapping will always be an approximation due to pre-coordination, hierarchical structure, and the absence of matching concepts.

 

Maniez, Jacques. "Database Merging and the Compatibility of Indexing Languages," in Knowledge Organization, 24, no.4 (1997): 213-224.

This article contains succinct and critical descriptions of concordance tables, switching languages, and reference languages, and their usability in the harmonization of information languages.

 

McKiernan, Gerry.  Beyond Bookmarks: Schemes for Organizing the Web, 2001. <http://www.public.iastate.edu/~CYBERSTACKS/CTW.htm> (Aug. 6, 2002).

Schemes for Organizing the Web is a clearinghouse of World Wide Web sites that have applied or adopted standard classification schemes or controlled vocabularies to organize or provide enhanced access to Internet resources. Covers classification systems (alphabetic, numeric, alphanumeric) and controlled vocabularies.

 

Medical Subject Authority in OCLC: Background and Resources. Informal discussion during ALA Midwinter 2002, January 18, 2002.  <http://corc.oclc.org/WebZ/XpathfinderQuery?sessionid=0:term=3049:xid=LTM> (March 26, 2002).

An OCLC pathfinder listing resources dealing with inclusion of medical subject heading authority records in OCLC services.

 

MetaSearch Initiative. <http://www.niso.org/committees/MetaSearch-info.html> (May 10, 2003).

Metasearch, parallel search, federated search, broadcast search, cross-database search, search portal have become commonplace in the information community's vocabulary. They speak to a common theme of allowing search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at once.

One-search access to multiple resources holds the promise of enabling libraries to offer portal environments so their users can enjoy the same easy searching found in web-based services like Google.
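The common theme of a single search spanning several sources at once can be sketched as a parallel fan-out that queries each target concurrently and merges the results. The two source functions below are hypothetical in-memory stand-ins; a real metasearch service would call each target over its own protocol and would also need timeouts, error handling, and cross-source relevance ranking.

```python
from concurrent.futures import ThreadPoolExecutor


def catalog_source(query):
    """Hypothetical target 1: a tiny in-memory 'catalog'."""
    records = ["Thesaurus construction", "Subject access in catalogs"]
    return [r for r in records if query.lower() in r.lower()]


def database_source(query):
    """Hypothetical target 2: a tiny in-memory 'database'."""
    records = ["Subject access in catalogs", "Thesaurus interoperability"]
    return [r for r in records if query.lower() in r.lower()]


def metasearch(query, sources):
    """Send the query to every source in parallel, then merge and de-duplicate results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map returns results in the same order as `sources`.
        result_lists = list(pool.map(lambda source: source(query), sources))
    merged, seen = [], set()
    for results in result_lists:
        for title in results:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


print(metasearch("thesaurus", [catalog_source, database_source]))
# → ['Thesaurus construction', 'Thesaurus interoperability']
```

De-duplication by exact title is deliberately naive here; matching the same work across sources is one of the harder parts of real federated search.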

 

Michel, Dee and Kuhr, Pat. Taxonomy of Subject Relationships. Appendix B, 1996. <http://www.ala.org/alcts/organization/ccs/sac/appendxb.html> (March 26, 2002).

Shows associative, equivalence, and hierarchical relationships.

 

Miles, Alistair and Brickley, Dan. SKOS Core Guide. <http://www.w3.org/TR/swbp-skos-core-guide/> (Aug. 25, 2005).

SKOS stands for Simple Knowledge Organisation System. The name SKOS was chosen to emphasise the goal of providing a simple yet powerful framework for expressing knowledge organisation systems in a machine-understandable way.

 

A 'concept scheme' is defined here as: a set of concepts, optionally including statements about semantic relationships between those concepts. Thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries and other types of controlled vocabulary are all examples of concept schemes.

 

SKOS Core provides a model for expressing the basic structure and content of concept schemes (thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries and other types of controlled vocabulary).

The SKOS Core Vocabulary is an application of the Resource Description Framework (RDF), that can be used to express a concept scheme as an RDF graph. Using RDF allows data to be linked to and/or merged with other RDF data by semantic web applications.

This document is a guide to using the SKOS Core Vocabulary, for readers who already have a basic understanding of RDF concepts.
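To show what expressing a concept scheme as an RDF graph looks like, the sketch below serializes a tiny two-concept scheme as Turtle using only string formatting. `skos:ConceptScheme`, `skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:narrower`, and `skos:inScheme` are SKOS Core terms; the `http://example.org/thesaurus/` namespace, the concept identifiers, and the labels are placeholders, and a production system would more likely use an RDF library than hand-built strings.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # placeholder namespace


def to_turtle(scheme_id, concepts):
    """Serialize {id: {"pref": str, "alt": [str], "broader": [id]}} as SKOS Turtle.

    Labels are assumed not to contain double quotes (no escaping is done).
    """
    lines = [f"@prefix skos: <{SKOS}> .", f"@prefix ex: <{BASE}> .", ""]
    lines.append(f"ex:{scheme_id} a skos:ConceptScheme .")
    # Derive skos:narrower as the inverse of each skos:broader statement.
    narrower = {cid: [] for cid in concepts}
    for cid, c in concepts.items():
        for parent in c.get("broader", []):
            narrower[parent].append(cid)
    for cid, c in concepts.items():
        props = [f'skos:prefLabel "{c["pref"]}"@en', f"skos:inScheme ex:{scheme_id}"]
        props += [f'skos:altLabel "{alt}"@en' for alt in c.get("alt", [])]
        props += [f"skos:broader ex:{p}" for p in c.get("broader", [])]
        props += [f"skos:narrower ex:{n}" for n in narrower[cid]]
        lines.append(f"ex:{cid} a skos:Concept ;\n    " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


concepts = {
    "transport": {"pref": "Transport", "alt": ["Transportation"]},
    "bicycles": {"pref": "Bicycles", "broader": ["transport"]},
}
ttl = to_turtle("demo", concepts)
print(ttl)
```

Because the output is ordinary RDF, it could be merged with other RDF data about the same concept URIs, which is the linking benefit the guide describes.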

 

See also  Quick Guide to Publishing a Thesaurus on the Semantic Web http://www.w3.org/TR/swbp-thesaurus-pubguide/

See also the SKOS Core Vocabulary Specification http://www.w3.org/TR/swbp-skos-core-spec

 

Miller, Libby, Brickley, Dan and Hamilton, Martin. Imesh Tk: Subject Gateway Review Literature Review, 2002.  <http://www.ilrt.bris.ac.uk/discovery/2000/09/imesh/> (Aug. 6, 2002)

The goal of the literature review is: a) to try to define the scope of the IMesh Toolkit, b) its purpose - improve speed of searching, c) enable cross-searching more easily between gateways, or enable portalization of gateways, d) draw together existing research, e) summarize current and possible future technologies, f) form preliminary conclusions about possible architectures which could be used in the IMesh Toolkit.

 

Miller, Ken and Matthews, Brian. "Having the Right Connections: the LIMBER Project." JoDi, 1, no. 8 (Aug. 2001). <http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Miller/> (Aug. 2, 2002).

Cross-discipline interoperability will be provided via a uniform metadata description. In addition, the provision of multilingual user interfaces and the controlled vocabulary of a multilingual thesaurus will make these datasets globally accessible in a range of end-user natural languages. LIMBER will use the multilingual European Language Social Science Thesaurus (ELSST), derived and translated from HASSET. Tools developed in LIMBER will work with any thesaurus marked up in the LIMBER RDF format, and the semi-automatic indexing tool will apply keywords from these thesauri to any metadata record marked up in either XML or RDF. LIMBER will still be able to provide multilingual interfaces to thesaurus-aided searching across domains, using thesauri conforming to the LIMBER RDF schema and retrieving metadata mapped to the Dublin Core, with assigned keywords translated back to the user's native language, the underlying metadata having been semi-automatically indexed by terms from the conforming thesauri. The project plans to develop a high-level object-oriented conceptual model that could be translated into whichever format becomes internationally accepted. All screens and drop-down menus will be available in German, French, Spanish, and English to begin with, but defined in a standard format that can easily be translated to other languages in the future. LIMBER is designed as three stand-alone products: 1) a multilingual thesaurus management tool, 2) a user browsing interface, 3) a semi-automatic indexing tool.

 

 Miller, Joseph. "An Overview of Subject Cataloging and the Absence of a Code." Presented at ARLIS/NA Annual Conference, Pittsburgh, March 2000. <http://www.geocities.com/WestHollywood/9783/arllisna/miller.html> (March 26, 2002).

Subject cataloging deals with what a book or other library item is about, and the purpose of subject cataloging is to list under one uniform word or phrase all the materials on a given topic that a library has in its collection. A subject heading is that uniform word or phrase used in the library catalog to express a topic. The use of authorized words or phrases only, with cross-reference from unauthorized synonyms, is the essence of bibliographic control in subject cataloging.

 

Miller, Paul. "I Say What I Mean, but Do I Mean What I Say?" Ariadne, 23 (2000).  <http://www.ariadne.ac.uk/issue23/metadata/> (Aug. 7, 2002).

Addresses: 1) issues surrounding the use of controlled vocabulary, 2) the recent MODELS 11 workshop, 3) some recommendations for future work. First, there is a need for some mechanism for querying multiple resources simultaneously. Second, there is a need for some commonality of content or description across the information resources being made available for searching. To ensure common meanings across applications, between users, and between applications, the normal solution is to impose a degree of control upon the terms used by both parties. At its most basic, this control will involve no more than defining a list of words from which application and user have to select. In more complex instances, fully formed thesauri may be employed, rich with hierarchy, synonyms, and relationships. In an uncontrolled environment, users will consistently either use the wrong terms or use the right terms in the wrong contexts. In the same uncontrolled environment, creators will potentially use terms inconsistently. Terminology tools are: controlled vocabularies (created manually or generated automatically by harvesting keywords), alphanumeric classification schemes, and thesauri. Thesauri follow the structural guidelines in ISO 2788 or ISO 5964 and include synonyms, complex hierarchies, scope notes, and inter-relationships (equivalence, hierarchy, association). MODELS 11's aim was to explore the value and practicality of creating a single high-level thesaurus. There is a need to study user behavior with respect to terminology.

 

Milstead, Jessica. Report on the Workshop on Electronic Thesauri, November 4-5, 1999. Presented at NISO/APA/ASI/ALCTS. <http://www.niwo.org/news/events_workshops/thes99rprt.html> (March 26, 2002). No longer available.

The definition of "thesaurus" for purposes of this meeting was broader than that of the present standard for thesauri, ANSI/NISO Z39.19-1993 (R1998). The meeting considered vocabularies that meet two basic criteria: 1) they are used to facilitate the analysis of texts and their subsequent retrieval (or the retrieval of the information they contain); and 2) they include a rich set of semantic relationships among their constituent terms. The scope included standard thesauri, subject heading lists, semantic networks, and taxonomies (Internet directories). It excluded dictionaries and simple term lists without equivalence relationships.

 

They identified 4 key issues:

1) the need for (and feasibility of developing) a standard that speaks to criteria and/or methods for generating thesauri by machine-aided or automatic means

2) the need for (and feasibility of developing) a standard set of tools which show semantic relationships among terms, as aids to text and information analysis and retrieval

3) the need for (and feasibility of developing) a standard structure that supports a variety of electronic thesaurus displays

4) the need for (and feasibility of developing) a standard that supports interoperability protocols, structures, and/or semantics applicable to thesauri.

 

Mongin, Larry, Yueyu Fu, Javed Mostafa. "Open Archives Data Service Prototype and Automated Subject Indexing Using D-lib Archive Content as a Testbed", D-Lib Magazine, 9, no. 12 (Dec. 2003). <http://www.dlib.org/dlib/december03/mongin/12mongin.htm> (Dec. 18, 2003).

The Indiana University School of Library and Information Science's laboratory has as its purpose work in the areas of information retrieval and information visualization. The lab decided to use OAI-PMH as a resource discovery tool. Since the D-Lib metadata file does not contain subject terms, they used IR algorithms to generate them. After running the Java program that computed the subject terms, they read each article to judge whether the computed terms were relevant to it. The criterion was not whether the program selected the best subject terms for that text, but rather whether the terms generally reflected the semantic meaning of the article. The resulting scores ranged from 70% to 95%.
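The annotation does not say which IR algorithms the Java program used. As an assumption only, the sketch below uses plain TF-IDF weighting, a common way to pick candidate subject terms from full text; the function names `tokenize` and `top_terms` are hypothetical and this is not the laboratory's code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(docs, k=3, stopwords=frozenset()):
    """Return the k highest-scoring TF-IDF terms for each document.

    Generic TF-IDF used for illustration; not the algorithm in the article.
    """
    tokenized = [[t for t in tokenize(d) if t not in stopwords] for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency (normalized by length) times inverse document frequency.
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


docs = [
    "metadata harvesting with OAI-PMH harvesting protocol",
    "subject indexing with controlled vocabulary indexing",
    "metadata for subject gateways",
]
print(top_terms(docs, k=3, stopwords={"with", "for"}))
```

Terms that occur in every document get an IDF of zero, which is why generic words tend to drop out; a human check, as in the article, is still needed to judge relevance.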

 

Murata, Masaki, and others. "Meaning Sort - Three Examples: Dictionary Construction, Tagged Corpus Construction, and Information Presentation System," ArXiv, 12 March 2001 <http://arxiv.org/abs/cs/0103012> (Feb. 17, 2005)

It is often useful to sort words into an order that reflects relations among their meanings as obtained by using a thesaurus. In this paper, the authors introduce a method of arranging words semantically by using several types of "is-a" thesauri and a multi-dimensional thesaurus.

 

Murray-Rust, Peter and West, Lesley. Terminology in a Global Context: VHG and XML. Part II, 2002. <http://www.vhg.org/uk.pub/vhgnews2.html> (March 26, 2002). No longer available.

 The aim of this article is to set out the technical aspects of VHG.

XML is ideally suited to delivering terminology over the web. Thus, in the spirit of XML, a simple subset of ISO FDIS 12620 data categories is chosen to represent the commonality of the semantics of a majority of web-based glossaries. VHG is a platform- and convention-independent specification. We put a high value on interoperability and achieve this by relying on several current W3C initiatives in XML. Semantics are added through a mechanism that links any tags starting with <VHG: to the semantics at a Uniform Resource Locator (URL). This distinguishes the VHG approach: when someone encounters a VHG glossary, it is self-identifying and can be processed with VHG-compliant software. In a related manner, a document can link to a number of glossaries simultaneously, using either absolute URLs or a namespace mechanism. An element in a document linked to any number of glossaries may provide complementary or even conflicting views. In the spirit of the WWW, the reader of the document resolves the appropriate ontology.
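The self-identifying mechanism described above works like an XML namespace: a prefix such as `vhg:` is bound to a URI, and software recognizes glossary markup by that URI rather than by the prefix. The sketch below illustrates this with Python's standard ElementTree; the namespace URI and the `term` element name are hypothetical placeholders, not taken from the VHG specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element name; the actual VHG
# specification is not reproduced here.
GLOSSARY_NS = "http://example.org/vhg-glossary"

SAMPLE = """<doc xmlns:vhg="http://example.org/vhg-glossary">
  <p>See <vhg:term>thesaurus</vhg:term> and <vhg:term>ontology</vhg:term>.</p>
  <term>not a glossary term</term>
</doc>"""


def glossary_terms(xml_text, namespace):
    """Return the text of every <term> element bound to the given namespace.

    ElementTree expands a prefixed name such as vhg:term into
    {namespace-URI}term, so matching is done on the URI, not the prefix.
    """
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{namespace}}}term")]


print(glossary_terms(SAMPLE, GLOSSARY_NS))
```

The unprefixed `<term>` element is ignored because it belongs to no namespace; a document bound to several glossary namespaces could be queried once per URI.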

 

National Library of Medicine. (2005). Fact sheet: UMLS metathesaurus, 2005. <http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html> (Jan. 7, 2005)

 

Neuroth, Heike and Koch, Traugott. Cross-browsing and Cross-searching in a Distributed Network of Subject Gateways: Architecture, Data Model, and Classification, 2001. <http://www.stk.cz/elag2001/Papers/HeikeNeuroth/HeikeNeuroth.html> (Aug. 8, 2002).

The aim of the Renardus project is to provide users with integrated access, by searching or browsing through a single interface, to partners' quality-controlled subject gateways. Further goals are to develop and define organizational models, business models, technical solutions, and metadata standards (Renardus Application Profile, Renardus Namespaces, Renardus Collection Level Description). The following elements can be used to define a quality-controlled subject gateway: selection and collection development, collection management, creation, resource description and metadata, subject access, search and browse access, standards, and value-adding features. Each participating partner is responsible for mapping its metadata format to the common Renardus metadata format, which is derived from Dublin Core. A generic normalization toolkit with Z39.50 configuration files and a conversion script was provided. Each participant set up a Renardus server with its content normalized to the Renardus data model. A set of screens was built for the user interface: Homepage, Advanced Search screen, Index scan window, Advanced search page after index scan, Browse by subject screen, (Preliminary) Result screen, Sorted result screen, Participating gateways screen, and Help (index) screen. To support subject browsing, the various systems will be mapped to a common classification system. The Renardus service will give access to resources on all kinds of subjects, published worldwide and in many languages, and it is intended to be offered to an international, multidisciplinary community of users. Dewey Decimal Classification was chosen because of its online availability and tools, global usage, the suitability of the classification system and its functionality, the frequency and character of its updates, and the research and methodological development efforts behind it.

 

Neuroth, Heike. Metadata issues: Renardus. Presented at Cultural Heritage Projects Concertation Event, June 30, 2000, Bundesamtsgebaude Wien. <http://www.cscaustria.at/events/documents/renardus.ppt> (Aug. 6, 2002).

The data model is mostly Dublin Core compatible, with some Renardus-specific extensions. The definition of a Renardus schema is in progress. How to handle multilinguality still needs to be addressed.

 

Neuroth, Heike and Koch, Traugott. "Metadata Mapping and Application Profiles: Approaches to Providing the Cross-searching of Heterogeneous Resources in the EU project Renardus", 2001. <http://www.lub.lu.se/~traugott/drafts/DC2001-neuroth.pdf> (Nov. 7, 2002).

The paper presents the approach and results of a mapping process to define a common metadata format for cross-searching distributed and heterogeneous subject gateways in the EU project Renardus. The outcome is a well-defined data model with semantic and syntactic definitions of each metadata element, which results in richer and semantically controlled cross-searching. The metadata elements are mainly based on Dublin Core. The aim of Renardus is to provide users with integrated access, through a single interface, to high-quality Internet resources. It also aims to provide high-quality subject access by indexing resources with controlled vocabularies and by offering a deep classification structure for advanced searching and browsing. All gateways participating in Renardus apply resource descriptions and subject classification to all their records. Participants have agreed to use a core set of metadata elements and qualifiers: Title, Creator, Description, Subject, Identifier, Language, and Type, plus Country. Further, they focused on the following characteristics of each metadata element: semantic definition, syntactic definition, associated qualifiers, cataloging rules, namespace definition, repeatability, form of obligation, and language qualifiers. For Subject, Renardus has four different namespaces; in addition, a cross-browsing structure will be developed based on Dewey Decimal Classification with added European-specific captions.

 

Nicholson, Dennis. "Subject-based Interoperability: Issues from the High Level Thesaurus (HILT) Project." Paper presented at the 68th IFLA Council and General Conference, Glasgow, Scotland, August 18-24, 2002. <http://www.ifla.org/IV/ifla68/prog02.htm> (2002)

HILT Phase II will create a pilot terminologies mapping service, or route map (TeRM), with a specific focus on current concerns in the developing Distributed National Electronic Resource (DNER), primarily covering higher education. HILT Phase I discovered that the various service providers use a range of subject schemes (LCSH, UNESCO, DDC, AAT, MeSH). If cross-searching and browsing are to function coherently for users of the Information Environment (IE), these multiple, varied subject schemes must be mapped to one another, perhaps using a common 'spine' such as DDC, which has international and multilingual application and the potential to facilitate machine-to-machine interworking. Terms must be disambiguated, then translated into the service-assigned terms that users need to cross-search and browse the group of services relevant to their query. The aim of HILT Phase II is to build and evaluate a pilot service that will mediate as a DNER shared service in the IE. The pilot TeRM would be built using commercially available Wordmap software (http://www.wordmap.com; examples at http://www.oingo.com or http://vivisimo.com). The initial illustrative TeRM would be based on the RDN (http://www.rdn.ac.uk/cgi-bin/browse) terminologies available as part of the Wordmap taxonomies set, which include, in particular, a set of terms used by general Internet users, and on selective subsets of LCSH, DDC, UNESCO, and AAT. At issue is whether a spine such as DDC should be used as the scheme to which everything else is mapped, and whether it is better to adopt (or adapt) an existing scheme or to create a new one. The aim is to use the 'native subject schemes' of the collections in the environments where users use them, and to use the pilot TeRM to 'disambiguate' user terms and resolve differences between schemes. TeRM supports the creation, editing, and display of a terminologies map showing the terms in use and their inter-relationships, and supports user (through the user interface), staff, and system interaction with that map.
It interacts with users and systems to establish the term and service context of a search (e.g. archives only), and provides synonyms; broader, narrower, and related terms; other contexts; and service-set navigational aids for cross-searching and browsing as required. See: http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html

 

 Nicholson, Dennis, and others.  HILT: High-Level Thesaurus Project: Final Report to RSLP & JISC, December 2001. <http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html> (Oct. 29, 2002).

There is evidence of growing agreement that interoperability of subject schemes in a distributed environment is an issue and that a standards-based approach is the answer, but no evidence to suggest that one particular scheme or single approach will provide the answer. There is very little information available on the needs and behavior of users with regard to subject searching in a distributed environment. It is suggested that a mix of controlled vocabularies and free text in searching gives the best results and is preferred by users. HILT's recommended option was to map LCSH, AAT, UNESCO, and UDC to DDC: set up a mapping service, ideally with international participation and support, and gradually build towards a complete mapping of LCSH, UNESCO, UDC, and AAT to a DDC backbone. Conclusion: the best way forward for HILT was a pilot mapping service as described in option 5.2. The pilot should: have a strong user focus; determine reliable costs, including cost-benefits; involve international players; look at how best to integrate Semantic Web and artificial intelligence developments; involve a broad range of target services; use existing machine-readable mappings wherever possible; be closely linked to a cross-sectoral and cross-domain task force; use contexts, relationships, clustering, etc.; and look at user terminology as against DDC as the central spine to which other schemes were to be mapped. DDC by itself is not a solution, but DDC mapped to more specific subject schemes was worth piloting.

 

Nicholson, Dennis. "HILT High Level Thesaurus Project : Interoperability and Cross-searching Distributed Services." Presented at Satellite Meeting: Subject Retrieval in a Networked World, OCLC, Dublin, Ohio, Aug. 14-16, 2001. <http://hilt.cdlr.strath.ac.uk/Dissemination/Talks/HILTD%20Nicholson.ppt> (Oct. 26, 2002).

Published in C. C. Chen (Ed.), Global digital library development in the new millennium: Fertile ground for distributed cross-disciplinary collaboration. Beijing: Tsinghua University Press, 2001.

Presentation describes background of HILT and the HILT Stakeholder Survey.

 

 

Nicholson, Dennis. HILT: High Level Thesaurus Project : Investigating the Problems of Cross-searching Distributed Services by Subject in the UK. Paper presented at Satellite Meeting: Subject Retrieval in a Networked World, OCLC, Dublin, Ohio, Aug. 14-16, 2001.   <http://hilt.cdlr.strath.ac.uk/Dissemination/Talks/hiltchin2.ppt> (Oct. 26, 2002).

 

NISO. Developing the Next Generation of Standards for Controlled Vocabularies and Thesauri (2005). <http://www.niso.org/committees/MT-info.html> (Feb. 15, 2005)

 

NISO Z39.19-1993. American National Standards Institute. Guidelines for the Construction, Format, and Management of Monolingual Thesauri, 1993. <http://www.niso.org/standards/resources/Z39-19.html> (July 1, 2002).

Abstract.

A thesaurus is a controlled vocabulary arranged in a known order and structured so that equivalence, homographic, hierarchical, and associative relationships among terms are displayed clearly and identified by standardized relationship indicators that are employed reciprocally. The primary purposes of a thesaurus are a) to facilitate the retrieval of documents, and b) to achieve consistency in the indexing of written or otherwise recorded documents and other items, mainly for post-coordinate information storage and retrieval systems. This standard provides guidelines for constructing monolingual thesauri: formulating the descriptors, establishing relationships among terms, and effectively presenting the information in print and on a screen. It also includes thesaurus maintenance procedures and recommended features of thesaurus management systems.
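To illustrate what "relationship indicators that are employed reciprocally" means in practice, the sketch below keeps USE/UF, BT/NT, and RT links in step: recording one direction always records its reciprocal. The class and method names are hypothetical and are not taken from the standard; homographic relationships (qualifiers) are omitted for brevity.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal monolingual thesaurus with reciprocal relationship indicators.

    Hypothetical illustration; not an implementation of Z39.19.
    """

    def __init__(self):
        self.use = {}               # non-preferred term -> preferred term (USE)
        self.uf = defaultdict(set)  # preferred term -> non-preferred terms (UF)
        self.bt = defaultdict(set)  # term -> broader terms (BT)
        self.nt = defaultdict(set)  # term -> narrower terms (NT)
        self.rt = defaultdict(set)  # term -> related terms (RT)

    def add_equivalence(self, non_preferred, preferred):
        # USE and its reciprocal UF are recorded together.
        self.use[non_preferred] = preferred
        self.uf[preferred].add(non_preferred)

    def add_hierarchy(self, narrower, broader):
        # BT and its reciprocal NT are recorded together so they never diverge.
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_association(self, a, b):
        # RT is symmetric, so it is recorded in both directions.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def preferred(self, term):
        """Follow a USE reference from an entry term to its descriptor."""
        return self.use.get(term, term)


t = Thesaurus()
t.add_hierarchy("Subject gateways", "Information services")
t.add_equivalence("Quality-controlled gateways", "Subject gateways")
t.add_association("Thesauri", "Classification schemes")
print(t.preferred("Quality-controlled gateways"))
```

Writing both directions in a single method is one simple way to guarantee reciprocity; a production system would also need deletion and validation routines.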

 

Normore, Lorraine and Bendig, Mark. "Using a Classification-based Information Space."  Paper presented at Satellite Meeting: Subject Retrieval in a Networked World, OCLC, Dublin, Ohio, Aug. 14-16, 2001. <http://staff.oclc.org/~normorel/ppt/ifla_preconf_2001.htm> (Oct. 26, 2002).

The goals of the project were to: 1) use information visualization to help searchers understand and explore information spaces, and 2) use the metadata in library records to accomplish this end, specifically by exploring the use of a classification system. One approach is cluster-based spaces, which use clustering to coalesce documents or topics, multidimensional scaling techniques to create the space, and spatial metaphors to show relationships. Users infer the semantics of the space from the characteristics of the clusters.

 

Olson, Tony. “Integrating LCSH and MeSH in information systems.” Subject retrieval in a networked environment: Papers presented at an IFLA satellite meeting sponsored by the IFLA section on classification and indexing & IFLA section on information technology, Dublin, Ohio, 2001.

 

Olson, Tony. "The Integration of Information Languages and Interoperability." Presented at "Real World Steps to Interoperability in Libraries", ALCTS/LITA Authority Control in the Online Environment Interest Group, ALA Annual Conference, June 16, 2002. <http://www.lita.org/igs/Acig/2002authcontrol.pdf> (Nov. 8, 2002).

There are two types of indexing languages: 1) information languages and 2) natural languages. Information languages include classification systems (e.g. DDC), controlled vocabularies (e.g. thesauri such as AAT), and subject heading lists (e.g. LCSH). Issues regarding controlled vocabularies are discussed. By their very nature, different controlled vocabularies are incompatible: while controlled vocabularies promote consistency within the systems for which they are designed, they tend to reduce intersystem and database compatibility. Major problems are: 1) conflicts between cross-references in one vocabulary and established headings in other vocabularies; 2) no references or links between corresponding headings from different vocabularies; 3) differences in syntax in the construction of subject heading strings; 4) although a substantial majority of the correspondences between terms in different vocabularies may be one-to-one, a significant number are not; 5) differences in semantic relationships between vocabularies, which in turn also lead to one-to-many correspondences; and 6) identical headings in different vocabularies can cause the retrieval of duplicate entries.

 

Methods and projects undertaken to integrate various information languages include: 1) mapping to a larger metathesaurus, e.g. the Unified Medical Language System (UMLS), which integrates over 60 biomedical vocabularies and classifications and links many different names for the same concepts, or the H.W. Wilson mapping of 12 different Wilson vocabularies; 2) the Multilingual Access to Subjects (MACS) project, an example of integrating multiple subject languages by providing links between equivalent subject headings; 3) using a reference language, in which terms from various information languages are mapped to a term (or classification number) in a single information language (the reference language); 4) the High Level Thesaurus (HILT) project, which studied the problems of incompatibility among the information languages used by various libraries and information centers, and one of whose recommendations was to set up a mapping service that would eventually carry out a mapping of LCSH, the UNESCO thesaurus, AAT, and UDC to a DDC backbone as the reference language; 5) the Renardus project, in which the local classification schemes used in subject gateways are mapped to DDC; and 6) the LCSH/MeSH mapping project at Northwestern University, another approach to the integration of controlled vocabularies.
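The reference-language method (item 3, and the DDC backbone proposed by HILT and used by Renardus) can be sketched as a two-step lookup: map a source term to the spine, then collect every target-scheme term mapped to the same spine codes. The crosswalk data and function names below are invented for illustration and have not been checked against DDC or any of the vocabularies named; a one-to-many mapping is included to show why such correspondences produce several candidates that need editorial review.

```python
from collections import defaultdict

# Illustrative crosswalk only: the headings and DDC-style numbers are
# invented for this example and have not been checked against any scheme.
SPINE_MAP = {
    ("LCSH", "Dogs"): {"636.7"},
    ("MeSH", "Dogs"): {"636.7"},
    ("AAT", "dogs (species)"): {"636.7"},
    ("LCSH", "Subject cataloging"): {"025.47"},
    ("UNESCO", "Subject indexing"): {"025.47", "025.48"},  # one-to-many
    ("UDC", "Indexing"): {"025.48"},
}


def build_reverse_index(spine_map):
    """Invert (scheme, term) -> spine codes into spine code -> {(scheme, term)}."""
    reverse = defaultdict(set)
    for key, codes in spine_map.items():
        for code in codes:
            reverse[code].add(key)
    return reverse


def equivalents(scheme, term, target_scheme, spine_map=SPINE_MAP):
    """Find candidate terms in target_scheme by routing through the spine."""
    reverse = build_reverse_index(spine_map)  # rebuilt per call for simplicity
    matches = set()
    for code in spine_map.get((scheme, term), set()):
        for other_scheme, other_term in reverse.get(code, set()):
            if other_scheme == target_scheme:
                matches.add(other_term)
    return matches


print(equivalents("UNESCO", "Subject indexing", "LCSH"))
```

Because no scheme is mapped directly to any other, adding a new vocabulary only requires mapping it to the spine, which is the main attraction of the backbone approach.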

  

In the LCSH/MeSH mapping project, instead of creating a separate database that contains the linking data, the data is entered into the authority records of the vocabularies being mapped. The project developed a combination of computer-assisted techniques and human editorial review. The 750 and 788 linking entry fields are used to record the "opposite" heading. A difficult problem is mapping one-to-many correspondences between headings in different controlled vocabularies.

 

Another aspect is the differing semantic relationships in different vocabularies. It was decided to map at similar levels and to use each vocabulary's structure to trace relationships. Two issues still exist: 1) broader/narrower term relationships are not explicit in MeSH but are implicit in its category (tree) numbers, so a program was written to take the data in 072 fields and put it in 550 fields of the authority records; 2) the syndetic structure of LCSH is not complete (especially as distributed), containing only narrower term references and not explicit broader term references. A further problem is the syntactic differences between subject heading strings in the various vocabularies, or the absence of any corresponding heading string.
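The idea behind deriving explicit broader terms from MeSH tree numbers can be sketched in a few lines: in a dotted tree number, the parent position is obtained by dropping the last segment, and a heading that occurs at several tree positions may receive several broader terms. This is not the Northwestern program (which worked on MARC 072 and 550 fields); the headings and tree numbers below are invented for illustration.

```python
def parent_tree_number(tree_number):
    """Return the parent of a dotted tree number, or None at the top level."""
    if "." not in tree_number:
        return None
    return tree_number.rsplit(".", 1)[0]


def broader_terms(tree_index):
    """Derive explicit BT links from an implicit tree-number hierarchy.

    tree_index maps each heading to the list of tree numbers it carries.
    A heading at several tree positions may get several broader terms.
    """
    by_number = {num: heading for heading, nums in tree_index.items() for num in nums}
    result = {}
    for heading, numbers in tree_index.items():
        parents = set()
        for num in numbers:
            parent = parent_tree_number(num)
            if parent is not None and parent in by_number:
                parents.add(by_number[parent])
        result[heading] = sorted(parents)
    return result


# Invented headings and tree numbers, not actual MeSH data.
TREE_INDEX = {
    "Animals": ["Q01"],
    "Vertebrates": ["Q01.100"],
    "Mammals": ["Q01.100.200"],
    "Pets": ["Q02.300"],
    "Dogs": ["Q01.100.200.400", "Q02.300.100"],
}
print(broader_terms(TREE_INDEX))
```

Each derived pair could then be written out as a broader-term reference in the heading's authority record.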

 

Olson, Tony and Strawn, Gary. "Mapping the LCSH and MeSH Systems." Information Technology and Libraries, 16, no. 1 (March 1997): 5-19.

In an effort to resolve the problems of having two subject systems in one online catalog, this project maps the LCSH and MeSH vocabularies. The two systems are integrated by a) mapping terms and headings from one system to corresponding headings in the other; b) adding the mapping data to authority records; and c) enhancing the library management system software so that the mapping data in authority records can be used to build syndetic structures that relate the two systems smoothly and consistently, enhancing subject retrieval.

 

Open Metadata Registry. <http://avalon.ulis.ac.jp/~sugimoto/RPs/dc2001.pdf> (Feb. 17, 2005)

The Open Metadata Registry has much in common with the SCHEMAS project. It will be used to promote the discovery and reuse of semantics within existing vocabularies and the creation of new vocabularies. It will register vocabularies relating to the Dublin Core Metadata Initiative.

 

Park, J., and Ram, S. “Information systems interoperability: What lies beneath?” ACM Transactions on Information Systems, 22, no. 4 (2004): 595-632.

 

Parsons, J., and Wand, Y.  “Choosing classes in conceptual modeling.” Communications of the ACM, 40 (1997): 63-69.

 

Patton, Glenn. "International Efforts to Improve Interoperability". Presented at "Real World Steps to Interoperability in Libraries", ALCTS/LITA Authority Control in the Online Environment Interest Group, ALA Annual Conference, June 16, 2002. <http://www.lita.org/lgs/Acig/authcontrol3_files/v3_slide0001.htm> (Nov. 8, 2002).

 

 

RDF Topicmaps : Theory. OCLC Research. <http://topicmap.oclc.org:5000/theory.html> (Oct. 31, 2002).

The goal of the Topicmaps project is to bootstrap efforts to meld natural-language-processing technologies with Semantic Web development. It comprises: 1) the noun phrase extractor, 2) the noun phrase filter, and 3) the relationship generator, whose goal was to identify simple, thesaurus-like relations such as "broader-than" using only a list of words as input.
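One simple way to propose "broader-than" relations from nothing but a list of noun phrases is a head-word suffix heuristic: English noun phrases usually end in their head noun, so "subject gateway" is plausibly narrower than "gateway". The sketch below is an assumption offered for illustration only; it is not the OCLC relationship generator, and `broader_than_pairs` is a hypothetical name.

```python
def broader_than_pairs(phrases):
    """Propose (narrower, broader) pairs using a head-word suffix heuristic.

    A phrase is proposed as narrower than another phrase in the list when
    the other phrase is a proper word-level suffix of it.
    """
    # Normalize case and whitespace so "Subject  gateway" matches "subject gateway".
    normalized = {" ".join(p.lower().split()) for p in phrases}
    pairs = set()
    for phrase in normalized:
        words = phrase.split()
        # Try the longest proper suffix first so the closest broader term wins.
        for i in range(1, len(words)):
            candidate = " ".join(words[i:])
            if candidate in normalized:
                pairs.add((phrase, candidate))
                break
    return sorted(pairs)


print(broader_than_pairs(["Subject gateway", "gateway", "quality subject gateway", "thesaurus"]))
```

The heuristic produces false positives (for example, phrases whose final word is not their head), so candidate pairs would still need filtering or review.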

 

Renardus. <http://www.renardus.org> (March 26, 2002).

Renardus is a collaborative project that aims to improve academic users' access to a range of existing Internet-based information services across Europe. The aim is to provide users with integrated access, through a single interface, to selected quality resources and other Internet-based distributed services. Renardus builds on the success of subject gateways, where subject experts select quality resources for their users, usually within the academic and research communities. Renardus is based on a distributed model in which major subject gateway services across Europe can be searched and browsed together through a single interface provided by the Renardus broker. A special feature of Renardus is the option to "Browse by Subject" through hierarchical trees of topics and subsequently to jump to one or several related subcollections of the contributing subject gateways. The Renardus service allows you to 1) search several subject gateways simultaneously, which means that you are searching the "catalogue records" (metadata) of quality-controlled Web resources, not the actual resources; or 2) browse through a hierarchy of subject categories in order to explore the parts of the participating subject gateways that contain Internet resources.

 

 

Renardus Project Deliverables (2000?)

This project deliverable intends to ensure that any chosen broker architecture for Renardus is based on existing models and/or emerging developments. It provides an extensive and comprehensive review of 18 existing broker models that have been developed for a variety of existing services, projects, or initiatives.

 

Renardus Project deliverable: specification of functional requirements for the broker system. <http://www.renardus.org/about_us/deliverables/d1_3/titlePage.html> (Aug. 7, 2002).

 

Evaluation report of existing broker models. <http://www.renardus.org/about_us/deliverables/d_1/D1_1summ.html> (Aug. 7, 2002).

 

Specification of functional requirements for the broker system. <http://www.renardus.org/about_us/deliverables/d1_3/D1_3bsumm.html> (Aug. 7, 2002).

 

Data model: requirements and specification. <http://www.renardus.org.about_us/deliverables/d6_4/D6_4summ.html> (Aug. 7, 2002).

 

Resnik, Philip (1995). "Disambiguating Noun Groupings with Respect to WordNet Senses." ArXiv (Nov. 29, 1995). <http://xxx.lanl.gov/abs/cmp-lg/9511006> (Feb. 17, 2005).

In word groupings within online thesauri, one is interested in the relationships among word senses, not just words. The paper presents a method for automatic sense disambiguation of nouns appearing within sets of related nouns - the kind of data one finds in online thesauri or as the output of distributional clustering algorithms.

 

Report of the SAC Subcommittee on Subject Reference Structures in Automated Systems: Recommendations for Providing Access to, Display of, Navigation within and among, and Modifications of Existing Practice Regarding Subject Reference Structures in Automated Systems. 2003. <http://www.ala.org/ala/alctscontent/catalogingsection/catcommittees/subjectanalysis/subjectreference/subjectreference.htm> (Feb. 17, 2005).

The subcommittee concentrated on maximizing the use of existing subject reference structures in automated systems. The recommendations are divided into four sections: access to reference structures, display of reference structures, navigation among and within reference structures, and changes to the policies and practices that govern the creation of the authority records that underlie these reference structures in automated systems.

 

Resource Discovery Network. "Renardus Project"; "Subject Portals Development Project". (2002) <http://rdn.ac.uk/projects/> (Completed projects) (Sept. 25, 2005)

 

Resource Organisation and Discovery in Subject-based Services: ROADS, 2000.  <http://www.ukoln.ac.uk/metadata/roads/> (Aug. 6, 2002)

The overall objective of the ROADS projects was to design and implement a user-oriented resource discovery system. It investigated the creation, collection, and distribution of resource descriptions to provide a transparent means of searching for and using resources. The objective was not to create an individual and idiosyncratic system but to draw on, and help create, standards of good practice that can be widely adopted by subject communities to aid and automate the process of resource organization and discovery. See http://www.ukoln.ac.uk/roads/

 

ROADS: Interoperability and Metadata, 1998. <http://www.ukoln.ac.uk/metadata/roads/interoperability/inter-meta.html> (Aug. 7, 2002).

ROADS began work in a context where interoperability is becoming increasingly important as a means to integrate the wide range of information services. Users require distributed information services to interwork in terms of search, location, and delivery. Semantic interoperability: users will be searching a variety of indexes constructed from a number of different underlying database structures. Effective searching across services requires that semantically equivalent fields in these indexes be mapped to each other. In addition, the semantics of the search (client) must be managed so that they match the semantics of the indexes (targets). Z39.50 allows indexes to be mapped to standard sets of attributes, hiding the underlying structure of the target database. A common indexing protocol enables the routing of queries to the most appropriate database via a mesh of centroids, or index summaries. The Resource Description Framework (RDF) aims to provide a framework for expressing machine-readable metadata about resources. It is designed to enable different applications to interoperate by using a common data model. RDF uses Extensible Markup Language (XML) as its encoding syntax.
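The Z39.50 idea of hiding each target's local index structure behind a standard attribute set can be sketched as a translation table. The client expresses a search with a Bib-1 "use" attribute (4 = Title, 21 = Subject heading, 1003 = Author, 1016 = Any), and each target resolves that attribute to its own local index. The target names, local index names, and the `index=term` query strings are hypothetical, and the sketch does not implement the Z39.50 protocol itself.

```python
# Bib-1 "use" attribute values for a few common access points.
BIB1_USE = {"title": 4, "subject": 21, "author": 1003, "any": 1016}

# Hypothetical per-target configuration: how each server's local indexes
# correspond to Bib-1 use attributes.
TARGET_INDEXES = {
    "gateway_a": {4: "ti", 21: "su", 1003: "au"},
    "gateway_b": {4: "dc_title", 21: "dc_subject", 1016: "fulltext"},
}


def translate(access_point, term, targets=TARGET_INDEXES):
    """Map one abstract query onto each target's local index, if supported.

    Targets that do not support the requested use attribute are skipped
    rather than searched with a mismatched index.
    """
    use = BIB1_USE[access_point]
    plan = {}
    for target, indexes in targets.items():
        local = indexes.get(use)
        if local is not None:
            plan[target] = f"{local}={term}"
    return plan


print(translate("subject", "thesauri"))
```

Skipping unsupported targets, rather than silently falling back to a keyword index, keeps the semantics of the search matched to the semantics of each index, which is the requirement stated above.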

 

Russell, Rosemary and Day, Michael. Automated and Manual Approaches to the Provision of Thesauri and Subject Vocabularies, 2001. <http://www.ukoln.ac.uk/metadata/hilt/interfaces/> (June 11, 2002); final report <http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html> (Feb. 25, 2003).

The term thesaurus is used in different contexts to describe tools that fulfill different functions. From an information science point of view, thesauri were originally developed as tools to allow terminology control in the detailed subject indexing of printed documents. What distinguishes thesauri from some other types of subject vocabulary is that they show relationships between concepts. Relationships commonly expressed in thesauri include hierarchy, equivalence (synonymy), and association or relatedness. These relationships are generally represented by the notation BT (broader term), NT (narrower term), SY (synonym), and RT (associated or related term).

 

In addition to thesauri, there is a range of other types of controlled subject terminologies (or vocabularies). One can either browse alphabetical lists or the hierarchy of subject terms that may be hyperlinked, or one can search terms and if a non-preferred term is used, the user will be taken to the preferred term.

 

SCHEMAS Project  <http://reg.ukoln.ac.uk/registry/jsp/sforum.jsp>

Currently in development by the United Kingdom Office for Library and Information Networking (UKOLN). Its goal is the development of a comprehensive database of RDF schemas, application profiles, and related semantics that have been used by programs under the IST Program and other related European initiatives. The SCHEMAS database will be used to promote the reuse and interoperability of semantics for existing and new projects. It will register RDF schemas and namespaces used by projects within the European Union.

 

SCHEMAS Registry, 2002. <http://www.schemas-forum.org/registry/> (Aug. 6, 2002).

One important focus of the SCHEMAS Project (which provides standards for metadata schema designers) is the provision of a registry of metadata schemas. The registry itself will serve as a good-practice example of registry use and benefits. Workpackage 6 aims to promote the deployment of metadata registries defined with the Resource Description Framework (RDF), promote standards and methods for creating and processing schemas in multiple languages and writing systems, encourage the reuse and adaptation of global metadata elements in local schemas, formulate and disseminate good-practice guidelines, and investigate the process for managing the evolution of multilingual registries.

 

Sheikholeslami, Gholamhosein and Chang, Wendy and Zhang, Aidong. "SemQuery: Semantic Clustering and Querying on Heterogeneous Features for Visual Data." IEEE transactions on knowledge and data engineering, v. 14, no. 5 (2002): 988-1002.

The effectiveness of content-based image retrieval can be enhanced using heterogeneous features embedded in the images. However, since the text, color, and shape features are generated using different computation methods and thus may require different similarity measurements, the integration of retrievals on heterogeneous features is a nontrivial task. In this paper the authors present a semantics-based clustering and indexing approach, termed SemQuery, to support visual queries on heterogeneous features of images.

 

Slater, Jenny. References - Taxonomies and thesauri. CETIS, Metadata Special Interest Group, 2002. <http://cetis-metadata.lboro.ac.uk/vocab-ref.htm> (July 29, 2002).

Lists a large number of taxonomies and thesauri available on the web.

 

Stoklasova, Bohdana, Marie Balikova, Ludmila Celbova. "The Relationship between Subject Gateways and National Bibliographies in International Context". Paper delivered at World Library and Information Congress, 69th IFLA General Conference and Council, Berlin, 2003. <http://www.ifla.org/IV/ifla69/papers/054e-Stoklasova_Balikova_Celbova.pdf> (Sept. 19, 2003).

The paper examines the relationship between subject gateways and national bibliographies, together with general principles of universal bibliographic control, in the broader context of the need for integration of heterogeneous information sources. The paper gives examples from the Czech Republic's experience and illustrates problems with integrating heterogeneous resources from different countries covering different subjects. The Czech National Subject Gateway Project is connected with the Uniform Information Gateway Project (http://www.jib.cz), which integrates heterogeneous information resources, including full texts and digital objects, from different countries. The paper concludes with recommendations for improvement of bibliographic control. The subject authority system is multilingual and uses Aleph software: http://sigma.nkp.cz/F/?func=file%file_name=find-b&local_base=auv&con_LNG=ENG

 

Subcommittee on Subject Relationships/Reference Structure. Report to the ALCTS/CCS Subject Analysis Committee. Appendix A, 1996.  <http://www.ala.org/alcts/organiztion/ccs/sac/appendxa.html> (March 26, 2002).

The charge was to investigate: 1) the kinds of relationships that exist between subjects, the display of which is likely to be useful to catalog users; 2) how these relationships are or could be recorded in authorities and classification formats; 3) options for how these relationships should be presented to users of online and print catalogs, indexes, etc.

 

One conclusion was that there is a need for browsing and exploding along broader term (BT), narrower term (NT), and related term relationships. Because the Library of Congress distributes only the broader code, OPACs can display only broader-to-narrower references. However, Gary Strawn has demonstrated that systems can be programmed to generate narrower-to-broader references without anyone having to add "narrower" 5XX fields to the authority records. Non-specific "see also" relationships can be generated by coding the reference-relationship byte as "n". Indexing databases often use an alphabetical browsing list that then displays broader, narrower, and related terms for a chosen subject. In addition, an "explode" function employs these term relationships, along with several others (synonym, abbreviation, and language equivalent), to automatically retrieve all records bearing on the chosen term or related terms.
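The idea of deriving references rather than storing them can be illustrated with a short sketch: each authority record stores only its broader-term links, the reciprocal narrower-term links are computed by inverting them, and an "explode" search then retrieves records indexed under a term or anything beneath it. This is not Gary Strawn's program; the headings and record IDs are hypothetical, and only the hierarchical relationship is handled.

```python
from collections import defaultdict

# Hypothetical authority data: each heading records only its broader terms (BT).
BROADER = {
    "Sports": [],
    "Ball games": ["Sports"],
    "Soccer": ["Ball games"],
    "Basketball": ["Ball games"],
    "Swimming": ["Sports"],
}


def derive_narrower(broader_links):
    """Invert the stored BT links so each heading also lists its narrower terms."""
    narrower = defaultdict(list)
    for heading, parents in broader_links.items():
        for parent in parents:
            narrower[parent].append(heading)
    return {heading: sorted(narrower[heading]) for heading in broader_links}


def explode(term, narrower_links):
    """Return the term plus every heading below it in the hierarchy."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower_links.get(current, []))
    return sorted(found)


def search(term, narrower_links, index):
    """Retrieve records indexed under the term or any of its narrower terms."""
    terms = set(explode(term, narrower_links))
    return sorted(rid for rid, subjects in index.items() if terms.intersection(subjects))


NARROWER = derive_narrower(BROADER)
RECORDS = {"r1": ["Soccer"], "r2": ["Swimming"], "r3": ["Ball games"]}
print(NARROWER["Ball games"])                   # ['Basketball', 'Soccer']
print(explode("Ball games", NARROWER))          # ['Ball games', 'Basketball', 'Soccer']
print(search("Ball games", NARROWER, RECORDS))  # ['r1', 'r3']
```

Because the narrower links are computed at load time, catalogers maintain each relationship in one place only, which is the point of generating the reciprocal references.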

 

Subject Gateways, 1999.  <http://www.desire.org/html/subjectgateways/subjectgateways.html> (Aug. 7, 2002).

What is a subject gateway? "Subject gateways are online services and sites that provide searchable and browsable catalogues of internet based resources. Subject gateways will typically focus on a related set of academic subject areas." Many of the activities and research projects within DESIRE are focused on developing the ideas behind this definition of a subject gateway, as well as on developing the methodology and tools that provide the functionality a subject gateway needs.

 

Sugimoto, Shigeo, and others. "Developing Community-oriented Metadata Vocabularies: Some Case Studies." Paper presented at the International Symposium on Digital Libraries and Knowledge Communities in Networked Information Society (DLKC'04), 2004. <http://www.kc.tsukuba.ac.jp/dlkc/> ; <http://www.kc.tsukuba.ac.jp/dlkc/e-proceedings/papers/dlkc04pp128.pdf> (Sept. 13, 2004).

This paper presents two case studies involving the development of domain-specific subject vocabularies: a core subject vocabulary for a subject gateway for resources on libraries and library and information science (LIS), and subject vocabularies for a portal service for a regional community. These case studies show that small subject vocabularies are useful for these community-oriented services, and that maintenance is a crucial issue for the development and use of the vocabularies. In order to build a community-oriented information environment on the Internet, we have to satisfy two contradictory requirements for metadata schemas: specialization (or localization) within a community and interoperability among communities.

 

Metadata has been widely recognized as a key component of the Web and digital libraries. Local or domain-specific communities may need to define metadata schemas and controlled vocabularies suited to their own requirements when those requirements cannot be satisfied by the schemas and vocabularies defined for global communities. On the other hand, community-oriented specialization of schemas and vocabularies raises the bar for interoperability in the cross-community use of metadata and information resources. In addition, long-term maintenance of the schemas and vocabularies is a crucial concern for these communities. Thus, we need to satisfy these contradictory requirements on metadata in order to create a community-oriented information environment.

 

The rest of the paper presents two case studies of metadata-centered research and a conceptual model of metadata schemas for interoperability. The first case study is the development of a subject vocabulary for the ULIS-DL metadata database, which has about 27,000 Simple Dublin Core records for Web resources published by and/or useful for libraries and library and information science (LIS) institutions. The authors developed XML-based software to create a subject directory from the metadata database using the core vocabulary, which is encoded in the Web Ontology Language (OWL). The second case study is the development of a set of vocabularies for an information navigation service named Digital Okayama Dai-Hyakka (Digital Encyclopedia of Okayama). Its metadata schema is based on Simple Dublin Core, and it uses a subject vocabulary designed for the local community.
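The directory-generation step can be sketched under simplifying assumptions: records are Python dictionaries rather than XML, the core vocabulary is a plain parent map rather than OWL, and subject values are matched exactly. This is not the ULIS-DL software; the vocabulary and records are hypothetical.

```python
from collections import defaultdict

# Hypothetical core vocabulary: subject term -> parent term (None = top level).
CORE_VOCABULARY = {
    "Libraries": None,
    "Cataloging": "Libraries",
    "Digital libraries": "Libraries",
    "Information retrieval": None,
}

# Hypothetical Simple Dublin Core records; only dc:title and dc:subject are shown.
RECORDS = [
    {"title": "Guide to cataloging rules", "subject": ["Cataloging"]},
    {"title": "Building a digital library",
     "subject": ["Digital libraries", "Information retrieval"]},
    {"title": "Campus map", "subject": ["Geography"]},
]


def category_path(term):
    """Walk up the parent links to build a path such as 'Libraries > Cataloging'."""
    chain = []
    while term is not None:
        chain.append(term)
        term = CORE_VOCABULARY[term]
    return " > ".join(reversed(chain))


def build_directory(records):
    """Group record titles under every vocabulary term their dc:subject values match."""
    directory = defaultdict(list)
    for record in records:
        matched = [s for s in record["subject"] if s in CORE_VOCABULARY]
        for subject in matched:
            directory[category_path(subject)].append(record["title"])
        if not matched:
            directory["Unclassified"].append(record["title"])
    return dict(directory)


for category, titles in sorted(build_directory(RECORDS).items()):
    print(category, titles)
```

A record with several subjects appears under each matching category, and anything outside the core vocabulary is collected in an "Unclassified" bucket for later review.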

 

A major issue in enhancing the usability of ULIS-DL has been the (semi-)automatic creation of a directory-style interface that guides users to appropriate resources, in addition to the text-based retrieval function. A subject vocabulary is required to create the directory interface.

Based on the experiences in IPL-Asia, we have defined the following guidelines for building subject vocabularies for community-oriented metadata: 1) create a core subject vocabulary, which should be a reasonably small set of subject terms; 2) create subject vocabularies by tailoring the core vocabulary and associating appropriate expressions with every subject term, in order to present the subject terms in accordance with the properties of users, i.e., age range and language; 3) encode the vocabularies in an ontology description language such as XML Topic Maps or OWL. This encoding is essential not only for automatic creation of subject directories from metadata records but also for the interoperability and long-term maintenance of the subject vocabularies.
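Guideline 2 can be illustrated with a minimal sketch in which one shared core vocabulary carries several presentation labels per term, keyed by language and audience, and a community-specific view is derived by choosing labels for a user profile. The term identifiers, labels, and fallback order are assumptions for illustration.

```python
# Hypothetical core vocabulary: stable term identifiers, each with presentation
# labels keyed by (language, audience). Identifiers and labels are illustrative.
CORE_VOCABULARY = {
    "subj:astronomy": {
        ("en", "adult"): "Astronomy",
        ("en", "child"): "Stars and planets",
        ("ja", "adult"): "天文学",
    },
    "subj:history": {
        ("en", "adult"): "History",
        ("ja", "adult"): "歴史",
    },
}

DEFAULT_PROFILE = ("en", "adult")


def label_for(term_id, language, audience):
    """Choose the best label for a user profile, falling back step by step."""
    labels = CORE_VOCABULARY[term_id]
    for key in ((language, audience), (language, "adult"), DEFAULT_PROFILE):
        if key in labels:
            return labels[key]
    return term_id  # last resort: show the identifier itself


def tailored_vocabulary(language, audience):
    """Derive a community-specific view; the core identifiers stay unchanged."""
    return {term_id: label_for(term_id, language, audience) for term_id in CORE_VOCABULARY}


print(tailored_vocabulary("en", "child"))
```

Because metadata records would carry the identifiers rather than the labels, adding a new audience or language means adding labels, not re-indexing records.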

 

Discussion on Subject Vocabulary Maintenance:

In the preliminary study, the authors built the subject term vocabulary for IPL-Asia using XML Topic Maps in which each subject term is defined as a topic and associated with multiple presentation labels in the CJK languages. They applied the multi-lingual subject vocabulary to the IPL-Asia metadata and the DODH metadata in order to build subject-based directories of the resources. This experimental study, which is a straightforward approach, has shown the feasibility of building a user interface that has multiple presentation modes.
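The encoding described above can be sketched with Python's standard `xml.etree.ElementTree`. The output is a simplified, XTM 1.0-style topic map: each subject term becomes a topic whose base names are scoped by language topics. Using bare local language topics (rather than published subject identifiers) and the sample labels are simplifying assumptions, and the output has not been validated against the XTM DTD.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)  # serialize XTM elements without a prefix
ET.register_namespace("xlink", XLINK)


def q(tag):
    """Qualify a tag name with the XTM namespace."""
    return f"{{{XTM}}}{tag}"


def to_xtm(vocabulary):
    """vocabulary: {topic_id: {language_code: label}} -> XTM topicMap element."""
    root = ET.Element(q("topicMap"))
    languages = sorted({lang for labels in vocabulary.values() for lang in labels})
    for lang in languages:  # one topic per language so names can be scoped by it
        ET.SubElement(root, q("topic"), id=lang)
    for topic_id, labels in vocabulary.items():
        topic = ET.SubElement(root, q("topic"), id=topic_id)
        for lang, text in labels.items():
            base_name = ET.SubElement(topic, q("baseName"))
            scope = ET.SubElement(base_name, q("scope"))
            ET.SubElement(scope, q("topicRef"), {f"{{{XLINK}}}href": f"#{lang}"})
            ET.SubElement(base_name, q("baseNameString")).text = text
    return root


VOCAB = {"astronomy": {"ja": "天文学", "zh": "天文学", "ko": "천문학"}}
xml_text = ET.tostring(to_xtm(VOCAB), encoding="unicode")
print(xml_text)
```

Keeping all labels for a term inside one topic means a label can be corrected or added without touching the term's identity, which supports the maintenance argument made in the following paragraph.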

From this study, they learned that ontology description languages such as XML Topic Maps and OWL are useful not only for encoding the vocabulary in a machine-understandable form but also for maintaining the vocabulary over the long term. Vocabulary maintenance is a crucial issue even though OKV is a small set of terms, since the vocabulary evolves over time: for example, subject terms and subject groups change, and presentation labels are updated. XML-based encoding is not a panacea, but it will help to decrease the cost of maintenance.

 

It is unrealistic to assume that a single metadata element set will meet the needs of all domains and purposes. It is also impractical to develop metadata sets application by application: the result would be expensive and chaotic, and interoperability would be non-existent. Rather, it is desirable for application developers to use established metadata schemas and adapt them in accordance with local requirements.

 

Dublin Core Metadata defines the vocabulary of metadata, i.e., terms and their meanings, but in general does not specify encoding or syntactic characteristics. These requirements can be defined independently of the vocabulary definitions. A description of these application-specific syntactic features is called an application profile. Any application can have its own application profile, which specifies the set of metadata vocabulary terms used in the application as well as the syntactic or structural features of that application. The application profile could be used to define a mapping from the application’s scheme to one or more global schemes, which is crucial for interoperability. The authors define a conceptual model of metadata schemas for interoperability, in which metadata for an application is composed of three layers:

(1) Layer 1 - Semantic Definition Layer: Definition of terms used in the schema.

In general, two types of metadata terms are included in the metadata vocabulary - property vocabulary and value vocabulary. A property vocabulary, or in other words element vocabulary, is a set of property terms, for example, elements and element refinement qualifiers of DCMES. A value vocabulary is a set of value terms, for example, encoding schemes.

(2) Layer 2 - Structural Constraints Definition Layer: Definition of syntactic features. A definition should include the set of terms used in the schema and the structural constraints applied to each term. Application profiles are given in this layer.

(3) Layer 3 - Implementation Dependent Syntax Definition Layer: Definition of syntax of metadata in an implementation.
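The three layers can be illustrated with a small sketch: a property and value vocabulary (Layer 1), an application profile expressed as occurrence constraints (Layer 2), and one concrete serialization (Layer 3). The profile, the value vocabulary, and the line-based output syntax are hypothetical; only the Dublin Core term URIs are real.

```python
from dataclasses import dataclass
from typing import Dict, List

# Layer 1 (semantic definitions): property vocabulary mapped to term URIs,
# plus a hypothetical value vocabulary for one property.
PROPERTY_VOCABULARY = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "subject": "http://purl.org/dc/elements/1.1/subject",
    "audience": "http://purl.org/dc/terms/audience",
}
VALUE_VOCABULARIES = {"audience": {"children", "adults"}}


# Layer 2 (structural constraints): a hypothetical application profile listing
# the terms the application uses and how often each may occur.
@dataclass(frozen=True)
class Occurrence:
    minimum: int
    maximum: int


APPLICATION_PROFILE: Dict[str, Occurrence] = {
    "title": Occurrence(1, 1),
    "subject": Occurrence(1, 10),
    "audience": Occurrence(0, 1),
}


def validate(record: Dict[str, List[str]]) -> List[str]:
    """Check a record against Layers 1 and 2; return a list of problems."""
    problems = [f"{t}: not in application profile" for t in record if t not in APPLICATION_PROFILE]
    for term, occ in APPLICATION_PROFILE.items():
        values = record.get(term, [])
        if not occ.minimum <= len(values) <= occ.maximum:
            problems.append(f"{term}: expected {occ.minimum}-{occ.maximum} values, got {len(values)}")
        allowed = VALUE_VOCABULARIES.get(term)
        if allowed is not None:
            problems += [f"{term}: {v!r} not in value vocabulary" for v in values if v not in allowed]
    return problems


# Layer 3 (implementation-dependent syntax): one possible serialization,
# here simple "property-URI: value" lines.
def serialize(record: Dict[str, List[str]]) -> str:
    return "\n".join(f"{PROPERTY_VOCABULARY[t]}: {v}" for t, values in record.items() for v in values)


RECORD = {"title": ["Okayama Castle"], "subject": ["History"], "audience": ["adults"]}
print(validate(RECORD))  # []
print(serialize(RECORD))
```

Swapping the Layer 3 function for an XML or RDF writer would leave Layers 1 and 2 untouched, which is the separation the model is meant to provide.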

 

In addition to these definitions, each application schema developer would provide guidelines for creating metadata. A metadata schema registry is a key software tool to enhance interoperability of metadata schemas expressed in all layers. Metadata schema registries are useful to store and provide all types of metadata vocabularies, i.e., application profiles, subject terms and other vocabularies.

 

Svenonius, E. The intellectual foundation of information organization. Cambridge, Mass.: MIT Press, 2001.

 

Taylor, Mike. Zthes: a Z39.50 Profile for Thesaurus Navigation. Ver. 4.0, 2000.  <http://www.lcweb.loc.gov/z3950/agency/profiles/zthes-04.html> (March 26, 2002).

This document describes an abstract model for representing and searching thesauri - semantic hierarchies of terms as described in ISO 2788  - and specifies how this model may be implemented using the Z39.50 protocol. It also suggests how the model may be implemented using other protocols and formats.

 

This profile is laid out in two main sections. The first is concerned solely with the abstract representation of thesaurus terms and how they may be searched; and the second with the implementation of these abstract concepts in Z39.50: how thesaurus terms are encoded in the GRS-1 record structure, how searches are encoded in the type-1 query, etc. It is intended that the abstract model described here is sufficiently general that it can also be implemented by protocols and data formats other than Z39.50. This profile does not mandate any relationship between a thesaurus and any other database. The model is that terms from any thesaurus database may be used to search any other database (called a target database). This profile represents a thesaurus as a database of inter-linked terms. If multiple thesauri are to be supported by a single server, then they must be presented as separate databases.
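The abstract model (a thesaurus as a database of inter-linked terms) can be sketched independently of Z39.50. The field names echo Zthes naming and the relation codes are the familiar ISO 2788 ones, but the classes below are hypothetical and do not implement the GRS-1 record structure or type-1 queries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Term:
    """One term record; field names loosely follow Zthes (termId, termName, termType)."""
    term_id: str
    term_name: str
    term_type: str = "PT"  # PT = preferred term, ND = non-descriptor (entry term)
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (relation type, target term_id)


class Thesaurus:
    """A thesaurus represented as a database of inter-linked term records."""

    def __init__(self, terms: List[Term]):
        self.by_id: Dict[str, Term] = {t.term_id: t for t in terms}

    def find(self, name: str) -> Optional[Term]:
        """Search by term name; an entry term is followed to its preferred term via USE."""
        for term in self.by_id.values():
            if term.term_name.casefold() == name.casefold():
                for relation_type, target in term.relations:
                    if relation_type == "USE":
                        return self.by_id[target]
                return term
        return None

    def related(self, term: Term, relation_type: str) -> List[str]:
        """Names of terms linked from `term` by the given relation type (BT, NT, RT, ...)."""
        return [self.by_id[t].term_name for r, t in term.relations if r == relation_type]


THESAURUS = Thesaurus([
    Term("t1", "Vehicles", relations=[("NT", "t2")]),
    Term("t2", "Cars", relations=[("BT", "t1"), ("UF", "t3")]),
    Term("t3", "Automobiles", term_type="ND", relations=[("USE", "t2")]),
])
cars = THESAURUS.find("automobiles")
print(cars.term_name, THESAURUS.related(cars, "BT"))  # Cars ['Vehicles']
```

The preferred term returned by `find` is just a string, so it could be used to search any target database, in line with the profile's separation of thesaurus and target.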

 

Tennant, R. "Metadata's Bitter Harvest." Library Journal, 129, no. 12 (2004): 32.

 

Tennis, Joseph T. “Layers of meaning: Disentangling subject access interoperability.” Advances in Classification Research, 12 (2004)

 

Therond, Daniel. "Www.European-Heritage.Net: The European Heritage Network". Cultivate Interactive, issue 2, no. 16 (Oct. 2000). <http://www.cultivate-int.org/issue2/herein/> (Aug. 7, 2002).

The European Information network on cultural heritage policies (HEREIN Project) recommended setting up a permanent information system for authorities, professionals, researchers and training specialists. The aim of the project was to convert the Council of Europe's paper databank on architectural and archaeological heritage into a system a) with fast, easy access via the Internet, and b) which correspondents in member countries would be able to update easily by email. 

 

Tillett, Barbara. "A Virtual International Authority File." Presentation to the Giornata di studio sul controllo di autorità nel Servizio Bibliotecario Nazionale, Nov. 22, 2002. <http://www.iccu.sbn.it/TillettAF.ppt> (April 1, 2003).

Objectives: a) facilitate sharing to reduce cataloguing costs to libraries, museums, archives, rights management agencies, etc. b) simplify creation and maintenance of authority records internationally, c) enable users to access information in the language, script, form they prefer.

 

Authority control virtues: a) “Precision” in searching, b) syndetic structure of references to help navigate (the variant forms of name/title/subject/etc.), c) displays to collocate works, d) links to forms used in particular resources, e) bring library catalogues into the mix of tools available on the Web.

 

There are a number of projects that facilitate or incorporate aspects of authority control on an international scale: EU: AUTHOR Project, LEAF, <indecs>, INTERPARTY, HKCAN, IFLA: MLAR, GARR, FRANAR, Dublin Core “Agents”, DELOS/NSF Working Group “Actors/Roles”, EAC (Encoded Archival Context), CORC/Connexion, Unicode/Multiple Scripts, NACO/SACO for AACR2 and LCSH. There is an increased need for interoperability, exemplified by efforts to map different communication formats with Z39.50 protocols and to create crosswalks to the “MARCs”, XML, and ONIX. The Virtual International Authority File (VIAF) supports IFLA UBC authority principles: each country is responsible for authority headings for its own personal and corporate authors; national authority records are available for everyone to use; and the same form and structure would be used worldwide.

 

VIAF proposes using programs to facilitate authority work: such programs would automatically check headings against the existing local authority file and, if a heading is not found, automatically check it against the “virtual” international authority file. They would display any matches found for editing or reference and insert authorized forms into the local authority record for future linking. The author would like to test using unique, persistent record control numbers, such as the International Standard Authority Number or the International Standard Authority Data Number, to see whether that works, or possibly to use the number assigned to an information package for an entity under OAI (Open Archives Initiative) protocols. Many models can be envisioned for a virtual international authority file to help with cataloging, among them: a) a distributed system in which the independent National Bibliographic Agencies (NBAs) are searchable using the next generation of Z39.50 protocols; b) a linked model that would use a search protocol, such as Z39.50, going to any one of the linked authority files (LEAF is testing this model); c) a centralized model that uses Open Archives Initiative protocols to harvest the metadata from the authority files of the National Bibliographic Agencies onto one or more servers; or d) a centralized link, where one authority file is viewed as the central point to which all others are linked.
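The proposed check-local-then-virtual workflow can be sketched as follows. The "virtual" file is modeled as in-memory dictionaries standing in for remotely searched national files (roughly the distributed or linked models listed above); the headings, agencies, and normalization rule are hypothetical, and a real system would query remote servers and would likely match on identifiers rather than strings.

```python
from typing import Dict, List, Tuple

# Hypothetical local authority file: normalized heading -> authorized form.
local_file: Dict[str, str] = {
    "example, anna, 1900-1980": "Example, Anna, 1900-1980",
}

# Hypothetical "virtual" international file: one authority file per national
# agency, each mapping a normalized heading to that agency's authorized form.
virtual_file: Dict[str, Dict[str, str]] = {
    "Agency A": {"beispiel, johann": "Beispiel, Johann, 1850-1920"},
    "Agency B": {"beispiel, johann": "Beispiel, J. (Johann), 1850-1920"},
}


def normalize(heading: str) -> str:
    """Crude normalization for matching: case-fold and collapse whitespace."""
    return " ".join(heading.casefold().split())


def check_heading(heading: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Check the local file first; if not found, check every national file."""
    key = normalize(heading)
    if key in local_file:
        return "local", [("local", local_file[key])]
    matches = [(agency, authority[key]) for agency, authority in virtual_file.items() if key in authority]
    return "virtual", matches


def accept(heading: str, authorized_form: str) -> None:
    """Record the authorized form chosen by the cataloger locally for future linking."""
    local_file[normalize(heading)] = authorized_form


source, matches = check_heading("Beispiel,  Johann")
print(source, matches)  # both national forms are displayed for the cataloger
accept("Beispiel, Johann", matches[0][1])
print(check_heading("beispiel, johann"))  # now resolved locally
```

The cataloger, not the program, chooses among competing national forms, which matches the description of displaying found matches "for editing or reference".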

 

Tudhope, Douglas, Alani, Harith, Jones, Christopher. "Augmenting Thesaurus Relationships: Possibilities for Retrieval," JoDI, 1, no. 8 (Feb. 5, 2001). <http://jodi.ecs.soton.ac.uk/Articles/v01/08/Tudhope/> (June 27, 2002).

The paper discusses the augmentation of thesaurus relationships. First, the authors discuss a case study that explored the retrieval potential of an augmented set of thesaurus relationships by specializing standard relationships into richer subtypes, in particular hierarchical geographical containment and the associative relationship. Various attempts to build taxonomies of thesaurus relationships are discussed. The authors conclude by discussing the feasibility of hierarchically augmenting the core set of thesaurus relationships, particularly the associative relationship. They discuss the possibility of enriching the specification and semantics of Related Term (RT) relationships while maintaining compatibility with traditional thesauri via a limited hierarchical extension of the associative relationship. They illustrate how hierarchical spatial relationships can be used to provide more flexible retrieval for queries incorporating place names in applications employing online gazetteers and geographical thesauri. The work described was part of a larger project, Ontologically Augmented Spatial Information System (OASIS). Another aim was to explore the potential of reasoning over the semantic relationships in thesauri to assist retrieval. The three main types of thesaurus relationships are: a) equivalence (equivalent terms); b) hierarchical (broader/narrower terms: BTs/NTs); and c) associative (related terms: RTs).
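A minimal sketch of place-name query expansion over hierarchical containment: a query on a region is widened to the places it contains, and results are ranked by how many levels down they were found. This is a simple breadth-first expansion with depth as a stand-in for semantic distance, not the OASIS measure; the gazetteer fragment and document index are illustrative.

```python
from collections import deque

# Illustrative fragment of a geographical thesaurus: place -> places it directly contains.
CONTAINS = {
    "Wales": ["Glamorgan", "Gwynedd"],
    "Glamorgan": ["Cardiff", "Swansea", "Pontypridd"],
    "Gwynedd": ["Bangor"],
}


def expand_place(place, max_depth=2):
    """Breadth-first walk down containment links; returns {place: levels below query}."""
    depth = {place: 0}
    queue = deque([place])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand beyond the depth limit
        for child in CONTAINS.get(current, []):
            if child not in depth:
                depth[child] = depth[current] + 1
                queue.append(child)
    return depth


def retrieve(place, index, max_depth=2):
    """Return documents whose place falls within the query region, closest first."""
    depth = expand_place(place, max_depth)
    hits = sorted((depth[loc], doc) for doc, loc in index.items() if loc in depth)
    return [doc for _, doc in hits]


INDEX = {"doc1": "Cardiff", "doc2": "Glamorgan", "doc3": "Bangor", "doc4": "Paris"}
print(retrieve("Wales", INDEX))               # ['doc2', 'doc1', 'doc3']
print(retrieve("Wales", INDEX, max_depth=1))  # ['doc2']
```

The depth limit is the simplest way to keep expansion from retrieving everything under a large region; a richer system could weight each containment step differently.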

 

UK Interoperability Focus, 2000. <http://www.ukon.ac.uk/interop-focus/about/> (Aug. 7, 2002).

UK Interoperability Focus is hosted by UKOLN. This post is responsible for exploring, publicizing, and mobilizing the benefits and practice of effective interoperability across diverse information sectors. Interoperability is a broad term, encompassing many of the issues impinging upon the effectiveness with which diverse information resources might fruitfully co-exist. The issues are many, but may be defined as follows:

1) Technical Interoperability: consideration of technical issues includes ensuring an involvement in the continued development of communication, transport, storage and representation standards such as Z39.50, ISO-ILL, XML, etc. <technical architecture>

2) Semantic Interoperability: … individual resources - each internally constructed in their own semantically consistent fashion - are made available through gateways or catalogs. Almost inevitably these discrete resources use different terms to describe similar concepts, or even identical terms to mean very different things. The development and distributed use of thesauri such as those from Getty is worthy of further consideration.

3) Political / Human Interoperability: there are implications for the organizations concerned, who may see it as a loss of control or ownership. Staff may need extensive training or retraining to ensure effective long-term use of any service.

4) Inter-community Interoperability: between institutions

5) International Interoperability: existing issues magnified with varied languages, differences in technical approach, working practices, etc.

 

Unified Medical Language System (UMLS). <http://www.nlm.nih.gov/research/umls/> (March 26, 2002). 

NLM's Unified Medical Language System (UMLS) project develops and distributes multi-purpose, electronic "knowledge sources" and associated lexical programs. The Metathesaurus provides a uniform, integrated distribution format for more than 100 biomedical and health-related vocabularies, classifications, and coding systems (some in multiple languages) and links many different names for the same concepts. System developers can use the UMLS products to enhance their applications. There are three UMLS Knowledge Sources: the Metathesaurus®, the Semantic Network, and the SPECIALIST lexicon. They are distributed with flexible lexical tools and the MetamorphoSys install and customization program.

 

Van de Sompel, Herbert, Jeffrey A. Young, Thomas B. Hickey. "Using the OAI-PMH … Differently," D-Lib Magazine, 9, no. 7/8 (July 3, 2003). <http://www.dlib.org/dlib/july03/young/07young.html> (July 23, 2003).

The Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) was created to facilitate discovery of distributed resources. The OAI-PMH achieves this by providing a simple, yet powerful framework for metadata harvesting. The OAI-PMH has been widely accepted, and until recently it has mainly been applied to make Dublin Core metadata about scholarly objects contained in distributed repositories searchable through a single user interface. Initially, the descriptive metadata provided by OAI-PMH repositories was to a large extent limited to the mandatory unqualified Dublin Core, but an evolution towards the provision of more extensive descriptive metadata, such as MARC21, is becoming apparent. Metadata records in the OAI-PMH are any data that can be validated against a W3C XML Schema. Therefore, the OAI-PMH can be a medium for incremental, date-sensitive exchange of any form of semi-structured data. The metadata contained in OAI-PMH repositories is typically gathered by harvesters that process it and make it searchable through a user interface. In these uses of the OAI-PMH, repositories are never directly accessed by end-users; the "customers" of the repositories are robots. A section of the article describes an approach to overlay OAI-PMH repositories with an interface allowing users to directly navigate the repository content. The authors also show how this approach has been used to make the GSAFD Thesaurus, the OpenURL Registry, and the XTCat Thesis Catalog user-accessible.
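The conventional harvesting pattern described above (a robot requesting pages of records and following resumption tokens) can be sketched with the Python standard library. The verb, arguments, and namespaces follow OAI-PMH 2.0; the repository URL is a placeholder, and the sketch does no error handling (for example, of OAI-PMH error responses or HTTP retries). It does not show the authors' overlay interface.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", token=None, from_date=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # incremental, date-based harvesting
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return ([(identifier, [dc titles]), ...], resumptionToken or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
        titles = [t.text for t in rec.iter(DC + "title")]
        records.append((identifier, titles))
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, (token or None)  # an empty token marks the last page


def harvest(base_url, **kwargs):
    """Yield records page by page, following resumptionTokens to the end."""
    url = list_records_url(base_url, **kwargs)
    while url:
        with urllib.request.urlopen(url) as response:
            records, token = parse_page(response.read())
        yield from records
        url = list_records_url(base_url, token=token) if token else None
```

A typical call would be `for identifier, titles in harvest("http://example.org/oai", from_date="2003-01-01"): ...`, where the base URL is hypothetical; the `from` argument is what makes repeat harvests incremental.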

 

Veen, Theo van and Robina Clayphan. "Metadata in the Context of the European Library Project." In Proceedings of the International Conference on Dublin Core and Metadata for e-Communities, 2002: 19-26. <http://www.bncf.net/dc2002/program/papers.html>

The European Library project, sponsored by the European Commission, brings together 10 major European national libraries and library organizations to investigate the technical and policy issues involved in sharing digital resources.

 

Vizine-Goetz, Diane. "Dewey in CORC: Classification in Metadata and Pathfinders." Journal of Internet Cataloging, 4, no. 1 / 2 (2001): 67-80.

The Cooperative Online Resource Catalog (CORC) project provided an opportunity for OCLC Research and Dewey editors to explore the potential of the Dewey Decimal Classification (DDC) system for organizing electronic resources. The mapped vocabulary was used in the following ways: 1) to improve access to Dewey by expanding the indexing vocabulary; 2) to assist in the assignment of subject elements during metadata creation; 3) to provide supplemental terminology for automated classification; 4) to provide alternative access mechanisms for views of resources in the CORC database.

 

Vizine-Goetz, D., Hickey, C., Houghton, A. H., & Thompson, R. (2004). “Vocabulary mapping for terminology services.” Journal of Digital Information, 4, no. 4 (2004)

 

Vizine-Goetz, Diane. "Terminology Services." Presentation, 2003. <http://www.oclc.org/research/projects/mswitch/> ; <http://www.oclc.org/research/projects/mswitch/4_termservs.shtm> (Feb. 18, 2005).

Discusses research at OCLC to add value to metadata. Metadata Switch comprises a set of projects: harvesting metadata, merging metadata from different sources, schema transformation, terminology and name authority services, and enrichment or augmentation of records with various types of data. DDC, the Thesaurus of ERIC Descriptors, GSAFD genre terms, MeSH, LCSH, and LCSHAC were converted to a common content model and linked using intellectual and automated mapping techniques.

 

Wagner, Harry R. "The EOR toolkit: an Open Source Solution for RDF Metadata," Information Technology and Libraries, 21, no. 1 (March 2002): 27-31.

RDF provides solutions that will enable a significantly higher degree of reliability, relevance, and accuracy for applications and services focused on resource discovery and management of Web sites and other Internet resources. Through its use of machine-understandable semantics, RDF enables the automated discovery, management, and exchange of metadata. It significantly improves resource discovery by enabling a finer degree of granularity and improved precision. In addition to facilitating the creation of new resource descriptions, RDF builds on the established work of various resource communities by enabling the interoperability of existing metadata vocabularies within those communities.

EOR is one of a large and growing number of open source applications being used to develop applications and services focused on the discovery, management, integration, and navigation of electronic resources. <http://eor.dublincore.org>
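To show what machine-understandable metadata looks like in practice, the sketch below writes a Dublin Core description as RDF statements in N-Triples syntax, mixing Dublin Core with a second vocabulary in the same description. It is plain Python, not the EOR toolkit; the resource URI and the `example.org` vocabulary are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/terms/"  # hypothetical local vocabulary


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(subject: str, statements) -> str:
    """Serialize (predicate URI, literal value) pairs about one resource."""
    return "\n".join(
        f'<{subject}> <{predicate}> "{escape_literal(value)}" .'
        for predicate, value in statements
    )


print(to_ntriples("http://example.org/report/42", [
    (DC + "title", 'Annual "Digital" Report'),
    (DC + "creator", "Example Library"),
    (LOCAL + "shelfLocation", "Stack 3"),
]))
```

Because every property is a full URI, statements from Dublin Core and from the local vocabulary can sit side by side in one description without name clashes, which is the interoperability benefit the annotation describes.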

 

Wake, Susannah and Nicholson, Dennis. "HILT - High-Level Thesaurus Project: Building Consensus for Interoperable Subject Access across Communities." D-Lib Magazine, 7, no. 9 (Sept. 2001). <http://www.dlib.org/dlib/september01/wake/09wake.html> (Oct. 26, 2002).

The article provides an overview of the work carried out by the HILT Project (http://hilt.cdlr.strath.ac.uk) in making recommendations towards interoperable subject access, or cross-searching and browsing distributed services amongst the archives, libraries, museums, and electronic services sectors. The article discusses the consensus achieved at the June 19, 2001 HILT Workshop. The best way forward for HILT was the pilot mapping service, combined to an extent with a terminologies task force. The service envisaged would map key schemes like LCSH, UNESCO, DDC, Universal Decimal Classification, Art and Architecture Thesaurus, and possibly user and regional terminologies, and local adaptations of standard schemes. Users would be able to: a) input the term or terms that describe their problem using the terminology that is most meaningful to them; b) specify their query more closely if necessary by specifying a context; and c) obtain a list of equivalent or near-equivalent terms with which they could then cross-search or cross-browse the various services.
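The envisaged user interaction (enter a term, narrow it by context if needed, receive equivalent terms for cross-searching) can be sketched as a lookup against a mapping table. The table, its entries, and the "Local" scheme are hypothetical illustrations, not HILT data or verified scheme entries.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping table: (user term, context) -> {scheme: equivalent terms}.
MAPPINGS: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("depression", "psychology"): {"LCSH": ["Depression, Mental"], "Local": ["Mental health"]},
    ("depression", "economics"): {"LCSH": ["Depressions"], "Local": ["Economic history"]},
    ("castles", "architecture"): {"LCSH": ["Castles"], "Local": ["Historic buildings"]},
}


def lookup(term: str, context: Optional[str] = None) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (equivalent terms by scheme, contexts the user must choose between)."""
    by_context = {ctx: m for (t, ctx), m in MAPPINGS.items() if t == term.casefold()}
    if context is not None:
        return by_context.get(context.casefold(), {}), []
    if len(by_context) == 1:
        return next(iter(by_context.values())), []
    return {}, sorted(by_context)  # ambiguous (or unknown): ask for a context


print(lookup("Depression"))  # ({}, ['economics', 'psychology'])
print(lookup("Depression", "Psychology"))
print(lookup("castles"))
```

Returning the candidate contexts instead of guessing mirrors step b) above: the user resolves the ambiguity, and the resulting per-scheme terms can be sent to each service in its own vocabulary.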

 

Wake, Susannah (2001). HILT: High-Level Thesaurus Project. Paper presented at IFLA Satellite Meeting: Subject Retrieval in a Networked World, OCLC, Dublin, Ohio, Aug. 14-16, 2001. <http://hilt.cdlr.strath.ac.uk/Dissemination/Talks/hilt-ifla.ppt> (Oct. 26, 2002).

Presentation gives background of HILT and summarizes the work of the June 2001 HILT Workshop.

 

Whitehead, C. "Mapping LCSH into Thesauri: The AAT Model." In T. Peterson and P. Moholt (Eds.), Beyond the Book: Extending MARC for Subject Access. Boston: G.K. Hall, 1990: 81.

 

Willpower Information. Publications on Thesaurus Construction and Use. <http://www.willpower.demon.co.uk/thesbibl.htm> (July 1, 2002).

This is a list of printed and electronic publications about the principles of constructing and using information retrieval thesauri. It is not a list of existing thesauri, although some thesauri have been included when they are good examples or illustrate the results of different approaches to thesaurus construction. References to lists of thesauri and systems that provide for thesaurus use by combining terms from multiple facets in search interfaces are given at the end.

 

WordNet : a Lexical Database of the English Language, 2001. <http://www.cogsci.princeton.edu/~wn/> (Aug. 6, 2002).

WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets. Developed by the Cognitive Science Laboratory at Princeton University under the direction of Prof. George A. Miller.

 

Xiaoming Liu. "[OAI-Implementers] Dublin Core XML and OAI." Message to the OAI-Implementers listserv, March 29, 2002. <http://arc.cs.edu/edu>

Xiaoming Liu's work on ARC, which builds on Open Archives Initiative work, includes a subject file from various schemas.

 

Young, Iain. "Da Chanan / Two Languages: Creating Bi-lingual Name Authorities."  Paper presented at the 68th IFLA Council and General Conference, Glasgow, Scotland, Aug. 18-24, 2002. <http://www.ifla.org/IV/ifla68/prog02.htm> ; <http://www.ifla.org/IV/ifla68/papers/025-144e.pdf> (Jan. 18, 2003).

The issue is how to create standard name authorities in a bilingual environment. Using as a specific example the project undertaken by the Scottish Poetry Library to create name authorities for Gaelic poets, some of whom have both Gaelic and English forms of their names, the paper examines the issues raised.

Zeng, M. L., & Chan, L. M. (2004). “Trends and issues in establishing interoperability among knowledge organization systems.”  Journal of the American Society for Information Science and Technology, 55 (5), 377-395.