Project Name:  H.W. Wilson Megathesaurus  

Project URL:  www.hwwilson.com/Databases/omnifile.cfm

Project Description: Merges KOS of different structural types

H.W. Wilson has developed a “megathesaurus” that gathers the vocabulary for all its indexes for inclusion in its Omnifile product. The Omnifile product now includes six of the 11 Wilson periodical files, plus all of the full text from the remaining five files. Eventually Omnifile will probably include all their files, but this may take some time, since the remaining five are very specialized. Files covering non-periodical material use different indexing vocabularies and do not form part of the Omnifile product.

Provisional Checklist

Please use the following form to evaluate semantic interoperability projects. 

1. Types of data being integrated

                             Does the project have:

    (a) different controlled vocabularies in same language?

    (b) different controlled vocabularies in different languages?

    (c) different classification schemas (e.g., DDC, UDC, LCC)?

             If yes, which ones?      

    (d) controlled vocabularies combined with classification schemas?

    (e) different metadata framework schemas (e.g., XML, MARC, Dublin Core)?  

             If yes, which ones? SFX

    (f) different communication protocols?

    (g) other:      

 

2. Autonomy and Integrity of Constituent Parts

    (a) Is standardization, reconciliation, or conversion of semantic data reversible?

         (a.1) Can precoördinated strings, once filtered or deconstructed for semantic matching, later be put back together again?

     

 

    (b) Is full complement of metadata and indigenous subject hierarchies preserved?

            If so, how?  Diligent editing reconciles subject headings in the various specialties for uniformity throughout, so your search won’t miss a single relevant citation. What’s more, the acclaimed Wilson Name Authority File keeps corporate and personal names uniform throughout the database, so you can be confident that name searches are thorough and yield the desired results every time.     

    (c) Does project rely on principle of least common denominator?

If so, many data sets may be able to coexist in the database, but given the resulting stripped-down or ‘dumbed-down’ resource descriptions, the database may no longer serve the interests of readers (cf. recently cited problems with Dublin Core [20]). How does the use of a least common denominator affect the quality of service?

    (d) How is data stored: gathered into a union catalog (e.g., American Memory Project, NSDL), vs. distributed database?  Distributed database

    (e) How are metadata (including SI links) stored?  (e.g., via authority records, concordance tables, a central switching language, semantic networks, lexical databases, semantic layers, etc.)   Diligent editing reconciles subject headings in the various specialties for uniformity throughout, so your search won’t miss a single relevant citation. What’s more, the acclaimed Wilson Name Authority File keeps corporate and personal names uniform throughout the database, so you can be confident that name searches are thorough and yield the desired results every time.    
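As a rough illustration of what question 2(a.1) is asking, the following Python sketch shows one way a precoordinated string could be split into components for matching and later reassembled. It does not describe how the Wilson megathesaurus works; the `--` delimiter, the `Heading` class, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

SEPARATOR = "--"  # assumption: subdivisions are joined with a double hyphen


@dataclass(frozen=True)
class Heading:
    """Hypothetical container for a deconstructed precoordinated heading."""
    facets: Tuple[str, ...]  # components kept in their original order

    def match_keys(self) -> FrozenSet[str]:
        # Order-free, case-folded components, used only for semantic matching.
        return frozenset(f.casefold() for f in self.facets)


def deconstruct(heading: str) -> Heading:
    # Split the string into components without discarding any of them.
    return Heading(tuple(heading.split(SEPARATOR)))


def reconstruct(heading: Heading) -> str:
    # Rejoin the components in their stored order with the same delimiter.
    return SEPARATOR.join(heading.facets)


original = "Tumors--Treatment--United States"
heading = deconstruct(original)
round_trip = reconstruct(heading)
```

In this sketch the conversion is reversible only because the component order and the delimiter are retained alongside the matching keys; a system that stored the order-free keys alone could not put the string back together.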

 

 

 

3. Reconciliation of heterogeneous vocabularies

   (a) How are correlations established when a single term in one source has no equivalent term in the other?  Diligent editing reconciles subject headings in the various specialties for uniformity throughout, so your search won’t miss a single relevant citation. What’s more, the acclaimed Wilson Name Authority File keeps corporate and personal names uniform throughout the database, so you can be confident that name searches are thorough and yield the desired results every time.

   (b) Certain vocabularies are highly structured and hierarchical, while others contain terms lacking any structure at all aside from serial numbers or other unique identifiers. How are these differences reconciled?    Diligent editing reconciles subject headings in the various specialties  

   (c) How are conflicts resolved when an established heading in one vocabulary matches a cross reference in other vocabularies? (E.g., Tumors is an established LCSH heading, but in MeSH it is a cross reference to Neoplasms; and vice versa.)  Diligent editing reconciles subject headings in the various specialties

 

   (d) If multiple vocabularies are used in a single bibliographic record, and the headings from such vocabularies are identical (after normalization), how are duplicate retrievals handled?   Unknown
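To make the scenario in 3(d) concrete, here is a minimal Python sketch of one possible approach: headings are normalized, identical results are collapsed into a single entry, and the contributing vocabularies are recorded so no provenance is lost. The answer for this project is unknown, so this is purely illustrative; the normalization rules, the function names, and the vocabulary labels `VocabA` and `VocabB` are assumptions.

```python
import unicodedata


def normalize(heading: str) -> str:
    # Assumed normalization: strip diacritics, fold case, collapse whitespace.
    decomposed = unicodedata.normalize("NFKD", heading)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.casefold().split())


def merge_headings(headings):
    """Collapse headings that are identical after normalization.

    `headings` is a list of (vocabulary, heading) pairs from one record.
    Each merged entry keeps the first display form seen and the list of
    vocabularies that supplied it.
    """
    merged = {}
    for vocabulary, heading in headings:
        key = normalize(heading)
        entry = merged.setdefault(key, {"display": heading, "sources": []})
        if vocabulary not in entry["sources"]:
            entry["sources"].append(vocabulary)
    return list(merged.values())


record = [("VocabA", "Aphasia"), ("VocabB", "APHASIA "), ("VocabA", "Tumors")]
result = merge_headings(record)
```

With this design a search on the shared heading would retrieve the record once, while the `sources` list still shows which vocabularies assigned it.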

 

 

 

4. Effective and Efficient Resource Discovery (Precision and Recall), Satisfying User Needs

     (a) Does project provide high or satisfactory levels of precision and recall?

 

     (b) To what extent does project rely on precoördination?  Precoordination

 

        If mostly post-coordinate, then:

            i)   By what means is recall maximized?

            ii)  By what means is precision maximized?

 

 

    (d) Does project provide faceted approach (facilitating polysemy) while retaining option for browsable hierarchy (facilitating navigation)?

    (e) Are the following objectives and functions supported in the S.I. environment?

        i)    Locate entities in the system via surrogates (find)

        ii)   Identify a surrogate that matches an entity (collocate)

        iii)  Select an entity appropriate to a user’s need via surrogates (choice facilitation)

        iv)   Obtain access to the entity via the system and its surrogates (acquisition)

        v)    Navigate the system and its surrogates (navigation)

   (f) Has developer released beta version for general testing?

   (g) Have user satisfaction surveys been conducted?

 

 

5. Ease of Use (this is actually part of our definition, i.e., SI should function “without special effort by the user,” where “users” include information creators and managers as well as end-users).

    (a) Intuitive interface for data entry, searching, browsing, etc.?

    (b) Automate validation, mapping, metadata extraction, etc., as much as possible?

    (c) Availability of documentation?

 

6. Long-term viability

    (a) Master plan for life-cycle management and data migration?

    (b) Reliance on open-source international standards versus proprietary standards?

    (c) Viable business model (e.g., not based exclusively on research grant with likely expiration)?