Project Name: UMLS Metathesaurus
Project URL: http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html
Project description: The UMLS consists of three
Knowledge Sources: the UMLS Metathesaurus, the SPECIALIST lexicon, and the UMLS
Semantic Network. The Metathesaurus is a database containing semantic
information about biomedical concepts, their various names, and the
relationships among them.
The Metathesaurus is built from over 100 biomedical source vocabularies, some in
multiple languages. The 2003 edition includes 875,255 concepts and 2.14 million
concept names. The UMLS Semantic Network is used for mapping index terms from
different thesauri through its 134 semantic types, which provide a consistent
categorization of all concepts represented in the Metathesaurus.
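As a rough illustration of the structure described above, the sketch below models one concept that carries several names from different source vocabularies, possibly in different languages, together with a single semantic type. The class names, the identifier, the source labels, and the sample data are all assumptions made for illustration; this is not the Metathesaurus file format or any official API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ConceptName:
    text: str      # the name as it appears in a source vocabulary
    source: str    # label of the source vocabulary (hypothetical)
    language: str  # language code of the name (hypothetical codes)


@dataclass
class Concept:
    concept_id: str     # placeholder identifier, not a real one
    semantic_type: str  # one category assigned to the concept (illustrative)
    names: List[ConceptName] = field(default_factory=list)

    def names_in(self, language: str) -> List[str]:
        """Return all names of this concept in the given language."""
        return [n.text for n in self.names if n.language == language]


# Illustrative data only: one concept, three names, two languages.
concept = Concept(
    concept_id="EXAMPLE-0001",
    semantic_type="Neoplastic Process",
    names=[
        ConceptName("Neoplasms", "SOURCE_A", "ENG"),
        ConceptName("Tumors", "SOURCE_B", "ENG"),
        ConceptName("Tumeurs", "SOURCE_C", "FRE"),
    ],
)
english_names = concept.names_in("ENG")
```

Keeping names as separate records attached to one concept is what allows synonyms from different vocabularies to coexist without overwriting one another.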
Provisional Checklist
Please use the following form to evaluate semantic interoperability projects.
1. Types of data being integrated
Does the project have:
|
(a) different controlled vocabularies in
same language?
(b) different controlled vocabularies in
different languages?
(c) different classification schemas
(e.g., DDC, UDC, LCC)?
If yes, which ones?
(d) controlled vocabularies combined with
classification schemas?
(e) different metadata framework schemas
(e.g., XML, MARC, Dublin Core)?
If yes, which ones? Dublin Core
(f) different communication protocols?
(g) other:
|
2. Autonomy and Integrity of Constituent Parts
|
(a) Is standardization, reconciliation,
or conversion of semantic data
reversible?
(a.1) Can precoördinated strings, once filtered or deconstructed for semantic matching, later be put back together again?
(b) Is full complement of metadata and
indigenous subject hierarchies preserved?
If so, how? Through linking
(c) Does project rely on principle of
least common denominator?
If so, many data sets
may be able to coexist in the database, but given the resulting stripped-down or ‘dumbed-down’
resource descriptions, the database may no longer serve the interests of
readers (cf. recently cited problems with Dublin Core [20]). How does the use of the least common
denominator affect the quality of service?
(d) How is data stored: gathered into a union catalog (e.g., American Memory
Project, NSDL), vs. distributed database?
Distributed database
(e) How are metadata (including SI
links) stored? (e.g., via authority records, concordance tables, a
central switching language, semantic networks, lexical databases, semantic
layers, etc.) Linking records
|
3. Reconciliation of heterogeneous vocabularies
|
(a) How are correlations established
when a single term in one source has no equivalent term in the other? Linking
(b) Certain vocabularies are highly
structured and hierarchical, while others contain terms lacking any structure
at all aside from serial numbers or other unique identifiers. How are these
differences reconciled? By adding certain basic information to each concept
and establishing new relationships between terms from different source
vocabularies
(c) How are conflicts resolved
when an established heading in one vocabulary matches a cross reference in
other vocabularies? (E.g., Tumors is an established LCSH heading, but in MeSH
it is a cross reference to Neoplasms; and vice versa). By linking concepts that are similar along some
dimension
(d) If multiple vocabularies are used
in a single bibliographic record, and the headings from such
vocabularies are identical (after normalization), how are duplicate
retrievals handled? Through linking
|
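The "linking" answers in this section can be pictured with a minimal sketch: headings from different vocabularies are normalized and looked up in a link table that points them at a shared concept, so identical or equivalent headings in one record resolve to a single hit. The link table, the identifier, and the function names below are hypothetical and only illustrate the idea; they do not describe how the project actually stores its links.

```python
def normalize(heading):
    """Lowercase and collapse whitespace so trivially different headings match."""
    return " ".join(heading.lower().split())


# Hypothetical link table: (vocabulary, normalized heading) -> shared concept id.
# "Tumors" is an established heading in LCSH but a cross reference in MeSH,
# so both spellings are linked to the same placeholder concept.
LINKS = {
    ("LCSH", "tumors"): "EXAMPLE-0001",
    ("MeSH", "neoplasms"): "EXAMPLE-0001",
    ("MeSH", "tumors"): "EXAMPLE-0001",
}


def concepts_for(headings):
    """Map (vocabulary, heading) pairs to distinct concept ids, keeping order."""
    seen = []
    for vocabulary, heading in headings:
        concept_id = LINKS.get((vocabulary, normalize(heading)))
        if concept_id is not None and concept_id not in seen:
            seen.append(concept_id)
    return seen


# One bibliographic record carrying headings from two vocabularies.
record = [("LCSH", "Tumors"), ("MeSH", "Neoplasms"), ("MeSH", "  tumors ")]
linked = concepts_for(record)
```

Because retrieval works on concept ids rather than on raw heading strings, a record that carries the same heading from several vocabularies is retrieved once rather than once per heading.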
4. Effective and Efficient Resource Discovery (Precision and Recall), Satisfying User Needs
|
(a) Does project provide high or
satisfactory levels of precision and recall?
Precoordination
(b) To what extent does project rely on precoördination?
If mostly post-coordinate, then:
i) by what means is recall maximized?
ii) by what means is precision maximized?
(d) Does project provide faceted approach
(facilitating polysemy) while retaining option for browsable hierarchy
(facilitating navigation)?
(e) Are the following objectives and
functions supported in the S.I. environment?
i) Locate entities in the system via surrogates (find)
ii) Identify a surrogate that matches an entity (collocate)
iii) Select an entity appropriate to a user’s need via surrogates (choice facilitation)
iv) Obtain access to the entity via the system and its surrogates (acquisition)
v) Navigate the system and its surrogates (navigation)
(f) Has developer released beta version
for general testing?
(g) Have user satisfaction surveys been
conducted?
|
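For evaluators who want to put numbers on question 4(a), the standard definitions are: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both for one query; the function name and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, treating inputs as sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative query: four documents retrieved, three actually relevant.
precision, recall = precision_recall(
    retrieved=["doc1", "doc2", "doc3", "doc4"],
    relevant=["doc2", "doc4", "doc5"],
)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).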
5. Ease of Use (this is actually part of our
definition, i.e., SI should function “without special effort by the user,”
(where “users” include information creators and managers, and end-users)).
|
(a) Intuitive interface for data entry,
searching, browsing, etc.?
(b) Automate validation, mapping,
metadata extraction, etc., as much as possible?
(c) Availability of documentation?
|
6. Long-term viability
|
(a) Master plan for life-cycle management
and data migration?
(b) Reliance on open-source international
standards versus proprietary standards?
(c) Viable business model (e.g., not based exclusively on research grant with likely
expiration)?
|