Project name: ADL Thesaurus Protocol
Project URL: http://alexandria.sdc.ucsb.edu/~gjanee/thesaurus/
Project Description:
Protocol for exchange of thesaurus information. Thesaurus data exchange tool.
The Thesaurus Protocol is based on the ANSI/NISO (1993) Z39.19 thesaurus model and
supports downloading, querying, and navigating thesauri.
Protocol: XML- and HTTP-based protocol
Standards: ANSI/NISO Z39.19-1993: Guidelines for the Construction, Format, and
Management of Monolingual Thesauri.
Hierarchy: hierarchy of terms above (broader than) or below
(narrower than) a starting preferred term, including the starting term itself.
The hierarchy is indicated by the nesting of XML elements.
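As a rough illustration of how nesting can encode a hierarchy, the sketch below walks a nested XML fragment and prints each term indented by its depth. The element name `term`, the attribute `name`, and the sample terms are hypothetical and chosen only for illustration; they are not taken from the protocol specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: the element name "term" and attribute "name" are
# illustrative assumptions, not the protocol's actual vocabulary.
SAMPLE = """
<term name="landforms">
  <term name="mountains">
    <term name="volcanoes"/>
  </term>
  <term name="valleys"/>
</term>
"""


def flatten(element, depth=0):
    """Return (depth, name) pairs in document order; depth 0 is the starting term."""
    rows = [(depth, element.get("name"))]
    for child in element.findall("term"):
        rows.extend(flatten(child, depth + 1))
    return rows


hierarchy = flatten(ET.fromstring(SAMPLE.strip()))
for depth, name in hierarchy:
    # Indentation mirrors the nesting of the XML elements.
    print("  " * depth + name)
```

The starting term appears at depth 0, which matches the description of the hierarchy as including the starting term itself.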
The protocol provides five independent, stateless services.
One service queries the thesaurus by term name and returns a list of the
matching terms; its operator parameter specifies the matching operator to
employ.
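The idea of a query that takes a matching operator can be sketched locally as follows. This is only a sketch under stated assumptions: the operator names (`equals`, `contains-all-words`, `contains-any-words`), the function `query_terms`, and the sample term list are hypothetical, and the real protocol performs the query over HTTP rather than against an in-memory list.

```python
# Hypothetical sample vocabulary, used only to exercise the sketch.
TERMS = ["river", "river delta", "dry river bed", "lake"]


def _words(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


# Hypothetical operator names; the protocol's actual operators may differ.
OPERATORS = {
    "equals": lambda term, text: term.lower() == text.lower(),
    "contains-all-words": lambda term, text: _words(text) <= _words(term),
    "contains-any-words": lambda term, text: bool(_words(text) & _words(term)),
}


def query_terms(terms, operator, text):
    """Return the terms whose names match text under the named operator."""
    try:
        match = OPERATORS[operator]
    except KeyError:
        raise ValueError(f"unknown operator: {operator!r}") from None
    return [term for term in terms if match(term, text)]


print(query_terms(TERMS, "contains-all-words", "river bed"))
```

Because the function depends only on its arguments, each call is independent of any earlier call, which is the sense in which a service of this kind can be stateless.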
Provisional Checklist:
Please use the following form to evaluate semantic interoperability projects.
1. Types of data being integrated
Does the project have:
(a) different controlled vocabularies in same language?
(b) different controlled vocabularies in different languages?
(c) different classification schemas (e.g., DDC, UDC, LCC)?
If yes, which ones?
(d) controlled vocabularies combined with classification schemas?
(e) different metadata framework schemas (e.g., XML, MARC, Dublin Core)?
If yes, which ones? XML- and HTTP-based protocol
(f) different communication protocols?
(g) other:
2. Autonomy and Integrity of Constituent Parts
(a) Is standardization, reconciliation,
or conversion of semantic data reversible?
(a.1) Can precoördinated strings,
once filtered or deconstructed for semantic matching, later be put back
together again?
There are five ways to search the data:
1. Hierarchy: search asking for broader terms
2. Hierarchy: search asking for narrower terms
3. String
4. Single term
5. Boolean searches
(I think this is what Daniel is asking, but I am not clear on this.)
(b) Is full complement of metadata and
indigenous subject hierarchies preserved?
If so, how? Hierarchies are preserved through broader and narrower terms.
(c) Does project rely on principle of
least common denominator?
If so, many data sets
may be able to coexist in database, but given resulting stripped-down or
‘dumbed-down’ resource descriptions, the database may no longer serve the
interests of readers (cf. recently cited problems with Dublin Core).
How does the use of least common denominator affect the quality of service?
Hierarchies are preserved through broader and narrower terms.
(d) How is data stored: gathered into a union catalog (e.g., American Memory
Project, NSDL), vs. distributed database?
Distributed database.
(e) How are metadata (including SI links)
stored? (e.g., via authority records, concordance tables, a central
switching language, semantic networks, lexical databases, semantic layers,
etc.)
3. Reconciliation of heterogeneous vocabularies
(a) How are correlations established when a
single term in one source has no equivalent term in the other?
"Used for" terms are listed.
(b) Certain vocabularies are highly
structured and hierarchical, while others contain terms lacking any structure
at all aside from serial numbers or other unique identifiers. How are these
differences reconciled?
(c) How are conflicts resolved when an
established heading in one vocabulary matches a cross reference in other
vocabularies? (E.g., Tumors is an established LCSH heading, but in MeSH it is
a cross reference to Neoplasms; and vice versa)
(d) If multiple vocabularies are used in
a single bibliographic record, and the headings from such vocabularies
are identical (after normalization), how are duplicate retrievals handled?
4. Effective and Efficient Resource Discovery (Precision and Recall), Satisfying User Needs
(a) Does project provide high or
satisfactory levels of precision and recall?
(b) To what extent does project rely on
precoördination?
If mostly post-coordinate, then:
i) by what means is recall maximized?
ii) by what means is precision maximized?
(d) Does project provide faceted approach
(facilitating polysemy) while retaining option for browsable hierarchy
(facilitating navigation)?
(e) Are the following objectives and
functions supported in the S.I. environment?
i) Locate entities in the system via surrogates (find)
ii) Identify a surrogate that matches an entity (collocate)
iii) Select an entity appropriate to a user’s need via surrogates
(choice facilitation)
iv) Obtain access to the entity via the system and its surrogates (acquisition)
v) Navigate the system and its surrogates (navigation)
(f) Has developer released beta version
for general testing?
(g) Have user satisfaction surveys been
conducted?
5. Ease of Use (this is actually part of our
definition, i.e., SI should function “without special effort by the user,”
(where “users” include information creators and managers, and end-users)).
(a) Intuitive interface for data entry,
searching, browsing, etc.?
(b) Automate validation, mapping,
metadata extraction, etc., as much as possible?
(c) Availability of documentation?
6. Long-term viability
(a) Master plan for life-cycle management
and data migration?
(b) Reliance on open-source international
standards versus proprietary standards?
(c) Viable business model
(e.g., not based exclusively on a research grant with likely expiration)?