January 1, 1998 | Open Access

Identifiers and Their Role In Networked Information Applications

Abstract

Identifiers are an enormously powerful tool for communication within and between communities. For example, the International Standard Book Number (ISBN) has played a central role in facilitating business communications between booksellers and publishers; it has also been important to libraries in identifying materials. The International Standard Serial Number (ISSN) plays a pivotal role in facilitating commerce among publishers, libraries and serials jobbers; it is also vital to libraries in managing their own internal processes, such as serials check-in. Bibliographic utility identifier numbers such as the OCLC or RLIN numbers are used in duplicate detection and consolidation in the construction of online union catalog databases. The traditional bibliographic citation can be viewed as an identifier of sorts, albeit one that is not rigorously defined; it has many variations in style and data elements based on editorial policies. Yet the ability to cite is central both to the construction of the record of discourse for our civilization and to the development of scholarship; the citation plays an essential role in allowing authors to reference other works and in permitting readers to locate these works.

The assignment of identifiers to works is a very powerful act; it states that, within a given intellectual framework, two instances of a work that have been assigned the same identifier are the same, while two instances of a work with different identifiers are distinct. The use of identifiers outside of their framework of assignment, though, is often problematic. For example, normal practice assigns a paperback edition of a book one ISBN and the hardcover edition another, so bookstores can distinguish between these versions, which usually vary in price and availability.
But ISBNs are also sometimes used in bibliographic citations; in this situation, when the content and pagination of the hardcover and paperback editions are identical, either will serve equally well for a reader tracking down a citation, and the inclusion of an ISBN as an identifier for the cited work may actually cause problems because it makes an unnecessary distinction (for this purpose) between versions of the same work.

A great deal of scholarship involves the development of identifier systems that allow scholars to name things in a way which makes distinctions and recognizes logical equivalence - ways of identifying editions of major authors or composers, variations in coinage having numismatic significance, or the identification of chemicals, proteins or biological species. Often the rules for assigning identifiers to objects are the subject of ongoing scholarly debate and form a key part of the intellectual framework for a field of study.

Identifiers take on a new significance in the networked environment. To the extent that a computational process can allow a user to move from the occurrence of an identifier to accessing the object being identified, identifiers become actionable. For example, in the World Wide Web links can be constructed between the entries in an article's bibliography and digital versions of the cited works, links that can be traversed with a mouse-click.

In the networked information environment, we have recently seen the emergence of a number of important new identifiers, some of which are relatively mature and others that are still under development. The remainder of this article briefly discusses a number of these identifiers.

Uniform Resource Locators (URLs) are a class of identifiers that became popular with the emergence of the World Wide Web.
We first saw them on Web pages, later in newspaper advertising and on the sides of buses, and then everywhere; currently they serve as the key links between physical artifacts and content on the Web, as well as providing linkage between objects within the Web. URLs have clearly been very effective; yet they are unsatisfactory in one very major way. They are really not names, in that they don't specify logical content, but, rather, are merely instructions on how to access an object. URLs include a service name (such as "FTP" for file transfer or "HTTP" for the Web's hypertext transfer protocol) and parameters that are passed to the specified service - most typically a host name and a file name on that host, both of which may be ephemeral. From a long-term perspective, the service name is also ephemeral - for example, content may well outlive a specific service (as has already been the case with the GOPHER service). It is important to recognize that URLs were never intended to be long-lasting names for content; they were designed to be flexible, easily implemented and easily extensible ways to make reference to materials on the 'Net.

The Internet Engineering Task Force (IETF), which manages standards development for the Internet, realized the limitations of URLs for persistent reference to digital objects several years ago and as a result began a program to develop a parallel system called Uniform Resource Names (URNs). The IETF URN working group recognized that the URN system must accommodate a multiplicity of naming policies for the assignment of identifiers. Put another way, the IETF URN effort establishes a framework within which identifiers can be defined; it does not itself define any identifier systems.
Roughly speaking, the syntax of a URN for a digital object is defined as consisting of a naming authority identifier (which is assigned through a central registry) and an object identifier which is assigned by that naming authority to the object in question; the specific content of the identifier may have structure and significance to users familiar with the practices of a given naming authority, but has no predefined meaning within the overall URN framework. Note that the URN syntax does not specify an access service for the object, unlike a URL.

The second key idea in the URN framework is that of resolution services or processes - which may be as complex as new network protocols and infrastructure (analogous to the Domain Name System, for example) or processes as simple as a database lookup - which translate a URN into instructions for accessing the named object. Systems which provide resolution services are called "resolvers"; sometimes the IETF work also refers to "resolution databases" which provide the mapping from names to object locations and access services.

URNs are resolved to sets of URLs which provide access to instances of the named digital object. A URN may resolve to more than one URL because there are copies of the digital object that have been replicated at multiple locations such as mirror sites, or because the URN (as defined by the relevant naming authority) specifies the object at a high degree of abstraction, and multiple manifestations of the object (for example, in different formats, such as ASCII, SGML and PDF) are available. There is no explicit requirement that the URN to URL resolution process expose the mapping from an abstract definition of content to a variety of specific manifestations; it is equally legitimate for the choice of format to be made as part of a protocol negotiation in evaluating a URL when using a sophisticated protocol such as the Z39.50 Information Retrieval Protocol, which supports such negotiation.
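
The two ideas just described, a service-independent name and a resolution step that maps it to a set of URLs, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the "urn:<authority>:<object id>" layout is a simplification of the URN syntax, the in-memory dictionary stands in for a resolution database, and every authority, identifier and host name in it is invented.

```python
from urllib.parse import urlsplit

# Hypothetical resolution database: (naming authority, object id) -> URLs.
# All authorities, identifiers and hosts here are invented for illustration.
RESOLUTION_DB = {
    ("example-press", "report-42"): [
        "http://mirror-a.example.org/reports/42.pdf",
        "ftp://mirror-b.example.net/pub/reports/42.sgml",
    ],
}


def parse_urn(urn):
    """Split 'urn:<authority>:<object id>' into (authority, object id).

    The object id is opaque to the framework, so it is returned unchanged
    even if it contains further colons.
    """
    scheme, sep, rest = urn.partition(":")
    if scheme.lower() != "urn" or not sep:
        raise ValueError(f"not a URN: {urn!r}")
    authority, sep, object_id = rest.partition(":")
    if not authority or not sep or not object_id:
        raise ValueError(f"malformed URN: {urn!r}")
    return authority.lower(), object_id


def resolve(urn, db=RESOLUTION_DB):
    """Return the URLs currently registered for a URN (possibly none)."""
    return list(db.get(parse_urn(urn), []))


def access_service(url):
    """Return the service name (scheme) that a URL commits the reader to."""
    return urlsplit(url).scheme
```

Calling resolve("urn:example-press:report-42") returns both registered URLs, one per manifestation, and access_service shows that each URL names a specific service while the URN itself names none.
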
As the location and means of access for objects change, the resolver's database is updated; thus, resolving a URN tomorrow may return a different set of URLs. Today's standard browsers do not yet understand URNs and how to invoke resolvers to convert them to URLs, but hopefully this support will be forthcoming in the not too distant future.

One can reasonably view the URN framework as the means by which both existing and new identifier systems will be moved into the networked environment. The URN framework is intended to be sufficiently flexible to subsume virtually all existing bibliographic identifiers (sometimes referred to as "legacy" identifier systems); for example, the IETF working group documented how the ISSN, ISBN and SICI might be implemented as URNs.

The IETF uses the term Uniform Resource Identifiers (URIs) as a generic name to cover both URLs and URNs, along with the still immature concept of Uniform Resource Characteristics (URCs), which can be thought of as structures which allow one or more URNs (perhaps from different naming frameworks) to be related both to sets of URLs and to metadata describing the objects identified by the URNs and URLs.

As a stopgap measure to address some of the problems with the persistence of URLs, about two years ago OCLC deployed a system called the PURL (Persistent URL). Basically, PURLs are HTTP URLs where the usual hostname has been replaced with the host "PURL.ORG" and the filename is an identifier for the "real" content being referenced. The PURL.ORG host will be maintained for the long term by OCLC under that name; when someone registers an object with this PURL server they provide the current hostname and filename for the object and the PURL server creates a database entry linking this hostname and filename to the identifier that will appear in the PURL.
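
The registration step just described, together with the lookup that happens later when someone evaluates a PURL, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OCLC's software: the class name, its methods, the host "purl.example.org" and the choice of a 302 status for the HTTP redirect are all hypothetical.

```python
class PurlRegistry:
    """Toy model of a PURL-style server (hypothetical, for illustration)."""

    def __init__(self, purl_host="purl.example.org"):
        self.purl_host = purl_host
        self._targets = {}  # identifier -> current (hostname, filename)

    def register(self, identifier, hostname, filename):
        """Record where the object lives now and return its persistent URL."""
        self._targets[identifier] = (hostname, filename)
        return f"http://{self.purl_host}/{identifier}"

    def update(self, identifier, hostname, filename):
        """Record a new location; the content provider must call this."""
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = (hostname, filename)

    def evaluate(self, identifier):
        """Return (status, Location) for the redirect sent to a requester."""
        hostname, filename = self._targets[identifier]
        # 302 is an assumed status code; any HTTP redirect would serve.
        return 302, f"http://{hostname}/{filename.lstrip('/')}"


registry = PurlRegistry()
purl = registry.register("docs/report-42", "www.example.edu", "/papers/r42.html")
registry.update("docs/report-42", "archive.example.edu", "/1997/r42.html")
print(purl, registry.evaluate("docs/report-42"))
```

The point of the design is that the string handed to readers never changes; only the database entry behind it does, which is why the scheme depends on providers keeping that entry current.
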
When the PURL server is contacted because someone is evaluating a PURL, it looks up the identifier in its database, finds out where the object in question currently resides, and uses the redirect feature of the HTTP protocol to connect the requester to the host housing the object. Content providers are responsible for sending updates to the PURL server when the content file name and/or location changes.

PURLs share the idea of indirection - looking up an identifier in a database to find out where the object is currently stored - with URN resolvers as a means of achieving persistence. They are a very clever and practical design, in that they work with the existing installed base of Web browsers. However, they are not truly names, since they only permit content to be accessed through a specific service, namely HTTP. PURLs will probably no longer work as new protocols appear that supersede HTTP and as content migrates to access through such successor protocols.

The Serial Item and Contribution Identifier (SICI) code was recently revised by a standards committee under the auspices of the National Information Standards Organization (NISO), the ANSI-accredited standards body serving libraries, publishers and information service providers; it is described in American National Standard Z39.56-1996. The SICI relies in an essential way on the ISSN to identify the serial and can be used to identify a specific issue of a serial or a specific contribution within an issue (such as an article or the table of contents). The SICI code is starting to see wide implementation and is likely to serve a central role in a number of applications: it can be used not only to identify articles, but also to link citations from article bibliographies or abstracting and indexing databases to articles in electronic form. It is an important part of the infrastructure that supports ARL's NAILLD program to streamline interlibrary loan and document delivery.
One of the great strengths of the SICI is that it can be determined directly from an issue of a journal (or an article within the issue), assuming only that the ISSN for the journal can be somehow determined. As such, it represents an open standard for creating linkages to articles or other serial components.

Also under NISO auspices, work has just begun on a new identifier with the working name of Book Item and Contribution Identifier (BICI). The BICI can be used to identify specific volumes within a multi-volume work or components such as chapters within a book. There are still a number of unresolved issues surrounding the exact scope of this standardization effort, both in terms of the range of works that it applies to (for example, sound recordings as well as books) and the level of granularity of the identifier (for instance, whether it can identify a specific illustration or table within a work, something the SICI is not currently designed to do).

In the past few months, the Association of American Publishers (AAP) and their technical contractor, the Corporation for National Research Initiatives (CNRI), have issued a great deal of publicity about a new identifier called, rather grandly, the Digital Object Identifier (DOI). The DOI is based on CNRI's "handle" system - a very general identifier system that fits roughly within the URN framework and that provides a mechanism for implementing naming systems for arbitrary digital objects. Thus far, the DOI has been demonstrated within the context of online consumer acquisition of intellectual property and, perhaps for this reason, it is somewhat difficult to disentangle the proposed DOI standard, the demonstration implementation of the DOI, and applications enabled by it. Major demonstrations of the DOI system were scheduled for the Frankfurt Book Fair in October 1997.

There are a number of misconceptions surrounding various aspects of the DOI.
Its development does not mean that everything on the Web will become pay-per-view; rather, the DOI provides a method for collecting revenue for access to material that is described by a DOI (either on a one-time license or pay-per-view basis), if the organization that owns the rights to the object wishes to do this. Some objects described by DOIs may be accessible without charge. DOIs in and of themselves are only identifiers and do not imply that any sort of copyright enforcement mechanisms (like an "envelope" or other secure container) will be bundled with the objects that they describe; the presence or absence of such copyright enforcement technologies is an entirely separate issue. These copyright enforcement technologies can be used with objects described by all sorts of identifiers, not just DOIs.

I believe there are some legitimate concerns about the use of DOIs as a means of implementing actionable citations among works on the Web, since this is likely to mean that the author of the citing work is going to need to obtain the DOI of the work that he or she wishes to cite either from the owner of the cited work or from some third party, and following a citation would then involve interaction with the DOI resolution service, raising privacy and control issues. But the notion that the use of DOIs will make the networked environment "safe" for proprietary intellectual property in a way that it is not today is as improbable as the idea that the introduction of DOIs, as one type of commonly used URN, will somehow convert the entire Web into a pay-per-view environment.

Discussions with the DOI developers suggest that the DOI's role will be as an identifier of content that is available for acquisition; there is currently some ambiguity as to whether it actually identifies content directly or if it simply identifies a method of acquiring content (such as an order screen).
It is also extremely unclear under what circumstances similar objects are assigned distinct DOIs. Current plans seem to be to carefully control what organizations are permitted to assign DOIs, limiting the groups to "legitimate" publishers; the hope is thus that a DOI will offer some "brand name" confidence to consumers purchasing content on the 'Net. DOIs will be assigned to content as it is made available for acquisition and perhaps removed from the DOI database as content is withdrawn from availability for acquisition. It is important to recognize that there does not seem to be consensus on most of these issues at present within the DOI developer community, which underscores the uncertainties about the potential roles and utility of the DOI outside of its use as a means for consumers to acquire content.

In general, one cannot determine the DOI assigned to a digital object, or even whether the object has a DOI, unless the object carries it as a label. However, this can be confusing, because some publishers use, for those digital objects which are within the scope of the SICI, the SICI code as their (publisher-assigned) identifier. The implications of this practice will require careful examination and analysis.

It is also unclear what role the DOI can usefully play in identifying material outside of acquisitions - for example, for material that is already licensed and is part of a library's collection, where it would be desirable to resolve "bibliographic" links to this material, but when it is inappropriate to connect library to the acquisitions defined by the DOI.
It that DOIs can be implemented within the IETF URN framework, there are a few having to do with to the of no has yet been which out these of the DOI developer have the for Information to work with them to of the DOI's and as they to library a to suggest ways in which the DOI might be made more to the bibliographic NISO has also been in to the work on DOIs to the of the NISO and a in to for general bibliographic identifiers in the networked environment. The DOI as it currently to be is likely to be a tool to permit consumers to acquire content from publishers on the with some confidence about they are business present concerns with it to the k of surrounding many aspects of this the very by the name DOI, which seem to be with its definition Object or something might be more and the very potential that are if this identifier is into such as a means of implementing citations in digital In a very there are no identifi but it is very to identifiers to or inappropriate new identifier systems are some have been for the networked information environment, while others are identifiers that are being into the digital When evaluating a new there are a number of essential to is the scope of the identifier of objects can be identified with is permitted to assign identifiers, and how are these organizations identified, and are the rules for assigning new when are two instances of a work the same assigned the same within the and under what are they distinct assigned different from distinctions that are by the assignment of does one determine the identifier for the work, and can one it from the work or does one need to some proprietary database maintained by a third To what class of objects are the identifiers ithin this class of is there an method of identifiers under the identifier or does someone have to make a specific to assign an identifier to an makes this and Note that, the identifier cannot be from the identified work, it is for use as a identifier within any system of 
open The of reference not proprietary databases or services. is the identifier that how does one from the identifier to the identified work or to other identifiers or metadata to permit the instances of the work to be and what is the role of proprietary third databases in resolving the the or of these resolution services have control are the to entry for new resolution are the policies of the resolution services in such as user privacy and persistent is the identifier one still resolve it the work to be Identifiers that on the of the are very for citations or other that can serve the long-term or scholarly of the new identifiers are likely to be to some community, for some but it will be essential to determine what roles new identifier is for and to using various of identifiers in roles that are The URN framework being by the IETF also all are to on networked information to carefully what they need from identifier systems and whether those are by new systems. by The author to this article for use as long as the author and are For use, a be to the author URLs are defined in Internet for URNs are defined in Internet and the syntax are defined in There are also a number of systems that are currently being deployed on an on the Internet for example, There are also a number of Internet that are currently under in Internet that cover such as system and t he use of bibliographic identifiers as URNs. For information on the DOI, see Information on the system can be at The OCLC PURL is described at
