
OCLC RESEARCH REPORT

Transitioning to the Next Generation of Metadata

Karen Smith-Yoshimura
Senior Program Officer

2020 OCLC.
This work is licensed under a Creative Commons Attribution 4.0 International License.

September 2020
OCLC Research
Dublin, Ohio 43017 USA
www.oclc.org

ISBN: 978-1-55653-167-5
DOI: 10.25333/rqgd-b343
OCLC Control Number: 1197990500

ORCID iDs
Karen 62

Please direct correspondence to:
OCLC [email protected]

Citation:
Smith-Yoshimura, Karen. 2020. Transitioning to the Next Generation of Metadata. Dublin, OH: OCLC Research. https://doi.org/10.25333/rqgd-b343.

CONTENTS

Executive Summary
Introduction
The Transition to Linked Data and Identifiers
    Expanding the use of persistent identifiers
    Moving from “authority control” to “identity management”
    Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    Linked data challenges
Describing “Inside-Out” and “Facilitated” Collections
    Archival collections
    Archived websites
    Audio and video collections
    Image collections
    Research data
Evolution of “Metadata as a Service”
    Metrics
    Consultancy
    New applications
    Bibliometrics
    Semantic indexing
Preparing for Future Staffing Requirements
    The culture shift
    Learning opportunities
    New tools and skills
    Self-education
    Addressing staff turnover
Impact
Acknowledgments
Appendix
Notes

FIGURES

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research
FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters
FIGURE 3. Examples of some DOI and ARK identifiers
FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages
FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
FIGURE 6. Responses to 2019 survey on challenges related to managing A/V collections
FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
FIGURE 9. UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

EXECUTIVE SUMMARY

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery regardless of format, now and in the future, permeates all Focus Group discussions. Yet metadata is changing. Format-specific metadata management based on curated text strings in bibliographic records understood only by library systems is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?
• How is the creation process changing?
• How is the metadata itself changing?
• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and introduced these ideas to the other communities with which they interact.

INTRODUCTION

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),1 first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),2 a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.3 As noted in the British Library’s Foundations for the Future: “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”4

Format-specific metadata management based on curated text strings in bibliographic records understood only by library systems is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”5 Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.

Why is metadata changing?

Traditional library metadata was and is made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today, publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.

The Focus Group’s composition has fluctuated over time, and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and introduced these ideas to the other communities with which they interact.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC’s Shared Entity Management Infrastructure, Library of Congress’ Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”
• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia
• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog
• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.

The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research 15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation.
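The contrast between “strings” and “things” described in this section can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a model taken from the report or from any library system: the `Entity` class, the `ex:` identifiers, the record layout, and the labels are all hypothetical. It shows records that carry a name as text matching only one spelling, while records that point to a language-neutral identifier are gathered together and can be displayed in whichever language is available.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entity:
    """A 'thing' known by a language-neutral identifier (hypothetical model)."""
    identifier: str
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label
    same_as: Set[str] = field(default_factory=set)        # identifiers in other systems


# Hypothetical identifier, labels, and cross-links, for illustration only.
author = Entity(
    identifier="ex:person/0001",
    labels={"en": "Example, Author", "ja": "例 著者"},
    same_as={"ex-authority:n0001", "ex-researcher:0001"},
)

# "String" description: each record stores its own text form of the name.
string_records = [
    {"title": "Work A", "creator": "Example, Author"},
    {"title": "Work B", "creator": "Example, A."},
]

# "Thing" description: each record points to the same identifier.
linked_records = [
    {"title": "Work A", "creator": "ex:person/0001"},
    {"title": "Work B", "creator": "ex:person/0001"},
]


def works_by_creator(records: List[Dict[str, str]], creator: str) -> List[str]:
    """Return titles whose creator value equals the given string or identifier."""
    return [r["title"] for r in records if r["creator"] == creator]


def display_label(entity: Entity, language: str, fallback: str = "en") -> str:
    """Pick a label in the requested language, falling back to another language."""
    return entity.labels.get(language, entity.labels[fallback])
```

With these made-up records, matching on the text form finds only the work whose name string is spelled identically, while matching on the identifier finds both works. The label shown to a user becomes a display choice and stops being part of the record's identity.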
Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work: publishing, repository, library catalog, and researchers, although ORCID is limited to living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities used by libraries, publisher