Session 502: Untangling the Web: Diverse Experiences with Access from the Web Archiving Trenches (lightning session)
Guest author Karl Blumenthal
The diversity of SAA 2014’s myriad sessions, meet-ups, and posters on the topic represented the expanding landscape of web archiving in academic, government, non-profit, and professional repositories. As more archivists and institutions accept the challenge of capturing the culture of the web, infinitely more complexities emerge. The rapid maturation of the task, however, has also enabled archivists to work towards common objectives rooted in traditional archival goals, values, and best practices.
Session 502, “Untangling the Web: Diverse Experiences with Access from the Web Archiving Trenches,” specifically surveyed what archivists at 10 leading institutions are doing to improve and to integrate access to their collections. This engaged each of them in equally rich technical and methodological investigations, and while the (often tentative) conclusions were as unique as the institutions themselves, the themes of multi-disciplinary and multi-institutional collaboration, social activism, and the importance of researcher use abounded.
Nicholas Taylor, Web Archiving Service Manager for Stanford University Libraries, presented the collaborative project that he and archivists and academics at Stanford, Oberlin College, and the University of California at Berkeley have undertaken to capture political campaign websites from the 2014 midterm election cycle. Taylor introduced a problem repeated often at this session and in others at SAA: the “chicken and egg” problem of building web archives to suit researcher needs without a rich evidence base of use cases. Results of this highly researcher-driven project will emerge after the end of the election cycle, but in the meantime Taylor encourages all web archivists to share their use cases, to continue facilitating the work of researchers, and to use collaborations as opportunities for leverage on large issues like this one.
Deborah Kempe, Chief of Collections Management & Access at the Frick Art Reference Library, presented the current state of efforts to integrate access to web archives curated among the three partners in the New York Art Resources Consortium (NYARC): the libraries of the Frick Collection, the Brooklyn Museum, and the Museum of Modern Art (MoMA). She described the partnership’s progress to further integrate access to these collections with access to others through the partnership’s shared discovery tool, aspiring to a prototype that could be benchmarked across the field. She shared images of proofs of concept for this prototype, but reminded attendees that this is a small step in a process that has already taken NYARC 4 years, and which taught her to repeat the motto: “expect the unexpected.”
Beatrice Colastin Skokan, Special Collections Librarian at University of Miami Libraries, similarly presented her institution’s web archiving charge as an explicit effort to supplement and enhance existing print collections. These collections, which focus on grassroots organizations that increasingly use the web for their primary advocacy work, are defined by their vulnerability and ephemerality. Web archiving, then, has proven an invaluable way to collect and even replace analogous print-based time-sensitive materials, and moreover to fill gaps among them.
Skokan’s colleague and the Archivist of the Cuban Heritage Collection at University of Miami Libraries, Natalie Baur, expanded on these themes with a case study of archiving web materials from organizations based on the island of Cuba. Pilot efforts in this pursuit have brought new clarity to the problems posed by censorship and access restrictions, and especially to the ephemerality of materials suddenly and permanently removed from the public view. These problems have enabled subject librarians, digital preservation specialists, and cataloging/metadata staff to collaborate as a Web Archiving and E-Records Working Group, which looks forward to next tackling the issue of OPAC integration.
Dina Mein, Library & Archives Manager at The Henry Ford, extended the theme of collaboration to the web archiving work shared between her 501(c)3 non-profit organization and others under the Ford Center umbrella. She explained how shared engagement with recent live and online museum exhibits like the 1924-25 Ford Motor Company Advertising Campaign sparked interest in web archiving among diverse offices and departments. She looks forward in particular to opportunities to expand this collaboration beyond the boundaries of the Ford family of organizations, and highlighted recent work with San Diego State University to archive The Henry Ford’s own site as a potential model for this.
Abbie Grottke, Lead Information Technology Specialist at the Library of Congress, added the layer of user experience to the conversation. Specifically, she described LC’s attempts to define and apply metrics for user engagement with its web archives. “Use is not high,” she explained, but LC is not discouraged. Taking the long view of collecting for posterity and for future needs, Grottke cited examples like Iraq War and September 11 web archiving efforts as especially rich future destinations for research. In the meantime, LC has learned that access is indeed the biggest inhibitor to use, and that expanding the collections beyond LC’s walls, especially into other existing discovery layers, is therefore essential.
Lori Donovan, Partner Specialist at Internet Archive, described how IA and its web archiving software service Archive-It are evolving to meet emergent access needs. She focused especially on their introduction of WAT as an advancement beyond the traditional WARC web archiving file format. WAT, she explained, will provide end-users more access points than the full-text and user-generated metadata routes currently supported by WARC. Demonstrating the value of web archiving to internal stakeholders with examples of meeting researcher needs is critical to gaining leverage for your project, Donovan emphasized. The WAT format, for example, was designed to meet needs expressed by a diverse range of researchers who want to use the web as a source for expansive data analysis and visualization.
Erik Moore, University Archivist at the University of Minnesota, demonstrated the utility of web archiving to fulfill a university archives’ mission to document its institution’s history. In this instance, un- or under-managed born-digital resources are always in danger of being lost, and most especially when the institution migrates entire web platforms, as Moore’s is preparing to do. Moore’s current goal is to capture university web content for inclusion in its institutional repository, which he expects to leverage into further opportunities to manage IR content and optimize it for search engine indexing.
Like Moore, Angelina Altobellis, Digital Archivist at Rollins College, is archiving her college’s full web presence to provide a rich reference source for its many departments, online publications, course catalogs, and more. This strategy, she explained, was the direct result of the high volume of research inquiries that could be answered by referencing superseded course catalogs that were “hidden” by their removal from the college’s live website on an annual basis, then typically only available as unwieldy printed volumes. The current collection is suboptimal because, as others have pointed out, its reliance on Internet Archive’s Wayback Machine means that only known URL and full text searching access points exist. Rollins is, however, working with Archive-It to extract other data from their WARCs to test and develop other access points that could integrate with the college library’s existing OPAC.
Anne Petrimoulx, Archivist for Trinity Wall Street, also seeks to satisfy internal institutional needs, but in a smaller and much less formalized archival setting. While her institution is in fact an historic church, its archives function much more like that of a corporation, providing legal and business reference for Trinity’s real-estate and broadcasting enterprises. Petrimoulx uses Archive-It to archive Trinity’s Webby Award-winning site for the photos, blogs, videos, and other news sources documenting the life of its parish much as their legacy yearbook series did, but looks forward to a time when adequate resources can support the design of a custom interface with more user-driven access points.
Lisa Snider, moderator of this session and Electronic Records Archivist at the University of Texas at Austin’s Harry Ransom Center, underlined many of the above themes in her case study of a prominent writer’s collection currently undergoing accession at her institution. She described how anticipating potential researcher interests in the writer’s social media presence in particular should and will shape the design of its ultimate access layer. The most important thing, she concluded, is to keep options open. The volume and diversity of file formats and framing web environments necessitate that an archivist design access to such a collection to complement, if not necessarily emulate, its original born-digital nature. Because the ultimate goal is to satisfy the researcher, it is vitally important, as Kempe introduced, to expect the unexpected and “embrace change” when an opportunity arises to change the course of your design thinking.
Clearly, collaboration among the above archivists and others committed to the continual improvement of access to web archives will continue. Each of Session 502’s speakers is passionate about the future of access and eager to both contribute and benefit. So, if the future record of our lives online is important to you, keep an eye on all of these projects and, most importantly, reach out to these leaders.
Karl-Rainer Blumenthal is a 2014 graduate of the MSLIS program at Drexel University and the 2014-15 National Digital Stewardship Resident for the New York Art Resources Consortium (NYARC), a collaborative effort of the Frick Collection, Brooklyn Museum, and Museum of Modern Art libraries. He can be reached at email@example.com or @landlibrarian