SAA 2015: Collection Management Tools Roundtable

In advance of the 2015 Annual Meeting, we invited SNAP members to contribute summaries of panels, roundtable and section meetings, forums, and pop-up sessions. Summaries represent the opinions of their individual authors; they are not necessarily endorsed by SNAP, members of the SNAP Steering Committee, or SAA.

Guest Author: Michael Barera, Archivist at Texas A&M University-Commerce

After a brief review of the agenda, which included time for announcements, a discussion of the new documentation portal, and lightning presentations (as well as a note that the session would be recorded for YouTube), the session began with a discussion of documentation. According to a survey taken earlier this year, there is widespread interest in such documentation, and as a result there is now a CMT Documentation Library on the SAA website (in the form of a microsite). The participants then outlined their plans for the future, which included more (user-generated) guides and manuals, “official” documentation (especially from Archivists’ Toolkit and Archon), moving from Google Drive to Drupal, and adding information about additional resources. They also noted that the idea for the portal came out of discussion at last year’s roundtable at the 2014 SAA Annual Meeting.

Council liaison Rachel Betts then shared some Council discussions with the roundtable. Her remarks covered the proposed change to roundtable/section alignment, which would merge the two into a single class of “affinity groups”; each group would need 4% of SAA membership to continue to operate, and every group member would have to be an SAA member. A new class of “virtual groups” was also proposed, however, which could be smaller and could include non-SAA members. She also covered new terms for A&A list participation, changes made by the Standards Committee, the new strategic plan, and a proposal to increase dues gradually over three years (which is pending the result of an online referendum this fall).

The first lightning presentation was “DAOism for the Masses: Pragmatic DAO Creation for Mass Digitization”, by Andra Darlington of the Getty Research Institute (GRI). Darlington began by defining what a DAO is (a digital archival object) and then gave examples of EAD-encoded DAOs in various environments, from raw code and HTML websites to Archivists’ Toolkit (AT) and ArchivesSpace (ASpace). She then outlined a mass digitization project at the GRI, which digitized archival collections such as the Julius Shulman photos (with over 50,000 images) using ExLibris Rosetta and Primo (although she noted that Primo searches are pretty unwieldy). The GRI’s current collection management tool is AT, and they are presently testing ASpace, although she conceded that “neither is ideal”: for DAO creation, they either have to create the DAOs manually or ingest CSV files that they then have to link manually. She then explained the GRI’s solution: the program DAOism, which is used for adding DAOs to an EAD file on the GRI website. It requires a unique identifier in the URI/handle for the digital object and in the corresponding EAD components. It also comes with caveats: EAD round-tripping is problematic in AT and ASpace, and because of this, repeated round-tripping may result in duplicate DAOs in both. She closed by thanking her colleague Josh Gomez as well as the audience, and noted that the DAOism code is on
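To make the identifier-matching idea concrete, here is a minimal Python sketch of attaching DAOs to EAD components whose `id` also appears in a digital object's handle. It is not the DAOism code: the sample EAD, the `HANDLES` mapping, the example.org URLs, and the `add_daos` function are all made up for illustration, and it assumes un-namespaced, DTD-style EAD 2002 with a plain `href` attribute on `<dao>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD fragment; component ids are invented for this sketch.
EAD_SAMPLE = """<ead>
  <archdesc level="collection">
    <dsc>
      <c01 id="shulman_0001" level="file">
        <did><unittitle>Job 001</unittitle></did>
      </c01>
      <c01 id="shulman_0002" level="file">
        <did><unittitle>Job 002</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Hypothetical mapping from component id to the digital object's handle.
HANDLES = {
    "shulman_0001": "https://example.org/handle/shulman_0001",
}


def add_daos(ead_xml, handles):
    """Return (updated EAD string, number of <dao> elements added)."""
    root = ET.fromstring(ead_xml)
    added = 0
    for comp in root.iter():
        comp_id = comp.get("id")
        if comp_id not in handles:
            continue  # no digital object is known for this element
        did = comp.find("did")
        if did is None:
            continue
        href = handles[comp_id]
        # Skip components that already carry this link, so that running
        # the function twice does not create duplicate DAOs.
        if any(dao.get("href") == href for dao in did.findall("dao")):
            continue
        ET.SubElement(did, "dao", {"href": href})
        added += 1
    return ET.tostring(root, encoding="unicode"), added


if __name__ == "__main__":
    updated, count = add_daos(EAD_SAMPLE, HANDLES)
    print(count)
    print(updated)
```

The duplicate check is one simple way to guard against the duplicate-DAO problem mentioned in the talk; whether DAOism itself does anything similar is not something the session summary covers.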

The next lightning talk was given by Mark Custer, of the Beinecke Rare Book and Manuscript Library, on the subject of converting between EAD and Excel files. His “talk” consisted entirely of two narrated screencasts. In the first, he started with an Excel spreadsheet, then created a hierarchy (C1-C12), noted that the extant red highlights would be converted into “titles” in EAD, pointed out the date-normalized fields for begin/end years, and then saved the file with the “save as” function as an XML spreadsheet. He then demonstrated that the file’s look and feel was still the same after saving it as an EAD. Finally, he opened the code in Oxygen, where it was clearly EAD. Custer’s second screencast began in Oxygen and went the other way, from EAD to Excel. He noted that, unlike in the first screencast, no columns were hidden by default. He then demoed the search capability, before finishing by opening the file again in Oxygen to show the conversion.
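As a rough illustration of what a spreadsheet-to-EAD conversion has to do with a C1-C12 hierarchy and begin/end years, here is a short Python sketch. It is not Custer's workflow (his demo relied on Excel's XML spreadsheet export): the `ROWS` data and the `rows_to_dsc` function are hypothetical, and each row is assumed to carry a level number, a title, and normalized begin and end years.

```python
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet rows: (level, title, begin year, end year).
ROWS = [
    (1, "Series 1: Correspondence", "1900", "1950"),
    (2, "Outgoing letters", "1900", "1925"),
    (2, "Incoming letters", "1926", "1950"),
    (1, "Series 2: Photographs", "1930", "1940"),
]


def rows_to_dsc(rows):
    """Nest flat rows into c01..c12 elements under a <dsc> element."""
    dsc = ET.Element("dsc")
    # stack[0] is <dsc>; stack[n] is the most recent open component at level n.
    stack = [dsc]
    for level, title, begin, end in rows:
        if not 1 <= level <= 12:
            raise ValueError(f"level must be between 1 and 12, got {level}")
        if level > len(stack):
            raise ValueError(f"row {title!r} skips a hierarchy level")
        del stack[level:]  # close components at this level or deeper
        comp = ET.SubElement(stack[-1], f"c{level:02d}")
        did = ET.SubElement(comp, "did")
        ET.SubElement(did, "unittitle").text = title
        unitdate = ET.SubElement(did, "unitdate", {"normal": f"{begin}/{end}"})
        unitdate.text = f"{begin}-{end}"
        stack.append(comp)
    return dsc


if __name__ == "__main__":
    print(ET.tostring(rows_to_dsc(ROWS), encoding="unicode"))
```

Going the other way, from EAD back to rows, amounts to walking the same tree and recording each component's depth.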

The next lightning talk was “Steady: Spreadsheets to EAD”, by Linda Sellers of the Special Collections Research Center at North Carolina State University Libraries. Sellers began by introducing Steady, which is a Ruby on Rails application that converts CSV files to EAD. She also noted that, at least for now, the Steady schema defines the allowable values for the CSV header. She then presented a series of screenshots, including a container list for Steady and its subdomain on the Herokuapp website, and then demonstrated how the application integrates with ASpace’s import function. She concluded by noting that Steady’s user interface is available on Heroku and that its code is on GitHub.
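The idea of a fixed set of allowable CSV headers can be sketched in a few lines of Python. This is not Steady (which is a Ruby on Rails application), and the header names in `ALLOWED_HEADERS` are invented for the example rather than taken from Steady's schema; the `csv_to_container_list` function is likewise hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical header vocabulary; Steady's real schema may differ.
ALLOWED_HEADERS = {"box", "folder", "title", "date"}


def csv_to_container_list(csv_text):
    """Convert CSV text to a flat <dsc> container list, rejecting unknown headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    unknown = set(reader.fieldnames or []) - ALLOWED_HEADERS
    if unknown:
        raise ValueError(f"unsupported CSV headers: {sorted(unknown)}")
    dsc = ET.Element("dsc")
    for row in reader:
        comp = ET.SubElement(dsc, "c01", {"level": "file"})
        did = ET.SubElement(comp, "did")
        for kind in ("box", "folder"):
            if row.get(kind):
                ET.SubElement(did, "container", {"type": kind}).text = row[kind]
        ET.SubElement(did, "unittitle").text = row.get("title") or ""
        if row.get("date"):
            ET.SubElement(did, "unitdate").text = row["date"]
    return dsc


if __name__ == "__main__":
    sample = "box,folder,title,date\n1,1,Letters,1901\n1,2,Diaries,1902\n"
    print(ET.tostring(csv_to_container_list(sample), encoding="unicode"))
```

Rejecting unknown headers up front keeps a misspelled column name from being silently dropped during conversion.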

Following this, University at Albany university archivist Gregory Wiederman presented his talk, “Managing Descriptive Metadata with Open XML”. He began by noting the various reasons why the University at Albany has not (yet) migrated to ArchivesSpace: it has legacy unstructured HTML finding aids, is finishing a large EAD conversion project, and faces a challenging migration of its local accession database; migration to ASpace is costly (and he feels that the membership fee is disproportionate); and there is relatively little documentation available for ASpace. He then outlined the opportunity at the moment, as he sees it, which is to develop the basic network infrastructure first and implement more complex tools later. First among these is a tool that enables “consistent creation”: EADMachine, a Python tool that converts between Excel and EAD and also creates flat HTML. With it, they have had both successes and difficulties; namely, it is their first large-scale project, and it has resulted in “lots of bad code”. The next tool that Wiederman discussed was one that enables “strict control”: EADValidator. It is also a rule-based Python tool, but it is simpler than EADMachine; it enforces many DACS rules and also checks for errors. Wiederman concluded his presentation by outlining other steps in the University at Albany’s metadata workflow, which include unique identification, automated uploading, and metadata infrastructure.
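A rule-based check of this kind can be sketched briefly in Python. This is not EADValidator: the `RULES` list, the `validate` function, and the choice of which elements to require are assumptions made for illustration, loosely following the DACS expectation that a collection-level description carries a reference code, title, date, and extent. Un-namespaced EAD 2002 is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (path relative to archdesc/did, message if missing or empty).
RULES = [
    ("unitid", "missing reference code (unitid)"),
    ("unittitle", "missing title (unittitle)"),
    ("unitdate", "missing date (unitdate)"),
    ("physdesc/extent", "missing extent (physdesc/extent)"),
]


def validate(ead_xml):
    """Return a list of error messages; an empty list means every rule passed."""
    try:
        root = ET.fromstring(ead_xml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    did = root.find("archdesc/did")
    if did is None:
        return ["missing archdesc/did"]
    errors = []
    for path, message in RULES:
        node = did.find(path)
        # An element that exists but has no text also counts as a failure.
        if node is None or not (node.text or "").strip():
            errors.append(message)
    return errors


if __name__ == "__main__":
    sample = (
        "<ead><archdesc><did>"
        "<unitid>ua-001</unitid>"
        "<unittitle>Office Records</unittitle>"
        "<unitdate>1950-1960</unitdate>"
        "</did></archdesc></ead>"
    )
    for error in validate(sample):
        print(error)
```

Keeping the rules in a plain list means new checks can be added without touching the validation loop.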

