Session 708: Archivist! Data Librarian! Asset Manager! Do the Differences Really Matter?
Guest author Ariadne Rehbein
“We designed ‘Archivist! Data Librarian! Asset Manager! Do the Differences Really Matter?’ to break down barriers and explore the diversity of roles in the digital data world,” explained Dana Lamparello, the chair of the session. After Dana encouraged us to be active participants, the four discussion leaders described their professional responsibilities. As they did this, each held up a paper sign to reveal one of four “archetypes”: Embedded Data Manager, Archivist, Data Librarian, and Data User.
Karen Baker, Information Manager in the Integrative Oceanography Division of the Scripps Institution of Oceanography at UC San Diego, served as our “Embedded Data Manager.” Karen works closely with researchers who generate data. She has developed procedures and best practices for assembling this data, and is working on ways to share data across disciplines and with the public.
Wendy Hagenmaier, Digital Collections Archivist at the Georgia Institute of Technology, served as our “Archivist.” She strives to partner with data creators and encourages reuse of data, but has encountered barriers to these goals.
Janina Mueller, Design Data Librarian at the Harvard University Graduate School of Design, served as our “Data Librarian.” She provides GIS and data services to students and faculty and led the creation of a student work collection. She is currently experimenting with geospatial data creation tools for students.
Stewart Varner, Digital Scholarship Librarian at the University of North Carolina, Chapel Hill, served as our “Data User.” He facilitates the performance of text analysis and data mining by faculty members.
With this framework established, we embarked upon a series of well-coordinated discussions. A pair of archetypes led each of two concurrent groups, focusing the discussion on data creation, data access, and data reuse in turn. The archetypes also rotated with each topic so that no pair was repeated for a group. While we were allowed to “follow” an archetype’s rotation, most participants stayed seated with one group for each 12-minute session. I also followed this strategy. If you are interested in learning about the breakout discussions of the other group, we will provide a link to the session organizers’ notes when they are available.
Data Creation – Karen Baker & Stewart Varner
o What is your current role in the data creation stage?
o What is your ideal role in the data creation stage?
To start off the discussion, Erika Castano explained the approach she has taken at the University of Arizona. With the help of the campus digital repository manager, she developed a service model for faculty members. Her pitch is that they will do better research if they manage their data as they go. Focusing efforts on a scholarly group that may have problematic records is another strategy. Members of this group should be interviewed to discover their “common vocabulary” or any existing disciplinary metadata standards. This information should be integrated into the data deposit form to further simplify the process for faculty members. At the University of Arizona, subject librarians play a key role, and a metadata librarian will soon be involved. The process may take months and will depend upon researchers’ availability over the course of the academic calendar, among other factors. The participants also wondered about the effects of data management upon scholarship at large. It may be argued that it “interferes” with researchers’ original file structures and processes, and therefore alters future analysis of their research.
Data Access – Karen Baker & Janina Mueller
o Describe the current state of access at your respective institutions.
o Ideally, what should access to data entail?
Data managers help make data accessible to researchers and to the public. As Karen explained, access to data from her institution will eventually be provided through a web of repositories and data hubs. This is an agency mandate she is working to fulfill, though the infrastructure is not yet in place. Data hubs may also allow researchers to access their work from previous institutions.
Student records, stored in systems such as PeopleSoft, illustrate some of the challenges of data access. Once removed from their interface, these records are raw data. What will happen if the academic institution stops using the current system? One participant is trying to find a solution. He explained his current struggle to find a way to extract, maintain, and continue the relational nature of these records.
Data Reuse – Janina Mueller & Stewart Varner
o Describe your current strategy to promote reuse of data.
o Describe your ideal strategy to promote reuse of data.
Data reuse comes with its own set of ethical and logistical questions. Janina has found both interest in and resistance to the reuse of data at the Harvard University Graduate School of Design. Currently, to maintain privacy, the student work collection is available only to administrators, who use it to create promotional materials. Some professors would like to allow students to build upon the work of their predecessors, particularly when data can be updated over time. Professors at her institution and elsewhere, however, are concerned about the potential for plagiarism.
In traditional humanities scholarship, it can be difficult to distinguish between primary and secondary sources. This is compounded for digital humanities because of the cyclical nature of research data: researchers may create data by reusing previous data sets. Stewart currently serves as an advocate for digital humanities work and explained some basic challenges for data in this field. He recommended working with an IT department to find ways to store a wide range of files, including OCR output, databases, Twitter scrapes, and websites that are part of the scholarly record. Providing access to text files can also be a challenge. Currently, the text files he provides to researchers can only be downloaded one at a time, which slows down large-scale text analysis. He has considered putting them on GitHub and is seeking ways to include metadata.
This session provided me with welcome insight into the current solutions, challenges, and goals of data management in a variety of contexts, including academic libraries, research institutes, and administrative offices. I was struck by the willingness of the organizers and participants to share their experiences and their determination to seek solutions.