[Ask an Archivist] Q: “What programming languages are most helpful for a digital archivist to know?”

We had to reach out to those with programming experience for this question! Feel free to add to the conversation in the comments.

Ask an Archivist Answers:

I think it depends on how much computer programming (software modification/development) their situation calls for.

I feel that every archivist dealing with born-digital materials should have at least a basic grasp of how to move around and perform basic tasks (list directory/file info, move/copy files, generate checksums) from the Linux or Windows command line interface (CLI). I prefer a Unix-style CLI (Mac or Ubuntu) because it has been more extensively developed and has more command-line tools available, but it depends on what OS is available. Beyond that, it’s very helpful to learn some XSL/XPath. The syntax can be difficult, but the ability to set up a “transformation script” that can batch-convert encoded information between Excel/CSV, TXT, and XML formats such as METS can save a ton of manual data entry and manipulation time.
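
The checksum task mentioned above is normally done with a command-line tool, but as a rough illustration it can also be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function names `sha256_of` and `build_manifest` are hypothetical, and SHA-256 is just one reasonable choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to (size in bytes, checksum)."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative_name = path.relative_to(root).as_posix()
            manifest[relative_name] = (path.stat().st_size, sha256_of(path))
    return manifest
```

Calling `build_manifest` on a transfer directory before and after copying it, and comparing the two results, is one simple way to confirm that nothing changed in transit.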

If more serious programming is required, Ruby and PHP are both good “gateway” programming languages in that they’re quite powerful but easier to learn, conceptually, than languages like Java or C++. Basic PHP knowledge, along with knowing how to set up, populate and query a MySQL database via PHP, would give a digital archivist significant programming power with (IMHO) relatively little learning commitment: you could learn the basics of both with a good book for each and a few months.

– Matthew McKinley

The thing I would stress is that they don’t necessarily need advanced knowledge of programming languages in order to do good work. I have had lots of exposure to code and other areas of IT through my former career in corporate IT, but I never spent enough time with any technology to become fluent in it. I consider myself conversant enough to be able to work in both worlds, IT and archives, and I think just having that is a good start. My programming skills are probably not even in the same ballpark as Matthew’s. I think anyone who gains expertise in whatever Matthew recommends will be a rock star in our field, but there’s still a need for people who can straddle that fence between technology and archives.

Even if they lack expertise, let them know they should be willing to experiment and dabble in technology. I am not particularly skilled in working from the command line, but I spent a little time recently figuring out how to run HTTrack from the command line on my Mac, and it was extremely satisfying! I had never touched XSLT until last year, but spent a couple of days hacking away at it so we could start publishing our EAD finding aids to the web. Archives need people who are not afraid of the technology as much as they need actual technologists.

– Ben Goldman


About Lisa H

Lisa is the archivist/librarian at the Swenson Swedish Immigration Research Center at Augustana College. She stumbled into archival work by way of the museum world and is always seeking ways to break down the silos of these professions. Lisa has worked in museums, libraries, and archives in Illinois, New York, and Alaska. While not at work, Lisa spends her free time biking, working on art projects, and putting her useless knowledge to good use on bar trivia teams. You can find her on Twitter @lisahuntsha.

2 thoughts on “[Ask an Archivist] Q: ‘What programming languages are most helpful for a digital archivist to know?’”

  1. Seth Shaw (@seth_e_shaw)

    I would echo most of what Ben & Matthew said. To start, get comfortable with the CLI. From there it depends on where you want to go.

    On the presentation/web side I would start (almost obviously) with HTML + JavaScript. From there I would move on to PHP + MySQL, mainly because there is a lot of stuff built on it already (Drupal & WordPress come to mind).

    If you are more interested in data-munging & workflow automation I would start off with a combination of Python + SQLite (which easily translates to other relational DBs like MySQL) + regular expressions (regex). (Don’t let regex scare you. Just approach it with an open hand and a smile. It won’t bite. All it wants to do is look for patterns.) If most of your data is XML then I would start with XML/XPath instead.
    ** I was tempted to say Bash instead of Python, but that is only if you took my advice and became comfortable with the command line first; oh, and are using Linux or OS X.

    Finally, if you are interested in working with most of the *big* archives tools out there (JHOVE, FITS, DROID, Fedora, DSpace, etc…) you will need to buckle down and learn Java. That is what most of them are written in.

    – Seth

  2. sarahdorpinghaus

    There’s also a lot of value in realizing when programming can be a useful solution to a labor-intensive (and repetitive) task, even if you are not fluent in any languages. Several times I’ve collaborated with programmers to automate aspects of our work that I originally did not think could be made more efficient.
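
The Python + SQLite + regex combination that Seth suggests for data-munging can be sketched roughly as follows. This is only an illustration under stated assumptions: the filename convention (`<collection>_<YYYY-MM-DD>_<sequence>.<ext>`), the `files` table layout, and the function names are all hypothetical and would need adapting to a real collection.

```python
import re
import sqlite3

# Hypothetical naming convention: <collection>_<YYYY-MM-DD>_<sequence>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<collection>[A-Za-z0-9]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<seq>\d+)\.(?P<ext>\w+)$"
)


def parse_filename(name):
    """Return a dict of the filename's parts, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


def load_filenames(names):
    """Create an in-memory SQLite table holding the parsed parts of each name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files "
        "(name TEXT, collection TEXT, date TEXT, seq INTEGER, ext TEXT)"
    )
    for name in names:
        parts = parse_filename(name)
        if parts is None:
            continue  # skip names that do not follow the assumed convention
        conn.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
            (name, parts["collection"], parts["date"], int(parts["seq"]), parts["ext"]),
        )
    conn.commit()
    return conn


def count_by_extension(conn):
    """Return (extension, count) rows, sorted by extension."""
    return conn.execute(
        "SELECT ext, COUNT(*) FROM files GROUP BY ext ORDER BY ext"
    ).fetchall()
```

Here the regular expression does the pattern-finding, SQLite holds the results, and a single SQL query answers a question ("how many files of each type?") that would be tedious to count by hand. The same query would work with little change against MySQL.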


