SAA 2015: Session 707, Recordkeeping in the Cloud and the Advent of Open Data: Mission Critical or Mission Impossible?

In advance of the 2015 Annual Meeting, we invited SNAP members to contribute summaries of panels, roundtable and section meetings, forums, and pop-up sessions. Summaries represent the opinions of their individual authors; they are not necessarily endorsed by SNAP, members of the SNAP Steering Committee, or SAA.

Guest Author: Michael Barera, Archivist at Texas A&M University-Commerce

Luciana Duranti, the chair of the session, began with the provocative claim that “security is the new authenticity.”

The first presentation was “How Open is Open? Record keeping in the era of open data and open government: Citizen engagement initiatives in Canada”, by Jim Suderman of the City of Toronto and the InterPARES Trust. Suderman began with the question, “What is Open Government?”, noting that in 1957 Wallace Parks’ article “The Open Government Principle” was published in the George Washington Law Review, in 1980 the “Open Government” episode of Yes, Minister aired, and in 2011 the Open Government Partnership was created; prior to the Partnership’s creation, open data had been the central focus of open government. He then dug into the implications of open government, first by discussing the geographical focus of his work, which spans the national (Canada), provincial (British Columbia, Alberta, and Ontario), and municipal (Vancouver and Toronto) levels.

He also outlined the International Association for Public Participation spectrum, which is used for identifying types of engagement: its five elements are to inform, consult, involve, collaborate, and empower. At this point, Suderman noted that “the findings I’m presenting here should be considered as illustrative, not authoritative.” The first implication he found concerned governance, relating to purpose, policy, measures, practice, de-centralization/centralization, open data, technology, and archives. In his view, governments have largely been moving to more open government initiatives, most of which have been centralized (or centralizing) efforts; however, the responsibility for open data sets has largely been decentralized. The second implication was provenance; he noted that “citizen engagement is taking place at all levels of Canada…but they are drawn from common pools”. Furthermore, “there is little understanding or concern about differences between governments”, whether between agencies or between different levels of government. So, who is the steward of particular government information? He further noted that the lack of “whole-of-government guidelines” has been an issue. The third implication was procedures; as he noted, a Toronto citizen-engagement program drew from similar already-existing counterparts, including one in Austin.

At the same time, however, Vancouver doesn’t have any plans for open government right now. The fourth implication that Suderman discussed was technology, in which context he noted that citizen engagement is “multi-channel” (face-to-face, social media, conventional submissions, etc.) and that technology creates challenges (such as universal access to all technologies in play, multiple custodians and varied retentions, management of personal information, and the fact that the archival bond may be less evident and thus harder to sustain). In his words, “Trust is mediated by technology, not created by it.” The final implications that he discussed were data and records, which included the issue of completeness of records (or authority, due to inconsistent or incomplete procedures), reliability for memorial function, and custody and control. He then delivered his preliminary conclusions, which focused on a continuum of citizen engagement requiring flexible record keeping; among its tenets are record creation as engagement, trust in process being more important than in records, and the idea that technology mediates trust.

The second presentation was “Retention & Disposition in the Cloud: Mission Critical and/or Mission Impossible?”, by Patricia C. Franks of San Jose State University. Franks began by noting that “sometimes it can seem like both”. She then outlined the three major types of clouds for cloud computing (while also noting the overlap between them): enterprise clouds, business service clouds, and consumer clouds. In her words, “the different clouds give us different levels of control.” She then quoted Frank Lloyd Wright: “’Think simple’ as my old master used to say – meaning reduce the whole of its parts into the simplest terms, getting back to first principles.”

Franks then discussed three key areas: core functional requirements necessary to implement retention and disposition actions, functionality of several cloud models based on user experiences, and best practices for developing a defensible retention and disposition strategy for records residing in the clouds. She next addressed retention and disposition system requirements, noting especially the need to facilitate and implement retention and disposition (R&D) decisions at any time in the existence of records, to activate them automatically, and to provide audit trails. From here, she turned her attention to “Defensible Disposition”, which comprises allowing definition of retention periods (from one day to indefinite), allowing definition of disposition classes (each including a disposition trigger, a retention period, and a disposition action), and supporting disposition actions such as “review”, “export”, “transfer”, and “destruction”. She then detailed how her team came up with a checklist for retention and disposition functional requirements: this included privacy and security considerations, establishing disposition authorities, applying disposition authorities, executing disposition authorities, documenting disposal actions, reviewing disposition, and integration.
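The disposition-class model Franks described (a trigger event, a retention period, and a final action, with every execution documented in an audit trail) can be sketched in code. The following Python is a minimal illustration only; `DispositionClass`, `Record`, and the other names are hypothetical stand-ins, not drawn from any of the systems under review.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum
from typing import List, Optional

class DispositionAction(Enum):
    REVIEW = "review"
    EXPORT = "export"
    TRANSFER = "transfer"
    DESTRUCTION = "destruction"

@dataclass
class DispositionClass:
    """A retention rule: trigger event, retention period, and final action."""
    name: str
    trigger: str                      # e.g. "case closed"
    retention: Optional[timedelta]    # None models an indefinite retention
    action: DispositionAction

@dataclass
class Record:
    record_id: str
    trigger_date: Optional[date] = None   # set when the trigger event occurs
    audit_trail: List[str] = field(default_factory=list)

def is_due(record: Record, rule: DispositionClass, today: date) -> bool:
    """A record is due for disposition once its retention period has elapsed;
    records with indefinite retention, or no trigger yet, are never due."""
    if record.trigger_date is None or rule.retention is None:
        return False
    return today >= record.trigger_date + rule.retention

def dispose(record: Record, rule: DispositionClass, today: date) -> None:
    """Execute the disposition action and document it in the audit trail."""
    if is_due(record, rule, today):
        record.audit_trail.append(
            f"{today}: {rule.action.value} under rule '{rule.name}'"
        )
```

Even in this toy form, the model captures the requirements Franks listed: rules can be defined per class, actions fire only when the retention period has elapsed after the trigger, and every disposal leaves an audit record behind.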

In Franks’ words, “You want the systems to work together – you want a holistic system.” She then gave an overview of the cloud vendors under review; this group includes Amazon, ArchiveSocial, CenturyLink, Cloud 9 Discovery, Dropbox, Google Apps, and Microsoft; in total, over 20 vendors are currently under review. Next, she turned her attention to cloud storage, RM software and add-ons, “and many more”. Going forward, there are numerous different options: HP TRIM (from HP Records Manager – the “Oregon solution”), Rackspace (infrastructure-as-a-service), Smarsh (“we archive everything”; but for how long?), and Preservica (which follows the OAIS model, as does Archivematica). Next she outlined a strategic approach to best practices: understand the use of cloud services within the organization, become involved in the selection of a cloud provider that will help the organization achieve its goals, identify content stored in the cloud that is evidence of an activity or transaction, evaluate the cloud services using a tool, identify the risks inherent in the choice of a cloud, and implement a review process to determine the cloud’s effectiveness.

The third and final presentation was “Preservation as a Service for Trust: InterPARES Trust”, by Adam Jansen, an information management consultant who is currently pursuing a PhD at the University of British Columbia. Jansen began by going back to trying to define the role of the archives, including the idea of the physical and moral defense of documentary evidence of activity (as put forward by Jenkinson), and roles that involve not only the object itself, but also naturalness, interrelatedness, impartiality, authenticity, and uniqueness. He then turned his attention to archival transition, from a “semi-passive, response paradigm” to “one of driving preservation, compliance and assurance”; according to him, this requires proactively influencing system design, technology acquisition, and implementation, as well as truly believing that preservation begins at creation. This is further complicated by “the cloud”, though.

Increasingly government and public trust organizations are shifting to the cloud, while the key issues of ownership, jurisdiction, and privacy have emerged with Internet-based record repositories. Furthermore, there is a core question of trust, pertaining to the records, the process, and the custodians. But, are cloud trustees actually trustworthy? Jansen quoted from two service agreements that cloud providers require their archival clients to sign. The first states that “The service offerings are provided ‘As Is’. We and our affiliates and Licensors make no representations or warranties of any kind, whether express, implied, statutory or otherwise regarding the services offerings or the third party content will be uninterrupted, error free or free of harmful components, or that any content, including your content or the third party content, will be secure.” The second declared that it is “Error-free 99.9% of the time.” (Jansen wondered how well archives can deal with “what happens the other 0.1% of the time”, especially with large collections.)
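Jansen’s worry about “the other 0.1% of the time” is easy to make concrete with back-of-envelope arithmetic; the collection size below is purely illustrative, not a figure from his talk.

```python
# Illustrative only: a hypothetical collection of one million digital objects.
collection_size = 1_000_000
error_rate = 0.001  # the "other 0.1%" left uncovered by a 99.9% guarantee
affected = round(collection_size * error_rate)
print(affected)  # 1000 objects about which the guarantee says nothing
```

At archival scale, even a three-nines guarantee leaves thousands of objects per million exposed, which is why the question of what happens in that residue matters.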

He then turned his attention to the authenticity of a digital record, noting that preservation will require repeated conversion and migration (and that trustworthiness cannot be established from the records themselves); authenticity is an inference that one draws from the data maintained about the creation, handling, and maintenance of the record, and it is established through metadata (identity/integrity), chain of custody, and method of custody. Jansen proceeded to the idea of “preservation as a service for trust”, which according to him provides “insight and guidance to both those who entrust records to the Internet and those who provide Internet services for the records.” From here, he outlined six preservation services, as he sees them: 1) “Preservation Receive Service” (“complete and intact, and is in compliance with any agreements that are in force”), 2) “Preservation Characterization Service”, 3) “Preservation Authentication Service”, 4) “Preservation Storage Service”, 5) “Preservation Transformation Service”, and 6) “Preservation Access Service”.
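The first four of Jansen’s six services can be read as an ingest pipeline. The Python below is a minimal sketch under assumed names: the functions and the dict-based “package” are hypothetical stand-ins, a checksum stands in for the identity/integrity metadata he described, and the Transformation and Access services are omitted.

```python
import hashlib

def receive(pkg):
    """1) Receive Service: verify the submission is complete and intact."""
    if "content" not in pkg or "metadata" not in pkg:
        raise ValueError("incomplete submission package")
    return pkg

def characterize(pkg):
    """2) Characterization Service: record technical properties as metadata."""
    pkg["metadata"]["size"] = len(pkg["content"])
    return pkg

def authenticate(pkg):
    """3) Authentication Service: fix identity/integrity metadata (here a
    checksum) from which authenticity can later be inferred."""
    pkg["metadata"]["sha256"] = hashlib.sha256(pkg["content"]).hexdigest()
    return pkg

def store(pkg, repository):
    """4) Storage Service: place the package, with its metadata, under custody."""
    repository[pkg["metadata"]["id"]] = pkg
    return pkg

def ingest(pkg, repository):
    """Run a package through services 1-3, then hand it to storage."""
    for step in (receive, characterize, authenticate):
        pkg = step(pkg)
    return store(pkg, repository)
```

The point of the sketch is the ordering: authenticity metadata is captured at ingest, before custody begins, which is consistent with Jansen’s claim that preservation begins at creation rather than being applied after the fact.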

Next, Jansen turned his attention to his conceptualization of the “Service Construct”, which consists of functional requirements, pre-conditions, a main workflow, alternate flows (because “stuff does happen”), a UML (Unified Modeling Language) class diagram (which is “language agnostic”, which is good, according to Jansen), and other supporting documentation. The next steps are the road to standardization: InterPARES joined the Object Management Group (home of UML, SysML, and MDA), then joined its Government Domain Task Force (which required initial approval to proceed, submission for RFP, review of responses, and submission for a vote), and finally the work moves on to ISO JTC1 for a final vote. In conclusion, Jansen noted that organizations are increasingly turning to cloud services, that there are unanswered questions of risk and trust (including those of jurisdiction, control, management, and privacy), that the relationship between records creator and records “preserver” needs attention, and that metadata supporting authenticity must be identified, captured, and/or created.


Questions from the audience were as follows:

  • More about the issue of trust and jurisdiction. (“It is an issue because a lot of these organizations are multi-jurisdictional.” The servers are often in different countries, and you don’t always know which country. However, there are laws in some countries that restrict storing government records on servers in other countries. In addition, some providers are starting to realize that the location of their servers is “a real concern”. There is another project at InterPARES working on developing contracts that aren’t “absurd”; it is sometimes possible to stipulate a contract that keeps your data on a server in your country, although doing so tends to be more expensive. Sometimes, hosts don’t dispose of records properly when told to delete them: they sometimes just “cut the link”, and then the content can stay there indefinitely, a phenomenon known as “involuntary permanency”.)
  • For e-mail, what is the difference in scheduling record retention for someone who is still an employee versus a former employee? (The speakers noted that risk management needs to be applied, as well as NARA’s Capstone policy, whose approach has been “we are not appraising the e-mails, we are appraising the creators”, a throwback to 19th-century German archival practice. Quite simply, “it’s much easier just to take, or reject, the whole thing.”)
  • Is there a fundamental problem with definitions and expectations of trust? How do we define trust in a way that we would be comfortable with? And what are we trying to achieve? Is trustworthiness even the right word? (In the discussion that ensued, “auditing” and “verification” emerged as important terms; “we say trust, but we only know how to trust because we know how to verify”. “It is almost as if we need a list of trusted service providers”, to which Duranti replied, noting that we now have a list of all the requirements that we need; the next step is to test the providers against the requirements. A key now is clarifying terminology, as “the definitional aspect of trust is key”. Also, expecting cloud service providers to become certified by TRAC is simply not realistic. The last comment came from a Preservica employee, who argued that the issue of trust is “absolutely pivotal”.)
