The Bigger Picture: Visual Archives and the Smithsonian
To Preserve or Not to Preserve: Social Media
The Smithsonian Institution currently has over five hundred social media, social networking, and other "web 2.0" accounts (many of them are listed on si.edu’s "Connect" page). These accounts include approximately 143 Facebook accounts, one hundred Twitter accounts, seventy-four blogs, sixty-six Flickr accounts, and sixty-one YouTube accounts. These accounts are used for public outreach and to bring attention to the Smithsonian’s objects, exhibitions, research, programs, projects, events, activities, staff, and educational resources. Each of these accounts focuses on a different audience and specializes in a unique topic.
These vehicles for engaging audiences may be new, but the Smithsonian has always performed this sort of outreach using a variety of means including articles in scholarly journals, Smithsonian-published magazines, newsletters, press releases, teacher packets, email lists, and websites. Since all of these have been considered historically valuable materials, it is only logical that the Smithsonian’s social media accounts should also be preserved.
Preservation of any type of digital record is more complicated than paper preservation and requires more resources over time. Keeping this in mind, we look closely at the records to determine whether we need to preserve them in their entirety. Our goal is to preserve enough data to satisfy the needs of future researchers while minimizing the amount of duplicate, extraneous, and less historically valuable data. We attempt to find this "happy medium" as part of our appraisal process, the process by which we determine what will become part of the Archives’ collections.
When we appraise social media accounts, we look at each account individually because they are all used differently. Some accounts contain mostly original content or other information that is not quickly and easily available elsewhere. Other accounts consist primarily of links to the Smithsonian’s own websites or to news articles or the websites of other organizations. Many social media accounts fall somewhere in the middle. A major factor in how we appraise a social media account is the amount of significant original content it includes. This is more of an art than a science and we attempt to err on the side of caution.

Social media accounts with significant original content are captured in full or at least back to the last time they were captured. Social media accounts with little original content are also captured and preserved to document their existence and how they were used, but we will generally only capture a sample of the account, such as two or three months of a Facebook timeline.
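As a rough illustration of the sampling idea described above, the following Python sketch selects only the posts that fall within the last few months of a timeline. It is a minimal sketch and not the Archives' actual tooling: the function name `sample_recent_posts`, the record layout (a dictionary with `date` and `text` keys), and the approximation of a month as 30 days are all assumptions made for the example.

```python
from datetime import date, timedelta


def sample_recent_posts(posts, as_of, months=3):
    """Return the posts dated within roughly `months` months before `as_of`.

    Assumption: a month is approximated as 30 days, which is close enough
    for choosing a representative sample window.
    """
    cutoff = as_of - timedelta(days=30 * months)
    return [post for post in posts if cutoff <= post["date"] <= as_of]


# Hypothetical timeline records, invented for illustration only.
posts = [
    {"date": date(2012, 1, 15), "text": "Exhibition opening"},
    {"date": date(2012, 4, 2), "text": "Behind the scenes photo"},
    {"date": date(2012, 5, 20), "text": "Link to a news article"},
]

sample = sample_recent_posts(posts, as_of=date(2012, 6, 1), months=3)
for post in sample:
    print(post["date"].isoformat(), post["text"])
```

A date window is only one possible way to sample; an appraiser might just as reasonably choose a fixed number of posts or a period that covers a notable event.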
There are other ways to minimize the amount of data we are preserving from the social media accounts. Some accounts are structured in such a way that the content and metadata can be exported as a spreadsheet or XML document. Twitter is a good example. The size of these documents is often much smaller than the data collected by crawling the account. We will also preserve a screenshot of the account to document its look. For accounts with more complicated structures, we will often look at the entire account and determine if there are pieces that are not necessary to preserve. Oftentimes photographs, videos, or calendars of events uploaded to the account are also available on a Smithsonian website or in a publication that is also being preserved. In some cases, these duplicate items can be excluded when we capture the account.
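The two space-saving ideas in this paragraph, exporting structured content as a spreadsheet and leaving out items that are already preserved elsewhere, can be sketched in Python using only the standard library. This is a hedged illustration under stated assumptions, not a description of the Archives' workflow: the helper names (`checksum`, `exclude_duplicates`, `export_csv`), the record fields, and the choice of SHA-256 checksums to detect identical files are all hypothetical.

```python
import csv
import hashlib
import io


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def exclude_duplicates(items, preserved_checksums):
    """Keep only items whose bytes are not already preserved elsewhere.

    Assumption: byte-identical content is treated as a duplicate.
    """
    return [
        item for item in items
        if checksum(item["content"]) not in preserved_checksums
    ]


def export_csv(rows, fieldnames):
    """Serialize post records to CSV text (a spreadsheet-friendly export)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data, invented for illustration only.
already_preserved = {checksum(b"photo-bytes-1")}
uploads = [
    {"name": "a.jpg", "content": b"photo-bytes-1"},  # also on a preserved website
    {"name": "b.jpg", "content": b"photo-bytes-2"},  # unique to the account
]
kept = exclude_duplicates(uploads, already_preserved)
print([item["name"] for item in kept])

table = export_csv([{"date": "2012-05-20", "text": "Hello"}], ["date", "text"])
print(table)
```

Checksums only catch exact copies. A photograph that was resized or recompressed by the social media service would not match, so a real appraisal would still need human review.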

Another major concern when appraising social media is privacy. Personal information is everywhere in social media applications and we do our best to minimize the amount of that information that we capture and preserve. We avoid capturing content outside of the scope of the Smithsonian-administered account, meaning that we do not capture the profiles or accounts of the individuals who like, follow, or connect with Smithsonian accounts. That does not mean that we do not capture any personal information. For instance, if you comment on a blog or a Facebook post, the text of your comment as well as your name, profile picture, and any other publicly displayed information will likely be captured. However, if we feel that too much personal information would be disclosed by capturing the account, we do not capture it.
While the popularity of individual social media providers will likely fade over time and become just a blip in web history, they exemplify current and future trends in communication. By capturing and preserving the Smithsonian’s social media presence, we are continuing to document the evolution of the Institution’s methods of sharing information and engaging new audiences.
Related Resources
- The Smithsonian: Using and Archiving Facebook, The Bigger Picture Blog, Smithsonian Institution Archives
- Smithsonian Institution Archives' Appraisal Methodology, PDF
Comments (9)
Preserving is a good practice. I admire the Smithsonian for continuing to document the evolution of the Institution’s methods of sharing information and engaging new audiences, especially since a large number of social media users are young people. It's a way to reach more minds and spread valuable information to a new generation.
Popularity will fade…just a blip…interesting perspective. The topic of social media preservation is important to preserve and document our time in history. Not just the Smithsonian accounts, but the preservation of social media in a wider arena.
Social media captures communication that during other time periods in history may never have been documented. Some of the social media is conversational, information that would not have been historically preserved. Just think if we had a few tweets from our predecessors as they were hunting a wooly mammoth.
Communication contained in social media is frequently original content that gets duplicated and shared. With the interconnectivity of social media it would seem to be very difficult to capture only the original content.
Social media communication may not be documented anywhere else. There’s not going to be a box of handwritten letters found describing the last flight of the space shuttle.
It’s real. Social media is real time feelings, thoughts and conversations. Not years of editing and formal re-writes. Raw comments, thoughts and feelings. Seems as valuable as any printed newspaper.
I think digital media is really no different than traditional media; it's creative art that appeals to our 5 senses... I like the idea of them saving things that are considered to have cultural and relevant significance.
Informative piece, Jennifer and SIA. Thanks for sharing SIA's approach to the appraisal and capture of SI's social media!
I was amazed by what I learned from your article. I thought I could only preserve paper memories like journals and scrapbooks. Yet it is nice to know that I can also preserve a social media profile.
Generally speaking - and simplifying drastically - I think that harvesting social media sites is essentially not particularly hard. If a browser can render a page then it ought to be possible to automate that process in heritrix or some other headless-browser and save the result to disk. Having an API makes it even easier.
The real issue is how to replay the results to the "end user". The essential problem is that these sites (twitter even more so than facebook) are not "websites" at all. Fundamentally twitter _is_ its API, and there is no uniform way to preserve the experience of its users who are using 100s of different clients to access it. In addition, twitter is highly linked - click on a tweet, a hashtag, a username and some highly complex javascript magic suddenly produces a whole new view of your data. Just try preserving that behaviour :-) It's not too hard to preserve and replay a static view of a single twitter search/listing or facebook page, but to what extent does that actually preserve anything of what's really interesting about these media?
I don't claim to have the right answer but I think any serious attempt to preserve them has to be based on some kind of mixed strategy - perhaps using an API for discovery (using keywords, location tagging etc.), a web-crawler for harvesting, and ideally a tailor-made client for archival-browsing. Now who has the resources to build all that? And to maintain it every time twitter/facebook change their API or when the next big thing (pinterest?) comes along?
After the Facebook debut on the stock market I believe the answer to this one became a no-brainer: Preserve.
Most of us, as individuals, never preserve our social media activity because there is no easy way known to us to do this. But preserving is important, especially for the Smithsonian, when it contains original media. There is also a cost factor involved, because if you need to preserve images and videos, it will require some gigabytes of disk space. I don't know how you do it; maybe another detailed explanation would be a helpful clue for us.
Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.