Tuesday, 16 April 2013

'An (even briefer) history of open cultural data' at GLAM-Wiki 2013

These are some of my notes for my invited plenary talk at GLAM-Wiki 2013 (Galleries, Libraries, Archives, Museums & Wikimedia, #GLAMWiki), held at the British Library on April 12-13, 2013. I don't think I stuck that closely to them on the day, and in the interests of brevity I've left out the 'timeline' bits (but you can read about some of them in a related MuseumID article, 'Where next for open cultural data in museums?') to focus on the lessons to be learnt from changes so far. There were lots of great talks and discussions at the event; you can view some of the presentations on Wikimedia UK's YouTube channel.

A (now very) brief history of open cultural data

Firstly, thank you for the invitation to speak... This morning I want to highlight some key moments of change in the history of open cultural data - a history not only of licenses and data, but also of conversations, standards, and collaborations, of moments where things changed... I've included key moments from funders, legislative influences and the commercial sector too, as they create the context in which change happens and often have an effect on what's considered possible. I'll close by considering some of the lessons learnt.

[Please help improve this talk]

A caveat - there may well be a bias towards the English-speaking world (and to museums, because of my background). If you know of an open GLAM (gallery, library, archive, museum) data source I've missed, you can add it to the open cultural data/GLAM API wiki... or to Lotte Belice's timeline of open culture milestones.


'Open cultural data' is data from cultural institutions that is made available for use in a machine-readable format under an open licence. But each of the words 'open', 'cultural' and 'data' is slightly more complicated, so I'll unpack them a little...


Office clerks, FNV. Voorlichting.
While the degree of openness required to be 'open' data can be contentious, at its simplest, 'open' refers to content that is available for use outside the institution that created it, whether for school homework projects, academic monographs or mobile phone apps. 'Open' may refer to licences that clarify the permissions and restrictions placed on data, or to the use of non-proprietary digital technologies, or ideally, to a combination of both open licences and technologies.

Ideally, open data is freely available for use and redistribution by anyone for any purpose, but in reality there are often restrictions. GLAMs may limit commercial use by licensing content for 'non-commercial use only', but as there is no clear definition of 'non-commercial use' in Creative Commons licences, some developers may choose not to risk using a dataset with an unclear licence. GLAMs may also release data for commercial use but still require attribution, either to help retain the provenance of the content, to help people find their way to related content or just because they'd like some credit for their work. GLAMs might also release data under custom licences that deal with their specific circumstances, but they are then difficult to integrate with content from other openly-licensed datasets.

Hybrid licensing models are a pragmatic solution for the current environment. They at least allow some use and may contribute to greater use of open cultural data while other issues are being worked out. For example, some institutions in the UK are making lower-resolution images available for re-use under an open licence while reserving high-resolution versions for commercial sales and licensing. Or they may differentiate between scholarly and commercial use, or use more restrictive licences for commercially valuable images and release everything else openly.

I think this type of access is better than nothing, particularly if organisations can learn from the experience and release more data next time. Because these hybrid models are often experimental, their reception is important, and it's helpful for GLAMs to be able to show they've had a positive impact and hopefully helped create relationships with groups like Wikipedia.


Cultural data is data about objects, publications (such as books, pamphlets, posters or musical scores), archival material, etc, created and distributed by museums, libraries, archives and other organisations.


It's a useful distinction to discuss early with other cultural heritage staff as it's easy to be talking at cross-purposes: 'data' can refer to different types of content, from metadata or tombstone records (the basic titles, names, dates, places, materials, etc of a catalogue record), to entire collection records (including data such as researched and interpretive descriptions of objects, bibliographic data, related themes and narratives), to full digital surrogates of an object, document or book as images or transcribed text. Some organisations release open metadata; others release all their data, including their images. If you can't open up the full content (the 'digital surrogates' like photographs or texts), then at least open up the metadata (data about the content) under, for example, CC0, and release the rest under another licence. Releasing data may involve licensing images; offering downloads from catalogue sites; 'content donations'; APIs and machine-facing interfaces; term lists; etc. Much of the data that isn't images isn't immediately interesting, and may be designed for inter-collections interoperability or mashups rather than media commons.

Why is open cultural data important?

Before I go on, why do we care? Open cultural data is the foundation on which many projects can be built. It helps achieve organisational goals and mission; can help increase engagement with content; can create a 'network effect' with related institutions; and can be re-used by people who share your goals around access to knowledge and information – people like Wikipedians.

Some key moments in open cultural data

Events I discussed included the founding of Wikimedia, Europeana and Flickr Commons, previous GLAM-Wiki conferences, changes in licences for art images, library catalogue records and museum content, GLAM APIs and linked data services and the launch of the Digital Public Library of America next week.

Lessons learnt

Many of the changes are the results of years of conversation and collaboration – change is slow but it does happen. GLAMs work through slow iterations – try something, and if no-one dies, they'll try something else. We are all ambassadors, and we are all translators, helping each domain understand the other.

Contradictory things GLAMs are told they must do

  • Give content away for the benefit of all
  • Monetise assets; protect against loss of potential income; protect against mis-use of collections; conserve collections in perpetuity; protect the IP of artists; demonstrate ROI on digitisation
It's not easy for GLAMs to release all their data under an entirely open licence, but they don't hold back just to be annoying - it's important to understand some of the pressures they're under. For example, GLAMs usually need to be able to track uses of their data and content to show the impact of digitising and publishing content, so they prefer attribution licences.

The issue of potential lost income - imaginary money that could be made one day if circumstances change, or profit that someone else makes off their opened data - is particularly hard to deal with [and here I ad-libbed, saying that it was like worrying about failing to meet the love of your life because you got on a different tube carriage - you can't live your life chasing ghosts]. Ideally, open data needs to be understood as an input to the creative economy rather than an item on the balance sheet of an individual GLAM.

GLAMs worry about reputational damage, whether appearing on the front page of a tabloid newspaper for the 'wrong' reasons, questions being asked in Parliament, or critique from Wikipedians. Over time, their mindset is changing from keeping 'our data' to being holders, custodians of our shared heritage.

Conversations, communities, collaborations

Conversations matter... we're all working towards the same goal, but we have different types of anxieties and different problems we have to address.

GLAMs are about collections, knowledge, and audiences. Unlike people who work mostly online, they are used to seeing the excitement people experience walking through their door - help GLAMs understand what Wikipedians can do for different audiences by making those audiences real to them. GLAMs are also used to being wined and dined before you lay the hard word on them. Just because you don't need to ask for permission to use content doesn't mean you shouldn't start a conversation with an organisation. There are lots of people with similar goals inside organisations, so try to find them and work with them. Trust is a currency, don't blow it!

Being truly collaborative sometimes means compromising (or picking your battles) and it definitely means practising empathy. Open data people could stop talking about open data as something you *do* to GLAMs, and GLAMs could stop thinking open data people just want to make their lives difficult.

The role of higher powers

Government attitudes to open data make a big difference and they can also change the risks associated with publishing orphan works. Governments can also help GLAMs open up their content by indemnifying them against the chance that someone else will monetise their data – consider it not a failure of the GLAM but a contribution to the creative and digital economy.

Things that are better than a poke in the eye with a sharp stick

  1. Kittens (and puppies)
  2. Cultural data that's available online but isn't (yet) openly licensed
  3. Cultural data online that is licensed for non-commercial use
Yes, the last two aren't ideal, but they are a great deal better than nothing.

Into the future...

GLAMs and Wikipedians may move at different paces, and may have different priorities and different ways of viewing the world, but we're all working towards the same goals. Not everything is open, but a lot more is open than it used to be. I sensed yesterday [the first day of the conference] that there are still some tensions between Wikimedians and GLAMers, moments when we need to take a deep breath and put empathy before a pithy put-down, but I loved that Kat Walsh's welcome yesterday described how Wikipedia used to focus on how it was different from others but now focuses on reaching out to others and figuring out how we're the same.

GLAMs and Wikipedians have already used open cultural data to make the world a better place. Let's celebrate the progress we've made and keep working on that...
GLAM-WIKI 2013 Friday attendees photograph by Mike Peel (www.mikepeel.net).
Congratulations to everyone who helped make it a great event, but particularly to Daria Cybulska and Andrew Gray (@generalising) for making everything work so smoothly, and Liam Wyatt (@wittylama) for the original invitation to speak.
