
Monday, 12 November 2012

Reflections on teaching Neatline

I've called this post 'Reflections on teaching Neatline' but I could also have called it 'when new digital humanists meet new software'. Or perhaps even 'growing pains in the digital humanities?'.

A few months ago, Anouk Lang at the University of Strathclyde asked me to lead a workshop on Neatline, software from the Scholars' Lab that plots 'archives, objects, and concepts in space and time'. It's a really exciting project, designed especially for humanists: the interfaces and processes are built to express complexity and nuance through handcrafted exhibits that link historical materials, maps and timelines.

The workshop was on Thursday, and looking at the evaluation forms, most people found it useful but a few really struggled, and teaching it was also slightly tough going. I've been thinking a lot about the possible reasons for that, and I'm sharing them partly as a request for others to share their experiences in similar circumstances, and partly in the hope that they'll be useful to others.

The basic outline of the workshop was: an intros round (who I am, who they are and what they want to learn); information on what Neatline is and what it can do; hands-on time to explore what the software can and can't do (e.g. log in, follow the steps at neatline.org/plugins/neatline to create an item based on a series of correspondence Anouk had been working on, decide whether to transcribe or describe the letter, tweak its appearance or link it to other items); and a short period for reflection and discussion to finish (e.g. 'What kinds of interpretive decisions did you find yourself making? What delighted you? What frustrated you?'). If you're curious, you can follow along with my slides and notes or try out the Neatline sandbox site.

The first half was fine, but some people really struggled with the hands-on section. Some of it was to do with the software itself - as a workshop, it was a brilliant usability test of the admin interfaces for audiences outside the original set of users. Neatline was only launched in July this year and isn't even at version 2 yet, so it's entirely understandable that it appears to have a few functional or UX bugs. The documentation isn't integrated into the interface yet (and sometimes lacks information that is probably part of the shared tacit knowledge of the people working on the project), but they do have a very comprehensive page about working with Neatline items. Overall, the process of handcrafting timelines and maps for a Neatline exhibit is still closer to 'first, catch your rabbit' than making a batch of ready-mix cupcakes. Neatline is also designed for a particular view of the world, and as it's built on top of other software (Omeka) with another very particular view of the world (and hello, Dublin Core), there's a strong underlying mental model informing the content-creation process - one that's foreign to many of its potential users, including some at the workshop.

But it was also partly because I set the bar too high for the exercises and didn't provide enough structure for some of the group. If I'd designed it so they created a simple Neatline item by closely following detailed instructions (as I have done for other, more consciously tech-for-beginners workshops), at least everyone would have achieved a nice quick win and have something they could admire on the screen. From there some could have tried customising the appearance of their items in small ways, and the more adventurous could have tried a few of the potential ways to present the sample correspondence they were working with to explore the effects of their digitisation decisions. An even more pragmatic but potentially divisive solution might have been to start with the background and demonstration as I did, but then do the hands-on activity with a smaller group of people who were up for exploring uncharted waters. On a purely practical level, I also should have uploaded the images of the letters used in the exercise to my own host so that they didn't have to faff with Dropbox and Omeka records to get an online version of the image to use in Neatline.

And finally, it was also because the group had really mixed ICT skills. Most were fine (bar the occasional bug), but some were not. It's always hard teaching technical subjects when participants have varying levels of skill and aptitude, but when does it go beyond aptitude and become a question of attitude towards being pushed out of your comfort zone? I'd warned everyone at the start that it was new software, but if you haven't experienced beta software before, I guess you don't have the context for understanding what that actually means.

I should make it clear here that I think the participants' achievements outshine any shortcomings - Neatline is a great tool for people working with messy humanities data who want to go beyond plonking markers on Google Maps. I think everyone got that, and most people enjoyed the chance to play with it.

But more generally, I also wonder if it has to do with changing demographics in the digital humanities - increasingly, not everyone interested in DH is an early, or even a late adopter, and someone interested in DH for the funding possibilities and cool factor might not naturally enjoy unstructured exploration of new software, or be intrigued by trying out different combinations of content and functionality just 'to see what happens'.

Practically, more information for people thinking of attending would help - something like 'if you know x already, you'll be fine; if you know y already, you'll be bored'. Describing an event as 'if you like trying new software, this is for you' would probably help too, but it looks like the digital humanities might now be attracting people who don't particularly like working things out as they go along - are they to be excluded? If using software like this is the onboarding experience for people new to the digital humanities, they're not getting the best first impression, but how do you balance the need for fast-moving, innovative work-in-progress to be a bit hacky and untidy around the edges with the desires of a wider group of digital humanities-curious scholars? Is it ok to say 'here be dragons, enter at your own risk'?

Thursday, 14 August 2008

Freebase meetup, London, August 20

As she explains on the Freebase blog, Kirrily from Freebase will be in London for a little while this month, and she's having an informal meet-up with Freebase users and anyone who might be interested in learning more about it:
We'll be meeting at the Yorkshire Grey Pub in Holborn from 6:30pm, having a few drinks, and talking about open data, building communities around free information, mashups, and more. If you're interested, please stop by. There'll be free wifi available, so bring your laptops if you've got them.
You can RSVP on upcoming.org. I'm going because I think Freebase could be really useful for a personal project but also because it's another way of helping people make the most of their digital heritage.

If you don't know much about Freebase, or haven't seen it lately, this video on Parallax, their new browsing interface, should give you a pretty good idea of how useful it can be for cultural heritage and natural history data. It's 8 minutes long and really worth taking the time to watch, particularly for the maps and timelines, but if you're pressed for time then skip the first two minutes.

You can also get more background at The Future of the Web or Freebase: Dispelling The Skepticism. There are lots of possibilities for museums, archaeology and other cultural content so come along for a chat and a pint.

[Update: if you're not in London but have some questions about Freebase and digital heritage that you think might be useful for discussion or need some context to explain, drop me a line via the form on miaridge.com and I'll take them along.]

Saturday, 5 July 2008

Introducing modern bluestocking

[Update, May 2012: I've tweaked this entry so it makes a little more sense. These other posts from around the same time help put it in context: Some ideas for location-linked cultural heritage projects, Exposing the layers of history in cityscapes, and a more recent approach, '...and they all turn on their computers and say 'yay!'' (aka 'mapping for humanists'). I'm also including below some content rescued from the ning site, written by Joanna:
What do historian Catharine Macauley, scientist Ada Lovelace, and photographer Julia Margaret Cameron have in common? All excelled in fields where women’s contributions were thought to be irrelevant. And they did so in ways that pushed the boundaries of those disciplines and created space for other women to succeed. And, sadly, much of their intellectual contribution and artistic intervention has been forgotten.

Inspired by the achievements and exploits of the original bluestockings, Modern Bluestockings aims to celebrate and record the accomplishments not just of women like Macauley, Lovelace and Cameron, but also of women today whose actions within their intellectual or professional fields are inspiring other women. We want to build up an interactive online resource that records these women’s stories. We want to create a feminist space where we can share, discuss, commemorate, and learn.

So if there is a woman whose writing has inspired your own, whose art has challenged the way you think about the world, or whose intellectual contribution you feel has gone unacknowledged for too long, do join us at http://modernbluestocking.ning.com/, and make sure that her story is recorded. You'll find lots of suggestions and ideas there for sharing content, and plenty of willing participants ready to join the discussion about your favourite bluestocking.
And more explanation from modernbluestocking on freebase:
Celebrating the lives of intellectual women from history...

Wikipedia lists bluestocking as 'an obsolete and disparaging term for an educated, intellectual woman'.  We'd prefer to celebrate intellectual women, often feminist in intent or action, who have pushed the boundaries in their discipline or field in a way that has created space for other women to succeed within those fields.

The original impetus was a discussion at the National Portrait Gallery in London held during the exhibition 'Brilliant Women, 18th Century Bluestockings' (http://www.npg.org.uk/live/wobrilliantwomen1.asp) where it was embarrassingly obvious that people couldn't name young(ish) intellectual women they admired.  We need to find and celebrate the modern bluestockings.  Recording and celebrating the lives of women who've gone before us is another way of doing this.
However, at least one of the morals of this story is 'don't get excited about a project, then change jobs and start a part-time Masters degree'. On the other hand, my PhD proposal was shaped by the ideas expressed here, particularly the idea of mapping as a tool for public history by, for example, using geo-located stories to place links to content in the physical location.

While my PhD has drifted away from early scientific women, I still read around the subject and occasionally add names to modernbluestocking.freebase.com.  If someone's not listed in Wikipedia it's a lot harder to add them, but I've realised that if you want to make a difference to the representation of intellectual women, you need to put content where people look for information - i.e. Wikipedia.

And with the launch of Google's Knowledge Graph, getting history articles into Wikipedia and then into Freebase is even more important for the visibility of women's history: "The Knowledge Graph is built using facts and schema from Freebase so everyone who has contributed to Freebase had a part in making this possible" (source: this post to the Freebase list). I'd go so far as to say that if it's worth writing a scholarly article on an intellectual woman, it's worth re-using your references to create or improve her Wikipedia entry.

Anyway. On with the original post...]

I keep meaning to find the time to write a proper post explaining one of the projects I'm working on, but in the absence of time a copy and paste job and a link will have to do...

I've started a project called 'modern bluestocking' that's about celebrating and commemorating intellectual women activists from the past and present while reclaiming and redefining the term 'bluestocking'.  It was inspired by the National Portrait Gallery's exhibition, 'Brilliant Women: 18th-Century Bluestockings'.  (See also the review, Not just a pretty face).

It will be a website of some sort, with a community of contributors and it'll also incorporate links to other resources.

We've started talking about what it might contain and how it might work at modernbluestocking.ning.com (ning died, so it's at modernbluestocking.freebase.com...)

Museum application (something to make for mashed museum day?): collect feminist histories, stories, artefacts, images, locations, etc.; support the creation of new or synthesised content with material embedded and referenced from a variety of sources. Grab something, tag it, display it, share it; comment on, integrate and annotate others' contributions. Create a collection to inspire, record, commemorate, and build on.
What should this website contain, who is it for, and how should it look? Join and help us figure it out.

Why modernbluestocking? Because knowing where you've come from helps you know where you're going.

Sources could include online exhibition materials from the NPG (tricky interface to pull records from). How can this be a geek/socially friendly project and still get stuff done? Run a Modernbluestocking community and museum hack day to get stuff built and data collated? Have a list of names, portraits, objects for query. Build a collection of links to existing content on other sites? Role models and heroes from current life or history. Where is relatedness stored? 'Significance' - thorny issue? Personal stories cf. other more mainstream content? Is it like a museum made up of loan objects with new interpretation? How much attribution is required for the person who added the link? Login vs not? Vandalism? How to deal with changing location or format of resources? Local copies or links? E.g. images: local copies don't impact bandwidth, but don't count as visits on the originating site; remote resources might disappear - moved, permissions changed, format changed, taken offline, etc. - or be replaced with different content. Examine the sources, look at their format, how they could be linked to, how stable they appear to be, whether it's possible to contact the publisher...

Could also be interesting to make explicit, transparent, the processes of validation and canonisation.

Monday, 23 June 2008

Quick and light solutions at 'UK Museums on the Web Conference 2008'

These are my notes from session 4, 'Quick and light solutions', of the UK Museums on the Web Conference 2008. In the interests of getting my notes up quickly I'm putting them up pretty much 'as is', so they're still rough around the edges. There are quite a few sections below which need to be updated when the presentations or photos of slides go online. [These notes would have been up a lot sooner if my laptop hadn't finally given up the ghost over the weekend.]

Frankie Roberto, 'The guerrilla approach to aggregating online collections'
He doesn't have slides, he's presenting using Firefox 3. [You can also read Frankie's post about his presentation on his blog.]

His projects came out of last year's mashed museum day, where the lack of re-usable cultural heritage data online was a real issue. Talk in the pub turned to 'the dark side' of obtaining data - screen scraping was one idea. Then the idea of FoI requests came up, and Frankie ended up sending Freedom of Information requests to national museums, asking for collections data in any electronic format with some kind of structure.

He's not showing the site he presented at Montreal; it should be online soon and he'll release the code.

Frankie demonstrated the Science Museum object wiki.

[I found the 'how it works' focus of the object text on the Science Museum wiki a really interesting way of writing object descriptions; it could work well for other projects.]

He has concerns about big top-down projects, so he's suggesting five small or niche projects. He asked himself, how do people relate to objects?
1. Lots of people say, "I've got one of these", so: ivegotoneofthose.com - put objects up, people can hit a button to say 'I have one of those'. The raw numbers could be interesting.
[I suggested this for Exploring 20th Century London at one point, but with a bit more user-generated content so that people could upload photos of their object at home or stories about how they got it, etc. I suppose ivegotoneofthose.com could be built so that it also lets people add content about their particular thing, then ideally that could be pulled back into and displayed on a museum site like Exploring. Would ivegotoneofthose.com sit on top of a federated collections search or would it have its own object list?]
2. Looking at TheyWorkForYou.com, he suggests: TheyCollectForYou.com - scan acquisition forms, publish feeds of which curators have bought what objects. [Bringing transparency to the acquisition process?]
3. Looking at howstuffworks.com, what about howstuffworked.com?
4. 'what should we collect next?' - opening up discourse on purchasing. Frankie took the quote from Indiana Jones: thatbelongsinamuseum.com - people can nominate things that should be in a museum.
5. pricelessartefact.com - [crowdsourcing object evaluation?] - comparing objects to see which is the most valuable, however 'valuable' is defined.
[Except that possibly opens the museum to further risk of having stuff nicked to order]

Fiona Romeo, 'Different ways of seeing online collections'
I didn't take many detailed notes for this paper, but you can see my notes on a previous presentation at Notes from 'Maritime Memorials, visualised' at MCG's Spring Conference.

Mapping - objects don't make a lot of sense on their own, but are compelling as part of information about an expedition, or a failed expedition.

They'll have new map and timeline content launching next month.

Stamen can share information about how they did their geocoding and stuff.

Giving your data out for creative re-use can be as easy as giving out a CSV file.
You always want to have an API or feed when doing any website.
The National Maritime Museum take any data set they can find without licensing restrictions and put it online for creative re-use.
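[To make the 'as easy as a CSV file' point concrete, here's a minimal sketch - TypeScript, with invented field names and records rather than real NMM data - of dumping a handful of collection records to a CSV file anyone could re-use:]

```typescript
// A minimal sketch only: the record shape, values and file name are invented
// for illustration, not taken from any museum's actual dataset.
import { writeFileSync } from "node:fs";

interface CollectionRecord {
  id: string;
  title: string;
  date: string;
  latitude?: number;
  longitude?: number;
}

const records: CollectionRecord[] = [
  { id: "OBJ001", title: "Ship's chronometer", date: "1772", latitude: 51.48, longitude: 0.0 },
  { id: "OBJ002", title: "Memorial tablet rubbing", date: "1805" },
];

// Quote every field so commas or quotes inside titles don't break the format.
const quote = (value: string | number | undefined) =>
  `"${String(value ?? "").replace(/"/g, '""')}"`;

const header = ["id", "title", "date", "latitude", "longitude"].join(",");
const rows = records.map(r =>
  [r.id, r.title, r.date, r.latitude, r.longitude].map(quote).join(",")
);

// One flat file is enough for other people to start remixing the data.
writeFileSync("collection.csv", [header, ...rows].join("\n"), "utf8");
```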

[Slide on approaches to data enhancement.]
Curation is the best approach but it's time-consuming.

Fiona spoke about her experiments at the mashed museum day - she cut and pasted transcript data into IBM's Many Eyes. It shows that really good tools are available, even if you don't have the resources to work with a company like Stamen.

Mike Ellis presented a summary of the 'mashed museum' day held the day before.

Questions, wrap up session
Jon - always assume there is (or should be) an API

[A question I didn't ask but posted on twitter: who do we need to get in the room to make sure all these ideas for new approaches to data, to aggregation and federation, new types of experiences of cultural heritage data, etc, actually go somewhere?]

Paul on fears about putting content online: 'since the state of Florida put pictures of their beaches on their website, no-one goes to the beach anymore'.

Metrics:
Mike: need to go shout at DCMS about the metrics, need to use more meaningful metrics especially as thinking of something like APIs
Jon: watermark metadata... micro-marketing data.
Fiona: send it out with a wrapper. Make it embeddable.

Question from someone from Guernsey Museum about images online: once you've downloaded your nice image, it's without metadata. George: Flickr likes as much data in EXIF as possible; EXIF data isn't permanent but is useful.

Angela Murphy: wrappers are important for curators, as they're more willing to let things go if people can get back to the original source.

Me, referring back to the first session of the day: what were Lee Iverson's issues with the keynote speech? Lee: partly about the role of an institution like the BBC in the modern space. A national broadcaster should set the social common ground, be a fundamental part of democratic discussion. It's even more important now because of the variety of sources out there, with people shutting off or being selective about information sources to cope with information overload. Disparate sources mean no middle ground or possibility of discussion. The BBC should 'let it go' - send the data out. The metric becomes how widely does it spread, where does it show up? If restricted to non-commercial use then [strangling use/innovation].

The 'net recommender' thing is a flawed metric - you don't recommend something you disagree with, or something that is new or difficult knowledge. What gets recommended is a video of a cute 8-year-old playing Guitar Hero really well. People avoid things that challenge them.

Fiona - the advantage of the 'net recommender' is that it takes the judgement of quality outside the originating institution.

Paul asked who else wondered why 7-8 on a scale of 10 counts as neutral for British people - you'd have thought it would be 5-6.

Angela: we should push data to DCMS instead of expecting them to know what they could ask for.

George: it's opportunity to change the way success is measured. Anita Roddick says 'when the community gives you wealth, it's time to give it back'. [Show, don't tell] - what would happen if you were to send a video of people engaging instead of just sending a spreadsheet?

Final round comments
Fiona: personal measure of success - creating culture of innovation, engagement, creating vibrant environment.

Paul: success is getting other people to agree with what we've been talking about [at the mashed museum day and conference] over the past two days. [yes yes yes!] A measure of success was how a CEO reacted to discovering videos about their institution on YouTube - he didn't try to shut them down, but asked, 'how can we engage with that?'

Ross on 'take home' ideas for the conference
Collections - we conflate many definitions in our discussions - images, records, web pages about collections.

Our tone has changed. Delivery changed - realignment of axis of powers, MLA's Digital portfolio is disappearing, there's a vacuum. Who will fill it? The Collections Trust, National Museum Directors' Conference? Technology's not a problem, it's the cultural, human factors. We need to talk about where the tensions are, we've been papering over the cracks. Institutional relationships.

The language has changed - it was about digitisation, accessibility, funding. Three words today - beauty, poetry, life. We're entering an exciting moment.

What's the role of the Museums Computer Group - how and what can the MCG do?

Wednesday, 4 June 2008

Nice information design/visualisation pattern browser

infodesignpatterns.com is a Flash-based site that presents over 50 design patterns 'that describe the functional aspects of graphic components for the display, behaviour and user interaction of complex infographics'.

The development of a design pattern taxonomy for data visualisation and information design is a work in progress, but the site already has a useful pattern search, based on order principle, user goal, graphic class and number of dimensions.

Saturday, 31 May 2008

Some ideas for location-linked cultural heritage projects

I loved the Fire Eagle presentation I saw at the WSG Findability event [my write-up] because it got me all excited again about ideas for projects that take cultural heritage outside the walls of the museum, and more importantly, it made some of those projects seem feasible.

There's also been a lot of talk about APIs into museum data recently and hopefully the time has come for this idea. It'd be ace if it was possible to bring museum data into the everyday experience of people who would be interested in the things we know about but would never think to have 'a museum experience'.

For example, you could be on your way to the pub in Stoke Newington, and your phone could let you know that you were passing one of Daniel Defoe's hang outs, or the school where Mary Wollstonecraft taught, or that you were passing a 'Neolithic working area for axe-making' and that you could see examples of the Neolithic axes in the Museum of London or Defoe's headstone in Hackney Museum.

That's a personal example, and those are some of my interests - Defoe wrote one of my favourite books (A Journal of the Plague Year), and I've been thinking about a project about 'modern bluestockings' that will collate information about early feminists like Wollstonecraft (contact me for more information) - but ideally you could tailor the information you receive to your interests, whether it's football, music, fashion, history, literature or soap stars in Melbourne, Mumbai or Malmo. If I can get some content sources with good geo-data I might play with this at the museum hack day.

I'm still thinking about functionality, but a notification might look something like "did you know that [person/event blah] [lived/did blah/happened] around here? Find out more now/later [email me a link]; add this to your map for sharing/viewing later".
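To make that a bit more concrete, here's a rough sketch (TypeScript, with entirely made-up story records, coordinates, URLs and radius) of the kind of proximity check and wording I have in mind:

```typescript
// Hypothetical sketch only: the records, distances and wording illustrate the
// notification idea; they are not an actual museum data feed or product design.
interface GeoStory {
  person: string;
  event: string;
  lat: number;
  lon: number;
  moreInfoUrl: string;
}

const stories: GeoStory[] = [
  { person: "Daniel Defoe", event: "frequented this area", lat: 51.562, lon: -0.074,
    moreInfoUrl: "https://example.org/defoe" },
  { person: "Mary Wollstonecraft", event: "taught at a school near here", lat: 51.564, lon: -0.083,
    moreInfoUrl: "https://example.org/wollstonecraft" },
];

// Haversine distance in metres between two lat/lon points.
function distanceMetres(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const R = 6371000;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Return notification text for any stories within a few hundred metres.
function nearbyNotifications(lat: number, lon: number, radius = 300): string[] {
  return stories
    .filter(s => distanceMetres(lat, lon, s.lat, s.lon) <= radius)
    .map(s => `Did you know that ${s.person} ${s.event}? ` +
              `Find out more now or later: ${s.moreInfoUrl}`);
}

// e.g. walking through Stoke Newington:
console.log(nearbyNotifications(51.5625, -0.0745));
```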

I've always been fascinated with the idea of making the invisible and intangible layers of history linked to any one location visible again. Millions of lives, ordinary or notable, have been lived in London (and in your city); imagine waiting at your local bus stop and having access to the countless stories and events that happened around you over the centuries. Wikinear is a great example, but it's currently limited to content on Wikipedia, and this content has to pass a 'notability' test that doesn't reflect local concepts of notability or 'interestingness'. Wikipedia isn't interested in the finds associated with an archaeological dig that happened at the end of your road in the 1970s, but with a bit of tinkering (or a nudge to me to find the time to make a better programmatic interface) you could get that information from the LAARC catalogue.

The nice thing about local data is that there are lots of people making content; the not nice thing about local data is that it's scattered all over the web, in all kinds of formats with all kinds of 'trustability', from museums/libraries/archives, to local councils to local enthusiasts and the occasional raving lunatic. If an application developer or content editor can't find information from trusted sources that fits the format required for their application, they'll use whatever they can find on other encyclopaedic repositories, hack federated searches, or they'll screen-scrape our data and generate their own set of entities (authority records) and object records. But what happens if a museum updates and republishes an incorrect record - will that change be reflected in various ad hoc data solutions? Surely it's better to acknowledge and play with this new information environment - better for our data and better for our audiences.

Preparing the data and/or the interface is not necessarily a project that should be specific to any one museum - it's the kind of project that would work well if it drew on resources from across the cultural heritage sector (assuming we all made our geo-located object data and authority records available and easily queryable; whether with a commonly agreed core schema or our own schemas that others could map between).

Location-linked data isn't only about official cultural heritage data; it could be used to display, preserve and commemorate histories that aren't 'notable' or 'historic' enough for recording officially, whether that's grime pirate radio stations in East London high-rise roofs or the sites of Turkish social clubs that are now new apartment buildings. Museums might not generate that data, but we could look at how it fits with user-generated content and with our collecting policies.

Or getting away from traditional cultural heritage, I'd love to know when I'm passing over the site of one of London's lost rivers, or a location that's mentioned in a film, novel or song.

[Updated December 2008 to add - as QR tags get more mainstream, they could provide a versatile and cheap way to provide links to online content, or 250 characters of information. That's more information than the average Blue Plaque.]

Thursday, 29 May 2008

Fun with Freebase

A video of a presentation to the Freebase User Group with some good stuff on data mining, visualisation (and some bonus API action) via the Freebase blog.

If you haven't seen it before, Freebase is 'an open database of the world's information', 'free for anyone to query, contribute to, build applications on top of, or integrate into their websites'. Check out this sample entry on the early feminist (and Londoner) Mary Wollstonecraft. The Freebase blog is generally worth a look, whether you're interested in Freebase or just thinking about APIs and data mashups.

Wednesday, 28 May 2008

Google release AJAX loader

From the Google page, AJAX Libraries API:

The AJAX Libraries API is a content distribution network and loading architecture for the most popular open source JavaScript libraries. By using the Google AJAX API Loader's google.load() method, your application has high speed, globally available access to a growing list of the most popular JavaScript open source libraries including:

Google works directly with the key stakeholders for each library effort and accepts the latest stable versions as they are released. Once we host a release of a given library, we are committed to hosting that release indefinitely.

The AJAX Libraries API takes the pain out of developing mashups in JavaScript while using a collection of libraries. We take the pain out of hosting the libraries, correctly setting cache headers, staying up to date with the most recent bug fixes, etc.
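For context, using the loader looks roughly like the sketch below. This is only an illustration - it assumes the page has already included Google's jsapi loader script, and the library name and version are examples rather than recommendations:

```typescript
// A rough sketch only: assumes the page has already loaded Google's loader
// script (www.google.com/jsapi); the library name and version are examples.
declare const google: {
  load(moduleName: string, version: string): void;
  setOnLoadCallback(callback: () => void): void;
};

// Ask Google's CDN for a hosted copy of jQuery rather than serving it yourself.
google.load("jquery", "1.3");

// Run code once the requested library has finished loading.
google.setOnLoadCallback(() => {
  console.log("jQuery is now available from Google's servers");
});
```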

There's also more information at Speed up access to your favorite frameworks via the AJAX Libraries API.

To play devil's avocado briefly, the question is - can we trust Google enough to build functionality around them? It might be a moot point if you're already using their APIs, and you could always use the libraries directly, but it's worth considering.

Thursday, 22 May 2008

Play with your customer profiles

It's a bit early for a random Friday fun link, but this Forrester 'Build Your Customers' Social Technographics Profile' interactive counts as work too.

Companies often approach Social Computing as a list of technologies to be deployed as needed — a blog here, a podcast there — to achieve a marketing goal. But a more coherent approach is to start with your target audience and determine what kind of relationship you want to build with them, based on what they are ready for. You can use the tool on this page to get started.

You can use the pull-down menus to change the age group, country and gender of your target audience, and the graph below updates to show how many people fall into each 'Social Technographics' group.

The definitions of the 'Social Technographics' groups are given in a slideshow.

Hat tip to Nina Simon. [Update to get Nina's name right, I'm very sorry!]

Thursday, 15 May 2008

Notes from 'Aggregating Museum Data – Use Issues' at MW2008

These are my notes from the session 'Aggregating Museum Data – Use Issues' at Museums and the Web, Montreal, April 2008.

These notes are pretty rough so apologies for any mistakes; I hope they're a bit useful to people, even though it's so late after the event. I've tried to include most of what was covered but it's taken me a while to catch up on some of my notes and recollection is fading. Any comments or corrections are welcome, and the comments in [square brackets] below are me. All the Museums and the Web conference papers and notes I've blogged have been tagged with 'MW2008'.

This session was introduced by David Bearman, and included two papers:
Exploring museum collections online: the quantitative method by Frankie Roberto and Uniting the shanty towns - data combining across multiple institutions by Seb Chan.

David Bearman: the intentionality of the data production process is interesting, i.e. the data Frankie and Seb used wasn't designed for integration.

Frankie Roberto, Exploring museum collections online: the quantitative method (slides)
He didn't give a crap about the quality of the data; it was all about numbers - get as much as possible and see what he could do with it.

The project wasn't entirely authorised or part of his daily routine. It came in part from debates after the museum mash-up day.

Three problems with mashing museum data: getting it, (getting the right) structure, (dealing with) dodgy data

Traditional solutions:
Getting it - APIs
Structure - metadata standards
Dodgy data - hard work (get curators to fix it)

But it doesn't have to be perfect, it just has to be "good enough". Or "assez bon" (and he hopes that translation is good enough).

Options for getting it - screen scrapers, or Freedom of Information (FOI) requests.

FOI request - simple set of fields in machine-readable format.

Structure - some logic in the mapping into simple format.

Dodgy data - go for 'good enough'.

Presenting objects online: existing model - doesn't give you a sense of the archive, the collection, as it's about the individual pages.

So what was he hoping for?
Who, what, where, when, how. ['Why' is the other traditional journalist's question, but it's too difficult to capture in structured information]

And what did he get?
Who: hoping for collection/curator - no data.
What: hoping for 'this is an x'. Instead got categories (based on museum internal structures).
Where: lots of variation - 1496 unique strings. The specificity of terms varies on geographic and historical dimensions.
When: lots of variation
How: hoping for donation/purchase/loan. Got a long list of varied stuff.

[There were lots of bits about whacking the data together that made people around me (and me, at times) wince. But it took me a while to realise it was a collection-level view, not an individual object view - I guess that's just a reflection of how I think about digital collections - so that doesn't matter as much as if you were reading actual object records. And I'm a bit daft cos the clue ('quantitative') was in the title.

A big part of the museum publication process is making crappy date, location and classification data correct, pretty and human-readable, so the variation Frankie found in the data isn't surprising. Catalogues are designed for managing collections, not for publication (though might curators also over-state the case because they'd always rather everything was tidied than published in a possibly incorrect or messy state?).

It would have been interesting to hear how the chosen fields related to the intended audience, but it might also have been just a reasonable place to start - somewhere 'good enough' - I'm sure Frankie will correct me if I'm wrong.]

It will be on museum-collections.org. Frankie showed some stuff with Google graph APIs.
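[I didn't note the details of what he showed, but the general idea - turning aggregated collection counts into a chart just by building a URL - can be sketched roughly like this (TypeScript; the category counts are made up, and I'm assuming the old Google image chart endpoint and parameters rather than whatever Frankie actually used):]

```typescript
// Illustrative only: invented category counts, and the endpoint/parameters are
// my assumption about Google's image chart API, not Frankie's actual code.
const categoryCounts: Record<string, number> = {
  Coins: 12000,
  Paintings: 450,
  "Scientific instruments": 1800,
};

const labels = Object.keys(categoryCounts);
const values = Object.values(categoryCounts);

// Build a pie-chart URL: cht = chart type, chs = size, chd = data, chl = labels.
const chartUrl =
  "https://chart.apis.google.com/chart" +
  `?cht=p&chs=500x250` +
  `&chd=t:${values.join(",")}` +
  `&chl=${labels.map(encodeURIComponent).join("|")}`;

console.log(chartUrl); // paste into a browser to see the rendered chart
```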

Prior art - Pitt Rivers Museum - analysis of collections, 'a picture of Englishness'.

Lessons from politics: theyworkforyou for curators.

Issues: visualisations count all objects equally. e.g. lots of coins vs bigger objects. [Probably just as well no natural history collections then. Damn ants!]

Interactions - present user comments/data back to museums?

Whose role is it anyway, to analyse collections data? And what about private collections?

Sebastian Chan, Uniting the shanty towns - data combining across multiple institutions (slides)
[A paraphrase from the introduction: Seb's team are artists who are also nerds (?)]

Paper is about dealing with the reality of mixing data.

Mess is good, but... mess makes smooshing things together hard. Trying to agree on standards takes a long time, you'll never get anything built.

Combination of methods - scraping + trust-o-meter to mediate 'risk' of taking in data from multiple sources.

Semantic web in practice - dbpedia.

Open Calais - bought from ClearForest by Reuters. Dynamically generated metadata tags about 'entities', e.g. possible authority records. There are problems with automatically generated data, e.g. guesses at people, organisations or whatever might not be right. 'But it's good enough'. You can then build on it so users can browse by people, then link to other sites with more records about them in other datasets.

[But can museums generally cope with 'good enough'? What does that do to ideas of 'authority'? If it's machine-generated because there's not enough time for a person in the museum to do it, is there enough time for a person in the museum to clean it? OTOH, the Powerhouse model shows you can crowdsource the cleaning of tags so why not entities. And imagine if we could connect Powerhouse objects in Sydney with data about locations or people in London held at the Museum of London - authority versus utility?

Do we need to critically examine and change the environment in which catalogue data is viewed so that the reputation of our curators/finds specialists in some of the more critical (bitchy) or competitive fields isn't affected by this kind of exposure? I know it's a problem in archaeology too.]

They've published an OpenSearch feed as GeoRSS.

Fire Eagle, a Yahoo beta product. Link it to other data sets so you can see what's near you. [If you can get on the beta.]

I think that was the end, and the next bits were questions and discussion.

David Bearman: regarding linked authority files... if we wait until everything is perfect before getting it out there, then "all curators have to die before we can put anything on the web", "just bloody experiment".

Nate (Walker): is 'good enough' good enough? What about involving museums in creating better data and correcting it? [I think that's what he asked - correct me if not]
Seb: no reason why a museum community shouldn't create an OpenCalais equivalent. David: Calais knows what Reuters knows about data. [So we should get together as a sector, nationally or internationally, or as art, science, history museums, and teach it about museum data.]

David - almost saying 'make the uncertainty an opportunity' in museum data - open it up to the public as you may find the answers. Crowdsource the data quality processes in cataloguing! "we find out more by admitting we know less".

Seb - geo-location is critical to allowing communities to engage with this material.

Frankie - doing a big database dump every few months could be enough of an API.

Location sensitive devices are going to be huge.

Seb - we think of search in a very particular way, but we don't know how people want to search i.e. what they want to search for, how they find stuff. [This is one of the sessions that made me think about faceted browsing.]

"Selling a virtual museum to a director is easier than saying 'put all our stuff there and let people take it'".

Tim Hart (Museum Victoria) - is the data from the public going back into the collection management system? Seb - yep. There's no field in EMu for some of the stuff that OpenCalais has, but the use of it from OpenCalais makes a really good business case for putting it into EMu.

Seb - we need tools to create metadata for us, we don't and won't have resources to do it with humans.

Seb - Commons on Flickr is a good experiment in giving stuff away. Freebase - not sure if they'd go to that level.

Overall, this was a great session - lots of ideas for small and large things museums can do with digital collections, and it generated lots of interesting and engaged discussion.

[It's interesting - we opened up the dataset from Çatalhöyük for download so that people could make their own interpretations and/or remix the data, but we never got around to implementing interfaces so people could contribute or upload the knowledge they created back to the project, or share the queries they'd run.]

Saturday, 15 December 2007

Browse with maps on Flickr

Flickr have introduced a new 'places' feature, which makes geo-tagged photos easier to find by navigating through a map, browsing or searching. There's an end-user focussed screencast explaining how it works. There are more technical links under the 'Are you nerdy?' heading.

Features like this and Google Maps seem to be creating a much more 'map-savvy' generation of online users - I think this could be really beneficial because they're educating our users about mapping technologies and interfaces as well as making it possible for ordinary people to create geo-referenced content.

Flickr have also introduced stats for Pro accounts, which will make evaluating the use of our content a lot easier.

Thursday, 30 August 2007

Useful summary of data visualisation methods and tools

Data Visualization: Modern Approaches presents the "most interesting modern approaches to data visualization" for displaying mind maps, news, data, connections, websites, articles and resources and tools and services.

Monday, 12 March 2007

Exposing the layers of history in cityscapes

I really liked this talk on "Time, History and the Internet" because it touches on lots of things I'm interested in.

I have an on-going fascination with the idea of exposing the layers of history present in any cityscape.

I'd like to see content linked to and through particular places, creating a sense of four dimensional space/time anchored specifically in a given location. Discovering and displaying historical content marked-up with the right context (see below) gives us a chance to 'move' through the fourth dimension while we move through the other three; the content of each layer of time changing as the landscape changes (and as information is available).

Context for content: when was it written? Was it written/created at the time we're viewing, or afterwards, or possibly even beforehand, about a then-future time? Who wrote/created it, and who were they writing/drawing/creating it for? If this context is machine-readable and content is linked to a geo-reference, can we generate a representation of these layers on-the-fly?
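Purely as a thought experiment, here's a small sketch (TypeScript, with invented records and fields) of what generating those layers on-the-fly might mean in practice: filter geo-referenced content by the period it depicts, keeping its creation context alongside it.

```typescript
// Hypothetical sketch of the 'layers of history' idea: the record shape,
// periods and items are invented purely for illustration.
interface HistoricalContent {
  title: string;
  lat: number;
  lon: number;
  depictsYear: number;   // the time the content shows
  createdYear: number;   // when it was written/drawn/photographed
  creator: string;
}

const contentForLocation: HistoricalContent[] = [
  { title: "Engraving of St Giles", lat: 51.516, lon: -0.13,
    depictsYear: 1750, createdYear: 1755, creator: "unknown engraver" },
  { title: "Photo of Tottenham Court Road", lat: 51.516, lon: -0.13,
    depictsYear: 1965, createdYear: 1965, creator: "street photographer" },
];

// Return the 'layer' for a chosen moment: everything depicting the surrounding era,
// flagged by whether it was made at the time or retrospectively.
function layerFor(year: number, windowYears = 25) {
  return contentForLocation
    .filter(c => Math.abs(c.depictsYear - year) <= windowYears)
    .map(c => ({
      ...c,
      madeAtTheTime: Math.abs(c.createdYear - c.depictsYear) <= 5,
    }));
}

console.log(layerFor(1750)); // what would I have seen here around 1750?
```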

Imagine standing at the base of Centrepoint at London's Tottenham Court Road and being able to ask, what would I have seen here ten years ago? fifty? two hundred? two thousand? Or imagine sitting at home, navigating through layers of historic mapping and tilting down from a birds eye view to a view of a street-level reconstructed scene. It's a long way off, but as more resources are born or made discoverable and interoperable, it becomes more possible.