I loved the Fire Eagle presentation I saw at the WSG Findability event [my write-up] because it got me all excited again about ideas for projects that take cultural heritage outside the walls of the museum, and more importantly, it made some of those projects seem feasible.
There's also been a lot of talk about APIs into museum data recently and hopefully the time has come for this idea. It'd be ace if it was possible to bring museum data into the everyday experience of people who would be interested in the things we know about but would never think to have 'a museum experience'.
For example, you could be on your way to the pub in Stoke Newington, and your phone could let you know that you were passing one of Daniel Defoe's hang outs, or the school where Mary Wollstonecraft taught, or that you were passing a 'Neolithic working area for axe-making' and that you could see examples of the Neolithic axes in the Museum of London or Defoe's headstone in Hackney Museum.
That's a personal example, and those are some of my interests - Defoe wrote one of my favourite books (A Journal of the Plague Year), and I've been thinking about a project about 'modern bluestockings' that will collate information about early feminists like Wollstonecraft (contact me for more information) - but ideally you could tailor the information you receive to your interests, whether it's football, music, fashion, history, literature or soap stars in Melbourne, Mumbai or Malmo. If I can get some content sources with good geo-data I might play with this at the museum hack day.
I'm still thinking about functionality, but a notification might look something like "did you know that [person/event blah] [lived/did blah/happened] around here? Find out more now/later [email me a link]; add this to your map for sharing/viewing later".
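A rough Python sketch of how that kind of notification could be generated, assuming a small list of geo-tagged records and a simple straight-line distance check. The record fields, the coordinates and the 200 metre radius are all invented placeholders for illustration, not real catalogue data or a real service:

```python
import math

# Hypothetical records: names come from the example above, but the
# coordinates and topic tags are made-up placeholders.
PLACES = [
    {"name": "Daniel Defoe", "verb": "lived",
     "lat": 51.5600, "lon": -0.0750, "topics": {"literature", "history"}},
    {"name": "Mary Wollstonecraft", "verb": "taught",
     "lat": 51.5520, "lon": -0.0800, "topics": {"history", "feminism"}},
]

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    radius = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def notifications(lat, lon, interests, radius_m=200):
    """Return a message for each nearby place that matches the user's interests."""
    messages = []
    for place in PLACES:
        if not (place["topics"] & interests):
            continue  # not something this user has asked to hear about
        if haversine_m(lat, lon, place["lat"], place["lon"]) <= radius_m:
            messages.append(
                f'Did you know that {place["name"]} {place["verb"]} around here?'
            )
    return messages
```

A phone client would call `notifications()` whenever it got a new position, passing in whatever interests the user had picked.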
I've always been fascinated with the idea of making the invisible and intangible layers of history linked to any one location visible again. Millions of lives, ordinary or notable, have been lived in London (and in your city); imagine waiting at your local bus stop and having access to the countless stories and events that happened around you over the centuries. Wikinear is a great example, but it's currently limited to content on Wikipedia, and this content has to pass a 'notability' test that doesn't reflect local concepts of notability or 'interestingness'. Wikipedia isn't interested in the finds associated with an archaeological dig that happened at the end of your road in the 1970s, but with a bit of tinkering (or a nudge to me to find the time to make a better programmatic interface) you could get that information from the LAARC catalogue.
The nice thing about local data is that there are lots of people making content; the not nice thing about local data is that it's scattered all over the web, in all kinds of formats with all kinds of 'trustability', from museums/libraries/archives, to local councils to local enthusiasts and the occasional raving lunatic. If an application developer or content editor can't find information from trusted sources that fits the format required for their application, they'll use whatever they can find on other encyclopaedic repositories, hack federated searches, or they'll screen-scrape our data and generate their own set of entities (authority records) and object records. But what happens if a museum corrects and republishes a record that was wrong - will that change be reflected in the various ad hoc data solutions? Surely it's better to acknowledge and play with this new information environment - better for our data and better for our audiences.
Preparing the data and/or the interface is not necessarily a project that should be specific to any one museum - it's the kind of project that would work well if it drew on resources from across the cultural heritage sector (assuming we all made our geo-located object data and authority records available and easily queryable; whether with a commonly agreed core schema or our own schemas that others could map between).
Location-linked data isn't only about official cultural heritage data; it could be used to display, preserve and commemorate histories that aren't 'notable' or 'historic' enough for recording officially, whether that's grime pirate radio stations in East London high-rise roofs or the sites of Turkish social clubs that are now new apartment buildings. Museums might not generate that data, but we could look at how it fits with user-generated content and with our collecting policies.
Or getting away from traditional cultural heritage, I'd love to know when I'm passing over the site of one of London's lost rivers, or a location that's mentioned in a film, novel or song.
[Updated December 2008 to add - as QR tags get more mainstream, they could provide a versatile and cheap way to provide links to online content, or 250 characters of information. That's more information than the average Blue Plaque.]
Saturday, 31 May 2008
On Wednesday I went to the WSG London Findability event, and I've finally got the last of my notes up.
The final talk was from Steve Marshall, on 'Finding yourself with Fire Eagle'.
Steve doesn't work on Fire Eagle but made the Python library.
Fire Eagle is a service that helps you manage your location data.
Most location-aware applications have two parts - getting the location, and using the location.
Better model - distribute the location information, but the application getting the location still has to know who's using it.
Even better model: a brokering service. Fire Eagle sits between any 'getting' applications and any 'using' applications, and handles the exchange.
[FWIW, 'Fire Eagle is a brokering service for location data' is probably the best explanation I've heard, and I'd heard it before but I needed the context of the 'get' and the 'use' applications it sits between for it to stick in my brain.]
So how does it work? In the web application context (it's different for desktop or mobile applications):
Web app: app asks for Request Token
Fire Eagle: returns Request Token
Web app: user sent to Fire Eagle with token in URL
Fire Eagle: user chooses privacy levels and authorises app
Web app: user sent back to callback URL with Request Token
Web app: app initiates exchange of Request Token
Fire Eagle: Request Token exchanged for Access Token
Web app: app stores Access Token for user
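The exchange above can be sketched as a toy, in-memory simulation in Python. This is not the real Fire Eagle API or a real OAuth library: the `ToyBroker` class, its method names and the "city" privacy level are all invented here just to show the order of the steps and why the request token is only good for one exchange.

```python
import secrets

class ToyBroker:
    """In-memory stand-in for a brokering service (hypothetical, no network)."""

    def __init__(self):
        self._request_tokens = {}  # request token -> privacy level (None until authorised)
        self._access_tokens = {}   # access token -> privacy level

    def issue_request_token(self):
        token = secrets.token_hex(8)
        self._request_tokens[token] = None
        return token

    def authorise(self, request_token, privacy_level):
        """Called when the user visits the broker and approves the app."""
        if request_token not in self._request_tokens:
            raise KeyError("unknown request token")
        self._request_tokens[request_token] = privacy_level

    def exchange(self, request_token):
        """Swap an authorised request token for an access token."""
        level = self._request_tokens.get(request_token)
        if level is None:
            raise PermissionError("request token unknown or not yet authorised")
        del self._request_tokens[request_token]  # request tokens are single-use
        access_token = secrets.token_hex(8)
        self._access_tokens[access_token] = level
        return access_token

    def revoke(self, access_token):
        """The user can withdraw an app's permission at any time."""
        self._access_tokens.pop(access_token, None)

    def is_valid(self, access_token):
        return access_token in self._access_tokens


broker = ToyBroker()
stored_tokens = {}                                  # the web app's own per-user storage

rt = broker.issue_request_token()                   # app asks for, and gets, a Request Token
broker.authorise(rt, "city")                        # user is sent to the broker and authorises
stored_tokens["alice"] = broker.exchange(rt)        # app exchanges it and stores the Access Token
```

The point of the two-token dance is that the web app never sees the user's login details for the broker, only a token the user can later revoke.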
You can manage your applications, and can revoke permissions (who can set or get your location) at any time. You can also temporarily hide your location, or purge all your data from the service. [Though it might be kept by the linked applications.]
How to use:
1. Get API key
2. Authenticate with user (OAuth)
3. Make API call
Locations can be a point or a bounding box.
Location hierarchy - a set of locations at varying degrees of precision.
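One way to picture those two ideas together, as a minimal Python sketch: each entry in the hierarchy is either a point or a bounding box, and an application only gets the most precise entry the user has allowed. The level names, the `Location` class and the sample values are assumptions made up for this example, not Fire Eagle's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical precision levels, from most to least precise.
LEVELS = ["exact", "postcode", "city", "country"]

@dataclass(frozen=True)
class Location:
    level: str
    name: str
    point: Optional[Tuple[float, float]] = None               # (lat, lon)
    bbox: Optional[Tuple[float, float, float, float]] = None  # (south, west, north, east)

def visible_location(hierarchy: List[Location], max_precision: str) -> Optional[Location]:
    """Return the most precise location at or below the precision the user allows."""
    allowed = LEVELS[LEVELS.index(max_precision):]
    for level in allowed:
        for loc in hierarchy:
            if loc.level == level:
                return loc
    return None

def bbox_centre(bbox: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Midpoint of a bounding box, handy when an app needs a single point."""
    south, west, north, east = bbox
    return ((south + north) / 2, (west + east) / 2)

# Placeholder hierarchy for one user; the values are illustrative only.
hierarchy = [
    Location("exact", "somewhere on the high street", point=(51.5, -0.1)),
    Location("city", "London", bbox=(51.0, -1.0, 52.0, 0.0)),
    Location("country", "United Kingdom", bbox=(49.0, -8.0, 61.0, 2.0)),
]
```

Here an app limited to "postcode" precision would fall through to the city entry, because this user's hierarchy has no postcode-level location.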
[There was some good stuff on who could/is using it, and other ideas, but my notes got a bit useless around this point.]
In summary: you can share your location online, control your data and privacy, and easily build location services.
Question: what makes it special? Answer: it's not coupled to anything. It plays to the strengths of the developers who use it.
'Fire Eagle: twitter + upcoming + pixie dust'.
URLs are bookmarkable, which means they can be easy to use on a phone [hooray]. It doesn't (currently) store location history; that's being discussed.
Qu: what's the business model? Ans: it's a Brickhouse project (from an incubator/start-up environment).
All methods are HTTP requests at the moment; they might also use XMPP ping.
Qu: opening up the beta? Ans: will give Fire Eagle invitations if you have an application that needs testing.
I had to leave before the end of the questions because the event was running over time and I had to meet people in another pub, so I missed out on the probably really interesting conversations down the pub afterwards.
Looking at their hierarchy of 'how precisely will you let application x locate you', it strikes me that it's quite country-dependent, as a postcode identifies a location very precisely within the UK (to within one of a few houses in a row) while in Australia, it just gives you the area of a huge suburb. I'm not sure if it's less precise in the US, where postcodes might fit better in the hierarchy.
I've also blogged some random thoughts on how services like Fire Eagle make location-linked cultural heritage projects more feasible.
Thursday, 29 May 2008
Last night I went to the WSG London Findability event at Westminster University, part of London Web Week; here's part two of my notes.
Stuart Colville's session on 'Building websites with findability in mind' was full of useful, practical advice.
Who needs to find your content?
Understand potential audience(s)
Accessibility (for people and user agents)
Search engine friendly
Content [largely about blogs]:
Make it compelling for your audience
There's less competition in niche subjects
Originality (synthesising content, or representing existing content in new ways is also good)
Stay on topic
Provide free, useful information or tools
Comments and discussion (from readers, and interactions with readers) are good
Author or user-generated, or both
Good for searching
Replaces fixed categories
Enables arbitrary associations
Markup (how to make content more findable):
Use web standards. They're not a magic fix but they'll put you way ahead. The content:code ratio is improved, and errors are reduced.
Use semantic markup. Adds meaning to content.
Try the roundabout SEO test
Make your sites accessible. Accessible content is indexable content.
Keywords versus descriptions. Tailor descriptions for each page; they can be automatically generated; they can be used as summaries in search results.
WordPress has good plugins - metadescription for auto-generated metadata, there are others for manual metadata.
Markup titles and headings:
Make them good - they'll appear as bookmark etc titles.
One H1 per page; the most meaningful title for that page
Separate look from heading structure.
Use semantically correct elements to describe content. Strong, em, etc.
Background images are fine if they're only a design element.
Use image replacement if the images have meaning. There are some accessibility issues.
Use attributes correctly, and make sure they're relevant.
Microformats are a simple layer of structure around content
They're easy to add to your site
Yahoo! search and technorati index them, Firefox 3 will increase exposure.
Start unobtrusive and enhance according to the capabilities of the user agent.
Don't be stupid. Use onClick, don't kill href (e.g. href="#").
Use event delegation - no inline events. It's search engine accessible, has nice clean markup and you still get all the functionality.
[Don't break links! I like opening a bunch of search results in new tabs, and I can't do that on your online catalogue, I'll shop somewhere I can. Rant over.]
Performance and indexation:
Use 'last modified' headers - concentrate search agents on fresh content
Sites with Google Ads are indexed more often.
Hackable URLs are good [yay!].
Avoid query strings, they won't be indexed
Put keywords in your URL path
Use mod_rewrite, etc.
"They should be forever". But you need to think about them so they can be forever even if you change your mind about implementation or content structure.
Use rewrites if you do change them.
De-indexing (if you've moved content)
Put up a 404 page with proper http headers. 410 'intentionally gone' is nice.
There's a tool on Google to quickly de-index content.
Make 404s useful to users - e.g. run an internal search and display likely results from your site based on their search engine keywords [or previous page title].
Robots.txt - really good but use carefully. Robots-Nocontent - Yahoo! introduced 'don't index' for e.g. divs but it hasn't caught on.
Use 301. Redirect users and get content re-indexed by search engines.
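The status code advice above boils down to a small decision table. Here's a minimal Python sketch of it, assuming three hand-maintained lookups (live pages, moved pages, deliberately removed pages); the paths are hypothetical and a real site would do this in its web server or framework configuration rather than in a function like this:

```python
# Hypothetical URL tables for illustration only.
LIVE = {"/collections/objects/axe-123"}
MOVED = {"/old-catalogue/axe-123": "/collections/objects/axe-123"}
GONE = {"/exhibitions/2001-preview"}

def respond(path):
    """Return (status code, headers) for a requested path."""
    if path in LIVE:
        return 200, {}
    if path in MOVED:
        # 301: permanent redirect, so users arrive and search engines re-index
        return 301, {"Location": MOVED[path]}
    if path in GONE:
        # 410: intentionally removed, a stronger signal than a plain 404
        return 410, {}
    # 404 with a proper status header (not a 200 page that says 'not found')
    return 404, {}
```

Keeping the old-to-new mapping in one place is also what makes the 'URLs should be forever' goal manageable when the content structure changes.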
Tools for analysing your findability:
Google webmaster tools, Google analytics, log files. It's worth doing, check for broken links etc.
Think about findability before you write a line of code.
Start with good content, then semantic markup and accessibility.
Use sensible headings, titles, links.
Last night I went to the WSG London Findability event at Westminster University. The event was part of London Web Week. As always, apologies for any errors; corrections and comments are welcome.
First up was Cyril Doussin with an 'introduction to findability'.
A lot of it is based on research by Peter Morville, particularly Ambient Findability.
So what do people search for?
Knowledge - about oneself; about concepts/meaning; detailed info (product details, specs); entities in society (people, organisations, etc.)
Opinions - to validate a feeling or judgement; establish trust relationships; find complementary judgements.
What is information? From simple to complex - data -> information -> knowledge.
Findability is 'the quality of being locatable or navigable'.
Item level - to what degree is a particular object easy to discover or locate?
System level - how well does the environment support navigation and retrieval?
Wayfinding requires: knowing where you are; knowing your destination; following the best route; being able to recognise your destination; being able to find your way back.
The next section was about how to make something findable:
The "in your face" discovery principle - expose the item in places known to be frequented by the target audience. He showed an example of a classic irritating Australian TV ad, a Brisbane carpet store in this case. It's disruptive and annoying, but everyone knows it exists. [Sadly, it made me a little bit homesick for Franco Cozzo. 'Megalo megalo megalo' is also a perfect example of targeting a niche audience, in this case the Greek and Italian speakers of Melbourne.]
Hand-guided navigation - sorting/ordering (e.g. sections of a restaurant menu); sign-posting.
Describe and browse (e.g. search engines) - similar to asking for directions or asking random questions; get a list of entry points to pages.
Mixing things up - the Google 'search within a search' and Yahoo!'s 'search assist' box both help users refine searches.
Recommendations (communication between peers) - the searcher describes intent; casual discussions; advice; past experiences.
The web is a referral system. Links are entry doors to your site. There's a need for a relevancy system whether search engines (PageRank) or peer-based systems (Digg).
Measuring relevance (effectiveness):
Precision - if it retrieves only relevant documents
Recall - whether it retrieves all relevant documents.
Good tests for the effectiveness of your relevance mechanism:
Precision = number of relevant and retrieved documents divided by the total number retrieved.
Recall = number of relevant and retrieved documents divided by the total number of relevant documents.
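As a quick worked example of those two formulas, here is a small Python sketch. The function name and the document IDs are made up for illustration; returning 0.0 when a set is empty is a choice made here to avoid dividing by zero, not something from the talk.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both relevant and retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four results came back, two of which are among the three relevant documents.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
# precision is 2/4 and recall is 2/3
```

So a search can score well on one measure and badly on the other: returning everything in the index gives perfect recall and terrible precision.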
Relevance - need to identify the type of search:
Sample search - small number of documents are sufficient (e.g. first page of Google results)
Existence search - search for a specific document
Exhaustive search - full set of relevant data is needed.
Sample and existence searches require precision; exhaustive searches require recall.
Taxonomy - organisation through labelling [but it seems in this context there's no hierarchy, the taxa are flat tags].
Ontology - taxonomy and inference rules.
Folksonomy - a social dimension.
[In the discussion he mentioned eRDF (embedded RDF) and microformats. Those magic words - subject : predicate : object.]
Content organisation is increasingly important because of the increasing volume of information and sharing of information. It's also a very good base for search engines.
Measuring findability on the web: count the number of steps to get there. There are many ways to get to data - search engines, peer-based lists and directories.
Aim to strike a balance between sources e.g. search engine optimisation and peer-based.
Know the path(s) your audience(s) will follow (user testing)
Understand the types of search
Make advertising relevant (difficult, as it's so context-dependent)
Make content rich and relevant
Make your content structured
I've run out of lunch break now, but will write up the talks by Stuart Colville and Steve Marshall later.
A video of a presentation to the Freebase User Group with some good stuff on data mining, visualisation (and some bonus API action) via the Freebase blog.
If you haven't seen it before, Freebase is 'an open database of the world's information', 'free for anyone to query, contribute to, build applications on top of, or integrate into their websites'. Check out this sample entry on the early feminist (and Londoner) Mary Wollstonecraft. The Freebase blog is generally worth a look, whether you're interested in Freebase or just thinking about APIs and data mashups.
Wednesday, 28 May 2008
From the Google page, AJAX Libraries API:
Google works directly with the key stake holders for each library effort and accept the latest stable versions as they are released. Once we host a release of a given library, we are committed to hosting that release indefinitely.
There's also more information at Speed up access to your favorite frameworks via the AJAX Libraries API.
To play devil's avocado briefly, the question is - can we trust Google enough to build functionality around them? It might be a moot point if you're already using their APIs, and you could always use the libraries directly, but it's worth considering.
Monday, 26 May 2008
These are my notes from the workshop on 'How Can Culture Really Connect? Semantic Front Line Report' at Museums and the Web 2008. This session was expertly led by Ross Parry.
The paper, "Semantic Dissonance: Do We Need (And Do We Understand) The Semantic Web?" (written by Ross Parry, Jon Pratty and Nick Poole) and the slides are online. The blog from the original Semantic Web Think Tank (SWTT) sessions is also public.
These notes are pretty rough so apologies for any mistakes; I hope they're a bit useful to people, even though it's so late after the event. I've tried to include most of what was discussed but it's taken me a while to catch up.
There's so much to see at MW I missed the start of this session; when we arrived Ross had the participants debating the meaning of terms like 'Web 2.0', 'Web 3.0', 'semantic web', 'Semantic Web'.
So what is the semantic web (sw) about? It's about intelligent and efficient searching; discovering resources (e.g. URIs of picture, news story, video, biographical detail, museum object) rather than pages; machine-to-machine linking and processing of data.
Discussion: how much/what level of discourse do we need to take to curators and other staff in museums?
me: we need to show people what it can do, not bother them with acronyms.
Libby Neville: believes in involving content/museum people, not sure about viewing it through the prism of technology.
[?]: decisions about where data lives have an effect.
Slide 39 shows various axes against which the Semantic Web (as formally defined) and the semantic web (the SW 'lite'?) can be assessed.
Discussion: Aaron: it's context-dependent.
'expectations increase in proportion to the work that can be done' so the work never decreases.
sw as 'webby way to link data'; 'machine processable web' saves getting hung up on semantics [slide 40 quoting Emma Tonkin in BECTA research report, ‘If it quacks like a duck…’ Developments in search technologies].
What should/must/could we (however defined) do/agree/build/try next (when)?
Discussion: Aaron: tagging, clusters. Machine tags (namespace: predicate: value).
me: let's build semantic webby things into what we're doing now to help facilitate the conversations and agreements, provide real world examples - attack the problem from the bottom up and the top down.
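To make the machine tags idea a bit more concrete, here's a minimal Python sketch that splits tags into their three parts and clusters the values. It assumes the `namespace:predicate=value` form used for machine tags on Flickr (e.g. `geo:lat=51.5`); the regular expression and function names are this example's own, and real tag syntax rules may be looser or stricter than this.

```python
import re
from collections import defaultdict

# Assumed shape: namespace:predicate=value
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*)"
    r":(?P<predicate>[A-Za-z][A-Za-z0-9_]*)"
    r"=(?P<value>.+)$"
)

def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag.strip())
    if not match:
        return None
    return match["namespace"], match["predicate"], match["value"]

def cluster(tags):
    """Group machine tag values by (namespace, predicate); plain tags are skipped."""
    groups = defaultdict(list)
    for tag in tags:
        parsed = parse_machine_tag(tag)
        if parsed:
            namespace, predicate, value = parsed
            groups[(namespace, predicate)].append(value)
    return dict(groups)
```

The three parts line up neatly with subject, predicate and object once you treat the tagged thing itself as the subject, which is part of why machine tags are an easy bottom-up route into semantic webby data.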
Slide 49 shows three possible modes: make collections machine-processable via the web; build ontologies and frameworks around added tags; develop more layered and localised meaning. [The data (the data around the data) gets smarter and richer as you move through those modes.]
I was reminded of this 'mash it' video during this session, because it does a good jargon-free job of explaining the benefits of semantic webby stuff. I also rather cynically tweeted that the semantic web will "probably happen out there while we talk about it".
Thursday, 22 May 2008
It's a bit early for a random Friday fun link, but this Forrester 'Build Your Customers' Social Technographics Profile' interactive counts as work too.
Companies often approach Social Computing as a list of technologies to be deployed as needed — a blog here, a podcast there — to achieve a marketing goal. But a more coherent approach is to start with your target audience and determine what kind of relationship you want to build with them, based on what they are ready for. You can use the tool on this page to get started.
You can pull down menus to change the age group, country and gender of your target audience, and the graph below updates to show you how many are in each 'Social Technographics' group.
The definitions of the 'Social Technographics' groups are given in a slideshow.
Hat tip to Nina Simon. [Update to get Nina's name right, I'm very sorry!]
Wednesday, 21 May 2008
This is a good summary of why content that has meaning to other computers (is machine-readable in an intelligent sense) is useful: Microformats and accessibility - a request for help:
The web is a wonderful place for humans but it's a less friendly place for machines. When we read a web page we bring along our own learning, mental models and opinions. The combination of what we read and what we know brings meaning. Machines are less bright.
Given a typical TV schedule page we can easily understand that Eastenders is on at 7:30 on the 15th May 2008. But computers can't parse text the way we can. If we want machines to be able to understand the web (and there are many reasons we might want to) we have to be more explicit about our meaning.
Which is where microformats come in. They're a relatively new technology that allow publishers to add semantic meaning to web pages. These might be events, contact details, personal relationships, geographic locations etc. With this additional machine friendly data you can add events from a web page directly to your calendar, contacts to your address book etc. In theory it's a great combination of a web for people and a web for machines. But it has some potential problems.
One potential problem is microformats' use of something called the abbreviation design pattern.
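For anyone who hasn't met the pattern, here's a rough illustration in Python. The HTML snippet is a hand-written hCalendar-style example based on the Eastenders case quoted above (the 19:30 timestamp assumes an evening showing), and the parser is a minimal sketch using the standard library, not how any particular browser plugin or search engine actually reads microformats.

```python
from html.parser import HTMLParser

# Hand-written example markup: the human-readable time is the element text,
# the machine-readable date-time is tucked into the abbr's title attribute.
SNIPPET = (
    '<div class="vevent">'
    '<span class="summary">Eastenders</span> is on at '
    '<abbr class="dtstart" title="2008-05-15T19:30">7:30</abbr>'
    '</div>'
)

class StartTimeParser(HTMLParser):
    """Collect (machine value, human text) pairs from abbr.dtstart elements."""

    def __init__(self):
        super().__init__()
        self.starts = []
        self._pending_title = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "abbr" and "dtstart" in classes:
            self._pending_title = attributes.get("title")

    def handle_data(self, data):
        if self._pending_title is not None:
            self.starts.append((self._pending_title, data))
            self._pending_title = None

parser = StartTimeParser()
parser.feed(SNIPPET)
```

After this runs, `parser.starts` holds the pair of values a machine and a human each care about. The accessibility worry is that a screen reader set to expand abbreviations may read out the title value instead of '7:30'.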
Basically, if you have a screen reader and have abbreviation expansion turned on, they'd like to hear from you.
This overloading of the abbreviation tag also has implications for people using abbr correctly. It's a nice inline way to help explain jargon, but if browsers and screen readers change the way they parse and present the content, we'll lose that functionality.
The BBC guys also have a very interesting post on 'Helping machines play with programmes'.
Quick and dirty notes from geeKyoto, held at Conway Hall, Saturday May 17, 2008. These are pretty much as entered on my phone. The theme of the event was vaguely 'we broke the world, how can we fix it?'. This isn't strictly a post about IT, but there were lots of good presentations and discussion about visualisation, data sharing, communication, building online community, IT enabling communication, some excellent websites and examples of APIs in use. There was also something about re-connecting with offline communities with a 'secular sabbath'.
Christian Nold: in-between the individual and the masses is group and community. Lots on mapping emotions as people walk around.
Alex Haw: spatial control and methods for losing it. Surveillance. Run through stuff. Making the surveillance visible with scaffolding. Re-displaying movement, occupancy. Visualising financial transactions database on physical space. Re-scaling. Coding information analysis. Performance.
Simon Downs, Moixa. Modern design is responsible for climate change, so it should also fix it. Universal design so can upgrade chips instead of throwing out computer when operating system upgrades. Change behaviours with low overhead, easy methods e.g. put balloons on monitors that have been left on overnight. Local dc not remote ac power. How can design work to stop people throwing things away? Batteries rechargeable on USB ports. Disposability is unsustainable. When you buy a phone in China the chargers are standardised so you don't need to manufacture/buy all the accessories again. Consumers make changes, not governments, in what they buy. There was a question and discussion re price of the USB-chargeable battery.
Adrian Hon, Naomi Alderman: Secular Sabbath. Changing state of anything electrical isn't allowed, so instead people have meals with friends, read, go for walks, have conversation, sleep, singing. It facilitates relationships. Tested impact on environment, change in usage of devices. A really good point: behavioural changes in environmental energy don't have to be a sacrifice. Take a day off, get over feeling you'll miss something. Recommendations: invite friends over for monthly secular Sabbath. Chat, walk, good food. No TV, phone, computers. Don't travel unless walk. Enjoy a sense of place. Travel can turn into 'work', be stressful. Day of rest is good and the environment benefits, yay. Bikes are ok too. It helps you prioritise your day, because you can't do stuff, only think. [It's a bit like earth hour, which was quite nice not only in terms of participating in something big but also because it meant remembering that conversation is good and doesn't require any additional power. But a secular Sabbath also means friends must be nearby or stay overnight, this might limit it to very good friends unless you have somewhere comfortable to put them up. I wish there was a non-religious equivalent for 'sabbath'. That said, I really loved this idea. Even having an offline night sounds like a good idea - I could read or garden instead.]
Avoiding mass extinction: amee.cc "If all the energy data in the world were accessible, what would you build?". Dashboards on energy consumption. API. Aggregates data and metrics. Your energy identity. Fed from lots of data sources. Credit card transactions have calculated carbon footprint! Remit to measure all energy in world. Data owned by providers, they're a neutral platform. 75% of change doesn't require new technology. Action to measure and compare, design new life, innovate. People aren't interested in comparisons with the national average, but they do compare themselves with their neighbours. Look to industry for mass production models for energy devices. Make energy *the* performance metric. Questions/discussion: they publish their methodologies on wiki; question about using the tax system to motivate change. Taxes are currently on good things not bad things.
Vincenzo and Bruno from Central St Martins on sustainable development. Sustainable tourism, consumerism. Bruno: play in a changing public realm. Sustainable communities. Observe behaviour and subtle clues to help discover problems. Swings at bus stops! [Cool in so many ways].
Futurelab, beyond current horizons. Images of the future, how and what we think about the future influences what the future becomes. [I wonder if they could be involved with bathcamp?] Thoughts from public. Name-checked blog called Paleofuture. 'Choose the future you want because it won't happen if you don't'.
DIY Kyoto. Create awareness by empowering the individual. Focus on positive messages. Then offer practical solutions. Tangible visualisation of electricity usage: Wattson. Feedback on your usage. Holmes. Download last 28 days data. [It's expensive for an individual but it would be great for a business, put it somewhere like a reception desk where everyone passes it to encourage people to help reduce electricity usage.]
Africa, communications. ICT4D. Kenya, falling apart within 24 hours of election. People in rural areas were the ones who missed out on information. Government, broadcasters, problematic. People used phones, internet. When the internet meets politics. Erik Hersman, Africa and IT. People reporting problems, putting it on map. Small agile projects are more effective than big slow ones in this environment.
Bryony Worthington, emissions trading. Buy permits back and remove from system, campaign for more caps. Take control, expose issue to public scrutiny, provide potential for mass collaboration and mobilisation. Sandbag.org.uk is new campaign to bring emissions trading into public domain. Bring personal responsibility to companies who are trading, real names and addresses. Compare allowance against emissions.
James Smith. Can software save the planet? Socially responsible software. Carbon diet. Visualise your usage. Do the Green Thing: making being green fun, empowering, gives status. Online community around monthly actions. They had a really good list of simple but clever actions you could do. Building a volunteer community of green geeks. Google group: green-web-uk. Using free software and open standards.
Government/policy guys. They did a bar camp at google. Sliding scale from community of trust to universe of discourse, and the problems this creates in getting the right information to the right people when they need it. People in authority have trouble admitting they don't know everything, asking outside their circles for information is problematic as it can be seized on for political gain (so a bit like some experts in all fields then). They looked to open source for models.
2gether08 - conference later this year. Proposers. Enablers, supporters -> convene -> delivery. Open process. Mapping networks, comparing mapping afterwards to see if event was a success.
[Somewhere along the line I tweeted 'programming at work/home is like a mullet, .Net business in front, OSS party in the back'. Clearly my brain was starting to fade.]
From discussion: Ch 4 have two twitter feeds from news service, exploring possibilities.
Arctic explorer Ben Saunders: think about what you're doing with the tiny amount of time you have in your life. Referenced Bill Bryson on how many hours we have in our lifetime, think about what you're doing with each one. No-one else is the authority on your potential.
Monday, 19 May 2008
Personal web servers on a laptop or desktop machine are very handy if you're looking for a local development environment. This article offers a few options for Mac, Linux and Windows: Set up your personal webserver.
Friday, 16 May 2008
The details for two events you might be interested in have been finalised.
The program for UK Museums on the Web Conference 2008 has been announced. It's a great line-up, so I'll see you there if you can get to the University of Leicester for 19 June 2008.
And the date and venue for BathCamp have been confirmed as Saturday 13th - Sunday 14th September 2008 at the Invention Studios in Bath. More information at that blog link, or my previous post: Calling geeks in the UK with an interest in cultural heritage content/audiences.
And I've been hassled by my legion of fans to point out that you can nominate me in the Programming and development blogs: ComputerWeekly.com IT Blog Awards 08 (and you might win a £50 Amazon voucher). There's a lovely badge but I can't quite bring myself to use it, I've only just gotten used to the idea that anyone apart from three or four people I know read this blog. Anyway, there you go.
And if all that's too much excitement for you, go read about the lamest Wikipedia edit wars ever.
Thursday, 15 May 2008
These are my notes from the session 'Aggregating Museum Data – Use Issues' at Museums and the Web, Montreal, April 2008.
These notes are pretty rough so apologies for any mistakes; I hope they're a bit useful to people, even though it's so late after the event. I've tried to include most of what was covered but it's taken me a while to catch up on some of my notes and recollection is fading. Any comments or corrections are welcome, and the comments in [square brackets] below are me. All the Museums and the Web conference papers and notes I've blogged have been tagged with 'MW2008'.
This session was introduced by David Bearman, and included two papers:
Exploring museum collections online: the quantitative method by Frankie Roberto and Uniting the shanty towns - data combining across multiple institutions by Seb Chan.
David Bearman: the intentionality of the data production process is interesting, i.e. the data Frankie and Seb used wasn't designed for integration.
Frankie Roberto, Exploring museum collections online: the quantitative method (slides)
He didn't give a crap about the quality of the data, it was all about numbers - get as much as possible to see what he could do with it.
The project wasn't entirely authorised or part of his daily routine. It came in part from debates after the museum mash-up day.
Three problems with mashing museum data: getting it, (getting the right) structure, (dealing with) dodgy data
Getting it - APIs
Structure - metadata standards
Dodgy data - hard work (get curators to fix it)
But it doesn't have to be perfect, it just has to be "good enough". Or "assez bon" (and he hopes that translation is good enough).
Options for getting it - screen scrapers, or Freedom of Information (FOI) requests.
FOI request - simple set of fields in machine-readable format.
Structure - some logic in the mapping into simple format.
Dodgy data - go for 'good enough'.
Presenting objects online: existing model - doesn't give you a sense of the archive, the collection, as it's about the individual pages.
So what was he hoping for?
Who, what, where, when, how. ['Why' is the other traditional journalist's question, but it's too difficult to capture in structured information]
And what did he get?
Who: hoping for collection/curator - no data.
What: hoping for 'this is an x'. Instead got categories (based on museum internal structures).
Where: lots of variation - 1496 unique strings. The specificity of terms varies on geographic and historical dimensions.
When: lots of variation
How: hoping for donation/purchase/loan. Got a long list of varied stuff.
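The sort of collection-level counting described in these notes can be sketched in a few lines of Python. This is only an illustration, not Frankie's actual code: the column names (`place`, `acquired`) and the sample rows are invented, standing in for whatever fields a museum's data dump happens to contain.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a museum's data dump;
# the column names and values are invented for illustration.
SAMPLE_DUMP = """id,place,acquired
1,London,donation
2,"London, England",purchase
3,london,Donation
4,Roman Britain,gift of Mrs Smith
"""


def count_values(csv_text, field):
    """Count how often each distinct string appears in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[field] for row in reader)


def count_normalised(csv_text, field):
    """Same count after a crude 'good enough' clean-up (trim + lowercase)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[field].strip().lower() for row in reader)


places = count_values(SAMPLE_DUMP, "place")
print(len(places), "unique place strings")
print(len(count_normalised(SAMPLE_DUMP, "place")), "after normalising")
```

Even the crude normalisation collapses some of the variation, which is roughly the 'good enough' argument: you can get a useful picture of a collection without waiting for every record to be fixed.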
[There were lots of bits about whacking the data together that made people around me (and me, at times) wince. But it took me a while to realise it was a collection-level view, not an individual object view - I guess that's just a reflection of how I think about digital collections - so that doesn't matter as much as if you were reading actual object records. And I'm a bit daft cos the clue ('quantitative') was in the title.
A big part of the museum publication process is making crappy date and location and classification data correct, pretty and human-readable, so the variation Frankie found in data isn't surprising. Catalogues are designed for managing collections, not for publication (though might curators also over-state the case because they'd always rather everything was tidied than published in a possibly incorrect or messy state?).
It would have been interesting to hear how the chosen fields related to the intended audience, but it might also have been just a reasonable place to start - somewhere 'good enough' - I'm sure Frankie will correct me if I'm wrong.]
It will be on museum-collections.org. Frankie showed some stuff with Google graph APIs.
Prior art - Pitt Rivers Museum - analysis of collections, 'a picture of Englishness'.
Lessons from politics: theyworkforyou for curators.
Issues: visualisations count all objects equally. e.g. lots of coins vs bigger objects. [Probably just as well no natural history collections then. Damn ants!]
Interactions - present user comments/data back to museums?
Whose role is it anyway, to analyse collections data? And what about private collections?
Sebastian Chan, Uniting the shanty towns - data combining across multiple institutions (slides)
[A paraphrase from the introduction: Seb's team are artists who are also nerds (?)]
Paper is about dealing with the reality of mixing data.
Mess is good, but... mess makes smooshing things together hard. Trying to agree on standards takes a long time, you'll never get anything built.
Combination of methods - scraping + trust-o-meter to mediate 'risk' of taking in data from multiple sources.
Semantic web in practice - dbpedia.
OpenCalais - bought out from ClearForest by Reuters. Dynamically generated metadata tags about 'entities' e.g. possible authority records. There are problems with automatically generated data e.g. guesses at people, organisations, whatever might not be right. 'But it's good enough'. Can then build onto it so users can browse by people then link to other sites with more information or records about them in other datasets.
[But can museums generally cope with 'good enough'? What does that do to ideas of 'authority'? If it's machine-generated because there's not enough time for a person in the museum to do it, is there enough time for a person in the museum to clean it? OTOH, the Powerhouse model shows you can crowdsource the cleaning of tags so why not entities. And imagine if we could connect Powerhouse objects in Sydney with data about locations or people in London held at the Museum of London - authority versus utility?
Do we need to critically examine and change the environment in which catalogue data is viewed so that the reputation of our curators/finds specialists in some of the more critical (bitchy) or competitive fields isn't affected by this kind of exposure? I know it's a problem in archaeology too.]
They've published an OpenSearch feed as GeoRSS.
Fire Eagle, Yahoo beta product. Link it to other data sets so you can see what's near you. [If you can get on the beta.]
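To make the 'what's near you' idea a bit more concrete, here's a rough Python sketch that filters a list of geocoded records by distance from a given point. It doesn't touch the Fire Eagle API at all; the current position is simply passed in as a latitude and longitude, and the records and their coordinates are invented, approximate examples.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(records, lat, lon, radius_km):
    """Return (distance, record) pairs within radius_km, nearest first."""
    hits = []
    for record in records:
        distance = haversine_km(lat, lon, record["lat"], record["lon"])
        if distance <= radius_km:
            hits.append((distance, record))
    return sorted(hits, key=lambda pair: pair[0])


# Invented records with rough coordinates, for illustration only.
RECORDS = [
    {"title": "Object linked to Stoke Newington", "lat": 51.562, "lon": -0.076},
    {"title": "Object linked to Greenwich", "lat": 51.483, "lon": 0.0},
    {"title": "Object linked to Sydney", "lat": -33.87, "lon": 151.21},
]

for distance, record in nearby(RECORDS, 51.56, -0.08, 2.0):
    print(f"{record['title']}: {distance:.1f} km away")
```

A linear scan like this is fine for a few thousand records; a real service would probably want a spatial index, but that's beyond what these notes cover.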
I think that was the end, and the next bits were questions and discussion.
David Bearman: regarding linked authority files... if we wait until everything is perfect before getting it out there, then "all curators have to die before we can put anything on the web", "just bloody experiment".
Nate (Walker): is 'good enough' good enough? What about involving museums in creating better data and correcting it? [I think, correct me if not]
Seb: no reason why a museum community shouldn't create an OpenCalais equivalent. David: Calais knows what Reuters knows about data. [So we should get together as a sector, nationally or internationally, or as art, science, history museums, and teach it about museum data.]
David - almost saying 'make the uncertainty an opportunity' in museum data - open it up to the public as you may find the answers. Crowdsource the data quality processes in cataloguing! "we find out more by admitting we know less".
Seb - geo-location is critical to allowing communities to engage with this material.
Frankie - doing a big database dump every few months could be enough of an API.
Location sensitive devices are going to be huge.
Seb - we think of search in a very particular way, but we don't know how people want to search i.e. what they want to search for, how they find stuff. [This is one of the sessions that made me think about faceted browsing.]
"Selling a virtual museum to a director is easier than saying 'put all our stuff there and let people take it'".
Tim Hart (Museum Victoria) - is the data from the public going back into the collection management system? Seb - yep. There's no field in EMu for some of the stuff that OpenCalais has, but the use of it from OpenCalais makes a really good business case for putting it into EMu.
Seb - we need tools to create metadata for us, we don't and won't have resources to do it with humans.
Seb - Commons on Flickr is good experiment in giving stuff away. Freebase - not sure if go to that level.
Overall, this was a great session - lots of ideas for small and large things museums can do with digital collections, and it generated lots of interesting and engaged discussion.
[It's interesting, we opened up the dataset from Çatalhöyük for download so that people could make their own interpretations and/or remix the data, but we never got around to implementing interfaces so people could contribute or upload the knowledge they created back to the project, or how to use the queries they'd run.]
Dr Klaus Werner has been working with Intelligent Cultural Resources Information Management (ICRIM) on connecting repositories or information silos from "different cultural heritage organizations – museums, superintendencies, environmental and architectural heritage organizations" to make "information resources accessible, searchable, re-usable and interchangeable via the internet".
You can read more on these CAA07 conference slides: ICRIM: Interconnectivity of information resources across a network of federated repositories (pdf download), and the abstract from the CAA07 paper might also provide some useful context:
The HyperRecord system, used by the Capitoline Museums (Rome) and the Bibliotheca Hertziana (Max-Planck Institute, Rome) and developed as Culture2000 project, is a framework for the inter-connectivity of information resources from museums, archives and cultural institutes.
The repositories offer both the usual human interface for research (fulltext, title, etc.) and a smart REST API with a powerful behind-the-scenes direct machine-to-machine facility for querying and retrieving data.
The different information resources use digital object identifiers in the form of URNs (up to now, mostly for museum objects) for identification and direct-access. These allow easy aggregation of contents (data, records, documents) not only inside a repository but also across boundaries using the REST API for serving XML over a plain HTTP connection, in fact creating a loosely coupled network of repositories.
Thanks to Leif Isaksen for putting Dr Werner in contact with me after he saw his paper at CAA07.
These are my notes from the workshop Everything RSS with Jim Spadaccini from Ideum at Museums and the Web, Montreal, April 2008. Some of my notes will seem self-evident to various geeks or non-geeks but I've tried to include most of what was covered.
It's taken me a while to catch up on some of my notes so, especially at this distance, any mistakes are mine, any comments or corrections are welcome, and the comments in [square brackets] below are me. All the conference papers and notes I've blogged have been tagged with 'MW2008'.
The workshop will cover: context, technology, the museum sector, usability and design.
RSS/web feeds - it's easy to add or remove content sources, they can be rich media including audio, images, video, they are easily read or consumed via applications, websites, mobile devices.
The different flavours and definitions of RSS have hindered adoption.
Atom vs RSS - Atom might be better but not as widely adopted. Most mature RSS readers can handle both.
RSS users are more engaged - 2005, Nielsen NetRatings.
Marketers are seeing RSS as alternative to email as email is being overrun by spam and becoming a less efficient marketing tool.
The audience for RSS content is slowly building as it's built into browsers, email (Yahoo, Outlook, Mac), MySpace widget platform.
Feedburner. [I'm sure more was said about it than this - probably 'Feedburner is good/useful' - but it was quite a while ago now.]
Extending RSS: GeoRSS - interoperable geo-coded data; MediaRSS, Creative Commons RSS Module.
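As a rough idea of what the GeoRSS extension looks like in practice, this Python sketch pulls the latitude and longitude out of feed items that carry a `georss:point` element (the GeoRSS-Simple form, 'lat lon' separated by a space). The feed itself is made up for the example and wasn't shown in the workshop.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# An invented feed; only the georss:point element matters here.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example museum objects</title>
    <link>http://example.org/</link>
    <description>Invented sample feed</description>
    <item>
      <title>Neolithic axe</title>
      <link>http://example.org/objects/1</link>
      <georss:point>51.56 -0.08</georss:point>
    </item>
    <item>
      <title>Item with no location</title>
      <link>http://example.org/objects/2</link>
    </item>
  </channel>
</rss>
"""


def geotagged_items(feed_xml):
    """Return (title, lat, lon) for each item that has a georss:point."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        point = item.findtext("georss:point", namespaces=GEORSS_NS)
        if point is None:
            continue  # items without a location are simply skipped
        lat, lon = (float(part) for part in point.split())
        results.append((item.findtext("title"), lat, lon))
    return results


for title, lat, lon in geotagged_items(SAMPLE_FEED):
    print(title, lat, lon)
```

Because the location lives in its own namespace, readers that don't understand GeoRSS just ignore it, which is a large part of why extending RSS this way is low-risk.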
Creating RSS feeds on the server-side [a slide of references I failed to get down in time].
You can use free or open source software to generate RSS feeds. MagpieRSS, Feed Editor (Windows, extralabs.net); or free Web Services to create or extend RSS feeds.
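For a sense of how little is involved in generating a basic feed on the server side, here's a minimal Python sketch using only the standard library. It isn't one of the tools mentioned in the workshop, and the feed title, URLs and items are all invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Build a bare-bones RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["published"])
    return ET.tostring(rss, encoding="unicode")


# Invented example content.
EVENTS = [
    {
        "title": "Late opening: behind the scenes",
        "link": "http://example.org/events/late-opening",
        "published": datetime(2008, 5, 15, 9, 0, tzinfo=timezone.utc),
    },
]

feed_xml = build_rss(
    "Example museum events",
    "http://example.org/events",
    "Invented sample feed of events",
    EVENTS,
)
print(feed_xml)
```

In practice the list of items would come from a database or content management system rather than a hard-coded list.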
There was an activity where we broke into groups to review different RSS applications, including Runstream (create own RSS feed from static content) and xFruits (convert RSS into different platforms).
Others included rssfeedssubmit.com, aiderss.com, rssmixer.com (prototype by Ideum), rsscalendar.com and feedshow.com (OPML generator).
OPML - exchange lists of web feeds between aggregators. e.g. museumblogs site.
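An OPML file is essentially a list of `outline` elements, each pointing at a feed. The Python sketch below writes and reads such a list using the standard library; the blog names and feed URLs are invented, and it only handles a flat list rather than nested folders.

```python
import xml.etree.ElementTree as ET


def build_opml(list_title, feeds):
    """Build a flat OPML subscription list from (name, feed_url) pairs."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = list_title
    body = ET.SubElement(opml, "body")
    for name, feed_url in feeds:
        ET.SubElement(body, "outline", {"type": "rss", "text": name, "xmlUrl": feed_url})
    return ET.tostring(opml, encoding="unicode")


def read_opml(opml_xml):
    """Return (name, feed_url) pairs for every outline that has an xmlUrl."""
    root = ET.fromstring(opml_xml)
    return [
        (outline.get("text"), outline.get("xmlUrl"))
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]


# Invented example feeds.
FEEDS = [
    ("Example museum blog", "http://example.org/blog/feed"),
    ("Example collections news", "http://example.org/collections/feed"),
]

opml_xml = build_opml("Museum blogs", FEEDS)
print(read_opml(opml_xml))
```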
RSSmixer - good for widgets and stats, when live to public. [It looks like it's live now.]
RSS Micro - RSS feed search engine, you can also submit your feed there. Also feedcamp.
Ideas for using RSS:
Use meetup and upcoming for promoting events. Have links back to your events pages and listings.
Link to other museums - it helps everyone's technorati/page ranking.
There was discussion of RSSmixer's conceptual data model. Running on Amazon EC2. [with screenshot]. More recent articles are in front end database, older ones in backend database.
RSS is going to move more to a rich media platform, so interest in mixing and filtering down feeds will grow, create personalisation.
Final thoughts - RSS is still emergent. It won't have a definitive breakthrough but it will eventually become mainstream. It will be used along with email marketing as a tool to reach visitors/customers. RSS web services will continue to expand.
Regular RSS users, who have actively subscribed, are an important constituency. Feeds will be more frequently offered on websites, looking beyond blogs and podcasts.
RSS can help you reach new audiences and cement relationships with existing visitors. You can work with partners to create 'mixed' feeds to foster deeper connections with visitors.
Use RSS for multiple points of dissemination - not just RSS. [At this stage I really have no idea what I meant by this but I'm sure whatever Jim said made sense.]
[I had a question about tips for educating existing visitors about RSS. I'd written a blog post about RSS and how to subscribe, which helped, but that's still only reaching a tiny part of potential audience. Could do a widget to demonstrate it.
This was also one of the workshops or talks that made me realise we are so out of the loop with up-to-date programming models like deployment methods. I guess we're so busy all the time it's difficult to keep up with things, and we don't have the spare resources to test new things out as they come along.]
Wednesday, 14 May 2008
[Update: Migratr downloads all your files to the desktop, with your metadata in an XML file, so it's a great way to backup your content if you're feeling a bit nervous about the sustainability of the online services you use. If it's saved your bacon, consider making a donation.]
This is just a quick post to recommend a nice piece of software: "Migratr is a desktop application which moves photos between popular photo sharing services. Migratr will also migrate your metadata, including the titles, tags, descriptions and album organization."
I was using it to migrate stuff from random Flickr accounts people had created at work in bursts of enthusiasm to our main Museum of London Flickr account, but it also works for 23HQ, Picasa, SmugMug and several other photo sites.
The only hassles were that it concatenated the tags (e.g. "Museum of London" became "museumoflondon") and didn't get the set descriptions, but overall it's a nifty utility - and it's free (though you can make a donation). [Update: Alex, the developer, has pointed out that the API sends the tags space delimited, so his app can't tell the difference.]
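I don't know exactly what the API returned to Migratr, so the following is only a general Python illustration of why a space-delimited tag list loses the boundaries of multi-word tags, and how quoting keeps them. The tag values are just examples.

```python
import shlex


def join_tags(tags):
    """Join tags with single spaces, as a space-delimited API might."""
    return " ".join(tags)


def split_tags(tag_string):
    """Split on whitespace: the only option when nothing is quoted."""
    return tag_string.split()


original = ["Museum of London", "Roman"]

# Plain space-delimited round trip: the multi-word tag falls apart.
print(split_tags(join_tags(original)))

# Quoting multi-word tags before joining preserves the boundaries.
quoted = shlex.join(original)
print(shlex.split(quoted))
```

Once the tags have been flattened into one unquoted string, no client can reliably recover which words belonged together, which matches the developer's explanation.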
And as the developer says, the availability of free libraries (and the joys of APIs) cut down development time and made the whole thing much more possible. He quotes Newton's, "If I have seen further it is by standing on the shoulders of giants" and I think that's beautifully apt.
Tuesday, 13 May 2008
Bootstrapping a Niche Social Network poses the question, "How do you bootstrap your social site if you're targeting a group that doesn't yet use software (or doesn't seem interested in using software)? While software designers can often see how useful their tool can be, normal users aren't so prescient. How do you get them to see the value in your software?", and provides some answers:
People don't want to be good at software. They want to be good at fun things like acting, writing, and ultimate frisbee.
Once you identify the areas where the software can improve the theatre folks life, you’ll have a much easier time convincing them to give it a shot. So in their mind they won’t be using "social network software", they’ll be using a tool to help them be a better theatre group.
This is an unfortunate side-effect of the social networking craze. We have new words that we're using to communicate among those of us who design the software, but for the vast majority of folks who will actually use the software, the terms don't mean very much. So while you may understand what I mean by "niche social network", the people actually in the niche social network think of themselves as performers, actors, or what-have-you.
See also: Social Media for Social Change Behind the Nonprofit Firewall (and the discussion in the comments).
The issues are a bit different for social networks - if you get it right then your users are your content creators, while you'll probably need others outside of IT to contribute if you want blogs or videos or photos about your organisation.
Finding real world metaphors also seems to help - Andy Powell described the Ning site for the Eduserv Foundation Symposium 2008 as "a virtual delegate list - a place where people could find out who is coming on the day (physically or virtually) and what their interests are". This description has made a lot of sense to people I've discussed it with - everyone knows what a conference delegate list looks like, and everyone has probably also wondered how on earth they'll find the people who sound interesting. A social network meets a need in that context.
Thursday, 8 May 2008
These are my notes from the third paper 'The API as Curator' by Aaron Straup Cope in the Theoretical Frameworks session chaired by Darren Peacock at Museums and the Web 2008. The slides for The API as Curator are online.
I've also included below some further notes on why, how, whether museums should hire programmers, as this was a big meme at the conference and Aaron's paper made a compelling case for geeks in art, arty geeks and geeky artists.
You might have noticed it's taken me a while to catch up on some of my notes from this conference, and the longer I leave it the harder it gets. As always, any mistakes are mine, any comments or corrections are welcome, and the comments in [square brackets] below are mine.
The other session papers were Object-centred democracies: contradictions, challenges and opportunities by Fiona Cameron and Who has the responsibility for saying what we see? mashing up Museum and Visitor voices, on-site and online by Peter Samis; all the conference papers and notes I've blogged have been tagged with 'MW2008'.
Aaron Cope: The API as curator.
The paper started with some quotes as 'mood music' for the paper.
Institutions are opening up, giving back to the community and watching what people build.
It's about (computer stuff as) plumbing, about making plumbing not scary. If you're talking about the web, sooner or later you're going to need to talk about computer programming.
Programmers need to be more than just an accessory - they should be in-house and full-time and a priority. It boils down to money. You don't all need to be computer scientists, but it should be part of it so that you can build things.
Experts and consumers - there's a long tradition of collaboration in the art community, for example printmaking. Printers know about all the minutiae (the technical details) but/so the artists don't have to.
Teach computer stuff/programming so that people in the arts world are not simply consumers.
Threadless (the t-shirt site) as an example. Anyone can submit a design, they're voted on in forum, then the top designs are printed. It makes lots of money. It's printmaking by any other name. Is it art?
"Synthetic performances" Joseph Beuys in Second Life...
It's nice not to be beholden to nerds... [I guess a lot of people think that about their IT department. Poor us. We all come in peace!]
Pure programming and the "acid bath of the internet".
Interestingness on Flickr - a programmer works on it, but it's not a product - (it's an expression of their ideas). Programming is not a disposable thing, it's not as simple as a toaster. But is it art? [Yes! well, it can be sometimes, if a language spoken well and a concept executed elegantly can be art.]
API and Artspeak - Aaron's example (a bit on slide 15 and some general mappy goodness).
Build on top of APIs. Open up new ways to explore collection. Let users map their path around your museum to see the objects they want to see.
Their experience at Flickr is that people will build those things (if you make it possible). [Yay! So let's make it possible.]
There's always space for collaboration.
APIs as the nubby bits on Lego. [Lego is the metaphor of the conference!]
Flickr Places - gazetteer browsing.
[Good image on slide 22]: interpretation vs intent, awesome (x) vs time (y). You need programmers on staff, you need to pay them [please], you don't want them to be transient if you want to increase smoothness of graph between steps of awesomeness. Go for the smallest possible release cycles. Small steps towards awesome.
Questions for the Theoretical Frameworks session
Qu from the Science Museum Minnesota: how to hire programmers in museums - how to attract them? when salaries are crap.
Aaron - teach it in schools and go to computer science departments. People do stuff for more than just money.
Qu on archiving UGC and other stuff generated in these web 2.0 projects... Peter Samis - WordPress archives things. [So just use the tools that already exist]
Aaron - build it and they will come. Also, redefine programming.
There's a good summary of this session by Nate at MW2008 - Theoretical Frameworks.
And here's a tragically excited dump from my mind written at the time: "Yes to all that! Now how do we fund it, and convince funders that big top-down projects are less likely to work than incremental and iterative builds? Further, what if programmers and curators and educators had time to explore, collaborate, push each other in a creative space? If you look at the total spend on agencies and external contractors, it must be possible to make a case for funding in-house programmers - but silos of project-based funding make it difficult to consolidate those costs, at least in the UK."
Continuing the discussion about the benefits of an in-house developer team, post-Museums and the Web, Bryan Kennedy wrote a guest post on Museum 2.0 about Museums and the Web in Montreal that touched on the issue:
More museums should be building these programming skills in internal teams that grow expertise from project to project. Far too many museums small and large rely on outside companies for almost all of their technical development on the web. By and large the most innovation at Museums and the Web came from teams of people who have built expertise into the core operations of their institution.
I fundamentally believe that at least in the museum world there isn't much danger of the technology folks unseating the curators of the world from their positions of power. I'm more interested in building skilled teams within museums so that the intelligent content people aren't beholden to external media companies but rather their internal programmers who feel like they are part of the team and understand the overall mission of the museum as well as how to pull UTF-8 data out of a MySQL database.
I left the following comment at the time, and I'm being lazy* and pasting here to save re-writing my thoughts:
Good round-up! The point about having permanent in-house developers is really important and I was glad to see it discussed so much at MW2008.
It's particularly on my mind at the moment because yesterday I gave a presentation (on publishing from collections databases and the possibilities of repositories or feeds of data) to a group mostly comprised of collections managers, and I was asked afterwards if this public accessibility meant "the death of the curator". I've gathered the impression that some curators think IT projects impose their grand visions of the new world, plunder their data, and leave the curators feeling slightly shell-shocked and unloved.
One way to engage with curatorial teams (and educators and marketers and whoever) and work around these fears and valuable critiques is to have permanent programmers on staff who demonstrably value and respect museum expertise and collections just as much as curators, and who are willing to respond to the concerns raised during digital projects.
There's a really good discussion in the comments on Bryan's post. I'm sure this is only a sample of the discussion, but it's a bit difficult to track down across the blogosphere/twitterverse/whatever and I want to get this posted some time this century.
* But good programmers are lazy, right?
Notes from 'Who has the responsibility for saying what we see?' in the 'Theoretical Frameworks' session, MW2008
These are my notes from the second paper, 'Who has the responsibility for saying what we see? mashing up Museum and Visitor voices, on-site and online' by Peter Samis in the Theoretical Frameworks session chaired by Darren Peacock at Museums and the Web 2008.
The other session papers were Object-centred democracies: contradictions, challenges and opportunities by Fiona Cameron and The API as Curator by Aaron Straup Cope; all the conference papers and notes I've blogged have been tagged with 'MW2008'.
It's taken me a while to catch up on some of my notes - real life has a way of demanding attention sometimes. Any mistakes are mine, any comments or corrections are welcome, and the comments in [square brackets] below are mine.
Peter Samis spoke about the work of SFMOMA with Olafur Eliasson. His slides are here.
How our perception changes how we see the world...
"Objecthood doesn’t have a place in the world if there’s not an individual person making use of that object… I of course don’t think my work is about my work. I think my work is about you." (Olafur Eliasson, 2007)
Samis gave an overview of the exhibitions "Take your time: Olafur Eliasson" and "Your tempo" presented at SFMOMA.
The "your" in the titles demands a proactive and subjective approach; stepping into installations rather than looking at paintings. The viewer is integral to the fulfilment of a work's potential.
Do these rules apply to all [museum] objects? These are the questions...
They aimed to encourage visitors in contemplation of their own experience.
Visitors who came to blog viewed 75% of pages. Comments were left by 2% of blog visitors.
There was a greater interest in seeing how others responded than in contributing to the conversation. Comments were a 'mixed bag'.
The comments helped with understanding visitor motivations in narratives... there's a visual 'Velcro effect' - some artworks stay with people - the more visceral the experience of various artworks, the greater the corresponding number of comments.
[Though I wondered if it's an unproblematic and direct relationship? People might have a relationship with the art work that doesn't drive them to comment; that requires more reflection to formulate a response; or that might occur at an emotional rather than intellectual level.]
Visitors also take opportunity to critique the exhibition/objects and curatorial choices when asked to comment.
What are the criteria of values for comments? By whose standards? And who within the institution reads the blog?
How do you know if you've succeeded? Depends on goals.
"We opened the door to let visitors in... then we left the room. They were the only ones left in the room." - the museum opens up to the public then steps out of the dialogue. [Slide 20]
[I have quoted this in conversation so many times since the conference. I think it's an astute and powerful summary of the unintended effect of participatory websites that aren't integrated into the museum's working practices. We say we want to know what our visitors think, and then we walk away while they're still talking. This image is great because it's so visceral - everyone realises how rude that is.]
Typology/examples of museum blogs over time... based on whether they open to comments, and whether they act like docents/visitors assistants and have conversations with the public in front of the artworks.
If we really engage with our visitors, will we release the "pent up comments"?
A NY Times migraine blog post had 294 reflective, articulate, considered, impassioned comments on the first day.
[What are your audiences' pent up questions? How do you find the right questions? Is it as simple as just asking our audiences, and even if it isn't, isn't that the easiest place to start? If we can crack the art of asking the right questions to elicit responses, we're in a better position.]
Nina Simon's hierarchy of social participation. Museums need to participate to get to higher levels of co-creative, collaborative process. "Community producer" - enlist others, get
Even staff should want to return to your blogs and learn from them.
[Who are the comments that people leave addressed to? Do we tell them or do we just expect them to comment into empty space? Is that part of the reason for low participation rates? What's the relationship between participation and engagement? Also, just because people aren't participating in the forum you provide doesn't mean they're not participating somewhere else, or engaging with it in other forums, conversations in pubs, etc. Not everything is captured online, even if the seed is online and in your institution.]
Wednesday, 7 May 2008
Just today I asked if anyone used drop-down menus anymore, and here Amazon have gone and launched a new design that uses them.
I don't know how many people would notice, but I like that they've provided a link (in the top right-hand corner with the text, 'We've had a redesign. Take a look') to 'A Quick Tour of Our Redesign'. The page highlights some of the changes/new features and provides answers to questions including 'Why did you change the site?', 'How did you decide on this design?' and 'What's different?'.
I'm guessing they've done their research and found that kind of transparency helps people deal with the changes - I was hoping to blog about our web redesign process, and I think this shows it's worth doing. I wonder how many people notice the 'redesign' link and are interested enough to click on it.
It's easy to get lost on Flickr. You click from here to there, this to that, then suddenly you look up and notice you've lost hours. Allow visitors to cut their own path through the place and they'll curate their own experiences. The idea that every Flickr visitor has an entirely different view of its content is both unsettling, because you can't control it, and liberating, because you’ve given control away. Embrace the idea that the site map might look more like a spider web than a hierarchy. There are natural links in content created by many, many different people. Everyone who uses a site like Flickr has an entirely different picture of it, so the question becomes, what can you do to suggest the next step in the display you design?
I've been thinking about something like this for a while, though the example I've used is Wikipedia. I have friends who've had to ban themselves from Wikipedia because they literally lose hours there after starting with one innocent question, then clicking onto an interesting link, then onto another...
That ability to lose yourself as you click from one interesting thing to another is exactly what I want for our museum sites: our visitor experience should be as seductive and serendipitous as browsing Wikipedia or Flickr.
And hey, if we look at the links visitors are making between our content, we might even learn something new about our content ourselves.
Saturday, 3 May 2008
I've uploaded my presentation slides from a talk for the UK MultiMimsy Users group in Docklands last month to MultiMimsy database extractions and the possibilities for OAI-based collections repositories at the Museum of London.
The first part discusses how to get from a set of data in a collections management system to a final published website, looking at the design process and technical considerations. Willoughby's use of Oracle on the back-end means that any ODBC-compliant database can query the underlying database and extract collections data.
The paper then looks at some of the possibilities for the Museum of London's OAI-PMH repository. We've implemented an OAI repository for the People's Network Discover Service (PNDS) for Exploring 20th Century London (which also means we're set to get records into Europeana), but I hope that we can use the repository in lots of other ways, including the possibility of using our repository to serve data for federated searches.
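For anyone who hasn't seen OAI-PMH before, the Python sketch below shows the general shape of a harvest: build a `ListRecords` request URL, then pull identifiers and Dublin Core titles out of the XML that comes back. It isn't our actual repository code; the base URL is a placeholder, and the sample response is a trimmed, invented one (a real response also carries elements such as `responseDate` and `request`).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Trimmed, invented response for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2008-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example object</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(response_xml):
    """Extract the identifier and first dc:title from each record."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append({"identifier": identifier, "title": title})
    return records


# http://example.org/oai is a placeholder, not a real endpoint.
print(list_records_url("http://example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

The same harvested records could feed a local index for federated-style searching, which is one reason the repository and federated search approaches don't have to be an either/or choice.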
There's currently some discussion internationally in the cultural heritage sector about repositories vs federated search, but I'm not sure it's an either/or choice. The reasons each is used are often to do with political or funding factors instead of the base technology, but either method, or both, could be used internally or externally depending on the requirements of the project and institution.
I can go into more detail about the scripts we use to extract data from MultiMimsy or send sample scripts if people are interested. They might be a good way to get started if you haven't extracted data from MultiMimsy before but they won't generally be directly relevant to your data structures as the use of MultiMimsy can vary so widely between types of museums, collections and projects.