Saturday, 9 May 2009

RDFa, SearchMonkey - tech talks at Open Hack London

While today's Open Hack London event is mostly about the 24-hour hackathon, I signed up just for the Tech Talks because I couldn't afford to miss a whole weekend's study in the fortnight before my exams (stupid exams). I went to the sessions on 'Guardian Data Store and APIs', 'RDFa SearchMonkey', Arduino, 'Hacking with PHP', 'BBC Backstage', Dopplr's 'mashups made of messages' and lightning talks including 'SPARQL and semantic web' stuff you can do now.

I'm putting my rough and ready notes online so that those who couldn't make it can still get some of the benefits. Apologies for any mishearings or mistakes in transcription – leave me a comment with any questions or clarifications.

One of the reasons I was going was to push my thinking about the best ways to provide API-like access to museum information and collections, so my notes will reflect that but I try to generalise where I can. And if you have thoughts on what you'd like cultural heritage institutions to do for developers, let us know! (For background, here's a lightning talk I did at another hack event on happy museums + happy developers = happy punters).

RDFa - now everyone can have an API.
Mark Birbeck

Going to cover some basic mark-up, and talk about why RDFa is a good thing. [The slides would be useful for the syntax examples, I'll update if they go online.]

RDFa is a new syntax from W3C - a way of embedding metadata (RDF) in HTML documents using attributes.

e.g. <span property="dc:title"> - value of property is the text inside the span.
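To make the idea concrete, here is a minimal sketch in Python of how a consumer might pull property/value pairs out of a page. It is not a conforming RDFa parser: it only reads the `property` attribute and the text inside the element, ignores prefixes, `about`, `content` and everything else in the spec, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []      # finished (property, text) pairs
        self._stack = []     # one entry per open element: property name or None
        self._buffers = []   # text buffers for open elements that have a property

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        prop = dict(attrs).get("property")
        self._stack.append(prop)
        if prop is not None:
            self._buffers.append([])

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for buffer in self._buffers:
            buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        prop = self._stack.pop()
        if prop is not None:
            text = "".join(self._buffers.pop()).strip()
            self.pairs.append((prop, text))


def extract_properties(html):
    """Hypothetical helper: return the list of (property, text) pairs in html."""
    parser = PropertyExtractor()
    parser.feed(html)
    return parser.pairs


sample = ('<p>Notes on <span property="dc:title">Open Hack London</span> '
          'by <span property="dc:creator">A. Blogger</span></p>')
print(extract_properties(sample))
```

The sample page and the `dc:creator` value are invented; the point is only that the value of the property is the text inside the element.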

Because it's inline, one document carries both the metadata and the presentation HTML; you don't need to point to a separate document as the source of the metadata.

One big advance is that you can provide metadata for other items, e.g. images, so you can attach licence info to the image rather than to the page it's in – e.g. <img src="" rel="license" resource="[creative commons licence]">
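As a rough illustration of the image example, the sketch below reads an `img` element as a statement about the image itself: the `src` is the thing being described, `rel` names the relationship and `resource` is what it points to. This is my simplified reading rather than a full RDFa processor; the class name, file name and licence URL are illustrative, and the code uses the `license` spelling of the link type.

```python
from html.parser import HTMLParser


class ImageLicenceExtractor(HTMLParser):
    """Collect (image, relationship, target) triples from img elements."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        src = attributes.get("src")
        rel = attributes.get("rel")
        resource = attributes.get("resource")
        # Only record images that carry all three pieces of the statement.
        if tag == "img" and src and rel and resource:
            self.triples.append((src, rel, resource))


def image_triples(html):
    """Hypothetical helper: return the triples found on img elements in html."""
    parser = ImageLicenceExtractor()
    parser.feed(html)
    return parser.triples


page = ('<p>My photo: <img src="photo.jpg" rel="license" '
        'resource="http://creativecommons.org/licenses/by/3.0/"></p>')
print(image_triples(page))
```

The licence is attached to `photo.jpg`, not to the page that happens to display it.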

Putting RDFa into web pages means you've now got a feed (the web page is the RSS feed), and a simple static web page can become an API that can be consumed in the same way as stuff from a big expensive system. 'Growing adoption'.

Government department Central Office of Information [?] is quite big on RDFa, have a number of projects with it. [I'd come across the UK Civil Service Job Service API while looking for examples for work presentations on APIs.]

RDFa allows for flexible publishing options. If you're already publishing HTML, you can add RDFa mark-up then get flexible publishing models - different departments can keep publishing data in their own way, a central website can go and request from each of them and create its own database of e.g. jobs. Decentralised way of approaching data distribution.

Can be consumed by: smarter browsers; client-side AJAX; other servers such as SearchMonkey.

He's interested in where browsers can do something with it - either enhanced browsers that could e.g. store contact info from a page in your address book, or JavaScript libraries that can parse the page and do something with it. [screenshot of jobs data in SearchMonkey with enhanced search results]

RDFa might be going into Drupal core.

Example of putting an ISBN in RDFa in a page, then a parser can go through the page, pull out the triples [some explanation of them as mini db?], pull back more info about the book from other APIs e.g. Amazon - full title, thumbnail of cover. e.g. pipes.
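Here is how I understand the 'triples as a mini db' idea, as a small Python sketch. Triples are (subject, predicate, object) tuples, a pattern match acts as the query, and any ISBN found is looked up to add more triples. The lookup is a hard-coded dictionary standing in for a remote book API (no real Amazon call is made), and the `ex:` predicates, the ISBN and the URLs are all invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]


# Hypothetical stand-in for an external book API, keyed by ISBN.
FAKE_BOOK_API: Dict[str, Dict[str, str]] = {
    "9780000000002": {
        "title": "An Example Book",
        "cover": "http://example.org/covers/9780000000002.jpg",
    },
}


def enrich(triples: List[Triple],
           lookup: Dict[str, Dict[str, str]] = FAKE_BOOK_API) -> List[Triple]:
    """Return the page's triples plus title and cover triples for known ISBNs."""
    extra: List[Triple] = []
    for subject, _, isbn in match(triples, predicate="ex:isbn"):
        record = lookup.get(isbn)
        if record is None:
            continue  # unknown ISBN: leave the page's data as it is
        extra.append((subject, "dc:title", record["title"]))
        extra.append((subject, "ex:cover", record["cover"]))
    return triples + extra


page_triples = [
    ("http://example.org/post", "dc:title", "My reading list"),
    ("http://example.org/post#book", "ex:isbn", "9780000000002"),
]
for triple in enrich(page_triples):
    print(triple)
```

The page author only had to mark up the ISBN; the title and cover are added by whoever processes the page.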

Example of FOAF - twitter account marked up in page, can pull in tweets. Could presumably pull in newer services as more things were added, without having to re-mark-up all the pages.

Example of chemist writing a blog who mentions a chemical compound in blog post, a processor can go off and retrieve more info - e.g. add icon for mouseover info - image of molecule, or link to more info.

Next plan is to link with BOSS. Can get back RDFa from search results - augment search results with RDFa from the original page.

SearchMonkey (what it is and what you can do with it)
Neil Crosby (European frontend architect for search at Yahoo).

SearchMonkey is (one of) Yahoo's open search platforms (along with BOSS). Uses structured data to enhance search results. You get to change stuff on Yahoo search results page.

SearchMonkey lets you: style results for certain URL patterns; brand those results; make the results more useful for users.

[examples of sites that have done it to see how their results look in Yahoo? I thought he mentioned IMDb but it doesn't look any different - a film search that returns a Wikipedia result, OTOH, does.]

Make life better for users - not just what Yahoo thinks results should be, you can say 'actually this is the important info on the page'

Three ways to do it [to change the SERP (search engine results page)]: mark up data in a way that Yahoo knows about - 'just structure your data nicely', e.g. video mark-up; enhance a result directly; make an infobar.

Infobar - doesn't change the result you see immediately on the page, but it opens on the page. Example of an auto-enhanced result - playcrafter. Link to developer start page - how to mark it up, with examples, and what it all means.

User-enhanced result - Facebook profile pages are marked up with microformats - can add as friend, poke, send message, view friends, etc. from the search results page. Can change the title and abstract, add image, favicon, quicklinks, key/value pairs. Create at [link I can't see but is on slides]. Displayed on screen; you fill it out on a template.

Infobar - dropdown in grey bar under results. Can do a lot more, as it's hidden in the infobar and doesn't have to worry people.

Data from: microformats, RDF, XSLT, Yahoo's index, and soon, top tags from delicious.

If no machine data, can write an XSLT. 'isn't that hard'. Lots of documentation on the web.

Examples of things that have been made - a tool that exposes all the metadata known for a page. URL on slide. Can install it on the Yahoo search page. Use location data to make a map - any page on the web with metadata about locations on it - map monkey. Get Qype results for anything you search for.

There's a mailing list (people willing and wanting to answer questions) and a tutorial.


Question: do you need to use a special doctype [for RDFa]?
Answer: added to spec that 'you should use this doctype' but the spec allows for RDFa to be used in situations when you can't change the doctype, e.g. RDFa embedded in a Blogger blog post. Most parsers walk the DOM rather than relying on the doctype.

Jim O'D - excited that SearchMonkey supports XSLT - if have website with correctly marked up tables, could expose those as key/value pairs?
Answer: yes. XSLT fantastic tool for when don't have data marked up - can still get to it.
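To show the kind of transformation being discussed, here is a sketch that turns a correctly marked-up two-column table into key/value pairs. It is written in Python rather than XSLT, so it is a stand-in for the idea and not what you would submit to SearchMonkey; the function name and the sample table are invented.

```python
from html.parser import HTMLParser


class TableKeyValues(HTMLParser):
    """Read two-cell table rows as key/value pairs."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._cells = None  # cells of the row being read, or None outside a row
        self._text = None   # text of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("th", "td") and self._cells is not None:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            # Only rows with exactly two cells are treated as key/value pairs.
            if len(self._cells) == 2:
                key, value = self._cells
                self.pairs[key] = value
            self._cells = None
            self._text = None


def table_to_pairs(html):
    """Hypothetical helper: return a dict of key/value pairs from table rows."""
    parser = TableKeyValues()
    parser.feed(html)
    return parser.pairs


visit_info = ('<table>'
              '<tr><th>Opening hours</th><td>10am-5pm</td></tr>'
              '<tr><th>Admission</th><td>Free</td></tr>'
              '</table>')
print(table_to_pairs(visit_info))
```

For a museum 'plan your visit' page, pairs like these are the sort of thing that could be shown alongside a search result.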

Frankie - question I couldn't hear. About info out to users?
Answer: if you've built a monkey, up to you to tell people about it for the moment. Some monkeys are auto-on e.g. Facebook, Wikipedia... possibly in future, if you've developed a monkey for a site you own, you might be able to turn it auto-on in the results for all users... not sure yet if they'll do it or not.
Frankie: plan that people get monkeys they want, or go through gallery?
Answer: would be fantastic if could work out what people are using them for and suggest ones appropriate to people doing particular kinds of searches, rather than having to go to a gallery.
