Tuesday, 29 July 2008

One step closer to intelligent searching?

The BBC have a story on a new search engine, 'Search site aims to rival Google':
Called Cuil [pronounced 'cool'], from the Gaelic for knowledge and hazel, its founders claim it does a better and more comprehensive job of indexing information online.

The technology it uses to index the web can understand the context surrounding each page and the concepts driving search requests, say the founders.

But analysts believe the new search engine, like many others, will struggle to match and defeat Google.

...

Instead of just looking at the number and quality of links to and from a webpage as Google's technology does, Cuil attempts to understand more about the information on a page and the terms people use to search. Results are displayed in a magazine format rather than a list.
From the Cuil FAQ:
So Cuil searches the Web for pages with your keywords and then we analyze the rest of the text on those pages. This tells us that the same word has several different meanings in different contexts. Are you looking for jaguar the cat, the car or the operating system?

We sort out all those different contexts so that you don't have to waste time rephrasing your query when you get the wrong result.

Different ideas are separated into tabs; we add images and roll-over definitions for each page and then make suggestions as to how you might refine your search. We use columns so you can see more results on one page.
They also provide 'drill-downs' on the results page.
Cuil will direct you to this additional information. By looking at these suggestions, you may discover search data, concepts, or related areas of interest that you hadn’t expected. This is particularly useful when you are researching a subject you don't know much about and aren't sure how to compose the "right" query to find the information you need.
I haven't used it enough to work out exactly how it differentiates concepts (tabs) and 'additional information' (drill-downs/categories).
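
To make the idea of sorting results into contexts a bit more concrete, here's a toy sketch in Python. It is emphatically not how Cuil works (they don't say how), and the cue words, the SENSE_CUES table and the group_by_sense function are all invented for illustration. It simply counts how many hand-picked cue words for each sense of 'jaguar' appear in a result snippet and files the snippet under the best match, or under 'other' if nothing matches.

```python
from collections import defaultdict

# Hypothetical cue words for each sense of "jaguar"; invented for illustration,
# not taken from Cuil or any real search engine.
SENSE_CUES = {
    "cat": {"animal", "species", "prey", "rainforest", "wild"},
    "car": {"engine", "sedan", "dealer", "driving", "model"},
    "operating system": {"mac", "apple", "os", "software", "install"},
}


def tokenize(text):
    """Lower-case the text and strip basic punctuation from each word."""
    return {word.strip(".,!?'\"()").lower() for word in text.split()}


def group_by_sense(snippets, sense_cues=SENSE_CUES):
    """Group snippets into 'tabs' by the sense whose cue words overlap most."""
    tabs = defaultdict(list)
    for snippet in snippets:
        words = tokenize(snippet)
        # Score each sense by the number of its cue words found in the snippet.
        scores = {sense: len(words & cues) for sense, cues in sense_cues.items()}
        best = max(scores, key=scores.get)
        # Snippets with no cue words at all go into a catch-all tab.
        label = best if scores[best] > 0 else "other"
        tabs[label].append(snippet)
    return dict(tabs)


if __name__ == "__main__":
    results = [
        "The jaguar is a wild animal that stalks its prey.",
        "Find a Jaguar dealer and book a test driving session.",
        "Install Jaguar, the Mac OS X release from Apple.",
    ]
    for tab, items in group_by_sense(results).items():
        print(tab, "->", items)
```

A real engine would presumably learn these contexts from the text of the pages rather than from a hand-written list, but the grouping step would look broadly similar.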

It does a good job on something like the Cutty Sark. Under 'Explore by Category' it offered:
  • Buildings And Structures In Greenwich
  • Sailboat Names
  • Museums In London
  • Neighbourhoods Of Greenwich
  • School Ships
It picked up search results for Cutty Sark whisky and news of the Cutty Sark fire but they weren't reflected in the categories, and the search term didn't trigger the tabs. The tabs kick in when you search for something like 'orange'.

It didn't do as well with 'samian ware' - the categories picked up all sorts of places and peoples (and, randomly, 'American Films'), but while the search results all say that it's 'a kind of bright red Roman pottery', that's not reflected in the categories. Fair enough, there may not be enough information easily available online for 'Types of Roman pottery' to register as a category.

Incidentally, most of the results listed for 'samian ware' are just recycled entries from Wikipedia. It's a shame the results aren't filtered to remove entries that have just duplicated Wikipedia text. The FAQ says they don't index duplicate content; I guess the overall site or page is just different enough to be retained.
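
For what it's worth, one common way to catch 'mostly copied' text is to compare overlapping word sequences (shingles) between two pages and measure how much they share (Jaccard similarity). The sketch below is a minimal version of that general technique, not anything Cuil has said it does; the function names, the shingle size of 3 and the 0.5 threshold are my own assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(candidate, reference, threshold=0.5, size=3):
    """Flag `candidate` if it shares at least `threshold` of its shingles with `reference`."""
    score = jaccard(shingles(candidate, size), shingles(reference, size))
    return score >= threshold


if __name__ == "__main__":
    reference = "Samian ware is a kind of bright red Roman pottery"
    copied = "Samian ware is a kind of bright red Roman pottery found widely"
    unrelated = "The Cutty Sark is a clipper ship in Greenwich"
    print(is_near_duplicate(copied, reference))
    print(is_near_duplicate(unrelated, reference))
```

A page that wraps copied text in enough of its own navigation and boilerplate would score lower on a whole-page comparison like this, which may be why such pages survive a duplicate filter.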

It might take a while for museum content to appear in the most useful ways, but it looks like it might be a useful search engine for niche content. From the FAQ again:
We've found that a lot of Web pages have been designed with a small audience in mind—perhaps they are blogs or academic papers with specific interests or pages with family photos. We think that even though these pages aren't necessarily for a wide audience, they contain content that one day you might need.

Our job is to index all these pages and examine their content for relevancy to your search. If they contain information you need, then they should be available to you.
It's all sounding a bit semantic web-ish (and quite a bit 'reacting to Google-ish') and I'll use it for a while to see how it compares to Google. The webmaster information doesn't give any indication of how you could mark up content so the relationships between terms in different contexts are clear, but I guess nice semantic markup would help.

Refreshingly, it doesn't retain search info - privacy is one of their big differentiators from Google.

3 comments:

  1. We were all very amused today to find that it cannot even find itself!

    A cuil search for cuil.

    Heh heh

  2. I'm not sure how much of an alternative this is to Google, and whether it really represents a further step towards 'intelligent search'. It's not doing anything much different to Google, it just has a different approach to the UI.

    Google already employs all sorts of [artificial] 'intelligence', eg recognising place names (and displaying maps), news stories, video titles, and so on.

    Google even already does the kind of disambiguation that cuil has modelled itself on. For instance their blog post on personalised search reveals how over time that can serve you different results for 'dolphin' depending on whether you're interested in the animal or a football team.

    The difference is that Google doesn't require the user to have to make a decision about 'which type of dolphin did I mean' - instead they try to make an intelligent guess. 'Don't make me think!'.

    Additionally, Cuil is shooting itself in its foot by not storing search log data, as that is THE KEY information that any search provider needs in order to be able to analyse and improve the service. All institutions (including yours) should be doing this.

    P.S. Your link to the privacy page has a rogue space at the end of it (encoded as %20) which stops it working.

  3. Frankie the contrarian! Sometimes people *like* to think, and Google isn't good enough for people not to have to think.

    How many times have you answered a question for someone with a Google query, and asked them why they didn't try Google first, only to hear that they weren't able to deconstruct their query into the magic phrase that would give them the right results? I'd like to say the difference is that I'm smarter than those other searchers, but it's more likely that I've learnt to think a bit like a search engine.

    Google doesn't always find the results, and link popularity can be a bit too close to 'lowest common denominator' for me, so it's important to have alternatives. And I love the possibility of something like the drill-downs for people who are searching for content in areas where they aren't domain experts, or would benefit from serendipity: "By looking at these suggestions, you may discover search data, concepts, or related areas of interest that you hadn’t expected. This is particularly useful when you are researching a subject you don't know much about and aren't sure how to compose the "right" query to find the information you need."

    Cuil seems to be able to find itself now, so it's obviously learning quickly. (Except that it doesn't find this blog with a search for my name, pah).

    And the privacy thing is important (thanks to Frankie for the link):
    http://www.oblomovka.com/wp/2008/07/30/following-the-referers-to-the-edge/
    "More sceptically, I do marvel how much we currently depend on the fair-weather compliance of others to preserve our privacy and our liberty — both from corporations and from individuals."
    ...
    "Business practice now determines later what courts and intrusive governments imagine is "reasonable" to obtain."
