So far our fingerprint server identified 23 million unique tracks, from the 650 million fingerprint requests you’ve thrown at it. Who knows how many unique tracks there are out there.. We have a couple of hundred million tracks based on spelling alone – but not all of them are spelt correctly.
They have some interesting issues to deal with in cleaning up their data (i.e. your data, if you're a Last.fm user), especially when 'the most popular spelling is not necessarily the correct one'. And what about bands that change their name (but are essentially the same band) or their line-up (are they still the same band?) - when do you decide to create a new identifier?
They're letting logged-in users vote on potential corrections to an artist name, effectively testing crowdsourcing for metadata corrections as well as for the original data creation. This model could work for museums. Depending on the collection, some museums already receive a lot of corrections when parts of their collections are published online. What would happen if we made that process transparent?