Emergent i18n effects in folksonomies

My series of posts on international information architecture:
  1. Translating taxonomies and categories
  2. Translating categories, translating terms
  3. Translating the Dewey Decimal Classification system
  4. Designing the relationship between content and locales
  5. Emergent i18n effects in folksonomies (this post)
  6. The Maori versus Dewey, and why limiting access can be culturally appropriate.
Folksonomies are taxonomies created by users who add tags to things. Folksonomies are messy and have a lot of problems, but their great merit is that they're scalable and that they use the users' terminology by definition. Failing to match users' terminology is a serious problem with more classic taxonomies created by information architects or librarians.

There is a lot of innovation happening around folksonomies. One interesting area is internationalization. Users enter tags in many languages, but generally the system does not know what language a tag is in. Have a look at this screenshot of a page from Technorati, showing popular tags:

[Screenshot: Technorati tags]

I drew big red lines around them, so you'll probably notice some tags in languages other than English. "Algemeen - Algemein - Algemeines" means "general" in Dutch and German. "Entertainment Entretenimento" is English-Spanish for entertainment. Notice that a misspelling of "Entretenimiento" is also used. But we are talking about languages today, not spelling. "Music - Musique - Música" are English, French and Spanish.

So what's going on here? People are tagging things in many languages. Right now, Technorati displays the various languages mixed together on one page. That's pretty interesting, especially if you're interested in languages like me. But it might also be cool to see only tags in your language, especially if you don't speak English.

How can we do this? One way is to use a dictionary lookup to figure out what language a certain tag is in. It won't be perfect, but this approach could be used to display a page of popular tags in mostly German, mostly English, mostly Hindi and so on. This will reduce the number of tags available to users, but make the tags more relevant to them (because they are in their language). Again, seeing a few non-English tags won't bother you if you speak English, but this is not for speakers of English, the dominant language. For all the people who don't speak English, seeing tags in their language will be invaluable.
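
To make the dictionary idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the tiny word lists, the language codes, and the function names `guess_language` and `popular_tags_for` are made up, and a real system would need full dictionaries and a smarter rule for words that exist in several languages.

```python
# Tiny hypothetical word lists; a real system would load full dictionaries.
WORDLISTS = {
    "en": {"music", "entertainment", "general", "cat"},
    "fr": {"musique", "chat"},
    "es": {"música", "entretenimiento", "general"},
    "nl": {"algemeen", "muziek", "leuk", "konijn", "mooi"},
}


def guess_language(tag):
    """Return a probable language code for a tag, or None if unknown or ambiguous."""
    key = tag.strip().lower()
    matches = [lang for lang, words in WORDLISTS.items() if key in words]
    # Only trust the lookup when exactly one word list contains the tag.
    return matches[0] if len(matches) == 1 else None


def popular_tags_for(lang, tag_counts, limit=10):
    """Most-used tags whose guessed language is `lang`, most popular first."""
    ranked = sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [tag for tag, _ in ranked if guess_language(tag) == lang][:limit]


# Made-up usage counts for a handful of tags.
counts = {"music": 50, "musique": 30, "chat": 12, "leuk": 5}
print(popular_tags_for("fr", counts))  # ['musique', 'chat']
```

Note that "general" sits in both the English and the Spanish list here, so the sketch refuses to guess for it. How to treat such tags (show them everywhere, or nowhere) is a design choice the sketch leaves open.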
If you do dictionary lookup only with popular tags, it shouldn't be too resource-expensive: a tag only has to be checked against the dictionary once and assigned a probable language. Another way is to look at the language of the source (RSS feed, ...) and assume the tag is in the same language. That's tricky, and I'm not sure it will work. There might be other algorithms as well. If I do a Google search for "Música", it knows this isn't English, because it asks me if I want to "Search for English results only", so I assume there is some algorithm at work there (unless they also use the dictionary approach).

Later: I realized something else. Displaying tags mostly in some language, as opposed to exclusively in that language, is not necessarily a bug; it might be a feature. Many user populations around the world incorporate words from multiple languages in their vocabulary. The language namespaces I am talking about might not map perfectly to a specific language, but include words in other languages, slang and such, and in this way be a much better representation of the real language of a certain user population than if we were to use just one language. So it's not so much about language namespaces as about user population namespaces. Language is just a starting point and might be an easy way to group user populations.

As an aside, I think the real innovation with folksonomies will come from creating algorithms. It's all about scalability. The way Google's superior search algorithm made them the number one search engine, someone will invent superior tagging algorithms, and this may make them the number one tagging engine.

Back to languages. The most interesting aspect of the screenshot above isn't that there are tags in other languages; it's that the tags are the same in other languages. The tags in French, Spanish and so on have their English translations right on the same page. This suggests that people tag things similarly in different languages.
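
One rough way to check that observation on real data would be to count which tags end up on the same items. The Python sketch below is only an illustration under assumptions: the sample posts are invented, the names `cooccurrence` and `related_tags` are hypothetical, and I have no idea whether Technorati computes anything like this.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(items):
    """Count how often each unordered pair of tags appears on the same item."""
    pairs = Counter()
    for tags in items:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def related_tags(tag, pairs, min_count=2):
    """Tags that share at least `min_count` items with `tag`, most frequent first."""
    related = Counter()
    for (a, b), n in pairs.items():
        if n < min_count:
            continue
        if a == tag:
            related[b] = n
        elif b == tag:
            related[a] = n
    return [t for t, _ in related.most_common()]


# Invented sample: each inner list is the set of tags on one post.
posts = [
    ["music", "musique", "concert"],
    ["music", "musique"],
    ["music", "música"],
    ["música", "music", "guitar"],
]
pairs = cooccurrence(posts)
print(related_tags("musique", pairs))  # ['music']
```

If tags in different languages really do land on the same items, pairs like "music" and "musique" should rise to the top of such counts without the system knowing anything about either language.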
Is there a way to create algorithms that take advantage of this fact? Also take into account that different people can tag the same things (pictures, bookmarks, ...) in different languages.

An interesting language effect with tags was pointed out by Tanya, on this Flickr page for the tag "chat". "Chat" means cat in French. Here's a screenshot:

[Screenshot: Flickr tags chat]

So different people have used "chat" and "cat" to tag similar items, and as a result Flickr knows that "chat" and "cat" are related tags. I'm not sure what that means for internationalization of folksonomies, but as an emergent i18n effect I think that's pretty amazing.

A third and similar i18n effect, which I found when playing around with this in Flickr, is that of language namespaces. If you start following related tags in Flickr in a certain language, you will see many tags in that language. Here's a screenshot of the tag "leuk", which means "funny" in Dutch:

[Screenshot: leuk on Flickr]

The related tags are in English, but in the see-also tags we see a whole bunch of Dutch words: "ik, konijn, middelharnis, mooi, oudetonge, overflakkee, plankje". And if you follow those you'll see more Dutch words in the related and see-also tags, creating almost a kind of Dutch namespace. Again, I'm not sure how to use this exactly, but it's pretty amazing to me that, at this early stage, there are already interesting i18n effects happening in the tagging space.

Comments welcome!
# Jan 15, 2005