In tweet-sized terms, Zukmo now lets you save and search for content in almost any language. To qualify “almost any language”: there are nearly 6,900 languages in the world, and 94% of the world's people account for the usage of only 6% of that total (or, if you prefer, 94% of the languages are used by only 6% of the people!). Our initial focus was naturally on the languages that are represented online, at least among that popular 6% of the language pie.
Since Zukmo was conceived as a solution to the digital information explosion, we had quietly been challenging ourselves to explore and introduce the basic support needed for handling content in languages other than English. After all, there are plenty of telling stats about how non-English websites have been growing astronomically and claiming their share of the overall information overload.
Netcraft.com stats reveal that there were more than 298 million websites as of the end of March 2011. Ever wondered how those websites split up by content language? Check out the excerpted graphic below from a stats-gathering website (w3techs.com), which found that 57% of all sites worldwide are in English and the rest are in other languages.
The above is only a partial list; if you are interested, you can study the entire list of languages and the sampling methodology used here.
Given this, the back burners of our collective minds got fired up to meet the challenge, and we have now devised a way to save, search, see and savor multilingual content within “My Zukmo”. An important rider: the character encoding of the web content being bookmarked is expected to be Unicode (UTF-8), though our informal checks suggest that it may also work if the encoding is Western (ISO-8859-1). Also, search in languages other than English works on full words only, and there is no stemming support. Though we are far from proclaiming Zukmo to be entirely multilingual, this upgrade portends an international launch of sorts for us. And with that, we invite you to enjoy, and hopefully benefit from, Zukmo the next time you check out a website in your native tongue. An interesting side note: the W3Techs.com survey logs nearly 62% of websites worldwide as UTF-8 encoded, and this figure is climbing monthly (ISO-8859-1 is in second place at 21% and dropping). This stat alone should give an indication of Zukmo's coverage.
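For the technically curious, here is a rough Python sketch of what the two riders above mean in practice. It is only an illustration and not Zukmo's actual code: the function names decode_page and whole_word_match are made up for this example, and the whitespace-based word splitting is an assumption that would not suit languages written without spaces, such as Chinese or Japanese.

```python
import string


def decode_page(raw: bytes) -> str:
    """Decode page bytes as UTF-8, falling back to ISO-8859-1 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this fallback never fails.
        return raw.decode("iso-8859-1")


def whole_word_match(query: str, text: str) -> bool:
    """Return True only if the query equals a complete word in the text.

    There is no stemming: "книги" will not match "книга".
    Words are split on whitespace, which is an assumption of this sketch.
    """
    words = {word.strip(string.punctuation) for word in text.casefold().split()}
    return query.casefold() in words


if __name__ == "__main__":
    page = decode_page("Это моя книга.".encode("utf-8"))
    print(whole_word_match("книга", page))  # exact word present
    print(whole_word_match("книги", page))  # inflected form, no stemming
```

The point of the second function is simply that a search term has to match a saved word in full; an inflected or partial form of the word is treated as a different word.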
We will also post a video soon to show that when you add content from supported languages to your Zukmo basket, it will be both viewable and searchable.
Read on if you are curious about the number of languages supported by Unicode. The following is a quote from the Unicode.org site.
“It’s hard to say, because Unicode encodes scripts for languages, rather than languages per se. Many scripts (especially the Latin script) are used to write a large number of languages. The easiest answer is that Unicode covers all of the languages that can be written in the following scripts: Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Georgian, Hangul, Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Khmer, Mongolian, Han (Japanese, Chinese, Korean ideographs), Hiragana, Katakana, and Yi….”
Just taking Latin script as an example, the so-called ‘standard Latin’ character sets support all the major West European languages including English, French, German, Spanish, Italian, Portuguese, Dutch, Danish, Norwegian and Swedish. The ‘Extended-Latin’ character sets include Czech, Croatian, Hungarian, Polish, Romanian, Slovak, Slovene, Turkish, Latvian, Lithuanian, Lappish, Welsh, Maltese, Vietnamese, as well as a host of African, Polynesian and other languages.
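To see the "scripts, not languages" idea for yourself, here is a small Python sketch using the standard unicodedata module. The helper name script_hint is made up for this example, and taking the first word of a character's Unicode name is only a rough heuristic, not the official Unicode Script property (which Python's standard library does not expose).

```python
import unicodedata


def script_hint(ch: str) -> str:
    """Return the first word of a character's Unicode name (a rough script hint)."""
    return unicodedata.name(ch).split()[0]


if __name__ == "__main__":
    samples = {
        "French": "\u00e9",      # e with acute
        "Czech": "\u010d",       # c with caron
        "Vietnamese": "\u1ec7",  # e with circumflex and dot below
        "Russian": "\u0436",     # zhe
        "Tamil": "\u0ba4",       # ta
    }
    for language, ch in samples.items():
        print(language, ch, script_hint(ch))
```

The French, Czech and Vietnamese letters all report the same Latin script even though the languages are unrelated, which is why counting the languages Unicode supports is harder than counting scripts.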
Here is the entire list of Languages and Scripts from the Unicode site.