Friday, October 16, 2009

Wikipedia uses the CLDR data

I travelled with Siebrand the other day and learned that, to provide plural support at , he uses the information in the CLDR to determine which languages need plural support and in what form.
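To make "plural support" a little more concrete: the CLDR assigns each language a set of plural categories (zero, one, two, few, many, other) plus rules that pick a category for a given number. The sketch below is a minimal illustration, assuming only non-negative integers and hand-copying the English and Russian rules; the function name `plural_category` is hypothetical and is neither MediaWiki's nor the CLDR's own API. Real data should be taken from the CLDR plural rules themselves.

```python
def plural_category(lang: str, n: int) -> str:
    """Return the CLDR plural category for a non-negative integer n.

    Hypothetical helper: only English and Russian integer rules are
    transcribed here; fractional numbers are ignored.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if lang == "en":
        # English: "one" for exactly 1, everything else is "other".
        return "one" if n == 1 else "other"
    if lang == "ru":
        mod10, mod100 = n % 10, n % 100
        if mod10 == 1 and mod100 != 11:
            return "one"
        if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
            return "few"
        # Remaining integers (endings 0 and 5-9, and 11-14) are "many".
        return "many"
    raise KeyError(f"no plural rules transcribed for {lang!r}")


for n in (1, 2, 5, 11, 21, 22, 25, 111):
    print(n, plural_category("ru", n))
```

Russian needs three forms for whole numbers where English needs two, which is exactly the kind of per-language information a translation platform has to know before it can offer plural support.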

The amazing thing was that, for some languages, the plural support in MediaWiki differs from what the standard specifies. There are also a number of languages for which the CLDR has no information about plural support at all.

It is vital that the CLDR and MediaWiki agree on how to provide plural support for languages. The CLDR is the standard, and it should be complete and correct because it exists to serve every application.

Wednesday, September 9, 2009

African Locales: completion deadline October 1

The African Network for Localization (ANLoc) is seeking immediate help to create Locales for 100 African languages. You can view a description of the project at

You can help in one of three ways:
-> volunteer to work on a locale yourself (the project will help you every step of the way!)
-> play matchmaker - introduce someone who can volunteer for their language
-> spread the word - pass along this message to your networks, so that we increase the chances of finding volunteers for many different languages

THIS YEAR'S DEADLINE to get new languages into the CLDR (Common Locale Data Repository), the international repository of locale data used by all major software on the planet, is OCTOBER 1. So we need to connect with people who speak languages from all over Africa, and we need to complete each locale THIS MONTH.

The full list of languages currently in the project is at . If your favorite language shows any red in any of the bars next to it, please volunteer to help complete the locale!

It's easy to volunteer - just send an email to

The interface to build a locale in your favorite African language is available in English, French, and Swahili. Building a locale only takes a couple of hours. Please tell your friends, tell your colleagues, tell your networks!

A quick, true story - one Friday last month, someone in Nairobi took a couple of minutes to provide an introduction between the Locales project and a colleague of theirs working on the Kreole Morisyen language of Mauritius. A few emails were exchanged, and by Monday the Morisyen locale was 90% finished. By the end of that week, the locale was complete. On October 1, this locale will be submitted to CLDR. By early next year, Morisyen will be forevermore part of the universe of languages available for information technology development.

It just takes one person and a couple of hours to finish a locale for a language, but it takes a lot of villagers on the web to find that one person. Thanks in advance for volunteering, for introducing contacts, and/or for passing along this message!

Thursday, April 16, 2009

Africa helping itself on the Internet

In December I blogged about the Afrigen project. In this project people are asked to add CLDR information for their language. Now, after some months, there are results and I am impressed. Many languages have made a start, and the first languages have completed all the information that this standard asks for.

In my opinion, having quality information in the "Common Locale Data Repository" is a litmus test of a language's readiness for the Internet. The Afrigen project makes the completed data available in its Subversion repository.

The CLDR itself distinguishes levels of CLDR support; these cover things such as how lists are sorted, how numbers are written and what a few languages are called. For this project to insist on a complete set of data takes courage, but in my opinion it is the right thing to do.

There are people who say that a language is on the map when it has its own Wikipedia; in my opinion, a complete set of CLDR data has a much wider application.

Monday, January 26, 2009

Unintended consequences

The fiu-vro Wikipedia is a Wikipedia in the Võro language. People applied for an ISO-639-3 code recently, and this request was granted; the Võro language is now known under the vro code. This has changed the status of this project considerably. Where it used to be a project that existed only because "things happened in those days", it now complies with all the requirements for a new project. We have started the process of renaming the message file for this project, and we have requested that the project itself be renamed.

There is one glitch. The Estonian Wikipedia is known as . The ISO-639-1 et code is connected to the ISO-639-3 est code, and this has just become a macro language. Standard Estonian has been given its own code, ekk.
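The glitch is easier to see as a lookup: a two-letter ISO-639-1 code maps to an ISO-639-3 code, and a macro language code stands for several individual languages. The sketch below is a minimal illustration, assuming a hand-copied excerpt that contains only the Estonian codes discussed here; the names `ISO_639_1_TO_3`, `MACROLANGUAGES` and `individual_languages` are hypothetical. The authoritative mappings are published with the ISO-639-3 code tables.

```python
# Hypothetical, hand-copied excerpt: only the codes discussed in this post.
ISO_639_1_TO_3 = {"et": "est"}

# Macro language code -> individual-language codes it covers.
MACROLANGUAGES = {"est": ["ekk", "vro"]}  # Standard Estonian, Võro


def individual_languages(code: str) -> list[str]:
    """Resolve an ISO-639-1 or ISO-639-3 code to its individual language(s)."""
    code3 = ISO_639_1_TO_3.get(code, code)
    # A code that is not a macro language simply stands for itself.
    return MACROLANGUAGES.get(code3, [code3])


print(individual_languages("et"))   # ['ekk', 'vro']
print(individual_languages("vro"))  # ['vro']
```

Once est is a macro language, "et" no longer points unambiguously at Standard Estonian, which is why a project named after it is technically awkward.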

It is quite clear that, technically, it would be preferable to rename the Estonian Wikipedia. It can be done; the rename of the Võro Wikipedia will demonstrate this. From a community perspective it is not so clear cut. People are conservative; they do not like change, and there are a lot of references out there to the Estonian Wikipedia.

For the Võro community, it is a badge of pride to have their own ISO-639-3 code. For the Estonian community it is a nuisance.

Wednesday, October 8, 2008

African Language Locales: Call for Volunteers

ANLoc, the African Network for Localization, has started an initiative
to build locales for over 100 African languages. The project is now
ready to line up volunteers and get to work!

The main requirements for a volunteer are:

1) literate in the target language
2) comfortable using computers
3) able to volunteer about 1 or 2 hours
4) finishes what they start

If you are willing and able to help - or if you know anyone who might be, or can contact any networks that might include potential volunteers - please look through these lists of languages. Contact if you can work on any language that does not yet have a volunteer:

* West Africa
* Nigeria
* Central Africa
* Tanzania and Indian Ocean
* Great Lakes and Kenya
* Horn of Africa
* Southern Africa

(Note that we do NOT need volunteers for South Africa, because those
languages already have good locales.)

You can also help by letting your colleagues from other African language
communities know about the project.

We want to build all of these locales in a few months, so please let me
know quickly if you can help out!

For more details about the project, please view this presentation:

Wednesday, August 27, 2008

The GUM3C conference in Bangor

The GUM3C conference in Bangor has come and gone. Those who were there were presented with an exquisite set of presentations. The ambiance was lovely and many of the conversations were thought-provoking. This conference was held in association with UNESCO, and consequently subjects like sign languages, minority languages and support for people who do not speak a dominant language featured prominently.

One of the discussions was a follow-up to the presentation by David Crystal, who among other things spoke about his work in the advertising industry. A substantial number of people agreed that advertisements in minority languages give a language an economic underpinning. Consider that the majority of the trade by Welsh companies is in Wales, and that people respond more favourably to advertisements that target them.

A presentation by Gwerfyl Roberts was thought-provoking. As a practitioner in this field, she told us that people who do not speak and read the dominant language well receive substandard medical treatment. Gwerfyl is working hard to improve the situation for people whose first language is Welsh, but she agreed that people from the Indian subcontinent suffer from the same problem. Having the inserts for medicines available in as many languages as possible would be one solution to the problem. Providing terminological support, as is currently provided by Wikiprofessional, is another.

Sign languages, and particularly SignWriting, are dear to my heart. I could not be more pleased to have a presentation about sign language in India. Michael Morgan presented on how a university for the deaf is being set up, and he explained the problems that exist in India. One anecdote was about people texting without reaching a conclusion; in the end, they travelled for two days to talk for five minutes, and then they had to travel back as well. Chris Cox presented the efforts that have been put into introducing SignWriting in Britain and Ireland.

If you are interested in all the goodies that you missed, you will be pleased to learn that the proceedings of GUM3C have already been published (ISBN 978-1-84220-115-2) and that the presentations will be posted on the GUM3C website.

Wednesday, August 13, 2008

Farsi: is it a macro language?

According to ISO-639-3, Farsi is a macro language. From my position it is a clear case: the standard says so, so it is likely to be so. Farsi is divided into two, Western Farsi and Eastern Farsi. Western Farsi is spoken primarily in Iran, and Eastern Farsi in Afghanistan and Pakistan.

The problem I have is that several people I respect have independently told me that, in their opinion, this division is wrong. Farsi is said to be understood by all. For me, raising this question is about something practical. In this case it is about a request for a

Let me be clear: I am all in favour of such a project, but I do not want to perpetuate an ambiguity about the language. The practical question is to what extent it is justified to consider Farsi and Dari as separate languages. If they are indeed to be considered separate languages, how different are they? Can the relationship be compared to that between Afrikaans and Dutch?

Please share your thoughts ...