Showing posts with label Wikipedia. Show all posts

Friday, October 16, 2009

Wikipedia uses the CLDR data

I travelled with Siebrand the other day and learned that, in order to provide plural support at translatewiki.net, he uses the information in the CLDR to determine which languages need plural support and in what way.

The amazing thing was that for some languages the plural support in MediaWiki is different from the one indicated by the standard. There are also a number of languages where the CLDR did not have information about their plural support.

It is vital that the CLDR and MediaWiki agree on how to provide plural support for languages. The CLDR is the standard and should be complete and correct because it exists for any application.
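To make the kind of mismatch concrete, here is a minimal sketch in Python. It hard-codes the CLDR cardinal plural categories for whole numbers in three languages, and compares the Russian rule with a deliberately flawed rule. The function names and the flawed `local_ru` rule are hypothetical illustrations; they are not MediaWiki's or translatewiki.net's code, and the real CLDR data also covers fractions and many more languages.

```python
def plural_category(lang: str, n: int) -> str:
    """Return the CLDR plural category for a non-negative whole number."""
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "ru":
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    if lang == "ja":
        # Japanese does not distinguish plural forms.
        return "other"
    raise ValueError(f"no rule in this sketch for {lang!r}")


def local_ru(n: int) -> str:
    """Hypothetical application rule that forgets the exception for 11."""
    if n % 10 == 1:
        return "one"
    return plural_category("ru", n)


def disagreements(rule_a, rule_b, upto: int = 100) -> list:
    """List the numbers below `upto` on which two plural rules differ."""
    return [n for n in range(upto) if rule_a(n) != rule_b(n)]


if __name__ == "__main__":
    cldr_ru = lambda n: plural_category("ru", n)
    print(disagreements(local_ru, cldr_ru, upto=30))  # [11]
```

A check like this only finds differences for the numbers it tries, but it shows why an application and the standard need to agree: a single missed exception produces the wrong word form for real counts.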
Thanks,
      GerardM

Thursday, April 16, 2009

Africa helping itself on the Internet

In December I blogged about the Afrigen project. In this project people are asked to add CLDR information for their language. Now, after some months, there are results, and I am impressed. Many languages have made a start, and the first languages have completed all the information that this standard asks for.

In my opinion, having quality information in the "Common Locale Data Repository" is a litmus test for the readiness of a language for the Internet. The Afrigen project makes completed data available in their Subversion repository.

The CLDR itself distinguishes levels of CLDR support; this includes how lists are sorted, how numbers are written and what a few languages are called. For this project to insist on a complete set of data takes courage, but it is in my opinion the right thing to do.

There are people who say that a language is on the map when it has its own Wikipedia; in my opinion, a complete set of CLDR data has a much wider application.
Thanks,
       GerardM

Monday, January 26, 2009

Unintended consequences

The fiu-vro Wikipedia is a Wikipedia in the Võro language. People applied for an ISO-639-3 code recently, and this request was granted; the Võro language is now known under the vro code. This has changed the status of this project considerably. Where it used to be a project that existed because "things happened in those days", the language now complies with all the requirements for a new project. We have started the process of renaming the message file for this project, and we have requested the rename of the project.

There is one glitch. The Estonian Wikipedia is known as et.wikipedia.org. The ISO-639-1 et code is connected to the ISO-639-3 est code, and this just became a macrolanguage. Standard Estonian has been given its own code, ekk.

It is quite clear that technically it would be preferable to rename the Estonian Wikipedia. It can be done; this will be demonstrated with the rename of the Võro Wikipedia. From a community perspective it is not so clear-cut. People are conservative, they do not like change, and there are a lot of references out there to the Estonian Wikipedia.

For the Võro community, it is a badge of pride to have their own ISO-639-3 code. For the Estonian community it is a nuisance.
Thanks,
     Gerard

Thursday, July 3, 2008

¿Hablas español?

According to Alexa, Spanish is the second language for Wikipedia in terms of the amount of traffic it generates. There must be MANY people who use the Spanish Wikipedia. When you look at Betawiki, there are 20 people who have indicated their wish to help with the localisation of Spanish.

¿Hablas español?

We are looking for people who speak Spanish and who are willing and able to help us with the localisation of MediaWiki into Spanish. Not only the WMF extensions (41.29%) but also the MediaWiki core messages (91.84%) are in need of attention.

If Spanish is not your "language", you may want to check out how your language is doing.
Thanks,
     GerardM

Thursday, January 24, 2008

Localizing MediaWiki: a translator's perspective

The push is on to localize the core 500 MediaWiki messages for numerous languages. The language committee responsible for the creation of new Wikipedia projects has made the sensible decision that a language cannot have its own Wikipedia unless and until the core user interface is available in that language. As a big side benefit, MediaWiki is used to run thousands of other sites, so a single localization effort can catalyze all sorts of projects in a given language.

The Swahili Wikipedia has about 6500 articles, but only about 100 MediaWiki messages had been translated. The schizophrenic Swanglish interface produces gems like this: "Ficha logged-in users | Fichua my edits." It was beyond time to bring sw.wikipedia up to standard, so I decided to give it a go. In the interests of letting localizers for other languages know what they are getting into, here is a brief report on the process:

1) First, you need to get yourself approved on Betawiki, the site that oversees all the translations. You will need to create an account and join the language project. Thankfully, the people on the site are friendly, helpful, and fast.

2) You will need to download the files (unless you want to do all the work online, which is NOT recommended) and use a special translation assistance tool such as the free POedit or OmegaT. To download, go to the Translate page and select the following options:
  • I want to: Export translation in Gettext format

  • Group: MediaWiki messages (most used)

  • Language: [select your language]

  • Limit: 500 messages per page

Click "Fetch" and you will end up with a long, messy-looking document. Copy and paste that into a text editor like TextPad, save it in standard text format with a .po extension (the filename should be something like mylanguage.po), open it with your translation software, and voila!

Actually, not so fast. It took me a painful half hour or more to figure out that the file had to be saved in "UTF-8" rather than "Unicode," before I could actually open the file with POedit.
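For anyone who hits the same wall, the conversion can also be scripted. The following is a minimal Python sketch that assumes the editor's "Unicode" option wrote UTF-16, which is what that label commonly means in Windows text editors; the function name and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def resave_as_utf8(src: str, dst: str, src_encoding: str = "utf-16") -> None:
    """Read a text file in `src_encoding` and write it back out as UTF-8."""
    # Python's "utf-16" codec uses the byte order mark to detect byte order
    # and drops it from the decoded text.
    text = Path(src).read_text(encoding=src_encoding)
    # Plain "utf-8" writes no byte order mark.
    Path(dst).write_text(text, encoding="utf-8")
```

Calling `resave_as_utf8("mylanguage-unicode.po", "mylanguage.po")` would then produce a UTF-8 copy to open in the translation tool. If the source file was saved in some other encoding, pass it as `src_encoding`.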

3) Let the games begin! At first, the translation goes very quickly, as you breeze through terms like "January" and "Comment." However, you soon start hitting challenging terms. The less of a computer presence your language already has, the more of a challenge the terms will be. "View source." "Full resolution." "Metadata." "Disambiguation pages." And long chunks like this:
This page is currently protected because it is included in the following {{PLURAL:$1|page, which has|pages, which have}} cascading protection turned on. You can change this page's protection level, but it will not affect the cascading protection.
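
To show what the {{PLURAL:$1|...}} part of such a message does, here is a simplified Python sketch. It is an assumption-laden illustration, not MediaWiki's parser: it only handles a PLURAL block keyed on $1, and it applies the English two-form rule (first form for exactly 1, second form otherwise). The name `expand_plural` is hypothetical.

```python
import re

# Matches {{PLURAL:$1|form1|form2|...}} with no nested braces inside.
PLURAL_RE = re.compile(r"\{\{PLURAL:\$1\|([^{}]*)\}\}")


def expand_plural(message: str, n: int) -> str:
    """Replace each PLURAL block with the form that matches the count n."""
    def pick(match: re.Match) -> str:
        forms = match.group(1).split("|")
        index = 0 if n == 1 else 1  # English-style rule, assumed here
        # Fall back to the last form if the translator supplied only one.
        return forms[min(index, len(forms) - 1)]

    return PLURAL_RE.sub(pick, message)


if __name__ == "__main__":
    msg = ("included in the following "
           "{{PLURAL:$1|page, which has|pages, which have}} "
           "cascading protection turned on")
    print(expand_plural(msg, 1))
    print(expand_plural(msg, 3))
```

A language with more than two plural forms needs more alternatives between the pipes and a different selection rule, which is exactly what the translator has to supply.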

Frustratingly, more than half of the translation strings do not have any accompanying explanations. You may have to click through Wikipedia special pages looking for an instance of the term, in order to figure out what is being talked about. Even a simple term like "block" that does not have an explanatory note becomes needlessly difficult; I used the word for "a block of text" until a later entry made it clear that the sense called for was "prevent access."

Messages that have code elements such as $1 (meaning that some text or number will be inserted in that position by the software) should especially have explanatory text, since the content of "$1" often makes a big difference in the words you use and the order you place your text and code elements. For example, "$1 logged-in users" could either render as "50 logged-in users," in which case the Swahili would be "Watumiaji $1 sasa," or "Show/hide logged-in users," in which case the Swahili should be "$1 watumiaji sasa."
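
The effect of the placeholder on word order can be sketched in a few lines of Python. This is a hypothetical stand-in for the substitution the software performs, not MediaWiki code; the function name `fill` is made up, the two Swahili strings are the ones given above, and "Ficha" is taken from the interface sample quoted earlier.

```python
import re


def fill(message: str, *params: str) -> str:
    """Replace $1, $2, ... in a message with the corresponding parameter."""
    def substitute(match: re.Match) -> str:
        position = int(match.group(1))
        if 1 <= position <= len(params):
            return str(params[position - 1])
        return match.group(0)  # leave unknown placeholders untouched

    return re.sub(r"\$(\d+)", substitute, message)


if __name__ == "__main__":
    # $1 is a number: the count follows the noun.
    print(fill("Watumiaji $1 sasa", "50"))
    # $1 is a verb such as show/hide: it comes first.
    print(fill("$1 watumiaji sasa", "Ficha"))
```

The same English source string leads to two different translations, and only knowledge of what $1 will contain tells the translator which one to write.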

The final frustration is that many of the messages - naturally, the ones you put off for last - are extremely long. One has to wonder if messages like this are crucial to establishing the core functionality of MediaWiki in any language:
Using the form below will rename a page, moving all\n
of its history to the new name.\n
The old title will become a redirect page to the new title.\n
Links to the old page title will not be changed; be sure to\n
check for double or broken redirects.\n
You are responsible for making sure that links continue to\n
point where they are supposed to go.\n
\n
Note that the page will '''not''' be moved if there is already\n
a page at the new title, unless it is empty or a redirect and has no\n
past edit history. This means that you can rename a page back to where\n
it was just renamed from if you make a mistake, and you cannot overwrite\n
an existing page.\n
\n
WARNING!\n
This can be a drastic and unexpected change for a popular page;\n
please be sure you understand the consequences of this before\n
proceeding.

The warning that should be posted is that the project will take a lot longer than you expect, and won't be nearly as straightforward as advertised.

Nonetheless, the draft Swahili translation is complete, after probably 16 hours of work. At the moment it is being reviewed in Kenya, and we will upload it as soon as we finish refining it. Meanwhile, a few observations are in order:

  • It really helps to have a good familiarity with computer terminology in general and Wikipedia in particular before starting to translate MediaWiki. If you don't know about "Watchlists" or "RSS Feeds," you will face challenges beyond your normal translation project.

  • You will benefit greatly from a working technical glossary and good dictionaries for your language. Fortunately, Swahili has had a number of successful localization projects (OpenOffice, Google, Microsoft Windows, and others), so I had a lot of resources to consult and an online Swahili dictionary that contains many IT terms and to which I could add new ones. This will not be the case for most other African or minority languages.

  • This project should not be done alone. Ideally you will have two people who are both knowledgeable about computers and both fluent in English and the project language, one of them a native English speaker and the other a native speaker of the language in question. Alternatively, have someone on standby who can explain any problematic terms. I was lucky to work with Arthur Buliva, a Kenyan computer programmer, as we pitched translations back and forth over instant messenger.

  • A big challenge is that you cannot simply coin terms where none exist in your project language. Your users will need to understand the meanings behind the messages, usually without reference to a dictionary. If your language does not have a word for "subcategories" or "namespace" - well, welcome to the wonderful world of software localization!

I do not want to scare anyone away from trying to localize MediaWiki in their language, but I do want to paint a realistic picture of the task in front of you. It is not fast, and for most languages it is not easy, but the outcome - truly useful software in your language - will be its own reward.