Wednesday, July 30, 2008

100 African Language Locales

ANLoc, the African Network for Localization, is undertaking a project to create Locales for 100 African languages. The following presentation provides an introduction to the initiative:


Thursday, July 3, 2008

¿Hablas español?

According to Alexa, Spanish is the second language for Wikipedia in terms of the amount of traffic it generates. There must be MANY people who use the Spanish Wikipedia. Yet when you look at Betawiki, there are only 20 people who have indicated their wish to help with the localisation of Spanish.

¿Hablas español?

We are looking for people who speak Spanish and who are willing and able to help us with the localisation of MediaWiki into Spanish. Not only the WMF extensions (41.29%) but also the MediaWiki core messages (91.84%) are in need of attention.

When Spanish is not your "language", you may want to check out how your language is doing.
Thanks,
     GerardM

Semantic Search Engine of African Languages

Here is a thoughtful article about the Kamusi Project from Appfrica: http://appfrica.net/blog/archives/82

The article talks in part about the blog widget we've been developing with one of our Code Africa volunteers, which people can insert in blogs and web pages to perform Kamusi lookups on their sites. The widget is not quite finished, but we put together the following brief presentation for a Barcamp event recently in Nairobi:


Thursday, May 8, 2008

Supporting languages that do not have localisation

Yesterday I had the privilege to present at a workshop in Milan for ISO. The workshop discussed how ISO will continue its development in the 21st century. A whole day was filled with a mix of people from inside and outside ISO providing their points of view on how the world is changing and on the many kinds of new technology that are becoming available and relevant, with the potential to change current practices at ISO.

Bob Sutor, the IBM vice president for Standards and Open Source, opened and discussed everything from wikis to Second Life. It was a great speech, and it opened up the floor really well for the presenters that followed.

The WLDC is about languages. With Debbie's permission (she had seen my presentation ahead of time), I had included the WLDC as a way to establish that I am truly committed to doing good for languages. What we want to do in the WLDC is document languages and make a difference by doing so. To help us realise this, I approached Mr Sutor and asked him if IBM could be interested in giving languages a presence in the user interface provided by GNOME or KDE.

This is of great practical importance; when you write Neapolitan, for instance, you do not want an Italian spell checker telling you that what you have written is spelled incorrectly. The localisation of software is an expensive and time-consuming business; it is not realistic to expect that all languages and linguistic entities will be localised. It is, however, feasible to make GNOME or KDE aware of the language that is used for a document. This is the first step to ensure that the document is tagged appropriately in its metadata with the language that is used.
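The idea can be sketched in a few lines of Python. The Document class and the spell-checker registry here are hypothetical illustrations, not GNOME or KDE code; only the ISO 639 codes are real (nap is Neapolitan, it is Italian):

```python
# Sketch: tag a document with the language it is written in, so that
# tools like spell checkers can react to the tag instead of guessing.
# The Document class and checker registry are made up for illustration;
# "nap" (Neapolitan) and "it" (Italian) are real ISO 639 codes.

class Document:
    def __init__(self, text, language):
        self.text = text
        self.metadata = {"language": language}  # the language tag lives in the metadata

def pick_spell_checker(document, available_checkers):
    """Return a spell checker for the document's language, or None.

    Returning None, rather than falling back to e.g. Italian, is the
    whole point: no spell checking beats wrong spell checking.
    """
    return available_checkers.get(document.metadata["language"])

checkers = {"it": "italian-spell-checker"}  # Neapolitan has no checker yet

neapolitan_doc = Document("Napule è mille culure", "nap")
italian_doc = Document("Mille colori", "it")

print(pick_spell_checker(neapolitan_doc, checkers))  # None: the Italian checker is not applied
print(pick_spell_checker(italian_doc, checkers))
```

Even without a Neapolitan spell checker, the tag in the metadata prevents the wrong checker from being applied, and the document remains correctly identified for whoever processes it later.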

I am sure that you know more great arguments why a practical application like this will be of a much bigger benefit than is immediately apparent. So please pitch in with suggestions so that we will be able to produce a proposal that Mr Sutor and IBM just cannot refuse :)
Thanks,
Gerard

Sunday, May 4, 2008

A proud moment

At the Wikimedia Foundation I have been banging the drum for the use of standards. I made some friends and enemies in that way, but the overall effect has been good. Some fights are no longer fought because the result is clear from the start.

At Betawiki, we are developing an extension for MediaWiki called Babel. The tool is to be used on user pages to indicate the self-assessed skills in the languages a person knows. The texts are shown in the language itself.

When we do not have a translated text yet, we are still able to use the native name of that language, courtesy of the data available in the CLDR. The standard is not complete, and I asked if it was possible to change the data in our database. I was told no: "The data belongs to a standard, and the data should be improved at source."
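The fallback described above can be sketched as follows. The autonym table is a tiny hand-copied excerpt in the spirit of the CLDR data, and the function names are made up for illustration; the real Babel extension is MediaWiki PHP code that reads CLDR itself:

```python
# Sketch of the Babel fallback: prefer a translated text, fall back to
# the language's native name (autonym) from CLDR-style data, and fall
# back to the bare code as a last resort so the page still renders.

AUTONYMS = {"es": "español", "de": "Deutsch", "nl": "Nederlands"}  # tiny excerpt, illustration only

def babel_label(code, translated_texts):
    # Translated message > CLDR autonym > raw language code.
    return translated_texts.get(code) or AUTONYMS.get(code, code)

print(babel_label("nl", {"nl": "Deze gebruiker spreekt Nederlands."}))
print(babel_label("es", {}))   # no translation yet, so the autonym "español" is shown
print(babel_label("xx", {}))   # unknown language, so the code itself is shown
```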

I do agree with this sentiment. I have written to someone active in the CLDR asking if there is an interest in collaboration. I am happy and proud of this turn of events. I hope that we are welcome :)
Thanks,
GerardM

Thursday, April 17, 2008

Of ancient and historical languages

According to the records at SIL, the documentation for Ancient Greek (to 1453), ISO 639 code grc, has been tagged as type "Historical". This means that the language is considered dead. Latin, lat, on the other hand is considered to be "Ancient". Both Latin and Ancient Greek are still taught in schools to kids who get a classic Western education.

According to the definition, Latin is ancient and consequently it must have gone extinct more than a millennium ago. However, the Roman Catholic Church has continued to use Latin as its language; it maintains a dictionary of modern Latin vocabulary. Surely Latin may be old, but it never went extinct.

Ancient Greek does not qualify as ancient because 1453 is less than a millennium ago. Ancient Greek is taught in school. Books, like the Harry Potter books, are translated into Ancient Greek. As far as I understand it, there has not been the same continued usage of Ancient Greek as there was for Latin.

When you are to tag a text using the ISO 639 codes and their definitions, a modern text in Latin or Ancient Greek cannot be tagged. The first issue is that the definitions clearly limit the time when texts are to be considered to be in a historical or ancient language. The second issue is that, in order to write a modern text, neologisms and/or existing words with a modern meaning are needed to express modern concepts.

When the definitions preclude the tagging of the modern expressions of Latin or Ancient Greek, it means that either a new code is needed to indicate the modern expression or the definitions of these languages are wrong.

I would argue that when a language has not seen continued use, its modern texts should be assigned a separate code. The modern expression is distinctly different, and by tagging it as such, it is clear to the reader that an understanding of such a text does not reflect the language of the time when it was a living language. I would argue for a separate ISO 639-3 code.
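Until such a code exists, one stopgap would be a private-use subtag. BCP 47 (the standard for language tags built on the ISO 639 codes) really does reserve "x-" for tags agreed on privately; the "modern" value in this sketch is a made-up convention, not anything registered:

```python
# Sketch: distinguish modern compositions in a historical language with
# a BCP 47 private-use subtag. The "-x-" extension is genuinely part of
# BCP 47; the "modern" subtag is a hypothetical local convention.

def tag_text(base_code, modern=False):
    """Build a language tag, optionally marking the text as modern usage."""
    return f"{base_code}-x-modern" if modern else base_code

def is_modern(tag):
    """Recognise tags produced by tag_text with modern=True."""
    return tag.endswith("-x-modern")

harry_potter_grc = tag_text("grc", modern=True)   # a modern translation into Ancient Greek
homer = tag_text("grc")                           # the historical language itself

print(harry_potter_grc)                           # grc-x-modern
print(is_modern(harry_potter_grc), is_modern(homer))
```

Because private-use subtags only mean something by prior agreement, this does not replace a proper ISO 639-3 code; it merely shows that the distinction is technically easy to express once a code, or a convention, exists.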

My question is: what do you think about this?
Thanks,
GerardM

Monday, April 7, 2008

WLDC Conference 2008

The World Language Documentation Centre, together with Bangor University and Language Standards for Global Business, wishes to announce a major multidisciplinary conference to celebrate 2008 as the International Year of Languages.

August 22-23, 2008

To be held at the Bangor University Business Management Conference Centre

This event is supported by the Welsh Assembly Government and the UK National Committee to UNESCO

The United Nations announced that 2008 would be the International Year of Languages, recognizing the importance of multilingualism in supporting international understanding. The GUM3C conference will attempt to bridge the communications gap between academia and industry, asking (and attempting to answer) such questions as:


How can industry help academia prioritize its research in the 3 Ms?

What are the developing standards, who is developing them, and will they be used?

How will this generate peace, prosperity and global understanding?

More information, details on submission of papers or workshops, as well as conference registration can be obtained from http://www.gum3c.org