MavEtJu's Distorted View of the World

Localisation in the FreeBSD operating system: share/{monet,msg,numeric,time}def

Posted on 2009-09-08 17:00:00
Tags: FreeBSD, Localisation

(This text was originally posted to the freebsd-i18n@FreeBSD.org mailing list)

In the last couple of months I've spent some time with the data in the share/{monet,msg,numeric,time}def directories and the data from the CLDR (Common Locale Data Repository) project.

The biggest issue with the way the current data in the *def directories is maintained is that it is partly high-ASCII (especially for the character maps other than US-ASCII and ISO8859-{1,2,15}) and partly unsynchronized between the different character maps for the same locale.

The first approach was to see if I could transform the data from the CLDR project into the format the FreeBSD project wants it in. It taught me a lot about the data stored in the CLDR project, but also that it isn't compatible enough to do the transformation automatically.

The second approach, still in progress, is going much better. Instead of storing the high-ASCII data and the multiple character map translations in the SCM, we keep one file per locale with a proper definition of the words and syntax used. That file gets converted into UTF-8, which then gets transformed into the required character maps.

For example, the file share/msgdef/nl_NL.unicode:

# yesexpr
^[<LATIN SMALL LETTER J><LATIN CAPITAL LETTER J><LATIN SMALL LETTER Y><LATIN CAPITAL LETTER Y>].*
# noexpr
^[<LATIN SMALL LETTER N><LATIN CAPITAL LETTER N>].*
# EOF

gets converted into nl_NL.UTF-8:

# yesexpr
^[jJyY].*
# noexpr
^[nN].*
# EOF

and gets transformed into its ISO8859-1 and ISO8859-15 equivalents. Since this is low-ascii it is a boring example, but the idea is there.
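The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling actually used: the function names expand_names and convert are made up for this example, and it assumes that every <...> token in a .unicode file is an official Unicode character name and that the target character maps are known to Python under the names used here.

```python
import re
import unicodedata

# Assumption: a token is a Unicode character name in angle brackets,
# e.g. <LATIN SMALL LETTER J>. Nothing else in the file uses < and >.
NAME_TOKEN = re.compile(r"<([A-Z0-9 -]+)>")


def expand_names(line: str) -> str:
    """Replace each <UNICODE CHARACTER NAME> token with the character it names."""
    return NAME_TOKEN.sub(lambda m: unicodedata.lookup(m.group(1)), line)


def convert(source: str, charmaps=("UTF-8", "ISO8859-1", "ISO8859-15")) -> dict:
    """Return the contents of a .unicode file encoded in each character map.

    Raises UnicodeEncodeError if a character map cannot represent a
    character, which is the case a real tool would have to handle.
    """
    text = "\n".join(expand_names(line) for line in source.splitlines()) + "\n"
    return {charmap: text.encode(charmap) for charmap in charmaps}


if __name__ == "__main__":
    sample = (
        "# yesexpr\n"
        "^[<LATIN SMALL LETTER J><LATIN CAPITAL LETTER J>"
        "<LATIN SMALL LETTER Y><LATIN CAPITAL LETTER Y>].*\n"
        "# noexpr\n"
        "^[<LATIN SMALL LETTER N><LATIN CAPITAL LETTER N>].*\n"
        "# EOF\n"
    )
    for charmap, data in convert(sample).items():
        print(charmap, data)
```

For a less boring input such as <LATIN SMALL LETTER E WITH ACUTE>, the UTF-8 output is two bytes while the ISO8859-1 output is one, which is exactly the difference that no longer has to be kept by hand in the SCM.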

What are the current show-stoppers?

Until these two show-stoppers are resolved, we will have a lot more data in the SCM than we have right now. The first one should not be difficult; the second one is with somebody who understands it :-)

So the advantages, when everything is ready:

Once this part is working properly (and to other people's satisfaction) we can update the contents with information from third-party sources like the CLDR. But that is still a long time away for now.
