Step 1: ANSEL to Unicode (old)

This first step is the most important one. Unfortunately, it is also the most difficult, because a mapping table has to be created first. The main source is a conversion table, speccharlatin.html, which I found on the web (thanks to Mike Kay). MARC seems to be a character set widely used in American computerized libraries. ANSEL appears to be a subset of the USMARC character set. UCS-2 is the 16-bit encoding of Unicode; the two names are often used interchangeably.
But this conversion table is just the starting point. To check the reliability of each conversion, both the appearance and the name of each character have to be checked. To check the appearance, it is important to know what the ANSEL characters look like. They are shown in the classical GEDCOM 5.5 specification as well as in an updated one (which contains more ANSEL characters!). If you go to the ANSEL appendix in these documents you can see the ANSEL characters, and most of them are WRONG!
The reason is that these documents rely on my computer's font capabilities to display the characters. Although I'm running a Western (Latin-1 or CP1252) computer, the Envoy and the HTML versions of the GEDCOM specification display wrong characters. One more reason to hate code pages and to hope for the success of Unicode! The only way to get correct character views is to use WordPerfect 5.1 and type in the WordPerfect codes given in both documents. You cannot use WordPerfect 6 for this; I guess the internal codes have been changed. Another reason to go to Unicode. If you do not have WordPerfect 5.1 any more, you can view the result here. The appearance of the Unicode code points is much easier to get: just visit the Unicode home page and click on "character charts". The character names are checked in a similar way. The result is the following ANSEL to Unicode conversion table. You can find a more computer-readable form here.
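To make the idea of such a mapping table concrete, here is a minimal Python sketch. It is not the conversion table from this page: the handful of entries below are quoted from the published ANSEL/MARC-8 assignments as I remember them and should be verified against the full table, and the names ANSEL_SPACING, ANSEL_COMBINING, ansel_to_unicode and describe_mapping are hypothetical. The sketch also assumes the usual ANSEL convention that a diacritic byte precedes its base letter, whereas a Unicode combining mark follows it. The describe_mapping helper mirrors the name check described above by looking up the official Unicode character names.

```python
import unicodedata

# Illustrative, partial ANSEL -> Unicode mapping (assumed entries; verify
# each one against the full conversion table before relying on it).
ANSEL_SPACING = {
    0xA1: 0x0141,  # LATIN CAPITAL LETTER L WITH STROKE
    0xA2: 0x00D8,  # LATIN CAPITAL LETTER O WITH STROKE
    0xA5: 0x00C6,  # LATIN CAPITAL LETTER AE
    0xB1: 0x0142,  # LATIN SMALL LETTER L WITH STROKE
    0xB2: 0x00F8,  # LATIN SMALL LETTER O WITH STROKE
    0xB5: 0x00E6,  # LATIN SMALL LETTER AE
}
ANSEL_COMBINING = {
    0xE1: 0x0300,  # COMBINING GRAVE ACCENT
    0xE2: 0x0301,  # COMBINING ACUTE ACCENT
    0xE8: 0x0308,  # COMBINING DIAERESIS
}


def ansel_to_unicode(data: bytes) -> str:
    """Convert ANSEL bytes to a Unicode string (NFC-normalized)."""
    out = []
    pending = []  # diacritics seen before their base character
    for b in data:
        if b in ANSEL_COMBINING:
            # ANSEL puts the diacritic first; hold it until the base arrives.
            pending.append(chr(ANSEL_COMBINING[b]))
            continue
        if b < 0x80:
            out.append(chr(b))  # plain ASCII is unchanged
        elif b in ANSEL_SPACING:
            out.append(chr(ANSEL_SPACING[b]))
        else:
            raise ValueError(f"no mapping for ANSEL byte 0x{b:02X}")
        # Unicode wants combining marks after the base character.
        out.extend(pending)
        pending.clear()
    out.extend(pending)  # keep dangling diacritics rather than drop them
    return unicodedata.normalize("NFC", "".join(out))


def describe_mapping():
    """Return (ANSEL byte, code point, Unicode name) rows for a name check."""
    table = {**ANSEL_SPACING, **ANSEL_COMBINING}
    return [(a, u, unicodedata.name(chr(u))) for a, u in sorted(table.items())]
```

For example, ansel_to_unicode(b"M\xe8uller") moves the diaeresis behind the "u" and composes the pair into a single "ü". Bytes that are missing from the partial table raise an error instead of being guessed.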

Last modification: 2001-03-19