Step 1: ANSEL to Unicode (old)
This first step is the most important one. Unfortunately, it is also the
most difficult: creating such a mapping table takes some work. The main source is a
speccharlatin.html
conversion table, which I found on the web (thanks to Mike Kay).
MARC seems to be a character set widely used in American computerized libraries.
ANSEL appears to be a subset of this USMARC. UCS-2 is a 16-bit encoding of
Unicode; for the characters discussed here its values are the same as the
Unicode code points.
But this conversion table is just the starting point. To check the
reliability of each conversion, both the appearance and the character names
have to be checked. To check the appearance, it is important to know what
these ANSEL characters look like. They are shown in the classical
GEDCOM 5.5
specification as well as in an
updated one
(which contains more ANSEL characters!). If you go to the ANSEL appendix
in these documents you can see the ANSEL characters, and most of
them are WRONG!
The reason is that these documents rely on my computer's font capabilities to
display them. Although I'm running a Western (Latin-1 or CP1252) computer, the
Envoy and the HTML versions of the GEDCOM specification display wrong characters.
One more reason to hate code pages and to hope for the success of Unicode!
The only way to get correct character views is to use WordPerfect 5.1 and
type in the WordPerfect codes given in both documents. You cannot use
WordPerfect 6 for this; I guess the internal codes have been changed. Another
reason to go to Unicode. If you do not have WordPerfect 5.1 any more, you
can view the result here. The appearance of the
Unicode code points is much easier to get: just visit the
Unicode home page and click on
"character charts".
A similar procedure is applied to the character names. The result is the
following
ANSEL to Unicode conversion table. You can find a more
computer-readable form here.
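To give an idea of what such a table looks like in a program, here is a minimal sketch in Python. It is only an illustration and not the table linked above: the names ANSEL_TO_UNICODE and describe are made up for this example, and the dictionary holds just a handful of well-known ANSEL codes rather than the complete set. Printing the official Unicode character name next to each code point makes the name check described above a little easier.

```python
import unicodedata

# Excerpt only: a few ANSEL byte values and their Unicode code points.
# ANSEL_TO_UNICODE is a hypothetical name; the full table has many more rows.
ANSEL_TO_UNICODE = {
    0xA1: 0x0141,  # uppercase Polish L
    0xA2: 0x00D8,  # uppercase Scandinavian O with slash
    0xA5: 0x00C6,  # uppercase AE digraph
    0xB1: 0x0142,  # lowercase Polish l
    0xB2: 0x00F8,  # lowercase Scandinavian o with slash
    0xB5: 0x00E6,  # lowercase ae digraph
    0xE1: 0x0300,  # grave accent (combining)
    0xE2: 0x0301,  # acute accent (combining)
    0xE8: 0x0308,  # umlaut / diaeresis (combining)
}


def describe(ansel_byte):
    """Return (code point, Unicode name) for one ANSEL byte in the excerpt."""
    code_point = ANSEL_TO_UNICODE[ansel_byte]  # KeyError if not in the excerpt
    name = unicodedata.name(chr(code_point), "<no name>")
    return code_point, name


# Print one line per table entry so the names can be compared by eye.
for ansel_byte in sorted(ANSEL_TO_UNICODE):
    code_point, name = describe(ansel_byte)
    print(f"{ansel_byte:02X} -> U+{code_point:04X}  {name}")
```

The sketch only looks up single bytes. It does not convert whole strings, so it says nothing about how the combining accents are to be attached to their base letters.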
Last modification: 2001-03-19