ANSEL to Unicode Conversion Table

Do not use it; read the explanation!

Despite an intensive search on the web, I could not find a full ANSEL specification. I therefore made a best guess based on several sources available on the web, all of which are given below. Download them and decide for yourself whether this table is correct. It's not an official posting! Use it at your own risk!

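| ANSEL | ANSEL Name | O | Unicode | Unicode Name | V | N | S | C | R |
|-------|------------|---|---------|--------------|---|---|---|---|---|
| A1 | slash l - uppercase | + | 0141 | latin capital letter L with stroke | + |   | + |   |   |
| A2 | slash o - uppercase | + | 00D8 | latin capital letter O with stroke | + |   | + |   |   |
| A3 | slash d - uppercase | + | 0110 | latin capital letter D with stroke | + |   | o |   | 1 |
| A4 | thorn - uppercase | + | 00DE | latin capital letter thorn | + | + | + |   |   |
| A5 | ligature ae - uppercase | + | 00C6 | latin capital letter AE | + | + | + |   |   |
| A6 | ligature oe - uppercase | + | 0152 | latin capital ligature OE | + | + | + |   |   |
| A7 | miagkii znak | + | 02B9 | modifier letter prime | + | + | + |   | 2 |
| A8 | middle dot | + | 00B7 | middle dot | + | + | + |   |   |
| A9 | musical flat |   | 266D | music flat sign | + | + | + |   |   |
| AA | patent mark |   | 00AE | registered sign | + |   | + |   |   |
| AB | plus-or-minus |   | 00B1 | plus-minus sign | + | + | + |   |   |
| AC | hook o - uppercase |   | 01A0 | latin capital letter O with horn | + |   | + |   |   |
| AD | hook u - uppercase |   | 01AF | latin capital letter U with horn | + |   | + |   |   |
| AE | alif | + | 02BC | modifier letter apostrophe | ? | - | - |   | 3 |
| B0 | ayn | + | 02BB | modifier letter turned comma | ? | - | - |   | 4 |
| B1 | slash l - lowercase | + | 0142 | latin small letter L with stroke | + |   | + |   |   |
| B2 | slash o - lowercase | + | 00F8 | latin small letter O with stroke | + |   | + |   |   |
| B3 | slash d - lowercase | + | 0111 | latin small letter D with stroke | + |   | + |   | 5 |
| B4 | thorn - lowercase | + | 00FE | latin small letter thorn | + | + | + |   |   |
| B5 | ligature ae - lowercase | + | 00E6 | latin small letter AE | + | + | + |   |   |
| B6 | ligature oe - lowercase | + | 0153 | latin small ligature OE | + | + | + |   |   |
| B7 | hard sign (tverdyi znak) |   | 02BA | modifier letter double prime | + | + | + |   | 6 |
| B8 | dotless i - lowercase | + | 0131 | latin small letter dotless i | + | + | + |   |   |
| B9 | british pound | + | 00A3 | pound sign | + | + | + |   |   |
| BA | eth | + | 00F0 | latin small letter eth | + | + | + |   | 5 |
| BC | hook o - lowercase |   | 01A1 | latin small letter O with horn | + |   | + |   |   |
| BD | hook u - lowercase |   | 01B0 | latin small letter U with horn | + |   | + |   |   |
| C0 | degree sign |   | 00B0 | degree sign | + | + | + |   |   |
| C1 | script l |   | 2113 | script small L | + | + | + |   |   |
| C2 | phonograph copyright mark |   | 2117 | sound recording copyright | + |   | + |   |   |
| C3 | copyright symbol | + | 00A9 | copyright sign | + | + | + |   |   |
| C4 | musical sharp |   | 266F | music sharp sign | + | + | + |   |   |
| C5 | inverted question mark | + | 00BF | inverted question mark | + | + | + |   |   |
| C6 | inverted exclamation mark | + | 00A1 | inverted exclamation mark | + | + | + |   |   |
| CF | es zet | + | 00DF | latin small letter sharp S | + | - | + |   | 7 |
| E0 | low rising tone mark |   | 0309 | combining hook above | + | - | o | + |   |
| E1 | grave accent | + | 0300 | combining grave accent | + | + | + | + |   |
| E2 | acute accent | + | 0301 | combining acute accent | + | + | + | + |   |
| E3 | circumflex accent | + | 0302 | combining circumflex accent | + | + | + | + |   |
| E4 | tilde | + | 0303 | combining tilde | + | + | + | + |   |
| E5 | macron | + | 0304 | combining macron | + | + | + | + |   |
| E6 | breve | + | 0306 | combining breve | + | + | + | + |   |
| E7 | dot above | + | 0307 | combining dot above | + | + | + | + |   |
| E8 | umlaut (dieresis) | + | 0308 | combining diaeresis | + | + | + | + |   |
| E9 | hacek (caron) | + | 030C | combining caron | + | + | + | + | 8 |
| EA | circle above (angstrom) | + | 030A | combining ring above | + |   | + | + |   |
| EB | ligature, left half |   | FE20 | combining ligature left half | ? | + | + | - |   |
| EC | ligature, right half |   | FE21 | combining ligature right half | ? | + | + | - |   |
| ED | high comma, off center | + | 0315 | combining comma above right | + |   | + | - |   |
| EE | double acute accent | + | 030B | combining double acute accent | + | + | + | + |   |
| EF | candrabindu |   | 0310 | combining candrabindu | + | + | + | - |   |
| F0 | cedilla | + | 0327 | combining cedilla | + | + | + | + |   |
| F1 | right hook | + | 0328 | combining ogonek | + | - | o | + |   |
| F2 | dot below |   | 0323 | combining dot below | + | + | + | + |   |
| F3 | double dot below |   | 0324 | combining diaeresis below | + |   | + | + |   |
| F4 | circle below |   | 0325 | combining ring below | + |   | + | + |   |
| F5 | double underscore |   | 0333 | combining double low line | + |   | + | - |   |
| F6 | underscore | + | 0332 | combining low line | ? |   | o | ? | 9 |
| F7 | left hook | + | 0326 | combining comma below | ? |   | o | - | 10 |
| F8 | right cedilla |   | 031C | combining left half ring below | + | - | o | - | 11 |
| F9 | half circle below |   | 032E | combining breve below | + | - | o | + |   |
| FA | double tilde, left half |   | FE22 | combining double tilde left half | + | + | + | - |   |
| FB | double tilde, right half |   | FE23 | combining double tilde right half | + | + | + | - |   |
| FE | high comma, centered | + | 0313 | combining comma above | + |   | + | - |   |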

These data are summarized in a computer readable file which can be found here.


1st Column

ANSEL code value (hexadecimal). Values below 80 are identical to the corresponding ASCII value. All values are taken from a GEDCOM 5.5 description, which appears to have an updated ANSEL appendix. Values not listed here are either undefined or have an LDS-specific meaning.

2nd Column

Character name according to the updated ANSEL specification

3rd Column

O = Old ANSEL appendix. '+': this character also appears in the standard GEDCOM 5.5 description.

4th Column

Unicode code point (in hex). All values except one (CF->00DF) are taken from speccharlatin.html (see 4th column).

5th Column

Unicode Name is the name of this Unicode code point, taken from the Unicode web page.
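
The names in this column can be re-checked mechanically. The following is a minimal sketch, assuming Python's standard unicodedata module; its names come from the Unicode version bundled with the interpreter, not from the web page used for this table. ROWS and name_mismatches are hypothetical names chosen for illustration, and only a handful of rows are copied in.

```python
import unicodedata

# A few (ANSEL byte, Unicode code point, name as given in the table) rows.
# This is only an excerpt; a real check would load the complete table.
ROWS = [
    (0xA1, 0x0141, "latin capital letter L with stroke"),
    (0xA6, 0x0152, "latin capital ligature OE"),
    (0xC1, 0x2113, "script small L"),
    (0xE9, 0x030C, "combining caron"),
]


def name_mismatches(rows):
    """Return the rows whose table name differs from the unicodedata name."""
    bad = []
    for ansel, code_point, table_name in rows:
        # The second argument is the default for unnamed/unassigned code points.
        actual = unicodedata.name(chr(code_point), "")
        if actual.lower() != table_name.lower():
            bad.append((ansel, code_point, table_name, actual))
    return bad
```

Calling name_mismatches(ROWS) returns an empty list when every listed name agrees with the interpreter's Unicode data; any row it returns deserves a second look.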

6th Column

V = Visual comparison between the ANSEL character as published in ANSEL (ANSI Z39.47-1993) and the Unicode character as published on the Unicode web page.
'+': good match
'?': slight differences
'-': significant differences

7th Column

N = Name comparison: Comparison of the ANSEL character name (according to the GEDCOM specification) in column 2 and the Unicode code point name (according to the Unicode web page) in column 5.
'+': identical or good match
'-': no match
nothing: decide for yourself

8th Column

S = Summary: My opinion of the reliability of this row. A combination of columns 6, 7 and 10.
'+': very reliable
'o': probably OK
'-': questionable

9th Column

C = Combined character used. This entry is important for building parsers. In ANSEL a combining diacritic precedes the letter it modifies, whereas in Unicode it follows the letter. According to this table the ANSEL code sequence E2 41 therefore has to be converted to 0041 0301 (latin capital letter A + combining acute accent), which will be displayed as Á.
'+': the name of the character in this row also appears as a part of the name of other code points: in this example 00C1 (latin capital letter A with acute), displayed as Á (if this character exists within the installed font)
'-': the name does not exist as a part of the name of a European letter (at least not in the name list published at the end of 1997)
nothing: does not apply (not a combining character)
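
For parser builders, the handling of combining characters can be sketched as follows. This is a minimal sketch under stated assumptions, not a complete converter: the two dictionaries contain only a few rows of the table, the names ANSEL_SPACING, ANSEL_COMBINING and ansel_to_unicode are made up for illustration, and replacing unlisted bytes with U+FFFD is my own choice. It relies on Unicode expecting combining marks after the base letter, so marks read from the ANSEL stream are held back until the next base character has been emitted. The optional NFC step (Python's unicodedata.normalize) then composes, for example, 0041 0301 into 00C1 where such a code point exists.

```python
import unicodedata

# Excerpt of the table: ANSEL byte -> Unicode code point (spacing characters).
ANSEL_SPACING = {
    0xA1: 0x0141,  # slash l - uppercase
    0xA2: 0x00D8,  # slash o - uppercase
    0xB9: 0x00A3,  # british pound
    0xCF: 0x00DF,  # es zet
}

# Excerpt of the table: ANSEL byte -> Unicode code point (combining marks).
ANSEL_COMBINING = {
    0xE1: 0x0300,  # grave accent
    0xE2: 0x0301,  # acute accent
    0xE8: 0x0308,  # umlaut (dieresis)
    0xF0: 0x0327,  # cedilla
}


def ansel_to_unicode(data: bytes, compose: bool = True) -> str:
    """Convert ANSEL bytes to a Unicode string (illustrative excerpt only)."""
    out = []
    pending = []  # combining marks waiting for their base character
    for byte in data:
        if byte in ANSEL_COMBINING:
            pending.append(chr(ANSEL_COMBINING[byte]))
            continue
        if byte < 0x80:
            char = chr(byte)  # identical to ASCII
        elif byte in ANSEL_SPACING:
            char = chr(ANSEL_SPACING[byte])
        else:
            char = "\ufffd"  # assumption: flag bytes missing from the excerpt
        out.append(char)
        out.extend(pending)  # marks follow the base, in their input order
        pending.clear()
    out.extend(pending)  # marks with no following base are kept as they are
    text = "".join(out)
    return unicodedata.normalize("NFC", text) if compose else text
```

With compose=False the function returns the decomposed sequence (base letter followed by its marks); with the default it returns the precomposed form where Unicode defines one.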

10th Column

R = Remarks:
1 The code point 00D0 (capital eth) has the same appearance. Another MARC to UCS conversion table (DP73.DOC, found on this interesting page) maps A3 to 00D0. The mapping to 0110 appears to be more logical (see comment 3).
2 The comment to Unicode code point 02B9 says: transliteration of mjagkij znak (Cyrillic soft sign: palatalization). Therefore the name matches.
3 From the visual comparison (see column V) a mapping to 02BE or 0027 is also possible.
4 From the visual comparison (see column V) a mapping to 02BD is also possible.
5 DP73.DOC (see remark 1) maps B3 to 00F0 (small eth), which is not identical according to the visual comparison (see column V) and collides with BA. The mapping to 0111 appears to be more logical.
6 The comment to Unicode code point 02BA says: transliteration of tverdyj znak (Cyrillic hard sign: no palatalization). Therefore the name matches.
7 Not given in ANSEL (ANSI Z39.47-1993). This is probably a GEDCOM extension of ANSEL. Therefore it is kept here.
8 The comment to Unicode code point 030C says: = hacek, V above.
9 The comment to Unicode code point 0332 says: = underline, underscore. From the visual comparison a mapping to 0331 (COMBINING MACRON BELOW) is also possible. Furthermore, LOW LINE does not appear in the names of combined characters, but LINE BELOW does. There is no code point named COMBINING LINE BELOW. Therefore I guess that LOW LINE and LINE BELOW are two names for the same character.
10 ANSEL (ANSI Z39.47-1993), table B1: left hook .... Latvian, Romanian. The comment to Unicode code point 0326 says: Romanian, Latvian, Livonian.
11 0321 might also be possible
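
The guess in remark 9 can be examined with the canonical decompositions in the Unicode data. The following is a minimal sketch, assuming Python's standard unicodedata module (which reflects the Unicode version bundled with the interpreter, not the 1997 name list); marks_used_by is a hypothetical helper name and the scanned range is an arbitrary choice. It collects the combining marks that occur in the decompositions of all characters whose name contains a given fragment.

```python
import unicodedata


def marks_used_by(name_fragment, code_points=range(0x00C0, 0x2000)):
    """Names of combining marks found in the canonical decompositions of
    characters whose Unicode name contains name_fragment."""
    found = set()
    for cp in code_points:
        char = chr(cp)
        if name_fragment not in unicodedata.name(char, ""):
            continue
        for part in unicodedata.decomposition(char).split():
            if part.startswith("<"):
                continue  # skip compatibility tags such as <compat>
            mark = chr(int(part, 16))
            if unicodedata.combining(mark):  # non-zero for combining marks
                found.add(unicodedata.name(mark))
    return found
```

For example, marks_used_by("WITH LINE BELOW") shows which combining mark the letters named "... WITH LINE BELOW" are built from, which may help in choosing between 0332 and the alternative 0331 mentioned in remark 9.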

Last modification: 2007-01-16