ANSEL to Unicode Conversion Table

Do not use it; read the explanation!

Despite an intensive search on the web, I could not find a full ANSEL specification. I therefore made a guess based on several sources available on the web, all of which are given below. Download them and decide for yourself whether this table is correct. This is not an official posting! Use it at your own risk!

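| ANSEL | ANSEL Name | O | M | Unicode | Unicode Name | V | N | S | C | R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A1 | slash l - uppercase | + | ? | 0141 | latin capital letter L with stroke | + |  | + |  |  |
| A2 | slash o - uppercase | + | ? | 00D8 | latin capital letter O with stroke | + |  | + |  |  |
| A3 | slash d - uppercase | + | + | 0110 | latin capital letter D with stroke | + |  | + |  | 1 |
| A4 | thorn - uppercase | + | + | 00DE | latin capital letter thorn | + | + | + |  |  |
| A5 | ligature ae - uppercase | + | + | 00C6 | latin capital letter AE | + | + | + |  |  |
| A6 | ligature oe - uppercase | + | + | 0152 | latin capital ligature OE | + | + | + |  |  |
| A7 | miagkii znak | + | - | 02B9 | modifier letter prime | + | - | o |  |  |
| A8 | middle dot | + | + | 00B7 | middle dot | + | + | + |  |  |
| A9 | musical flat |  | + | 266D | music flat sign | + | + | + |  |  |
| AA | patent mark |  | + | 00AE | registered sign | + |  | + |  |  |
| AB | plus-or-minus |  | + | 00B1 | plus-minus sign | + | + | + |  |  |
| AC | hook o - uppercase |  | + | 01A0 | latin capital letter O with horn | + |  | + |  |  |
| AD | hook u - uppercase |  | + | 01AF | latin capital letter U with horn | + |  | + |  |  |
| AE | alif | + | + | 02BE | modifier letter right half ring | ? | - | - |  | 2 |
| B0 | ayn | + | + | 02BB | modifier letter left half ring | + | - | o |  |  |
| B1 | slash l - lowercase | + | ? | 0142 | latin small letter L with stroke | + |  | + |  |  |
| B2 | slash o - lowercase | + | ? | 00F8 | latin small letter O with stroke | + |  | + |  |  |
| B3 | slash d - lowercase | + | + | 0111 | latin small letter D with stroke | + |  | + |  | 3 |
| B4 | thorn - lowercase | + | + | 00FE | latin small letter thorn | + | + | + |  |  |
| B5 | ligature ae - lowercase | + | + | 00E6 | latin small letter AE | + | + | + |  |  |
| B6 | ligature oe - lowercase | + | + | 0153 | latin small ligature OE | + | + | + |  |  |
| B7 | tverdyi znak |  | - | 02BA | modifier letter double prime | + | - | o |  |  |
| B8 | dotless i - lowercase | + | ? | 0131 | latin small letter dotless i | + | + | + |  |  |
| B9 | british pound | + | + | 00A3 | pound sign | + | + | + |  |  |
| BA | eth | + | + | 00F0 | latin small letter eth | + | + | + |  |  |
| BC | hook o - lowercase |  | + | 01A1 | latin small letter O with horn | + |  | + |  |  |
| BD | hook u - lowercase |  | + | 01B0 | latin small letter U with horn | + |  | + |  |  |
| C0 | degree sign |  | + | 00B0 | degree sign | + | + | + |  |  |
| C1 | script l |  | + | 2113 | script small L | + | + | + |  |  |
| C2 | phonograph copyright mark |  | + | 2117 | sound recording copyright | + |  | + |  |  |
| C3 | copyright symbol | + | + | 00A9 | copyright sign | + | + | + |  |  |
| C4 | musical sharp |  | + | 266F | music sharp sign | + | + | + |  |  |
| C5 | inverted question mark | + | + | 00BF | inverted question mark | + | + | + |  |  |
| C6 | inverted exclamation mark | + | + | 00A1 | inverted exclamation mark | + | + | + |  |  |
| CF | es zet | + | - | 00DF | latin small letter sharp S | + | - | + |  | 4 |
| E0 | low rising tone mark |  | - | 0309 | combining hook above | ? | - | - | + | 5 |
| E1 | grave accent | + | + | 0300 | combining grave accent | + | + | + | + |  |
| E2 | acute accent | + | + | 0301 | combining acute accent | + | + | + | + |  |
| E3 | circumflex accent | + | + | 0302 | combining circumflex accent | + | + | + | + |  |
| E4 | tilde | + | + | 0303 | combining tilde | + | + | + | + |  |
| E5 | macron | + | + | 0304 | combining macron | + | + | + | + |  |
| E6 | breve | + | + | 0306 | combining breve | + | + | + | + |  |
| E7 | dot above | + | + | 0307 | combining dot above | + | + | + | + |  |
| E8 | umlaut (dieresis) | + | + | 0308 | combining diaeresis | + | + | + | + |  |
| E9 | hacek | + | + | 030C | combining caron | + | - | o | + |  |
| EA | circle above (angstrom) | + | + | 030A | combining ring above | + |  | + | + |  |
| EB | ligature, left half |  | + | FE20 | combining ligature left half | - | + | o | - |  |
| EC | ligature, right half |  | + | FE21 | combining ligature right half | - | + | o | - |  |
| ED | high comma, off center | + | + | 0315 | combining comma above right | + |  | + | - |  |
| EE | double acute accent | + | + | 030B | combining double acute accent | + | + | + | + |  |
| EF | candrabindu |  | + | 0310 | combining candrabindu | + | + | + | - |  |
| F0 | cedilla | + | + | 0327 | combining cedilla | + | + | + | + |  |
| F1 | right hook | + | + | 0328 | combining ogonek | + | - | o | + |  |
| F2 | dot below |  | + | 0323 | combining dot below | + | + | + | + |  |
| F3 | double dot below |  | + | 0324 | combining diaeresis below | + |  | + | + |  |
| F4 | circle below |  | + | 0325 | combining ring below | + |  | + | + |  |
| F5 | double underscore |  | + | 0333 | combining double low line | + |  | + | - |  |
| F6 | underscore | + | + | 0332 | combining low line (= line below?) | ? |  | o | ? | 6 |
| F7 | left hook | + | + | 0326 | combining comma below | ? | - | - | - |  |
| F8 | right cedilla |  | + | 031C | combining left half ring below | + | - | o | - | 7 |
| F9 | half circle below |  | - | 032E | combining breve below | + | - | o | + |  |
| FA | double tilde, left half |  | + | FE22 | combining double tilde left half |  | + | + | - |  |
| FB | double tilde, right half |  | + | FE23 | combining double tilde right half |  | + | + | - |  |
| FE | high comma, centered | + | + | 0313 | combining comma above | + |  | + | - |  |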

These data are summarized in a computer-readable file, which can be found here.

Explanation

1st Column

ANSEL code value (hexadecimal). Values below 80 are identical to the corresponding ASCII values. All values are taken from a GEDCOM 5.5 description, which appears to have an updated ANSEL appendix. Values not listed here are either undefined or have an LDS-specific meaning.
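As a rough illustration of the three cases just described (ASCII, listed in the table, not listed), the following Python sketch classifies a single code value. The helper name `classify` and the labels it returns are hypothetical; the set of listed values is simply the first column of the table above.

```python
# Code values that have a row in the table above (first column).
LISTED = (
    set(range(0xA1, 0xAF))      # A1..AE
    | set(range(0xB0, 0xBB))    # B0..BA
    | {0xBC, 0xBD}
    | set(range(0xC0, 0xC7))    # C0..C6
    | {0xCF}
    | set(range(0xE0, 0xFC))    # E0..FB
    | {0xFE}
)


def classify(value: int) -> str:
    """Classify one ANSEL code value (hypothetical helper)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("an ANSEL code value is a single byte")
    if value < 0x80:
        return "ascii"      # identical to the ASCII value
    if value in LISTED:
        return "listed"     # has a row in the table
    return "unlisted"       # undefined or LDS-specific


print(classify(0x41), classify(0xE2), classify(0xBB))
```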


2nd Column

Character name according to the updated ANSEL specification


3rd Column

O = Old ANSEL appendix. '+': this character appears also in the standard GEDCOM 5.5 description.


4th Column

M = speccharlatin.html is a mapping table from MARC to UCS (Unicode). The word 'ANSEL' appears several times in its description, so I guess that ANSEL is a part of MARC. This HTML page contains 4 columns: the hex value of the MARC (ANSEL) character, the name of the character, the code point of the Unicode mapping, and the name of that Unicode code point. The last column is not taken into account here. The third column (mapping to Unicode) is shown in column 5. How well the 2nd column (MARC character name) matches the ANSEL name according to GEDCOM is shown here:
'+': good matching
'?': notable differences
'-': very different



5th Column

Unicode code point (in hex). All values except one (CF->00DF) are taken from speccharlatin.html (see 4th column).


6th Column

Unicode Name is the name of this Unicode code point which is taken from the Unicode web-page.
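The names in column 6 can also be looked up programmatically. A minimal sketch, assuming Python and its standard `unicodedata` module; the helper `unicode_name` is made up for this illustration, and the names returned come from the Unicode version bundled with the interpreter rather than from the web page used for the table.

```python
import unicodedata


def unicode_name(code_point_hex: str) -> str:
    """Return the Unicode name for a hex code point such as '0141'."""
    return unicodedata.name(chr(int(code_point_hex, 16)))


# Two code points taken from column 5 of the table.
for cp in ("0141", "0301"):
    print(cp, unicode_name(cp))
```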


7th Column

V = Visual comparison. All ANSEL characters have been printed: the WP code values given in the GEDCOM 5.5 specification were typed into WordPerfect 5.1, and the resulting file was converted to WordPerfect 6.0. See the result here. You cannot type these values directly into WordPerfect 6.0; I guess that the WP code tables have been changed (another reason to go to Unicode). This visual representation has been compared with the visual representation of the Unicode character as given on the Unicode web page.
'+': good matching
'?': slight differences
'-': significant differences
nothing: no WP-code given


8th Column

N = Name comparison: comparison of the ANSEL character name (according to the GEDCOM specification) in column 2 with the Unicode code point name (according to the Unicode web page) in column 6.
'+': identical or good matching
'-': no matching
nothing: decide yourself


9th Column

S = Summary: My opinion of the reliability of this row. A combination of columns 7, 8 and 11.
'+': very reliable
'o': probably OK
'-': questionable


10th Column

C = Combined character used. This entry is important for building parsers. ANSEL places a diacritic before the letter it modifies, whereas Unicode places a combining character after its base character. According to this table the ANSEL code sequence E2 41 therefore has to be converted to 0041 0301 (latin capital letter A + combining acute accent), which will be displayed as Á.
'+': the name of the character in this row also appears as a part of the name of other code points: in this example 00C1 (latin capital letter A with acute), displayed as Á (if this character exists within the installed font)
'-': the name does not exist as a part of the name of a European letter (at least not in the name list published at www.unicode.org at the end of 1997)
nothing: does not apply (not a combining character)
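For parser writers, the handling of combining characters might be sketched as follows in Python. ANSEL puts the diacritic in front of the letter it modifies, while Unicode expects a combining mark after its base character, so the decoder holds back each mark until the next base character has been written. The two dictionaries contain only a few rows of the table, the function name `ansel_to_unicode` is hypothetical, and keeping several marks in their original relative order is an assumption, not something the sources used here confirm.

```python
import unicodedata

# A few rows of the table: ANSEL code value -> Unicode code point.
SPACING = {0xA1: 0x0141, 0xB9: 0x00A3, 0xC3: 0x00A9}
COMBINING = {0xE1: 0x0300, 0xE2: 0x0301, 0xE8: 0x0308, 0xF0: 0x0327}


def ansel_to_unicode(data: bytes) -> str:
    """Decode ANSEL bytes, moving each diacritic behind its base character."""
    out = []
    pending = []  # combining marks waiting for their base character
    for value in data:
        if value in COMBINING:
            pending.append(chr(COMBINING[value]))
            continue
        if value < 0x80:
            base = chr(value)            # identical to ASCII
        elif value in SPACING:
            base = chr(SPACING[value])
        else:
            base = "\ufffd"              # not covered by this sample table
        out.append(base)
        out.extend(pending)              # marks follow the base in Unicode
        pending.clear()
    out.extend(pending)                  # trailing marks without a base are kept
    return "".join(out)


decoded = ansel_to_unicode(b"\xe2A")               # ANSEL E2 41
composed = unicodedata.normalize("NFC", decoded)   # precomposed form, if any
print([f"{ord(c):04X}" for c in decoded], [f"{ord(c):04X}" for c in composed])
```

Rows marked '+' in this column are the ones for which NFC normalization can be expected to find a precomposed letter; for the others the base character and the combining mark stay separate.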


11th Column

R = Remarks:
1 The code point 00D0 (capital eth) has the same appearance. Another MARC to UCS conversion table (DP73.DOC, found on this interesting page) maps A3 to 00D0. The mapping to 0110 appears to be more logical (see remark 3).
2 From the visual comparison (see column 7) a mapping to 02BC appears to be better.
3 DP73.DOC (see remark 1) maps B3 to 00F0 (small eth), which is not identical according to the visual comparison (see column 7) and collides with BA. The mapping to 0111 appears to be more logical.
4 Not given in speccharlatin.html (or DP73.DOC) but correct. It is probably a GEDCOM extension of ANSEL.
5 From the visual comparison (see column 7) the characters E0 and FE appear to be identical.
6 From the visual comparison (see column 7) a mapping to 0331 (COMBINING MACRON BELOW) is also possible. Furthermore, LOW LINE does not occur in the names of combined characters, but LINE BELOW does, and there is no code point named COMBINING LINE BELOW. Therefore I guess that LOW LINE and LINE BELOW are two names for the same character.
7 DP73.DOC converts both F1 and F8 to 0328.
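
The guess in remark 6 can be checked against the Unicode character data that ships with Python. The sketch below prints the names of 0331 and 0332 and the decomposition of one letter whose name contains LINE BELOW (1E06, chosen here only as an example). With the Unicode data I know of, such letters decompose to 0331 rather than 0332, which would speak for the alternative mapping mentioned in remark 6; treat this as a hint, not as a ruling on what ANSEL F6 means.

```python
import unicodedata

# Names of the two candidate code points for ANSEL F6.
for cp in (0x0331, 0x0332):
    print(f"{cp:04X}", unicodedata.name(chr(cp)))

# A precomposed letter whose name contains "LINE BELOW",
# and the sequence it decomposes to.
letter = "\u1e06"
print(unicodedata.name(letter), "->", unicodedata.decomposition(letter))
```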

Last modification: 2000-09-02