ANSEL to Unicode Conversion Table

Do not use this table before reading the explanation!!!

Despite an intensive search on the web, I could not find a full ANSEL specification. Therefore I made an educated guess based on several sources available on the web. They are all given below. Download them and decide for yourself whether this table is correct or not. It is not an official posting! Use it at your own risk!!

ANSEL	ANSEL Name	O	M	Unicode	Unicode Name	V	N	S	C	R

A1	slash l - uppercase	+	?	0141	latin capital letter L with stroke	+		+
A2	slash o - uppercase	+	?	00D8	latin capital letter O with stroke	+		+
A3	slash d - uppercase	+	+	0110	latin capital letter D with stroke	+		+		1
A4	thorn - uppercase	+	+	00DE	latin capital letter thorn	+	+	+
A5	ligature ae - uppercase	+	+	00C6	latin capital letter AE	+	+	+
A6	ligature oe - uppercase	+	+	0152	latin capital ligature OE	+	+	+
A7	miagkii znak	+	-	02B9	modifier letter prime	+	-	o
A8	middle dot	+	+	00B7	middle dot	+	+	+
A9	musical flat		+	266D	music flat sign	+	+	+
AA	patent mark		+	00AE	registered sign	+		+
AB	plus-or-minus		+	00B1	plus-minus sign	+	+	+
AC	hook o - uppercase		+	01A0	latin capital letter O with horn	+		+
AD	hook u - uppercase		+	01AF	latin capital letter U with horn	+		+
AE	alif	+	+	02BE	modifier letter right half ring	?	-	-		2
B0	ayn	+	+	02BB	modifier letter left half ring	+	-	o
B1	slash l - lowercase	+	?	0142	latin small letter L with stroke	+		+
B2	slash o - lowercase	+	?	00F8	latin small letter O with stroke	+		+
B3	slash d - lowercase	+	+	0111	latin small letter D with stroke	+		+		3
B4	thorn - lowercase	+	+	00FE	latin small letter thorn	+	+	+
B5	ligature ae - lowercase	+	+	00E6	latin small letter AE	+	+	+
B6	ligature oe - lowercase	+	+	0153	latin small ligature OE	+	+	+
B7	tverdyi znak		-	02BA	modifier letter double prime	+	-	o
B8	dotless i - lowercase	+	?	0131	latin small letter dotless i	+	+	+
B9	british pound	+	+	00A3	pound sign	+	+	+
BA	eth	+	+	00F0	latin small letter eth	+	+	+
BC	hook o - lowercase		+	01A1	latin small letter O with horn	+		+
BD	hook u - lowercase		+	01B0	latin small letter U with horn	+		+
C0	degree sign		+	00B0	degree sign	+	+	+
C1	script l		+	2113	script small L	+	+	+
C2	phonograph copyright mark		+	2117	sound recording copyright	+		+
C3	copyright symbol	+	+	00A9	copyright sign	+	+	+
C4	musical sharp		+	266F	music sharp sign	+	+	+
C5	inverted question mark	+	+	00BF	inverted question mark	+	+	+
C6	inverted exclamation mark	+	+	00A1	inverted exclamation mark	+	+	+
CF	es zet	+	-	00DF	latin small letter sharp S	+	-	+		4
E0	low rising tone mark		-	0309	combining hook above	?	-	-	+	5
E1	grave accent	+	+	0300	combining grave accent	+	+	+	+
E2	acute accent	+	+	0301	combining acute accent	+	+	+	+
E3	circumflex accent	+	+	0302	combining circumflex accent	+	+	+	+
E4	tilde	+	+	0303	combining tilde	+	+	+	+
E5	macron	+	+	0304	combining macron	+	+	+	+
E6	breve	+	+	0306	combining breve	+	+	+	+
E7	dot above	+	+	0307	combining dot above	+	+	+	+
E8	umlaut (dieresis)	+	+	0308	combining diaeresis	+	+	+	+
E9	hacek	+	+	030C	combining caron	+	-	o	+
EA	circle above (angstrom)	+	+	030A	combining ring above	+		+	+
EB	ligature, left half		+	FE20	combining ligature left half	-	+	o	-
EC	ligature, right half		+	FE21	combining ligature right half	-	+	o	-
ED	high comma, off center	+	+	0315	combining comma above right	+		+	-
EE	double acute accent	+	+	030B	combining double acute accent	+	+	+	+
EF	candrabindu		+	0310	combining candrabindu	+	+	+	-
F0	cedilla	+	+	0327	combining cedilla	+	+	+	+
F1	right hook	+	+	0328	combining ogonek	+	-	o	+
F2	dot below		+	0323	combining dot below	+	+	+	+
F3	double dot below		+	0324	combining diaeresis below	+		+	+
F4	circle below		+	0325	combining ring below	+		+	+
F5	double underscore		+	0333	combining double low line	+		+	-
F6	underscore	+	+	0332	combining low line (= line below?)	?		o	?	6
F7	left hook	+	+	0326	combining comma below	?	-	-	-
F8	right cedilla		+	031C	combining left half ring below	+	-	o	-	7
F9	half circle below		-	032E	combining breve below	+	-	o	+
FA	double tilde, left half		+	FE22	combining double tilde left half		+	+	-
FB	double tilde, right half		+	FE23	combining double tilde right half		+	+	-
FE	high comma, centered	+	+	0313	combining comma above	+		+	-

These data are summarized in a computer-readable file, which can be found here.
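If you want to use the table in a program, it first has to become a lookup table. Below is a minimal sketch in Python. It assumes the rows are tab-separated in the same column order as the table above, with the ANSEL value in column 1 and the Unicode code point in column 5. That layout is my assumption and does not describe the downloadable file. `parse_table` and `SAMPLE` are hypothetical names.

```python
# Minimal sketch: build an ANSEL -> Unicode lookup table from rows laid out
# like the table above (ASSUMPTION: tab-separated, column 1 = ANSEL hex,
# column 5 = Unicode hex). Not the format of the downloadable file.

def parse_table(text: str) -> dict[int, int]:
    """Return {ANSEL byte value: Unicode code point} for every data row."""
    table: dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split("\t")  # empty columns stay as "" fields
        if len(fields) < 5:
            continue  # blank line or incomplete row
        ansel, code_point = fields[0].strip(), fields[4].strip()
        try:
            table[int(ansel, 16)] = int(code_point, 16)
        except ValueError:
            continue  # header row ("ANSEL ... Unicode ...") or missing value
    return table


SAMPLE = (
    "ANSEL\tANSEL Name\tO\tM\tUnicode\tUnicode Name\tV\tN\tS\tC\tR\n"
    "A1\tslash l - uppercase\t+\t?\t0141\tlatin capital letter L with stroke\t+\t\t+\n"
    "E2\tacute accent\t+\t+\t0301\tcombining acute accent\t+\t+\t+\t+\n"
)

table = parse_table(SAMPLE)
print({f"{k:02X}": f"{v:04X}" for k, v in table.items()})  # {'A1': '0141', 'E2': '0301'}
```

The sketch splits on single tab characters rather than on arbitrary whitespace. Several rows have empty cells, for example an empty O column, and this keeps those cells from shifting the Unicode value into the wrong column.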

Explanation

1st Column

ANSEL code value (hexadecimal). Values below 80 are identical to the corresponding ASCII values. All values are taken from a GEDCOM 5.5 description, which appears to have an updated ANSEL appendix. Values not listed here are either undefined or have an LDS-specific meaning.

2nd Column

Character name according to the updated ANSEL specification.

3rd Column

O = Old ANSEL appendix. '+': this character also appears in the standard GEDCOM 5.5 description.

4th Column

M = speccharlatin.html is a mapping table from MARC to UCS (Unicode). The word 'ANSEL' appears several times in its description, so I guess that ANSEL is a part of MARC. This HTML page contains 4 columns: the hex value of the MARC (ANSEL) character, the name of the character, the Unicode code point it maps to, and the name of that Unicode code point. The last column is not taken into account here. The third column (mapping to Unicode) is shown in column 5. How well the 2nd column (MARC character name) matches the ANSEL name according to GEDCOM is shown here:
'+': good matching
'?': notable differences
'-': very different

5th Column

Unicode code point (in hex). All values except one (CF->00DF) are taken from speccharlatin.html (see 4th column).

6th Column

Unicode Name is the name of this Unicode code point, taken from the Unicode web page.
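Today you can check column 6 without the web page, because Python's standard `unicodedata` module ships the Unicode Character Database. Below is a minimal sketch that compares a few names from the table with the official ones. `ROWS` and `name_mismatches` are hypothetical names, and only three rows are included. Entries with added notes, such as "combining low line (= line below?)", are reported as mismatches by design.

```python
import unicodedata

# A few rows of columns 5 and 6 (code point -> name as written in the table).
ROWS = {
    0x02B9: "modifier letter prime",
    0x0141: "latin capital letter L with stroke",
    0x030C: "combining caron",
}

def name_mismatches(rows: dict[int, str]) -> list[tuple[str, str, str]]:
    """Return (code point, name given, official name) for every disagreement."""
    bad = []
    for cp, given in rows.items():
        # unicodedata.name() returns the official upper-case character name.
        official = unicodedata.name(chr(cp), "<unassigned>")
        if given.upper() != official:
            bad.append((f"{cp:04X}", given, official))
    return bad

print(name_mismatches(ROWS))  # []
print(name_mismatches({0x02B9: "modified letter prime"}))
# [('02B9', 'modified letter prime', 'MODIFIER LETTER PRIME')]
```

The comparison ignores case because the table writes names in lower case, while the Unicode Character Database uses upper case.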

7th Column

V = Visual comparison. All ANSEL characters have been printed: the WP code values given in the GEDCOM 5.5 specification were typed into WordPerfect 5.1, and this file was converted to WordPerfect 6.0. See the result here. You cannot type these values directly into WordPerfect 6.0; I guess that the WP code tables have been changed (another reason to go to Unicode). This visual representation has been compared with the visual representation of the Unicode character as given on the Unicode web page.
'+': good matching
'?': slight differences
'-': significant differences
nothing: no WP-code given

8th Column

N = Name comparison: comparison of the ANSEL character name (according to the GEDCOM specification) in column 2 with the Unicode code point name (according to the Unicode web page) in column 6.
'+': identical or good matching
'-': no matching
nothing: decide for yourself

9th Column

S = Summary: my opinion of the reliability of this row, combining columns 7, 8 and 11.
'+': very reliable
'o': probably OK
'-': questionable

10th Column

C = Combining character used. This entry is important for building parsers. In ANSEL the diacritic precedes the base letter, whereas in Unicode a combining character follows its base letter. According to this table, the ANSEL code sequence E2 41 therefore has to be converted to 0041 0301 (latin capital letter A + combining acute accent), which will be displayed as Á.
'+': the name of the character in this row also appears as part of the name of other code points: in this example 00C1 (latin capital letter A with acute), displayed as Á (if this character exists in the installed font)
'-': the name does not appear as part of the name of any European letter (at least not in the name list published at www.unicode.org at the end of 1997)
nothing: does not apply (not a combining character)
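For a parser, the important point is the reordering: ANSEL writes the diacritic before its base letter, and Unicode expects the combining mark after it. Below is a minimal sketch of a decoder that buffers combining marks until the next base character arrives. It uses only a small excerpt of the table, and `ANSEL_TO_UNICODE` and `decode_ansel` are hypothetical names. Two further assumptions: bytes that are not in the table become U+FFFD, and diacritics left at the end of the input are emitted as they are.

```python
import unicodedata

# Small excerpt of the table above (ANSEL byte -> Unicode code point).
# A real decoder would load every row; this subset is only for illustration.
ANSEL_TO_UNICODE = {
    0xA1: 0x0141,  # slash l - uppercase
    0xB2: 0x00F8,  # slash o - lowercase
    0xE1: 0x0300,  # grave accent (combining)
    0xE2: 0x0301,  # acute accent (combining)
    0xE8: 0x0308,  # umlaut (combining)
    0xF0: 0x0327,  # cedilla (combining)
}

def decode_ansel(data: bytes) -> str:
    """Decode ANSEL bytes, moving combining marks after their base letter."""
    out: list[str] = []
    pending: list[str] = []  # diacritics seen before their base character
    for byte in data:
        if byte < 0x80:
            ch = chr(byte)  # values below 80 are plain ASCII
        elif byte in ANSEL_TO_UNICODE:
            ch = chr(ANSEL_TO_UNICODE[byte])
        else:
            ch = "\ufffd"  # undefined or LDS-specific: replacement character
        if unicodedata.combining(ch):
            pending.append(ch)  # ANSEL: diacritic precedes the base letter
        else:
            out.append(ch)
            out.extend(pending)  # Unicode: combining marks follow the base
            pending.clear()
    out.extend(pending)  # trailing diacritics with no base letter
    return "".join(out)

text = decode_ansel(b"\xe2A")
print([f"{ord(c):04X}" for c in text])                # ['0041', '0301']
print(unicodedata.normalize("NFC", text) == "\u00c1")  # True
```

The last line shows what a '+' in column 10 means in practice: NFC normalization composes A + 0301 into the single code point 00C1. For marks with '-' in column 10, such as FE20, no precomposed character exists, and the sequence stays decomposed.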

11th Column

R = Remarks:
1 The code point 00D0 (capital eth) has the same appearance. Another MARC-to-UCS conversion table (DP73.DOC, found on this interesting page) maps A3 to 00D0. The mapping to 0110 appears to be more logical (see remark 3).
2 From the visual comparison (see column 7), a mapping to 02BC appears to be better.
3 DP73.DOC (see remark 1) maps B3 to 00F0 (small eth), which does not match according to the visual comparison (see column 7) and collides with BA. The mapping to 0111 appears to be more logical.
4 Not given in speccharlatin.html (or DP73.DOC) but correct. This is probably a GEDCOM extension of ANSEL.
5 From the visual comparison (see column 7), the characters E0 and FE appear to be identical.
6 From the visual comparison (see column 7), a mapping to 0331 (COMBINING MACRON BELOW) is also possible. Furthermore, LOW LINE does not appear in the names of combined characters, but LINE BELOW does, and there is no code point named COMBINING LINE BELOW. Therefore I guess that LOW LINE and LINE BELOW are two names for the same character.
7 DP73.DOC converts both F1 and F8 to 0328.
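Some of these remarks can be checked mechanically. Remarks 3 and 7 describe collisions, where two ANSEL codes map to the same Unicode code point, so information is lost when converting. Remark 6 concerns which combining mark Unicode uses for letters named "... WITH LINE BELOW". Below is a minimal sketch of both checks. `collisions` and `dp73_excerpt` are hypothetical names, and the excerpt contains only the four DP73.DOC mappings described in remarks 3 and 7, not the full DP73.DOC table.

```python
import unicodedata
from collections import defaultdict

def collisions(mapping: dict[int, int]) -> dict[int, list[int]]:
    """Return every Unicode code point that more than one ANSEL byte maps to."""
    by_target: dict[int, list[int]] = defaultdict(list)
    for ansel, cp in mapping.items():
        by_target[cp].append(ansel)
    return {cp: sorted(src) for cp, src in by_target.items() if len(src) > 1}

# Only the DP73.DOC mappings mentioned in remarks 3 and 7.
dp73_excerpt = {0xB3: 0x00F0, 0xBA: 0x00F0, 0xF1: 0x0328, 0xF8: 0x0328}
print({f"{cp:04X}": [f"{b:02X}" for b in src]
       for cp, src in collisions(dp73_excerpt).items()})
# {'00F0': ['B3', 'BA'], '0328': ['F1', 'F8']}

# Remark 6: which combining mark does a "... WITH LINE BELOW" letter use?
print(unicodedata.name("\u1e07"), unicodedata.decomposition("\u1e07"))
# LATIN SMALL LETTER B WITH LINE BELOW 0062 0331
```

The last check shows that, at least in the Unicode data bundled with Python, the canonical decomposition of LATIN SMALL LETTER B WITH LINE BELOW uses 0331 (COMBINING MACRON BELOW), not 0332 (COMBINING LOW LINE). This is relevant to the alternative mapping for F6 mentioned in remark 6.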

Last modification: 2000-09-02