”, to distinguish the markup or tags from the “real” content. Many popular word processing packages rely on a buffer of plain text to represent the content and implement links to a parallel store of formatting data. The relative functional roles of both plain text and rich text are well established:
• Plain text is the underlying content stream to which formatting can be applied.
• Rich text carries complex formatting information as well as text context.
• Plain text is public, standardized, and universally readable.
• Rich text representation may be implementation-specific or proprietary.
Although some rich text formats have been standardized or made public, the majority of rich text designs are vehicles for particular implementations and are not necessarily readable by other implementations. Given that rich text equals plain text plus added information, the extra information in rich text can always be stripped away to reveal the “pure” text underneath. This operation is often employed, for example, in word processing systems that use both their own private rich text format and plain text file format as a universal, if limited, means of exchange. Thus, by default, plain text represents the basic, interchangeable content of text. Plain text represents character content only, not its appearance. It can be displayed in a variety of ways and requires a rendering process to make it visible with a particular appearance. (modern HTML delimits paragraphs by enclosing them in ...
Copyright © 1991-2007, Unicode, Inc.
The Unicode Standard 5.0 – Electronic edition
2.2
Unicode Design Principles
19
If the same plain text sequence is given to disparate rendering processes, there is no expectation that rendered text in each instance should have the same appearance. Instead, the disparate rendering processes are simply required to make the text legible according to the intended reading. This legibility criterion constrains the range of possible appearances. The relationship between appearance and content of plain text may be summarized as follows: Plain text must contain enough information to permit the text to be rendered legibly, and nothing more. The Unicode Standard encodes plain text. The distinction between plain text and other forms of data in the same data stream is the function of a higher-level protocol and is not specified by the Unicode Standard itself.
Logical Order
The order in which Unicode text is stored in the memory representation is called logical order. This order roughly corresponds to the order in which text is typed in via the keyboard; it also roughly corresponds to phonetic order. For decimal numbers, the logical order consistently corresponds to the most significant digit first, which is the order expected by number-parsing software. When displayed, this logical order often corresponds to a simple linear progression of characters in one direction, such as from left to right, right to left, or top to bottom. In other circumstances, text is displayed or printed in an order that differs from a single linear progression. Some of the clearest examples are situations where a right-to-left script (such as Arabic or Hebrew) is mixed with a left-to-right script (such as Latin or Greek). For example, when the text in Figure 2-4 is ordered for display, the glyph that represents the first character of the English text appears at the left. The logical start character of the Hebrew text, however, is represented by the Hebrew glyph closest to the right margin. The succeeding Hebrew glyphs are laid out to the left.
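The idea of logical order can be illustrated with a minimal Python sketch. It assumes the standard library module unicodedata as a stand-in for the Unicode Character Database; the sample strings are hypothetical and chosen only for illustration. The sketch shows that mixed-direction text is stored in typing order and that each character carries its own directional class, which a display process would consult later.

```python
import unicodedata

# Logical (storage) order: the Latin letters first, then the Hebrew letters,
# in the order they would be typed.
text = "abc \u05d0\u05d1\u05d2"  # "abc " + HEBREW LETTER ALEF, BET, GIMEL

# Inherent directional class of each character, listed in storage order.
classes = [unicodedata.bidirectional(ch) for ch in text]
print(classes)  # ['L', 'L', 'L', 'WS', 'R', 'R', 'R']

# Digits are stored most significant digit first, which is what
# number-parsing software expects.
number = "1123"
parsed = int(number)
digit_classes = [unicodedata.bidirectional(d) for d in number]
print(parsed, digit_classes)  # 1123 ['EN', 'EN', 'EN', 'EN']
```

Nothing in the stored string records the display order; reordering for display is left to the rendering process.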
Figure 2-4. Bidirectional Ordering
In logical order, numbers are encoded with most significant digit first, but are displayed in different writing directions. As shown in Figure 2-5, these writing directions do not always correspond to the writing direction of the surrounding text. The first example shows N’Ko, a right-to-left script with digits that also render right to left. Examples 2 and 3 show Hebrew and Arabic, in which the numbers are rendered left to right, resulting in bidirectional layout. In left-to-right scripts, such as Latin and Hiragana and Katakana (for Japanese), numbers follow the predominant left-to-right direction of the script, as shown in Examples 4 and 5. When Japanese is laid out vertically, numbers are either laid out vertically or may be rotated clockwise 90 degrees to follow the layout direction of the lines, as shown in Example 6.
Figure 2-5. Writing Direction and Numbers
[Figure: “Please see page 1123.” rendered in several scripts, including Hebrew and Japanese (1123ページをみてください。).]
The Unicode Standard precisely defines the conversion of Unicode text from logical order to the order of readable (displayed) text so as to ensure consistent legibility. Properties of directionality inherent in characters generally determine the correct display order of text. The Unicode Bidirectional Algorithm specifies how these properties are used to resolve directional interactions when characters of right-to-left and left-to-right directionality are mixed. (See Unicode Standard Annex #9, “The Bidirectional Algorithm.”) However, when characters of different directionality are mixed, inherent directionality alone is occasionally insufficient to render plain text legibly. The Unicode Standard therefore includes characters to explicitly specify changes in direction when necessary. The Bidirectional Algorithm uses these directional layout control characters together with the inherent directional properties of characters to exert exact control over the display ordering for legible interchange. By requiring the use of this algorithm, the Unicode Standard ensures that plain text used for simple items like file names or labels can always be correctly ordered for display. Besides mixing runs of differing overall text direction, there are many other cases where the logical order does not correspond to a linear progression of characters. Combining characters (such as accents) are stored following the base character to which they apply, but are positioned relative to that base character and thus do not follow a simple linear progression in the final rendered text. For example, the Latin letter “ẋ” is stored as “x” followed by combining “◌̇”; the accent appears above, not to the right of the base. This position with respect to the base holds even where the overall text progression is from top to bottom—for example, with “ẋ” appearing upright within a vertical Japanese line.
Characters may also combine into ligatures or conjuncts or otherwise change positions of their components radically, as shown in Figure 2-3 and Figure 2-20. There is one particular exception to the usual practice of logical order paralleling phonetic order. With the Thai and Lao scripts, users traditionally type in visual order rather than phonetic order, resulting in some vowel letters being stored ahead of consonants, even though they are pronounced after them.
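The storage conventions described above can be inspected directly. The following is a small sketch, assuming Python's unicodedata module; the two sample strings (an "x" with a combining dot above, and the Thai vowel SARA E before the consonant KO KAI) are illustrative choices, not examples taken from the figures.

```python
import unicodedata

# A base letter followed by a combining mark: two code points in storage,
# even though they render as one accented letter.
accented = "x\u0307"
accented_names = [unicodedata.name(c) for c in accented]
accented_classes = [unicodedata.combining(c) for c in accented]
print(accented_names)    # ['LATIN SMALL LETTER X', 'COMBINING DOT ABOVE']
print(accented_classes)  # [0, 230]  (0 = base, nonzero = combining mark)

# Thai is stored in visual order: the vowel sign comes first in memory
# although it is pronounced after the consonant.
thai = "\u0e40\u0e01"
thai_names = [unicodedata.name(c) for c in thai]
print(thai_names)  # ['THAI CHARACTER SARA E', 'THAI CHARACTER KO KAI']
```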
Unification
The Unicode Standard avoids duplicate encoding of characters by unifying them within scripts across languages. Common letters are given one code each, regardless of language, as are common Chinese/Japanese/Korean (CJK) ideographs. (See Section 12.1, Han.) Punctuation marks, symbols, and diacritics are handled in a similar manner as letters. If they can be clearly identified with a particular script, they are encoded once for that script and are unified across any languages that may use that script. See, for example, U+1362 ethiopic full stop, U+060F arabic sign misra, and U+0592 hebrew accent segol. However, some punctuation or diacritic marks may be shared in common across a number of scripts—the obvious example being Western-style punctuation characters, which are often recently added to the writing systems of scripts other than Latin. In such cases, characters are encoded only once and are intended for use with multiple scripts. Common symbols are also encoded only once and are not associated with any script in particular. It is quite normal for many characters to have different usages, such as comma “,” for either thousands-separator (English) or decimal-separator (French). The Unicode Standard avoids duplication of characters due to specific usage in different languages; rather, it duplicates characters only to support compatibility with base standards. Avoidance of duplicate encoding of characters is important to avoid visual ambiguity. There are a few notable instances in the standard where visual ambiguity between different characters is tolerated, however. For example, in most fonts there is little or no distinction visible between Latin “o”, Cyrillic “o”, and Greek “o” (omicron). These are not unified because they are characters from three different scripts, and many legacy character encodings distinguish between them.
As another example, there are three characters whose glyph is the same uppercase barred D shape, but they correspond to three distinct lowercase forms. Unifying these uppercase characters would have resulted in unnecessary complications for case mapping. The Unicode Standard does not attempt to encode features such as language, font, size, positioning, glyphs, and so forth. For example, it does not preserve language as a part of character encoding: just as French i grec, German ypsilon, and English wye are all represented by the same character code, U+0059 “Y”, so too are Chinese zi, Japanese ji, and Korean ja all represented as the same character code, U+5B57 字. In determining whether to unify variant CJK ideograph forms across standards, the Unicode Standard follows the principles described in Section 12.1, Han. Where these principles determine that two forms constitute a trivial difference, the Unicode Standard assigns a single code. Just as for the Latin and other scripts, typeface distinctions or local preferences in glyph shapes alone are not sufficient grounds for disunification of a character. Figure 2-6 illustrates the well-known example of the CJK ideograph for “bone,” which shows significant shape differences from typeface to typeface, with some forms preferred in China and some in Japan. All of these forms are considered to be the same character, encoded at U+9AA8 in the Unicode Standard.
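The unification decisions described above can be checked against the character database. The sketch below assumes Python's unicodedata module and built-in case mapping. The three barred-D code points used here (U+00D0, U+0110, U+0189) are an assumption about which characters the text has in mind; the text itself does not list them.

```python
import unicodedata

# Three visually similar letters that are NOT unified, because they belong
# to three different scripts.
lookalikes = ["\u006f", "\u043e", "\u03bf"]
lookalike_names = [unicodedata.name(c) for c in lookalikes]
print(lookalike_names)
# ['LATIN SMALL LETTER O', 'CYRILLIC SMALL LETTER O', 'GREEK SMALL LETTER OMICRON']

# One code for "Y" regardless of the language that uses it.
print(f"U+{ord('Y'):04X}")  # U+0059

# Assumed barred-D trio: same uppercase shape, three distinct lowercase forms.
barred_d = ["\u00d0", "\u0110", "\u0189"]
lowercase_forms = [c.lower() for c in barred_d]
print([f"U+{ord(c):04X}" for c in lowercase_forms])  # ['U+00F0', 'U+0111', 'U+0256']
```

Keeping the three uppercase characters separate lets each one map to exactly one lowercase form, which is the case-mapping simplification the text refers to.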
Figure 2-6. Typeface Variation for the Bone Character
Many characters in the Unicode Standard could have been unified with existing visually similar Unicode characters or could have been omitted in favor of some other Unicode mechanism for maintaining the kinds of text distinctions for which they were intended. However, considerations of interoperability with other standards and systems often require that such compatibility characters be included in the Unicode Standard. See Section 2.3, Compatibility Characters. In particular, whenever font style, size, positioning or precise glyph shape carry a specific meaning and are used in distinction to the ordinary character—for example, in phonetic or mathematical notation—the characters are not unified.
Dynamic Composition
The Unicode Standard allows for the dynamic composition of accented forms and Hangul syllables. Combining characters used to create composite forms are productive. Because the process of character composition is open-ended, new forms with modifying marks may be created from a combination of base characters followed by combining characters. For example, the diaeresis “¨” may be combined with all vowels and a number of consonants in languages using the Latin script and several other scripts, as shown in Figure 2-7.
Figure 2-7. Dynamic Composition
[Figure: A (U+0041) + ◌̈ (U+0308) → Ä]
Equivalent Sequences. Some text elements can be encoded either as static precomposed forms or by dynamic composition. Common precomposed forms such as U+00DC “Ü” latin capital letter u with diaeresis are included for compatibility with current standards. For static precomposed forms, the standard provides a mapping to an equivalent dynamically composed sequence of characters. (See also Section 3.7, Decomposition.) Thus different sequences of Unicode characters are considered equivalent. A precomposed character may be represented as an equivalent composed character sequence (see Section 2.12, Equivalent Sequences and Normalization).
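The equivalence between a precomposed form and its dynamically composed sequence can be sketched as follows. This assumes Python's unicodedata.normalize, where the form names "NFD" and "NFC" select canonical decomposition and canonical composition; normalization itself is covered in the sections cited above.

```python
import unicodedata

precomposed = "\u00dc"   # LATIN CAPITAL LETTER U WITH DIAERESIS
sequence = "U\u0308"     # "U" followed by COMBINING DIAERESIS

# The two spellings differ as code point sequences...
print(precomposed == sequence)  # False

# ...but each maps to the other under canonical (de)composition.
decomposed = unicodedata.normalize("NFD", precomposed)
recomposed = unicodedata.normalize("NFC", sequence)
print(decomposed == sequence)      # True
print(recomposed == precomposed)   # True
```

A comparison routine that treats the two spellings as the same text would typically normalize both sides first.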
Stability
Certain aspects of the Unicode Standard must be absolutely stable between versions, so that implementers and users can be guaranteed that text data, once encoded, retains the same meaning. Most importantly, this means that once Unicode characters are assigned, their code point assignments cannot be changed, nor can characters be removed.
Characters are retained in the standard, so that previously conforming data stay conformant in future versions of the standard. Sometimes characters are deprecated—that is, their use in new documents is discouraged. Usually, this is because the characters were found not to be needed, and their continued use would merely result in duplicate ways of encoding the same information. While implementations should continue to recognize such characters when they are encountered, spell-checkers or editors could warn users of their presence and suggest replacements. Unicode character names are also never changed, so that they can be used as identifiers that are valid across versions. See Section 4.8, Name—Normative. Similar stability guarantees exist for certain important properties. For example, the decompositions are kept stable, so that it is possible to normalize a Unicode text once and have it remain normalized in all future versions. For a list of stability policies for the Unicode Standard, see Appendix F, Unicode Encoding Stability Policies.
Convertibility
Character identity is preserved for interchange with a number of different base standards, including national, international, and vendor standards. Where variant forms (or even the same form) are given separate codes within one base standard, they are also kept separate within the Unicode Standard. This choice guarantees the existence of a mapping between the Unicode Standard and base standards. Accurate convertibility is guaranteed between the Unicode Standard and other standards in wide usage as of May 1993. Characters have also been added to allow convertibility to several important East Asian character sets created after that date—for example, GB 18030. In general, a single code point in another standard will correspond to a single code point in the Unicode Standard. Sometimes, however, a single code point in another standard corresponds to a sequence of code points in the Unicode Standard, or vice versa. Conversion between Unicode text and text in other character codes must, in general, be done by explicit table-mapping processes. (See also Section 5.1, Transcoding to Other Standards.)
2.3 Compatibility Characters
Compatibility Variants
Conceptually, compatibility characters are those that would not have been encoded except for compatibility and round-trip convertibility with other standards. They are variants of characters that already have encodings as normal (that is, non-compatibility) characters in the Unicode Standard; as such, they are more properly referred to as compatibility variants. Examples of compatibility variants in this sense include all of the glyph variants in the Compatibility and Specials Area: halfwidth or fullwidth characters from East Asian character encoding standards, Arabic contextual form glyphs from preexisting Arabic code pages, Arabic ligatures and ligatures from other scripts, and so on. Other examples include CJK compatibility ideographs, which are generally duplicates of a unified Han ideograph, and legacy alternate format characters such as U+206C inhibit arabic form shaping. The fact that a character can be considered a compatibility variant does not mean that the character is deprecated in the standard. The use of many compatibility variants in general interchange is unproblematic. Some, however, such as Arabic contextual forms or vertical forms, can lead to problems when used in general interchange. In identifiers, compatibility variants should be avoided because of their visual similarity with regular characters. (See Unicode Technical Report #36, “Unicode Security Considerations.”) The Compatibility and Specials Area contains a large number of compatibility characters, but the Unicode Standard also contains many compatibility characters that do not appear in that area. These include examples such as U+2163 “IV” roman numeral four, U+2007 figure space, and U+00B2 “2” superscript two. There is no formal listing of all compatibility characters in the Unicode Standard.
Compatibility Decomposable Characters
There is a second, narrow sense of the term “compatibility character” in the Unicode Standard, corresponding to the notion of a compatibility decomposable introduced in Section 2.2, Unicode Design Principles. This sense is strictly defined as any Unicode character whose compatibility decomposition is not identical to its canonical decomposition. (See definition D66 in Section 3.7, Decomposition.) Because a compatibility character in this narrow sense must also be a composite character, it may also be unambiguously referred to as a compatibility composite character, or compatibility composite for short. The compatibility decomposable characters are precisely defined in the Unicode Character Database. Because of their use in normalization, their compatibility decompositions are stable and cannot be changed. Compatibility decomposable characters and compatibility characters are two distinct concepts, even though the two sets of characters overlap. Not all compatibility characters have decomposition mappings. For example, the deprecated alternate format characters do not have any distinct decomposition, and CJK compatibility ideographs have canonical decomposition mappings rather than compatibility decomposition mappings. Some compatibility decomposable characters are widely used characters serving essential functions. The no-break space is one example. A large number of compatibility decomposable characters are really distinct symbols used in specialized notations, whether phonetic or mathematical. They are therefore not compatibility variants in the strict sense. Rather, their compatibility mappings express their historical derivation from styled forms of standard letters. In these and similar cases, such as fixed-width space characters, the compatibility decompositions define possible fallback representations.
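The narrow definition above (compatibility decomposition not identical to canonical decomposition) translates almost directly into code. The following sketch assumes Python's unicodedata module, using "NFKD" and "NFD" as the full compatibility and canonical decompositions. The helper name is_compat_decomposable and the choice of U+F900 as a sample CJK compatibility ideograph are hypothetical, introduced only for this illustration.

```python
import unicodedata

def is_compat_decomposable(ch: str) -> bool:
    """True if the compatibility decomposition differs from the canonical one."""
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)

samples = {
    "\u00b2": "SUPERSCRIPT TWO",
    "\u2163": "ROMAN NUMERAL FOUR",
    "\u00a0": "NO-BREAK SPACE",
    "\uf900": "CJK COMPATIBILITY IDEOGRAPH-F900",
    "\u206c": "INHIBIT ARABIC FORM SHAPING",
}

results = {}
for ch, label in samples.items():
    results[label] = is_compat_decomposable(ch)
    # unicodedata.decomposition returns the raw mapping; a leading <tag>
    # marks a compatibility mapping, no tag marks a canonical one.
    print(f"U+{ord(ch):04X} {label}: {results[label]} "
          f"mapping={unicodedata.decomposition(ch)!r}")
```

The last two samples illustrate the point made in the text: a character can be a compatibility character in the broad sense without being compatibility decomposable, either because its mapping is canonical or because it has no mapping at all.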
Mapping Compatibility Characters
Identifying one character as a compatibility variant of another character usually implies that the first can be remapped to the second without the loss of any textual information other than formatting and layout. However, such remapping cannot always take place because many of the compatibility characters are included in the standard precisely to allow systems to maintain one-to-one mappings to other existing character encoding standards and code pages. In such cases, a remapping would lose information that is important to maintaining some distinction in the original encoding. By definition, a compatibility decomposable character decomposes into a compatibly equivalent character or character sequence. Even in such cases, an implementation must proceed with due caution—replacing one with the other may change not only formatting information, but also other textual distinctions on which some other process may depend. In many cases there exists a visual relationship between a compatibility composite and a standard character that is akin to a font style or directionality difference. Replacing such characters with unstyled characters could affect the meaning of the text. Replacing them with rich text would preserve the meaning for a human reader, but could cause some programs that depend on the distinction to behave unpredictably. This issue particularly affects compatibility characters used in mathematical notation. In some usage domains (for example, network identifiers), it may be acceptable to prohibit the use of compatibility variants or to remap them consistently. In fact, in such cases, further sets of characters may be restricted in a similar way to compatibility variants. For more information and an introduction to the concept of “confusable” characters, see Unicode Technical Standard #39, “Unicode Security Mechanisms.”
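The caution about remapping can be made concrete with a short sketch. It assumes Python's unicodedata.normalize with the "NFKC" form as one possible way to remap compatibility decomposable characters; the sample strings are hypothetical.

```python
import unicodedata

def remap(text: str) -> str:
    """Replace compatibility decomposable characters by their equivalents."""
    return unicodedata.normalize("NFKC", text)

squared = remap("x\u00b2")       # "x" + SUPERSCRIPT TWO
reals = remap("\u211d")          # DOUBLE-STRUCK CAPITAL R
fullwidth = remap("\uff21")      # FULLWIDTH LATIN CAPITAL LETTER A

print(squared)    # x2  (the exponent has become an ordinary digit)
print(reals)      # R   (the mathematical distinction is gone)
print(fullwidth)  # A   (no longer round-trips to the fullwidth code)
```

In the first two cases the remapping changes what a mathematical reader would understand; in the third it discards a distinction that a legacy East Asian encoding relies on for round-trip conversion.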
2.4 Code Points and Characters
On a computer, abstract characters are encoded internally as numbers. To create a complete character encoding, it is necessary to define the list of all characters to be encoded and to establish systematic rules for how the numbers represent the characters. The range of integers used to code the abstract characters is called the codespace. A particular integer in this set is called a code point. When an abstract character is mapped or assigned to a particular code point in the codespace, it is then referred to as an encoded character. In the Unicode Standard, the codespace consists of the integers from 0 to 10FFFF₁₆, comprising 1,114,112 code points available for assigning the repertoire of abstract characters. There are constraints on how the codespace is organized, and particular areas of the codespace have been set aside for encoding of certain kinds of abstract characters or for other uses in the standard. For more on the allocation of the Unicode codespace, see Section 2.8, Unicode Allocation.
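The size of the codespace follows from its bounds, as the short sketch below works out. The constant and function names are hypothetical; sys.maxunicode is Python's own record of the largest code point it supports.

```python
import sys

MAX_CODE_POINT = 0x10FFFF          # upper bound of the codespace
CODESPACE_SIZE = MAX_CODE_POINT + 1  # the range starts at 0

def is_code_point(n: int) -> bool:
    """True if n lies inside the Unicode codespace."""
    return 0 <= n <= MAX_CODE_POINT

print(CODESPACE_SIZE)                    # 1114112
print(sys.maxunicode == MAX_CODE_POINT)  # True
print(is_code_point(0x110000))           # False
```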
Figure 2-8 illustrates the relationship between abstract characters and code points, which together constitute encoded characters. Note that some abstract characters may be associated with multiple, separately encoded characters (that is, be encoded “twice”). In other instances, an abstract character may be represented by a sequence of two (or more) other encoded characters. The solid arrows connect encoded characters with the abstract characters that they represent and encode.
Figure 2-8. Codespace and Encoded Characters
[Figure: the abstract character Å is encoded as U+00C5 and as U+212B, and can also be represented by the sequence U+0041 U+030A.]
When referring to code points in the Unicode Standard, the usual practice is to refer to them by their numeric value expressed in hexadecimal, with a “U+” prefix. (See Appendix A, Notational Conventions.) Encoded characters can also be referred to by their code points only. To prevent ambiguity, the official Unicode name of the character is often added; this clearly identifies the abstract character that is encoded. For example:
U+0061 latin small letter a
U+10330 gothic letter ahsa
U+201DF cjk unified ideograph-201df
Such citations refer only to the encoded character per se, associating the code point (as an integral value) with the abstract character that is encoded.
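The citation convention above is easy to generate mechanically. This is a minimal sketch assuming Python's unicodedata.name for the official character names; the helper name cite is hypothetical, and names are lowercased only to match the way they appear in this text.

```python
import unicodedata

def cite(ch: str) -> str:
    """Format a character as 'U+XXXX name', using at least four hex digits."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch).lower()}"

citations = [cite(ch) for ch in ("\u0061", "\U00010330", "\U000201DF")]
for line in citations:
    print(line)
# U+0061 latin small letter a
# U+10330 gothic letter ahsa
# U+201DF cjk unified ideograph-201df
```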
Types of Code Points
There are many ways to categorize code points. Table 2-3 illustrates some of the categorizations and basic terminology used in the Unicode Standard. Not all assigned code points represent abstract characters; only Graphic, Format, Control and Private-use do. Surrogates and Noncharacters are assigned code points but are not assigned to abstract characters. Reserved code points are assignable: any may be assigned in a future version of the standard. The General Category provides a finer breakdown of Graphic characters and also distinguishes between the other basic types (except between Noncharacter and Reserved). Other properties defined in the Unicode Character Database provide for different categorizations of Unicode code points.
Table 2-3. Types of Code Points
Basic Type | Brief Description | General Category | Character Status | Code Point Status
Graphic | Letter, mark, number, punctuation, symbol, and spaces | L, M, N, P, S, Zs | Assigned to abstract character | Designated (assigned) code point
Format | Invisible but affects neighboring characters; includes line/paragraph separators | Cf, Zl, Zp | Assigned to abstract character | Designated (assigned) code point
Control | Usage defined by protocols or standards outside the Unicode Standard | Cc | Assigned to abstract character | Designated (assigned) code point
Private-use | Usage defined by private agreement outside the Unicode Standard | Co | Assigned to abstract character | Designated (assigned) code point
Surrogate | Permanently reserved for UTF-16; restricted interchange | Cs | Not assigned to abstract character | Designated (assigned) code point
Noncharacter | Permanently reserved for internal usage; restricted interchange | Cn | Not assigned to abstract character | Designated (assigned) code point
Reserved | Reserved for future assignment; restricted interchange | Cn | Not assigned to abstract character | Undesignated (unassigned) code point
Control Codes. Sixty-five code points (U+0000..U+001F and U+007F..U+009F) are reserved specifically as control codes, for compatibility with the C0 and C1 control codes of the ISO/IEC 2022 framework. A few of these control codes are given specific interpretations by the Unicode Standard. (See Section 16.1, Control Codes.)
Noncharacters. Sixty-six code points are not used to encode characters. Noncharacters consist of U+FDD0..U+FDEF and any code point ending in the value FFFE₁₆ or FFFF₁₆—that is, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFE, U+10FFFF. (See Section 16.7, Noncharacters.)
Private Use. Three ranges of code points have been set aside for private use. Characters in these areas will never be defined by the Unicode Standard. These code points can be freely used for characters of any purpose, but successful interchange requires an agreement between sender and receiver on their interpretation. (See Section 16.5, Private-Use Characters.)
Surrogates. Some 2,048 code points have been allocated as surrogate code points, which are used in the UTF-16 encoding form. (See Section 16.6, Surrogates Area.)
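The counts quoted above can be reproduced from the range definitions. The sketch below uses hypothetical predicate names. The control-code and noncharacter ranges come from the text; the surrogate range U+D800..U+DFFF is not stated in this section and is supplied here as an assumption, cross-checked against Python's unicodedata categories.

```python
import unicodedata

def is_control(cp: int) -> bool:
    """C0 and C1 control codes: U+0000..U+001F and U+007F..U+009F."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F

def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus every code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_surrogate(cp: int) -> bool:
    """Assumed surrogate range U+D800..U+DFFF (not given in this section)."""
    return 0xD800 <= cp <= 0xDFFF

codespace = range(0x110000)
control_count = sum(1 for cp in codespace if is_control(cp))
noncharacter_count = sum(1 for cp in codespace if is_noncharacter(cp))
surrogate_count = sum(1 for cp in codespace if is_surrogate(cp))
print(control_count, noncharacter_count, surrogate_count)  # 65 66 2048

# Cross-check a few values against the General Category in Table 2-3.
print(unicodedata.category(chr(0x85)))    # Cc
print(unicodedata.category(chr(0xD800)))  # Cs
print(unicodedata.category(chr(0xFFFF)))  # Cn
```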
Restricted Interchange. Code points that are not assigned to abstract characters are subject to restrictions in interchange.
• Surrogate code points cannot be conformantly interchanged using Unicode encoding forms. They do not correspond to Unicode scalar values and thus do not have well-formed representations in any Unicode encoding form. (See Section 3.8, Surrogates.)
• Noncharacter code points are reserved for internal use, such as for sentinel values. They should never be interchanged. They do, however, have well-formed representations in Unicode encoding forms and survive conversions between encoding forms. This allows sentinel values to be preserved internally across Unicode encoding forms, even though they are not designed to be used in open interchange.
• All implementations need to preserve reserved code points because they may originate in implementations that use a future version of the Unicode Standard. For example, suppose that one person is using a Unicode 5.0 system and a second person is using a Unicode 3.2 system. The first person sends the second person a document containing some code points newly assigned in Unicode 5.0; these code points were unassigned in Unicode 3.2. The second person may edit the document, not changing the reserved codes, and send it on. In that case the second person is interchanging what are, as far as the second person knows, reserved code points.
Code Point Semantics. The semantics of most code points are established by this standard; the exceptions are Controls, Private-use, and Noncharacters. Control codes generally have semantics determined by other standards or protocols (such as ISO/IEC 6429), but there are a small number of control codes for which the Unicode Standard specifies particular semantics. See Table 16-1 in Section 16.1, Control Codes, for the exact list of those control codes. The semantics of private-use characters are outside the scope of the Unicode Standard; their use is determined by private agreement, as, for example, between vendors. Noncharacters have semantics in internal use only.
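The different treatment of surrogates and noncharacters in the list above can be observed with Python's built-in codecs, used here as one example implementation rather than as a statement of what every implementation does. The helper name can_encode is hypothetical.

```python
def can_encode(text: str, codec: str) -> bool:
    """True if the codec accepts the text under its default (strict) rules."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

lone_surrogate = "\ud800"   # a surrogate code point on its own
noncharacter = "\uffff"     # a noncharacter, e.g. an internal sentinel

codecs_to_try = ("utf-8", "utf-16-le", "utf-32-le")

# Surrogates have no well-formed representation in any encoding form.
surrogate_ok = [can_encode(lone_surrogate, c) for c in codecs_to_try]
print(surrogate_ok)  # [False, False, False]

# Noncharacters are well formed and survive conversion between forms.
nonchar_roundtrip = [
    noncharacter.encode(c).decode(c) == noncharacter for c in codecs_to_try
]
print(nonchar_roundtrip)  # [True, True, True]
```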
2.5 Encoding Forms
Computers handle numbers not simply as abstract mathematical objects, but as combinations of fixed-size units like bytes and 32-bit words. A character encoding model must take this fact into account when determining how to associate numbers with the characters. Actual implementations in computer systems represent integers in specific code units of particular size—usually 8-bit (= byte), 16-bit, or 32-bit. In the Unicode character encoding model, precisely defined encoding forms specify how each integer (code point) for a Unicode character is to be expressed as a sequence of one or more code units. The Unicode Standard provides three distinct encoding forms for Unicode characters, using 8-bit, 16-bit, and 32-bit units. These are named UTF-8, UTF-16, and UTF-32, respectively. The “UTF” is a carryover from earlier terminology meaning Unicode (or UCS) Transformation Format. Each of these three encoding forms is an equally legitimate mechanism for representing Unicode characters; each has advantages in different environments. All three encoding forms can be used to represent the full range of encoded characters in the Unicode Standard; they are thus fully interoperable for implementations that may choose different encoding forms for various reasons. Each of the three Unicode encoding forms can be efficiently transformed into either of the other two without any loss of data.
Non-overlap. Each of the Unicode encoding forms is designed with the principle of nonoverlap in mind. Figure 2-9 presents an example of an encoding where overlap is permitted. In this encoding (Windows code page 932), characters are formed from either one or two code bytes. Whether a sequence is one or two bytes in length depends on the first byte, so that the values for lead bytes (of a two-byte sequence) and single bytes are disjoint. However, single-byte values and trail-byte values can overlap. That means that when someone searches for the character “D”, for example, he or she might find it either (mistakenly) as the trail byte of a two-byte sequence or as a single, independent byte. To find out which alternative is correct, a program must look backward through text.
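Figure 2-9. Overlap in Legacy Mixed-Width Encodings
[Figure: Trail and Single: the bytes 84 44 encode U+0414, while the single byte 44 encodes “D” (U+0044). Lead and Trail: in the byte run 84 84 84 84, each pair 84 84 encodes U+0442.]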
The situation is made more complex by the fact that lead and trail bytes can also overlap, as shown in the second part of Figure 2-9. This means that the backward scan has to repeat until it hits the start of the text or hits a sequence that could not exist as a pair as shown in Figure 2-10. This is not only inefficient, but also extremely error-prone: corruption of one byte can cause entire lines of text to be corrupted.
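The overlap problem can be reproduced with a short sketch. It assumes Python's "cp932" codec as an implementation of Windows code page 932 and uses the two Cyrillic letters from Figure 2-9; a naive byte-level search stands in for the search routine the text describes.

```python
# CYRILLIC CAPITAL LETTER DE: its trail byte is 0x44, the code for "D".
de = "\u0414"
de_bytes = de.encode("cp932")
print(de_bytes.hex(" "))  # 84 44

# A byte-level search for "D" reports a match, although the text has no "D".
false_match = b"D" in de_bytes
print(false_match, "D" in de)  # True False

# CYRILLIC SMALL LETTER TE: lead and trail byte are both 0x84.
te_run = "\u0442\u0442".encode("cp932")
print(te_run.hex(" "))  # 84 84 84 84

# Searching from offset 1 finds a "match" that straddles two characters.
straddling_match = te_run.find(b"\x84\x84", 1)
print(straddling_match)  # 1

# The same search in UTF-8 cannot misfire: no byte of a multibyte
# sequence ever equals the single byte for "D".
utf8_match = b"D" in de.encode("utf-8")
print(utf8_match)  # False
```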
Figure 2-10. Boundaries and Interpretation
[Figure: the byte run ?? ... 84 84 84 84 84 84 44, whose final bytes could be read as U+0442, U+0414, or “D” (U+0044), depending on where the character boundaries fall.]
The Unicode encoding forms avoid this problem, because none of the ranges of values for the lead, trail, or single code units in any of those encoding forms overlap. Non-overlap makes all of the Unicode encoding forms well behaved for searching and comparison. When searching for a particular character, there will never be a mismatch against some code unit sequence that represents just part of another character. The fact that all Unicode encoding forms observe this principle of non-overlap distinguishes them from many legacy East Asian multibyte character encodings, for which overlap of code unit sequences may be a significant problem for implementations. Another aspect of non-overlap in the Unicode encoding forms is that all Unicode characters have determinate boundaries when expressed in any of the encoding forms. That is, the edges of code unit sequences representing a character are easily determined by local examination of code units; there is never any need to scan back indefinitely in Unicode text to correctly determine a character boundary. This property of the encoding forms has sometimes been referred to as self-synchronization. This property has another very important implication: corruption of a single code unit corrupts only a single character; none of the surrounding characters are affected. For example, when randomly accessing a string, a program can find the boundary of a character with limited backup. In UTF-16, if a pointer points to a trailing surrogate, a single backup is required. In UTF-8, if a pointer points to a byte starting with 10xxxxxx (in binary), one to three backups are required to find the beginning of the character. Conformance. The Unicode Consortium fully endorses the use of any of the three Unicode encoding forms as a conformant way of implementing the Unicode Standard. It is important not to fall into the trap of trying to distinguish “UTF-8 versus Unicode,” for example.
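The limited backup described above can be sketched as two small functions. The function names are hypothetical, the input is assumed to be well formed, and the surrogate ranges (lead U+D800..U+DBFF, trail U+DC00..U+DFFF) are an assumption not stated in this section.

```python
def utf8_char_start(data: bytes, i: int) -> int:
    """Index of the first byte of the UTF-8 character that contains byte i."""
    start = i
    # Continuation bytes look like 10xxxxxx; at most three backups are needed.
    while start > 0 and i - start < 3 and (data[start] & 0xC0) == 0x80:
        start -= 1
    return start

def utf16_char_start(units, i: int) -> int:
    """Index of the first code unit of the UTF-16 character containing unit i."""
    is_trail = 0xDC00 <= units[i] <= 0xDFFF
    if is_trail and i > 0 and 0xD800 <= units[i - 1] <= 0xDBFF:
        return i - 1  # a single backup onto the leading surrogate
    return i

# U+0041, U+03A9, U+8A9E, U+10384 in UTF-8: 1, 2, 3 and 4 bytes long.
data = "A\u03a9\u8a9e\U00010384".encode("utf-8")
starts = [utf8_char_start(data, i) for i in range(len(data))]
print(starts)  # [0, 1, 1, 3, 3, 3, 6, 6, 6, 6]

units = [0x0041, 0xD800, 0xDF84]  # "A" followed by U+10384 as a surrogate pair
print([utf16_char_start(units, i) for i in range(len(units))])  # [0, 1, 1]
```

Neither function ever looks further back than a fixed number of code units, which is the practical meaning of self-synchronization.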
UTF-8, UTF-16, and UTF-32 are all equally valid and conformant ways of implementing the encoded characters of the Unicode Standard. Examples. Figure 2-11 shows the three Unicode encoding forms, including how they are related to Unicode code points.
Figure 2-11. Unicode Encoding Forms
Code point   U+0041     U+03A9     U+8A9E     U+10384
UTF-32       00000041   000003A9   00008A9E   00010384
UTF-16       0041       03A9       8A9E       D800 DF84
UTF-8        41         CE A9      E8 AA 9E   F0 90 8E 84
In Figure 2-11, the UTF-32 line shows that each example character can be expressed with one 32-bit code unit. Those code units have the same values as the code point for the character. For UTF-16, most characters can be expressed with one 16-bit code unit, whose value is the same as the code point for the character, but characters with high code point values require a pair of 16-bit surrogate code units instead. In UTF-8, a character may be expressed with one, two, three, or four bytes, and the relationship between those byte values and the code point value is more complex. UTF-8, UTF-16, and UTF-32 are further described in the subsections that follow. See each subsection for a general overview of how each encoding form is structured and the general benefits or drawbacks of each encoding form for particular purposes. For the detailed formal definition of the encoding forms and conformance requirements, see Section 3.9, Unicode Encoding Forms.
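The values in Figure 2-11 can be reproduced with a short script. The following is a minimal sketch in Python, which is not part of the standard; the helper name `code_units` is hypothetical, and the codec names (`utf-32-be`, `utf-16-be`, `utf-8`) are those of Python's standard library. Big-endian codecs are used only so that the bytes of each code unit print in the same order as in the figure.

```python
# Encode the four example characters of Figure 2-11 in each encoding form
# and list the resulting code units in hexadecimal.

def code_units(text, codec, unit_bytes):
    """Return the code units of `text` in `codec` as uppercase hex strings.

    `unit_bytes` is the size of one code unit: 4 for UTF-32, 2 for UTF-16,
    1 for UTF-8.
    """
    data = text.encode(codec)
    return [data[i:i + unit_bytes].hex().upper()
            for i in range(0, len(data), unit_bytes)]

# Code points taken from the UTF-32 row of Figure 2-11.
EXAMPLES = [0x0041, 0x03A9, 0x8A9E, 0x10384]

for cp in EXAMPLES:
    ch = chr(cp)
    print(f"U+{cp:04X}",
          "UTF-32:", " ".join(code_units(ch, "utf-32-be", 4)),
          "| UTF-16:", " ".join(code_units(ch, "utf-16-be", 2)),
          "| UTF-8:", " ".join(code_units(ch, "utf-8", 1)))
```

For U+10384 the script lists one UTF-32 code unit, two UTF-16 code units (a surrogate pair), and four UTF-8 bytes, matching the last column of the figure.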
UTF-32
UTF-32 is the simplest Unicode encoding form. Each Unicode code point is represented directly by a single 32-bit code unit. Because of this, UTF-32 has a one-to-one relationship between encoded character and code unit; it is a fixed-width character encoding form. This makes UTF-32 an ideal form for APIs that pass single character values. As for all of the Unicode encoding forms, UTF-32 is restricted to representation of code points in the range 0..10FFFF₁₆, that is, the Unicode codespace. This guarantees interoperability with the UTF-16 and UTF-8 encoding forms.
Fixed Width. The value of each UTF-32 code unit corresponds exactly to the Unicode code point value. This situation differs significantly from that for UTF-16 and especially UTF-8, where the code unit values often change unrecognizably from the code point value. For example, U+10000 is represented as <00010000> in UTF-32 and as <F0 90 80 80> in UTF-8.
UTF-16
In the UTF-16 encoding form, code points in the range U+0000..U+FFFF are represented as a single 16-bit code unit; code points in the supplementary planes, in the range U+10000..U+10FFFF, are represented as pairs of 16-bit code units. These pairs of special code units are known as surrogate pairs. The values of the code units used for surrogate pairs are completely disjunct from the code units used for the single code unit representations, thus maintaining non-overlap for all code point representations in UTF-16. For the formal definition of surrogates, see Section 3.8, Surrogates.
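The mapping between a supplementary code point and its surrogate pair can be sketched in a few lines. This is an illustrative Python sketch, not text from the standard: the function names are hypothetical, and the surrogate ranges (D800..DBFF for the leading unit, DC00..DFFF for the trailing unit) and the offset arithmetic are taken from the formal definitions referenced above rather than from this passage.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into (leading, trailing) code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # a 20-bit value
    leading = 0xD800 + (offset >> 10)    # upper 10 bits
    trailing = 0xDC00 + (offset & 0x3FF) # lower 10 bits
    return leading, trailing


def from_surrogate_pair(leading, trailing):
    """Combine a (leading, trailing) surrogate pair into one code point."""
    if not (0xD800 <= leading <= 0xDBFF and 0xDC00 <= trailing <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((leading - 0xD800) << 10) + (trailing - 0xDC00)


leading, trailing = to_surrogate_pair(0x10384)
print(f"{leading:04X} {trailing:04X}")
```

Because the two surrogate ranges do not overlap each other or any single-unit value, a decoder can tell from one code unit alone whether it is a complete character, the first half of a pair, or the second half.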
Optimized for BMP. UTF-16 optimizes the representation of characters in the Basic Multilingual Plane (BMP), that is, the range U+0000..U+FFFF. For that range, which contains the vast majority of common-use characters for all modern scripts of the world, each character requires only one 16-bit code unit, thus requiring just half the memory or storage of the UTF-32 encoding form. For the BMP, UTF-16 can effectively be treated as if it were a fixed-width encoding form.
Supplementary Characters and Surrogates. For supplementary characters, UTF-16 requires two 16-bit code units. The distinction between characters represented with one versus two 16-bit code units means that formally UTF-16 is a variable-width encoding form. That fact can create implementation difficulties if it is not carefully taken into account; UTF-16 is somewhat more complicated to handle than UTF-32.
Preferred Usage. UTF-16 may be a preferred encoding form in many environments that need to balance efficient access to characters with economical use of storage. It is reasonably compact, and all the common, heavily used characters fit into a single 16-bit code unit.
Origin. UTF-16 is the historical descendant of the earliest form of Unicode, which was originally designed to use a fixed-width, 16-bit encoding form exclusively. The surrogates were added to provide an encoding form for the supplementary characters at code points past U+FFFF. The design of the surrogates made them a simple and efficient extension mechanism that works well with older Unicode implementations and that avoids many of the problems of other variable-width character encodings. See Section 5.4, Handling Surrogate Pairs in UTF-16, for more information about surrogates and their processing.
Collation. For the purpose of sorting text, binary order for data represented in the UTF-16 encoding form is not the same as code point order. This means that a slightly different comparison implementation is needed for code point order. For more information, see Section 5.17, Binary Order.
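The difference between UTF-16 binary order and code point order shows up whenever a supplementary character is compared with a BMP character above the surrogate range. The Python sketch below is only an illustration; the key function names are hypothetical, and the two sample characters (U+FF5E and U+10000) are chosen here for the demonstration and do not come from the text.

```python
def utf16_binary_key(text):
    """Sort key equivalent to comparing UTF-16 code units numerically.

    Big-endian serialization is used so that byte-wise comparison agrees
    with comparison of the 16-bit code unit values.
    """
    return text.encode("utf-16-be")


def code_point_key(text):
    """Sort key that compares strings by code point values."""
    return [ord(ch) for ch in text]


bmp_char = "\uFF5E"            # one code unit: FF5E
supplementary = "\U00010000"   # surrogate pair: D800 DC00

by_binary = sorted([bmp_char, supplementary], key=utf16_binary_key)
by_code_point = sorted([bmp_char, supplementary], key=code_point_key)

# In UTF-16 binary order the supplementary character sorts first, because
# its leading surrogate D800 is numerically less than FF5E.
print([f"U+{ord(s):04X}" for s in by_binary])
# In code point order the BMP character sorts first.
print([f"U+{ord(s):04X}" for s in by_code_point])
```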
UTF-8
To meet the requirements of byte-oriented, ASCII-based systems, a third encoding form is specified by the Unicode Standard: UTF-8. This variable-width encoding form preserves ASCII transparency by making use of 8-bit code units.
Byte-Oriented. Much existing software and practice in information technology have long depended on character data being represented as a sequence of bytes. Furthermore, many of the protocols not only depend on ASCII values being invariant, but must also make use of or avoid special byte values that may have associated control functions. The easiest way to adapt Unicode implementations to such a situation is to make use of an encoding form that is already defined in terms of 8-bit code units and that represents all Unicode characters while not disturbing or reusing any ASCII or C0 control code value. That is the function of UTF-8.
Variable Width. UTF-8 is a variable-width encoding form, using 8-bit code units, in which the high bits of each code unit indicate the part of the code unit sequence to which each byte belongs. A range of 8-bit code unit values is reserved for the first, or leading, element
of a UTF-8 code unit sequence, and a completely disjunct range of 8-bit code unit values is reserved for the subsequent, or trailing, elements of such sequences; this convention preserves non-overlap for UTF-8. Table 3-6 on page 103 shows how the bits in a Unicode code point are distributed among the bytes in the UTF-8 encoding form. See Section 3.9, Unicode Encoding Forms, for the full, formal definition of UTF-8.
ASCII Transparency. The UTF-8 encoding form maintains transparency for all of the ASCII code points (0x00..0x7F). That means Unicode code points U+0000..U+007F are converted to single bytes 0x00..0x7F in UTF-8 and are thus indistinguishable from ASCII itself. Furthermore, the values 0x00..0x7F do not appear in any byte for the representation of any other Unicode code point, so that there can be no ambiguity. Beyond the ASCII range of Unicode, many of the non-ideographic scripts are represented by two bytes per code point in UTF-8; all non-surrogate code points between U+0800 and U+FFFF are represented by three bytes; and supplementary code points above U+FFFF require four bytes.
Preferred Usage. UTF-8 is typically the preferred encoding form for HTML and similar protocols, particularly for the Internet. The ASCII transparency helps migration. UTF-8 also has the advantage that it is already inherently byte-serialized, as for most existing 8-bit character sets; strings of UTF-8 work easily with C or other programming languages, and many existing APIs that work for typical Asian multibyte character sets adapt to UTF-8 as well with little or no change required.
Self-synchronizing. In environments where 8-bit character processing is required for one reason or another, UTF-8 has the following attractive features as compared to other multibyte encodings:
• The first byte of a UTF-8 code unit sequence indicates the number of bytes to follow in a multibyte sequence. This allows for very efficient forward parsing.
• It is efficient to find the start of a character when beginning from an arbitrary location in a byte stream of UTF-8. Programs need to search at most four bytes backward, and usually much less. It is a simple task to recognize an initial byte, because initial bytes are constrained to a fixed range of values.
• As with the other encoding forms, there is no overlap of byte values.
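The first two features listed above can be sketched as follows. This is a minimal Python illustration that assumes well-formed UTF-8 input; the function names are hypothetical, and the lead-byte ranges (C2..DF, E0..EF, F0..F4) come from the formal definition of UTF-8 rather than from this passage.

```python
def is_trail_byte(byte):
    """True for bytes of the form 10xxxxxx, which never start a character."""
    return (byte & 0xC0) == 0x80


def sequence_length(lead_byte):
    """Number of bytes in the sequence that begins with `lead_byte`."""
    if lead_byte <= 0x7F:
        return 1                      # ASCII, single byte
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def char_start(data, index):
    """Back up from `index` to the first byte of the enclosing character.

    At most three backups are made, since a well-formed sequence has at
    most three trailing bytes.
    """
    start = index
    while start > 0 and index - start < 3 and is_trail_byte(data[start]):
        start -= 1
    return start


data = "A\u03A9\u8A9E\U00010384".encode("utf-8")  # 41 CE A9 E8 AA 9E F0 90 8E 84
start = char_start(data, 9)                       # index 9 is the final trail byte
print(start, sequence_length(data[start]))
```

Starting from the last byte of the buffer, the loop backs up three times and stops at the lead byte F0, whose value alone says that the sequence is four bytes long.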
Comparison of the Advantages of UTF-32, UTF-16, and UTF-8
On the face of it, UTF-32 would seem to be the obvious choice of Unicode encoding forms for an internal processing code because it is a fixed-width encoding form. It can be conformantly bound to the C and C++ wchar_t, which means that such programming languages may offer built-in support and ready-made string APIs that programmers can take advantage of. However, UTF-16 has many countervailing advantages that may lead implementers to choose it instead as an internal processing code. While all three encoding forms need at most 4 bytes (or 32 bits) of data for each character, in practice UTF-32 in almost all cases for real data sets occupies twice the storage that UTF-
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 3
Conformance
This chapter defines conformance to the Unicode Standard in terms of the principles and encoding architecture it embodies. The first section defines the format for referencing the Unicode Standard and Unicode properties. The second section consists of the conformance clauses, followed by sections that define more precisely the technical terms used in those clauses. The remaining sections contain the formal algorithms that are part of conformance and referenced by the conformance clause. Additional definitions and algorithms that are part of this standard can be found in the Unicode Standard Annexes listed at the end of Section 3.2, Conformance Requirements. In this chapter, conformance clauses are identified with the letter C. Definitions are identified with the letter D. Bulleted items are explanatory comments regarding definitions or subclauses. The numbering of clauses and definitions has been changed from that of prior versions of The Unicode Standard. This change was necessitated by the addition of a substantial number of new definitions that did not fit well into the prior numbering scheme. A cross-reference table enabling the matching of a clause or definition between Version 5.0 and earlier versions of the standard is available in Section D.3, Clause and Definition Numbering Changes. For information on implementing best practices, see Chapter 5, Implementation Guidelines.
3.1 Versions of the Unicode Standard
For most character encodings, the character repertoire is fixed (and often small). Once the repertoire is decided upon, it is never changed. Addition of a new abstract character to a given repertoire creates a new repertoire, which will be treated either as an update of the existing character encoding or as a completely new character encoding.
For the Unicode Standard, by contrast, the repertoire is inherently open. Because Unicode is a universal encoding, any abstract character that could ever be encoded is a potential candidate to be encoded, regardless of whether the character is currently known.
Each new version of the Unicode Standard supersedes the previous one, but implementations and, more significantly, data are not updated instantly. In general, major and minor version changes include new characters, which do not create particular problems
with old data. The Unicode Technical Committee will neither remove nor move characters. Characters may be deprecated, but this does not remove them from the standard or from existing data. The code point for a deprecated character will never be reassigned to a different character, but the use of a deprecated character is strongly discouraged. Generally these rules make the encoded characters of a new version backward-compatible with previous versions. Implementations should be prepared to be forward-compatible with respect to Unicode versions. That is, they should accept text that may be expressed in future versions of this standard, recognizing that new characters may be assigned in those versions. Thus they should handle incoming unassigned code points as they do unsupported characters. (See Section 5.3, Unknown and Missing Characters.) A version change may also involve changes to the properties of existing characters. When this situation occurs, modifications are made to the Unicode Character Database and a new update version is issued for the standard. Changes to the data files may alter program behavior that depends on them. However, such changes to properties and to data files are never made lightly. They are made only after careful deliberation by the Unicode Technical Committee has determined that there is an error, inconsistency, or other serious problem in the property assignments.
Stability
Each version of the Unicode Standard, once published, is absolutely stable and will never change. Implementations or specifications that refer to a specific version of the Unicode Standard can rely upon this stability. When implementations or specifications are upgraded to a future version of the Unicode Standard, then changes to them may be necessary. Note that even errata and corrigenda do not formally change the text of a published version; see “Errata and Corrigenda” later in this section.
Some features of the Unicode Standard are guaranteed to be stable across versions. These include the names and code positions of characters, their decompositions, and several other character properties for which stability is important to implementations. See also “Stability of Properties” in Section 3.5, Properties.
The formal statement of such stability guarantees is contained in the policies on character encoding stability found on the Unicode Web site. See the subsection “Policies” in Section B.6, Other Unicode Online Resources. Appendix F, Unicode Encoding Stability Policies, presents a copy of these policies in effect at the time of this publication. See also the discussion of backward compatibility in Unicode Standard Annex #31, “Identifier and Pattern Syntax,” and the subsection “Interacting with Downlevel Systems” in Section 5.3, Unknown and Missing Characters.
Version Numbering
Version numbers for the Unicode Standard consist of three fields, denoting the major version, the minor version, and the update version, respectively. For example, “Unicode 3.1.1” indicates major version 3 of the Unicode Standard, minor version 1 of Unicode 3, and update version 1 of minor version Unicode 3.1.
Formally, each new version of the Unicode Standard supersedes all earlier versions. However, because of the differences in the ways major, minor, and update versions are documented, minor and update versions generally do not obsolete all of the documentation of the immediately prior versions of the standard.
Additional information on the current and past versions of the Unicode Standard can be found on the Unicode Web site. See the subsection “Versions” in Section B.6, Other Unicode Online Resources. The online document contains the precise list of contributing files from the Unicode Character Database and the Unicode Standard Annexes, which are formally part of each version of the Unicode Standard.
The differences between major, minor, and update versions are as follows:
Major Version. A major version represents significant additions to the standard, including but not limited to major additions to the repertoire of encoded characters. A major version is published as a book, together with associated updates to Unicode Standard Annexes and the Unicode Character Database. A major version consolidates all errata and corrigenda to data. The publication of the book for a major version supersedes any prior documentation for major, minor, and update versions.
Minor Version. A minor version also represents significant additions to the standard. It may include small or large additions to the repertoire of encoded characters or other significant normative changes. A minor version is published only online and is not published as a book. Prior to Unicode 4.1, a minor version was published as a Unicode Standard Annex (or as a Unicode Technical Report for the very earliest minor versions). Starting with Unicode 4.1, minor versions are published as stable version pages online. A minor version is also associated with an update to the Unicode Character Database and updates to the UAXes. A minor version incorporates selected errata as appropriate. The documentation for a minor version does not stand alone, but rather amends the documentation of the prior version.
Update Version. An update version represents relatively small changes to the standard, focusing on updates to the data files of the Unicode Character Database. An update version never involves any additions to character repertoire. It is published only online. Starting with Unicode 3.0.1, update versions are published as stable version pages online. Prior to that version, update versions were simply documented with the list of relevant data file changes to the Unicode Character Database. An update version incorporates selected errata, primarily for the data files. The documentation for an update version does not stand alone, but rather amends the prior version.
Errata and Corrigenda
From time to time it may be necessary to publish errata or corrigenda to the Unicode Standard. Such errata and corrigenda will be published on the Unicode Web site. See Section B.6, Other Unicode Online Resources, for information on how to report errors in the standard.
Errata. Errata correct errors in the text or other informative material, such as the representative glyphs in the code charts. See the subsection “Updates and Errata” in Section B.6, Other Unicode Online Resources. Whenever a new major version of the standard is published, all errata up to that point are incorporated into the text. Corrigenda. Occasionally errors may be important enough that a corrigendum is issued prior to the next version of the Unicode Standard. Such a corrigendum does not change the contents of the previous version. Instead, it provides a mechanism for an implementation, protocol, or other standard to cite the previous version of the Unicode Standard with the corrigendum applied. If a citation does not specifically mention the corrigendum, the corrigendum does not apply. For more information on citing corrigenda, see “Versions” in Section B.6, Other Unicode Online Resources.
References to the Unicode Standard
The documents associated with the major, minor, and update versions are called the major reference, minor reference, and update reference, respectively. For example, consider Unicode Version 3.1.1. The major reference for that version is The Unicode Standard, Version 3.0 (ISBN 0-201-61633-5). The minor reference is Unicode Standard Annex #27, “The Unicode Standard, Version 3.1.” The update reference is Unicode Version 3.1.1. The exact list of contributory files, Unicode Standard Annexes, and Unicode Character Database files can be found at Enumerated Version 3.1.1.
The reference for this version, Version 5.0.0, of the Unicode Standard, is
The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA: Addison-Wesley, 2007. ISBN 0-321-48091-0)
References to an update or minor version include a reference to both the major version and the documents modifying it. For the standard citation format for other versions of the Unicode Standard, see “Versions” in Section B.6, Other Unicode Online Resources.
Precision in Version Citation
Because Unicode has an open repertoire with relatively frequent updates, it is important not to over-specify the version number. Wherever the precise behavior of all Unicode characters needs to be cited, the full three-field version number should be used, as in the first example below. However, trailing zeros are often omitted, as in the second example. In such a case, writing 3.1 is in all respects equivalent to writing 3.1.0.
1. The Unicode Standard, Version 3.1.1
2. The Unicode Standard, Version 3.1
3. The Unicode Standard, Version 3.0 or later
4. The Unicode Standard
Where some basic level of content is all that is important, phrasing such as in the third example can be used. Where the important information is simply the overall architecture and semantics of the Unicode Standard, the version can be omitted entirely, as in example 4.
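The three-field numbering and the rule about omitted trailing zeros can be modeled in a few lines. The following Python sketch is only an illustration of the convention described above; the function name `parse_version` is hypothetical, and the decision to accept exactly two or three fields is an assumption made here, since the text discusses only those forms.

```python
def parse_version(text):
    """Parse a Unicode version string into (major, minor, update).

    A two-field string such as "3.1" is treated as "3.1.0", following the
    rule that an omitted trailing zero does not change the meaning.
    """
    fields = [int(part) for part in text.split(".")]
    if len(fields) not in (2, 3):
        raise ValueError("expected major.minor or major.minor.update")
    fields += [0] * (3 - len(fields))   # supply the omitted update field
    return tuple(fields)


# "3.1" and "3.1.0" name the same version.
print(parse_version("3.1") == parse_version("3.1.0"))

# A requirement phrased as "Version 3.0 or later" becomes a tuple comparison.
print(parse_version("5.0") >= parse_version("3.0"))
```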
References to Unicode Character Properties
Properties and property values have defined names and abbreviations, such as
Property: General_Category (gc)
Property Value: Uppercase_Letter (Lu)
To reference a given property and property value, these aliases are used, as in this example:
The property value Uppercase_Letter from the General_Category property, as specified in Version 5.0.0 of the Unicode Standard.
Then cite that version of the standard, using the standard citation format that is provided for each version of the Unicode Standard.
When referencing multi-word properties or property values, it is permissible to omit the underscores in these aliases or to replace them by spaces.
When referencing a Unicode character property, it is customary to prepend the word “Unicode” to the name of the property, unless it is clear from context that the Unicode Standard is the source of the specification.
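As an illustration of the property reference format, the sketch below builds such a sentence from data available at run time. It uses Python's standard `unicodedata` module, which is an assumption of this example and not something the standard prescribes; the helper name and the small table of long names are hypothetical, and only two General_Category values are listed. The version reported is whatever version of the Unicode Character Database the running interpreter ships with.

```python
import unicodedata

# Long names for a few General_Category abbreviations (illustrative subset).
GC_LONG_NAMES = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
}


def describe_general_category(ch):
    """Build a property reference in the format shown above for `ch`."""
    gc = unicodedata.category(ch)          # short alias, such as "Lu"
    long_name = GC_LONG_NAMES.get(gc, gc)  # fall back to the short alias
    return (f"The property value {long_name} ({gc}) from the "
            f"General_Category (gc) property, as specified in Version "
            f"{unicodedata.unidata_version} of the Unicode Standard.")


print(describe_general_category("A"))
```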
References to Unicode Algorithms
A reference to a Unicode algorithm must specify the name of the algorithm or its abbreviation, followed by the version of the Unicode Standard, as in this example:
The Unicode Bidirectional Algorithm, as specified in Version 4.1.0 of the Unicode Standard. See Unicode Standard Annex #9, “The Bidirectional Algorithm,” (http://www.unicode.org/reports/tr9/tr9-15.html)
Where algorithms allow tailoring, the reference must state whether any such tailorings were applied or are applicable. For algorithms contained in a Unicode Standard Annex, the document itself and its location on the Unicode Web site may be cited as the location of the specification.
When referencing a Unicode algorithm it is customary to prepend the word “Unicode” to the name of the algorithm, unless it is clear from the context that the Unicode Standard is the source of the specification.
Omitting a version number when referencing a Unicode algorithm may be appropriate when such a reference is meant as a generic reference to the overall algorithm. Such a generic reference may also be employed in the sense of latest available version of the algorithm. However, for specific and detailed conformance claims for Unicode algorithms,
generic references are generally not sufficient, and a full version number must accompany the reference.
3.2 Conformance Requirements
This section presents the clauses specifying the formal conformance requirements for processes implementing Version 5.0 of the Unicode Standard. A few of these clauses have been revised from Version 4.0 of the Unicode Standard. The revisions do not change the fundamental substance of the conformance requirements previously set forth, but rather are reformulated to clarify their applicability to Unicode algorithms and tailoring. The definitions that these clauses, particularly conformance clause C4, depend on have been extended to cover additional aspects of properties and algorithms.
In addition to the specifications printed in this book, the Unicode Standard, Version 5.0, includes a number of Unicode Standard Annexes (UAXes) and the Unicode Character Database. Both are available only electronically, either on the CD-ROM or on the Unicode Web site. At the end of this section there is a list of those annexes that are considered an integral part of the Unicode Standard, Version 5.0.0, and therefore covered by these conformance requirements.
The Unicode Character Database contains an extensive specification of normative and informative character properties completing the formal definition of the Unicode Standard. See Chapter 4, Character Properties, for more information.
Not all conformance requirements are relevant to all implementations at all times because implementations may not support the particular characters or operations for which a given conformance requirement may be relevant. See Section 2.14, Conforming to the Unicode Standard, for more information.
In this section, conformance clauses are identified with the letter C.
The numbering of clauses has been changed from that of prior versions of The Unicode Standard. A cross-reference table enabling the matching of a clause between Version 5.0 and earlier versions of the standard is available in Section D.3, Clause and Definition Numbering Changes.
Code Points Unassigned to Abstract Characters
C1 A process shall not interpret a high-surrogate code point or a low-surrogate code point as an abstract character.
• The high-surrogate and low-surrogate code points are designated for surrogate code units in the UTF-16 character encoding form. They are unassigned to any abstract character.
C2 A process shall not interpret a noncharacter code point as an abstract character.
• The noncharacter code points may be used internally, such as for sentinel values or delimiters, but should not be exchanged publicly.
C3 A process shall not interpret an unassigned code point as an abstract character.
• This clause does not preclude the assignment of certain generic semantics to unassigned code points (for example, rendering with a glyph to indicate the position within a character block) that allow for graceful behavior in the presence of code points that are outside a supported subset.
• Unassigned code points may have default property values. (See D26.)
• Code points whose use has not yet been designated may be assigned to abstract characters in future versions of the standard. Because of this fact, due care in the handling of generic semantics for such code points is likely to provide better robustness for implementations that may encounter data based on future versions of the standard.
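A process that wants to honor clauses C1 through C3 first needs to recognize the three kinds of code points involved. The Python sketch below is one possible way to do that, not a required implementation: the function names are hypothetical, the surrogate range (D800..DFFF) and the noncharacter set (FDD0..FDEF plus the last two code points of every plane) are taken from definitions elsewhere in the standard, and "unassigned" is judged against the Unicode Character Database bundled with the running Python interpreter, which may be a later version than 5.0.

```python
import unicodedata


def is_surrogate(cp):
    """High-surrogate and low-surrogate code points (clause C1)."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp):
    """Noncharacter code points (clause C2)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp):
    """Return 'surrogate', 'noncharacter', 'unassigned', or 'other'."""
    if is_surrogate(cp):
        return "surrogate"
    if is_noncharacter(cp):
        return "noncharacter"
    # General_Category Cn covers unassigned code points (clause C3); the
    # noncharacters, which are also Cn, were filtered out above.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "other"


for cp in (0x0041, 0xD800, 0xFFFE, 0x50000):
    print(f"U+{cp:04X}", classify(cp))
```

A process could use such a classification to decide when to fall back to generic handling, for example displaying a placeholder glyph, instead of interpreting the code point as an abstract character.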
Interpretation
C4 A process shall interpret a coded character sequence according to the character semantics established by this standard, if that process does interpret that coded character sequence.
• This restriction does not preclude internal transformations that are never visible external to the process.
C5 A process shall not assume that it is required to interpret any particular coded character sequence.
• Processes that interpret only a subset of Unicode characters are allowed; there is no blanket requirement to interpret all Unicode characters.
• Any means for specifying a subset of characters that a process can interpret is outside the scope of this standard.
• The semantics of a private-use code point is outside the scope of this standard.
• Although these clauses are not intended to preclude enumerations or specifications of the characters that a process or system is able to interpret, they do separate supported subset enumerations from the question of conformance. In actuality, any system may occasionally receive an unfamiliar character code that it is unable to interpret.
C6 A process shall not assume that the interpretations of two canonical-equivalent character sequences are distinct.
• The implications of this conformance clause are twofold. First, a process is never required to give different interpretations to two different, but canonical-equivalent character sequences. Second, no process can assume that another
The Unicode Standard 5.0 – Electronic edition
Copyright © 1991–2007 Unicode, Inc.
72
Conformance
process will make a distinction between two different, but canonical-equivalent character sequences. • Ideally, an implementation would always interpret two canonical-equivalent character sequences identically. There are practical circumstances under which implementations may reasonably distinguish them. • Even processes that normally do not distinguish between canonical-equivalent character sequences can have reasonable exception behavior. Some examples of this behavior include graceful fallback processing by processes unable to support correct positioning of nonspacing marks; “Show Hidden Text” modes that reveal memory representation structure; and the choice of ignoring collating behavior of combining sequences that are not part of the repertoire of a specified language (see Section 5.12, Strategies for Handling Nonspacing Marks).
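
As an illustration only (not part of the standard), canonical equivalence as used in C6 can be checked by comparing full canonical decompositions. The sketch below assumes Python's standard `unicodedata` module, which follows the Unicode version shipped with the interpreter rather than Version 5.0 specifically; the helper name `canonically_equivalent` is hypothetical.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if the two sequences are canonical-equivalent."""
    # Sequences are canonical-equivalent when their full canonical
    # decompositions (Normalization Form D) are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e9"   # U+00E9 latin small letter e with acute
decomposed = "e\u0301"   # U+0065 followed by U+0301 combining acute accent

print(precomposed == decomposed)                        # False: the code points differ
print(canonically_equivalent(precomposed, decomposed))  # True
```

A process that compares strings code point by code point distinguishes these two sequences; under C6 it cannot assume that any other process does the same.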
Modification
C7 When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points.
• Replacement of a character sequence by a compatibility-equivalent sequence does modify the interpretation of the text.
• Replacement or deletion of a character sequence that the process cannot or does not interpret does modify the interpretation of the text.
• Changing the bit or byte ordering of a character sequence when transforming it between different machine architectures does not modify the interpretation of the text.
• Changing a valid coded character sequence from one Unicode character encoding form to another does not modify the interpretation of the text.
• Changing the byte serialization of a code unit sequence from one Unicode character encoding scheme to another does not modify the interpretation of the text.
• If a noncharacter that does not have a specific internal use is unexpectedly encountered in processing, an implementation may signal an error or delete or ignore the noncharacter. If these options are not taken, the noncharacter should be treated as an unassigned code point. For example, an API that returned a character property value for a noncharacter would return the same value as the default value for an unassigned code point.
• All processes and higher-level protocols are required to abide by conformance clause C7 at a minimum. However, higher-level protocols may define additional equivalences that do not constitute modifications under that protocol. For example, a higher-level protocol may allow a sequence of spaces to be replaced by a single space.
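
The changes that C7 tolerates can be approximated in code. The following is a minimal sketch, assuming Python's `unicodedata` module; the function names `is_noncharacter` and `preserves_interpretation` are hypothetical and the check is a simplification, not a conformance test.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF on every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def preserves_interpretation(original: str, modified: str) -> bool:
    """Approximate C7: allow canonical-equivalent replacement and
    deletion of noncharacters, and nothing else."""
    def canonical_form(text: str) -> str:
        # Simplification: noncharacters are ignored on BOTH sides, so an
        # inserted noncharacter is not flagged by this sketch.
        kept = "".join(ch for ch in text if not is_noncharacter(ord(ch)))
        return unicodedata.normalize("NFD", kept)

    return canonical_form(original) == canonical_form(modified)


print(preserves_interpretation("e\u0301", "\u00e9"))  # True: canonical equivalent
print(preserves_interpretation("a\uffffb", "ab"))     # True: noncharacter deleted
print(preserves_interpretation("\ufb01", "fi"))       # False: compatibility mapping
print(preserves_interpretation("a  b", "a b"))        # False without a higher-level protocol
```

The last case shows why space collapsing needs a higher-level protocol: the standard itself treats it as a modification.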
Character Encoding Forms
C8 When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall interpret that code unit sequence according to the corresponding code point sequence.
• The specification of the code unit sequences for UTF-8 is given in D92.
• The specification of the code unit sequences for UTF-16 is given in D91.
• The specification of the code unit sequences for UTF-32 is given in D90.
C9 When a process generates a code unit sequence which purports to be in a Unicode character encoding form, it shall not emit ill-formed code unit sequences.
• The definition of each Unicode character encoding form specifies the ill-formed code unit sequences in the character encoding form. For example, the definition of UTF-8 (D92) specifies that code unit sequences such as
byte sequences. However, such repair of mangled data is a special case, and it must not be used in circumstances where it would cause security problems.
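
As a hedged illustration of rejecting ill-formed UTF-8 rather than repairing it, the sketch below relies on the strict error handling of Python's built-in UTF-8 codec. The helper names are hypothetical, and the byte sequences are chosen for this example (a non-shortest form and an encoded surrogate are two well-known ill-formed cases).

```python
from typing import Optional


def decode_utf8_strict(data: bytes) -> Optional[str]:
    """Decode UTF-8, returning None for any ill-formed sequence."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


def can_encode_utf8(text: str) -> bool:
    """True if the text can be emitted as well-formed UTF-8."""
    try:
        text.encode("utf-8", errors="strict")
        return True
    except UnicodeEncodeError:
        return False


print(decode_utf8_strict(b"\xe2\x82\xac"))  # well-formed: U+20AC euro sign
print(decode_utf8_strict(b"\xc0\xaf"))      # None: non-shortest (overlong) form
print(decode_utf8_strict(b"\xed\xa0\x80"))  # None: encodes a surrogate code point
print(can_encode_utf8("\ud800"))            # False: a lone surrogate cannot be emitted
```

Returning an error instead of silently mapping the bytes to some character avoids the security problems mentioned above.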
Character Encoding Schemes
C11 When a process interprets a byte sequence which purports to be in a Unicode character encoding scheme, it shall interpret that byte sequence according to the byte order and specifications for the use of the byte order mark established by this standard for that character encoding scheme.
• Machine architectures differ in ordering in terms of whether the most significant byte or the least significant byte comes first. These sequences are known as “big-endian” and “little-endian” orders, respectively.
• For example, when using UTF-16LE, pairs of bytes are interpreted as UTF-16 code units using the little-endian byte order convention, and any initial
Bidirectional Text
C12 A process that displays text containing supported right-to-left characters or embedding codes shall display all visible representations of characters (excluding format characters) in the same order as if the Bidirectional Algorithm had been applied to the text, unless tailored by a higher-level protocol as permitted by the specification.
• The Bidirectional Algorithm is specified in Unicode Standard Annex #9, “The Bidirectional Algorithm.”
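
The sketch below does not implement the Bidirectional Algorithm. It only shows, under stated assumptions, how a display process might detect that C12 applies to a given text by inspecting Bidi_Class values through Python's `unicodedata.bidirectional`. The set of classes and the function name are choices made for this example.

```python
import unicodedata

# Bidi_Class values for right-to-left characters and the explicit
# embedding and override codes (an assumption made for this sketch).
RTL_OR_EMBEDDING = {"R", "AL", "RLE", "RLO", "LRE", "LRO", "PDF"}


def needs_bidi_algorithm(text: str) -> bool:
    """True if the text contains right-to-left characters or embedding codes."""
    return any(unicodedata.bidirectional(ch) in RTL_OR_EMBEDDING for ch in text)


print(needs_bidi_algorithm("abc"))          # False: left-to-right only
print(needs_bidi_algorithm("abc \u05d0"))   # True: U+05D0 hebrew letter alef
print(needs_bidi_algorithm("\u202eabc"))    # True: U+202E right-to-left override
```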
Normalization Forms
C13 A process that produces Unicode text that purports to be in a Normalization Form shall do so in accordance with the specifications in Unicode Standard Annex #15, “Unicode Normalization Forms.”
C14 A process that tests Unicode text to determine whether it is in a Normalization Form shall do so in accordance with the specifications in Unicode Standard Annex #15, “Unicode Normalization Forms.”
C15 A process that purports to transform text into a Normalization Form must be able to produce the results of the conformance test specified in Unicode Standard Annex #15, “Unicode Normalization Forms.”
• This means that when a process uses the input specified in the conformance test, its output must match the expected output of the test.
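
For orientation, producing and testing normalized text (C13 and C14) might look as follows. This is a sketch assuming Python 3.8 or later, where `unicodedata.is_normalized` is available; the helper names are hypothetical, and the example does not run the conformance test required by C15.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(text: str) -> dict:
    """Test the text against each Normalization Form (C14)."""
    return {form: unicodedata.is_normalized(form, text) for form in FORMS}


def to_form(form: str, text: str) -> str:
    """Produce text in the requested Normalization Form (C13)."""
    result = unicodedata.normalize(form, text)
    # Sanity check: the output of normalization must itself test as normalized.
    assert unicodedata.is_normalized(form, result)
    return result


print(normalization_report("e\u0301"))
print(to_form("NFC", "e\u0301") == "\u00e9")  # True
```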
Normative References
C16 Normative references to the Unicode Standard itself, to property aliases, to property value aliases, or to Unicode algorithms shall follow the formats specified in Section 3.1, Versions of the Unicode Standard.
C17 Higher-level protocols shall not make normative references to provisional properties.
• Higher-level protocols may make normative references to informative properties.
Unicode Algorithms
C18 If a process purports to implement a Unicode algorithm, it shall conform to the specification of that algorithm in the standard, including any tailoring by a higher-level protocol as permitted by the specification.
• The term Unicode algorithm is defined at D17.
• An implementation claiming conformance to a Unicode algorithm need only guarantee that it produces the same results as those specified in the logical description of the process; it is not required to follow the actual described procedure in detail. This allows room for alternative strategies and optimizations in implementation.
C19 The specification of an algorithm may prohibit or limit tailoring by a higher-level protocol. If a process that purports to implement a Unicode algorithm applies a tailoring, that fact must be disclosed.
• For example, the algorithms for normalization and canonical ordering are not tailorable. The Bidirectional Algorithm allows some tailoring by higher-level protocols. The Unicode Default Case algorithms may be tailored without limitation.
Default Casing Algorithms
C20 An implementation that purports to support Default Case Conversion, Default Case Detection, or Default Caseless Matching shall do so in accordance with the definitions and specifications in Section 3.13, Default Case Algorithms.
• A conformant implementation may perform casing operations that are different from the default algorithms, perhaps tailored to a particular orthography, so long as the fact that a tailoring is applied is disclosed.
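
The difference between default and tailored casing can be sketched briefly. The example assumes that Python's `str.casefold` and `str.upper` follow the Unicode default case operations; `default_caseless_match` and `turkish_upper` are hypothetical helpers, and the Turkish rule shown is a deliberately small tailoring written for this illustration.

```python
def default_caseless_match(a: str, b: str) -> bool:
    """Compare two strings after default case folding."""
    return a.casefold() == b.casefold()


def turkish_upper(text: str) -> str:
    """Tailored uppercasing (disclosed as such): dotted and dotless i stay distinct."""
    # i -> U+0130 (capital I with dot above), U+0131 (dotless i) -> I
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


print(default_caseless_match("Stra\u00dfe", "STRASSE"))  # True: sharp s folds to "ss"
print("istanbul".upper())                               # ISTANBUL (default conversion)
print(turkish_upper("istanbul"))                        # İSTANBUL (tailored conversion)
```

Keeping the tailored operation in a separately named function is one simple way to make the tailoring visible to callers.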
Unicode Standard Annexes
The following standard annexes are approved and considered part of Version 5.0 of the Unicode Standard. These annexes may contain either normative or informative material, or both. Any reference to Version 5.0 of the standard automatically includes these standard annexes.
• UAX #9: The Bidirectional Algorithm, Version 5.0.0
• UAX #11: East Asian Width, Version 5.0.0
• UAX #14: Line Breaking Properties, Version 5.0.0
• UAX #15: Unicode Normalization Forms, Version 5.0.0
• UAX #24: Script Names, Version 5.0.0
• UAX #29: Text Boundaries, Version 5.0.0
• UAX #31: Identifier and Pattern Syntax, Version 5.0.0
• UAX #34: Unicode Named Character Sequences, Version 5.0.0
Conformance to the Unicode Standard requires conformance to the specifications contained in these annexes, as detailed in the conformance clauses listed earlier in this section.
3.3 Semantics
Definitions
This and the following sections more precisely define the terms that are used in the conformance clauses. The numbering of definitions has been changed from that of prior versions of The Unicode Standard. A cross-reference table enabling the matching of a definition between Version 5.0 and earlier versions of the standard is available in Section D.3, Clause and Definition Numbering Changes.
Character Identity and Semantics
D1 Normative behavior: The normative behaviors of the Unicode Standard consist of the following list or any other behaviors specified in the conformance clauses:
• Character combination
• Canonical decomposition
• Compatibility decomposition
• Canonical ordering behavior
• Bidirectional behavior, as specified in the Unicode Bidirectional Algorithm (see Unicode Standard Annex #9, “The Bidirectional Algorithm”)
• Conjoining jamo behavior, as specified in Section 3.12, Conjoining Jamo Behavior
• Variation selection, as specified in Section 16.4, Variation Selectors
• Normalization, as specified in Unicode Standard Annex #15, “Unicode Normalization Forms”
• Default casing, as specified in Section 3.13, Default Case Algorithms
D2 Character identity: The identity of a character is established by its character name and representative glyph in Chapter 17, Code Charts.
• A character may have a broader range of use than the most literal interpretation of its name might indicate; the coded representation, name, and representative glyph need to be assessed in context when establishing the identity of a character. For example, U+002E full stop can represent a sentence period, an abbreviation period, a decimal number separator in English, a thousands number separator in German, and so on. The character name itself is unique, but may be misleading. See “Character Names” in Section 17.1, Character Names List.
• Consistency with the representative glyph does not require that the images be identical or even graphically similar; rather, it means that both images are generally recognized to be representations of the same character. Representing the character U+0061 latin small letter a by the glyph “X” would violate its character identity.
D3 Character semantics: The semantics of a character are determined by its identity, normative properties, and behavior.
• Some normative behavior is default behavior; this behavior can be overridden by higher-level protocols. However, in the absence of such protocols, the behavior must be observed so as to follow the character semantics.
• The character combination properties and the canonical ordering behavior cannot be overridden by higher-level protocols. The purpose of this constraint is to guarantee that the order of combining marks in text and the results of normalization are predictable.
D4 Character name: A unique string used to identify each abstract character encoded in the standard.
• The character names in the Unicode Standard match those of the English edition of ISO/IEC 10646.
• Character names are immutable and cannot be overridden; they are stable identifiers. For more information, see Section 4.8, Name—Normative.
• The name of a Unicode character is also formally a character property in the Unicode Character Database. Its long property alias is “Name” and its short property alias is “na”. Its value is the unique string label associated with the encoded character.
D5 Character name alias: An additional unique string identifier, other than the character name, associated with an encoded character in the standard.
• Character name aliases are assigned when there is a serious clerical defect with a character name, such that the character name itself may be misleading regarding the identity of the character. A character name alias constitutes an alternate identifier for the character.
• Character name aliases are unique within the common namespace shared by character names, character name aliases, and named character sequences.
• Character name aliases are a formal, normative part of the standard and should be distinguished from the informative, editorial aliases provided in the code charts. See Section 17.1, Character Names List, for the notational conventions used to distinguish the two.
D6 Namespace: A set of names together with name matching rules, so that all names are distinct under the matching rules.
• Within a given namespace all names must be unique, although the same name may be used with a different meaning in a different namespace.
• Character names, character name aliases, and named character sequences share a single namespace in the Unicode Standard.
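
Because character names are unique, they can serve as lookup keys in both directions. The following sketch assumes Python's `unicodedata.name` and `unicodedata.lookup`; the helper `describe` and its placeholder string are hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'; unnamed code points get a placeholder."""
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name}"


print(describe("."))  # U+002E FULL STOP
print(describe("a"))  # U+0061 LATIN SMALL LETTER A

# The name maps back to exactly one encoded character.
print(unicodedata.lookup("LATIN SMALL LETTER A") == "a")  # True
```

Note that the name FULL STOP says nothing about the other uses of U+002E listed under D2; the name identifies the character without exhausting its semantics.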
3.4 Characters and Encoding
D7 Abstract character: A unit of information used for the organization, control, or representation of textual data.
• When representing data, the nature of that data is generally symbolic as opposed to some other kind of data (for example, aural or visual). Examples of such symbolic data include letters, ideographs, digits, punctuation, technical symbols, and dingbats.
• An abstract character has no concrete form and should not be confused with a glyph.
• An abstract character does not necessarily correspond to what a user thinks of as a “character” and should not be confused with a grapheme.
• The abstract characters encoded by the Unicode Standard are known as Unicode abstract characters.
• Abstract characters not directly encoded by the Unicode Standard can often be represented by the use of combining character sequences.
D8 Abstract character sequence: An ordered sequence of one or more abstract characters.
D9 Unicode codespace: A range of integers from 0 to 10FFFF₁₆.
• This particular range is defined for the codespace in the Unicode Standard. Other character encoding standards may use other codespaces.
D10 Code point: Any value in the Unicode codespace.
• A code point is also known as a code position.
• See D77 for the definition of code unit.
D11 Encoded character: An association (or mapping) between an abstract character and a code point.
• An encoded character is also referred to as a coded character.
• While an encoded character is formally defined in terms of the mapping between an abstract character and a code point, informally it can be thought of as an abstract character taken together with its assigned code point.
• Occasionally, for compatibility with other standards, a single abstract character may correspond to more than one code point; for example, “Å” corresponds both to U+00C5 Å latin capital letter a with ring above and to U+212B Å angstrom sign.
• A single abstract character may also be represented by a sequence of code points; for example, latin capital letter g with acute may be represented by the sequence <U+0047 latin capital letter g, U+0301 combining acute accent>, rather than being mapped to a single code point.
D12 Coded character sequence: An ordered sequence of one or more code points.
• A coded character sequence is also known as a coded character representation.
• Normally a coded character sequence consists of a sequence of encoded characters, but it may also include noncharacters or reserved code points.
• Internally, a process may choose to make use of noncharacter code points in its coded character sequences. However, such noncharacter code points may not be interpreted as abstract characters (see conformance clause C2), and their removal by a conformant process does not constitute modification of interpretation of the coded character sequence (see conformance clause C7).
• Reserved code points are included in coded character sequences, so that the conformance requirements regarding interpretation and modification are properly defined when a Unicode-conformant implementation encounters coded character sequences produced under a future version of the standard.
Unless specified otherwise for clarity, in the text of the Unicode Standard the term character alone designates an encoded character. Similarly, the term character sequence alone designates a coded character sequence.
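
The relationship between the codespace, code points, and encoded characters can be made concrete with a few lines of code. This is an illustrative sketch assuming Python's `unicodedata` module; the constant and function names are hypothetical.

```python
import unicodedata

CODESPACE_MIN = 0x0
CODESPACE_MAX = 0x10FFFF  # D9: the codespace runs from 0 to 10FFFF (hexadecimal)


def is_code_point(value: int) -> bool:
    """D10: a code point is any value in the Unicode codespace."""
    return CODESPACE_MIN <= value <= CODESPACE_MAX


# One abstract character, two code points (D11).
a_ring = "\u00c5"     # U+00C5 latin capital letter a with ring above
angstrom = "\u212b"   # U+212B angstrom sign
print(a_ring == angstrom)                                # False: distinct code points
print(unicodedata.normalize("NFC", angstrom) == a_ring)  # True: canonical equivalents

# One abstract character represented by a sequence of code points (D11).
g_acute_sequence = "G\u0301"
print([f"U+{ord(c):04X}" for c in g_acute_sequence])     # ['U+0047', 'U+0301']
```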
D13 Deprecated character: A coded character whose use is strongly discouraged. Such characters are retained in the standard, but should not be used.
• Deprecated characters are retained in the standard so that previously conforming data stay conformant in future versions of the standard. Deprecated characters should not be confused with obsolete characters, which are historical. Obsolete characters do not occur in modern text, but they are not deprecated; their use is not discouraged.
D14 Noncharacter: A code point that is permanently reserved for internal use and that should never be interchanged. Noncharacters consist of the values U+nFFFE and U+nFFFF (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.
• For more information, see Section 16.7, Noncharacters.
• These code points are permanently reserved as noncharacters.
D15 Reserved code point: Any code point of the Unicode Standard that is reserved for future assignment. Also known as an unassigned code point.
• Surrogate code points and noncharacters are considered assigned code points, but not assigned characters.
• For a summary classification of reserved and other types of code points, see Table 2-3.
In general, a conforming process may indicate the presence of a code point whose use has not been designated (for example, by showing a missing glyph in rendering or by signaling an appropriate error in a streaming protocol), even though it is forbidden by the standard from interpreting that code point as an abstract character.
D16 Higher-level protocol: Any agreement on the interpretation of Unicode characters that extends beyond the scope of this standard.
• Such an agreement need not be formally announced in data; it may be implicit in the context.
• The specification of some Unicode algorithms may limit the scope of what a conformant higher-level protocol may do.
D17 Unicode algorithm: The logical description of a process used to achieve a specified result involving Unicode characters.
• This definition, as used in the Unicode Standard and other publications of the Unicode Consortium, is intentionally broad so as to allow precise logical description of required results, without constraining implementations to follow the precise steps of that logical description.
D18 Named Unicode algorithm: A Unicode algorithm that is specified in the Unicode Standard or in other standards published by the Unicode Consortium and that is given an explicit name for ease of reference.
• Named Unicode algorithms are cited in titlecase in the Unicode Standard.
• When referenced outside the context of the Unicode Standard, it is customary to prepend the word “Unicode” to the name of the algorithm.
Table 3-1 lists the named Unicode algorithms and indicates the locations of their specifications. Details regarding conformance to these algorithms and any restrictions they place on the scope of allowable tailoring by higher-level protocols can be found in the specifications. In some cases, a named Unicode algorithm is provided for information only.
Table 3-1. Named Unicode Algorithms
Name                                            Description
Canonical Ordering                              Section 3.11
Hangul Syllable Boundary Determination          Section 3.12
Hangul Syllable Composition                     Section 3.12
Hangul Syllable Decomposition                   Section 3.12
Hangul Syllable Name Generation                 Section 3.12
Default Case Conversion                         Section 3.13
Default Case Detection                          Section 3.13
Default Caseless Matching                       Section 3.13 and Section 5.18
Bidirectional Algorithm                         UAX #9
Line Breaking Algorithm                         UAX #14
Normalization Algorithm                         UAX #15
Grapheme Cluster Boundary Determination         UAX #29
Word Boundary Determination                     UAX #29
Sentence Boundary Determination                 UAX #29
Default Identifier Determination                UAX #31
Alternative Identifier Determination            UAX #31
Pattern Syntax Determination                    UAX #31
Identifier Normalization                        UAX #31
Identifier Case Folding                         UAX #31
Standard Compression Scheme for Unicode (SCSU)  UTS #6
Collation Algorithm (UCA)                       UTS #10
3.5 Properties
The Unicode Standard specifies many different types of character properties. This section provides the basic definitions related to character properties.
The actual values of Unicode character properties are specified in the Unicode Character Database. See Section 4.1, Unicode Character Database, for an overview of those data files. Chapter 4, Character Properties, contains more detailed descriptions of some particular, important character properties. Additional properties that are specific to particular characters (such as the definition and use of the right-to-left override character or zero width space) are discussed in the relevant sections of this standard.
The interpretation of some properties (such as the case of a character) is independent of context, whereas the interpretation of other properties (such as directionality) is applicable to a character sequence as a whole, rather than to the individual characters that compose the sequence.
Types of Properties
D19 Property: A named attribute of an entity in the Unicode Standard, associated with a defined set of values.
D20 Code point property: A property of code points.
• Code point properties refer to attributes of code points per se, based on architectural considerations of this standard, irrespective of any particular encoded character.
• Thus the Surrogate property and the Noncharacter property are code point properties.
D21 Abstract character property: A property of abstract characters.
• Abstract character properties refer to attributes of abstract characters per se, based on their independent existence as elements of writing systems or other notational systems, irrespective of their encoding in the Unicode Standard.
• Thus the Alphabetic property, the Punctuation property, the Hex_Digit property, the Numeric_Value property, and so on are properties of abstract characters and are associated with those characters whether encoded in the Unicode Standard or in any other character encoding, or even prior to their being encoded in any character encoding standard.
D22 Encoded character property: A property of encoded characters in the Unicode Standard.
• For each encoded character property there is a mapping from every code point to some value in the set of values associated with that property.
Encoded character properties are defined this way to facilitate the implementation of character property APIs based on the Unicode Character Database. Typically, an API will take a property and a code point as input, and will return a value for that property as output, interpreting it as the “character property” for the “character” encoded at that code point. However, to be useful, such APIs must return meaningful values for unassigned code points, as well as for encoded characters.
In some instances an encoded character property in the Unicode Standard is exactly equivalent to a code point property. For example, the Pattern_Syntax property simply defines a range of code points that are reserved for pattern syntax. (See Unicode Standard Annex #31, “Identifier and Pattern Syntax.”)
In other instances, an encoded character property directly reflects an abstract character property, but extends the domain of the property to include all code points, including unassigned code points. For Boolean properties, such as the Hex_Digit property, typically an encoded character property will be true for the encoded characters with that abstract character property and will be false for all other code points, including unassigned code points, noncharacters, private-use characters, and encoded characters for which the abstract character property is inapplicable or irrelevant.
However, in many instances, an encoded character property is semantically complex and may telescope together values associated with a number of abstract character properties and/or code point properties. The General_Category property is an example: it contains values associated with several abstract character properties (such as Letter, Punctuation, and Symbol) as well as code point properties (such as \p{gc=Cs} for the Surrogate code point property).
In the text of this standard the terms “Unicode character property,” “character property,” and “property” without qualifier generally refer to an encoded character property, unless otherwise indicated.
A list of the encoded character properties formally considered to be a part of the Unicode Standard can be found in PropertyAliases.txt in the Unicode Character Database. See also “Property Aliases” later in this section.
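
The way General_Category combines abstract character properties with code point properties can be observed through a property API. The sketch below assumes Python's `unicodedata.category`, which returns a value for every code point; the wrapper name `general_category` is hypothetical.

```python
import unicodedata


def general_category(cp: int) -> str:
    """Return the General_Category value for any code point."""
    return unicodedata.category(chr(cp))


print(general_category(ord("a")))  # Ll: a letter (abstract character property)
print(general_category(ord(".")))  # Po: punctuation
print(general_category(ord("+")))  # Sm: a symbol
print(general_category(0xD800))    # Cs: surrogate (code point property)
print(general_category(0xE000))    # Co: private use
print(general_category(0xFFFF))    # Cn: noncharacter, reported as not assigned
```

The API returns a meaningful value even where no abstract character is encoded, which is the behavior described for encoded character properties above.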
Property Values
D23 Property value: One of the set of values associated with an encoded character property.
• For example, the East_Asian_Width [EAW] property has the possible values “Narrow”, “Neutral”, “Wide”, “Ambiguous”, and “Unassigned”.
A list of the values associated with encoded character properties in the Unicode Standard can be found in PropertyValueAliases.txt in the Unicode Character Database. See also “Property Aliases” later in this section.
D24 Explicit property value: A value for an encoded character property that is explicitly associated with a code point in one of the data files of the Unicode Character Database.
D25 Implicit property value: A value for an encoded character property that is given by a generic rule or by an “otherwise” clause in one of the data files of the Unicode Character Database.
• Implicit property values are used to avoid having to explicitly list values for more than 1 million code points (most of them unassigned) for every property.
D26 Default property value: The value (or in some cases small set of values) of a property associated with unassigned code points or with encoded characters for which the property is irrelevant.
• For example, for most Boolean properties, “false” is the default property value. In such cases, the default property value used for unassigned code points may be the same value that is used for many assigned characters as well.
• Some properties, particularly enumerated properties, specify a particular, unique value as their default value. For example, “XX” is the default property value for the Line_Break property.
• A default property value is typically defined implicitly, to avoid having to repeat long lists of unassigned code points.
• In the case of some properties with arbitrary string values, the default property value is an implied null value. For example, the fact that there is no Unicode character name for unassigned code points is equivalent to saying that the default property value for the Name property for an unassigned code point is a null string.
• In some instances, an encoded character property may have multiple default values. For example, the Bidi_Class property defines a range of unassigned code points as having the “R” value, another range of unassigned code points as having the “AL” value, and the otherwise case as having the “L” value.
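
One way to picture explicit, implicit, and default property values (D24 to D26) is a range table with a fallback. The sketch below is a hypothetical data structure, not the format of the Unicode Character Database; the class `PropertyTable` and the two Line_Break entries are a toy excerpt chosen for this example.

```python
from bisect import bisect_right


class PropertyTable:
    """Maps every code point to a value: listed ranges first, then a default."""

    def __init__(self, default, ranges):
        # ranges: (first, last, value) tuples that do not overlap
        self.default = default
        self.ranges = sorted(ranges)
        self.starts = [first for first, _, _ in self.ranges]

    def value(self, cp: int):
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError("not a code point")
        i = bisect_right(self.starts, cp) - 1
        if i >= 0:
            first, last, val = self.ranges[i]
            if cp <= last:
                return val       # explicitly listed value
        return self.default      # implicit "otherwise" value


# Toy excerpt: "XX" is the default value for Line_Break.
line_break = PropertyTable("XX", [(0x0020, 0x0020, "SP"), (0x0041, 0x005A, "AL")])

print(line_break.value(0x0041))    # AL
print(line_break.value(0x0020))    # SP
print(line_break.value(0x10FFFF))  # XX: falls through to the default
```

A property with several default values, such as Bidi_Class, could be modeled in the same structure by adding the defaulted ranges as ordinary entries ahead of the final fallback.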
Classification of Properties by Their Values
D27 Enumerated property: A property with a small set of named values.
• As characters are added to the Unicode Standard, the set of values may need to be extended in the future, but enumerated properties have a relatively fixed set of possible values.
D28 Closed enumeration: An enumerated property for which the set of values is closed and will not be extended for future versions of the Unicode Standard.
• Currently, the General Category is the only closed enumeration, except for the Boolean properties.
D29 Boolean property: A closed enumerated property whose set of values is limited to “true” and “false”.
• The presence or absence of the property is the essential information.
D30 Numeric property: A numeric property is a property whose value is a number that can take on any integer or real value.
• An example is the Numeric_Value property. There is no implied limit to the number of possible distinct values for the property, except the limitations on representing integers or real numbers in computers.
D31 String-valued property: A property whose value is a string.
• The Canonical_Decomposition property is a string-valued property.
D32 Catalog property: A property that is an enumerated property, typically unrelated to an algorithm, that may be extended in each successive version of the Unicode Standard.
• Examples are the Age and Block properties. Additional values for both may be added each time a new version of the Unicode Standard adds new characters or blocks.
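
The value types described in D27 to D31 can be seen side by side in a property API. This sketch assumes Python's `unicodedata` module, which exposes only some properties (Age and Block, for instance, are not available there); the helper `property_snapshot` and its dictionary keys are hypothetical.

```python
import unicodedata


def property_snapshot(ch: str) -> dict:
    """Collect a few properties of different value types for one character."""
    return {
        "General_Category": unicodedata.category(ch),    # enumerated
        "Bidi_Mirrored": bool(unicodedata.mirrored(ch)),  # Boolean
        "Numeric_Value": unicodedata.numeric(ch, None),   # numeric, None if absent
        "Decomposition": unicodedata.decomposition(ch),   # string-valued
    }


print(property_snapshot("\u00bd"))  # vulgar fraction one half: Numeric_Value 0.5
print(property_snapshot("\u00c5"))  # decomposition string "0041 030A"
print(property_snapshot("("))       # Bidi_Mirrored is True
```

For characters without a decomposition the string-valued property comes back as an empty string, which matches the idea of an implied null default value.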
Normative and Informative Properties
Unicode character properties are divided into those that are normative and those that are informative.
D33 Normative property: A Unicode character property used in the specification of the standard.
Specification that a character property is normative means that implementations which claim conformance to a particular version of the Unicode Standard and which make use of that particular property must follow the specifications of the standard for that property for the implementation to be conformant. For example, the directionality property (bidirectional character type) is required for conformance whenever rendering text that requires bidirectional layout, such as Arabic or Hebrew. Whenever a normative process depends on a property in a specified way, that property is designated as normative.
The fact that a given Unicode character property is normative does not mean that the values of the property will never change for particular characters. Corrections and extensions to the standard in the future may require minor changes to normative values, even though the Unicode Technical Committee strives to minimize such changes. See also “Stability of Properties” later in this section.
Some of the normative Unicode algorithms depend critically on particular property values for their behavior. Normalization, for example, defines an aspect of textual interoperability that many applications rely on to be absolutely stable. As a result, some of the normative properties disallow any kind of overriding by higher-level protocols. Thus the decomposition of Unicode characters is both normative and not overridable; no higher-level protocol may override these values, because to do so would result in non-interoperable results for the normalization of Unicode text. Other normative properties, such as case mapping, are overridable by higher-level protocols, because their intent is to provide a common basis for behavior. Nevertheless, they may require tailoring for particular local cultural conventions or particular implementations.
Some important normative character properties of the Unicode Standard are listed in Table 3-2, with an indication of which sections in the standard provide a general description of the properties and their use. Other normative properties are documented in the Unicode Character Database. In all cases, the Unicode Character Database provides the definitive list of character properties and the exact list of property value assignments for each version of the standard. A list of additional special character properties can be found in Section 4.12, Characters with Unusual Properties.
Table 3-2. Normative Character Properties
Property                          Description
Bidi_Class (directionality)       UAX #9 and Section 4.4
Bidi_Mirrored                     Section 4.7 and UAX #9
Block                             Chapter 17
Canonical_Combining_Class         Section 3.11, Section 4.3, and UAX #15
Case-related properties           Section 3.13, Section 4.2, and Chapter 17
Composition_Exclusion             UAX #15
Decomposition_Mapping             Chapter 3, Chapter 17, and UAX #15
Default_Ignorable_Code_Point      Section 5.20
Deprecated                        Section 3.1
General_Category                  Section 4.5
Hangul_Syllable_Type              Section 3.12 and UAX #29
Jamo_Short_Name                   Section 3.12
Joining_Type and Joining_Group    Section 8.2
Name                              Chapter 17
Noncharacter_Code_Point           Section 16.7
Numeric_Value                     Section 4.6
White_Space                       UCD.html
D34 Overridable property: A normative property whose values may be overridden by conformant higher-level protocols.
• For example, the Canonical_Decomposition property is not overridable. The Uppercase property can be overridden.
D35 Informative property: A Unicode character property whose values are provided for information only.
A conformant implementation of the Unicode Standard is free to use or change informative property values as it may require, while remaining conformant to the standard. An implementer always has the option of establishing a protocol to convey the fact that informative properties are being used in distinct ways.
Informative properties capture expert implementation experience. When an informative property is explicitly specified in the Unicode Character Database, its use is strongly recommended for implementations to encourage comparable behavior between implementations. Note that it is possible for an informative property in one version of the Unicode Standard to become a normative property in a subsequent version of the standard if its use starts to acquire conformance implications in some part of the standard.
Table 3-3 provides a partial list of the more important informative character properties. For a complete listing, see the Unicode Character Database.
Table 3-3. Informative Character Properties
Property                    Description
Dash                        Section 6.2 and Table 6-3
East_Asian_Width            Section 12.4 and UAX #11
Letter-related properties   Section 4.10
Line_Break                  Section 16.1, Section 16.2, and UAX #14
Mathematical                Section 15.4
Script                      UAX #24
Space                       Section 6.2 and Table 6-2
Unicode_1_Name              Section 4.9
D36 Provisional property: A Unicode character property whose values are unapproved and tentative, and which may be incomplete or otherwise not in a usable state.
• Provisional properties may be removed from future versions of the standard, without prior notice.
Some of the information provided about characters in the Unicode Character Database constitutes provisional data. This data may capture partial or preliminary information. It may contain errors or omissions, or otherwise not be ready for systematic use; however, it is included in the data files for distribution partly to encourage review and improvement of the information. For example, a number of the tags in the Unihan.txt file provide provisional property values of various sorts about Han characters.
The data files of the Unicode Character Database may also contain various annotations and comments about characters, and those annotations and comments should be considered provisional. Implementations should not attempt to parse annotations and comments out of the data files and treat them as informative character properties per se.
Context Dependence
D37 Context-dependent property: A property that applies to a code point in the context of a longer code point sequence.
• For example, the lowercase mapping of a Greek sigma depends on the context of the surrounding characters.
D38 Context-independent property: A property that is not context dependent; it applies to a code point in isolation.
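The Greek sigma case can be observed directly in a runtime that implements the default case mappings. The following is a minimal sketch in Python, assuming (this is not stated in the standard) that the interpreter's built-in str.lower() applies the final-sigma context rule, as recent CPython releases are understood to do. All variable names are illustrative only.

```python
# U+03A3 GREEK CAPITAL LETTER SIGMA lowercases differently depending on
# the characters around it, so its lowercase mapping is context dependent.
CAPITAL_SIGMA = "\u03A3"
SMALL_SIGMA = "\u03C3"   # ordinary lowercase form
FINAL_SIGMA = "\u03C2"   # form used at the end of a word

# Capital omicron, delta, omicron, sigma: the sigma is word-final.
word = "\u039F\u0394\u039F\u03A3"

lowered_word = word.lower()             # sigma mapped in context
lowered_alone = CAPITAL_SIGMA.lower()   # sigma mapped in isolation

print(lowered_word.endswith(FINAL_SIGMA))
print(lowered_alone == SMALL_SIGMA)
```

The same code point yields two different lowercase results, which is what distinguishes a context-dependent property from a context-independent one.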
Stability of Properties
D39 Stable transformation: A transformation T on a property P is stable with respect to an algorithm A if the result of the algorithm on the transformed property A(T(P)) is the same as the original result A(P) for all code points.
D40 Stable property: A property is stable with respect to a particular algorithm or process as long as possible changes in the assignment of property values are restricted in such a manner that the result of the algorithm on the property continues to be the same as the original result for all previously assigned code points.
• For example, while the absolute values of the canonical combining classes are not guaranteed to be the same between versions of the Unicode Standard, their relative values will be maintained. As a result, the Canonical Combining Class, while not immutable, is a stable property with respect to the Normalization Forms as defined in Unicode Standard Annex #15, “Unicode Normalization Forms.”
• As new characters are assigned to previously unassigned code points, the replacement of any default values for these code points with actual property values must maintain stability.
D41 Fixed property: A property whose values (other than a default value), once associated with a specific code point, are fixed and will not be changed, except to correct obvious or clerical errors.
• For a fixed property, any default values can be replaced without restriction by actual property values as new characters are assigned to previously unassigned code points. Examples of fixed properties include Age and Hangul_Syllable_Type.
• Designating a property as fixed does not imply stability or immutability (see “Stability” in Section 3.1, Versions of the Unicode Standard). While the age of a character, for example, is established by the version of the Unicode Standard to which it was added, errors in the published listing of the property value could be corrected. For some other properties, explicit stability guarantees prohibit the correction even of such errors.
D42 Immutable property: A fixed property that is also subject to a stability guarantee preventing any change in the published listing of property values other than assignment of new values to formerly unassigned code points.
• An immutable property is trivially stable with respect to all algorithms.
• An example of an immutable property is the Unicode character name itself. Because character names are values of an immutable property, misspellings and incorrect names will never be corrected clerically. Any errata will be noted in a comment in the character names list and, where needed, an informative character name alias will be provided.
• When an encoded character property representing a code point property is immutable, none of its values can ever change. This follows from the fact that the code points themselves do not change, and the status of the property is unaffected by whether a particular abstract character is encoded at a code point later. An example of such a property is the Pattern_Syntax property; all values of that property are unchangeable for all code points, forever.
• In the more typical case of an immutable property, the values for existing encoded characters cannot change, but when a new character is encoded, the formerly unassigned code point changes from having a default value for the property to having one of its nondefault values. Once that nondefault value is published, it can no longer be changed.
D43 Stabilized property: A property that is neither extended to new characters nor maintained in any other manner, but that is retained in the Unicode Character Database.
• A stabilized property is also a fixed property.
D44 Deprecated property: A property whose use by implementations is discouraged.
• One of the reasons a property may be deprecated is because a different combination of properties better expresses the intended semantics.
• Where sufficiently widespread legacy support exists for the deprecated property, not all implementations may be able to discontinue the use of the deprecated property. In such a case, a deprecated property may be extended to new characters so as to maintain it in a usable and consistent state.
Informative or normative properties in the standard will not be removed even when they are supplanted by other properties or are no longer useful. However, they may be stabilized and/or deprecated.
For a list of stability policies related to character properties, see Appendix F, Unicode Encoding Stability Policies.
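The notion of a stable transformation (D39) and the remark that only the relative values of the canonical combining classes matter can be illustrated with a small experiment. The sketch below is written in Python and makes two assumptions: canonical_reorder is a simplified stand-in for the canonical ordering step of normalization (it is not the full algorithm of UAX #15), and rescaled_ccc is a hypothetical transformation T invented for this example. Python's unicodedata module supplies the combining class values.

```python
import unicodedata

def canonical_reorder(text, ccc=unicodedata.combining):
    """Simplified canonical ordering: stably sort each run of characters
    whose combining class is non-zero by that combining class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is a stable sort
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)

def rescaled_ccc(ch):
    """Hypothetical transformation T: changes the absolute values but
    keeps zero at zero and preserves the relative order of the rest."""
    value = unicodedata.combining(ch)
    return 0 if value == 0 else 1000 + 3 * value

# a, COMBINING ACUTE ACCENT (class 230), COMBINING DOT BELOW (class 220)
sample = "a\u0301\u0323"

original = canonical_reorder(sample)                       # A(P)
transformed = canonical_reorder(sample, ccc=rescaled_ccc)  # A(T(P))

print(original == transformed)
```

Because T preserves relative order, the reordering result is unchanged, so T is stable with respect to this algorithm in the sense of D39. A transformation that swapped the order of two classes would not be.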
Simple and Derived Properties
D45 Simple property: A Unicode character property whose values are specified directly in the Unicode Character Database (or elsewhere in the standard) and whose values cannot be derived from other simple properties.
D46 Derived property: A Unicode character property whose values are algorithmically derived from some combination of simple properties.
The Unicode Character Database lists a number of derived properties explicitly. Even though these values can be derived, they are provided as lists because the derivation may not be trivial and because explicit lists are easier to understand, reference, and implement. Good examples of derived properties include the ID_Start and ID_Continue properties, which can be used to specify a formal identifier syntax for Unicode characters. The details of how derived properties are computed can be found in the documentation for the Unicode Character Database.
Property Aliases
To enable normative references to Unicode character properties, formal aliases for properties and for property values are defined as part of the Unicode Character Database.
D47 Property alias: A unique identifier for a particular Unicode character property.
• The identifiers used for property aliases contain only ASCII alphanumeric characters or the underscore character.
• Short and long forms for each property alias are defined. The short forms are typically just two or three characters long to facilitate their use as attributes for tags in markup languages. For example, “General_Category” is the long form and “gc” is the short form of the property alias for the General Category property.
• Property aliases are defined in the file PropertyAliases.txt in the Unicode Character Database.
• Property aliases of normative properties are themselves normative.
D48 Property value alias: A unique identifier for a particular enumerated value for a particular Unicode character property.
• The identifiers used for property value aliases contain only ASCII alphanumeric characters or the underscore character, or have the special value “n/a”.
• Short and long forms for property value aliases are defined. For example, “Currency_Symbol” is the long form and “Sc” is the short form of the property value alias for the currency symbol value of the General Category property.
• Property value aliases are defined in the file PropertyValueAliases.txt in the Unicode Character Database.
• Property value aliases are unique identifiers only in the context of the particular property with which they are associated. The same identifier string might be associated with an entirely different value for a different property. The combination of a property alias and a property value alias is, however, guaranteed to be unique.
• Property value aliases referring to values of normative properties are themselves normative.
The property aliases and property value aliases can be used, for example, in XML formats of property data, for regular-expression property tests, and in other programmatic textual descriptions of Unicode property data. Thus “gc=Lu” is a formal way of specifying that the General Category of a character (using the property alias “gc”) has the value of being an uppercase letter (using the property value alias “Lu”).
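As an illustration of how an expression such as “gc=Lu” might be consumed programmatically, here is a minimal sketch in Python. The function has_property_value and the table GC_LONG_TO_SHORT are hypothetical names introduced for this example; only the General Category property is handled, and only two long-form value aliases are listed. The lookup itself relies on unicodedata.category from the Python standard library, which returns the short value alias.

```python
import unicodedata

# Two long-form value aliases of General_Category and their short forms.
# A real implementation would load the full set from PropertyValueAliases.txt.
GC_LONG_TO_SHORT = {"Uppercase_Letter": "Lu", "Currency_Symbol": "Sc"}

def has_property_value(expression, ch):
    """Test a character against an expression of the form alias=value.

    Only the General Category property is supported in this sketch.
    """
    prop, _, value = expression.partition("=")
    if prop not in ("gc", "General_Category"):
        raise ValueError(f"unsupported property alias: {prop!r}")
    short = GC_LONG_TO_SHORT.get(value, value)
    return unicodedata.category(ch) == short

print(has_property_value("gc=Lu", "A"))
print(has_property_value("General_Category=Currency_Symbol", "$"))
```

Accepting both the long and the short form of each alias keeps the expression syntax independent of which form a particular specification prefers.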
Private Use
D49 Private-use code point: Code points in the ranges U+E000..U+F8FF, U+F0000..U+FFFFD, and U+100000..U+10FFFD.
• Private-use code points are considered to be assigned characters, but the abstract characters associated with them have no interpretation specified by this standard. They can be given any interpretation by conformant processes.
• Private-use code points may be given default property values, but these default values are overridable by higher-level protocols that give those private-use code points a specific interpretation.
3.6 Combination
D50 Graphic character: A character with the General Category of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
• Graphic characters specifically exclude the line and paragraph separators (Zl, Zp), as well as the characters with the General Category of Other (Cn, Cs, Cc, Cf).
• The interpretation of private-use characters (Co) as graphic characters or not is determined by the implementation.
• For more information, see Chapter 2, General Structure, especially Section 2.4, Code Points and Characters, and Table 2-3.
D51 Base character: Any graphic character except for those with the General Category of Combining Mark (M).
• Most Unicode characters are base characters. In terms of General Category values, a base character is any code point that has one of the following categories: Letter (L), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
• Base characters do not include control characters or format controls.
• Base characters are independent graphic characters, but this does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.
• The interpretation of private-use characters (Co) as base characters or not is determined by the implementation. However, the default interpretation of private-use characters should be as base characters, in the absence of other information.
D52 Combining character: A character with the General Category of Combining Mark (M).
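Definitions D50 and D51 are stated purely in terms of General Category values, so they can be sketched as simple predicates. The Python example below is a minimal sketch; is_graphic, is_base, and the private_use_default parameter are hypothetical names, and the parameter exists only because the standard leaves the treatment of private-use characters (Co) to the implementation.

```python
import unicodedata

def is_graphic(ch, private_use_default=True):
    """D50: General Category L, M, N, P, S, or Zs."""
    gc = unicodedata.category(ch)
    if gc == "Co":
        # Implementation-defined; this sketch lets the caller decide.
        return private_use_default
    return gc[0] in "LMNPS" or gc == "Zs"

def is_base(ch, private_use_default=True):
    """D51: any graphic character that is not a combining mark (M)."""
    gc = unicodedata.category(ch)
    if gc == "Co":
        # The text suggests base character as the default interpretation.
        return private_use_default
    return is_graphic(ch) and not gc.startswith("M")

print(is_graphic("\u0301"), is_base("\u0301"))   # COMBINING ACUTE ACCENT
print(is_graphic("\n"), is_base("\n"))           # a control character
```

Checking only the first letter of the category covers all subcategories of L, M, N, P, and S, while Zs must be matched exactly so that Zl and Zp stay excluded.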
• Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and Enclosing Mark (Me).
• All characters with non-zero canonical combining class are combining characters, but the reverse is not the case: there are combining characters with a zero canonical combining class.
• The interpretation of private-use characters (Co) as combining characters or not is determined by the implementation.
• These characters are not normally used in isolation unless they are being described. They include such characters as accents, diacritics, Hebrew points, Arabic vowel signs, and Indic matras.
• The graphic positioning of a combining character depends on the last preceding base character, unless they are separated by a character that is neither a combining character nor either zero width joiner or zero width nonjoiner. The combining character is said to apply to that base character.
• There may be no such base character, such as when a combining character is at the start of text or follows a control or format character (for example, a carriage return, tab, or right-left mark). In such cases, the combining characters are called isolated combining characters.
• With isolated combining characters or when a process is unable to perform graphical combination, a process may present a combining character without graphical combination; that is, it may present it as if it were a base character.
• The representative images of combining characters are depicted with a dotted circle in the code charts. When presented in graphical combination with a preceding base character, that base character is intended to appear in the position occupied by the dotted circle.
D53 Nonspacing mark: A combining character with the General Category of Nonspacing Mark (Mn) or Enclosing Mark (Me).
• The position of a nonspacing mark in presentation depends on its base character. It generally does not consume space along the visual baseline in and of itself.
• Such characters may be large enough to affect the placement of their base character relative to preceding and succeeding base characters. For example, a circumflex applied to an “i” may affect spacing (“î”), as might the character U+20DD combining enclosing circle.
D54 Enclosing mark: A nonspacing mark with the General Category of Enclosing Mark (Me).
• Enclosing marks are a subclass of nonspacing marks that surround a base character, rather than merely being placed over, under, or through it.
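The relationships among D52 through D54 can be checked against character data. The Python sketch below uses hypothetical predicate names; the category and combining class values come from the interpreter's unicodedata module, whose data corresponds to a later version of the Unicode Character Database than 5.0.

```python
import unicodedata

def is_combining(ch):
    """D52: General Category of Combining Mark (Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")

def is_nonspacing(ch):
    """D53: General Category Mn or Me."""
    return unicodedata.category(ch) in ("Mn", "Me")

def is_enclosing(ch):
    """D54: General Category Me; a subclass of nonspacing marks."""
    return unicodedata.category(ch) == "Me"

ACUTE = "\u0301"             # COMBINING ACUTE ACCENT
ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE

for mark in (ACUTE, ENCLOSING_CIRCLE):
    print(unicodedata.name(mark),
          unicodedata.category(mark),
          unicodedata.combining(mark))
```

U+20DD is a combining character whose canonical combining class is zero, which illustrates why a non-zero combining class is sufficient but not necessary for a character to be a combining character.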
D55 Spacing mark: A combining character that is not a nonspacing mark.
• Examples include U+093F devanagari vowel sign i. In general, the behavior of spacing marks does not differ greatly from that of base characters.
• Spacing marks such as U+0BCA tamil vowel sign o may appear on both sides of a base character, but are not enclosing marks.
D56 Combining character sequence: A maximal character sequence consisting of either a base character followed by a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner; or a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner.
• When identifying a combining character sequence in Unicode text, the definition of the combining character sequence is applied maximally. For example, in the sequence
• A grapheme extender can be conceived of primarily as the kind of nonspacing graphical mark that is applied above or below another spacing character.
• zero width joiner and zero width non-joiner are formally defined to be grapheme extenders so that their presence does not break up a sequence of other grapheme extenders.
• The small number of spacing marks that have the property Grapheme_Extend are all the second parts of a two-part combining mark.
D60 Grapheme cluster: A maximal character sequence consisting of a grapheme base followed by zero or more grapheme extenders or, alternatively, the sequence
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 4
Character Properties
Disclaimer The content of all character property tables has been verified as far as possible by the Unicode Consortium. However, in case of conflict, the most authoritative version of the information for Version 5.0.0 is that supplied in the Unicode Character Database on the Unicode Web site. The contents of all the tables in this chapter may be superseded or augmented by information in future versions of the Unicode Standard. The Unicode Standard associates a rich set of semantics with characters and, in some instances, with code points. The support of character semantics is required for conformance; see Section 3.2, Conformance Requirements. Where character semantics can be expressed formally, they are provided as machine-readable lists of character properties in the Unicode Character Database (UCD). This chapter gives an overview of character properties, their status and attributes, followed by an overview of the UCD and more detailed notes on some important character properties. For a further discussion of character properties, see Unicode Technical Report #23, “Unicode Character Property Model.” Status and Attributes. Character properties may be normative or informative. Normative properties are those required for conformance. The following sections discuss important properties identified by their status. Many Unicode character properties can be overridden by implementations as needed. Section 3.2, Conformance Requirements, specifies when such overrides must be documented. A few properties, such as Noncharacter_Code_Point, may not be overridden. See Section 3.5, Properties, for the formal discussion of the status and attributes of properties. Consistency of Properties. The Unicode Standard is the product of many compromises. It has to strike a balance between uniformity of treatment for similar characters and compatibility with existing practice for characters inherited from legacy encodings. 
Because of this balancing act, one can expect a certain number of anomalies in character properties. For example, some pairs of characters might have been treated as canonical equivalents but are left unequivalent for compatibility with legacy differences. This situation pertains to U+00B5 µ micro sign and U+03BC μ greek small letter mu, as well as to certain Korean jamo.
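The micro sign example can be verified with the normalization functions in Python's unicodedata module. This is a minimal sketch with illustrative variable names; the interpreter's character data is from a later version of the standard than 5.0, but the decomposition of U+00B5 is covered by the normalization stability guarantees.

```python
import unicodedata

MICRO_SIGN = "\u00B5"
GREEK_MU = "\u03BC"

# Canonical normalization leaves the micro sign alone, because the two
# characters are not canonical equivalents.
nfc_form = unicodedata.normalize("NFC", MICRO_SIGN)

# Compatibility normalization maps the micro sign to Greek mu.
nfkc_form = unicodedata.normalize("NFKC", MICRO_SIGN)

# The decomposition mapping is tagged as a compatibility mapping.
mapping = unicodedata.decomposition(MICRO_SIGN)

print(nfc_form == MICRO_SIGN, nfkc_form == GREEK_MU, mapping)
```

Code that compares user-visible text therefore has to choose deliberately between canonical and compatibility normalization; the two characters only compare equal under the latter.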
In addition, some characters might have had properties differing in some ways from those assigned in this standard, but those properties are left as is for compatibility with existing practice. This situation can be seen with the halfwidth voicing marks for Japanese (U+FF9E halfwidth katakana voiced sound mark and U+FF9F halfwidth katakana semi-voiced sound mark), which might have been better analyzed as spacing combining marks, and with the conjoining Hangul jamo, which might have been better analyzed as an initial base character followed by formally combining medial and final characters. In the interest of efficiency and uniformity in algorithms, implementations may take advantage of such reanalyses of character properties, as long as this does not conflict with the conformance requirements with respect to normative properties. See Section 3.5, Properties; Section 3.2, Conformance Requirements; and Section 3.3, Semantics, for more information.
4.1 Unicode Character Database
The Unicode Character Database (UCD) consists of a set of files that define the Unicode character properties and internal mappings. For each property, the files determine the assignment of property values to each code point. The UCD also supplies recommended property aliases and property value aliases for textual parsing and display in environments such as regular expressions.
The properties include the following:
• Name
• General Category (basic partition into letters, numbers, symbols, punctuation, and so on)
• Other important general characteristics (whitespace, dash, ideographic, alphabetic, noncharacter, deprecated, and so on)
• Display-related properties (bidirectional class, shaping, mirroring, width, and so on)
• Casing (upper, lower, title, folding, both simple and full)
• Numeric values and types
• Script and Block
• Normalization properties (decompositions, decomposition type, canonical combining class, composition exclusions, and so on)
• Age (version of the standard in which the code point was first designated)
• Boundaries (grapheme cluster, word, line, and sentence)
• Standardized variants
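Many language runtimes ship a compiled subset of the UCD. As a rough illustration, the Python sketch below reads several of the properties listed above through the standard unicodedata module. The helper name describe and the choice of dictionary keys are assumptions made for this example, and the data reflects whichever UCD version the interpreter was built with (reported by unicodedata.unidata_version), not necessarily 5.0.

```python
import unicodedata

def describe(ch):
    """Collect a few UCD property values for a single character."""
    return {
        "Name": unicodedata.name(ch, ""),
        "General_Category": unicodedata.category(ch),
        "Bidi_Class": unicodedata.bidirectional(ch),
        "Bidi_Mirrored": bool(unicodedata.mirrored(ch)),
        "Canonical_Combining_Class": unicodedata.combining(ch),
        "Decomposition": unicodedata.decomposition(ch),
        "Numeric_Value": unicodedata.numeric(ch, None),
        "East_Asian_Width": unicodedata.east_asian_width(ch),
    }

print("UCD version:", unicodedata.unidata_version)
for ch in ("A", "(", "\u00BD", "\u00E9"):
    print(describe(ch))
```

Properties that this module does not expose, such as Script, Block, and the boundary properties, would have to be read from the UCD data files themselves.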
See the Unicode Character Database for more details on the character properties, their distribution across files, and the file formats. Unihan Database. In addition, a large number of properties specific to CJK ideographs are defined in the Unicode Character Database. These properties include source information, radical and stroke counts, phonetic values, meanings, and mappings to many East Asian standards. These properties are documented in the file Unihan.txt, also known as the Unihan Database. For a complete description of the properties in the Unihan Database, see the documentation file Unihan.html in the Unicode Character Database. (See also “Online Unihan Database” in Section B.6, Other Unicode Online Resources.) Many properties apply to both ideographs and other characters. These are not specified in the Unihan Database. Stability. While the Unicode Consortium strives to minimize changes to character property data, occasionally character properties must be updated. When this situation occurs, a new version of the Unicode Character Database is created, containing updated data files. Data file changes are associated with specific, numbered versions of the standard; character properties are never silently corrected between official versions. Each version of the Unicode Character Database, once published, is absolutely stable and will never change. Implementations or specifications that refer to a specific version of the UCD can rely upon this stability. Detailed policies on character encoding stability as they relate to properties are found in Appendix F, Unicode Encoding Stability Policies. See the subsection “Policies” in Section B.6, Other Unicode Online Resources. See also the discussion of versioning and stability in Section 3.1, Versions of the Unicode Standard. Aliases. 
Character properties and their values are given formal aliases to make it easier to refer to them consistently in specifications and in implementations, such as regular expressions, which may use them. These aliases are listed exhaustively in the Unicode Character Database, in the data files PropertyAliases.txt and PropertyValueAliases.txt. Many of the aliases have both a long form and a short form. For example, the General Category has a long alias “General_Category” and a short alias “gc”. The long alias is more comprehensible and is usually used in the text of the standard when referring to a particular character property. The short alias is more appropriate for use in regular expressions and other algorithmic contexts. In comparing aliases programmatically, loose matching is appropriate. That entails ignoring case differences and any whitespace, underscore, and hyphen characters. For example, “GeneralCategory”, “general_category”, and “GENERAL-CATEGORY” would all be considered equivalent property aliases. See UCD.html in the Unicode Character Database for further discussion of property and property value matching. For each character property whose values are not purely numeric, the Unicode Character Database provides a list of value aliases. For example, one of the values of the Line_Break property is given the long alias “Open_Punctuation” and the short alias “OP”.
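The loose matching rule described above (ignore case, whitespace, underscores, and hyphens) can be sketched as a key function. The Python example below is a minimal illustration; the names loose_key and same_alias are hypothetical, and any refinements specified in UCD.html are deliberately not modeled.

```python
import re

def loose_key(alias):
    """Reduce an alias to a comparison key: drop whitespace, underscores,
    and hyphens, then fold case."""
    return re.sub(r"[\s_\-]", "", alias).lower()

def same_alias(left, right):
    """Return True if two alias spellings match loosely."""
    return loose_key(left) == loose_key(right)

spellings = ["GeneralCategory", "general_category", "GENERAL-CATEGORY"]
print({loose_key(s) for s in spellings})
```

Storing aliases under their loose key in a dictionary makes lookup a single hash probe regardless of how the caller spelled the alias.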
Property aliases and property value aliases can be combined in regular expressions that pick out a particular value of a particular property. For example, “\p{lb=OP}” means the Open_Punctuation value of the Line_Break property, and “\p{gc=Lu}” means the Uppercase_Letter value of the General_Category property. Property aliases define a namespace. No two character properties have the same alias. For each property, the set of corresponding property value aliases constitutes its own namespace. No constraint prevents property value aliases for different properties from having the same property value alias. Thus “B” is the short alias for the Paragraph_Separator value of the Bidi_Class property; “B” is also the short alias for the Below value of the Canonical_Combining_Class property. However, because of the namespace restrictions, any combination of a property alias plus an appropriate property value alias is guaranteed to constitute a unique string, as in “\p{bc=B}” versus “\p{ccc=B}”. For a recommended use of property and property value aliases, see Unicode Technical Standard #18, “Unicode Regular Expressions.” Aliases are also used for normatively referencing properties, as described in Section 3.1, Versions of the Unicode Standard. CD-ROM and Online Availability. A copy of the 5.0.0 version of the UCD is provided on the CD-ROM. All versions of the UCD are available online on the Unicode Web site. See the subsections “Online Unicode Character Database” and “Online Unihan Database” in Section B.6, Other Unicode Online Resources.
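The namespace rule can be made concrete by keying lookups on the pair of property alias and value alias. The following Python sketch is an assumption-laden illustration: matches is a hypothetical helper that understands only the two properties used in the example above, and the numeric combining class 220 is used as the value that the alias “B” (Below) stands for. Python's built-in re module does not support the \p{...} syntax, so the test is written as a plain function.

```python
import unicodedata

# Numeric canonical combining class behind the value alias used here.
CCC_VALUE_ALIASES = {"B": 220}   # Below

def matches(prop, value, ch):
    """Test a character against a (property alias, value alias) pair."""
    if prop == "bc":
        return unicodedata.bidirectional(ch) == value
    if prop == "ccc":
        return unicodedata.combining(ch) == CCC_VALUE_ALIASES[value]
    raise ValueError(f"unsupported property alias: {prop!r}")

PARAGRAPH_SEPARATOR = "\u2029"
COMBINING_DOT_BELOW = "\u0323"

print(matches("bc", "B", PARAGRAPH_SEPARATOR))   # Bidi_Class = B
print(matches("ccc", "B", COMBINING_DOT_BELOW))  # ccc = Below
```

The same string “B” selects entirely different sets of characters depending on the property alias it is paired with, which is why the pair, not the value alias alone, must serve as the lookup key.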
4.2 Case—Normative

Case is a normative property of characters in certain alphabets whereby characters are considered to be variants of a single letter. These variants, which may differ markedly in shape and size, are called the uppercase letter (also known as capital or majuscule) and the lowercase letter (also known as small or minuscule). The uppercase letter is generally larger than the lowercase letter. Because of the inclusion of certain composite characters for compatibility, such as U+01F1 LATIN CAPITAL LETTER DZ, a third case, called titlecase, is used where the first character of a word must be capitalized. An example of such a character is U+01F2 LATIN CAPITAL LETTER D WITH SMALL LETTER Z. The three case forms are UPPERCASE, Titlecase, and lowercase.

For those scripts that have case (Latin, Greek, Coptic, Cyrillic, Glagolitic, Armenian, Deseret, and archaic Georgian), uppercase characters typically contain the word capital in their names. Lowercase characters typically contain the word small. However, this is not a reliable guide. The word small in the names of characters from scripts other than those just listed has nothing to do with case. There are other exceptions as well, such as small capital letters that are not formally uppercase. Some Greek characters with capital in their names are actually titlecase. (Note that while the archaic Georgian script contained upper- and lowercase pairs, they are not used in modern Georgian. See Section 7.7, Georgian.)

The authoritative source for case of Unicode characters is the specification of lowercase, uppercase, and titlecase properties in the Unicode Character Database.
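As an illustration, the three case forms of the DZ digraph can be inspected with Python's standard `unicodedata` module and the built-in string methods. This sketch assumes the character data bundled with the Python interpreter, which is a later version of the Unicode Character Database than 5.0; the helper `case_forms` is a hypothetical name.

```python
import unicodedata

# The DZ digraph exists in uppercase, titlecase, and lowercase forms.
for cp in (0x01F1, 0x01F2, 0x01F3):
    ch = chr(cp)
    print(f"U+{cp:04X}  gc={unicodedata.category(ch)}  {unicodedata.name(ch)}")

def case_forms(ch: str) -> tuple:
    """Return the (uppercase, titlecase, lowercase) forms of one character."""
    return ch.upper(), ch.title(), ch.lower()

upper, title, lower = case_forms("\u01F3")
print([f"U+{ord(c):04X}" for c in (upper, title, lower)])

# A "small capital" letter is not formally uppercase.
print(unicodedata.name("\u1D00"), unicodedata.category("\u1D00"))
```

The General_Category values Lu, Lt, and Ll distinguish the three forms, which is more reliable than looking for the words capital or small in a character name.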
Case Mapping

The default case mapping tables defined in the Unicode Standard are normative, but may be overridden to match user or implementation requirements. The Unicode Character Database contains five files with case mapping information, as shown in Table 4-1. Full case mappings for Unicode characters are obtained by using the basic mappings from UnicodeData.txt and extending or overriding them where necessary with the mappings from SpecialCasing.txt. Full case mappings may depend on the context surrounding the character in the original string. Some characters have a “best” single-character mapping in UnicodeData.txt as well as a full mapping in SpecialCasing.txt. Any character that does not have a mapping in these files is considered to map to itself. For more information on case mappings, see Section 5.18, Case Mappings.

Table 4-1. Sources for Case Mapping Information

File Name                   Description
UnicodeData.txt             Contains the case mappings that map to a single character. These do not increase the length of strings, nor do they contain context-dependent mappings.
SpecialCasing.txt           Contains additional case mappings that map to more than one character, such as “ß” to “SS”. Also contains context-dependent mappings, with flags to distinguish them from the normal mappings, as well as some locale-dependent mappings.
CaseFolding.txt             Contains data for performing locale-independent case folding, as described in “Caseless Matching,” in Section 5.18, Case Mappings.
DerivedCoreProperties.txt   Contains definitions of the properties Lowercase and Uppercase.
PropList.txt                Contains the definition of the property Soft_Dotted.
The single-character mappings in UnicodeData.txt are insufficient for languages such as German. Therefore, only legacy implementations that cannot handle case mappings that increase string lengths should use UnicodeData.txt case mappings alone. A set of charts that show the latest case mappings is also available on the Unicode Web site. See “Charts” in Section B.6, Other Unicode Online Resources.
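The effect of a full case mapping can be observed with Python's built-in string methods, which (in Python 3) apply multi-character mappings such as “ß” to “SS”. This is only an illustration of the behavior described above, not a statement about how any particular implementation reads the data files.

```python
word = "straße"
upper = word.upper()

# The full mapping of U+00DF expands to two characters, so the
# uppercased string is longer than the original.
print(word, len(word))
print(upper, len(upper))

# Locale-independent case folding, as used for caseless matching.
print("ß".casefold())

# A character without a case mapping maps to itself.
print("1".upper() == "1")
```

An implementation limited to single-character mappings cannot produce “STRASSE” here, which is why the text restricts that approach to legacy implementations.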
4.3 Combining Classes—Normative

Each combining character has a normative canonical combining class. This class is used with the Canonical Ordering Algorithm to determine which combining characters interact typographically and to determine how the canonical ordering of sequences of combining characters takes place. Class zero combining characters act like base letters for the purpose of determining canonical order. Combining characters with non-zero classes participate in reordering for the purpose of determining the canonical order of sequences of characters. (See Section 3.11, Canonical Ordering Behavior, for a description of the algorithm.) The list of combining characters and their canonical combining class appears in the Unicode Character Database. Most combining characters are nonspacing.

The canonical order of character sequences does not imply any kind of linguistic correctness or linguistic preference for ordering of combining marks in sequences. For more information on rendering combining marks, see Section 5.13, Rendering Nonspacing Marks.

Class zero combining marks are never reordered by the Canonical Ordering Algorithm. Except for class zero, the exact numerical values of the combining classes are of no importance in canonical equivalence, although the relative magnitude of the classes is significant. For example, it is crucial that the combining class of the cedilla be lower than the combining class of the dot below, although their exact values of 202 and 220 are not important for implementations. Certain classes tend to correspond with particular rendering positions relative to the base character, as shown in Figure 4-1.
Figure 4-1. Positions of Common Combining Marks (classes labeled in the figure: 230, 216, 202, 220)
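The cedilla and dot below example can be checked with Python's `unicodedata` module. The sketch below reads the canonical combining classes and shows that normalization places the lower class first; it relies on the character data shipped with the Python interpreter rather than on the 5.0 data files.

```python
import unicodedata

CEDILLA = "\u0327"     # COMBINING CEDILLA
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW
ACUTE = "\u0301"       # COMBINING ACUTE ACCENT
GRAVE = "\u0300"       # COMBINING GRAVE ACCENT

for mark in (CEDILLA, DOT_BELOW, ACUTE):
    print(unicodedata.name(mark), unicodedata.combining(mark))

# Dot below (higher class) typed before cedilla (lower class):
typed = "a" + DOT_BELOW + CEDILLA
ordered = unicodedata.normalize("NFD", typed)
print([f"U+{ord(c):04X}" for c in ordered])

# Marks of equal class keep their relative order.
stacked = "a" + ACUTE + GRAVE
print(unicodedata.normalize("NFD", stacked) == stacked)
```

Only the relative magnitude matters for the result: any pair of values with the cedilla lower than the dot below would produce the same canonical order.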
Reordrant, Split, and Subjoined Combining Marks

In some scripts, the rendering of combining marks is notably complex. This is true in particular of the Brahmi-derived scripts of South and Southeast Asia, whose vowels are often encoded as class zero combining marks in the Unicode Standard, known as matras for the Indic scripts. In the case of simple combining marks, as for the accent marks of the Latin script, the normative Unicode combining class of that combining mark typically corresponds to its positional placement with regard to a base letter, as described earlier. However, in the case of the combining marks representing vowels (and sometimes consonants) in the Brahmi-derived scripts, all of the combining marks are given the normative combining class of zero, regardless of their positional placement within an aksara. The placement and rendering of a class zero combining mark cannot be derived from its combining class alone, but rather depends on having more information about the particulars of the script involved. In some instances, the position may migrate in different historical periods for a script or may even differ depending on font style.
Such matters are not treated as normative character properties in the Unicode Standard, because they are more properly considered properties of the glyphs and fonts used for rendering. However, to assist implementers, earlier versions of the Unicode Standard did subcategorize some class zero combining marks, pointing out significant types that need to be handled consistently. That earlier subcategorization is extended and refined in this section. Reordrant Class Zero Combining Marks. In many instances in Indic scripts, a vowel is represented in logical order after the consonant of a syllable, but is displayed before (to the left of) the consonant when rendered. Such combining marks are termed reordrant to reflect their visual reordering to the left of a consonant (or, in some instances, a consonant cluster). Special handling is required for selection and editing of these marks. In particular, the possibility that the combining mark may be reordered left past a cluster, and not simply past the immediate preceding character in the backing store, requires attention to the details for each script involved. The visual reordering of these reordrant class zero combining marks has nothing to do with the reordering of combining character sequences in the Canonical Ordering Algorithm. All of these marks are class zero and thus are never reordered by the Canonical Ordering Algorithm or during normalization. The reordering is purely a presentational issue for glyphs during rendering of text. Table 4-2 lists reordrant class zero combining marks in the Unicode Standard.
Table 4-2. Class Zero Combining Marks—Reordrant

Script        Code Points
Devanagari    093F
Bengali       09BF, 09C7, 09C8
Gurmukhi      0A3F
Gujarati      0ABF
Oriya         0B47
Tamil         0BC6, 0BC7, 0BC8
Malayalam     0D46, 0D47, 0D48
Sinhala       0DD9, 0DDA, 0DDB
Myanmar       1031
Khmer         17C1, 17C2, 17C3
Balinese      1B3E, 1B3F
Buginese      1A19, 1A1B
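A short check with Python's `unicodedata` module illustrates the point that reordrant marks are unaffected by normalization. The example uses U+093F from Table 4-2 after the consonant U+0915; the variable names are chosen for illustration only.

```python
import unicodedata

KA = "\u0915"             # DEVANAGARI LETTER KA
VOWEL_SIGN_I = "\u093F"   # DEVANAGARI VOWEL SIGN I (reordrant)

# Logical (backing store) order: consonant first, then the vowel sign,
# even though the vowel sign is displayed to the left of the consonant.
syllable = KA + VOWEL_SIGN_I

print(unicodedata.category(VOWEL_SIGN_I))    # a spacing combining mark
print(unicodedata.combining(VOWEL_SIGN_I))   # canonical combining class

# Class zero marks are never moved by canonical ordering.
for form in ("NFC", "NFD"):
    print(form, unicodedata.normalize(form, syllable) == syllable)
```

Any visual movement of the vowel sign therefore has to come from the rendering system, not from the text processing layer.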
In addition, there are historically related vowel characters in the Thai and Lao scripts that, for legacy reasons, are not treated as combining marks. Instead, for Thai and Lao, these vowels are represented in the backing store in visual order and require no reordering for rendering. The trade-off is that they have to be rearranged logically for searching and sorting. Because of that processing requirement, these characters are given a formal character property assignment, the Logical_Order_Exception property, as listed in Table 4-3. See PropList.txt in the Unicode Character Database.
Table 4-3. Thai and Lao Logical Order Exceptions

Script    Code Points
Thai      0E40..0E44
Lao       0EC0..0EC4
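The logical rearrangement needed for searching and sorting can be sketched as a swap of each listed vowel with the character that follows it. This is a deliberately simplified illustration that assumes only the two ranges in Table 4-3; the function names are hypothetical, and a production collation implementation would handle more cases.

```python
def is_logical_order_exception(ch: str) -> bool:
    """True for the Thai and Lao vowels listed in Table 4-3."""
    cp = ord(ch)
    return 0x0E40 <= cp <= 0x0E44 or 0x0EC0 <= cp <= 0x0EC4

def to_logical_order(text: str) -> str:
    """Swap each visually ordered vowel with the following character."""
    chars = list(text)
    i = 0
    while i + 1 < len(chars):
        if is_logical_order_exception(chars[i]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2   # skip the pair that was just rearranged
        else:
            i += 1
    return "".join(chars)

stored = "\u0E40\u0E01\u0E21"   # visual order: SARA E, KO KAI, MO MA
logical = to_logical_order(stored)
print([f"U+{ord(c):04X}" for c in logical])
```

The stored text is left untouched; only the temporary key used for comparison is rearranged.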
Split Class Zero Combining Marks. In addition to the reordrant class zero combining marks, there are a number of class zero combining marks whose representative glyph typically consists of two parts, which are split into different positions with respect to the consonant (or consonant cluster) in an aksara. Sometimes these glyphic pieces are rendered both to the left and the right of a consonant. Sometimes one piece is rendered above or below the consonant and the other piece is rendered to the left or the right. Particularly in the instances where some piece of the glyph is rendered to the left of the consonant, these split class zero combining marks pose similar implementation problems as for the reordrant marks. Table 4-4 lists split class zero combining marks in the Unicode Standard, subgrouped by positional patterns.
Table 4-4. Class Zero Combining Marks—Split

Glyph Positions           Script       Code Points
Left and right            Bengali      09CB, 09CC
                          Oriya        0B4B
                          Tamil        0BCA, 0BCB, 0BCC
                          Malayalam    0D4A, 0D4B, 0D4C
                          Sinhala      0DDC, 0DDE
                          Khmer        17C0, 17C4, 17C5
                          Balinese     1B40, 1B41
Left and top              Oriya        0B48
                          Sinhala      0DDA
                          Khmer        17BE
Left, top, and right      Oriya        0B4C
                          Sinhala      0DDD
                          Khmer        17BF
Top and right             Oriya        0B57
                          Kannada      0CC0, 0CC7, 0CC8, 0CCA, 0CCB
                          Limbu        1925, 1926
                          Balinese     1B43
Top and bottom            Telugu       0C48
                          Tibetan      0F73, 0F76, 0F77, 0F78, 0F79, 0F81
                          Balinese     1B3C
Top, bottom, and right    Balinese     1B3D
Bottom and right          Balinese     1B3B
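Several of the split marks in Table 4-4 have canonical decompositions into separately encoded pieces, which is one reason they need care under normalization. The sketch below uses Python's `unicodedata` module to show two such cases; it depends on the character data bundled with the interpreter, and the variable names are illustrative.

```python
import unicodedata

BENGALI_VOWEL_SIGN_O = "\u09CB"
TAMIL_VOWEL_SIGN_O = "\u0BCA"

for mark in (BENGALI_VOWEL_SIGN_O, TAMIL_VOWEL_SIGN_O):
    pieces = unicodedata.normalize("NFD", mark)
    print(
        f"U+{ord(mark):04X} ->",
        " + ".join(f"U+{ord(p):04X}" for p in pieces),
    )

# The decomposed pair and the single code point are canonically
# equivalent, so a comparison should normalize both sides first.
decomposed = "\u09C7\u09BE"
print(unicodedata.normalize("NFC", decomposed) == BENGALI_VOWEL_SIGN_O)
```

In both examples the left-hand piece of the decomposition is itself one of the reordrant marks, so the rendering and editing concerns of the two groups overlap.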
One should pay very careful attention to all split class zero combining marks in implementations. Not only do they pose issues for rendering and editing, but they also often have canonical equivalences defined involving the separate pieces, when those pieces are also encoded as characters. As a consequence, the split combining marks may constitute exceptional cases under normalization. Some of the Tibetan split combining marks are discouraged from use. The split vowels also pose difficult problems for understanding the standard, as the phonological status of the vowel phonemes, the encoding status of the characters (including any canonical equivalences), and the graphical status of the glyphs are easily confused, both for native users of the script and for engineers working on implementations of the standard. Subjoined Class Zero Combining Marks. Brahmi-derived scripts that are not represented in the Unicode Standard with a virama may have class zero combining marks to represent subjoined forms of consonants. These correspond graphologically to what would be represented by a sequence of virama + consonant in other related scripts. The subjoined consonants do not pose particular rendering problems, at least not in comparison to other combining marks, but they should be noted as constituting an exception to the normal pattern in Brahmi-derived scripts of consonants being represented with base letters. This exception needs to be taken into account when doing linguistic processing or searching and sorting. Table 4-5 lists subjoined class zero combining marks in the Unicode Standard.
Table 4-5. Class Zero Combining Marks—Subjoined

Script     Code Points
Tibetan    0F90..0F97, 0F99..0FBC
Limbu      1929, 192A, 192B
These Limbu consonants, while logically considered subjoined combining marks, are rendered mostly at the lower right of a base letter, rather than directly beneath them. Strikethrough Class Zero Combining Marks. The Kharoshthi script is unique in having some class zero combining marks for vowels that are struck through a consonant, rather than being placed in a position around the consonant. These are also called out in Table 4-6 specifically as a warning that they may involve particular problems for implementations.
Table 4-6. Class Zero Combining Marks—Strikethrough

Script        Code Points
Kharoshthi    10A01, 10A06
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 5
Implementation Guidelines

It is possible to implement a substantial subset of the Unicode Standard as “wide ASCII” with little change to existing programming practice. However, the Unicode Standard also provides for languages and writing systems that have more complex behavior than English does. Whether one is implementing a new operating system from the ground up or enhancing existing programming environments or applications, it is necessary to examine many aspects of current programming practice and conventions to deal with this more complex behavior.

This chapter covers a series of short, self-contained topics that are useful for implementers. The information and examples presented here are meant to help implementers understand and apply the design and features of the Unicode Standard. That is, they are meant to promote good practice in implementations conforming to the Unicode Standard. These recommended guidelines are not normative and are not binding on the implementer, but are intended to represent best practice. When implementing the Unicode Standard, it is important to look not only at the letter of the conformance rules, but also at their spirit. Many of the following guidelines have been created specifically to assist people who run into issues with conformant implementations, while reflecting the requirements of actual usage.
5.1 Transcoding to Other Standards

The Unicode Standard exists in a world of other text and character encoding standards: some private, some national, some international. A major strength of the Unicode Standard is the number of other important standards that it incorporates. In many cases, the Unicode Standard included duplicate characters to guarantee round-trip transcoding to established and widely used standards.
Issues

Conversion of characters between standards is not always a straightforward proposition. Many characters have mixed semantics in one standard and may correspond to more than one character in another. Sometimes standards give duplicate encodings for the same character; at other times the interpretation of a whole set of characters may depend on the application. Finally, there are subtle differences in what a standard may consider a character.
For these reasons, mapping tables are usually required to map between the Unicode Standard and another standard. Mapping tables need to be used consistently for text data exchange to avoid modification and loss of text data. For details, see Unicode Technical Standard #22, “Character Mapping Markup Language (CharMapML).” By contrast, conversions between different Unicode encoding forms are fast, lossless permutations.

The Unicode Standard can be used as a pivot to transcode among n different standards. This process, which is sometimes called triangulation, reduces the number of mapping tables that an implementation needs from O(n²) to O(n).
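The pivot idea can be sketched with Python's built-in codecs, where every legacy encoding is converted to and from Unicode strings rather than directly to every other encoding. The function names `transcode` and `tables_needed` are hypothetical, and the table counts assume one table per direction.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert between two legacy encodings by pivoting through Unicode."""
    text = data.decode(source)    # source encoding -> Unicode
    return text.encode(target)    # Unicode -> target encoding

def tables_needed(n: int) -> tuple:
    """Directed mapping tables for n standards: (pairwise, via pivot)."""
    return n * (n - 1), 2 * n

# "é" is 0xE9 in Windows-1252 and 0x8E in Mac OS Roman.
print(transcode(b"\xe9", "cp1252", "mac_roman"))

for n in (3, 10, 50):
    print(n, tables_needed(n))
```

With ten standards, direct conversion would need 90 directed tables, while pivoting through Unicode needs 20.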
Multistage Tables

Tables require space. Even small character sets often map to characters from several different blocks in the Unicode Standard and thus may contain up to 64K entries (for the BMP) or 1,088K entries (for the entire codespace) in at least one direction. Several techniques exist to reduce the memory space requirements for mapping tables. These techniques apply not only to transcoding tables, but also to many other tables needed to implement the Unicode Standard, including character property data, case mapping, collation tables, and glyph selection tables.

Flat Tables. If disk space is not at issue, virtual memory architectures yield acceptable working set sizes even for flat tables because the frequency of usage among characters differs widely. Even small character sets contain many infrequently used characters. In addition, data intended to be mapped into a given character set generally does not contain characters from all blocks of the Unicode Standard (usually, only a few blocks at a time need to be transcoded to a given character set). This situation leaves certain sections of the mapping tables unused, and therefore paged to disk. The effect is most pronounced for large tables mapping from the Unicode Standard to other character sets, which have large sections simply containing mappings to the default character, or the “unmappable character” entry.

Ranges. It may be tempting to “optimize” these tables for space by providing elaborate provisions for nested ranges or similar devices. This practice leads to unnecessary performance costs on modern, highly pipelined processor architectures because of branch penalties. A faster solution is to use an optimized two-stage table, which can be coded without any test or branch instructions. Hash tables can also be used for space optimization, although they are not as fast as multistage tables.

Two-Stage Tables. Two-stage tables are a commonly employed mechanism to reduce table size (see Figure 5-1). They use an array of pointers and a default value. If a pointer is NULL, the value returned by a lookup operation in the table is the default value. Otherwise, the pointer references a block of values used for the second stage of the lookup. For BMP characters, it is quite efficient to organize such two-stage tables in terms of high byte and low byte values. The first stage is an array of 256 pointers, and each of the secondary blocks contains 256 values indexed by the low byte in the code point. For supplementary characters, it is often advisable to structure the pointers and second-stage arrays somewhat differently, so as to take best advantage of the very sparse distribution of supplementary characters in the remaining codespace.
Figure 5-1. Two-Stage Tables
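The basic two-stage layout for BMP code points can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `TwoStageTable` and the choice of 0x3F as the default value are assumptions, and Python's `None` stands in for the NULL pointer.

```python
class TwoStageTable:
    """Two-stage lookup for BMP code points, split into high and low bytes."""

    def __init__(self, default: int) -> None:
        self.default = default
        self.stage1 = [None] * 256   # 256 "pointers", all NULL at first

    def set(self, code_point: int, value: int) -> None:
        high, low = code_point >> 8, code_point & 0xFF
        if self.stage1[high] is None:
            # Allocate a second-stage block only when it is first needed.
            self.stage1[high] = [self.default] * 256
        self.stage1[high][low] = value

    def get(self, code_point: int) -> int:
        block = self.stage1[code_point >> 8]
        if block is None:            # NULL pointer: return the default
            return self.default
        return block[code_point & 0xFF]

table = TwoStageTable(default=0x3F)
table.set(0x20AC, 0x80)   # for example, the euro sign to a legacy byte value
print(hex(table.get(0x20AC)), hex(table.get(0x4E00)))
print(sum(block is not None for block in table.stage1), "block(s) allocated")
```

Only one 256-entry block is allocated for this mapping, compared with 65,536 entries for a flat BMP table, at the cost of one test on every lookup.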
Optimized Two-Stage Table. Wherever any blocks are identical, the pointers just point to the same block. For transcoding tables, this case occurs generally for a block containing only mappings to the default or “unmappable” character. Instead of using NULL pointers and a default value, one “shared” block of default entries is created. This block is pointed to by all first-stage table entries for which no character value can be mapped. By avoiding tests and branches, this strategy provides access time that approaches the simple array access, but with great savings in storage.

Multistage Table Tuning. Given a table of arbitrary size and content, it is a relatively simple matter to write a small utility that can calculate the optimal number of stages and their width for a multistage table. Tuning the number of stages and the width of their arrays of index pointers can result in various trade-offs of table size versus average access time.
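The shared-block variant and a very small tuning utility can be sketched together. In this illustration, first-stage entries are block indexes rather than pointers, identical blocks are stored once, and the tuning step simply tries several block widths for a two-stage table. The names `build_two_stage`, `lookup`, and `table_entries`, the sample mappings, and the `DEFAULT` value are all assumptions made for the example.

```python
DEFAULT = 0x1A   # hypothetical "unmappable" target value

def build_two_stage(flat, block_bits):
    """Split a flat table into blocks, storing identical blocks only once."""
    size = 1 << block_bits
    index, blocks, seen = [], [], {}
    for start in range(0, len(flat), size):
        block = tuple(flat[start:start + size])
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def lookup(index, blocks, block_bits, code_point):
    """Branch-free lookup: unmapped ranges all share one default block."""
    return blocks[index[code_point >> block_bits]][code_point & ((1 << block_bits) - 1)]

def table_entries(index, blocks):
    """Total number of stored entries in both stages."""
    return len(index) + sum(len(block) for block in blocks)

flat = [DEFAULT] * 0x10000       # flat BMP table, mostly unmappable
flat[0x0041] = 0x41
flat[0x20AC] = 0x80

index, blocks = build_two_stage(flat, 8)
print(len(blocks), "unique blocks,", table_entries(index, blocks), "entries")

# Tuning: compare total size for several second-stage widths.
sizes = {bits: table_entries(*build_two_stage(flat, bits)) for bits in range(4, 13)}
best_bits = min(sizes, key=sizes.get)
print(sizes, "best block_bits:", best_bits)
```

The lookup contains no conditional, which is the property the optimized table is designed to provide. A fuller tuning utility would also vary the number of stages and weigh size against access time.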
5.2 Programming Languages and Data Types

Programming languages provide for the representation and handling of characters and strings via data types, data constants (literals), and methods. Explicit support for Unicode helps with the development of multilingual applications. In some programming languages, strings are expressed as sequences (arrays) of primitive types, exactly corresponding to sequences of code units of one of the Unicode encoding forms. In other languages, strings are objects, but indexing into strings follows the semantics of addressing code units of a particular encoding form.

Data types for “characters” generally hold just a single Unicode code point value for low-level processing and lookup of character property values. When a primitive data type is used for single-code point values, a signed integer type can be useful; negative values can hold “sentinel” values like end-of-string or end-of-file, which can be easily distinguished from Unicode code point values. However, in most APIs, string types should be used to accommodate user-perceived characters, which may require sequences of code points.
Unicode Data Types for C

ISO/IEC Technical Report 19769, Extensions for the programming language C to support new character types, defines data types for the three Unicode encoding forms (UTF-8, UTF-16, and UTF-32), syntax for Unicode string and character literals, and methods for the conversion between the Unicode encoding forms. No other methods are specified.

Unicode strings are encoded as arrays of primitive types as usual. For UTF-8, UTF-16, and UTF-32, the basic types are char, char16_t, and char32_t, respectively. The ISO Technical Report assumes that char is at least 8 bits wide for use with UTF-8. While char and wchar_t may be signed or unsigned types, the new char16_t and char32_t types are defined to be unsigned integer types. Unlike the specification in the wchar_t programming model, the Unicode data types do not require that a single string base unit alone (especially char or char16_t) must be able to store any one character (code point).

UTF-16 string and character literals are written with a lowercase u as a prefix, similar to the L prefix for wchar_t literals. UTF-32 literals are written with an uppercase U as a prefix. Characters outside the basic character set are available for use in string literals through the \uhhhh and \Uhhhhhhhh escape sequences. These types and semantics are available in a compiler if the
The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers. However, programmers who want a UTF-16 implementation can use a macro or typedef (for example, UNICHAR) that can be compiled as unsigned short or wchar_t depending on the target compiler and platform. Other programmers who want a UTF-32 implementation can use a macro or typedef that might be compiled as unsigned int or wchar_t, depending on the target compiler and platform. This choice enables correct compilation on different platforms and compilers. Where a 16-bit implementation of wchar_t is guaranteed, such macros or typedefs may be predefined (for example, TCHAR on the Win32 API). On systems where the native character type or wchar_t is implemented as a 32-bit quantity, an implementation may use the UTF-32 form to represent Unicode characters. A limitation of the ISO/ANSI C model is its assumption that characters can always be processed in isolation. Implementations that choose to go beyond the ISO/ANSI C model may find it useful to mix widths within their APIs. For example, an implementation may have a 32-bit wchar_t and process strings in any of the UTF-8, UTF-16, or UTF-32 forms. Another implementation may have a 16-bit wchar_t and process strings as UTF-8 or UTF-16, but have additional APIs that process individual characters as UTF-32 or deal with pairs of UTF-16 code units.
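The relationship between code points and code units in the three encoding forms can be made concrete by counting code units for a few sample characters. The sketch uses Python's built-in codecs purely as a measuring tool; the helper name `code_units` is hypothetical.

```python
def code_units(text: str) -> dict:
    """Number of code units needed for text in each encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }

samples = {
    "U+0041": "\u0041",        # LATIN CAPITAL LETTER A
    "U+20AC": "\u20AC",        # EURO SIGN
    "U+1D11E": "\U0001D11E",   # MUSICAL SYMBOL G CLEF (supplementary)
}
for label, ch in samples.items():
    print(label, code_units(ch))
```

Only the 32-bit form stores every code point in a single unit, which is why an API that processes individual characters may use UTF-32 values even when strings are held as UTF-8 or UTF-16.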
5.3 Unknown and Missing Characters

This section briefly discusses how users or implementers might deal with characters that are not supported or that, although supported, are unavailable for legible rendering.
Reserved and Private-Use Character Codes

There are two classes of code points that even a “complete” implementation of the Unicode Standard cannot necessarily interpret correctly:
• Code points that are reserved
• Code points in the Private Use Area for which no private agreement exists

An implementation should not attempt to interpret such code points. However, in practice, applications must deal with unassigned code points or private-use characters. This may occur, for example, when the application is handling text that originated on a system implementing a later release of the Unicode Standard, with additional assigned characters.

Options for rendering such unknown code points include printing the code point as four to six hexadecimal digits, printing a black or white box, using distinctive glyphs for reserved and for private-use code points, or simply displaying nothing. An implementation should not blindly delete such characters, nor should it unintentionally transform them into something else.
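One of the rendering options above, printing the code point as four to six hexadecimal digits, can be sketched with Python's `unicodedata` module. The function name `fallback_label` and the bracketed label format are hypothetical, and which code points count as unassigned depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

def fallback_label(ch: str) -> str:
    """Return ch itself, or a visible label for code points that
    cannot be interpreted (unassigned or private use)."""
    category = unicodedata.category(ch)
    if category == "Cn":      # unassigned in this version of the data
        kind = "reserved"
    elif category == "Co":    # private use, no agreement assumed
        kind = "private use"
    else:
        return ch
    return f"[{kind} U+{ord(ch):04X}]"   # four to six hex digits

text = "A\u0378\uE000\U00100000"
print("".join(fallback_label(ch) for ch in text))
```

The original string is not modified; the label is produced only for display, so the unknown code points survive unchanged in the backing store.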
Interpretable but Unrenderable Characters

An implementation may receive a code point that is assigned to a character in the Unicode character encoding, but be unable to render it because it lacks a font for the code point or is otherwise incapable of rendering it appropriately. In this case, an implementation might be able to provide limited feedback to the user’s queries, such as being able to sort the data properly, show its script, or otherwise display the code point in a default manner. An implementation can distinguish between unrenderable (but assigned) code points and unassigned code points by printing the former with distinctive glyphs that give some general indication of their type.
Default Property Values

To work properly in implementations, unassigned code points must be given default property values as if they were characters, because various algorithms require property values to be assigned to every code point before they can function at all. These default values are not uniform across all unassigned code points, because certain ranges of code points need different values to maximize compatibility with expected future assignments. For information on the default values for each property, see its description in the Unicode Character Database. Except where indicated, the default values are not normative; conformant implementations can use other values.
Default Ignorable Code Points

Normally, code points outside the repertoire of supported characters would be displayed with a fallback glyph, such as a black box. However, format and control characters must not have visible glyphs (although they may have an effect on other characters in display). These characters are also ignored except with respect to specific, defined processes; for example, ZERO WIDTH NON-JOINER is ignored by default in collation. To allow a greater degree of compatibility across versions of the standard, the ranges U+2060..U+206F, U+FFF0..U+FFFB, and U+E0000..U+E0FFF are reserved for format and control characters (General Category = Cf). Unassigned code points in these ranges should be ignored in processing and display. For more information, see Section 5.20, Default Ignorable Code Points.
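A rough approximation of this rule is sketched below: a code point is ignored for display if it is a format character (General Category Cf) or an unassigned code point in one of the three reserved ranges. This follows only the description in this section; the authoritative definition is the Default_Ignorable_Code_Point property in the Unicode Character Database, which differs in detail. The function name is hypothetical.

```python
import unicodedata

# Ranges reserved for format and control characters, as listed above.
RESERVED_FORMAT_RANGES = (
    (0x2060, 0x206F),
    (0xFFF0, 0xFFFB),
    (0xE0000, 0xE0FFF),
)

def ignore_for_display(ch: str) -> bool:
    """Approximate test: show no fallback glyph for this code point."""
    category = unicodedata.category(ch)
    if category == "Cf":
        return True
    if category == "Cn":   # unassigned: ignore only inside reserved ranges
        cp = ord(ch)
        return any(low <= cp <= high for low, high in RESERVED_FORMAT_RANGES)
    return False

for ch in ("\u200C", "\u2065", "\u0378", "A"):
    print(f"U+{ord(ch):04X}", ignore_for_display(ch))
```

An unassigned code point outside the reserved ranges, such as U+0378, still receives a fallback glyph, while an unassigned code point inside them, such as U+2065, is silently skipped.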
Interacting with Downlevel Systems Versions of the Unicode Standard after Unicode 2.0 are strict supersets of Unicode 2.0 and all intervening versions. The Derived Age property tracks the version of the standard at which a particular character was added to the standard. This information can be particularly helpful in some interactions with downlevel systems. If the protocol used for communication between the systems provides for an announcement of the Unicode version on each one, an uplevel system can predict which recently added characters will appear as unassigned characters to the downlevel system.
5.4 Handling Surrogate Pairs in UTF-16
The method used by UTF-16 to address the 1,048,576 supplementary code points that cannot be represented by a single 16-bit value is called surrogate pairs. A surrogate pair consists of a high-surrogate code unit (leading surrogate) followed by a low-surrogate code unit (trailing surrogate), as described in the specifications in Section 3.8, Surrogates, and the UTF-16 portion of Section 3.9, Unicode Encoding Forms.
In well-formed UTF-16, a trailing surrogate can be preceded only by a leading surrogate and not by another trailing surrogate, a non-surrogate, or the start of text. A leading surrogate can be followed only by a trailing surrogate and not by another leading surrogate, a non-surrogate, or the end of text. Maintaining the well-formedness of a UTF-16 code sequence or accessing characters within a UTF-16 code sequence therefore puts additional requirements on some text processes. Surrogate pairs are designed to minimize this impact.
Leading surrogates and trailing surrogates are assigned to disjoint ranges of code units. In UTF-16, non-surrogate code points can never be represented with code unit values in those ranges. Because the ranges are disjoint, each code unit in well-formed UTF-16 must meet one of only three possible conditions:
• A single non-surrogate code unit, representing a code point between 0 and D7FF₁₆ or between E000₁₆ and FFFF₁₆
• A leading surrogate, representing the first part of a surrogate pair
• A trailing surrogate, representing the second part of a surrogate pair
By accessing at most two code units, a process using the UTF-16 encoding form can therefore interpret any Unicode character. Determining character boundaries requires at most scanning one preceding or one following code unit without regard to any other context.
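The three conditions above can be sketched in Python, with UTF-16 text modeled as a list of integer code units. The function names are hypothetical, and the range split at DC00 is taken from the surrogate definitions in Section 3.8 rather than from this section.

```python
def is_leading(unit):
    """Leading (high) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF

def is_trailing(unit):
    """Trailing (low) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF

def encode_utf16(cp):
    """Return the one or two UTF-16 code units for a code point."""
    if cp < 0x10000:
        if is_leading(cp) or is_trailing(cp):
            raise ValueError("surrogate code points are not encodable")
        return [cp]
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]

def code_point_at(units, i):
    """Return (code point, start index, length) for the character containing units[i].

    Looks at no more than one preceding and one following code unit.
    An unpaired surrogate is returned as-is with length 1.
    """
    if is_trailing(units[i]) and i > 0 and is_leading(units[i - 1]):
        i -= 1  # step back to the start of the pair
    unit = units[i]
    if is_leading(unit) and i + 1 < len(units) and is_trailing(units[i + 1]):
        cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
        return cp, i, 2
    return unit, i, 1
```

Because the two surrogate ranges are disjoint, code_point_at never needs to scan further than one unit in either direction, whichever index it is given.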
As long as an implementation does not remove either of a pair of surrogate code units or incorrectly insert another character between them, the integrity of the data is maintained. Moreover, even if the data becomes corrupted, the corruption remains localized, unlike with some other multibyte encodings such as Shift-JIS or EUC. Corrupting a single UTF-16 code unit affects only a single character. Because of non-overlap (see Section 2.5, Encoding Forms), this kind of error does not propagate throughout the rest of the text.
UTF-16 enjoys a beneficial frequency distribution in that, for the majority of all text data, surrogate pairs will be very rare; non-surrogate code points, by contrast, will be very common. Not only does this help to limit the performance penalty incurred when handling a variable-width encoding, but it also allows many processes either to take no specific action for surrogates or to handle surrogate pairs with existing mechanisms that are already needed to handle character sequences.
Implementations should fully support surrogate pairs in processing UTF-16 text. Without surrogate support, an implementation would not interpret any supplementary characters or guarantee the integrity of surrogate pairs. This might apply, for example, to an older
implementation, conformant to Unicode Version 1.1 or earlier, before UTF-16 was defined. Support for supplementary characters is important because a significant number of them are relevant for modern use, despite their low frequency.
The individual components of implementations may have different levels of support for surrogates, as long as those components are assembled and communicate correctly. Low-level string processing, where a Unicode string is not interpreted but is handled simply as an array of code units, may ignore surrogate pairs. With such strings, for example, a truncation operation with an arbitrary offset might break a surrogate pair. (For further discussion, see Section 2.7, Unicode Strings.) For performance in string operations, such behavior is reasonable at a low level, but it requires higher-level processes to ensure that offsets are on character boundaries so as to guarantee the integrity of surrogate pairs.
Strategies for Surrogate Pair Support. Many implementations that handle advanced features of the Unicode Standard can easily be modified to support surrogate pairs in UTF-16. For example:
• Text collation can be handled by treating those surrogate pairs as “grouped characters,” such as is done for “ij” in Dutch or “ch” in Slovak.
• Text entry can be handled by having a keyboard generate two Unicode code points with a single keypress, much as an ENTER key can generate CRLF or an Arabic keyboard can have a “lam-alef” key that generates a sequence of two characters, lam and alef.
• Truncation can be handled with the same mechanism as used to keep combining marks with base characters. For more information, see Unicode Standard Annex #29, “Text Boundaries.”
Users are prevented from damaging the text if a text editor keeps insertion points (also known as carets) on character boundaries. Implementations using UTF-8 and Unicode 8-bit strings necessitate similar considerations.
The main difference from handling UTF-16 is that in the UTF-8 case the only characters that are represented with single code units (single bytes) in UTF-8 are the ASCII characters, U+0000..U+007F. Characters represented with multibyte sequences are very common in UTF-8, unlike surrogate pairs in UTF-16, which are rather uncommon. This difference in frequency may result in different strategies for handling the multibyte sequences in UTF-8.
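The truncation concern raised in this section can be sketched for both encoding forms. This is an illustrative Python sketch with hypothetical function names; it assumes well-formed input and models UTF-16 as a list of code units and UTF-8 as a bytes object.

```python
def truncate_utf16(units, limit):
    """Keep at most `limit` code units without splitting a surrogate pair."""
    if limit >= len(units):
        return list(units)
    # If the cut would fall between a leading and a trailing surrogate,
    # move it back by one code unit so the pair is dropped as a whole.
    if (limit > 0
            and 0xD800 <= units[limit - 1] <= 0xDBFF
            and 0xDC00 <= units[limit] <= 0xDFFF):
        limit -= 1
    return list(units[:limit])

def truncate_utf8(data, limit):
    """Keep at most `limit` bytes without splitting a multibyte sequence."""
    if limit >= len(data):
        return data
    # Continuation bytes have the bit pattern 10xxxxxx. Back up until the
    # first excluded byte is the start of a sequence.
    while limit > 0 and (data[limit] & 0xC0) == 0x80:
        limit -= 1
    return data[:limit]
```

The UTF-16 version adjusts the cut by at most one unit, while the UTF-8 version may need to back up over as many as three continuation bytes, which reflects the difference in strategy mentioned above.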
5.5 Handling Numbers
There are many sets of characters that represent decimal digits in different scripts. Systems that interpret those characters numerically should provide the correct numerical values. For example, the sequence २० (U+0968 devanagari digit two, U+0966 devanagari digit zero) when numerically interpreted has the value twenty.
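As a small illustration, the numeric interpretation of decimal digits from any script can be sketched in Python using the decimal digit values exposed by the standard unicodedata module. The function name decimal_value is hypothetical, and the sketch handles unsigned digit strings only.

```python
import unicodedata

def decimal_value(digits):
    """Interpret a string of decimal digits from any script as an integer."""
    value = 0
    for ch in digits:
        # unicodedata.decimal raises ValueError if ch is not a decimal digit.
        value = value * 10 + unicodedata.decimal(ch)
    return value
```

The same function accepts ASCII, Arabic-Indic, Devanagari, and fullwidth digits, because each of them carries a decimal digit value in the character database.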
When converting binary numerical values to a visual form, digits can be chosen from different scripts. For example, the value twenty can be represented either by 20 or by ٢٠ or by २०. It is recommended that systems allow users to choose the format of the resulting digits by replacing the appropriate occurrence of U+0030 digit zero with U+0660 arabic-indic digit zero, and so on. (See Chapter 4, Character Properties, for the information needed to implement formatting and scanning numerical values.)
Fullwidth variants of the ASCII digits are simply compatibility variants of regular digits and should be treated as regular Western digits.
The Roman numerals, Greek acrophonic numerals, and East Asian ideographic numerals are decimal numeral writing systems, but they are not formally decimal radix digit systems. That is, it is not possible to do a one-to-one transcoding to forms such as 123456.789. Such systems are appropriate only for positive integer writing.
Sumero-Akkadian numerals were used for sexagesimal systems. There was no symbol for zero, but by Babylonian times, a place value system was in use. Thus the exact value of a digit depended on its position in a number. There was also ambiguity in numerical representation, because a symbol such as U+12079 cuneiform sign dish could represent either 1 or 1 × 60 or 1 × (60 × 60), depending on the context. A numerical expression might also be interpreted as a sexagesimal fraction. So the sequence <1, 10, 5> might be evaluated as 1 × 60 + 10 + 5 = 75 or 1 × 60 × 60 + 10 + 5 = 3615 or 1 + (10 + 5)/60 = 1.25. Many other complications arise in Cuneiform numeral systems, and they clearly require special processing distinct from that used for modern decimal radix systems.
It is also possible to write numbers in two ways with CJK ideographic digits. For example, Figure 5-2 shows how the number 1,234 can be written.
Figure 5-2. CJK Ideographic Numbers
一千二百三十四 or 一二三四
Supporting these ideographic digits for numerical parsing means that implementations must be smart about distinguishing between these two cases. Digits often occur in situations where they need to be parsed, but are not part of numbers. One such example is alphanumeric identifiers (see Unicode Standard Annex #31, “Identifier and Pattern Syntax”). Only in higher-level protocols, such as when implementing a full mathematical formula parser, do considerations such as superscripting and subscripting of digits become crucial for numerical interpretation.
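The digit formatting recommendation and the two ideographic forms discussed in this section can be sketched as follows. This Python sketch rests on several assumptions: the function names and the small digit tables are hypothetical, the ideographic parser covers only values below ten thousand, and the sample strings are the positional and multiplier spellings of 1,234.

```python
import unicodedata

def format_digits(value, zero="0"):
    """Render a non-negative integer with the digit set that starts at `zero`."""
    if value < 0 or unicodedata.decimal(zero, -1) != 0:
        raise ValueError("need a non-negative value and a digit zero")
    base = ord(zero)
    # Decimal digits of a script are encoded contiguously from its zero.
    return "".join(chr(base + int(d)) for d in str(value))

CJK_DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
              "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
CJK_MULTIPLIERS = {"十": 10, "百": 100, "千": 1000}

def parse_cjk_number(text):
    """Parse either the positional or the multiplier form (below 10,000)."""
    if not any(ch in CJK_MULTIPLIERS for ch in text):
        # Positional form: each ideograph is one decimal digit.
        value = 0
        for ch in text:
            value = value * 10 + CJK_DIGITS[ch]
        return value
    # Multiplier form: a digit scales the multiplier that follows it.
    total, pending = 0, None
    for ch in text:
        if ch in CJK_DIGITS:
            pending = CJK_DIGITS[ch]
        else:
            total += (1 if pending is None else pending) * CJK_MULTIPLIERS[ch]
            pending = None
    return total + (pending or 0)
```

The parser decides between the two cases by looking for a multiplier character, which is one simple way of distinguishing them; a production parser would also need to reject malformed mixtures.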
5.6 Normalization Alternative Spellings. The Unicode Standard contains explicit codes for the most frequently used accented characters. These characters can also be composed; in the case of accented letters, characters can be composed from a base character and nonspacing mark(s). The Unicode Standard provides decompositions for characters that can be composed using a base character plus one or more nonspacing marks. Implementations that are “liberal” in what they accept but “conservative” in what they issue will have the fewest compatibility problems. The decomposition mappings are specific to a particular version of the Unicode Standard. Further decomposition mappings may be added to the standard for new characters encoded in the future; however, no existing decomposition mapping for a currently encoded character will ever be removed, nor will a decomposition mapping be added for a currently encoded character. This follows from the stability guarantees for normalization. See Appendix F, Unicode Encoding Stability Policies, for more information. Normalization. Systems may normalize Unicode-encoded text to one particular sequence, such as normalizing composite character sequences into precomposed characters, or vice versa (see Figure 5-3).
Figure 5-3. Normalization (panels: Unnormalized, Precomposed, Decomposed; the same accented text is shown in mixed form and after normalization to precomposed and to decomposed characters)
Compared to the number of possible combinations, only a relatively small number of precomposed base character plus nonspacing marks have independent Unicode character values. Most existed in dominant standards. Systems that cannot handle nonspacing marks can normalize to precomposed characters; this option can accommodate most modern Latin-based languages. Such systems can use fallback rendering techniques to at least visually indicate combinations that they cannot handle (see the “Fallback Rendering” subsection of Section 5.13, Rendering Nonspacing Marks).
In systems that can handle nonspacing marks, it may be useful to normalize so as to eliminate precomposed characters. This approach allows such systems to have a homogeneous representation of composed characters and maintain a consistent treatment of such characters. However, in most cases, it does not require too much extra work to support mixed forms, which is the simpler route. The standard forms for normalization are defined in Unicode Standard Annex #15, “Unicode Normalization Forms.” For further information, see Chapter 3, Conformance; “Equivalent Sequences” in Section 2.2, Unicode Design Principles; and Section 2.11, Combining Characters.
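The two directions of normalization described in this section can be tried out with Python's standard unicodedata module, which implements the forms defined in Unicode Standard Annex #15. The sample string and the helper name canonically_equal are assumptions made for this sketch.

```python
import unicodedata

# Mixed input: a decomposed a-diaeresis followed by a precomposed e-diaeresis.
mixed = "a\u0308\u00eb"

# Normalize to precomposed characters (NFC) or to decomposed sequences (NFD).
precomposed = unicodedata.normalize("NFC", mixed)
decomposed = unicodedata.normalize("NFD", mixed)

def canonically_equal(a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

A system that is liberal in what it accepts could compare input with canonically_equal and still emit a single form, such as NFC, on output.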
5.7 Compression Using the Unicode character encoding may increase the amount of storage or memory space dedicated to the text portion of files. Compressing Unicode-encoded files or strings can therefore be an attractive option if the text portion is a large part of the volume of data compared to binary and numeric data, and if the processing overhead of the compression and decompression is acceptable. Compression always constitutes a higher-level protocol and makes interchange dependent on knowledge of the compression method employed. For a detailed discussion of compression and a standard compression scheme for Unicode, see Unicode Technical Standard #6, “A Standard Compression Scheme for Unicode.” Encoding forms defined in Section 2.5, Encoding Forms, have different storage characteristics. For example, as long as text contains only characters from the Basic Latin (ASCII) block, it occupies the same amount of space whether it is encoded with the UTF-8 or ASCII codes. Conversely, text consisting of CJK ideographs encoded with UTF-8 will require more space than equivalent text encoded with UTF-16. For processing rather than storage, the Unicode encoding form is usually selected for easy interoperability with existing APIs. Where there is a choice, the trade-off between decoding complexity (high for UTF-8, low for UTF-16, trivial for UTF-32) and memory and cache bandwidth (high for UTF-32, low for UTF-8 or UTF-16) should be considered.
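The storage characteristics mentioned above are easy to measure. The following Python sketch uses a hypothetical helper, encoded_sizes, and two arbitrary sample strings; the little-endian codec variants are chosen so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Return the number of bytes the text occupies in each encoding form."""
    codecs = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))
    return {name: len(text.encode(codec)) for name, codec in codecs}

ascii_sizes = encoded_sizes("text")      # Basic Latin only
cjk_sizes = encoded_sizes("\u6f22\u5b57")  # two CJK ideographs
```

For the Basic Latin sample UTF-8 is the smallest form, while for the ideographs UTF-16 is smaller than UTF-8, in line with the comparison in the text.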
5.8 Newline Guidelines Newlines are represented on different platforms by carriage return (CR), line feed (LF), CRLF, or next line (NEL). Not only are newlines represented by different characters on different platforms, but they also have ambiguous behavior even on the same platform. These characters are often transcoded directly into the corresponding Unicode code points when a character set is transcoded; this means that even programs handling pure Unicode have to deal with the problems. Especially with the advent of the Web, where text on a single machine can arise from many sources, this causes a significant problem.
Newline characters are used to explicitly indicate line boundaries. For more information, see Unicode Standard Annex #14, “Line Breaking Properties.” Newlines are also handled specially in the context of regular expressions. For information, see Unicode Technical Standard #18, “Unicode Regular Expression Guidelines.” For the use of these characters in markup languages, see Unicode Technical Report #20, “Unicode in XML and Other Markup Languages.”
Definitions Table 5-1 provides hexadecimal values for the acronyms used in these guidelines.
Table 5-1. Hex Values for Acronyms
Acronym  Name                           Unicode      ASCII    EBCDIC (default)  EBCDIC (z/OS USS)
CR       carriage return                000D         0D       0D                0D
LF       line feed                      000A         0A       25                15
CRLF     carriage return and line feed  <000D 000A>  <0D 0A>  <0D 25>           <0D 15>
NEL      next line                      0085         85       15                25
VT       vertical tab                   000B         0B       0B                0B
FF       form feed                      000C         0C       0C                0C
LS       line separator                 2028         n/a      n/a               n/a
PS       paragraph separator            2029         n/a      n/a               n/a
The acronyms shown in Table 5-1 correspond to characters or sequences of characters. The name column shows the usual names used to refer to the characters in question, whereas the other columns show the Unicode, ASCII, and EBCDIC encoded values for the characters. Encoding. Except for LS and PS, the newline characters discussed here are encoded as control codes. Many control codes were originally designed for device control but, together with TAB, the newline characters are commonly used as part of plain text. For more information on how Unicode encodes control codes, see Section 16.1, Control Codes. Notation. This discussion of newline guidelines uses lowercase when referring to functions having to do with line determination, but uses the acronyms when referring to the actual characters involved. Keys on keyboards are indicated in all caps. For example: The line separator may be expressed by LS in Unicode text or CR on some platforms. It may be entered into text with the SHIFT-RETURN key. EBCDIC. Table 5-1 shows the two mappings of LF and NEL used by EBCDIC systems. The first EBCDIC column shows the default control code mapping of these characters, which is
used in most EBCDIC environments. The second column shows the z/OS Unix System Services (Open Edition) mapping of LF and NEL. That mapping arises from the use of the LF character for the newline function in C programs and in Unix environments, while text files on z/OS traditionally use NEL for the newline function. NEL (next line) is not actually defined in 7-bit ASCII. It is defined in the ISO control function standard, ISO 6429, as a C1 control function. However, the 0x85 mapping shown in the ASCII column in Table 5-1 is the usual way that this C1 control function is mapped in ASCII-based character encodings. Newline Function. The acronym NLF (newline function) stands for the generic control function for indication of a new line break. It may be represented by different characters, depending on the platform, as shown in Table 5-2.
Table 5-2. NLF Platform Correlations
Platform               NLF Value
MacOS 9.x and earlier  CR
MacOS X                LF
Unix                   LF
Windows                CRLF
EBCDIC-based OS        NEL
Line Separator and Paragraph Separator
A paragraph separator, independent of how it is encoded, is used to indicate a separation between paragraphs. A line separator indicates where a line break alone should occur, typically within a paragraph. For example:
This is a paragraph with a line separator at this point,
causing the word “causing” to appear on a different line, but not causing the typical paragraph indentation, sentence breaking, line spacing, or change in flush (right, center, or left paragraphs).
For comparison, line separators basically correspond to HTML <BR>, and paragraph separators to older usage of HTML <P>.
Such programs as the Windows Notepad program and the Mac SimpleText program interpret their platform’s NLF as a paragraph separator, not a line separator. Once NLF was reinterpreted to stand for a paragraph separator, in some cases another control character was pressed into service as a line separator. For example, vertical tabulation VT is used in Microsoft Word. However, the choice of character for line separator is even less standardized than the choice of character for NLF. Many Internet protocols and much existing text treat NLF as a line separator, so an implementer cannot simply treat NLF as a paragraph separator in all circumstances.
Recommendations
The Unicode Standard defines two unambiguous separator characters: U+2029 paragraph separator (PS) and U+2028 line separator (LS). In Unicode text, the PS and LS characters should be used wherever the desired function is unambiguous. Otherwise, the following recommendations specify how to cope with an NLF when converting from other character sets to Unicode, when interpreting characters in text, and when converting from Unicode to other character sets.
Note that even if an implementer knows which characters represent NLF on a particular platform, CR, LF, CRLF, and NEL should be treated the same on input and in interpretation. Only on output is it necessary to distinguish between them.
Converting from Other Character Code Sets
R1 If the exact usage of any NLF is known, convert it to LS or PS.
R1a If the exact usage of any NLF is unknown, remap it to the platform NLF.
Recommendation R1a does not really help in interpreting Unicode text unless the implementer is the only source of that text, because another implementer may have left in LF, CR, CRLF, or NEL.
Interpreting Characters in Text
R2 Always interpret PS as paragraph separator and LS as line separator.
R2a In word processing, interpret any NLF the same as PS.
R2b In simple text editors, interpret any NLF the same as LS.
In line breaking, both PS and LS terminate a line; therefore, the Unicode Line Breaking Algorithm in Unicode Standard Annex #14, “Line Breaking Properties,” is defined such that any NLF causes a line break.
R2c In parsing, choose the safest interpretation.
For example, in recommendation R2c an implementer dealing with sentence break heuristics would reason in the following way that it is safer to interpret any NLF as LS:
• Suppose an NLF were interpreted as LS, when it was meant to be PS. Because most paragraphs are terminated with punctuation anyway, this would cause misidentification of sentence boundaries in only a few cases.
• Suppose an NLF were interpreted as PS, when it was meant to be LS. In this case, line breaks would cause sentence breaks, which would result in significant problems with the sentence break heuristics.
Converting to Other Character Code Sets
R3 If the intended target is known, map NLF, LS, and PS depending on the target conventions.
For example, when mapping to Microsoft Word’s internal conventions for documents, LS would be mapped to VT, and PS and any NLF would be mapped to CRLF.
R3a If the intended target is unknown, map NLF, LS, and PS to the platform newline convention (CR, LF, CRLF, or NEL).
In Java, for example, this is done by mapping to a string nlf, defined as follows:
String nlf = System.getProperty("line.separator");
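Recommendations R2a, R2b, and R3a can be sketched in Python, where os.linesep plays a role comparable to the Java line.separator property. The function names and the word_processing flag are hypothetical, and on common platforms os.linesep is either LF or CRLF, so the NEL convention is not covered by the default.

```python
import os
import re

LS = "\u2028"  # line separator
PS = "\u2029"  # paragraph separator

# CRLF is listed first so that it counts as one NLF, not as CR followed by LF.
NLF = re.compile("\r\n|[\r\n\u0085]")
NLF_LS_PS = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")

def interpret_nlf(text, word_processing):
    """R2a/R2b: read any NLF as PS in word processing, else as LS."""
    separator = PS if word_processing else LS
    return NLF.sub(separator, text)

def to_platform_newlines(text, nlf=os.linesep):
    """R3a: target unknown, so map NLF, LS, and PS to the platform convention."""
    return NLF_LS_PS.sub(lambda match: nlf, text)
```

Treating CR, LF, CRLF, and NEL identically on input, and choosing a concrete convention only on output, follows the note at the start of the recommendations.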
Input and Output
R4 A readline function should stop at NLF, LS, FF, or PS. In the typical implementation, it does not include the NLF, LS, PS, or FF that caused it to stop.
Because the separator is lost, the use of such a readline function is limited to text processing, where there is no difference among the types of separators.
R4a A writeline (or newline) function should convert NLF, LS, and PS according to the recommendations R3 and R3a.
In C, gets is defined to terminate at a newline and replaces the newline with '\0', while fgets is defined to terminate at a newline and includes the newline in the array into which it copies the data. C implementations interpret '\n' either as LF or as the underlying platform newline NLF, depending on where it occurs. EBCDIC C compilers substitute the relevant codes, based on the EBCDIC execution set.
Page Separator
FF is commonly used as a page separator, and it should be interpreted that way in text. When displaying on the screen, it causes the text after the separator to be forced to the next page. It is interpreted in the same way as the LS for line breaking, in parsing, or in input segmentation such as readline. FF does not interrupt a paragraph, as paragraphs can and do span page boundaries.
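A readline-style splitter along the lines of R4 might look like the following Python sketch. The name read_lines is hypothetical. Python's built-in str.splitlines is deliberately not used here, because it also stops at VT and at the file, group, and record separator controls, which is broader than the set named in R4.

```python
import re

# NLF (CR, LF, CRLF, NEL), FF, LS, and PS; CRLF first so it is one separator.
_STOP = re.compile("\r\n|[\r\n\u0085\u000c\u2028\u2029]")

def read_lines(text):
    """Split text at NLF, LS, FF, or PS, dropping the separator itself."""
    lines = _STOP.split(text)
    # A trailing separator terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

As the text notes, the separators are lost in the result, so this kind of function suits processing in which the type of separator makes no difference.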
5.9 Regular Expressions
Byte-oriented regular expression engines require extensions to handle Unicode successfully. The following issues are involved in such extensions:
• Unicode is a large character set; regular expression engines that are adapted to handle only small character sets may not scale well.
• Unicode encompasses a wide variety of languages that can have very different characteristics than English or other Western European text.
For detailed information on the requirements of Unicode regular expressions, see Unicode Technical Standard #18, “Unicode Regular Expression Guidelines.”
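The difference between byte-oriented and character-oriented matching can be seen with Python's standard re module, which supports both bytes patterns and string patterns. The sample word and variable names are chosen only for this sketch.

```python
import re

word = "\u00d6rebro"  # begins with a precomposed O-diaeresis

# Byte-oriented: '.' consumes a single byte, so it cannot span the
# two-byte UTF-8 form of the first letter.
byte_match = re.match(rb".rebro", word.encode("utf-8"))

# Character-oriented: '.' consumes one code point.
char_match = re.match(r".rebro", word)
```

Here byte_match is None while char_match succeeds, which is the kind of mismatch that the extensions discussed above are meant to remove.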
5.10 Language Information in Plain Text Requirements for Language Tagging The requirement for language information embedded in plain text data is often overstated. Many commonplace operations such as collation seldom require this extra information. In collation, for example, foreign language text is generally collated as if it were not in a foreign language. (See Unicode Technical Standard #10, “Unicode Collation Algorithm,” for more information.) For example, an index in an English book would not sort the Slovak word “chlieb” after “czar,” where it would be collated in Slovak, nor would an English atlas put the Swedish city of Örebro after Zanzibar, where it would appear in Swedish. Text to speech is also an area where the case for embedded language information is overstated. Although language information may be useful in performing text-to-speech operations, modern software for doing acceptable text-to-speech must be so sophisticated in performing grammatical analysis of text that the extra work in determining the language is not significant in practice. Language information can be useful in certain operations, such as spell-checking or hyphenating a mixed-language document. It is also useful in choosing the default font for a run of unstyled text; for example, the ellipsis character may have a very different appearance in Japanese fonts than in European fonts. Modern font and layout technologies produce different results based on language information. For example, the angle of the acute accent may be different for French and Polish.
Language Tags and Han Unification A common misunderstanding about Unicode Han unification is the mistaken belief that Han characters cannot be rendered properly without language information. This idea might lead an implementer to conclude that language information must always be added to plain text using the tags. However, this implication is incorrect. The goal and methods of
Han unification were to ensure that the text remained legible. Although font, size, width, and other format specifications need to be added to produce precisely the same appearance on the source and target machines, plain text remains legible in the absence of these specifications.
There should never be any confusion in Unicode, because the distinctions between the unified characters are all within the range of stylistic variations that exist in each country. No unification in Unicode should make it impossible for a reader to identify a character if it appears in a different font. Where precise font information is important, it is best conveyed in a rich text format.
Typical Scenarios. The following e-mail scenarios illustrate that the need for language information with Han characters is often overstated:
• Scenario 1. A Japanese user sends out untagged Japanese text. Readers are Japanese (with Japanese fonts). Readers see no differences from what they expect.
• Scenario 2. A Japanese user sends out an untagged mixture of Japanese and Chinese text. Readers are Japanese (with Japanese fonts) and Chinese (with Chinese fonts). Readers see the mixed text with only one font, but the text is still legible. Readers recognize the difference between the languages by the content.
• Scenario 3. A Japanese user sends out a mixture of Japanese and Chinese text. Text is marked with font, size, width, and so on, because the exact format is important. Readers have the fonts and other display support. Readers see the mixed text with different fonts for different languages. They recognize the difference between the languages by the content, and see the text with glyphs that are more typical for the particular language.
It is common even in printed matter to render passages of foreign language text in native-language fonts, just for familiarity. For example, Chinese text in a Japanese document is commonly rendered in a Japanese font.
5.11 Editing and Selection Consistent Text Elements As far as a user is concerned, the underlying representation of text is not a material concern, but it is important that an editing interface present a uniform implementation of what the user thinks of as characters. (See “‘Characters’ and Grapheme Clusters” in Section 2.11, Combining Characters.) The user expects them to behave as units in terms of mouse selection, arrow key movement, backspacing, and so on. For example, when such behavior is implemented, and an accented letter is represented by a sequence of base character plus a nonspacing combining mark, using the right arrow key would logically skip from the start of the base character to the end of the last nonspacing character.
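The arrow-key behavior described above can be sketched in Python. This is a simplification that keeps a base character together with any following nonspacing or enclosing marks; it is not the full grapheme cluster definition of Unicode Standard Annex #29, and the function name next_boundary is hypothetical.

```python
import unicodedata

def next_boundary(text, index):
    """Return the position reached by one right-arrow press from `index`."""
    if index >= len(text):
        return len(text)
    index += 1  # step over the base character
    # Then step over every mark that attaches to that base.
    while index < len(text) and unicodedata.category(text[index]) in ("Mn", "Me"):
        index += 1
    return index
```

For an "e" followed by a combining acute accent, one call moves the insertion point from before the base character to after the accent, so the user never lands between them.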
In some cases, editing a user-perceived “character” or visual cluster element by element may be the preferred way. For example, a system might have the backspace key delete by using the underlying code point, while the delete key could delete an entire cluster. Moreover, because of the way keyboards and input method editors are implemented, there often may not be a one-to-one relationship between what the user thinks of as a character and the key or key sequence used to input it. Three types of boundaries are generally useful in editing and selecting within words: cluster boundaries, stacked boundaries and atomic character boundaries. Cluster Boundaries. Arbitrarily defined cluster boundaries may occur in scripts such as Devanagari, for which selection may be defined as applying to syllables or parts of syllables. In such cases, combining character sequences such as ka + vowel sign a or conjunct clusters such as ka + halant + ta are selected as a single unit. (See Figure 5-4.)
Figure 5-4. Consistent Character Boundaries (rows: Cluster, Stack, Atomic; each row shows selection boundaries for sample text, including the Latin word “Rôle”)
Stacked Boundaries. Stacked boundaries are generally somewhat finer than cluster boundaries. Free-standing elements (such as vowel sign a in Devanagari) can be independently selected, but any elements that “stack” (including vertical ligatures such as Arabic lam + meem in Figure 5-4) can be selected only as a single unit. Stacked boundaries treat default grapheme clusters as single entities, much like composite characters. (See Unicode Standard Annex #29, “Text Boundaries,” for the definition of default grapheme clusters and for a discussion of how grapheme clusters can be tailored to meet the needs of defining arbitrary cluster boundaries.) Atomic Character Boundaries. The use of atomic character boundaries is closest to selection of individual Unicode characters. However, most modern systems indicate selection with some sort of rectangular highlighting. This approach places restrictions on the consistency of editing because some sequences of characters do not linearly progress from the start of the line. When characters stack, two mechanisms are used to visually indicate partial selection: linear and nonlinear boundaries. Linear Boundaries. Use of linear boundaries treats the entire width of the resultant glyph as belonging to the first character of the sequence, and the remaining characters in the backing-store representation as having no width and being visually afterward.
This option is the simplest mechanism. The advantage of this system is that it requires very little additional implementation work. The disadvantage is that it is never easy to select narrow characters, let alone a zero-width character. Mechanically, it requires the user to select just to the right of the nonspacing mark and drag just to the left. It also does not allow the selection of individual nonspacing marks if more than one is present. Nonlinear Boundaries. Use of nonlinear boundaries divides any stacked element into parts. For example, picking a point halfway across a lam + meem ligature can represent the division between the characters. One can either allow highlighting with multiple rectangles or use another method such as coloring the individual characters. With more work, a precomposed character can behave in deletion as if it were a composed character sequence with atomic character boundaries. This procedure involves deriving the character’s decomposition on the fly to get the components to be used in simulation. For example, deletion occurs by decomposing, removing the last character, then recomposing (if more than one character remains). However, this technique does not work in general editing and selection. In most editing systems, the code point is the smallest addressable item, so the selection and assignment of properties (such as font, color, letterspacing, and so on) cannot be done on any finer basis than the code point. Thus the accent on an “e” could not be colored differently than the base in a precomposed character, although it could be colored differently if the text were stored internally in a decomposed form. Just as there is no single notion of text element, so there is no single notion of editing character boundaries. At different times, users may want different degrees of granularity in the editing process. Two methods suggest themselves. First, the user may set a global preference for the character boundaries. 
Second, the user may have alternative command mechanisms, such as Shift-Delete, which give more (or less) fine control than the default mode.
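The decompose, delete, recompose procedure described above can be sketched as follows. This is a minimal illustration using Python's standard `unicodedata` module; the function name `delete_backward` is hypothetical, and the sketch assumes that canonical decomposition (NFD) of the last code point is an acceptable stand-in for "deriving the character's decomposition on the fly."

```python
import unicodedata

def delete_backward(text: str) -> str:
    """Delete one element from the end, treating a precomposed character
    as if it were stored as base + combining marks (hypothetical helper)."""
    if not text:
        return text
    head, last = text[:-1], text[-1]
    # Decompose only the final code point, e.g. U+00E9 -> "e" + U+0301.
    parts = unicodedata.normalize("NFD", last)
    # Remove the last component, then recompose whatever remains.
    remaining = parts[:-1]
    return head + unicodedata.normalize("NFC", remaining)
```

With this sketch, deleting from a string that ends in precomposed U+00E9 leaves the base letter "e" in place; a plain letter is simply removed.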
5.12 Strategies for Handling Nonspacing Marks
By following these guidelines, a programmer should be able to implement systems and routines that provide for the effective and efficient use of nonspacing marks in a wide variety of applications and systems. The programmer also has the choice between minimal techniques that apply to the vast majority of existing systems and more sophisticated techniques that apply to more demanding situations, such as higher-end desktop publishing.
In this section and the following section, the terms nonspacing mark and combining character are used interchangeably. The terms diacritic, accent, stress mark, Hebrew point, Arabic vowel, and others are sometimes used instead of nonspacing mark. (They refer to particular types of nonspacing marks.) Properly speaking, a nonspacing mark is any combining character that does not add space along the writing direction. For a formal definition of nonspacing mark, see Section 3.6, Combination.
A relatively small number of implementation features are needed to support nonspacing marks. Different levels of implementation are also possible. A minimal system yields good results and is relatively simple to implement. Most of the features required by such a system are simply modifications of existing software. As nonspacing marks are required for a number of writing systems, such as Arabic, Hebrew, and those of South Asia, many vendors already have systems capable of dealing with these characters and can use their experience to produce general-purpose software for handling these characters in the Unicode Standard. Rendering. Composite character sequences can be rendered effectively by means of a fairly simple mechanism. In simple character rendering, a nonspacing combining mark has a zero advance width, and a composite character sequence will have the same width as the base character. Wherever a sequence of base character plus one or more nonspacing marks occurs, the glyphs for the nonspacing marks can be positioned relative to the base. The ligature mechanisms in the fonts can also substitute a glyph representing the combined form. In some cases the width of the base should change because of an applied accent, such as with “î”. The ligature or contextual form mechanisms in the font can be used to change the width of the base in cases where this is required. Other Processes. Correct multilingual comparison routines must already be able to compare a sequence of characters as one character, or one character as if it were a sequence. Such routines can also handle combining character sequences when supplied with the appropriate data. When searching strings, remember to check for additional nonspacing marks in the target string that may affect the interpretation of the last matching character. Line breaking algorithms generally use state machines for determining word breaks. 
Such algorithms can be easily adapted to prevent separation of nonspacing marks from base characters. (See also the discussion in Section 5.6, Normalization. For details in particular contexts, see Unicode Technical Standard #10, “Unicode Collation Algorithm”; Unicode Standard Annex #14, “Line Breaking Properties”; and Unicode Standard Annex #29, “Text Boundaries.”)
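The advice on searching can be illustrated with a short sketch. It assumes a plain substring search and uses the General Category values Mn and Me from Python's `unicodedata` module as an approximation of "nonspacing mark"; the function name `find_whole_match` is hypothetical, and real collation-aware searching (see Unicode Technical Standard #10) is considerably more involved.

```python
import unicodedata

def find_whole_match(text: str, pattern: str, start: int = 0) -> int:
    """Return the index of the first match of `pattern` that is not
    immediately followed by a nonspacing mark, or -1 if there is none."""
    pos = text.find(pattern, start)
    while pos != -1:
        end = pos + len(pattern)
        # A trailing mark would change the interpretation of the last
        # matched character, so such a match is rejected.
        if end == len(text) or unicodedata.category(text[end]) not in ("Mn", "Me"):
            return pos
        pos = text.find(pattern, pos + 1)
    return -1
```

For example, searching for "resume" in a string that begins with "resume" followed by U+0301 skips that first occurrence, because the acute accent belongs to the final "e".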
Keyboard Input
A common implementation for the input of combining character sequences is the use of dead keys. These keys match the mechanics used by typewriters to generate such sequences through overtyping the base character after the nonspacing mark. In computer implementations, keyboards enter a special state when a dead key is pressed for the accent and emit a precomposed character only when one of a limited number of “legal” base characters is entered. It is straightforward to adapt such a system to emit combining character sequences or precomposed characters as needed. Typists, especially in the Latin script, are trained on systems that work using dead keys. However, many scripts in the Unicode Standard (including the Latin script) may be implemented according to the handwriting sequence, in which users type the base character first, followed by the accents or other nonspacing marks (see Figure 5-5).
Figure 5-5. Dead Keys Versus Handwriting Sequence
Dead Key: Zrich, then ¨ gives Zrich (no visible change), then u gives Zürich
Handwriting: Zrich, then u gives Zurich, then ¨ gives Zürich
In the case of handwriting sequence, each keystroke produces a distinct, natural change on the screen; there are no hidden states. To add an accent to any existing character, the user positions the insertion point (caret) after the character and types the accent.
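The difference between the two input models can be sketched as a pair of small functions. This is an illustration only: the key table `ACCENT_KEYS` and both function names are hypothetical, and the dead-key version does not check for a limited set of "legal" base characters as a real keyboard driver would.

```python
import unicodedata

# Hypothetical key table: accent key (U+00A8) -> combining mark (U+0308).
ACCENT_KEYS = {"\u00a8": "\u0308"}

def type_dead_key(keys: str) -> str:
    """Dead-key model: the accent is held in a hidden state until the
    base character arrives, then a precomposed character is emitted."""
    out, pending = "", None
    for key in keys:
        if key in ACCENT_KEYS:
            pending = ACCENT_KEYS[key]      # nothing appears on screen yet
        elif pending is not None:
            out += unicodedata.normalize("NFC", key + pending)
            pending = None
        else:
            out += key
    return out

def type_handwriting(keys: str) -> str:
    """Handwriting model: every keystroke appends text immediately;
    an accent key appends a combining mark after the base."""
    return "".join(ACCENT_KEYS.get(key, key) for key in keys)
```

Typing the accent before "u" in the first model and after "u" in the second yields canonically equivalent strings; the handwriting model simply stores the decomposed sequence.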
Truncation
There are two types of truncation: truncation by character count and truncation by displayed width. Truncation by character count can entail loss (be lossy) or be lossless. Truncation by character count is used where, due to storage restrictions, a limited number of characters can be entered into a field; it is also used where text is broken into buffers for transmission and other purposes. The latter case can be lossless if buffers are recombined seamlessly before processing or if lookahead is performed for possible combining character sequences straddling buffers.
When fitting data into a field of limited storage length, some information will be lost. The preferred position for truncating text in that situation is on a grapheme cluster boundary. As Figure 5-6 shows, such truncation can mean truncating at an earlier point than the last character that would have fit within the physical storage limitation. (See Unicode Standard Annex #29, “Text Boundaries.”)
Truncation by displayed width is used for visual display in a narrow field. In this case, truncation occurs on the basis of the width of the resulting string rather than on the basis of a character count. In simple systems, it is easiest to truncate by width, starting from the end and working backward by subtracting character widths as one goes. Because a trailing nonspacing mark does not contribute to the measurement of the string, the result will not separate nonspacing marks from their base characters. If the textual environment is more sophisticated, the widths of characters may depend on their context, due to effects such as kerning, ligatures, or contextual formation. For such
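Truncation by character count on a safe boundary can be sketched as follows. This is a simplified illustration, not an implementation of Unicode Standard Annex #29: it assumes that backing up over any character whose General Category begins with "M" is a good enough approximation of a grapheme cluster boundary, and the function name `truncate` is hypothetical.

```python
import unicodedata

def truncate(text: str, limit: int) -> str:
    """Truncate to at most `limit` code points without separating
    combining marks from their base character (approximate)."""
    if len(text) <= limit:
        return text
    cut = limit
    # If the first dropped character is a combining mark, the cut falls
    # inside a sequence; back up until the cut precedes a base character.
    while cut > 0 and unicodedata.category(text[cut]).startswith("M"):
        cut -= 1
    return text[:cut]
```

Note that the result can be shorter than the limit: a six-code-point string whose fifth code point is a combining accent is cut to three code points when the limit is four, which matches the observation that truncation may occur earlier than the last character that would have fit.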
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortiumand published by Addison-Wesley. The material has been modified slightly for this electronic editon, however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten. 
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 6
Writing Systems and Punctuation
This chapter begins the portion of the Unicode Standard devoted to the detailed description of each script or other related group of Unicode characters. Each of the subsequent chapters presents a historically or geographically related group of scripts. This chapter presents a general introduction to writing systems, explains how they can be used to classify scripts, and then presents a detailed discussion of punctuation characters that are shared across scripts. Scripts and Blocks. The codespace of the Unicode Standard is divided into subparts called blocks. Character blocks generally contain characters from a single script, and in many cases, a script is fully represented in its character block; however, some scripts are encoded using several blocks, which are not always adjacent. Discussion of scripts and other groups of characters are structured by character blocks. Corresponding subsection headers identify each block and its associated range of Unicode code points. The code charts in Chapter 17, Code Charts, are also organized by character blocks. Scripts and Writing Systems. There are many different kinds of writing systems in the world. Their variety poses some significant issues for character encoding in the Unicode Standard as well as for implementers of the standard. Those who first approach the Unicode Standard without a background in writing systems may find the huge list of scripts bewilderingly complex. Therefore, before considering the script descriptions in detail, this chapter first presents a brief introduction to the types of writing systems. That introduction explains basic terminology about scripts and character types that will be used again and again when discussing particular scripts. Punctuation. The rest of this chapter deals with a special case: punctuation marks, which tend to be scattered about in different blocks and which may be used in common by many scripts. 
Punctuation characters occur in several widely separated places in the character blocks, including Basic Latin, Latin-1 Supplement, General Punctuation, and CJK Symbols and Punctuation. There are also occasional punctuation characters in character blocks for specific scripts. Most punctuation characters are intended for common usage with any script, although some of them are script-specific. Some scripts use both common and script-specific punctuation characters, usually as the result of recent adoption of standard Western punctuation marks. While punctuation characters vary in details of appearance and function between different languages and scripts, their overall purpose is shared: They serve to separate or otherwise organize units of text, such as sentences and phrases, thereby helping to clarify the meaning of the text. Certain punctuation characters also occur in mathematical and scientific formulae.
6.1 Writing Systems
This section presents a brief introduction to writing systems. It describes the different kinds of writing systems and relates them to the encoded scripts found in the Unicode Standard. This framework may help to make the variety of scripts, modern and historic, a little less daunting. The terminology used here follows that developed by Peter T. Daniels, a leading expert on writing systems of the world.
The term writing system has two mutually exclusive meanings in this standard. As used in this section, “writing system” refers to a way that families of scripts may be classified by how they represent the sounds or words of human language. For example, the writing system of the Latin script is alphabetic. In other places in the standard, “writing system” refers to the way a particular language is written. For example, the modern Japanese writing system uses four scripts: Han ideographs, Hiragana, Katakana and Latin (Romaji).
Alphabets. A writing system that consists of letters for the writing of both consonants and vowels is called an alphabet. The term “alphabet” is derived from the first two letters of the Greek script: alpha, beta. Consonants and vowels have equal status as letters in such a system. The Latin alphabet is the most widespread and well-known example of an alphabet, having been adapted for use in writing thousands of languages. The correspondence between letters and sounds may be either more or less exact. Many alphabets do not exhibit a one-to-one correspondence between distinct sounds and letters or groups of letters used to represent them; often this is an indication of original spellings that were not changed as the language changed. Not only are many sounds represented by letter combinations, such as “th” in English, but the language may have evolved since the writing conventions were settled.
Examples range from cases such as Italian or Finnish, where the match between letter and sound is rather close, to English, which has notoriously complex and arbitrary spelling. Phonetic alphabets, in contrast, are used specifically for the precise transcription of the sounds of languages. The best known of these alphabets is the International Phonetic Alphabet, an adaptation and extension of the Latin alphabet by the addition of new letters and marks for specific sounds and modifications of sounds. Unlike normal alphabets, the intent of phonetic alphabets is that their letters exactly represent sounds. Phonetic alphabets are not used as general-purpose writing systems per se, but it is not uncommon for a formerly unwritten language to have an alphabet developed for it based on a phonetic alphabet.
Abjads. A writing system in which only consonants are indicated is an abjad. The main letters are all consonants (or long vowels), with other vowels either left out entirely or optionally indicated with the use of secondary marks on the consonants. The Phoenician script is a prototypical abjad; a better-known example is the Arabic writing system. The term “abjad” is derived from the first four letters of the traditional order of the Arabic script: alef, beh, jeem, dal. Abjads are often, although not exclusively, associated with Semitic languages, which have word structures particularly well suited to the use of consonantal writing. Some abjads allow consonant letters to mark long vowels, as the use of waw and yeh in Arabic for /u:/ or /i:/. Hebrew and Arabic are typically written without any vowel marking at all. The vowels, when they do occur in writing, are referred to as points or harakat, and are indicated by the use of diacritic dots and other marks placed above and below the consonantal letters.
Syllabaries. In a syllabary, each symbol of the system typically represents both a consonant and a vowel, or in some instances more than one consonant and a vowel. One of the best-known examples of a syllabary is Hiragana, used for Japanese, in which the units of the system represent the syllables ka, ki, ku, ke, ko, sa, si, su, se, so, and so on. In general parlance, the elements of a syllabary are not called letters, but rather syllables. This can lead to some confusion, however, because letters of alphabets and units of other writing systems are also used, singly or in combinations, to write syllables of languages. So in a broad sense, the term “letter” can be used to refer to the syllables of a syllabary. In syllabaries such as Cherokee, Hiragana, Katakana, and Yi, each symbol has a unique shape, with no particular shape relation to any of the consonant(s) or vowels of the syllables. In other cases, however, the syllabic symbols of a syllabary are not atomic; they can be built up out of parts that have a consistent relationship to the phonological parts of the syllable. The best example of this is the Hangul writing system for Korean.
Each Hangul syllable is made up of a part for the initial consonant (or consonant cluster), a part for the vowel (or diphthong), and an optional part for the final consonant (or consonant cluster). The relationship between the sounds and the graphic parts to represent them is systematic enough for Korean that the graphic parts collectively are known as jamos and constitute a kind of alphabet on their own. The jamos of the Hangul writing system have another characteristic: their shapes are not completely arbitrary, but were devised with intentionally iconic shapes relating them to articulatory features of the sounds they represent in Korean. The Hangul writing system has thus also been classified as a featural syllabary. Abugidas. Abugidas represent a kind of blend of syllabic and alphabetic characteristics in a writing system. The Ethiopic script is an abugida. The term “abugida” is derived from the first four letters of the letters of the Ethiopic script in the Semitic order: alf, bet, gaml, dant. The order of vowels (-ä -u -i -a) is that of the traditional vowel order in the first four columns of the Ethiopic syllable chart. Historically, abugidas spread across South Asia and were adapted by many languages, often of phonologically very different types. This process has also resulted in many extensions, innovations, and/or simplifications of the original patterns. The best-known example of an abugida is the Devanagari script, used in modern times to write Hindi and many other Indian languages, and used classically to
write Sanskrit. See Section 9.1, Devanagari, for a detailed description of how Devanagari works and is rendered. In an abugida, each consonant letter carries an inherent vowel, usually /a/. There are also vowel letters, often distinguished between a set of independent vowel letters, which occur on their own, and dependent vowel letters, or matras, which are subordinate to consonant letters. When a dependent vowel letter follows a consonant letter, the vowel overrides the inherent vowel of the consonant. This is shown schematically in Figure 6-1.
Figure 6-1. Overriding Inherent Vowels
ka + i → ki
ka + e → ke
ka + u → ku
ka + o → ko
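The overriding of an inherent vowel can be shown concretely with encoded characters. The sketch below assumes Devanagari as the example script (the figure above is schematic); the helper name `with_vowel` and the `MATRAS` table are hypothetical, while the code points are those of DEVANAGARI LETTER KA and the four Devanagari vowel signs.

```python
import unicodedata

KA = "\u0915"  # DEVANAGARI LETTER KA, read as /ka/ with its inherent vowel

# Dependent vowel signs (matras) keyed by a romanized vowel label.
MATRAS = {
    "i": "\u093f",  # DEVANAGARI VOWEL SIGN I
    "e": "\u0947",  # DEVANAGARI VOWEL SIGN E
    "u": "\u0941",  # DEVANAGARI VOWEL SIGN U
    "o": "\u094b",  # DEVANAGARI VOWEL SIGN O
}

def with_vowel(consonant: str, vowel: str) -> str:
    """Append the dependent vowel sign after the consonant letter.
    In logical order the sign follows the consonant and overrides /a/."""
    return consonant + MATRAS[vowel]

for vowel in MATRAS:
    syllable = with_vowel(KA, vowel)
    print(f"ka + {vowel}:", [unicodedata.name(c) for c in syllable])
```

Each resulting syllable is stored as two code points, consonant first, even when the vowel sign is displayed to the left of the consonant.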
Abugidas also typically contain a special element usually referred to as a halant, virama, or killer, which, when applied to a consonant letter with its inherent vowel, has the effect of removing the inherent vowel, resulting in a bare consonant sound.
Because of legacy practice, three distinct approaches have been taken in the Unicode Standard for the encoding of abugidas: the Devanagari model, the Tibetan model, and the Thai model. The Devanagari model, used for most abugidas, encodes an explicit virama character and represents text in its logical order. The Thai model departs from the Devanagari model in that it represents text in its visual display order, based on the typewriter legacy, rather than in logical order. The Tibetan model avoids an explicit virama, instead encoding a sequence of subjoined consonants to represent consonants occurring in clusters in a syllable.
The Ethiopic script is traditionally analyzed as an abugida, because the base character for each consonantal series is understood as having an inherent vowel. However, Ethiopic lacks some of the typical features of Brahmi-derived scripts, such as halants and matras. Historically, it was derived from early Semitic scripts and in its earliest form was an abjad. In its traditional presentation and its encoding in the Unicode Standard, it is now treated more like a syllabary.
Logosyllabaries. The final major category of writing system is known as the logosyllabary. In a logosyllabary, the units of the writing system are used primarily to write words and/or morphemes of words, with some subsidiary usage to represent syllabic sounds per se. The best example of a logosyllabary is the Han script, used for writing Chinese and borrowed by a number of other East Asian languages for use as part of their writing systems. The term for a unit of the Han script is hànzì 漢字 in Chinese, kanji 漢字 in Japanese, and hanja 漢字 in Korean.
In many instances this unit also constitutes a word, but more typically, two or more units together are used to write a word. This unit has variously been referred to as an ideograph (“idea writing”), a logograph (“word writing”), or a sinogram, as well as other terms. No single English term is completely satisfactory or uncontroversial. In this standard, CJK ideograph is used because it is a widely understood term.
There are a number of other historical examples of logosyllabaries, such as Tangut, many of which may eventually be encoded in the Unicode Standard. They vary in the degree to which they combine logographic writing principles, where the symbols stand for morphemes or entire words, and syllabic writing principles, where the symbols come to represent syllables per se, divorced from their meaning as morphemes or words. In some notable instances, as for Sumero-Akkadian cuneiform, a logosyllabary may evolve through time into a syllabary or alphabet by shedding its use of logographs. In other instances, as for the Han script, the use of logographic characters is very well entrenched and persistent. However, even for the Han script a small number of characters are used purely to represent syllabic sounds, so as to be able to represent such things as foreign personal names and place names.
The classification of a writing system is often somewhat blurred by complications in the exact ways in which it matches up written elements to the phonemes or syllables of a language. For example, although Hiragana is classified as a syllabary, it does not always have an exact match between syllables and written elements. Syllables with long vowels are not written with a single element, but rather with a sequence of elements. Thus the syllable with a long vowel kū is written with two separate Hiragana symbols, {ku}+{u}. Because of these kinds of complications, one must always be careful not to assume too much about the structure of a writing system from its nominal classification.
Typology of Scripts in the Unicode Standard. Table 6-1 lists all of the scripts currently encoded in the Unicode Standard, showing the writing system type for each. The list is an approximate guide, rather than a definitive classification, because of the mix of features seen in many scripts.
The writing systems for some languages may be quite complex, mixing more than one type of script together in a composite system. Japanese is the best example; it mixes a logosyllabary (Han), two syllabaries (Hiragana and Katakana), and one alphabet (Latin, for romaji).
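The mixing of scripts in Japanese text can be made visible with a rough sketch. Python's standard library does not expose the Unicode Script property, so this example assumes that the prefix of a character's name is an adequate hint for these four scripts; the function name `script_guess` is hypothetical and the approach is a heuristic, not a conformant script lookup.

```python
import unicodedata

# Name prefixes mapped to script labels (heuristic, for illustration only).
NAME_PREFIXES = (
    ("HIRAGANA", "Hiragana"),
    ("KATAKANA", "Katakana"),
    ("CJK UNIFIED IDEOGRAPH", "Han"),
    ("LATIN", "Latin"),
)

def script_guess(ch: str) -> str:
    """Guess the script of one character from its Unicode name."""
    name = unicodedata.name(ch, "")
    for prefix, label in NAME_PREFIXES:
        if name.startswith(prefix):
            return label
    return "Other"

# U+6771 (Han), U+3067 (Hiragana), "J" (Latin), U+30D0 (Katakana)
sample = "\u6771\u3067J\u30d0"
print([script_guess(ch) for ch in sample])
```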
Table 6-1. Typology of Scripts in the Unicode Standard
Alphabets
Latin, Greek, Cyrillic, Armenian, Thaana, Georgian, Ogham, Runic, Mongolian, Glagolitic, Coptic, Tifinagh, Old Italic, Gothic, Ugaritic, Old Persian, Deseret, Shavian, Osmanya, N’Ko
Abjads
Hebrew, Arabic, Syriac, Phoenician
Abugidas
Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Tagalog, Hanunóo, Buhid, Tagbanwa, Khmer, Limbu, Tai Le, New Tai Lue, Buginese, Syloti Nagri, Kharoshthi, Balinese, Phags-pa
Logosyllabaries
Han, Sumero-Akkadian
Simple Syllabaries
Cherokee, Hiragana, Katakana, Bopomofo, Yi, Linear B, Cypriot, Ethiopic, Canadian Aboriginal Syllabics
Featural Syllabaries
Hangul
Notational Systems. In addition to scripts for written natural languages, there are notational systems for other kinds of information. Some of these more closely resemble text than others. The Unicode Standard encodes symbols for use with mathematical notation, Western and Byzantine musical notation, and Braille, as well as symbols for use in divination, such as the Yijing hexagrams. Notational systems can be classified by how closely they resemble text. Even notational systems that do not fully resemble text may have symbols used in text. In the case of musical notation, for example, while the full notation is two-dimensional, many of the encoded symbols are frequently referenced in texts about music and musical notation.
6.2 General Punctuation
Punctuation characters (for example, U+002C comma and U+2022 bullet) are encoded only once, rather than being encoded again and again for particular scripts; such general-purpose punctuation may be used for any script or mixture of scripts. In contrast, punctuation principally used with a specific script is found in the block corresponding to that script, such as U+058A armenian hyphen, U+061B “؛” arabic semicolon, or the punctuation used with CJK ideographs in the CJK Symbols and Punctuation block. Script-specific punctuation characters may be unique in function, have different directionality, or be distinct in appearance or usage from their generic counterparts.
Punctuation intended for use with several related scripts is often encoded with the principal script for the group. For example, U+1735 philippine single punctuation is encoded in a single location in the Hanunóo block, but it is intended for use with all four of the Philippine scripts.
Use and Interpretation. The use and interpretation of punctuation characters can be heavily context dependent. For example, U+002E full stop can be used as sentence-ending punctuation, an abbreviation indicator, a decimal point, and so on. Many Unicode algorithms, such as the Bidirectional Algorithm and Line Breaking Algorithm, both of which treat numeric punctuation differently from text punctuation, resolve the status of any ambiguous punctuation mark depending on whether it is part of a number context.
Legacy character encoding standards commonly include generic characters for punctuation instead of the more precisely specified characters used in printing. Examples include the single and double quotes, period, dash, and space. The Unicode Standard includes these generic characters, but also encodes the unambiguous characters independently: various forms of quotation marks, em dash, en dash, minus, hyphen, em space, en space, hair space, zero width space, and so on.
Rendering. Punctuation characters vary in appearance with the font style, just like the surrounding text characters. In some cases, where used in the context of a particular script, a specific glyph style is preferred. For example, U+002E full stop should appear square when used with Armenian, but is typically circular when used with Latin. For mixed Latin/Armenian text, two fonts (or one font allowing for context-dependent glyph variation) may need to be used to render the character faithfully.
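One way to see how the Bidirectional Algorithm distinguishes numeric punctuation from other punctuation is to inspect the Bidi_Class property. The sketch below uses `unicodedata.bidirectional` from the Python standard library; the helper name `numeric_punctuation_class` and the label table are hypothetical, and the sketch only reads the property rather than resolving anything in context.

```python
import unicodedata

# Bidi classes that the Bidirectional Algorithm treats as number-related.
NUMERIC_CLASSES = {
    "ES": "European_Separator",
    "ET": "European_Terminator",
    "CS": "Common_Separator",
}

def numeric_punctuation_class(ch: str):
    """Return the number-related Bidi class name for `ch`, or None if the
    character is not classified as numeric punctuation."""
    return NUMERIC_CLASSES.get(unicodedata.bidirectional(ch))

for ch in ".,:+-!?;":
    print(repr(ch), unicodedata.name(ch), numeric_punctuation_class(ch))
```

Full stop and comma carry a separator class because they may appear inside numbers, whereas the exclamation mark and semicolon are plain neutrals; the final interpretation still depends on the surrounding characters.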
Writing Direction. Punctuation characters shared across scripts have no inherent directionality. In a bidirectional context, their display direction is resolved according to the rules in Unicode Standard Annex #9, “The Bidirectional Algorithm.” Certain script-specific punctuation marks have an inherent directionality that matches the writing direction of the script. For an example, see “Dandas” later in this section. The image of certain paired punctuation marks, specifically those that are brackets, is mirrored when the character is part of a right-to-left directional run (see Section 4.7, Bidi Mirrored—Normative). Mirroring ensures that the opening and closing semantics of the character remains independent of the writing direction. The same is generally not true for other punctuation marks even when their image is not bilaterally symmetric, such as slash or the curly quotes. See also “Paired Punctuation” later in this section. In vertical writing, many punctuation characters have special vertical glyphs. Normally, fonts contain both the horizontal and vertical glyphs, and the selection of the appropriate glyph is based on the text orientation in effect at rendering time. However, see “CJK Compatibility Forms: Vertical Forms” later in this section. Figure 6-2 shows a set of three common shapes used for ideographic comma and ideographic full stop. The first shape in each row is that used for horizontal text, the last shape is that for vertical text. The centered form may be used with both horizontal and vertical text. See also Figure 6-4 for an example of vertical and horizontal forms for quotation marks.
Figure 6-2. Forms of CJK Punctuation
Horizontal    Centered    Vertical
、    、    、
。    。    。
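The mirroring behavior described under Writing Direction can be checked against the Bidi_Mirrored property. This small sketch uses `unicodedata.mirrored` from the Python standard library; the wrapper name `is_mirrored` is hypothetical.

```python
import unicodedata

def is_mirrored(ch: str) -> bool:
    """True if the character's image is mirrored in right-to-left runs."""
    return bool(unicodedata.mirrored(ch))

# Brackets are mirrored; slash and curly quotes are not, even though
# their glyphs are not bilaterally symmetric.
for ch in "()[]/\u201c\u201d":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_mirrored(ch))
```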
Layout Controls. A number of characters in the blocks described in this section are not graphic punctuation characters, but rather affect the operation of layout algorithms. For a description of those characters, see Section 16.2, Layout Controls.
Encoding Characters with Multiple Semantic Values. Some of the punctuation characters in the ASCII range (U+0020..U+007F) have multiple uses, either through ambiguity in the original standards or through accumulated reinterpretations of a limited code set. For example, 27₁₆ is defined in ANSI X3.4 as apostrophe (closing single quotation mark; acute accent), and 2D₁₆ is defined as hyphen-minus. In general, the Unicode Standard provides the same interpretation for the equivalent code points, without adding to or subtracting from their semantics. The Unicode Standard supplies unambiguous codes elsewhere for the most useful particular interpretations of these ASCII values; the corresponding unambiguous characters are cross-referenced in the character names list for this block. For more information, see “Apostrophes,” “Space Characters,” and “Dashes and Hyphens” later in this section.
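The relationship between an ambiguous ASCII character and its unambiguous counterparts can be listed with a short sketch. The table below is an illustrative subset chosen for this example, not the full set of cross-references from the character names list, and the names `UNAMBIGUOUS` and `describe` are hypothetical.

```python
import unicodedata

# Illustrative subset: ambiguous ASCII character -> more specific characters.
UNAMBIGUOUS = {
    "\u0027": ["\u2019", "\u02bc", "\u00b4"],  # apostrophe
    "\u002d": ["\u2010", "\u2013", "\u2212"],  # hyphen-minus
}

def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for generic, specific in UNAMBIGUOUS.items():
    print(describe(generic))
    for ch in specific:
        print("   ", describe(ch))
```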
The Unicode Standard 5.0 – Electronic edition
Copyright © 1991–2007 Unicode, Inc.
Blocks Devoted to Punctuation For compatibility with widely used legacy character sets, the Basic Latin (ASCII) block (U+0000..U+007F) and the Latin-1 Supplement block (U+0080..U+00FF) contain several of the most common punctuation signs. They are isolated from the larger body of Unicode punctuation, signs, and symbols only because their relative code locations within ASCII and Latin-1 are so widely used in standards and software. The Unicode Standard has a number of blocks devoted specifically to encoding collections of punctuation characters. The General Punctuation block (U+2000..U+206F) contains the most common punctuation characters widely used in Latin typography, as well as a few specialized punctuation marks and a large number of format control characters. All of these punctuation characters are intended for generic use, and in principle they could be used with any script. The Supplemental Punctuation block (U+2E00..U+2E7F) is devoted to less commonly encountered punctuation marks, including those used in specialized notational systems or occurring primarily in ancient manuscript traditions. The CJK Symbols and Punctuation block (U+3000..U+303F) has the most commonly occurring punctuation specific to East Asian typography—that is, typography involving the rendering of text with CJK ideographs. The Vertical Forms block (U+FE10..U+FE1F), the CJK Compatibility Forms block (U+FE30..U+FE4F), the Small Form Variants block (U+FE50..U+FE6F), and the Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF) contain many compatibility characters for punctuation marks, encoded for compatibility with a number of East Asian character encoding standards. Their primary use is for round-trip mapping with those legacy standards. For vertical text, the regular punctuation characters are used instead, with alternate glyphs for vertical layout supplied by the font. The punctuation characters in these various blocks are discussed below in terms of their general types.
Format Control Characters Format control characters are special characters that have no visible glyph of their own, but that affect the display of characters to which they are adjacent, or that have other specialized functions such as serving as invisible anchor points in text. All format control characters have General_Category=Cf. A significant number of format control characters are encoded in the General Punctuation block, but their descriptions are found in other sections. Cursive joining controls, as well as U+200B zero width space, U+2028 line separator, U+2029 paragraph separator, and U+2060 word joiner, are described in Section 16.2, Layout Controls. Bidirectional ordering controls are also discussed in Section 16.2, Layout Controls, but their detailed use is specified in Unicode Standard Annex #9, “The Bidirectional Algorithm.”
Invisible operators are explained in Section 15.5, Invisible Mathematical Operators. Deprecated format characters related to obsolete models of Arabic text processing are described in Section 16.3, Deprecated Format Characters. The reserved code points U+2064..U+2069 and U+FFF0..U+FFF8, as well as any reserved code points in the range U+E0000..U+E0FFF, are reserved for the possible future encoding of other format control characters. Because of this, they are treated as default ignorable code points. For more information, see Section 5.20, Default Ignorable Code Points.
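As an illustration of the General_Category=Cf classification mentioned above, the sketch below checks a few format control characters from this section. It assumes Python's standard `unicodedata` module; the helper name `is_format_control` and the choice of sample characters are illustrative only, and the reported categories come from the Unicode version bundled with the runtime.

```python
import unicodedata


def is_format_control(ch: str) -> bool:
    """Return True if the character has General_Category=Cf."""
    return unicodedata.category(ch) == "Cf"


# A few format controls from the General Punctuation block, plus soft hyphen.
samples = ["\u200B", "\u200C", "\u200D", "\u200E", "\u2060", "\u00AD"]
for ch in samples:
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{ord(ch):04X} {name}: Cf={is_format_control(ch)}")
```

Because these characters have no visible glyph, a check of this kind is often more reliable than visual inspection when debugging unexpected layout behavior.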
Space Characters The most commonly used space character is U+0020 space. Also often used is its nonbreaking counterpart, U+00A0 no-break space. These two characters have the same width, but behave differently for line breaking. For more information, see Unicode Standard Annex #14, “Line Breaking.” U+00A0 no-break space behaves like a numeric separator for the purposes of bidirectional layout. (See Unicode Standard Annex #9, “The Bidirectional Algorithm,” for a detailed discussion of the Unicode Bidirectional Algorithm.) In ideographic text, U+3000 ideographic space is commonly used because its width matches that of the ideographs. The main difference among other space characters is their width. U+2000..U+2006 are standard quad widths used in typography. U+2007 figure space has a fixed width, known as tabular width, which is the same width as digits used in tables. U+2008 punctuation space is a space defined to be the same width as a period. U+2009 thin space and U+200A hair space are successively smaller-width spaces used for narrow word gaps and for justification of type. The fixed-width space characters (U+2000..U+200A) are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. However, where they are used (for example, in typesetting mathematical formulae), their width is generally font-specified, and they typically do not expand during justification. The exception is U+2009 thin space, which sometimes gets adjusted. In addition to the various fixed-width space characters, there are a few script-specific space characters in the Unicode Standard. U+1680 ogham space mark is unusual in that it is generally rendered with a visible horizontal line, rather than being blank. 
Space characters with special behavior in word or line breaking are described in “Line and Word Breaking” in Section 16.2, Layout Controls, and Unicode Standard Annex #14, “Line Breaking.” U+00A0 no-break space has an additional, important function in the Unicode Standard. It may serve as the base character for displaying a nonspacing combining mark in apparent isolation. Versions of the standard prior to Version 4.1 indicated that U+0020 space could also be used for this function, but space is no longer recommended, because of potential interactions with the handling of space in XML and other markup languages. See Section 2.11, Combining Characters, for further discussion.
Space characters are found in several character blocks in the Unicode Standard. The list of space characters appears in Table 6-2.
Table 6-2. Unicode Space Characters
Code     Name
U+0020   space
U+00A0   no-break space
U+1680   ogham space mark
U+180E   mongolian vowel separator
U+2000   en quad
U+2001   em quad
U+2002   en space
U+2003   em space
U+2004   three-per-em space
U+2005   four-per-em space
U+2006   six-per-em space
U+2007   figure space
U+2008   punctuation space
U+2009   thin space
U+200A   hair space
U+202F   narrow no-break space
U+205F   medium mathematical space
U+3000   ideographic space
The space characters in the Unicode Standard can be identified by their General Category, [gc=Zs], in the Unicode Character Database. One exceptional “space” character is U+200B zero width space. This character, although called a “space” in its name, does not actually have any width or visible glyph in display. It functions primarily to indicate word boundaries in writing systems that do not actually use orthographic spaces to separate words in text. It is given the General Category [gc=Cf] and is treated as a format control character, rather than as a space character, in implementations. Further discussion of U+200B zero width space, as well as other zero-width characters with special properties, can be found in Section 16.2, Layout Controls.
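The identification of space characters by General Category can be sketched as follows. This is a minimal example assuming Python's standard `unicodedata` module; the list `TABLE_6_2` and the helper `is_space_separator` are names introduced here for illustration. The categories printed depend on the Unicode version bundled with the runtime, so individual entries may be classified differently from the Version 5.0 table.

```python
import unicodedata

# Code points listed in Table 6-2 of this section.
TABLE_6_2 = [
    0x0020, 0x00A0, 0x1680, 0x180E, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004,
    0x2005, 0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202F, 0x205F, 0x3000,
]


def is_space_separator(ch: str) -> bool:
    """Return True if the character has General_Category=Zs."""
    return unicodedata.category(ch) == "Zs"


for cp in TABLE_6_2:
    ch = chr(cp)
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{cp:04X} {name}: gc={unicodedata.category(ch)}")

# U+200B ZERO WIDTH SPACE is a format control, not a space separator.
print("U+200B gc =", unicodedata.category("\u200B"))
```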
Dashes and Hyphens Because of its prevalence in legacy encodings, U+002D hyphen-minus is the most common of the dash characters used to represent a hyphen. It has ambiguous semantic value and is rendered with an average width. U+2010 hyphen represents the hyphen as found in words such as “left-to-right.” It is rendered with a narrow width. When typesetting text, U+2010 hyphen is preferred over U+002D hyphen-minus. U+2011 non-breaking hyphen has the same semantic value as U+2010 hyphen, but should not be broken across lines. U+2012 figure dash has the same (ambiguous) semantic as the U+002D hyphen-minus, but has the same width as digits (if they are monospaced). U+2013 en dash is used to indicate a range of values, such as 1973–1984, although in some languages hyphen is used for that purpose. The en dash should be distinguished from the U+2212 minus sign, which is
an arithmetic operator. Although it is not preferred in mathematical typesetting, typographers sometimes use U+2013 en dash to represent the minus sign, particularly a unary minus. When interpreting formulas, U+002D hyphen-minus, U+2012 figure dash, and U+2212 minus sign should each be taken as indicating a minus sign, as in “x = a - b”, unless a higher-level protocol precisely defines which of these characters serves that function. U+2014 em dash is used to make a break—like this—in the flow of a sentence. (Some typographers prefer to use U+2013 en dash set off with spaces – like this – to make the same kind of break.) Like many other conventions for punctuation characters, such usage may depend on language. This kind of dash is commonly represented with a typewriter as a double hyphen. In older mathematical typography, U+2014 em dash may also be used to indicate a binary minus sign. U+2015 horizontal bar is used to introduce quoted text in some typographic styles. Dashes and hyphen characters may also be found in other character blocks in the Unicode Standard. A list of dash and hyphen characters appears in Table 6-3. For a description of the line breaking behavior of dashes and hyphens, see Unicode Standard Annex #14, “Line Breaking Properties.”
Table 6-3. Unicode Dash Characters
Code     Name
U+002D   hyphen-minus
U+007E   tilde (when used as swung dash)
U+058A   armenian hyphen
U+05BE   hebrew punctuation maqaf
U+1806   mongolian todo soft hyphen
U+2010   hyphen
U+2011   non-breaking hyphen
U+2012   figure dash
U+2013   en dash
U+2014   em dash
U+2015   horizontal bar (= quotation dash)
U+2053   swung dash
U+207B   superscript minus
U+208B   subscript minus
U+2212   minus sign
U+2E17   double oblique hyphen
U+301C   wave dash
U+3030   wavy dash
U+30A0   katakana-hiragana double hyphen
U+FE31   presentation form for vertical em dash
U+FE32   presentation form for vertical en dash
U+FE58   small em dash
U+FE63   small hyphen-minus
U+FF0D   fullwidth hyphen-minus
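The guidance on interpreting formulas, where hyphen-minus, figure dash, and minus sign are each read as a minus sign, can be sketched as a small normalization step. The function name `normalize_minus` and the decision to fold everything to U+2212 are assumptions made for this example; they are one possible convention in the absence of a higher-level protocol, not something the standard prescribes.

```python
# Characters that this section says should each be read as a minus sign
# when interpreting a formula.
MINUS_LIKE = {"\u002D", "\u2012", "\u2212"}  # hyphen-minus, figure dash, minus sign

MINUS_SIGN = "\u2212"


def normalize_minus(formula: str) -> str:
    """Fold minus-like characters in a formula to U+2212 MINUS SIGN."""
    return "".join(MINUS_SIGN if ch in MINUS_LIKE else ch for ch in formula)


print(normalize_minus("x = a - b"))
print(normalize_minus("x = a \u2012 b"))
```

U+2013 en dash is deliberately left untouched here, since the text describes its use as a minus sign as a typographic habit rather than something a formula interpreter is expected to accept. Such a function should only be applied to text already known to be a formula, because in running text the same characters serve as hyphens.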
Soft Hyphen. Despite its name, U+00AD soft hyphen is not a hyphen, but rather an invisible format character used to indicate optional intraword breaks. As described in
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 7
European Alphabetic Scripts
Modern European alphabetic scripts are derived from or influenced by the Greek script, which itself was an adaptation of the Phoenician alphabet. A Greek innovation was writing the letters from left to right, which is the writing direction for all the scripts derived from or inspired by Greek. The European alphabetic scripts and additional characters described in this chapter are:
• Latin
• Cyrillic
• Georgian
• Greek
• Glagolitic
• Modifier letters
• Coptic
• Armenian
• Combining marks
The European scripts are all written from left to right. Many have separate lowercase and uppercase forms of the alphabet. Spaces are used to separate words. Accents and diacritical marks are used to indicate phonetic features and to extend the use of base scripts to additional languages. Some of these modification marks have evolved into small free-standing signs that can be treated as characters in their own right. The Latin script is used to write or transliterate texts in a wide variety of languages. The International Phonetic Alphabet (IPA) is an extension of the Latin alphabet, enabling it to represent the phonetics of all languages. Other Latin phonetic extensions are used for the Uralic Phonetic Alphabet. The Latin alphabet is derived from the alphabet used by the Etruscans, who had adopted a Western variant of the classical Greek alphabet (Section 14.2, Old Italic). Originally it contained only 24 capital letters. The modern Latin alphabet as it is found in the Basic Latin block owes its appearance to innovations of scribes during the Middle Ages and practices of the early Renaissance printers. The Cyrillic script was developed in the ninth century and is also based on Greek. Like Latin, Cyrillic is used to write or transliterate texts in many languages. The Georgian and Armenian scripts were devised in the fifth century and are influenced by Greek. Modern Georgian does not have separate uppercase and lowercase forms. The Coptic script was the last stage in the development of Egyptian writing. It represented the adaptation of the Greek alphabet to writing Egyptian, with the retention of forms from Demotic for sounds not adequately represented by Greek letters. Although primarily used
in Egypt from the fourth to the tenth century, it is described in this chapter because of its close relationship to the Greek script. Glagolitic is an early Slavic script related in some ways to both the Greek and the Cyrillic scripts. It was widely used in the Balkans but gradually died out, surviving the longest in Croatia. Like Coptic, however, it still has some modern use in liturgical contexts. This chapter also describes modifier letters and combining marks used with the Latin script and other scripts. The block descriptions for other archaic European alphabetic scripts, such as Gothic, Ogham, Old Italic, and Runic, can be found in Chapter 14, Archaic Scripts.
7.1 Latin The Latin script was derived from the Greek script. Today it is used to write a wide variety of languages all over the world. In the process of adapting it to other languages, numerous extensions have been devised. The most common is the addition of diacritical marks. Furthermore, the creation of digraphs, inverse or reverse forms, and outright new characters have all been used to extend the Latin script. The Latin script is written in linear sequence from left to right. Spaces are used to separate words and provide the primary line breaking opportunities. Hyphens are used where lines are broken in the middle of a word. (For more information, see Unicode Standard Annex #14, “Line Breaking Properties.”) Latin letters come in uppercase and lowercase pairs. Languages. Some indication of language or other usage is given for many characters within the names lists accompanying the character charts. Diacritical Marks. Speakers of different languages treat the addition of a diacritical mark to a base letter differently. In some languages, the combination is treated as a letter in the alphabet for the language. In others, such as English, the same words can often be spelled with and without the diacritical mark without implying any difference. Most languages that use the Latin script treat letters with diacritical marks as variations of the base letter, but do not accord the combination the full status of an independent letter in the alphabet. Widely used accented character combinations are provided as single characters to accommodate interoperation with pervasive practice in legacy encodings. Combining diacritical marks can express these and all other accented letters as combining character sequences. In the Unicode Standard, all diacritical marks are encoded in sequence after the base characters to which they apply. For more details, see the subsection “Combining Diacritical Marks” in Section 7.9, Combining Marks, and also Section 2.11, Combining Characters. 
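The relationship between a precomposed accented letter and its combining character sequence can be illustrated with normalization. The sketch below assumes Python's standard `unicodedata` module; the helper name `canonically_equal` is introduced only for this example.

```python
import unicodedata

precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE, a single code point
sequence = "e\u0301"     # base letter followed by COMBINING ACUTE ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The diacritic is encoded after the base character it applies to.
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(ch):04X}" for ch in decomposed])
print(canonically_equal(precomposed, sequence))
```

Comparing normalized forms rather than raw code points is what lets software treat the legacy-compatible single character and the combining sequence as the same text.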
Alternative Glyphs. Some characters have alternative representations, although they have a common semantic. In such cases, a preferred glyph is chosen to represent the character in the code charts, even though it may not be the form used under all circumstances. Some
Latin examples to illustrate this point are provided in Figure 7-1 and discussed in the text that follows.
Figure 7-1. Alternative Glyphs in Latin
Common typographical variations of basic Latin letters include the open- and closed-loop forms of the lowercase letters “a” and “g”, as shown in the first example in Figure 7-1. In ordinary Latin text, such distinctions are merely glyphic alternates for the same characters; however, phonetic transcription systems, such as IPA and Pinyin, often make systematic distinctions between these forms.
Variations in Diacritical Marks. The shape and placement of diacritical marks can be subject to considerable variation that might surprise a reader unfamiliar with such distinctions. For example, when Czech is typeset, U+010F latin small letter d with caron and U+0165 latin small letter t with caron are often rendered by glyphs with an apostrophe instead of with a caron, commonly known as a háček. See the second example in Figure 7-1. In Slovak, this use also applies to U+013E latin small letter l with caron and U+013D latin capital letter l with caron. The use of an apostrophe can avoid some line crashes over the ascenders of those letters and so result in better typography. In typewritten or handwritten documents, or in didactic and pedagogical material, glyphs with háčeks are preferred.
A similar situation can be seen with the Latvian letter U+0123 latin small letter g with cedilla, as shown in example 3 in Figure 7-1. In good Latvian typography, this character is always shown with a rotated comma over the g, rather than a cedilla below the g, because of the typographical design and layout issues resulting from trying to place a cedilla below the descender loop of the g. Poor Latvian fonts may substitute an acute accent for the rotated comma, and handwritten or other printed forms may actually show the cedilla below the g. The uppercase form of the letter is always shown with a cedilla, as the rounded bottom of the G poses no problems for attachment of the cedilla.
Other Latvian letters with a cedilla below (U+0137 latin small letter k with cedilla, U+0146 latin small letter n with cedilla, and U+0157 latin small letter r with cedilla) always prefer a glyph with a floating comma below, as there is no proper attachment point for a cedilla at the bottom of the base form.
In Turkish and Romanian, a cedilla and a comma below sometimes replace one another depending on the font style, as shown in example 4 in Figure 7-1. The form with the cedilla is preferred in Turkish, and the form with the comma below is preferred in Romanian. The characters with explicit commas below are provided to permit the distinction from characters with a cedilla. Legacy encodings for these characters contain only a single form of each of these characters. ISO/IEC 8859-2 maps these to the form with the cedilla, while ISO/IEC 8859-16 maps them to the form with the comma below. Migrating Romanian 8-bit data to Unicode should be done with care. In general, characters with cedillas or ogoneks below are subject to variable typographical usage, depending on the availability and quality of fonts used, the technology, and the geographic area. Various hooks, commas, and squiggles may be substituted for the nominal forms of these diacritics below, and even the directions of the hooks may be reversed. Implementers should become familiar with particular typographical traditions before assuming that characters are missing or are wrongly represented in the code charts in the Unicode Standard.
Exceptional Case Pairs. The characters U+0130 latin capital letter i with dot above and U+0131 latin small letter dotless i (used primarily in Turkish) are assumed to take ASCII “i” and “I”, respectively, as their case alternates. This mapping makes the corresponding reverse mapping language-specific; mapping in both directions requires special attention from the implementer (see Section 5.18, Case Mappings).
Diacritics on i and j. A dotted (normal) i or j followed by a nonspacing mark above loses the dot in rendering. Thus, in the word naïve, the ï could be spelled with i + diaeresis. A dotted-i is not equivalent to a Turkish dotless-i + overdot, nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ¨ ≠ ı + ¨). The same pattern is used for j.
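The language-specific reverse mapping for the exceptional case pairs discussed above can be sketched as follows. This is a simplified illustration, not a full implementation of the case-mapping rules in Section 5.18: the helper names `turkish_lower` and `turkish_upper` are hypothetical, and the code handles only the four i-like letters before deferring to Python's default case conversion.

```python
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish pairing: U+0130 -> i, and I -> U+0131."""
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()


def turkish_upper(text: str) -> str:
    """Uppercase with Turkish pairing: i -> U+0130, and U+0131 -> I."""
    text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()


# The default (language-neutral) mappings pair the ASCII letters instead.
print("I".lower(), DOTLESS_SMALL_I.upper())
print(turkish_lower("ISTANBUL"), turkish_upper("istanbul"))
```

The explicit replacements run before the generic conversion so that the ASCII letters never reach the default mapping, which is where the two conventions disagree.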
Dotless-j is used in the Landsmålsalfabet, where it does not have a case pair. To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2).
Figure 7-2. Diacritics on i and j
All characters that use their dot in this manner have the Soft_Dotted property in Unicode.
Vietnamese. In the modern Vietnamese alphabet, there are 12 vowel letters and 5 tone marks (see Figure 7-3). Normalization Form C represents the combination of vowel letter and tone mark as a single unit: for example, U+1EA8 Ẩ latin capital letter a with circumflex and hook above. Normalization Form D decomposes this combination into
the combining sequence, such as <U+0041, U+0302, U+0309>. Some widely used implementations prefer storing the vowel letter and the tone mark separately.
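The two normalization forms mentioned for Vietnamese can be compared directly. The following sketch assumes Python's standard `unicodedata` module and uses U+1EA8 as the sample letter; the variable names are illustrative.

```python
import unicodedata

letter = "\u1EA8"  # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE

# Normalization Form D splits the letter into base vowel, vowel mark, tone mark.
nfd = unicodedata.normalize("NFD", letter)
print([f"U+{ord(ch):04X}" for ch in nfd])

# Normalization Form C recombines the sequence into a single code point.
nfc = unicodedata.normalize("NFC", nfd)
print([f"U+{ord(ch):04X}" for ch in nfc])
```

Software that exchanges Vietnamese text with systems using either storage convention would typically normalize to one form before comparing or indexing strings.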
Figure 7-3. Vietnamese Letters and Tone Marks
The Vietnamese vowels and other letters are found in the Basic Latin, Latin-1 Supplement, and Latin Extended-A blocks. Additional precomposed vowels and tone marks are found in the Latin Extended Additional block. The characters U+0300 combining grave accent, U+0309 combining hook above, U+0303 combining tilde, U+0301 combining acute accent, and U+0323 combining dot below should be used in representing the Vietnamese tone marks. The characters U+0340 combining grave tone mark and U+0341 combining acute tone mark are deprecated and should not be used. Standards. Unicode follows ISO/IEC 8859-1 in the layout of Latin letters up to U+00FF. ISO/IEC 8859-1, in turn, is based on older standards—among others, ASCII (ANSI X3.4), which is identical to ISO/IEC 646:1991-IRV. Like ASCII, ISO/IEC 8859-1 contains Latin letters, punctuation signs, and mathematical symbols. These additional characters are widely used with scripts other than Latin. The descriptions of these characters are found in Chapter 6, Writing Systems and Punctuation, and Chapter 15, Symbols. The Latin Extended-A block includes characters contained in ISO/IEC 8859— Part 2. Latin alphabet No. 2, Part 3. Latin alphabet No. 3, Part 4. Latin alphabet No. 4, and Part 9. Latin alphabet No. 5. Many of the other graphic characters contained in these standards, such as punctuation, signs, symbols, and diacritical marks, are already encoded in the Latin-1 Supplement block. Other characters from these parts of ISO/IEC 8859 are encoded in other blocks, primarily in the Spacing Modifier Letters block (U+02B0..U+02FF) and in the character blocks starting at and following the General Punctuation block. The Latin Extended-A block also covers additional characters from ISO/IEC 6937. 
The Latin Extended-B block covers, among others, characters in ISO 6438 Documentation — African coded character set for bibliographic information interchange, Pinyin Latin transcription characters from the People’s Republic of China national standard GB 2312 and from the Japanese national standard JIS X 0212, and Sami characters from ISO/IEC 8859 Part 10. Latin alphabet No. 6. The characters in the IPA block are taken from the 1989 revision of the International Phonetic Alphabet, published by the International Phonetic Association. Extensions from later IPA sources have also been added. Related Characters. For other Latin-derived characters, see Letterlike Symbols (U+2100..U+214F), Currency Symbols (U+20A0..U+20CF), Number Forms
(U+2150..U+218F), Enclosed Alphanumerics (U+2460..U+24FF), CJK Compatibility (U+3300..U+33FF), Fullwidth Forms (U+FF21..U+FF5A), and Mathematical Alphanumeric Symbols (U+1D400..U+1D7FF).
Letters of Basic Latin: U+0041–U+007A Only a small fraction of the languages written with the Latin script can be written entirely with the basic set of 26 uppercase and 26 lowercase Latin letters contained in this block. The 26 basic letter pairs form the core of the alphabets used by all the other languages that use the Latin script. A stream of text using one of these alphabets would therefore intermix characters from the Basic Latin block and other Latin blocks. Occasionally a few of the basic letter pairs are not used to write a language. For example, Italian does not use “j” or “w”.
Letters of the Latin-1 Supplement: U+00C0–U+00FF The Latin-1 supplement extends the basic 26 letter pairs of ASCII by providing additional letters for the major languages of Europe listed in the next paragraph. Languages. The languages supported by the Latin-1 supplement include Catalan, Danish, Dutch, Faroese, Finnish, Flemish, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish. Ordinals. U+00AA feminine ordinal indicator and U+00BA masculine ordinal indicator can be depicted with an underscore, but many modern fonts show them as superscripted Latin letters with no underscore. In sorting and searching, these characters should be treated as weakly equivalent to their Latin character equivalents.
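One way to approximate the weak equivalence suggested for the ordinal indicators in searching is to fold them through compatibility decomposition. This is only a rough sketch: the helper name `search_key` is hypothetical, it assumes Python's standard `unicodedata` module, and NFKD folding is a stand-in for proper collation weights, since it also decomposes accented letters and other compatibility characters.

```python
import unicodedata


def search_key(text: str) -> str:
    """Build a loose matching key using compatibility decomposition."""
    return unicodedata.normalize("NFKD", text).casefold()


# U+00AA and U+00BA fold to the plain Latin letters a and o.
print(search_key("1\u00AA"), search_key("2\u00BA"))
```

A full implementation would instead rely on a collation algorithm with multiple comparison levels, so that the ordinal indicators differ from plain letters only at a weaker level.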
Latin Extended-A: U+0100–U+017F The Latin Extended-A block contains a collection of letters that, when added to the letters contained in the Basic Latin and Latin-1 Supplement blocks, allow for the representation of most European languages that employ the Latin script. Many other languages can also be written with the characters in this block. Most of these characters are equivalent to precomposed combinations of base character forms and combining diacritical marks. These combinations may also be represented by means of composed character sequences. See Section 2.11, Combining Characters, and Section 7.9, Combining Marks. Compatibility Digraphs. The Latin Extended-A block contains five compatibility digraphs, encoded for compatibility with ISO/IEC 6937:1984. Two of these characters, U+0140 latin small letter l with middle dot and its uppercase version, were originally encoded in ISO/IEC 6937 for support of Catalan. In current conventions, the representation of this digraphic sequence in Catalan simply uses a sequence of an ordinary “l” and U+00B7 middle dot. Another pair of characters, U+0133 latin small ligature ij and its uppercase version, was provided to support the digraph “ij” in Dutch, often termed a “ligature” in discussions
of Dutch orthography. When adding intercharacter spacing for line justification, the “ij” is kept as a unit, and the space between the i and j does not increase. In titlecasing, both the i and the j are uppercased, as in the word “IJsselmeer.” Using a single code point might simplify software support for such features; however, because a vast amount of Dutch data is encoded without this digraph character, under most circumstances one will encounter an <i, j> sequence. Finally, U+0149 latin small letter n preceded by apostrophe was encoded for use in Afrikaans. However, in nearly all cases it is better represented simply by a sequence of an apostrophe followed by “n”. Languages. Most languages supported by this block also require the concurrent use of characters contained in the Basic Latin and Latin-1 Supplement blocks. When combined with these two blocks, the Latin Extended-A block supports Afrikaans, Basque, Breton, Croatian, Czech, Esperanto, Estonian, French, Frisian, Greenlandic, Hungarian, Latin, Latvian, Lithuanian, Maltese, Polish, Provençal, Rhaeto-Romanic, Romanian, Romany, Sámi, Slovak, Slovenian, Sorbian, Turkish, Welsh, and many others.
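Because most Dutch data stores the digraph as two ordinary letters, titlecasing has to handle that sequence itself. The sketch below is a simplified illustration for single words only; the function names `expand_ij` and `dutch_titlecase` are hypothetical and are not part of any standard library.

```python
# Map the compatibility digraph characters to ordinary letter sequences.
IJ_EXPANSION = str.maketrans({"\u0133": "ij", "\u0132": "IJ"})


def expand_ij(text: str) -> str:
    """Replace U+0133 and U+0132 with the two-letter sequences."""
    return text.translate(IJ_EXPANSION)


def dutch_titlecase(word: str) -> str:
    """Titlecase one word, uppercasing both letters of an initial 'ij'."""
    lowered = expand_ij(word).lower()
    if lowered.startswith("ij"):
        return "IJ" + lowered[2:]
    return lowered[:1].upper() + lowered[1:]


print(dutch_titlecase("ijsselmeer"))
print(dutch_titlecase("amsterdam"))
```

Generic titlecasing uppercases only the first letter, so a language-aware step of this kind is needed whenever the text is known to be Dutch.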
Latin Extended-B: U+0180–U+024F The Latin Extended-B block contains letterforms used to extend Latin scripts to represent additional languages. It also contains phonetic symbols not included in the International Phonetic Alphabet (see the IPA Extensions block, U+0250..U+02AF). Arrangement. The characters are arranged in a nominal alphabetical order, followed by a small collection of Latinate forms. Uppercase and lowercase pairs are placed together where possible, but in many instances the other case form is encoded at some distant location and so is cross-referenced. Variations on the same base letter are arranged in the following order: turned, inverted, hook attachment, stroke extension or modification, different style, small cap, modified basic form, ligature, and Greek derived. Croatian Digraphs Matching Serbian Cyrillic Letters. Serbo-Croatian is a single language with paired alphabets: a Latin script (Croatian) and a Cyrillic script (Serbian). A set of compatibility digraph codes is provided for one-to-one transliteration. There are two potential uppercase forms for each digraph, depending on whether only the initial letter is to be capitalized (titlecase) or both (all uppercase). The Unicode Standard offers both forms so that software can convert one form to the other without changing font sets. The appropriate cross references are given for the lowercase letters. Pinyin Diacritic–Vowel Combinations. The Chinese standard GB 2312, the Japanese standard JIS X 0212, and some other standards include codes for Pinyin, which is used for Latin transcription of Mandarin Chinese. Most of the letters used in Pinyin romanization are already covered in the preceding Latin blocks. The group of 16 characters provided here completes the Pinyin character set specified in GB 2312 and JIS X 0212. Case Pairs. A number of characters in this block are uppercase forms of characters whose lowercase forms are part of some other grouping. 
Many of these characters came from the International Phonetic Alphabet; they acquired uppercase forms when they were adopted
into Latin script-based writing systems. Occasionally, however, alternative uppercase forms arose in this process. In some instances, research has shown that alternative uppercase forms are merely variants of the same character. If so, such variants are assigned a single Unicode code point, as is the case of U+01B7 latin capital letter ezh. But when research has shown that two uppercase forms are actually used in different ways, then they are given different codes; such is the case for U+018E latin capital letter reversed e and U+018F latin capital letter schwa. In this instance, the shared lowercase form is copied to enable unique case-pair mappings: U+01DD latin small letter turned e is a copy of U+0259 latin small letter schwa. For historical reasons, the names of some case pairs differ. For example, U+018E latin capital letter reversed e is the uppercase of U+01DD latin small letter turned e—not of U+0258 latin small letter reversed e. For default case mappings of Unicode characters, see Section 4.2, Case—Normative. Caseless Letters. A number of letters used with the Latin script are caseless—for example, the caseless glottal stop at U+0294 and U+01BB latin letter two with stroke, and the various letters denoting click sounds. Caseless letters retain their shape when uppercased. When titlecasing words, they may also act transparently; that is, if they occur in the leading position, the next following cased letter may be uppercased instead. Over the last several centuries, the trend in typographical development for the Latin script has tended to favor the eventual introduction of case pairs. See the following discussion of the glottal stop. The Unicode Standard may encode additional uppercase characters in such instances. However, for reasons of stability, the Standard will never add a new lowercase form for an existing uppercase character. See also “Caseless Matching” in Section 5.18, Case Mappings. Glottal Stop. 
There are two patterns of usage for the glottal stop in the Unicode Standard. U+0294 ʔ latin letter glottal stop is a caseless letter used in IPA. It is also widely seen in language orthographies based on IPA or Americanist phonetic usage, in those instances where no casing is apparent for glottal stop. Such orthographies may avoid casing for glottal stop to the extent that when titlecasing strings, a word with an initial glottal stop may have its second letter uppercased instead of the first letter. In a small number of orthographies for languages of northwestern Canada, and in particular, for Chipewyan, Dogrib, and Slavey, case pairs have been introduced for glottal stop. For these orthographies, the cased glottal stop characters should be used: U+0241 Ɂ latin capital letter glottal stop and U+0242 ɂ latin small letter glottal stop. The glyphs for the glottal stop are somewhat variable and overlap to a certain extent. The glyph shown in the code charts for U+0294 ʔ latin letter glottal stop is a cap-height form as specified in IPA, but the same character is often shown with a glyph that resembles the top half of a question mark and that may or may not be cap height. U+0241 Ɂ latin capital letter glottal stop, while shown with a larger glyph in the code charts, often appears identical to U+0294. U+0242 ɂ latin small letter glottal stop is a small form of U+0241.
Various small, raised hook- or comma-shaped characters are often substituted for a glottal stop, for instance, U+02BC ʼ modifier letter apostrophe, U+02BB ʻ modifier letter turned comma, U+02C0 ˀ modifier letter glottal stop, or U+02BE ʾ modifier letter right half ring. U+02BB, in particular, is used in Hawaiian orthography as the ʻokina.
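The case behavior described for Latin Extended-B (three-way digraph casing, the reversed e / schwa case pairs, and the caseless and cased glottal stops) can be checked with Python's built-in string methods, which apply the default Unicode case mappings. This is a minimal sketch; `case_forms` is a hypothetical helper introduced only for this illustration.

```python
def case_forms(ch):
    """Return the (lowercase, titlecase, uppercase) forms of one character."""
    return (ch.lower(), ch.title(), ch.upper())

# Croatian digraph dz with caron: separate titlecase and uppercase forms.
dz_lower, dz_title, dz_upper = case_forms("\u01c6")
print(f"U+{ord(dz_lower):04X} U+{ord(dz_title):04X} U+{ord(dz_upper):04X}")

# Distinct case pairs: turned e pairs with reversed E, schwa with capital schwa.
print("\u01dd".upper() == "\u018e")   # small turned e -> capital reversed e
print("\u0259".upper() == "\u018f")   # small schwa    -> capital schwa

# The IPA glottal stop is caseless; the Canadian orthography pair is cased.
print("\u0294".upper() == "\u0294")   # unchanged
print("\u0242".upper() == "\u0241")   # small glottal stop -> capital glottal stop
```

Because each uppercase form maps back to exactly one lowercase form, a round trip through `upper()` and `lower()` preserves the distinction between U+01DD and U+0259.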
IPA Extensions: U+0250–U+02AF The IPA Extensions block contains primarily the unique symbols of the International Phonetic Alphabet, which is a standard system for indicating specific speech sounds. The IPA was first introduced in 1886 and has undergone occasional revisions of content and usage since that time. The Unicode Standard covers all single symbols and all diacritics in the last published IPA revision (1989) as well as a few symbols in former IPA usage that are no longer currently sanctioned. A few symbols have been added to this block that are part of the transcriptional practices of Sinologists, Americanists, and other linguists. Some of these practices have usages independent of the IPA and may use characters from other Latin blocks rather than IPA forms. Note also that a few nonstandard or obsolete phonetic symbols are encoded in the Latin Extended-B block. An essential feature of IPA is the use of combining diacritical marks. IPA diacritical mark characters are coded in the Combining Diacritical Marks block, U+0300.. U+036F. In IPA, diacritical marks can be freely applied to base form letters to indicate the fine degrees of phonetic differentiation required for precise recording of different languages. Standards. The International Phonetic Association standard considers IPA to be a separate alphabet, so it includes the entire Latin lowercase alphabet a–z, a number of extended Latin letters such as U+0153 œ latin small ligature oe, and a few Greek letters and other symbols as separate and distinct characters. In contrast, the Unicode Standard does not duplicate either the Latin lowercase letters a–z or other Latin or Greek letters in encoding IPA. Unlike other character standards referenced by the Unicode Standard, IPA constitutes an extended alphabet and phonetic transcriptional standard, rather than a character encoding standard. Unifications. 
The IPA characters are unified as much as possible with other letters, albeit not with nonletter symbols such as U+222B ∫ integral. The IPA characters have also been adopted into the Latin-based alphabets of many written languages, such as some used in Africa. It is futile to attempt to distinguish a transcription from an actual alphabet in such cases. Therefore, many IPA characters are found outside the IPA Extensions block. IPA characters that are not found in the IPA Extensions block are listed as cross references at the beginning of the character names list for this block.
IPA Alternates. In a few cases IPA practice has, over time, produced alternate forms, such as U+0269 latin small letter iota “ɩ” versus U+026A latin letter small capital i “ɪ.” The Unicode Standard provides separate encodings for the two forms because they are used in a meaningfully distinct fashion.
Case Pairs. IPA does not sanction case distinctions; in effect, its phonetic symbols are all lowercase. When IPA symbols are adopted into a particular alphabet and used by a given written language (as has occurred, for example, in Africa), they acquire uppercase forms. Because these uppercase forms are not themselves IPA symbols, they are generally encoded in the Latin Extended-B block (or other Latin extension blocks) and are cross-referenced with the IPA names list.
Typographic Variants. IPA includes typographic variants of certain Latin and Greek letters that would ordinarily be considered variations of font style rather than of character identity, such as small capital letterforms. Examples include a typographic variant of the Greek letter phi φ and the borrowed letter Greek iota ι, which has a unique Latin uppercase form. These forms are encoded as separate characters in the Unicode Standard because they have distinct semantics in plain text.
Affricate Digraph Ligatures. IPA officially sanctions six digraph ligatures used in transcription of coronal affricates. These are encoded at U+02A3..U+02A8. The IPA digraph ligatures are explicitly defined in IPA and have possible semantic values that make them not simply rendering forms. For example, while U+02A6 latin small letter ts digraph is a transcription for the sounds that could also be transcribed in IPA as “ts” <U+0074, U+0073>, the choice of the digraph ligature may be the result of a deliberate distinction made by the transcriber regarding the systematic phonetic status of the affricate. The choice of whether to ligate cannot be left to rendering software based on the font available. This ligature also differs in typographical design from the “ts” ligature found in some oldstyle fonts.
Arrangement. The IPA Extensions block is arranged in approximate alphabetical order according to the Latin letter that is graphically most similar to each symbol. This order has nothing to do with a phonetic arrangement of the IPA letters.
Phonetic Extensions: U+1D00–U+1DBF Most of the characters in the first of the two adjacent blocks comprising the phonetic extensions are used in the Uralic Phonetic Alphabet (UPA; also called Finno-Ugric Transcription, FUT), a highly specialized system that has been used by Uralicists globally for more than 100 years. Originally, it was chiefly used in Finland, Hungary, Estonia, Germany, Norway, Sweden, and Russia, but it is now known and used worldwide, including in North America and Japan. Uralic linguistic description, which treats the phonetics, phonology, and etymology of Uralic languages, is also used by other branches of linguistics, such as Indo-European, Turkic, and Altaic studies, as well as by other sciences, such as archaeology. A very large body of descriptive texts, grammars, dictionaries, and chrestomathies exists, and continues to be produced, using this system. The UPA makes use of approximately 258 characters, some of which are encoded in the Phonetic Extensions block; others are encoded in the other Latin blocks and in the Greek and Cyrillic blocks. The UPA takes full advantage of combining characters. It is not uncommon to find a base letter with three diacritics above and two below.
Typographic Features of the UPA. Small capitalization in the UPA means voicelessness of a normally voiced sound. Small capitalization is also used to indicate certain either voiceless or half-voiced consonants. Superscripting indicates very short schwa vowels or transition vowels, or in general very short sounds. Subscripting indicates co-articulation caused by the preceding or following sound. Rotation (turned letters) indicates reduction; sideways (that is, 90 degrees counterclockwise) rotation is used where turning (180 degrees) might result in an ambiguous representation. UPA phonetic material is generally represented with italic glyphs, so as to separate it from the surrounding text.
Other Phonetic Extensions. The remaining characters in the phonetics extension range U+1D6C..U+1DBF are derived from a wide variety of sources, including many technical orthographies developed by SIL linguists, as well as older historic sources. All attested phonetic characters showing struckthrough tildes, struckthrough bars, and retroflex or palatal hooks attached to the basic letter have been separately encoded here. Although separate combining marks exist in the Unicode Standard for overstruck diacritics and attached retroflex or palatal hooks, earlier encoded IPA letters such as U+0268 latin small letter i with stroke and U+026D latin small letter l with retroflex hook have never been given decomposition mappings in the standard. For consistency, all newly encoded characters are handled analogously to the existing, more common characters of this type and are not given decomposition mappings. Because these characters do not have decompositions, they require special handling in some circumstances. See the discussion of single-script confusables in Unicode Technical Standard #39, “Unicode Security Mechanisms.”
The Phonetic Extensions Supplement block also contains 37 superscript modifier letters. These complement the much more commonly used superscript modifier letters found in the Spacing Modifier Letters block. U+1D77 latin small letter turned g and U+1D78 modifier letter cyrillic en are used in Caucasian linguistics. U+1D79 latin small letter insular g is used in older Irish phonetic notation. It is to be distinguished from a Gaelic style glyph for U+0067 latin small letter g.
Digraph for th. U+1D7A latin small letter th with strikethrough is a digraphic notation commonly found in some English-language dictionaries, representing the voiceless (inter)dental fricative, as in thin. While this character is clearly a digraph, the obligatory strikethrough across two letters distinguishes it from a “th” digraph per se, and there is no mechanism involving combining marks that can easily be used to represent it. A common alternative glyphic form for U+1D7A uses a horizontal bar to strike through the two letters, instead of a diagonal stroke.
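The absence of decomposition mappings for letters with strokes, overstruck tildes, and hooks can be confirmed by querying the character database. The following sketch uses Python's `unicodedata` module; the helper `has_decomposition` and the choice of sample characters are assumptions made for this illustration.

```python
import unicodedata

def has_decomposition(ch):
    """True if the character has any (canonical or compatibility) mapping."""
    return unicodedata.decomposition(ch) != ""

# An ordinary accented letter decomposes into base + combining mark.
print(unicodedata.decomposition("\u00e9"))        # e with acute

# Letters with stroke, retroflex hook, or middle tilde have no mapping,
# so normalization leaves them untouched.
for ch in ("\u0268", "\u026d", "\u1d6c"):
    print(unicodedata.name(ch), has_decomposition(ch))

# Superscript modifier letters carry a <super> compatibility mapping instead.
print(unicodedata.decomposition("\u1d2c"))        # modifier letter capital A
```

One practical consequence is that normalization cannot be relied on to match, say, U+0268 against an “i” followed by a combining overlay; applications that need such matching have to supply their own tables.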
Latin Extended Additional: U+1E00–U+1EFF The characters in this block constitute a number of precomposed combinations of Latin letters with one or more general diacritical marks. With the exception of U+1E9A latin
small letter a with right half ring, each of the characters contained in this block is a canonical decomposable character and may alternatively be represented with a base letter followed by one or more general diacritical mark characters found in the Combining Diacritical Marks block.
Vietnamese Vowel Plus Tone Mark Combinations. A portion of this block (U+1EA0..U+1EF9) comprises vowel letters of the modern Vietnamese alphabet (quốc ngữ) combined with a diacritic mark that denotes the phonemic tone that applies to the syllable.
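As a brief illustration of the canonical decompositions in this block, the sketch below decomposes a Vietnamese vowel-plus-tone letter and contrasts it with U+1E9A, the one exception named above. It relies only on Python's standard `unicodedata` module; the helper name `code_points` is invented for this example.

```python
import unicodedata

def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]

# U+1EA5: a with circumflex and acute (Vietnamese vowel plus tone mark).
vowel = "\u1ea5"
decomposed = unicodedata.normalize("NFD", vowel)
print(code_points(decomposed))            # base letter, circumflex, acute
print(unicodedata.normalize("NFC", decomposed) == vowel)

# U+1E9A has only a compatibility mapping, so NFD leaves it unchanged.
print(unicodedata.decomposition("\u1e9a"))
print(unicodedata.normalize("NFD", "\u1e9a") == "\u1e9a")
```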
Latin Extended-C: U+2C60–U+2C7F This small block of additional Latin characters contains orthographic Latin additions for minority languages, a few historic Latin letters, and further extensions for phonetic notations. Uighur. The Latin orthography for the Uighur language was influenced by widespread conventions for extension of the Cyrillic script for representing Central Asian languages. In particular, a number of Latin characters were extended with a Cyrillic-style descender diacritic to create new letters for use with Uighur. Claudian Letters. The Roman emperor Claudius invented three additional letters for use with the Latin script. Those letters saw limited usage during his reign, but were abandoned soon afterward. The half h letter is encoded in this block. The other two letters are encoded in other blocks: U+2132 turned capital f and U+2183 roman numeral reversed one hundred (unified with the Claudian letter reversed c). Claudian letters in inscriptions are uppercase only, but may be transcribed by scholars in lowercase.
Latin Extended-D: U+A720–U+A7FF This block is intended for further encoding of historic letters for the Latin script and other rare phonetic and orthographic extensions to the script. For Unicode 5.0, it contains only two modifier tone letters for use with UPA.
Latin Ligatures: U+FB00–U+FB06 This range in the Alphabetic Presentation Forms block (U+FB00..U+FB4F) contains several common Latin ligatures, which occur in legacy encodings. Whether to use a Latin ligature is a matter of typographical style as well as a result of the orthographical rules of the language. Some languages prohibit ligatures across word boundaries. In these cases, it is preferable for the implementations to use unligated characters in the backing store and provide out-of-band information to the display layer where ligatures may be placed. Some format controls in the Unicode Standard can affect the formation of ligatures. See “Controlling Ligatures” in Section 16.2, Layout Controls.
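One way to keep unligated characters in the backing store, as recommended above, is to fold legacy ligature code points when text is imported. The sketch below does this with compatibility normalization; `unligate` is a hypothetical helper, and NFKC also folds other compatibility characters, so a real implementation might prefer a narrower mapping table.

```python
import unicodedata

# The Latin ligature range in the Alphabetic Presentation Forms block.
LATIN_LIGATURES = range(0xFB00, 0xFB07)

def unligate(text):
    """Replace Latin ligature code points with their component letters."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if ord(ch) in LATIN_LIGATURES else ch
        for ch in text
    )

print(unligate("\ufb01nd the o\ufb03ce"))   # ligatures fi and ffi expanded
```

Restricting the fold to U+FB00..U+FB06 leaves every other character in the string exactly as it was, which keeps the import step from altering unrelated compatibility characters.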
7.2 Greek
Greek: U+0370–U+03FF The Greek script is used for writing the Greek language. The Greek script had a strong influence on the development of the Latin, Cyrillic, and Coptic scripts. The Greek script is written in linear sequence from left to right with the frequent use of nonspacing marks. There are two styles of such use: monotonic, which uses a single mark called tonos, and polytonic, which uses multiple marks. Greek letters come in uppercase and lowercase pairs. Spaces are used to separate words and provide the primary line breaking opportunities. Archaic Greek texts do not use spaces.
Standards. The Unicode encoding of Greek is based on ISO/IEC 8859-7, which is equivalent to the Greek national standard ELOT 928, designed for monotonic Greek. The Unicode Standard encodes Greek characters in the same relative positions as in ISO/IEC 8859-7. A number of variant and archaic characters are taken from the bibliographic standard ISO 5428.
Polytonic Greek. Polytonic Greek, used for ancient Greek (classical and Byzantine) and occasionally for modern Greek, may be encoded using either combining character sequences or precomposed base plus diacritic combinations. For the latter, see the following subsection, “Greek Extended: U+1F00–U+1FFF.”
Nonspacing Marks. Several nonspacing marks commonly used with the Greek script are found in the Combining Diacritical Marks range (see Table 7-1).
Table 7-1. Nonspacing Marks Used with Greek

Code    Name                            Alternative Names
U+0300  combining grave accent          varia
U+0301  combining acute accent          tonos, oxia
U+0304  combining macron
U+0306  combining breve
U+0308  combining diaeresis             dialytika
U+0313  combining comma above           psili, smooth breathing mark
U+0314  combining reversed comma above  dasia, rough breathing mark
U+0342  combining greek perispomeni     circumflex, tilde, inverted breve
U+0343  combining greek koronis         comma above
U+0345  combining greek ypogegrammeni   iota subscript
Because the characters in the Combining Diacritical Marks block are encoded by shape, not by meaning, they are appropriate for use in Greek where applicable. The character U+0344 combining greek dialytika tonos should not be used. The combination of dialytika plus tonos is instead represented by the sequence <U+0308 combining diaeresis, U+0301 combining acute accent>.
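The treatment of U+0344 can be seen in normalization: the character is replaced by a diaeresis followed by an acute accent and is never recomposed. The sketch below uses Python's `unicodedata` module; the helper names are invented for this example, and the iota example is simply one convenient base letter.

```python
import unicodedata

def nfc(text):
    return unicodedata.normalize("NFC", text)

def nfd(text):
    return unicodedata.normalize("NFD", text)

# U+0344 normalizes to diaeresis + acute, even in the composed form (NFC).
print([f"U+{ord(c):04X}" for c in nfc("\u0344")])

# Applied to a base letter, the two-mark sequence composes as expected:
# iota + dialytika + tonos becomes U+0390.
print(nfc("\u03b9\u0308\u0301") == "\u0390")
print(nfd("\u0390") == "\u03b9\u0308\u0301")
```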
Multiple nonspacing marks applied to the same baseform character are encoded in inside-out sequence. See the general rules for applying nonspacing marks in Section 2.11, Combining Characters.
The basic Greek accent written in modern Greek is called tonos. It is represented by an acute accent (U+0301). The shape that the acute accent takes over Greek letters is generally steeper than that shown over Latin letters in Western European typographic traditions, and in earlier editions of this standard was mistakenly shown as a vertical line over the vowel. Polytonic Greek has several contrastive accents, and the accent, or tonos, written with an acute accent is referred to as oxia, in contrast to the varia, which is written with a grave accent.
U+0342 combining greek perispomeni may appear as a circumflex, an inverted breve, a tilde, or occasionally a macron. Because of this variation in form, the perispomeni was encoded distinctly from U+0303 combining tilde.
U+0313 combining comma above and U+0343 combining greek koronis both take the form of a raised comma over a baseform letter. U+0343 combining greek koronis was included for compatibility reasons; U+0313 combining comma above is the preferred form for general use. Greek uses guillemets for quotation marks; for Ancient Greek, the quotations tend to follow local publishing practice. Because of the possibility of confusion between smooth breathing marks and curly single quotation marks, the latter are best avoided where possible. When either breathing mark is followed by an acute or grave accent, the pair is rendered side-by-side rather than vertically stacked.
Accents are typically written above their base letter in an all-lowercase or all-uppercase word; they may also be omitted from an all-uppercase word. However, in a titlecase word, accents applied to the first letter are commonly written to the left of that letter.
This is a matter of presentation only—the internal representation is still the base letter followed by the combining marks. It is not the stand-alone version of the accents, which occur before the base letter in the text stream. Iota. The nonspacing mark ypogegrammeni (also known as iota subscript in English) can be applied to the vowels alpha, eta, and omega to represent historic diphthongs. This mark appears as a small iota below the vowel. When applied to a single uppercase vowel, the iota does not appear as a subscript, but is instead normally rendered as a regular lowercase iota to the right of the uppercase vowel. This form of the iota is called prosgegrammeni (also known as iota adscript in English). In completely uppercased words, the iota subscript should be replaced by a capital iota following the vowel. Precomposed characters that contain iota subscript or iota adscript also have special mappings. (See Section 5.18, Case Mappings.) Archaic representations of Greek words, which did not have lowercase or accents, use the Greek capital letter iota following the vowel for these diphthongs. Such archaic representations require special case mapping, which may not be automatically derivable. Variant Letterforms. U+03A5 greek capital letter upsilon has two common forms: one looks essentially like the Latin capital Y, and the other has two symmetric upper branches that curl like rams’ horns, “Y”. The Y-form glyph has been chosen consistently for use in the code charts, both for monotonic and polytonic Greek. For mathematical usage,
the rams’ horn form of the glyph is required to distinguish it from the Latin Y. A third form is also encoded as U+03D2 greek upsilon with hook symbol (see Figure 7-4). The precomposed characters U+03D3 greek upsilon with acute and hook symbol and U+03D4 greek upsilon with diaeresis and hook symbol should not normally be needed, except where necessary for backward compatibility for legacy character sets.
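Two of the behaviors described in the preceding paragraphs, the special case mappings for iota subscript and the compatibility status of the upsilon hook symbols, can be observed directly. This is a minimal sketch using Python's built-in string methods and `unicodedata`; the helper `labels` is hypothetical, and Python's `upper()` is assumed to apply the full default case mappings of its bundled Unicode data.

```python
import unicodedata

def labels(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]

# Uppercasing replaces the iota subscript with a capital iota after the vowel.
alpha_ypogegrammeni = "\u1fb3"                  # precomposed alpha + iota subscript
print(labels(alpha_ypogegrammeni.upper()))      # capital alpha, capital iota
print(labels("\u03b1\u0345".upper()))           # same result from the sequence

# Titlecasing keeps a single character, the prosgegrammeni (adscript) form.
print(labels(alpha_ypogegrammeni.title()))

# The upsilon hook symbol is a compatibility variant of capital upsilon.
print(unicodedata.normalize("NFKC", "\u03d2") == "\u03a5")
print(labels(unicodedata.normalize("NFD", "\u03d3")))   # hook symbol + acute
```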
Figure 7-4. Variations in Greek Capital Letter Upsilon
Variant forms of several other Greek letters are encoded as separate characters in this block. Often (but not always), they represent different forms taken on by the character when it appears in the final position of a word. Examples include U+03C2 greek small letter final sigma used in a final position and U+03D0 greek beta symbol, which is the form that U+03B2 greek small letter beta would take on in a medial or final position. Of these variant letterforms, only final sigma should be used in encoding standard Greek text to indicate a final sigma. It is also encoded in ISO/IEC 8859-7 and ISO 5428 for this purpose. Because use of the final sigma is a matter of spelling convention, software should not automatically substitute a final form for a nominal form at the end of a word. However, when performing lowercasing, the final form needs to be generated based on the context. See Section 3.13, Default Case Algorithms. In contrast, U+03D0 greek beta symbol, U+03D1 greek theta symbol, U+03D2 greek upsilon with hook symbol, U+03D5 greek phi symbol, U+03F0 greek kappa symbol, U+03F1 greek rho symbol, U+03F4 greek capital theta symbol, U+03F5 greek lunate epsilon symbol, and U+03F6 greek reversed lunate epsilon symbol should be used only in mathematical formulas, never in Greek text. If positional or other shape differences are desired for these characters, they should be implemented by a font or rendering engine.
Representative Glyphs for Greek Phi. Starting with The Unicode Standard, Version 3.0, and the concurrent second edition of ISO/IEC 10646-1, the representative glyphs for U+03C6 φ greek small letter phi and U+03D5 ϕ greek phi symbol were swapped compared to earlier versions.
In ordinary Greek text, the character U+03C6 is used exclusively, although this character has considerable glyphic variation, sometimes represented with a glyph more like the representative glyph shown for U+03C6 φ (the “loopy” form) and less often with a glyph more like the representative glyph shown for U+03D5 ϕ (the “straight” form). For mathematical and technical use, the straight form of the small phi is an important symbol and needs to be consistently distinguishable from the loopy form. The straight-form phi glyph is used as the representative glyph for the symbol phi at U+03D5 to satisfy this distinction. The representative glyphs were reversed in versions of the Unicode Standard prior to Unicode 3.0. This resulted in the problem that the character explicitly identified as the mathematical symbol did not have the straight form of the character that is the preferred glyph for that use. Furthermore, it made it unnecessarily difficult for general-purpose fonts supporting ordinary Greek text to add support for Greek letters used as mathematical symbols. This resulted from the fact that many of those fonts already used the loopy-form glyph for U+03C6, as preferred for Greek body text; to support the phi symbol as well, they would have had to disrupt glyph choices already optimized for Greek text. When mapping symbol sets or SGML entities to the Unicode Standard, it is important to make sure that codes or entities that require the straight form of the phi symbol be mapped to U+03D5 and not to U+03C6. Mapping to the latter should be reserved for codes or entities that represent the small phi as used in ordinary Greek text. Fonts used primarily for Greek text may use either glyph form for U+03C6, but fonts that also intend to support technical use of the Greek letters should use the loopy form to ensure appropriate contrast with the straight form used for U+03D5.
Greek Letters as Symbols. The use of Greek letters for mathematical variables and operators is well established. Characters from the Greek block may be used for these symbols. For compatibility purposes, a few Greek letters are separately encoded as symbols in other character blocks. Examples include U+00B5 µ micro sign in the Latin-1 Supplement character block and U+2126 Ω ohm sign in the Letterlike Symbols character block. The ohm sign is canonically equivalent to the capital omega, and normalization would remove any distinction. Its use is therefore discouraged in favor of capital omega. The same equivalence does not exist between micro sign and mu, and use of either character as a micro sign is common. For Greek text, only the mu should be used. Symbols Versus Numbers. 
The characters stigma, koppa, and sampi are used only as numerals, whereas archaic koppa and digamma are used only as letters. Compatibility Punctuation. Two specific modern Greek punctuation marks are encoded in the Greek and Coptic block: U+037E “;” greek question mark and U+0387 “·” greek ano teleia. The Greek question mark (or erotimatiko) has the shape of a semicolon, but functions as a question mark in the Greek script. The ano teleia has the shape of a middle dot, but functions as a semicolon in the Greek script. These two compatibility punctuation characters have canonical equivalences to U+003B semicolon and U+00B7 middle dot, respectively; as a result, normalized Greek text will lose any distinctions between the Greek compatibility punctuation characters and the common punctuation marks. Furthermore, ISO/IEC 8859-7 and most vendor code pages for Greek simply make use of semicolon and middle dot for the punctuation in question. Therefore, use of U+037E and U+0387 is not necessary for interoperating with legacy Greek data, and their use is not generally encouraged for representation of Greek punctuation. Historic Letters. Historic Greek letters have been retained from ISO 5428. Coptic-Unique Letters. In the Unicode Standard prior to Version 4.1, the Coptic script was regarded primarily as a stylistic variant of the Greek alphabet. The letters unique to Coptic
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten. 
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 8
Middle Eastern Scripts
The scripts in this chapter have a common origin in the ancient Phoenician alphabet. They include Hebrew, Arabic, Syriac, and Thaana.
The Hebrew script is used in Israel and for languages of the Diaspora. The Arabic script is used to write many languages throughout the Middle East, North Africa, and certain parts of Asia. The Syriac script is used to write a number of Middle Eastern languages. These three also function as major liturgical scripts, used worldwide by various religious groups. The Thaana script is used to write Dhivehi, the language of the Republic of Maldives, an island nation in the middle of the Indian Ocean. The Middle Eastern scripts are mostly abjads, with small character sets. Words are demarcated by spaces. Except for Thaana, these scripts include a number of distinctive punctuation marks. In addition, the Arabic script includes traditional forms for digits, called “Arabic-Indic digits” in the Unicode Standard. Text in these scripts is written from right to left. Implementations of these scripts must conform to the Unicode Bidirectional Algorithm (see Unicode Standard Annex #9, “The Bidirectional Algorithm”). For more information about writing direction, see Section 2.10, Writing Direction. There are also special security considerations that apply to bidirectional scripts, especially with regard to their use in identifiers. For more information about these issues, see Unicode Technical Report #36, “Unicode Security Considerations.” Arabic and Syriac are cursive scripts even when typeset, unlike Hebrew and Thaana, where letters are unconnected. Most letters in Arabic and Syriac assume different forms depending on their position in a word. Shaping rules for the rendering of text are specified in Section 8.2, Arabic, and Section 8.3, Syriac. Shaping rules are not required for Hebrew because only five letters have position-dependent final forms, and these forms are separately encoded. Historically, Middle Eastern scripts did not write short vowels. Nowadays, short vowels are represented by marks positioned above or below a consonantal letter. 
Vowels and other marks of pronunciation (“vocalization”) are encoded as combining characters, so support for vocalized text necessitates use of composed character sequences. Yiddish, Syriac, and Thaana are normally written with vocalization; Hebrew and Arabic are usually written unvocalized.
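As a rough illustration of the directionality and vocalization points above, the following sketch queries Python's standard unicodedata module for the bidirectional class of one letter per script and checks that two vowel marks are nonspacing combining marks. The sample letters and the helper names are arbitrary choices made for this example, and the results depend on the Unicode version bundled with the Python runtime.

```python
import unicodedata

# One sample letter per script in this chapter (the choice of letters is arbitrary).
SAMPLES = {
    "Hebrew": "\u05D0",  # HEBREW LETTER ALEF
    "Arabic": "\u0627",  # ARABIC LETTER ALEF
    "Syriac": "\u0710",  # SYRIAC LETTER ALAPH
    "Thaana": "\u0780",  # THAANA LETTER HAA
}


def bidi_classes(samples):
    """Map each script name to the Bidi_Class of its sample letter."""
    return {script: unicodedata.bidirectional(ch) for script, ch in samples.items()}


def is_nonspacing_mark(ch):
    """True if ch has General Category Mn, as vocalization marks do."""
    return unicodedata.category(ch) == "Mn"


print(bidi_classes(SAMPLES))
print(is_nonspacing_mark("\u05B0"))  # HEBREW POINT SHEVA
print(is_nonspacing_mark("\u064E"))  # ARABIC FATHA
```

Right-to-left classes (R for Hebrew, AL for the Arabic-derived scripts) are what the Bidirectional Algorithm uses to reorder text for display.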
8.1 Hebrew
Hebrew: U+0590–U+05FF
The Hebrew script is used for writing the Hebrew language as well as Yiddish, Judezmo (Ladino), and a number of other languages. Vowels and various other marks are written as points, which are applied to consonantal base letters; these marks are usually omitted in Hebrew, except for liturgical texts and other special applications. Five Hebrew letters assume a different graphic form when they occur last in a word.
Directionality. The Hebrew script is written from right to left. Conformant implementations of Hebrew script must use the Unicode Bidirectional Algorithm (see Unicode Standard Annex #9, “The Bidirectional Algorithm”).
Cursive. The Unicode Standard uses the term cursive to refer to writing where the letters of a word are connected. A handwritten form of Hebrew is known as cursive, but its rounded letters are generally unconnected, so the Unicode definition does not apply. Fonts based on cursive Hebrew exist. They are used not only to show examples of Hebrew handwriting, but also for display purposes.
Standards. ISO/IEC 8859-8—Part 8. Latin/Hebrew Alphabet. The Unicode Standard encodes the Hebrew alphabetic characters in the same relative positions as in ISO/IEC 8859-8; however, there are no points or Hebrew punctuation characters in that ISO standard.
Vowels and Other Marks of Pronunciation. These combining marks, generically called points in the context of Hebrew, indicate vowels or other modifications of consonantal letters. General rules for applying combining marks are given in Section 2.11, Combining Characters, and Section 3.11, Canonical Ordering Behavior. Additional Hebrew-specific behavior is described below.
Hebrew points can be separated into four classes: dagesh, shin dot and sin dot, vowels, and other marks of pronunciation.
Dagesh, U+05BC hebrew point dagesh or mapiq, has the form of a dot that appears inside the letter that it affects. It is not a vowel but rather a diacritic that affects the pronunciation of a consonant. The same base consonant can also have a vowel and/or other diacritics. Dagesh is the only element that goes inside a letter.
The dotted Hebrew consonant shin is explicitly encoded as the sequence U+05E9 hebrew letter shin followed by U+05C1 hebrew point shin dot. The shin dot is positioned on the upper-right side of the undotted base letter. Similarly, the dotted consonant sin is explicitly encoded as the sequence U+05E9 hebrew letter shin followed by U+05C2 hebrew point sin dot. The sin dot is positioned on the upper-left side of the base letter.
The two dots are mutually exclusive. The base letter shin can also have a dagesh, a vowel, and other diacritics. The two dots are not used with any other base character. Vowels all appear below the base character that they affect, except for holam, U+05B9 hebrew point holam, which appears above left. The following points represent vowels: U+05B0..U+05B9, U+05BB. The remaining three points are marks of pronunciation: U+05BD hebrew point meteg, U+05BF hebrew point rafe, and U+FB1E hebrew point judeo-spanish varika. Meteg, also known as siluq, goes below the base character; rafe and varika go above it. The varika, used in Judezmo, is a glyphic variant of rafe. Shin and Sin. Separate characters for the dotted letters shin and sin are not included in this block. When it is necessary to distinguish between the two forms, they should be encoded as U+05E9 hebrew letter shin followed by the appropriate dot, either U+05C1 hebrew point shin dot or U+05C2 hebrew point sin dot. (See preceding discussion.) This practice is consistent with Israeli standard encoding. Final (Contextual Variant) Letterforms. Variant forms of five Hebrew letters are encoded as separate characters in this block, as in Hebrew standards including ISO/IEC 8859-8. These variant forms are generally used in place of the nominal letterforms at the end of words. Certain words, however, are spelled with nominal rather than final forms, particularly names and foreign borrowings in Hebrew and some words in Yiddish. Because final form usage is a matter of spelling convention, software should not automatically substitute final forms for nominal forms at the end of words. The positional variants should be coded directly and rendered one-to-one via their own glyphs—that is, without contextual analysis. Yiddish Digraphs. The digraphs are considered to be independent characters in Yiddish. 
The Unicode Standard has included them as separate characters so as to distinguish certain letter combinations in Yiddish text—for example, to distinguish the digraph double vav from an occurrence of a consonantal vav followed by a vocalic vav. The use of digraphs is consistent with standard Yiddish orthography. Other letters of the Yiddish alphabet, such as pasekh alef, can be composed from other characters, although alphabetic presentation forms are also encoded. Punctuation. Most punctuation marks used with the Hebrew script are not given independent codes (that is, they are unified with Latin punctuation) except for the few cases where the mark has a unique form in Hebrew—namely, U+05BE hebrew punctuation maqaf, U+05C0 hebrew punctuation paseq (also known as legarmeh), U+05C3 hebrew punctuation sof pasuq, U+05F3 hebrew punctuation geresh, and U+05F4 hebrew punctuation gershayim. For paired punctuation such as parentheses, the glyphs chosen to represent U+0028 left parenthesis and U+0029 right parenthesis will depend on the direction of the rendered text. See Section 4.7, Bidi Mirrored—Normative, for more information. For additional punctuation to be used with the Hebrew script, see Section 6.2, General Punctuation.
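The encoding conventions described above for shin and sin, final forms, and Yiddish digraphs can be sketched with Python's standard unicodedata module. This is only an illustration under stated assumptions: the variable and function names are invented for this example, and the name-based search for final forms is a convenience, not a mechanism defined by the standard.

```python
import unicodedata

SHIN, SHIN_DOT, SIN_DOT = "\u05E9", "\u05C1", "\u05C2"

# Dotted shin and sin share one base letter and differ only in the dot.
dotted_shin = SHIN + SHIN_DOT
dotted_sin = SHIN + SIN_DOT

# Letters in U+05D0..U+05EA whose character names mark them as final forms.
final_forms = [
    chr(cp)
    for cp in range(0x05D0, 0x05EB)
    if "FINAL" in unicodedata.name(chr(cp), "")
]

# The Yiddish digraph is a single character, distinct from two separate vavs.
digraph_double_vav = "\u05F0"
two_vavs = "\u05D5\u05D5"


def same_after_nfc(a, b):
    """Compare two strings after normalization to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print([unicodedata.name(c) for c in final_forms])
print(same_after_nfc(digraph_double_vav, two_vavs))
print(same_after_nfc("\uFB2A", dotted_shin))  # presentation form vs. sequence
```

Because final forms are a spelling matter, a sketch like this only identifies them; it does not substitute them at word ends.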
Cantillation Marks. Cantillation marks are used in publishing liturgical texts, including the Bible. There are various historical schools of cantillation marking; the set of marks included in the Unicode Standard follows the Israeli standard SI 1311.2.
Positioning. Marks may combine with vowels and other points, and complex typographic rules dictate how to position these combinations. The vertical placement (meaning above, below, or inside) of points and marks is very well defined. The horizontal placement (meaning left, right, or center) of points is also very well defined. The horizontal placement of marks, by contrast, is not well defined, and convention allows for the different placement of marks relative to their base character.
When points and marks are located below the same base letter, the point always comes first (on the right) and the mark after it (on the left), except for the marks yetiv, U+059A hebrew accent yetiv, and dehi, U+05AD hebrew accent dehi. These two marks come first (on the right) and are followed (on the left) by the point.
These rules are followed when points and marks are located above the same base letter:
• If the point is holam, all cantillation marks precede it (on the right) except pashta, U+0599 hebrew accent pashta.
• Pashta always follows (goes to the left of) points.
• Holam on a sin consonant (shin base + sin dot) follows (goes to the left of) the sin dot. However, the two combining marks are sometimes rendered as a single assimilated dot.
• Shin dot and sin dot are generally represented closer vertically to the base letter than other points and marks that go above it.
Meteg. Meteg, U+05BD hebrew point meteg, frequently co-occurs with vowel points below the consonant. Typically, meteg is placed to the left of the vowel, although in some manuscripts and printed texts it is positioned to the right of the vowel.
The difference in positioning is not known to have any semantic significance; nevertheless, some authors wish to retain the positioning found in source documents. The alternate vowel-meteg ordering can be represented in terms of alternate ordering of characters in encoded representation. However, because of the fixed-position canonical combining classes to which meteg and vowel points are assigned, differences in ordering of such characters are not preserved under normalization. The combining grapheme joiner can be used within a vowel-meteg sequence to preserve an ordering distinction under normalization. For more information, see the description of U+034F combining grapheme joiner in Section 16.2, Layout Controls. For example, to display meteg to the left of (after, for a right-to-left script) the vowel point sheva, U+05B0 hebrew point sheva, the sequence of meteg following sheva can be used: <sheva, meteg>
Because these marks are canonically ordered, this sequence is preserved under normalization. Then, to display meteg to the right of the sheva, the sequence with meteg preceding sheva with an intervening CGJ can be used: <meteg, CGJ, sheva>
A further complication arises for combinations of meteg with hataf vowels: U+05B1 hebrew point hataf segol, U+05B2 hebrew point hataf patah, and U+05B3 hebrew point hataf qamats. These vowel points have two side-by-side components. Meteg can be placed to the left or the right of a hataf vowel, but it also is often placed between the two components of the hataf vowel. A three-way positioning distinction is needed for such cases.
The combining grapheme joiner can be used to preserve an ordering that places meteg to the right of a hataf vowel, as described for combinations of meteg with non-hataf vowels, such as sheva.
Placement of meteg between the components of a hataf vowel can be conceptualized as a ligature of the hataf vowel and a nominally positioned meteg. With this in mind, the ligation-control functionality of U+200D zero width joiner and U+200C zero width non-joiner can be used as a mechanism to control the visual distinction between a nominally positioned meteg to the left of a hataf vowel versus the medially positioned meteg within the hataf vowel. That is, zero width joiner can be used to request explicitly a medially positioned meteg, and zero width non-joiner can be used to request explicitly a left-positioned meteg. Just as different font implementations may or may not display an “fi” ligature by default, different font implementations may or may not display meteg in a medial position when combined with hataf vowels by default. As a result, authors who want to ensure left-position versus medial-position display of meteg with hataf vowels across all font implementations may use joiner characters to distinguish these cases.
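The effect of canonical ordering on the sheva and meteg sequences described above can be checked with Python's standard unicodedata module. This is a minimal sketch: the base letter bet and the helper name nfc are arbitrary choices for the example, and only the combining grapheme joiner case is shown.

```python
import unicodedata

BET = "\u05D1"    # base letter, chosen arbitrarily for illustration
SHEVA = "\u05B0"  # HEBREW POINT SHEVA
METEG = "\u05BD"  # HEBREW POINT METEG
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER


def nfc(text):
    """Normalize text to NFC."""
    return unicodedata.normalize("NFC", text)


left_meteg = BET + SHEVA + METEG         # <sheva, meteg>
unprotected = BET + METEG + SHEVA        # <meteg, sheva>, no CGJ
right_meteg = BET + METEG + CGJ + SHEVA  # <meteg, CGJ, sheva>

# Canonical combining classes drive the reordering.
print(unicodedata.combining(SHEVA), unicodedata.combining(METEG), unicodedata.combining(CGJ))

print(nfc(left_meteg) == left_meteg)    # already in canonical order
print(nfc(unprotected) == left_meteg)   # reordered, so the distinction is lost
print(nfc(right_meteg) == right_meteg)  # CGJ blocks the reordering
```

The combining grapheme joiner has combining class 0, so the marks on either side of it are never sorted across it.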
Thus the following encoded representations can be used for different positioning of meteg with a hataf vowel, such as hataf patah: left-positioned meteg:
qamats or qamats gadol; the new letterform for the other reading is qamats qatan. Not all users of Biblical Hebrew use atnah hafukh and qamats qatan. If the distinction between accents atnah hafukh and yerah ben yomo is not made, then only U+05AA hebrew accent yerah ben yomo is used. If the distinction between vowels qamats gadol and qamats qatan is not made, then only U+05B8 hebrew point qamats is used. Implementations that support Hebrew accents and vowel points may not necessarily support the special-usage characters U+05A2 hebrew accent atnah hafukh and U+05C7 hebrew point qamats qatan.
Holam Male and Holam Haser. The vowel point holam represents the vowel phoneme /o/. The consonant letter vav represents the consonant phoneme /w/, but in some words is used to represent a vowel, /o/. When the point holam is used on vav, the combination usually represents the vowel /o/, but in a very small number of cases represents the consonant-vowel combination /wo/. A typographic distinction is made between these two in many versions of Biblical text. In most cases, in which vav + holam together represents the vowel /o/, the point holam is centered above the vav and referred to as holam male. In the less frequent cases, in which the vav represents the consonant /w/, some versions show the point holam positioned above left. This is referred to as holam haser. The character U+05BA hebrew point holam haser for vav is intended for use as holam haser only in those cases where a distinction is needed. When the distinction is made, the character U+05B9 hebrew point holam is used to represent the point holam male on vav. U+05BA hebrew point holam haser for vav is intended for use only on vav; results of combining this character with other base characters are not defined. Not all users distinguish between the two forms of holam, and not all implementations can be assumed to support U+05BA hebrew point holam haser for vav.
Puncta Extraordinaria.
In the Hebrew Bible, dots are written in various places above or below the base letters that are distinct from the vowel points and accents. These dots are referred to by scholars as puncta extraordinaria, and there are two kinds. The upper punctum, the more common of the two, has been encoded since Unicode 2.0 as U+05C4 hebrew mark upper dot. The lower punctum is used in only one verse of the Bible, Psalm 27:13, and is encoded as U+05C5 hebrew mark lower dot. The puncta generally differ in appearance from dots that occur above letters used to represent numbers; the number dots should be represented using U+0307 combining dot above and U+0308 combining diaeresis. Nun Hafukha. The nun hafukha is a special symbol that appears to have been used for scribal annotations, although its exact functions are uncertain. It is used a total of nine times in the Hebrew Bible, although not all versions include it, and there are variations in the exact locations in which it is used. There is also variation in the glyph used: it often has the appearance of a rotated or reversed nun and is very often called inverted nun; it may also appear similar to a half tet or have some other form. Currency Symbol. The new sheqel sign (U+20AA) is encoded in the currency block.
Alphabetic Presentation Forms: U+FB1D–U+FB4F
The Hebrew characters in this block are chiefly of two types: variants of letters and marks encoded in the main Hebrew block, and precomposed combinations of a Hebrew letter or digraph with one or more vowels or pronunciation marks. This block contains all of the vocalized letters of the Yiddish alphabet. The alef lamed ligature and a Hebrew variant of the plus sign are included as well. The Hebrew plus sign variant, U+FB29 hebrew letter alternative plus sign, is used more often in handwriting than in print, but it does occur in school textbooks. It is used by those who wish to avoid cross symbols, which can have religious and historical connotations.
U+FB20 hebrew letter alternative ayin is an alternative form of ayin that may replace the basic form U+05E2 hebrew letter ayin when there is a diacritical mark below it. The basic form of ayin is often designed with a descender, which can interfere with a mark below the letter. U+FB20 is encoded for compatibility with implementations that substitute the alternative form in the character data, as opposed to using a substitute glyph at rendering time.
Use of Wide Letters. Wide letterforms are used in handwriting and in print to achieve even margins. The wide-form letters in the Unicode Standard are those that are most commonly “stretched” in justification. If Hebrew text is to be rendered with even margins, justification should be left to the text-formatting software.
These alphabetic presentation forms are included for compatibility purposes. For the preferred encoding, see the Hebrew presentation forms, U+FB1D..U+FB4F. For letterlike symbols, see U+2135..U+2138.
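The relationship between these presentation forms and the characters of the main Hebrew block can be inspected through their decomposition mappings. The sketch below uses Python's standard unicodedata module; the helper names code_points and describe are invented for this example, and the output reflects the Unicode version bundled with the Python runtime.

```python
import unicodedata


def code_points(text):
    """Render a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]


def describe(ch):
    """Return the name and the canonical and compatibility decompositions of ch."""
    return {
        "name": unicodedata.name(ch),
        "nfd": code_points(unicodedata.normalize("NFD", ch)),
        "nfkd": code_points(unicodedata.normalize("NFKD", ch)),
    }


# A precomposed letter, a letter variant, a wide letter, the plus sign
# variant, and the alef lamed ligature.
for ch in ("\uFB1D", "\uFB20", "\uFB21", "\uFB29", "\uFB4F"):
    print(describe(ch))
```

Precomposed letter-plus-point forms decompose canonically, whereas the variant, wide, and ligature forms have only compatibility decompositions, so they survive NFC and NFD unchanged.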
8.2 Arabic
Arabic: U+0600–U+06FF
The Arabic script is used for writing the Arabic language and has been extended to represent a number of other languages, such as Persian, Urdu, Pashto, Sindhi, and Kurdish, as well as many African languages. Urdu is often written with the ornate Nastaliq script variety. Some languages, such as Indonesian/Malay, Turkish, and Ingush, formerly used the Arabic script but now employ the Latin or Cyrillic scripts.
The Arabic script is cursive, even in its printed form (see Figure 8-1). As a result, the same letter may be written in different forms depending on how it joins with its neighbors. Vowels and various other marks may be written as combining marks called harakat, which are applied to consonantal base letters. In normal writing, however, these harakat are omitted.
Directionality. The Arabic script is written from right to left. Conformant implementations of Arabic script must use the Unicode Bidirectional Algorithm to reorder the memory representation for display (see Unicode Standard Annex #9, “The Bidirectional Algorithm”).
Figure 8-1. Directionality and Cursive Connection
(Figure panels: memory representation; after reordering; after joining.)
Standards. ISO/IEC 8859-6—Part 6. Latin/Arabic Alphabet. The Unicode Standard encodes the basic Arabic characters in the same relative positions as in ISO/IEC 8859-6. ISO/IEC 8859-6, in turn, is based on ECMA-114, which was based on ASMO 449.
Encoding Principles. The basic set of Arabic letters is well defined. Each letter receives only one Unicode character value in the basic Arabic block, no matter how many different contextual appearances it may exhibit in text. Each Arabic letter in the Unicode Standard may be said to represent the inherent semantic identity of the letter. A word is spelled as a sequence of these letters. The representative glyph shown in the Unicode character chart for an Arabic letter is usually the form of the letter when standing by itself. It is simply used to distinguish and identify the character in the code charts and does not restrict the glyphs used to represent it.
Punctuation. Most punctuation marks used with the Arabic script are not given independent codes (that is, they are unified with Latin punctuation), except for the few cases where the mark has a significantly different appearance in Arabic—namely, U+060C arabic comma, U+061B arabic semicolon, U+061E arabic triple dot punctuation mark, U+061F arabic question mark, and U+066A arabic percent sign. For paired punctuation such as parentheses, the glyphs chosen to represent U+0028 left parenthesis and U+0029 right parenthesis will depend on the direction of the rendered text.
The Non-joiner and the Joiner. The Unicode Standard provides two user-selectable formatting codes: U+200C zero width non-joiner and U+200D zero width joiner (see Figure 8-2, Figure 8-3, and Figure 8-4). The use of a non-joiner between two letters prevents those letters from forming a cursive connection with each other when rendered. Examples include the Persian plural suffix, some Persian proper names, and Ottoman Turkish vowels.
The use of a joiner adjacent to a suitable letter permits that letter to form a cursive connection without a visible neighbor. This provides a simple way to encode some special cases, such as exhibiting a connecting form in isolation. For further discussion of joiners and non-joiners, see Section 16.2, Layout Controls. Harakat (Vowel) Nonspacing Marks. Harakat are marks that indicate vowels or other modifications of consonant letters. The code charts depict a character in the harakat range in relation to a dashed circle, indicating that this character is intended to be applied via some process to the character that precedes it in the text stream (that is, the base character).
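The joiner and non-joiner described above are ordinary characters in the backing store, which the following sketch makes visible. It uses Python's standard unicodedata module; the Persian word (a stem followed by the plural suffix) is an illustrative example chosen for this sketch, and strip_join_controls is a hypothetical helper, not a function defined by the standard.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# Illustrative Persian word: stem + ZWNJ + plural suffix, so that the last
# letter of the stem does not connect to the suffix when rendered.
stem = "\u06A9\u062A\u0627\u0628"  # keheh, teh, alef, beh
suffix = "\u0647\u0627"            # heh, alef
word = stem + ZWNJ + suffix

# A letter followed by ZWJ requests a connecting form with no visible neighbor.
heh_connecting = "\u0647" + ZWJ


def strip_join_controls(text):
    """Remove ZWNJ and ZWJ, for example before a loose comparison (hypothetical helper)."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


print(unicodedata.category(ZWNJ), unicodedata.category(ZWJ))
print(len(word), len(strip_join_controls(word)))
print(unicodedata.normalize("NFC", word) == word)
```

Normalization leaves both controls in place, so any process that should ignore them has to filter them explicitly, as the helper does.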
Figure 8-2. Using a Joiner
(Figure panels: memory representation; after reordering; after joining.)
Figure 8-3. Using a Non-joiner
(Figure panels: memory representation; after reordering; after joining.)
Figure 8-4. Combinations of Joiners and Non-joiners
(Figure panels: memory representation; after reordering; after joining.)
General rules for applying nonspacing marks are given in Section 7.9, Combining Marks. The few marks that are placed after (to the left of) the base character are treated as ordinary spacing characters in the Unicode Standard. The Unicode Standard does not specify a sequence order in case of multiple harakat applied to the same Arabic base character, as there is no possible ambiguity of interpretation. For more information about the canonical ordering of nonspacing marks, see Section 2.11, Combining Characters, and Section 3.11, Canonical Ordering Behavior.
The placement and rendering of vowel and other marks in Arabic strongly depends on the typographical environment or even the typographical style. For example, in Chapter 17, Code Charts, the default position of U+0651 arabic shadda is with the glyph placed above the base character, whereas for U+064D arabic kasratan the glyph is placed below the base character, as shown in the first example in Figure 8-5. However, computer fonts often follow an approach that originated in metal typesetting and combine the kasratan with shadda in a ligature placed above the text, as shown in the second example in Figure 8-5.
Figure 8-5. Placement of Harakat
Arabic-Indic Digits. The names for the forms of decimal digits vary widely across different languages. The decimal numbering system originated in India (Devanagari ०१२३ …) and was subsequently adopted in the Arabic world with a different appearance (Arabic ٠١٢٣ …). The Europeans adopted decimal numbers from the Arabic world, although once again the forms of the digits changed greatly (European 0123 …). The European forms were later adopted widely around the world and are used even in many Arabic-speaking countries in North Africa. In each case, the interpretation of decimal numbers remained the same. However, the forms of the digits changed to such a degree that they are no longer recognizably the same characters. Because of the origin of these characters, the European decimal numbers are widely known as “Arabic numerals” or “Hindi-Arabic numerals,” whereas the decimal numbers in use in the Arabic world are widely known there as “Hindi numbers.”
The Unicode Standard includes Indic digits (including forms used with different Indic scripts), Arabic digits (with forms used in most of the Arabic world), and European digits (now used internationally). Because of this decision, the traditional names could not be retained without confusion. In addition, there are two main variants of the Arabic digits: those used in Iran, Pakistan, and Afghanistan (here called Eastern Arabic-Indic) and those used in other parts of the Arabic world. In summary, the Unicode Standard uses the names shown in Table 8-1.
Table 8-1. Arabic Digit Names
Name                   Code Points       Forms
European               U+0030..U+0039    0123456789
Arabic-Indic           U+0660..U+0669    ٠١٢٣٤٥٦٧٨٩
Eastern Arabic-Indic   U+06F0..U+06F9    ۰۱۲۳۴۵۶۷۸۹
Indic (Devanagari)     U+0966..U+096F    ०१२३४५६७८९
There is substantial variation in usage of glyphs for the Eastern Arabic-Indic digits, especially for the digits four, five, six, and seven. Table 8-2 illustrates this variation with some example glyphs for digits in languages of Iran, Pakistan, and India. While some usage of the Persian glyph for U+06F7 extended arabic-indic digit seven can be documented for Sindhi, the form shown in Table 8-2 is predominant. The Unicode Standard provides a single, complete sequence of digits for Persian, Sindhi, and Urdu to account for the differences in appearance and directional treatment when rendering them. (For a complete discussion of directional formatting of numbers in the Unicode Standard, see Unicode Standard Annex #9, “The Bidirectional Algorithm.”)
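The point that all of these digit sets carry the same numeric interpretation, while differing in directional treatment, can be checked with a short sketch. It relies on Python's built-in int(), which accepts any Unicode decimal digits, and on the standard unicodedata module; the names digit_run, DIGIT_SETS, and bidi_classes_of are invented for this example.

```python
import unicodedata


def digit_run(first_cp):
    """Build the ten digits zero through nine starting at code point first_cp."""
    return "".join(chr(first_cp + i) for i in range(10))


# The four digit sets named in Table 8-1.
DIGIT_SETS = {
    "European": digit_run(0x0030),
    "Arabic-Indic": digit_run(0x0660),
    "Eastern Arabic-Indic": digit_run(0x06F0),
    "Indic (Devanagari)": digit_run(0x0966),
}


def bidi_classes_of(name):
    """Return the set of Bidi_Class values used by one digit set."""
    return {unicodedata.bidirectional(c) for c in DIGIT_SETS[name]}


for name, digits in DIGIT_SETS.items():
    # Digits one, two, three of each set parse to the same integer.
    print(name, int(digits[1:4]), sorted(bidi_classes_of(name)))
```

The Arabic-Indic digits have the bidirectional class AN while the Eastern Arabic-Indic digits have EN, which is one concrete form of the difference in directional treatment mentioned above.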
Table 8-2. Glyph Variation in Eastern Arabic-Indic Digits
Code Point   Digit   Persian   Sindhi    Urdu
U+06F4       4       [glyph]   [glyph]   [glyph]
U+06F5       5       [glyph]   [glyph]   [glyph]
U+06F6       6       [glyph]   [glyph]   [glyph]
U+06F7       7       [glyph]   [glyph]   [glyph]
Extended Arabic Letters. Arabic script is used to write major languages, such as Persian and Urdu, but it has also been used to transcribe some lesser-used languages, such as Baluchi and Lahnda, which have little tradition of printed typography. As a result, the Unicode Standard encodes multiple forms of some Extended Arabic letters because the character forms and usages are not well documented for a number of languages. For additional extended Arabic letters, see the Arabic Supplement block, U+0750..U+077F.
Koranic Annotation Signs. These characters are used in the Koran to mark pronunciation and other annotation. The enclosing mark U+06DE is used to enclose a digit. When rendered, the digit appears in a smaller size.
Additional Vowel Marks. When the Arabic script is adopted as the writing system for a language other than Arabic, it is often necessary to represent vowel sounds or distinctions not made in Arabic. In some cases, conventions such as the addition of small dots above and/or below the standard Arabic fatha, damma, and kasra signs have been used. Classical Arabic has only three canonical vowels (/a/, /i/, /u/), whereas languages such as Urdu and Persian include other contrasting vowels such as /o/ and /e/. For this reason, it is imperative that speakers of these languages be able to show the difference between /e/ and /i/ (U+0656 arabic subscript alef), and between /o/ and /u/ (U+0657 arabic inverted damma). At the same time, the use of these two diacritics in Arabic is redundant, merely emphasizing that the underlying vowel is long.
Honorifics. Marks known as honorifics represent phrases expressing the status of a person and are in widespread use in the Arabic-script world. Most have a specifically religious meaning. In effect, these marks are combining characters at the word level, rather than being associated with a single base character. Depending on the letter shapes present in the name and the calligraphic style in use, the honorific mark may be applied to a letter somewhere in the middle of the name. The normalization algorithm does not move such word-level combining characters to the end of the word.
Date Separator. U+060D arabic date separator is used in Pakistan and India between the numeric date and the month name when writing out a date. This sign is distinct from U+002F solidus, which is used, for example, as a separator in currency amounts.
Full Stop. U+061E arabic triple dot punctuation mark is encoded for traditional orthographic practice using the Arabic script to write African languages such as Hausa, Wolof, Fulani, and Mandinka. These languages use arabic triple dot punctuation mark as a full stop.
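The statement above that normalization does not move a word-level honorific mark can be checked with a small sketch using Python's standard unicodedata module. The four-letter word is an arbitrary placeholder rather than a real usage example, and mark_position is a hypothetical helper written for this illustration.

```python
import unicodedata

HONORIFIC = "\u0610"  # one of the Arabic honorific marks, a nonspacing mark

# Placeholder word of four letters; the mark is attached after the second letter.
letters = "\u0643\u062A\u0627\u0628"  # kaf, teh, alef, beh
marked = letters[:2] + HONORIFIC + letters[2:]


def mark_position(text, mark):
    """Return the index of mark in the NFC-normalized form of text."""
    return unicodedata.normalize("NFC", text).index(mark)


print(unicodedata.category(HONORIFIC))
print(mark_position(marked, HONORIFIC))
```

The mark stays attached to the letter it follows; normalization only reorders marks relative to other marks on the same base character.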
Currency Symbols. U+060B afghani sign is a currency symbol used in Afghanistan. The symbol is derived from an abbreviation of the name of the currency, which has become a symbol in its own right. U+FDFC rial sign is a currency symbol used in Iran. Unlike the afghani sign, U+FDFC rial sign is considered a compatibility character, encoded for compatibility with Iranian standards. Ordinarily in Persian “rial” is simply spelled out as the sequence of letters, <0631, 06CC, 0627, 0644>.
End of Ayah. U+06DD arabic end of ayah graphically encloses a sequence of zero or more digits (of General Category Nd) that follow it in the data stream. The enclosure terminates with any non-digit. For behavior of a similar prefixed formatting control, see the discussion of U+070F syriac abbreviation mark in Section 8.3, Syriac.
Other Signs Spanning Numbers. Several other special signs are written in association with numbers in the Arabic script. U+0600 arabic number sign signals the beginning of a number; it is written below the digits of the number. U+0601 arabic sign sanah indicates a year (that is, as part of a date). This sign is rendered below the digits of the number it precedes. Its appearance is a vestigial form of the Arabic word for year, /sanatu/ (seen noon teh-marbuta), but it is now a sign in its own right and is widely used to mark a numeric year even in non-Arabic languages where the Arabic word would not be known. The use of the year sign is illustrated in Figure 8-6.
Figure 8-6. Arabic Year Sign
U+0602 arabic footnote marker is another of these signs; it is used in the Arabic script in conjunction with the footnote number itself. It also precedes the digits in logical order and is written to extend underneath them. Finally, U+0603 arabic sign safha functions as a page sign, preceding and extending under a sequence of digits for a page number. Like U+06DD arabic end of ayah, all of these signs can span multiple-digit numbers, rather than just a single digit. They are not formally considered combining marks in the sense used by the Unicode Standard, although they clearly interact graphically with the sequence of digits that follows them. They precede the sequence of digits that they span, rather than following a base character, as would be the case for a combining mark. Their General Category value is Cf (format control character). Unlike most other format control characters, however, they should be rendered with a visible glyph, even in circumstances where no suitable digit or sequence of digits follows them in logical order. Poetic Verse Sign. U+060E arabic poetic verse sign is a special symbol often used to mark the beginning of a poetic verse. Although it is similar to U+0602 arabic footnote marker in appearance, the poetic sign is simply a symbol. In contrast, the footnote marker
is a format control character that has complex rendering in conjunction with following digits. U+060F arabic sign misra is another symbol used in poetry.
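The spanning behavior described above (a prefixed sign followed by zero or more digits, terminated by any non-digit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `spanned_digits` and the constant `PREFIXED_NUMBER_SIGNS` are hypothetical, and the digit test relies on the General Category data shipped with the Python interpreter rather than on any rendering engine.

```python
import unicodedata

# Prefixed number signs named in this section: U+0600..U+0603 and U+06DD.
# (Hypothetical constant name, used only for this sketch.)
PREFIXED_NUMBER_SIGNS = frozenset({0x0600, 0x0601, 0x0602, 0x0603, 0x06DD})


def spanned_digits(text: str, index: int) -> str:
    """Return the run of digits (General Category Nd) that immediately
    follows the prefixed sign at text[index] in logical order."""
    if ord(text[index]) not in PREFIXED_NUMBER_SIGNS:
        raise ValueError("text[index] is not a prefixed number sign")
    end = index + 1
    # The span ends at the first non-digit or at the end of the text.
    while end < len(text) and unicodedata.category(text[end]) == "Nd":
        end += 1
    return text[index + 1:end]
```

For example, `spanned_digits("\u0601\u0661\u0669\u0669\u0660 ", 0)` returns the four Arabic-Indic digits that follow the year sign, and a sign followed directly by a letter yields an empty string, matching the "zero or more digits" wording.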
Arabic Cursive Joining
Minimum Rendering Requirements. A rendering or display process must convert between the logical order in which characters are placed in the backing store and the visual (or physical) order required by the display device. See Unicode Standard Annex #9, “The Bidirectional Algorithm,” for a description of the conversion between logical and visual orders. The cursive nature of the Arabic script imposes special requirements on display or rendering processes that are not typically found in Latin script-based systems. At a minimum, a display process must select an appropriate glyph to depict each Arabic letter according to its immediate joining context; furthermore, it must substitute certain ligature glyphs for sequences of Arabic characters. The remainder of this section specifies a minimum set of rules that provide legible Arabic joining and ligature substitution behavior.
Joining Classes. Each Arabic letter must be depicted by one of a number of possible contextual glyph forms. The appropriate form is determined on the basis of its joining class and the joining class of adjacent characters. Each Arabic character falls into one of the classes shown in Table 8-3. (See ArabicShaping.txt in the Unicode Character Database for a complete list.) In this table, right and left refer to visual order. The characters of the right-joining class are exemplified in more detail in Table 8-8, and those of the dual-joining class are shown in Table 8-7. When characters do not join or cause joining (such as dammatan), they are classified as transparent.
Table 8-3. Primary Arabic Joining Classes
Joining Class | Symbol | Members
Right-joining | R | ALEF, DAL, THAL, REH, ZAIN …
Left-joining | L | None
Dual-joining | D | BEH, TEH, THEH, JEEM …
Join-causing | C | ZERO WIDTH JOINER (200D) and TATWEEL (0640). These characters are distinguished from the dual-joining characters in that they do not change shape themselves.
Non-joining | U | ZERO WIDTH NON-JOINER (200C) and all spacing characters, except those explicitly mentioned as being one of the other joining classes, are non-joining. These include HAMZA (0621), HIGH HAMZA (0674), spaces, digits, punctuation, non-Arabic letters, and so on. Also, U+0600 arabic number sign..U+0603 arabic sign safha and U+06DD arabic end of ayah.
Transparent | T | All nonspacing marks (General Category Mn or Me) and most format control characters (General Category Cf) are transparent to cursive joining. These include FATHATAN (064B) and other Arabic harakat, HAMZA BELOW (0655), SUPERSCRIPT ALEF (0670), combining Koranic annotation signs, and nonspacing marks from other scripts. Also U+070F syriac abbreviation mark.
Table 8-4 defines derived superclasses of the primary Arabic joining classes; those superclasses are used in the cursive joining rules. In this table, right and left refer to visual order.
Table 8-4. Derived Arabic Joining Classes
Joining Class | Members
Right join-causing | Superset of dual-joining, left-joining, and join-causing
Left join-causing | Superset of dual-joining, right-joining, and join-causing
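As a rough illustration of how the primary classes of Table 8-3 feed the derived classes of Table 8-4, the following Python sketch hard-codes a handful of characters taken from Table 8-3. The names `JOINING_CLASS`, `joining_class`, `is_right_join_causing`, and `is_left_join_causing` are hypothetical, and treating every unlisted character as non-joining is a simplifying assumption; a real implementation would read ArabicShaping.txt instead.

```python
# Primary joining classes for a few characters listed in Table 8-3.
JOINING_CLASS = {
    0x0627: "R",  # ALEF
    0x062F: "R",  # DAL
    0x0628: "D",  # BEH
    0x062A: "D",  # TEH
    0x200D: "C",  # ZERO WIDTH JOINER
    0x0640: "C",  # TATWEEL
    0x200C: "U",  # ZERO WIDTH NON-JOINER
    0x0621: "U",  # HAMZA
    0x064B: "T",  # FATHATAN
}

# Derived superclasses from Table 8-4, expressed as sets of class symbols.
RIGHT_JOIN_CAUSING = frozenset("DLC")  # dual-, left-joining, join-causing
LEFT_JOIN_CAUSING = frozenset("DRC")   # dual-, right-joining, join-causing


def joining_class(ch: str) -> str:
    """Look up the primary class; unlisted characters default to 'U'
    (an assumption made only to keep this sketch small)."""
    return JOINING_CLASS.get(ord(ch), "U")


def is_right_join_causing(ch: str) -> bool:
    return joining_class(ch) in RIGHT_JOIN_CAUSING


def is_left_join_causing(ch: str) -> bool:
    return joining_class(ch) in LEFT_JOIN_CAUSING
```

With this table, BEH (dual-joining) belongs to both derived classes, ALEF (right-joining) is left join-causing only, and FATHATAN (transparent) belongs to neither.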
Joining Rules. The following rules describe the joining behavior of Arabic letters in terms of their display (visual) order. In other words, the positions of letterforms in the included examples are presented as they would appear on the screen after the Bidirectional Algorithm has reordered the characters of a line of text. An implementation may choose to restate the following rules according to logical order so as to apply them before the Bidirectional Algorithm’s reordering phase. In this case, the words right and left as used in this section would become preceding and following. In the following rules, if X refers to a character, then various glyph types representing that character are referred to as shown in Table 8-5.
Table 8-5. Arabic Glyph Types
Glyph Type | Description
Xn | Nominal glyph form as it appears in the code charts
Xr | Right-joining glyph form (both right-joining and dual-joining characters may employ this form)
Xl | Left-joining glyph form (both left-joining and dual-joining characters may employ this form)
Xm | Dual-joining (medial) glyph form that joins on both left and right (only dual-joining characters employ this form)
R1 Transparent characters do not affect the joining behavior of base (spacing) characters. For example:
Chapter 9
South Asian Scripts-I
The following South Asian scripts are described in this chapter:
• Devanagari
• Bengali
• Gurmukhi
• Gujarati
• Oriya
• Tamil
• Telugu
• Kannada
• Malayalam
The scripts of South Asia share so many common features that a side-by-side comparison of a few will often reveal structural similarities even in the modern letterforms. With minor historical exceptions, they are written from left to right. They are all abugidas in which most symbols stand for a consonant plus an inherent vowel (usually the sound /a/). Word-initial vowels in many of these scripts have distinct symbols, and word-internal vowels are usually written by juxtaposing a vowel sign in the vicinity of the affected consonant. Absence of the inherent vowel, when that occurs, is frequently marked with a special sign. In the Unicode Standard, this sign is denoted by the Sanskrit word virāma. In some languages, another designation is preferred. In Hindi, for example, the word hal refers to the character itself, and halant refers to the consonant that has its inherent vowel suppressed; in Tamil, the word puḷḷi is used. The virama sign nominally serves to suppress the inherent vowel of the consonant to which it is applied; it is a combining character, with its shape varying from script to script.
Most of the scripts of South Asia, from north of the Himalayas to Sri Lanka in the south, from Pakistan in the west to the easternmost islands of Indonesia, are derived from the ancient Brahmi script. The oldest lengthy inscriptions of India, the edicts of Ashoka from the third century BCE, were written in two scripts, Kharoshthi and Brahmi. These are both ultimately of Semitic origin, probably deriving from Aramaic, which was an important administrative language of the Middle East at that time. Kharoshthi, written from right to left, was supplanted by Brahmi and its derivatives. The descendants of Brahmi spread with myriad changes throughout the subcontinent and outlying islands. There are said to be some 200 different scripts deriving from it.
By the eleventh century, the modern script known as Devanagari was in ascendancy in India proper as the major script of Sanskrit literature. The North Indian branch of scripts was, like Brahmi itself, chiefly used to write Indo-European languages such as Pali and Sanskrit, and eventually the Hindi, Bengali, and Gujarati languages, though it was also the source for scripts for non-Indo-European languages such as Tibetan, Mongolian, and Lepcha.
The South Indian scripts are also derived from Brahmi and, therefore, share many structural characteristics. These scripts were first used to write Pali and Sanskrit but were later adapted for use in writing non-Indo-European languages—namely, the languages of the Dravidian family of southern India and Sri Lanka. Because of their use for Dravidian languages, the South Indian scripts developed many characteristics that distinguish them from the North Indian scripts. South Indian scripts were also exported to southeast Asia and were the source of scripts such as Lanna and Myanmar, as well as the insular scripts of the Philippines and Indonesia. The shapes of letters in the South Indian scripts took on a quite distinct look from the shapes of letters in the North Indian scripts. Some scholars suggest that this occurred because writing materials such as palm leaves encouraged changes in the way letters were written. The major official scripts of India proper, including Devanagari, are documented in this chapter. They are all encoded according to a common plan, so that comparable characters are in the same order and relative location. This structural arrangement, which facilitates transliteration to some degree, is based on the Indian national standard (ISCII) encoding for these scripts and makes use of a virama. While the arrangement of the encoding for the scripts of India is based on ISCII, this does not imply that the rendering behavior of South Indian scripts in particular is the same as that of Devanagari or other North Indian scripts. Implementations should ensure that adequate attention is given to the actual behavior of those scripts; they should not assume that they work just as Devanagari does. Each block description in this chapter describes the most important aspects of rendering for a particular script as well as unique behaviors it may have. 
Many of the character names in this group of scripts represent the same sounds, and common naming conventions are used for the scripts of India.
9.1 Devanagari
Devanagari: U+0900–U+097F
The Devanagari script is used for writing classical Sanskrit and its modern historical derivative, Hindi. Extensions to the Sanskrit repertoire are used to write other related languages of India (such as Marathi) and of Nepal (Nepali). In addition, the Devanagari script is used to write the following languages: Awadhi, Bagheli, Bhatneri, Bhili, Bihari, Braj Bhasha, Chhattisgarhi, Garhwali, Gondi (Betul, Chhindwara, and Mandla dialects), Harauti, Ho, Jaipuri, Kachchhi, Kanauji, Konkani, Kului, Kumaoni, Kurku, Kurukh, Marwari, Mundari, Newari, Palpa, and Santali. All other Indic scripts, as well as the Sinhala script of Sri Lanka, the Tibetan script, and the Southeast Asian scripts, are historically connected with the Devanagari script as descendants of the ancient Brahmi script. The entire family of scripts shares a large number of structural features.
The principles of the Indic scripts are covered in some detail in this introduction to the Devanagari script. The remaining introductions to the Indic scripts are abbreviated but highlight any differences from Devanagari where appropriate.
Standards. The Devanagari block of the Unicode Standard is based on ISCII-1988 (Indian Script Code for Information Interchange). The ISCII standard of 1988 differs from and is an update of earlier ISCII standards issued in 1983 and 1986. The Unicode Standard encodes Devanagari characters in the same relative positions as those coded in positions A0–F4 (hexadecimal) in the ISCII-1988 standard. The same character code layout is followed for eight other Indic scripts in the Unicode Standard: Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam. This parallel code layout emphasizes the structural similarities of the Brahmi scripts and follows the stated intention of the Indian coding standards to enable one-to-one mappings between analogous coding positions in different scripts in the family. Sinhala, Tibetan, Thai, Lao, Khmer, Myanmar, and other scripts depart to a greater extent from the Devanagari structural pattern, so the Unicode Standard does not attempt to provide any direct mappings for these scripts to the Devanagari order.
In November 1991, at the time The Unicode Standard, Version 1.0, was published, the Bureau of Indian Standards published a new version of ISCII in Indian Standard (IS) 13194:1991. This new version partially modified the layout and repertoire of the ISCII-1988 standard. Because of these events, the Unicode Standard does not precisely follow the layout of the current version of ISCII. Nevertheless, the Unicode Standard remains a superset of the ISCII-1991 repertoire except for a number of new Vedic extension characters defined in IS 13194:1991 Annex G—Extended Character Set for Vedic.
Modern, non-Vedic texts encoded with ISCII-1991 may be automatically converted to Unicode code points and back to their original encoding without loss of information.
Encoding Principles. The writing systems that employ Devanagari and other Indic scripts constitute abugidas—a cross between syllabic writing systems and alphabetic writing systems. The effective unit of these writing systems is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of (((C)C)C)V. The orthographic syllable need not correspond exactly with a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation. The orthographic syllable is built up of alphabetic pieces, the actual letters of the Devanagari script. These pieces consist of three distinct character types: consonant letters, independent vowels, and dependent vowel signs. In a text sequence, these characters are stored in logical (phonetic) order.
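The three character types and the logical storage order can be made concrete with a small Python sketch. The function name `devanagari_char_type` is hypothetical, and the code point ranges are a deliberately simplified assumption covering only the core letters of the Devanagari block (consonants U+0915..U+0939, independent vowels U+0904..U+0914, dependent vowel signs U+093E..U+094C); additional letters and signs elsewhere in the block are ignored.

```python
def devanagari_char_type(ch: str) -> str:
    """Classify a character into one of the three alphabetic piece types.
    The ranges below are simplified assumptions, not a full property table."""
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "consonant"
    if 0x0904 <= cp <= 0x0914:
        return "independent vowel"
    if 0x093E <= cp <= 0x094C:
        return "dependent vowel sign"
    return "other"


# The syllable /ki/ is stored as KA (U+0915) followed by VOWEL SIGN I
# (U+093F), that is, in logical (phonetic) order.
ki_types = [devanagari_char_type(ch) for ch in "\u0915\u093F"]
```

Here `ki_types` is `["consonant", "dependent vowel sign"]`: the stored order follows pronunciation, independent of where a renderer eventually places the vowel sign.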
Principles of the Devanagari Script
Rendering Devanagari Characters. Devanagari characters, like characters from many other scripts, can combine or change shape depending on their context. A character’s
appearance is affected by its ordering with respect to other characters, the font used to render the character, and the application or system environment. These variables can cause the appearance of Devanagari characters to differ from their nominal glyphs (used in the code charts). Additionally, a few Devanagari characters cause a change in the order of the displayed characters. This reordering is not commonly seen in non-Indic scripts and occurs independently of any bidirectional character reordering that might be required. Consonant Letters. Each consonant letter represents a single consonantal sound but also has the peculiarity of having an inherent vowel, generally the short vowel /a/ in Devanagari and the other Indic scripts. Thus U+0915 devanagari letter ka represents not just /k/ but also /ka/. In the presence of a dependent vowel, however, the inherent vowel associated with a consonant letter is overridden by the dependent vowel. Consonant letters may also be rendered as half-forms, which are presentation forms used to depict the initial consonant in consonant clusters. These half-forms do not have an inherent vowel. Their rendered forms in Devanagari often resemble the full consonant but are missing the vertical stem, which marks a syllabic core. (The stem glyph is graphically and historically related to the sign denoting the inherent /a/ vowel.) Some Devanagari consonant letters have alternative presentation forms whose choice depends on neighboring consonants. This variability is especially notable for U+0930 devanagari letter ra, which has numerous different forms, both as the initial element and as the final element of a consonant cluster. Only the nominal forms, rather than the contextual alternatives, are depicted in the code chart. 
The traditional Sanskrit/Devanagari alphabetic encoding order for consonants follows articulatory phonetic principles, starting with velar consonants and moving forward to bilabial consonants, followed by liquids and then fricatives. ISCII and the Unicode Standard both observe this traditional order. Independent Vowel Letters. The independent vowels in Devanagari are letters that stand on their own. The writing system treats independent vowels as orthographic CV syllables in which the consonant is null. The independent vowel letters are used to write syllables that start with a vowel. Dependent Vowel Signs (Matras). The dependent vowels serve as the common manner of writing noninherent vowels and are generally referred to as vowel signs, or as matras in Sanskrit. The dependent vowels do not stand alone; rather, they are visibly depicted in combination with a base letterform. A single consonant or a consonant cluster may have a dependent vowel applied to it to indicate the vowel quality of the syllable, when it is different from the inherent vowel. Explicit appearance of a dependent vowel in a syllable overrides the inherent vowel of a single consonant letter. The greatest variation among different Indic scripts is found in the way that the dependent vowels are applied to base letterforms. Devanagari has a collection of nonspacing dependent vowel signs that may appear above or below a consonant letter, as well as spacing dependent vowel signs that may occur to the right or to the left of a consonant letter or
consonant cluster. Other Indic scripts generally have one or more of these forms, but what is a nonspacing mark in one script may be a spacing mark in another. Also, some of the Indic scripts have single dependent vowels that are indicated by two or more glyph components—and those glyph components may surround a consonant letter both to the left and to the right or may occur both above and below it. The Devanagari script has only one character denoting a left-side dependent vowel sign: U+093F devanagari vowel sign i. Other Indic scripts either have no such vowel signs (Telugu and Kannada) or include as many as three of these signs (Bengali, Tamil, and Malayalam). Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 9-1 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.
Table 9-1. Devanagari Vowel Letters
To Represent | Use | Do Not Use
ऄ | 0904 | <0905, 0946>
आ | 0906 | <0905, 093E>
ऊ | 090A | <0909, 0941>
ऍ | 090D | <090F, 0945>
ऎ | 090E | <090F, 0946>
ऐ | 0910 | <090F, 0947>
ऑ | 0911 | <0905, 0949>
ऒ | 0912 | <0905, 094A>
ओ | 0913 | <0905, 094B>
औ | 0914 | <0905, 094C>
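One way to apply Table 9-1 mechanically is a lookup from each discouraged sequence to the atomic code point that should be used instead. The sketch below is an assumption about how a text-cleanup step might be written, not a procedure defined by the standard; the names `ATOMIC_VOWEL_LETTERS` and `use_atomic_vowel_letters` are hypothetical. It is a plain string substitution and should not be confused with a Unicode normalization form.

```python
# "Do Not Use" sequence -> "Use" code point, copied from Table 9-1.
ATOMIC_VOWEL_LETTERS = {
    "\u0905\u0946": "\u0904",
    "\u0905\u093E": "\u0906",
    "\u0909\u0941": "\u090A",
    "\u090F\u0945": "\u090D",
    "\u090F\u0946": "\u090E",
    "\u090F\u0947": "\u0910",
    "\u0905\u0949": "\u0911",
    "\u0905\u094A": "\u0912",
    "\u0905\u094B": "\u0913",
    "\u0905\u094C": "\u0914",
}


def use_atomic_vowel_letters(text: str) -> str:
    """Replace each analyzed sequence from Table 9-1 with its atomic letter."""
    for sequence, atomic in ATOMIC_VOWEL_LETTERS.items():
        text = text.replace(sequence, atomic)
    return text
```

For instance, `use_atomic_vowel_letters("\u0905\u093E\u092E")` yields `"\u0906\u092E"`, replacing <0905, 093E> with the single code point 0906 and leaving the following consonant untouched.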
Virama (Halant). Devanagari employs a sign known in Sanskrit as the virama or vowel omission sign. In Hindi, it is called hal or halant, and that term is used in referring to the virama or to a consonant with its vowel suppressed by the virama. The terms are used interchangeably in this section. The virama sign, U+094D devanagari sign virama, nominally serves to cancel (or kill) the inherent vowel of the consonant to which it is applied. When a consonant has lost its inherent vowel by the application of virama, it is known as a dead consonant; in contrast, a live consonant is one that retains its inherent vowel or is written with an explicit dependent vowel sign. In the Unicode Standard, a dead consonant is defined as a sequence consisting
of a consonant letter followed by a virama. The default rendering for a dead consonant is to position the virama as a combining mark bound to the consonant letterform. For example, if Cn denotes the nominal form of consonant C, and Cd denotes the dead consonant form, then a dead consonant is encoded as shown in Figure 9-1.
Figure 9-1. Dead Consonants in Devanagari
TAn + VIRAMAn → TAd
त + ◌् → त्
Consonant Conjuncts. The Indic scripts are noted for a large number of consonant conjunct forms that serve as orthographic abbreviations (ligatures) of two or more adjacent letterforms. This abbreviation takes place only in the context of a consonant cluster. An orthographic consonant cluster is defined as a sequence of characters that represents one or more dead consonants (denoted Cd) followed by a normal, live consonant letter (denoted Cl). Under normal circumstances, a consonant cluster is depicted with a conjunct glyph if such a glyph is available in the current font. In the absence of a conjunct glyph, the one or more dead consonants that form part of the cluster are depicted using half-form glyphs. In the absence of half-form glyphs, the dead consonants are depicted using the nominal consonant forms combined with visible virama signs (see Figure 9-2).
Figure 9-2. Conjunct Formations in Devanagari
(1) GAd + DHAl → GAh + DHAn: ग् + ध → ग्ध
(2) KAd + KAl → K.KAn: क् + क → क्क
(3) KAd + SSAl → K.SSAn: क् + ष → क्ष
(4) RAd + KAl → KAl + RAsup: र् + क → र्क
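The definition of an orthographic consonant cluster (one or more dead consonants followed by a live consonant) maps naturally onto a regular expression. The following Python sketch is illustrative only: the names `CLUSTER` and `consonant_clusters` are hypothetical, and the consonant range U+0915..U+0939 is a simplifying assumption that ignores other consonant letters in the block.

```python
import re

CONSONANT = "[\u0915-\u0939]"  # simplified consonant range (assumption)
VIRAMA = "\u094D"

# One or more dead consonants (consonant + virama) followed by a live
# consonant, that is, a consonant not itself followed by a virama.
CLUSTER = re.compile(f"(?:{CONSONANT}{VIRAMA})+{CONSONANT}(?!{VIRAMA})")


def consonant_clusters(text: str) -> list[str]:
    """Return every orthographic consonant cluster found in text."""
    return [match.group(0) for match in CLUSTER.finditer(text)]
```

Applied to the sequence KA, VIRAMA, SSA, the function returns that three-character sequence as a single cluster. Whether the cluster is then drawn with a conjunct glyph, a half-form, or a visible virama is a font matter, as the surrounding text explains. Because the pattern requires a consonant directly after each virama, a dead consonant followed by a joiner character is not matched by this sketch.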
A number of types of conjunct formations appear in these examples: (1) a half-form of GA in its combination with the full form of DHA; (2) a vertical conjunct K.KA; and (3) a fully ligated conjunct K.SSA, in which the components are no longer distinct. In example (4) in Figure 9-2, the dead consonant RAd is depicted with the nonspacing combining mark RAsup (repha). A well-designed Indic script font may contain hundreds of conjunct glyphs, but they are not encoded as Unicode characters because they are the result of ligation of distinct letters.
Indic script rendering software must be able to map appropriate combinations of characters in context to the appropriate conjunct glyphs in fonts. Explicit Virama (Halant). Normally a virama character serves to create dead consonants that are, in turn, combined with subsequent consonants to form conjuncts. This behavior usually results in a virama sign not being depicted visually. Occasionally, this default behavior is not desired when a dead consonant should be excluded from conjunct formation, in which case the virama sign is visibly rendered. To accomplish this goal, the Unicode Standard adopts the convention of placing the character U+200C zero width non-joiner immediately after the encoded dead consonant that is to be excluded from conjunct formation. In this case, the virama sign is always depicted as appropriate for the consonant to which it is attached. For example, in Figure 9-3, the use of zero width non-joiner prevents the default formation of the conjunct form (K.SSAn).
Figure 9-3. Preventing Conjunct Forms in Devanagari
KAd + ZWNJ + SSAl → KAd + SSAn
क् + ZWNJ + ष → क् (with visible virama) followed by ष
Explicit Half-Consonants. When a dead consonant participates in forming a conjunct, the dead consonant form is often absorbed into the conjunct form, such that it is no longer distinctly visible. In other contexts, the dead consonant may remain visible as a half-consonant form. In general, a half-consonant form is distinguished from the nominal consonant form by the loss of its inherent vowel stem, a vertical stem appearing to the right side of the consonant form. In other cases, the vertical stem remains but some part of its right-side geometry is missing. In certain cases, it is desirable to prevent a dead consonant from assuming full conjunct formation yet still not appear with an explicit virama. In these cases, the half-form of the consonant is used. To explicitly encode a half-consonant form, the Unicode Standard adopts the convention of placing the character U+200D zero width joiner immediately after the encoded dead consonant. The zero width joiner denotes a nonvisible letter that presents linking or cursive joining behavior on either side (that is, to the previous or following letter). Therefore, in the present context, the zero width joiner may be considered to present a context to which a preceding dead consonant may join so as to create the half-form of the consonant. For example, if Ch denotes the half-form glyph of consonant C, then a half-consonant form is represented as shown in Figure 9-4. In the absence of the zero width joiner, the sequence in Figure 9-4 would normally produce the full conjunct form (K.SSAn).
Figure 9-4. Half-Consonants in Devanagari
KAd + ZWJ + SSAl → KAh + SSAn
क् + ZWJ + ष → half-form of क followed by ष
This encoding of half-consonant forms also applies in the absence of a base letterform. That is, this technique may be used to encode independent half-forms, as shown in Figure 9-5.
Figure 9-5. Independent Half-Forms in Devanagari
GAd + ZWJ → GAh
ग् + ZWJ → half-form of ग
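The conventions for zero width non-joiner and zero width joiner after a dead consonant can be summarized as a small classifier. This Python sketch assumes the Devanagari pattern described so far (consonant, virama, optional joiner); the function name `requested_form` and its return labels are hypothetical, and the function does not check that `text[index]` is actually a consonant.

```python
VIRAMA = "\u094D"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def requested_form(text: str, index: int) -> str:
    """Describe the form requested for the consonant at text[index],
    judged only by the characters that follow it."""
    if text[index + 1:index + 2] != VIRAMA:
        return "live"
    follower = text[index + 2:index + 3]
    if follower == ZWNJ:
        return "explicit virama"   # excluded from conjunct formation
    if follower == ZWJ:
        return "half-form"         # explicit half-consonant
    return "dead"                  # may be absorbed into a conjunct
```

So KA + VIRAMA + ZWNJ + SSA reports "explicit virama" for KA, KA + VIRAMA + ZWJ + SSA reports "half-form", and KA + VIRAMA + SSA reports "dead", leaving the actual glyph choice to the font.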
Other Indic scripts have similar half-forms for the initial consonants of a conjunct. Some, such as Oriya, also have similar half-forms for the final consonants; those are represented as shown in Figure 9-6.
Figure 9-6. Half-Consonants in Oriya
KAn + ZWJ + VIRAMA + TAl → KAl + TAh
କ + ZWJ + ◌୍ + ତ → କ followed by the half-form of ତ
In the absence of the zero width joiner, the sequence in Figure 9-6 would normally produce the full conjunct form (K.TAn).
Consonant Forms. In summary, each consonant may be encoded such that it denotes a live consonant, a dead consonant that may be absorbed into a conjunct, the half-form of a dead consonant, or a dead consonant with an overt halant that does not get absorbed into a conjunct (see Figure 9-7). As the rendering of conjuncts and half-forms depends on the availability of glyphs in the font, the following fallback strategy should be employed:
• If the coded character sequence would normally render with a full conjunct, but such a conjunct is not available, the fallback rendering is to use half-forms. If those are not available, the fallback rendering should use an explicit (visible) virama.
• If the coded character sequence would normally render with a half-form (it contains a ZWJ), but half-forms are not available, the fallback rendering should use an explicit (visible) virama.
Figure 9-7. Consonant Forms in Devanagari and Oriya
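The two fallback bullets can be read as a short decision procedure. The Python sketch below is one possible reading, with hypothetical names throughout: `choose_rendering` takes the form the character sequence would normally produce and two flags saying whether the current font supplies the relevant conjunct or half-form glyph.

```python
def choose_rendering(requested: str, has_conjunct: bool, has_half_form: bool) -> str:
    """Pick a rendering for a dead consonant, falling back as glyphs run out.

    requested is "conjunct" (the default for a consonant cluster) or
    "half-form" (the sequence contains a ZWJ)."""
    if requested not in ("conjunct", "half-form"):
        raise ValueError("requested must be 'conjunct' or 'half-form'")
    if requested == "conjunct" and has_conjunct:
        return "conjunct"
    if has_half_form:
        return "half-form"
    # Last resort in both bullets: nominal consonant with visible virama.
    return "explicit virama"
```

A font with half-forms but no conjunct glyph for the cluster gives `choose_rendering("conjunct", False, True) == "half-form"`. A request for a half-form never escalates to a conjunct, even if one exists, because the ZWJ explicitly asks for the half-form.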
Rendering Devanagari
Rules for Rendering. This section provides more formal and detailed rules for minimal rendering of Devanagari as part of a plain text sequence. It describes the mapping between Unicode characters and the glyphs in a Devanagari font. It also describes the combining and ordering of those glyphs. These rules provide minimal requirements for legibly rendering interchanged Devanagari text. As with any script, a more complex procedure can add rendering characteristics, depending on the font and application. In a font that is capable of rendering Devanagari, the number of glyphs is greater than the number of Devanagari characters.
Notation. In the next set of rules, the following notation applies:
Cn | Nominal glyph form of consonant C as it appears in the code charts.
Cl | A live consonant, depicted identically to Cn.
Cd | Glyph depicting the dead consonant form of consonant C.
Ch | Glyph depicting the half-consonant form of consonant C.
Ln | Nominal glyph form of a conjunct ligature consisting of two or more component consonants. A conjunct ligature composed of two consonants X and Y is also denoted X.Yn.
RAsup | A nonspacing combining mark glyph form of U+0930 devanagari letter ra positioned above or attached to the upper part of a base glyph form. This form is also known as repha.
RAsub | A nonspacing combining mark glyph form of U+0930 devanagari letter ra positioned below or attached to the lower part of a base glyph form.
Vvs | Glyph depicting the dependent vowel sign form of a vowel V.
VIRAMAn | The nominal glyph form of the nonspacing combining mark depicting U+094D devanagari sign virama.
A virama character is not always depicted. When it is depicted, it adopts this nonspacing mark form. Dead Consonant Rule. The following rule logically precedes the application of any other rule to form a dead consonant. Once formed, a dead consonant may be subject to other rules described next. R1 When a consonant Cn precedes a VIRAMAn , it is considered to be a dead consonant Cd . A consonant Cn that does not precede VIRAMAn is considered to be a live consonant Cl .
TAn + VIRAMAn → TAd
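Rule R1 can be illustrated on actual code points. The sketch below labels each Devanagari consonant in a string as dead or live; the function names are hypothetical, the consonant range U+0915..U+0939 plus U+0958..U+095F is taken from the code charts, and the optional nukta between consonant and virama follows the ordering given in rule R9.

```python
VIRAMA = "\u094D"  # U+094D DEVANAGARI SIGN VIRAMA
NUKTA = "\u093C"   # U+093C DEVANAGARI SIGN NUKTA


def is_consonant(ch: str) -> bool:
    # Consonant letters KA..HA, plus the precomposed nukta consonants.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def classify_consonants(text: str) -> list:
    """Return (consonant, 'dead' or 'live') pairs in logical order."""
    result = []
    for i, ch in enumerate(text):
        if not is_consonant(ch):
            continue
        j = i + 1
        # A nukta, if present, sits between the consonant and the virama.
        if j < len(text) and text[j] == NUKTA:
            j += 1
        dead = j < len(text) and text[j] == VIRAMA
        result.append((ch, "dead" if dead else "live"))
    return result
```

Applied to TA + VIRAMA + RA (U+0924, U+094D, U+0930), the function reports TA as dead and RA as live.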
Consonant RA Rules. The character U+0930 devanagari letter ra takes one of a number of visual forms depending on its context in a consonant cluster. By default, this letter is depicted with its nominal glyph form (as shown in the code charts). In some contexts, it is depicted using one of two nonspacing glyph forms that combine with a base letterform. R2 If the dead consonant RAd precedes a consonant, then it is replaced by the superscript nonspacing mark RAsup , which is positioned so that it applies to the logically subsequent element in the memory representation.
RAd + KAl → KAl + RAsup
RAd¹ + RAd² → RAd² + RAsup¹
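The repositioning in rule R2 can be sketched on symbolic glyph names. The token spelling ("RAd", "KAl", "RAsup") mirrors the notation above, while the function name and the list representation are assumptions made for illustration. The sketch also moves the mark past any further dead consonants in the same cluster, so that it ends up after the element it applies to; it does not model ligature or half-form substitution.

```python
def apply_repha(tokens):
    """Replace a cluster-initial RAd by RAsup placed after the cluster.

    tokens: glyph names in logical order; names ending in 'd' are dead
    consonants and names ending in 'l' are live consonants.
    """
    out, i = [], 0
    while i < len(tokens):
        nxt = i + 1
        if (tokens[i] == "RAd" and nxt < len(tokens)
                and tokens[nxt][-1] in "dl"):
            j = nxt
            # Extend over further dead consonants up to the first live one.
            while (tokens[j].endswith("d") and j + 1 < len(tokens)
                   and tokens[j + 1][-1] in "dl"):
                j += 1
            out.extend(tokens[nxt:j + 1])
            out.append("RAsup")
            i = j + 1
        else:
            out.append(tokens[i])
            i += 1
    return out
```

With this sketch, apply_repha(["RAd", "KAl"]) gives ["KAl", "RAsup"], matching the first example of rule R2.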
R3 If the superscript mark RAsup is to be applied to a dead consonant and that dead consonant is combined with another consonant to form a conjunct ligature, then the mark is positioned so that it applies to the conjunct ligature form as a whole.
RAd + JAd + NYAl → J.NYAn + RAsup
R4 If the superscript mark RAsup is to be applied to a dead consonant that is subsequently replaced by its half-consonant form, then the mark is positioned so that it applies to the form that serves as the base of the consonant cluster.
RAd + GAd + GHAl → GAh + GHAl + RAsup
R5 In conformance with the ISCII standard, the half-consonant form RRAh is represented as eyelash-RA. This form of RA is commonly used in writing Marathi and Newari.
RRAn + VIRAMAn → RRAh
R5a For compatibility with The Unicode Standard, Version 2.0, if the dead consonant RAd precedes zero width joiner, then the half-consonant form RAh , depicted as eyelash-RA, is used instead of RAsup .
RAd + ZWJ → RAh
R6 Except for the dead consonant RAd , when a dead consonant Cd precedes the live consonant RAl , then Cd is replaced with its nominal form Cn , and RA is replaced by the subscript nonspacing mark RAsub , which is positioned so that it applies to Cn .
TTHAd + RAl → TTHAn + RAsub
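Rule R6 is a local substitution and can be sketched in the same symbolic notation. The function name and the token lists are hypothetical; the sketch covers only the substitution itself, not the ligature formation that may follow.

```python
def apply_r6(tokens):
    """Rewrite Cd + RAl as Cn + RAsub for any dead consonant except RAd."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        is_dead = tok.endswith("d") and tok != "RAd"
        if is_dead and i + 1 < len(tokens) and tokens[i + 1] == "RAl":
            # The dead consonant regains its nominal form; RA becomes
            # the subscript mark attached to it.
            out += [tok[:-1] + "n", "RAsub"]
            i += 2
        else:
            out.append(tok)
            i += 1
    return out
```

For instance, apply_r6(["TTHAd", "RAl"]) yields ["TTHAn", "RAsub"], and a sequence beginning with RAd is left untouched.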
R7 For certain consonants, the mark RAsub may graphically combine with the consonant to form a conjunct ligature form. These combinations, such as the one shown here, are further addressed by the ligature rules described shortly.
PHAd + RAl → PHAn + RAsub
R8 If a dead consonant (other than RAd ) precedes RAd , then the substitution of RA for RAsub is performed as described above; however, the VIRAMA that formed RAd remains so as to form a dead consonant conjunct form.
TAd + RAd → TAn + RAsub + VIRAMAn → T.RAd
A dead consonant conjunct form that contains an absorbed RAd may subsequently combine to form a multipart conjunct form.
T.RAd + YAl → T.R.YAn
Modifier Mark Rules. In addition to vowel signs, three other types of combining marks may be applied to a component of an orthographic syllable or to the syllable as a whole: nukta, bindus, and svaras. R9 The nukta sign, which modifies a consonant form, is placed immediately after the consonant in the memory representation and is attached to that consonant in rendering. If the consonant represents a dead consonant, then NUKTA should precede VIRAMA in the memory representation.
KAn + NUKTAn + VIRAMAn → QAd
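The ordering required by rule R9 agrees with canonical ordering. As a small check, assuming the canonical combining classes published in the Unicode Character Database (7 for the nukta, 9 for the virama), normalization moves a misplaced nukta in front of the virama. The helper name is hypothetical; the calls are from Python's standard unicodedata module.

```python
import unicodedata

KA = "\u0915"      # U+0915 DEVANAGARI LETTER KA
NUKTA = "\u093C"   # U+093C DEVANAGARI SIGN NUKTA
VIRAMA = "\u094D"  # U+094D DEVANAGARI SIGN VIRAMA


def nukta_before_virama(text: str) -> str:
    """Return text in canonical order, which puts NUKTA before VIRAMA."""
    return unicodedata.normalize("NFD", text)


recommended = KA + NUKTA + VIRAMA  # order required by rule R9
swapped = KA + VIRAMA + NUKTA      # marks stored in the wrong order

# Combining classes decide the canonical order of adjacent marks.
classes = (unicodedata.combining(NUKTA), unicodedata.combining(VIRAMA))
```

Normalizing the swapped sequence therefore produces the recommended one, which means the two orders cannot be kept distinct in normalized text.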
R10 Other modifying marks, in particular bindus and svaras, apply to the orthographic syllable as a whole and should follow (in the memory representation) all other characters that constitute the syllable. The bindus should follow any vowel signs, and the svaras should come last. The relative placement of these marks is
horizontal rather than vertical; the horizontal rendering order may vary according to typographic concerns.
KAn + AAvs + CANDRABINDUn
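The ordering in rule R10 can be checked mechanically. The sketch below is a simplified validator with assumed code point ranges: vowel signs U+093E..U+094C, bindus U+0901 and U+0902, and, as a labeled assumption, U+0951..U+0954 treated as the svaras. The function names are hypothetical.

```python
def mark_rank(ch: str) -> int:
    """Rank marks by required order: vowel sign < bindu < svara."""
    cp = ord(ch)
    if 0x093E <= cp <= 0x094C:
        return 1  # dependent vowel sign
    if cp in (0x0901, 0x0902):
        return 2  # candrabindu, anusvara
    if 0x0951 <= cp <= 0x0954:
        return 3  # assumed here to stand for the svaras
    return 0      # anything else (consonant, virama, nukta, ...)


def follows_r10(syllable: str) -> bool:
    """True if nothing of lower rank follows a bindu or svara."""
    highest = 0
    for ch in syllable:
        rank = mark_rank(ch)
        if highest >= 2 and rank < highest:
            return False
        highest = max(highest, rank)
    return True
```

The sequence KA + AA vowel sign + CANDRABINDU passes the check, while KA + CANDRABINDU + AA vowel sign does not.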
Ligature Rules. Subsequent to the application of the rules just described, a set of rules governing ligature formation apply. The precise application of these rules depends on the availability of glyphs in the current font being used to display the text. R11 If a dead consonant immediately precedes another dead consonant or a live consonant, then the first dead consonant may join the subsequent element to form a two-part conjunct ligature form.
JAd + NYAl → J.NYAn
TTAd + TTHAl → TT.TTHAn
R12 A conjunct ligature form can itself behave as a dead consonant and enter into further, more complex ligatures.
SAd + TAd + RAn → SAd + T.RAn → S.T.RAn
A conjunct ligature form can also produce a half-form.
K.SSAd + YAl → K.SSh + YAn
R13 If a nominal consonant or conjunct ligature form precedes RAsub as a result of the application of rule R6, then the consonant or ligature form may join with RAsub to form a multipart conjunct ligature (see rule R6 for more information).
KAn + RAsub → K.RAn
PHAn + RAsub → PH.RAn
R14 In some cases, other combining marks will combine with a base consonant, either attaching at a nonstandard location or changing shape. In minimal rendering, there are only two cases: RAl with Uvs or UUvs .
RAl + Uvs → RUn
RAl + UUvs → RUUn
Memory Representation and Rendering Order. The storage of plain text in Devanagari and all other Indic scripts generally follows phonetic order; that is, a CV syllable with a dependent vowel is always encoded as a consonant letter C followed by a vowel sign V in the memory representation. This order is employed by the ISCII standard and corresponds to both the phonetic order and the keying order of textual data (see Figure 9-8).
Figure 9-8. Rendering Order in Devanagari
Character Order: KAn + Ivs → Glyph Order: Ivs + KAn
Because Devanagari and other Indic scripts have some dependent vowels that must be depicted to the left side of their consonant letter, the software that renders the Indic scripts must be able to reorder elements in mapping from the logical (character) store to the presentational (glyph) rendering. For example, if Cn denotes the nominal form of consonant C, and Vvs denotes a left-side dependent vowel sign form of vowel V, then a reordering of glyphs with respect to encoded characters occurs as just shown. R15 When the dependent vowel Ivs is used to override the inherent vowel of a syllable, it is always written to the extreme left of the orthographic syllable. If the orthographic syllable contains a consonant cluster, then this vowel is always depicted to the left of that cluster.
TAd + RAl + Ivs → T.RAn + Ivs → Ivs + T.RAd
R16 The presence of an explicit virama (either caused by a ZWNJ or by the absence of a conjunct in the font) blocks this reordering, and the dependent vowel Ivs is rendered after the rightmost such explicit virama.
TAd + ZWNJ + RAl + Ivs → TAd + Ivs + RAl
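Rules R15 and R16 together determine where the glyph for the vowel sign I lands. A minimal sketch on symbolic glyph names follows; the function name, the default token names, and the use of a "VIRAMAn" token to stand for a visible virama are assumptions made for illustration.

```python
def place_left_vowel(glyphs, vowel="Ivs", virama="VIRAMAn"):
    """Move the left-side vowel sign to its display position.

    glyphs: glyph names of one orthographic syllable in logical order,
    after conjunct formation. A visible virama blocks the reordering.
    """
    if vowel not in glyphs:
        return list(glyphs)
    rest = [g for g in glyphs if g != vowel]
    # Position just past the rightmost visible virama, or 0 if none.
    pos = max((i + 1 for i, g in enumerate(rest) if g == virama), default=0)
    return rest[:pos] + [vowel] + rest[pos:]
```

With a full conjunct, place_left_vowel(["T.RAn", "Ivs"]) puts the vowel at the extreme left; when the virama stays visible, the vowel is placed immediately after it.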
Sample Half-Forms. Table 9-2 shows examples of half-consonant forms that are commonly used with the Devanagari script. These forms are glyphs, not characters. They may be encoded explicitly using zero width joiner as shown. In normal conjunct formation, they may be used spontaneously to depict a dead consonant in combination with subsequent consonant forms.
Table 9-2. Sample Devanagari Half-Forms
[Glyph table not reproduced. Each entry has the form: consonant + VIRAMA + ZWJ → half-form glyph.]
Sample Ligatures. Table 9-3 shows examples of conjunct ligature forms that are commonly used with the Devanagari script. These forms are glyphs, not characters. Not every writing system that employs this script uses all of these forms; in particular, many of these forms are used only in writing Sanskrit texts. Furthermore, individual fonts may provide fewer or more ligature forms than are depicted here.
Table 9-3. Sample Devanagari Ligatures
[Glyph table not reproduced. Each entry has the form: consonant + VIRAMA + consonant → conjunct ligature glyph.]
Sample Half-Ligature Forms. In addition to half-form glyphs of individual consonants, half-forms are used to depict conjunct ligature forms. A sample of such forms is shown in Table 9-4. These forms are glyphs, not characters. They may be encoded explicitly using zero width joiner as shown. In normal conjunct formation, they may be used spontaneously to depict a conjunct ligature in combination with subsequent consonant forms.
Table 9-4. Sample Devanagari Half-Ligature Forms
[Glyph table not reproduced. Each entry has the form: consonant + VIRAMA + consonant + VIRAMA + ZWJ → half-form glyph of the conjunct ligature.]
Language-Specific Allographs. In Marathi and some South Indian orthographies, variant glyphs are preferred for U+0932 devanagari letter la and U+0936 devanagari letter sha, as shown in Figure 9-9. Marathi also makes use of the “eyelash” form of the letter RA, as discussed in rule R5.
Figure 9-9. Marathi Allographs
LA (U+0932): normal and Marathi glyph forms
SHA (U+0936): normal and Marathi glyph forms
Combining Marks. Devanagari and other Indic scripts have a number of combining marks that could be considered diacritic. One class of these marks, known as bindus, is represented by U+0901 devanagari sign candrabindu and U+0902 devanagari sign anusvara. These marks indicate nasalization or final nasal closure of a syllable. U+093C devanagari sign nukta is a true diacritic. It is used to extend the basic set of consonant letters by modifying them (with a subscript dot in Devanagari) to create new letters. U+0951..U+0954 are a set of combining marks used in transcription of Sanskrit texts.
Digits. Each Indic script has a distinct set of digits appropriate to that script. These digits may or may not be used in ordinary text in that script. European digits have displaced the Indic script forms in modern usage in many of the scripts. Some Indic scripts, notably Tamil, lack a distinct digit for zero.
Punctuation and Symbols. U+0964 devanagari danda is similar to a full stop. U+0965 devanagari double danda marks the end of a verse in traditional texts. The term danda is from Sanskrit, and the punctuation mark is generally referred to as a viram instead in Hindi. Although the danda and double danda are encoded in the Devanagari block, the intent is that they be used as common punctuation for all the major scripts of India covered by this chapter. Danda and double danda punctuation marks are not separately encoded for Bengali, Gujarati, and so on. However, analogous punctuation marks for other Brahmi-derived scripts are separately encoded, particularly for scripts used primarily outside of India. Many modern languages written in the Devanagari script intersperse punctuation derived from the Latin script. Thus U+002C comma and U+002E full stop are freely used in writing Hindi, and the danda is usually restricted to more traditional texts. However, the danda may be preserved when such traditional texts are transliterated into the Latin script. U+0970 devanagari abbreviation sign appears after letters or combinations of letters and marks the sequence as an abbreviation.
Encoding Structure. The Unicode Standard organizes the nine principal Indic scripts in blocks of 128 encoding points each. The first six columns in each script are isomorphic with the ISCII-1988 encoding, except that the last 11 positions (U+0955..U+095F in Devanagari, for example), which are unassigned or undefined in ISCII-1988, are used in the Unicode encoding. The seventh column in each of these scripts, along with the last 11 positions in the sixth column, represent additional character assignments in the Unicode Standard that are matched across all nine scripts. For example, positions U+xx66..U+xx6F and U+xxE6..U+xxEF code the Indic script digits for each script. The eighth column for each script is reserved for script-specific additions that do not correspond from one Indic script to the next.
Other Languages. The characters U+097B devanagari letter gga, U+097C devanagari letter jja, U+097E devanagari letter ddda, and U+097F devanagari letter bba are used to write Sindhi implosive consonants. Previous versions of the Unicode Standard recommended representing those characters as a combination of the usual consonants with nukta and anudatta, but those combinations are no longer recommended.
Konkani makes use of additional sounds that can be represented with combinations such as U+091A devanagari letter ca plus U+093C devanagari sign nukta and U+091F devanagari letter tta plus U+0949 devanagari vowel sign candra o.
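The matched layout described under Encoding Structure means that a digit's value can be read from its offset within the 128-position block. The sketch below assumes that the nine blocks occupy U+0900..U+0D7F contiguously; the function name is hypothetical, and the standard unicodedata module is used only as a cross-check.

```python
import unicodedata


def indic_digit_value(ch: str):
    """Return 0..9 for a digit of the nine principal Indic scripts, else None."""
    cp = ord(ch)
    # Devanagari through Malayalam, in nine blocks of 128 code points.
    if not 0x0900 <= cp <= 0x0D7F:
        return None
    offset = cp & 0x7F  # position within the 128-point block
    if 0x66 <= offset <= 0x6F:
        return offset - 0x66
    return None


if __name__ == "__main__":
    for cp in (0x0969, 0x09EB):  # Devanagari three, Bengali five
        ch = chr(cp)
        print(f"U+{cp:04X}", indic_digit_value(ch), unicodedata.digit(ch))
```

The same function handles U+xx66..U+xx6F and U+xxE6..U+xxEF because masking with 0x7F folds both halves of each 256-position row onto one block.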
9.2 Bengali
Bengali: U+0980–U+09FF
The Bengali script is a North Indian script closely related to Devanagari. It is used to write the Bengali language primarily in the West Bengal state and in the nation of Bangladesh. It is also used to write Assamese in Assam and a number of other minority languages, such as Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Rian, and Santali, in northeastern India.
Virama (Hasant). The Bengali script uses the Unicode virama model to form conjunct consonants. In Bengali, the virama is known as hasant.
Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 9-5 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.
Table 9-5. Bengali Vowel Letters
To Represent   Use    Do Not Use
আ              0986   <0985, 09BE>
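A text-cleanup step for Table 9-5 might look like the following sketch. The function and table names are hypothetical. The explicit replacement is needed because, as far as the Unicode Character Database records it, U+0986 has no canonical decomposition, so normalization does not unify the two spellings.

```python
import unicodedata

# Discouraged sequence from Table 9-5 mapped to the atomic vowel letter.
BENGALI_VOWEL_FIXES = {"\u0985\u09BE": "\u0986"}


def use_atomic_bengali_vowels(text: str) -> str:
    """Replace analyzed vowel sequences by their single code points."""
    for sequence, atom in BENGALI_VOWEL_FIXES.items():
        text = text.replace(sequence, atom)
    return text


def nfc_unifies(sequence: str, atom: str) -> bool:
    """Report whether NFC alone would turn the sequence into the atom."""
    return unicodedata.normalize("NFC", sequence) == atom
```

Here nfc_unifies("\u0985\u09BE", "\u0986") is False, which is why the table has to be applied as a separate step.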
Two-Part Vowel Signs. The Bengali script, along with a number of other Indic scripts, makes use of two-part vowel signs. In these vowels one-half of the vowel is placed on each side of a consonant letter or cluster; for example, U+09CB bengali vowel sign o and U+09CC bengali vowel sign au. The vowel signs are coded in each case in the position in the charts isomorphic with the corresponding vowel in Devanagari. Hence U+09CC bengali vowel sign au is isomorphic with U+094C devanagari vowel sign au. To provide compatibility with existing implementations of the scripts that use two-part vowel signs, the Unicode Standard explicitly encodes the right half of these vowel signs. For example, U+09D7 bengali au length mark represents the right-half glyph component of U+09CC bengali vowel sign au.
Special Characters. U+09F2..U+09F9 are a series of Bengali additions for writing currency and fractions.
Rendering Behavior. Like other Brahmic scripts in the Unicode Standard, Bengali uses the hasant to form conjunct characters. For example, U+0995 ক bengali letter ka + U+09CD bengali sign virama + U+09B7 ষ bengali letter ssa yields the conjunct ক্ষ KSSA, which is pronounced khya in Assamese. For general principles regarding the rendering of the Bengali script, see the rules for rendering in Section 9.1, Devanagari.
Consonant-Vowel Ligatures. Some Bengali consonant plus vowel combinations have two distinct visual presentations. The first visual presentation is a traditional ligated form, in which the vowel combines with the consonant in a novel way. In the second presentation, the vowel is joined to the consonant but retains its nominal form, and the combination is not considered a ligature. These consonant-vowel combinations are illustrated in Table 9-6. The ligature forms of these consonant-vowel combinations are traditional. They are used in handwriting and some printing. The “non-ligated” forms are more common; they are used in newspapers and are associated with modern typefaces. However, the traditional ligatures are preferred in some contexts. No semantic distinctions are made in Bengali text on the basis of the two different presentations of these consonant-vowel combinations. However, some users consider it important that implementations support both forms and that the distinction be representable in plain text. This may be accomplished by using U+200D zero width joiner and U+200C zero width non-joiner to influence ligature glyph selection. (See “Cursive Connection and Ligatures” in Section 16.2, Layout Controls.)
Table 9-6. Bengali Consonant-Vowel Combinations
[The Ligated and Non-ligated columns contain glyphs that are not reproduced here.]
      Code Points
gu    <0997, 09C1>
ru    <09B0, 09C1>
rū    <09B0, 09C2>
śu    <09B6, 09C1>
hu    <09B9, 09C1>
hṛ    <09B9, 09C3>
A given font implementation can choose whether to treat the ligature forms of the consonant-vowel combinations as the defaults for rendering. If the non-ligated form is the default, then ZWJ can be inserted to request a ligature, as shown in Figure 9-10.
Figure 9-10. Requesting Bengali Consonant-Vowel Ligature
0997 + 09C1: non-ligated form (default)
0997 + 200D + 09C1: ligated form
If the ligated form is the default for a given font implementation, then ZWNJ can be inserted to block a ligature, as shown in Figure 9-11.
Figure 9-11. Blocking Bengali Consonant-Vowel Ligature
0997 + 09C1: ligated form (default)
0997 + 200C + 09C1: non-ligated form
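The two figures describe the same mechanism with different joiners. A minimal sketch, assuming the six combinations of Table 9-6 are the only ones of interest, inserts the chosen control character between consonant and vowel sign. The function name and the pair table are hypothetical; whether the request has a visible effect depends on the font.

```python
ZWJ = "\u200D"   # U+200D ZERO WIDTH JOINER: request the ligature
ZWNJ = "\u200C"  # U+200C ZERO WIDTH NON-JOINER: block the ligature

# Consonant and vowel sign pairs listed in Table 9-6.
LIGATING_PAIRS = {
    ("\u0997", "\u09C1"), ("\u09B0", "\u09C1"), ("\u09B0", "\u09C2"),
    ("\u09B6", "\u09C1"), ("\u09B9", "\u09C1"), ("\u09B9", "\u09C3"),
}


def mark_ligature_preference(text: str, ligate: bool) -> str:
    """Insert ZWJ (ligate=True) or ZWNJ (ligate=False) into listed pairs."""
    joiner = ZWJ if ligate else ZWNJ
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if i + 1 < len(text) and (ch, text[i + 1]) in LIGATING_PAIRS:
            out.append(joiner)
    return "".join(out)
```

Calling mark_ligature_preference("\u0997\u09C1", True) produces the sequence of Figure 9-10, and passing False produces that of Figure 9-11.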
Khanda Ta. In Bengali, a dead consonant ta makes use of a special form, U+09CE bengali letter khanda ta. This form is used in all contexts except where it is immediately followed by one of the consonants: ta, tha, na, ba, ma, ya, or ra.
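The context condition for khanda ta can be written as a simple predicate. This sketch only encodes the list of following consonants given above, using their code points from the Bengali code chart; the function name is hypothetical, and the sketch does not address how the form is encoded or rendered.

```python
from typing import Optional

# ta, tha, na, ba, ma, ya, ra: after these, the special form is not used.
FULL_TA_CONTEXT = frozenset("\u09A4\u09A5\u09A8\u09AC\u09AE\u09AF\u09B0")


def dead_ta_takes_khanda_form(next_char: Optional[str]) -> bool:
    """True if a dead TA followed by next_char uses the khanda ta form.

    next_char is None when the dead TA is at the end of the text.
    """
    return next_char not in FULL_TA_CONTEXT
```

For example, a dead ta before ka (U+0995) takes the khanda form, while a dead ta before ra (U+09B0) does not.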
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
Purchasing the Book For convenient access to the full text of the standard as a useful reference book, we recommend purchasing the printed version. The book is available from the Unicode Consortium, the publisher, and booksellers. Purchase of the standard in book format contributes to the ongoing work of the Unicode Consortium. Details about the book publication and ordering information may be found at http://www.unicode.org/book/aboutbook.html
Joining Unicode You or your organization may benefit by joining the Unicode Consortium: for more information, see Joining the Unicode Consortium at http://www.unicode.org/consortium/join.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 10
South Asian Scripts-II
This chapter documents scripts of South Asia aside from the major official scripts of India, which are documented in Chapter 9, South Asian Scripts-I. The following South Asian scripts are described in this chapter:
• Sinhala
• Tibetan
• Phags-pa
• Limbu
• Syloti Nagri
• Kharoshthi
Sinhala has a virama-based model, but is not structurally mapped to ISCII. Tibetan stands apart, using a subjoined consonant model for conjoined consonants, reflecting its somewhat different structure and usage. Phags-pa is a historical script related to Tibetan that was created as the national script of the Mongol empire. Even though Phags-pa was used mostly in Eastern and Central Asia for writing text in the Mongolian and Chinese languages, it is discussed in this chapter because of its close historical connection to the Tibetan script. The Limbu script makes use of an explicit encoding of syllable-final consonants. Syloti Nagri is used to write the modern Sylheti language of northeast Bangladesh. The oldest lengthy inscriptions of India, the edicts of Ashoka from the third century bce, were written in two scripts, Kharoshthi and Brahmi. These are both ultimately of Semitic origin, probably deriving from Aramaic, which was an important administrative language of the Middle East at that time. Kharoshthi, which was written from right to left, was supplanted by Brahmi and its derivatives.
10.1 Sinhala
Sinhala: U+0D80–U+0DFF
The Sinhala script, also known as Sinhalese, is used to write the Sinhala language, the majority language of Sri Lanka. It is also used to write the Pali and Sanskrit languages. The script is a descendant of Brahmi and resembles the scripts of South India in form and structure.
Sinhala differs from other languages of the region in that it has a series of prenasalized stops that are distinguished from the combination of a nasal followed by a stop. In other words, both forms occur and are written differently; for example, අඬ aňḍa [aⁿɖa] “sound” versus අණ්ඩ aṇḍa [aɳɖa] “egg.” In addition, Sinhala has separate distinct signs for both a short and a long low front vowel sounding similar to the initial vowel of the English word “apple,” usually represented in IPA as U+00E6 æ latin small letter ae (ash). The independent forms of these vowels are encoded at U+0D87 and U+0D88; the corresponding dependent forms are U+0DD0 and U+0DD1. Because of these extra letters, the encoding for Sinhala does not precisely follow the pattern established for the other Indic scripts (for example, Devanagari). It does use the same general structure, making use of phonetic order, matra reordering, and use of the virama (U+0DCA sinhala sign al-lakuna) to indicate conjunct consonant clusters. Sinhala does not use half-forms in the Devanagari manner, but does use many ligatures.
Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 10-1 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.
Table 10-1. Sinhala Vowel Letters
To Represent   Use    Do Not Use
ආ              0D86   <0D85, 0DCF>
ඇ              0D87   <0D85, 0DD0>
ඈ              0D88   <0D85, 0DD1>
ඌ              0D8C   <0D8B, 0DDF>
ඎ              0D8E   <0D8D, 0DD8>
ඐ              0D90   <0D8F, 0DDF>
ඒ              0D92   <0D91, 0DCA>
ඓ              0D93   <0D91, 0DD9>
ඖ              0D96   <0D94, 0DDF>
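Table 10-1 lends itself to a table-driven check. The sketch below, with hypothetical names, reports where a discouraged sequence occurs and rewrites it to the atomic letter; it assumes the nine rows above are the complete list to enforce.

```python
# "Do Not Use" sequence mapped to the "Use" code point, from Table 10-1.
SINHALA_VOWEL_FIXES = {
    "\u0D85\u0DCF": "\u0D86",
    "\u0D85\u0DD0": "\u0D87",
    "\u0D85\u0DD1": "\u0D88",
    "\u0D8B\u0DDF": "\u0D8C",
    "\u0D8D\u0DD8": "\u0D8E",
    "\u0D8F\u0DDF": "\u0D90",
    "\u0D91\u0DCA": "\u0D92",
    "\u0D91\u0DD9": "\u0D93",
    "\u0D94\u0DDF": "\u0D96",
}


def find_analyzed_vowels(text: str) -> list:
    """Return (index, sequence) for each discouraged sequence found."""
    hits = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]
        if pair in SINHALA_VOWEL_FIXES:
            hits.append((i, pair))
    return hits


def use_atomic_sinhala_vowels(text: str) -> str:
    """Rewrite every discouraged sequence to its single code point."""
    for sequence, atom in SINHALA_VOWEL_FIXES.items():
        text = text.replace(sequence, atom)
    return text
```

A validator could call find_analyzed_vowels to warn about input, and an importer could call use_atomic_sinhala_vowels to repair it.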
Other Letters for Tamil. The Sinhala script may also be used to write Tamil. In this case, some additional combinations may be required. Some letters, such as U+0DBB sinhala letter rayanna and U+0DB1 sinhala letter dantaja nayanna, may be modified by adding the equivalent of a nukta. There is, however, no nukta presently encoded in the Sinhala block.
Historical Symbols. Neither U+0DF4 sinhala punctuation kunddaliya nor the Sinhala numerals are in general use today, having been replaced by Western-style punctuation and Western digits. The kunddaliya was formerly used as a full stop or period. It is included for scholarly use. The Sinhala numerals are not presently encoded.
10.2 Tibetan
Tibetan: U+0F00–U+0FFF
The Tibetan script is used for writing Tibetan in several countries and regions throughout the Himalayas. Aside from Tibet itself, the script is used in Ladakh, Nepal, and northern areas of India bordering Tibet where large Tibetan-speaking populations now reside. The Tibetan script is also used in Bhutan to write Dzongkha, the official language of that country. In addition, Tibetan is used as the language of philosophy and liturgy by Buddhist traditions spread from Tibet into the Mongolian cultural area that encompasses Mongolia, Buriatia, Kalmykia, and Tuva.
The Tibetan scripting and grammatical systems were originally defined together in the sixth century by royal decree when the Tibetan King Songtsen Gampo sent 16 men to India to study Indian languages. One of those men, Thumi Sambhota, is credited with creating the Tibetan writing system upon his return, having studied various Indic scripts and grammars. The king’s primary purpose was to bring Buddhism from India to Tibet. The new script system was therefore designed with compatibility extensions for Indic (principally Sanskrit) transliteration so that Buddhist texts could be represented properly. Because of this origin, over the last 1,500 years the Tibetan script has been widely used to represent Indic words, a number of which have been adopted into the Tibetan language retaining their original spelling.
A note on Latin transliteration: Tibetan spelling is traditional and does not generally reflect modern pronunciation. Throughout this section, Tibetan words are represented in italics when transcribed as spoken, followed at first occurrence by a parenthetical transliteration; in these transliterations, the presence of the tsek (tsheg) character is expressed with a hyphen.
Thumi Sambhota’s original grammar treatise defined two script styles.
The first, called uchen (dbu-can, “with head”), is a formal “inscriptional capitals” style said to be based on an old form of Devanagari. It is the script used in Tibetan xylograph books and the one used in the coding tables. The second style, called u-mey (dbu-med, or “headless”), is more cursive and said to be based on the Wartu script. Numerous styles of u-mey have evolved since then, including both formal calligraphic styles used in manuscripts and running handwriting styles. All Tibetan scripts follow the same lettering rules, though there is a slight difference in the way that certain compound stacks are formed in uchen and u-mey. General Principles of the Tibetan Script. Tibetan grammar divides letters into consonants and vowels. There are 30 consonants, and each consonant is represented by a discrete written character. There are five vowel sounds, only four of which are represented by written marks. The four vowels that are explicitly represented in writing are each represented with
a single mark that is applied above or below a consonant to indicate the application of that vowel to that consonant. The absence of one of the four marks implies that the first vowel sound (like a short “ah” in English) is present and is not modified to one of the four other possibilities. Three of the four marks are written above the consonants; one is written below. Each word in Tibetan has a base or root consonant. The base consonant can be written singly or it can have other consonants added above or below it to make a vertically “stacked” letter. Tibetan grammar contains a very complete set of rules regarding letter gender, and these rules dictate which letters can be written in adjacent positions. The rules therefore dictate which combinations of consonants can be joined to make stacks. Any combination not allowed by the gender rules does not occur in native Tibetan words. However, when transcribing other languages (for example, Sanskrit, Chinese) into Tibetan, these rules do not operate. In certain instances other than transliteration, any consonant may be combined with any other subjoined consonant. Implementations should therefore be prepared to accept and display any combinations. For example, the syllable spyir “general,” pronounced [t"í#], is a typical example of a Tibetan syllable that includes a stack comprising a head letter, two subscript letters, and a vowel sign. Figure 10-1 shows the characters in the order in which they appear in the backing store.
Figure 10-1. Tibetan Syllable Structure
U+0F66 TIBETAN LETTER SA
U+0FA4 TIBETAN SUBJOINED LETTER PA
U+0FB1 TIBETAN SUBJOINED LETTER YA
U+0F72 TIBETAN VOWEL SIGN I
U+0F62 TIBETAN LETTER RA
U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG
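The backing-store order in Figure 10-1 can be checked mechanically. The following is a minimal sketch, assuming Python and its standard `unicodedata` module; the names `SPYIR` and `describe` are illustrative and are not part of the standard.

```python
import unicodedata

# Code points of the syllable "spyir" plus the trailing tsek,
# in the backing-store order listed in Figure 10-1.
SPYIR = "\u0f66\u0fa4\u0fb1\u0f72\u0f62\u0f0b"

def describe(text):
    """Return (code point label, character name) pairs, one per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, name in describe(SPYIR):
    print(label, name)
```

Each printed line should correspond to one entry of the figure, which confirms that the head letter comes first, the two subjoined letters follow, and the vowel sign is stored after the whole stack.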
The model adopted to encode the Tibetan lettering set described above contains the following groups of items: Tibetan consonants, vowels, numerals, punctuation, ornamental signs and marks, and Tibetan-transliterated Sanskrit consonants and vowels. Each of these will be described in this section. Both in this description and in Tibetan, the terms “subjoined” (-btags) and “head” (-mgo) are used in different senses. In the structural sense, they indicate specific slots defined in native Tibetan orthography. In spatial terms, they refer to the position in the stack; anything in the topmost position is “head,” anything not in the topmost position is “subjoined.” Unless explicitly qualified, the terms “subjoined” and “head” are used here in their spatial sense. For example, in a conjunct like “rka,” the letter in the root slot is “KA.” Because it is not the topmost letter of the stack, however, it is expressed with a subjoined
character code, while “RA”, which is structurally in the head slot, is expressed with a nominal character code. In a conjunct “kra,” in which the root slot is also occupied with “KA”, the “KA” is encoded with a nominal character code because it is in the topmost position in the stack. The Tibetan script has its own system of formatting, and details of that system relevant to the characters encoded in this standard are explained herein. However, an increasing number of publications in Tibetan do not strictly adhere to this original formatting system. This change is due to the partial move from publishing on long, horizontal, loose-leaf folios, to publishing in vertically oriented, bound books. The Tibetan script also has a punctuation set designed to meet needs quite different from the punctuation that has evolved for Western scripts. With the appearance of Tibetan newspapers, magazines, school textbooks, and Western-style reference books in the last 20 or 30 years, Tibetans have begun using things like columns, indented blocks of text, Western-style headings, and footnotes. Some Western punctuation marks, including brackets, parentheses, and quotation marks, are becoming commonplace in these kinds of publication. With the introduction of more sophisticated electronic publishing systems, there is also a renaissance in the publication of voluminous religious and philosophical works in the traditional horizontal, loose-leaf format—many set in digital typefaces closely conforming to the proportions of traditional hand-lettered text. Consonants. The system described here has been devised to encode the Tibetan system of writing consonants in both single and stacked forms. All of the consonants are encoded a first time from U+0F40 through U+0F69. There are the basic Tibetan consonants and, in addition, six compound consonants used to represent the Indic consonants gha, jha, d.ha, dha, bha, and ksh.a. 
These codes are used to represent occurrences of either a stand-alone consonant or a consonant in the head position of a vertical stack. Glyphs generated from these codes will always sit in the normal position starting at and dropping down from the design baseline. All of the consonants are then encoded a second time. These second encodings from U+0F90 through U+0FB9 represent consonants in subjoined stack position. To represent a single consonant in a text stream, one of the first “nominal” set of codes is placed. To represent a stack of consonants in the text stream, a “nominal” consonant code is followed directly by one or more of the subjoined consonant codes. The stack so formed continues for as long as subjoined consonant codes are contiguously placed. This encoding method was chosen over an alternative method that would have involved a virama-based encoding, such as Devanagari. There were two main reasons for this choice. First, the virama is not normally used in the Tibetan writing system to create letter combinations. There is a virama in the Tibetan script, but only because of the need to represent Devanagari; called “srog-med”, it is encoded at U+0F84 tibetan mark halanta. The virama is never used in writing Tibetan words and can be—but almost never is—used as a substitute for stacking in writing Sanskrit mantras in the Tibetan script. Second, there is a prevalence of stacking in native Tibetan, and the model chosen specifically results in decreased data storage requirements. Furthermore, in languages other than Tibetan, there
are many cases where stacks occur that do not appear in Tibetan-language texts; it is thus imperative to have a model that allows for any consonant to be stacked with any subjoined consonant(s). Thus a model for stack building was chosen that follows the Tibetan approach to creating letter combinations, but is not limited to a specific set of the possible combinations.

Vowels. Each of the four basic Tibetan vowel marks is coded as a separate entity. These code points are U+0F72, U+0F74, U+0F7A, and U+0F7C. For compatibility, a set of several compound vowels for Sanskrit transcription is also provided in the other code points between U+0F71 and U+0F7D. Most Tibetan users do not view these compound vowels as single characters, and their use is limited to Sanskrit words. It is acceptable for users to enter these compounds as a series of simpler elements and have software render them appropriately. Canonical equivalences are specified for all of these code points except U+0F77 and U+0F79. All vowel signs are nonspacing marks above or below a stack of consonants, sometimes on both sides. A stand-alone consonant or a stack of consonants can have a vowel sign applied to it. In accordance with the rules of Tibetan writing, a code for a vowel sign applied to a consonant should always be placed after the bare consonant or the stack of consonants formed by the method just described. All of the symbols and punctuation marks have straightforward encodings. Further information about many of them appears later in this section.

Coding Order. In general, the correct coding order for a stream of text will be the same as the order in which Tibetans spell and in which the characters of the text would be written by hand. For example, the correct coding order for the most complex Tibetan stack would be

head position consonant
first subjoined consonant
... (intermediate subjoined consonants, if any)
last subjoined consonant
subjoined vowel a-chung (U+0F71)
standard or compound vowel sign, or virama

Where used, the character U+0F39 tibetan mark tsa -phru occurs immediately after the consonant it modifies.

Allographical Considerations. When consonants are combined to form a stack, one of them retains the status of being the principal consonant in the stack. The principal consonant always retains its stand-alone form. However, consonants placed in the “head” and “subjoined” positions to the main consonant sometimes retain their stand-alone forms and sometimes are given new, special forms. Because of this fact, certain consonants are given a further, special encoding treatment: namely, “wa” (U+0F5D), “ya” (U+0F61), and “ra” (U+0F62).
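The coding order above, together with the nominal and subjoined encodings of the consonants, can be sketched in a few lines of Python. This is an illustration only. The constant offset of 0x50 between a nominal consonant and its subjoined counterpart is an assumption read off the code charts (U+0F40 maps to U+0F90 and U+0F69 to U+0FB9); the standard does not state it as a rule. The function names `subjoined` and `build_stack` are hypothetical.

```python
import unicodedata

NOMINAL_FIRST, NOMINAL_LAST = 0x0F40, 0x0F69  # nominal consonant range given in the text
SUBJOINED_OFFSET = 0x50  # assumption from the code charts: U+0F40 KA -> U+0F90 SUBJOINED KA

def subjoined(consonant):
    """Map a nominal consonant to the code used for it below the head of a stack."""
    cp = ord(consonant)
    # The category check rejects the unassigned gap (U+0F48) inside the range.
    if not NOMINAL_FIRST <= cp <= NOMINAL_LAST or unicodedata.category(consonant) != "Lo":
        raise ValueError(f"U+{cp:04X} is not a nominal Tibetan consonant")
    return chr(cp + SUBJOINED_OFFSET)

def build_stack(head, below="", vowels=""):
    """Head consonant first, then subjoined consonants top to bottom, then vowel signs."""
    return head + "".join(subjoined(c) for c in below) + vowels

# "rka": RA is topmost, so KA takes its subjoined code.
rka = build_stack("\u0f62", "\u0f40")
# "kra": KA is topmost, so RA takes its subjoined code.
kra = build_stack("\u0f40", "\u0f62")
# The stack of "spyir": SA over PA over YA, with vowel sign I stored last.
spyi = build_stack("\u0f66", "\u0f54\u0f61", "\u0f72")
```

The sketch does not restrict which consonants may be combined, in keeping with the requirement that any consonant can be stacked with any subjoined consonant.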
Head Position “ra”. When the consonant “ra” is written in the “head” position at the top of a stack in the normal Tibetan-defined lettering set, the shape of the consonant can change. This is called ra-go (ra-mgo). It can either be a full-form shape or the full-form shape but with the bottom stroke removed (looking like a short-stemmed letter “T”). A “ra” in the head position whose glyph can change shape in this way is correctly coded by using the stand-alone “ra” consonant (U+0F62) followed by the appropriate subjoined consonant(s). For example, in the normal Tibetan ra-mgo combinations, the “ra” in the head position is mostly written as the half-ra but in the case of “ra + subjoined nya” must be written as the full-form “ra”. Thus the normal Tibetan ra-mgo combinations are correctly encoded with the normal “ra” consonant (U+0F62) because it can change shape as required. It is the responsibility of the font developer to provide the correct glyphs for representing the characters where the “ra” in the head position will change shape, for example, as in “ra + subjoined nya”. Full-Form “ra” in Head Position. Some instances of “ra” in the head position require that the consonant be represented as a full-form “ra” that never changes. This is not standard usage for the Tibetan language itself, but rather occurs in transliteration and transcription. Only in these cases should the character U+0F6A tibetan letter fixed-form ra be used instead of U+0F62 tibetan letter ra. This “ra” will always be represented as a full-form “ra” consonant and will never change shape to the form where the lower stroke has been cut off. For example, the letter combination “ra + ya”, when appearing in transliterated Sanskrit works, is correctly written with a full-form “ra” followed by either a modified subjoined “ya” form or a full-form subjoined “ya” form.
Note that the fixed-form “ra” should be used only in combinations where “ra” would normally transform into a short form but the user specifically wants to prevent that change. For example, the combination “ra + subjoined nya” never requires the use of fixed-form “ra”, because “ra” normally retains its full glyph form over “nya”. It is the responsibility of the font developer to provide the appropriate glyphs to represent the encodings. Subjoined Position “wa”, “ya”, and “ra”. All three of these consonants can be written in subjoined position to the main consonant according to normal Tibetan grammar. In this position, all of them change to a new shape. The “wa” consonant when written in subjoined position is not a full “wa” letter any longer but is literally the bottom-right corner of the “wa” letter cut off and appended below it. For that reason, it is called a wazur (wa-zur, or “corner of a wa”) or, less frequently but just as validly, wa-ta (wa-btags) to indicate that it is a subjoined “wa”. The consonants “ya” and “ra” when in the subjoined position are called ya-ta (ya-btags) and ra-ta (ra-btags), respectively. To encode these subjoined consonants that follow the rules of normal Tibetan grammar, the shape-changed, subjoined forms U+0FAD tibetan subjoined letter wa, U+0FB1 tibetan subjoined letter ya, and U+0FB2 tibetan subjoined letter ra should be used. All three of these subjoined consonants also have full-form non-shape-changing counterparts for the needs of transliterated and transcribed text. For this purpose, the full subjoined consonants that do not change shape (encoded at U+0FBA, U+0FBB, and U+0FBC, respectively) are used where necessary. The combinations of “ra + ya” are a good example
because they include instances of “ra” taking a short (ya-btags) form and “ra” taking a full-form subjoined “ya”. U+0FB0 tibetan subjoined letter -a (a-chung) should be used only in the very rare cases where a full-sized subjoined a-chung letter is required. The small vowel-lengthening a-chung encoded as U+0F71 tibetan vowel sign aa is far more frequently used in Tibetan text, and it is therefore recommended that implementations treat this character (rather than U+0FB0) as the normal subjoined a-chung. Halanta (Srog-Med). Because two sets of consonants are encoded for Tibetan, with the second set providing explicit ligature formation, there is no need for a “dead character” in Tibetan. When a halanta (srog-med) is used in Tibetan, its purpose is to suppress the inherent vowel “a”. If anything, the halanta should prevent any vowel or consonant from forming a ligature with the consonant preceding the halanta. In Tibetan text, this character should be displayed beneath the base character as a combining glyph and not used as a (purposeless) dead character. Line Breaking Considerations. Tibetan text is divided into units natively called tsek-bar (“tsheg-bar”), an inexact translation of which is “syllable.” Tsek-bar is literally the unit of text between tseks and is generally a consonant cluster with all of its prefixes, suffixes, and vowel signs. It is not a “syllable” in the English sense. Tibetan script has only two break characters. The primary break character is the standard interword tsek (tsheg), which is encoded at U+0F0B. The second break character is the space. Space or tsek characters in a stream of Tibetan text are not always break characters and so need proper contextual handling. The primary delimiter character in Tibetan text is the tsek (U+0F0B tibetan mark intersyllabic tsheg).
In general, automatic line breaking processes may break after any occurrence of this tsek, except where it follows a U+0F44 tibetan letter nga (with or without a vowel sign) and precedes a shay (U+0F0D), or where Tibetan grammatical rules do not permit a break. (Normally, tsek is not written before shay except after “nga”. This type of tsek-after-nga is called “nga-phye-tsheg” and may be expressed by U+0F0B or by the special character U+0F0C, a nonbreaking form of tsek.) The Unicode names for these two types of tsek are misnomers, retained for compatibility. The standard tsek U+0F0B tibetan mark intersyllabic tsheg is always required to be a potentially breaking character, whereas the “nga-phye-tsheg” is always required to be a nonbreaking tsek. U+0F0C tibetan mark delimiter tsheg bstar is specifically not a “delimiter” and is not for general use. There are no other break characters in Tibetan text. Unlike English, Tibetan has no system for hyphenating or otherwise breaking a word within the group of letters making up the word. Tibetan text formatting does not allow text to be broken within a word. Whitespace appears in Tibetan text, although it should be represented by U+00A0 nobreak space instead of U+0020 space. Tibetan text breaks lines after tsek instead of at whitespace.
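The tsek rule just described can be sketched as a small function that lists the positions where a line may be broken. This is a simplified illustration in Python, not a full line breaker. It assumes spaces are already encoded as U+00A0, it treats only U+0F71 through U+0F7D as vowel signs (the range named in this section), and it does not model the Tibetan grammatical rules that may forbid a break. The name `break_opportunities` is hypothetical.

```python
TSEK = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG (breaking)
SHAY = "\u0f0d"  # TIBETAN MARK SHAD
NGA = "\u0f44"   # TIBETAN LETTER NGA
# Vowel signs in the range named in this section (an assumption of this sketch).
VOWEL_SIGNS = {chr(cp) for cp in range(0x0F71, 0x0F7E)}

def break_opportunities(text):
    """Return the indices at which a new line may start.

    A break is allowed after U+0F0B unless that tsek follows nga
    (with or without a vowel sign) and precedes a shay. U+0F0C and
    U+00A0 are never examined, so they never yield a break.
    """
    breaks = []
    for i, ch in enumerate(text):
        if ch != TSEK or i + 1 >= len(text):
            continue
        j = i - 1
        while j >= 0 and text[j] in VOWEL_SIGNS:
            j -= 1  # skip vowel signs to find the consonant before the tsek
        follows_nga = j >= 0 and text[j] == NGA
        if follows_nga and text[i + 1] == SHAY:
            continue  # tsek-after-nga before a shay stays with the shay
        breaks.append(i + 1)
    return breaks

# Three tsek-bar separated by tseks: breaks are allowed after each tsek.
sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b\u0f66\u0f90\u0f51"
print(break_opportunities(sample))
```

Because only U+0F0B is inspected, text that uses the nonbreaking tsek U+0F0C after “nga” is handled without any special case.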
Complete Tibetan text formatting is best handled by a formatter in the application and not just by the code stream. If the interword and nonbreaking tseks are properly employed as breaking and nonbreaking characters, respectively, and if all spaces are nonbreaking spaces, then any application will still wrap lines correctly on that basis, even though the breaks might be sometimes inelegant. Tibetan Punctuation. The punctuation apparatus of Tibetan is relatively limited. The principal punctuation characters are the tsek; the shay (transliterated “shad”), which is a vertical stroke used to mark the end of a section of text; the space used sparingly as a space; and two of several variant forms of the shay that are used in specialized situations requiring a shay. There are also several other marks and signs but they are sparingly used. The shay at U+0F0D marks the end of a piece of text called “tshig-grub”. The mode of marking bears no commonality with English phrases or sentences and should not be described as a delimiter of phrases. In Tibetan grammatical terms, a shay is used to mark the end of an expression (“brjod-pa”) and a complete expression. Two shays are used at the end of whole topics (“don-tshan”). Because some writers use the double shay with a different spacing than would be obtained by coding two adjacent occurrences of U+0F0D, the double shay has been coded at U+0F0E with the intent that it would have a larger spacing between component shays than if two shays were simply written together. However, most writers do not use an unusual spacing between the double shay, so the application should allow the user to write two U+0F0D codes one after the other. Additionally, font designers will have to decide whether to implement these shays with a larger than normal gap. The U+0F11 rin-chen-pung-shay (rin-chen-spungs-shad) is a variant shay used in a specific “new-line” situation. 
Its use was not defined in the original grammars but Tibetan tradition gives it a highly defined use. The drul-shay (“sbrul-shad”) is likewise not defined by the original grammars but has a highly defined use; it is used for separating sections of meaning that are equivalent to topics (“don-tshan”) and subtopics. A drul-shay is usually surrounded on both sides by the equivalent of about three spaces (though no rule is specified). Hard spaces will be needed for these instances because the drul-shay should not appear at the beginning of a new line and the whole structure of spacing-plus-shay should not be broken up, if possible. Tibetan texts use a yig-go (“head mark,” yig-mgo) to indicate the beginning of the front of a folio, there being no other certain way, in the loose-leaf style of traditional Tibetan books, to tell which is the front of a page. The head mark can and does vary from text to text; there are many different ways to write it. The common type of head mark has been provided for with U+0F04 tibetan mark initial yig mgo mdun ma and its extension U+0F05 tibetan mark closing yig mgo sgab ma. An initial mark yig-mgo can be written alone or combined with as many as three closing marks following it. When the initial mark is written in combination with one or more closing marks, the individual parts of the whole must stay in proper registration with each other to appear authentic. Therefore, it is strongly recommended that font developers create precomposed ligature glyphs to represent the various combinations of these two characters. The less common head marks mainly appear in Nyingmapa and Bonpo literature. Three of these head marks have been provided for with U+0F01, U+0F02, and U+0F03; however, many others have not been encoded. Font developers will have to deal with the fact that many types of head marks in use in this literature have not been encoded, cannot be represented by a replacement that has been encoded, and will be required by some users. Two characters, U+0F3C tibetan mark ang khang gyon and U+0F3D tibetan mark ang khang gyas, are paired punctuation; they are typically used together to form a roof over one or more digits or words. In this case, kerning or special ligatures may be required for proper rendering. The right ang khang may also be used much as a single closing parenthesis is used in forming lists; again, special kerning may be required for proper rendering. The marks U+0F3E tibetan sign yar tshes and U+0F3F tibetan sign mar tshes are paired signs used to combine with digits; special glyphs or compositional metrics are required for their use. A set of frequently occurring astrological and religious signs specific to Tibetan is encoded between U+0FBE and U+0FCF. U+0F34, which means “et cetera” or “and so on,” is used after the first few tsek-bar of a recurring phrase. U+0FBE (often three times) indicates a refrain. U+0F36 and U+0FBF are used to indicate where text should be inserted within other text or as references to footnotes or marginal notes. Other Characters. The Wheel of Dharma, which occurs sometimes in Tibetan texts, is encoded in the Miscellaneous Symbols block at U+2638. Left-facing and right-facing swastika symbols are likewise used. They are found among the Chinese ideographs at U+534D (“yung-drung-chi-khor”) and U+5350 (“yung-drung-nang-khor”). The marks U+0F35 tibetan mark ngas bzung nyi zla and U+0F37 tibetan mark ngas bzung sgor rtags conceptually attach to a tsek-bar rather than to an individual character and function more like attributes than characters, for example as underlining to mark or emphasize text. In Tibetan interspersed commentaries, they may be used to tag the tsek-bar belonging to the root text that is being commented on.
The same thing is often accomplished by setting the tsek-bar belonging to the root text in large type and the commentary in small type. Correct placement of these glyphs may be problematic. If they are treated as normal combining marks, they can be entered into the text following the vowel signs in a stack; if used, their presence will need to be accounted for by searching algorithms, among other things. Tibetan Half-Numbers. The half-number forms (U+0F2A..U+0F33) are peculiar to Tibetan, though other scripts (for example, Bengali) have similar fractional concepts. The value of each half-number is 0.5 less than the number within which it appears. These forms are used only in some traditional contexts and appear as the last digit of a multidigit number. For example, the sequence of digits “U+0F24 U+0F2C” represents the number 42.5 or forty-two and one-half. Tibetan Transliteration and Transcription of Other Languages. Tibetan traditions are in place for transliterating other languages. Most commonly, Sanskrit has been the language
being transliterated, although Chinese has become more common in modern times. Additionally, Mongolian has a transliterated form. There are even some conventions for transliterating English. One feature of Tibetan script/grammar is that it allows for totally accurate transliteration of Sanskrit. The basic Tibetan letterforms and punctuation marks contain most of what is needed, although a few extra things are required. With these additions, Sanskrit can be transliterated perfectly into Tibetan, and the Tibetan transliteration can be rendered backward perfectly into Sanskrit with no ambiguities or difficulties. The six Sanskrit retroflex letters are interleaved among the other consonants. The compound Sanskrit consonants are not included in normal Tibetan. They could be made using the method described earlier for Tibetan stacked consonants, generally by subjoining “ha”. However, to maintain consistency in transliterated texts and for ease in transmission and searching, it is recommended that implementations of Sanskrit in the Tibetan script use the precomposed forms of aspirated letters (and U+0F69, “ka + reversed sha”) whenever possible, rather than implementing these consonants as completely decomposed stacks. Implementations must ensure that decomposed stacks and precomposed forms are interpreted equivalently (see Section 3.7, Decomposition). The compound consonants are explicitly coded as follows: U+0F93 tibetan subjoined letter gha, U+0F9D tibetan subjoined letter ddha, U+0FA2 tibetan subjoined letter dha, U+0FA7 tibetan subjoined letter bha, U+0FAC tibetan subjoined letter dzha, and U+0FB9 tibetan subjoined letter kssa. The vowel signs of Sanskrit not included in Tibetan are encoded with other vowel signs between U+0F70 and U+0F7D. U+0F7F tibetan sign rnam bcad (nam chay) is the visarga, and U+0F7E tibetan sign rjes su nga ro (ngaro) is the anusvara. See Section 9.1, Devanagari, for more information on these two characters. 
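The requirement that decomposed stacks and precomposed forms be interpreted equivalently can be illustrated with canonical normalization. The sketch below assumes Python's standard `unicodedata` module; the helper name `canonically_equal` is hypothetical. It compares strings after conversion to Normalization Form D, in which the precomposed aspirated letters are replaced by their two-character stacks.

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

GHA_PRECOMPOSED = "\u0f43"   # TIBETAN LETTER GHA
GHA_STACK = "\u0f42\u0fb7"   # GA followed by SUBJOINED HA
KSSA_PRECOMPOSED = "\u0f69"  # TIBETAN LETTER KSSA
KSSA_STACK = "\u0f40\u0fb5"  # KA followed by SUBJOINED SSA

print(canonically_equal(GHA_PRECOMPOSED, GHA_STACK))
print(canonically_equal(KSSA_PRECOMPOSED, KSSA_STACK))
```

A search or comparison routine that normalizes both sides in this way treats the two spellings as the same text, whichever form the author entered.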
The characters encoded in the range U+0F88..U+0F8B are used in transliterated text and are most commonly found in Kalachakra literature. When the Tibetan script is used to transliterate Sanskrit, consonants are sometimes stacked in ways that are not allowed in native Tibetan stacks. Even complex forms of this stacking behavior are catered for properly by the method described earlier for coding Tibetan stacks. Other Signs. U+0F09 tibetan mark bskur yig mgo is a list enumerator used at the beginning of administrative letters in Bhutan, as is the petition honorific U+0F0A tibetan mark bka- shog yig mgo. U+0F3A tibetan mark gug rtags gyon and U+0F3B tibetan mark gug rtags gyas are paired punctuation marks (brackets). The sign U+0F39 tibetan mark tsa -phru (tsa-’phru, which is a lenition mark) is the ornamental flaglike mark that is an integral part of the three consonants U+0F59 tibetan letter tsa, U+0F5A tibetan letter tsha, and U+0F5B tibetan letter dza. Although those consonants are not decomposable, this mark has been abstracted and may by itself be applied to “pha” and other consonants to make new letters for use in transliteration and transcription of other languages. For example, in modern literary Tibetan, it is one of the
ways used to transcribe the Chinese “fa” and “va” sounds not represented by the normal Tibetan consonants. Tsa-’phru is also used to represent tsa, tsha, and dza in abbreviations. Traditional Text Formatting and Line Justification. Native Tibetan texts (“pecha”) are written and printed using a justification system that is, strictly speaking, right-ragged but with an attempt to right-justify. Each page has a margin. That margin is usually demarcated with visible border lines required of a pecha. In modern times, when Tibetan text is produced in Western-style books, the margin lines may be dropped and an invisible margin used. When writing the text within the margins, an attempt is made to have the lines of text justified up to the right margin. To do so, writers keep an eye on the overall line length as they fill lines with text and try manually to justify to the right margin. Even then, a gap at the right margin often cannot be filled. If the gap is short, it will be left as is and the line will be said to be justified enough, even though by machine-justification standards the line is not truly flush on the right. If the gap is large, the intervening space will be filled with as many tseks as are required to justify the line. Again, the justification is not done perfectly in the way that English text might be perfectly right-justified; as long as the last tsek is more or less at the right margin, that will do. The net result is that of a right-justified, blocklike look to the text, but the actual lines are always a little right-ragged. Justifying tseks are nearly always used to pad the end of a line when the preceding character is a tsek—in other words, when the end of a line arrives in the middle of tshig-grub (see the previous definition under “Tibetan Punctuation”). However, it is unusual for a line that ends at the end of a tshig-grub to have justifying tseks added to the shay at the end of the tshig-grub. 
That is, a sequence like that shown in the first line of Figure 10-2 is not usually padded as in the second line of Figure 10-2, though it is allowable. In this case, instead of justifying the line with tseks, the space between shays is enlarged and/or the whitespace following the final shay is usually left as is. Padding is never applied following an actual space character. For example, given the existence of a space after a shay, a line such as the third line of Figure 10-2 may not be written with the padding as shown because the final shay should have a space after it, and padding is never applied after spaces. The same rule applies where the final consonant of a tshig-grub that ends a line is a “ka” or “ga”. In that case, the ending shay is dropped but a space is still required after the consonant and that space must not be padded. For example, the appearance shown in the fourth line of Figure 10-2 is not acceptable.
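The padding conventions above can be approximated in code. The following Python sketch is deliberately crude: it uses the number of code points as a stand-in for measured line width, which a real formatter would take from font metrics, and the `min_gap` threshold for a gap that is “short enough to leave” is a hypothetical parameter. It pads only lines that already end in a tsek and leaves lines ending in a shay, a space, or a letter untouched, which is the conservative reading of the rules.

```python
TSEK = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG

def pad_with_tseks(line, width, min_gap=2):
    """Append justifying tseks when the line ends in a tsek and the gap is large.

    `width` and `len(line)` count code points, an assumption of this
    sketch; combining marks make that only a rough proxy for width.
    """
    gap = width - len(line)
    if gap < min_gap or not line.endswith(TSEK):
        return line  # short gaps, shays, spaces, and bare letters are left as is
    return line + TSEK * gap

ends_in_tsek = "\u0f56\u0f7c\u0f51\u0f0b"
ends_in_shay = "\u0f56\u0f7c\u0f51\u0f0d"
ends_in_space = "\u0f56\u0f7c\u0f51\u0f0d\u00a0"

padded = pad_with_tseks(ends_in_tsek, 8)
```

Padding after a shay is described in the text as unusual but allowable, so an implementation could expose that as an option rather than forbidding it outright.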
Figure 10-2. Justifying Tibetan Tseks
[Four numbered example lines, 1 through 4; the glyph images are not reproduced here.]
Tibetan text has two rules regarding the formatting of text at the beginning of a new line. There are severe constraints on which characters can start a new line, and the first rule is
traditionally stated as follows: A shay of any description may never start a new line. Nothing except actual words of text can start a new line, with the only exception being a yig-go (yig-mgo) at the head of a front page or a da-tshe (zla-tshe, meaning “crescent moon”; for example, U+0F05) or one of its variations, which is effectively an “in-line” yig-go (yig-mgo), on any other line. One of two or three ornamental shays is also commonly used in short pieces of prose in place of the more formal da-tshe. This also means that a space may not start a new line in the flow of text. If there is a major break in a text, a new line might be indented. The second rule is that a syllable (tsheg-bar) that comes at the end of a tshig-grub and that starts a new line must have the shay that would normally follow it replaced by a rin-chen-spungs-shad (U+0F11). The reason for this second rule is that the presence of the rin-chen-spungs-shad makes the end of tshig-grub more visible and hence makes the text easier to read. In verse, the second shay following the first rin-chen-spungs-shad is sometimes replaced with a rin-chen-spungs-shad, though the practice is formally incorrect. It is a writer’s trick done to make a particular scribing of a text more elegant. Although a moderately popular device, it does break the rule. Not only is rin-chen-spungs-shad used as the replacement for the shay but a whole class of “ornamental shays” are used for the same purpose. All are scribal variants on a rin-chen-spungs-shad, which is correctly written with three dots above it. Tibetan Shorthand Abbreviations (bskungs-yig) and Limitations of the Encoding. A consonant functioning as the word base (ming-gzhi) is allowed to take only one vowel sign according to Tibetan grammar. The Tibetan shorthand writing technique called bskungs-yig does allow one or more words to be contracted into a single, very unusual combination of consonants and vowels.
This construction frequently entails the application of more than one vowel sign to a single consonant or stack, and the composition of the stacks themselves can break the rules of normal Tibetan grammar. For this reason, vowel signs sometimes interact typographically, which accounts for their particular combining classes (see Section 4.3, Combining Classes—Normative). The Unicode Standard accounts for plain text compounds of Tibetan that contain at most one base consonant, any number of subjoined consonants, followed by any number of vowel signs. This coverage constitutes the vast majority of Tibetan text. Rarely, stacks are seen that contain more than one such consonant-vowel combination in a vertical arrangement. These stacks are highly unusual and are considered beyond the scope of plain text rendering. They may be handled by higher-level mechanisms.
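The plain text compound described above (one base consonant, any number of subjoined consonants, then any number of vowel signs) maps naturally onto a regular expression. The Python sketch below is an illustration under stated assumptions: the character ranges are taken from the code points named in this section (nominal consonants through U+0F6A, subjoined consonants U+0F90 through U+0FBC, vowel signs U+0F71 through U+0F7D), they include the unassigned gaps at U+0F48 and U+0F98 for brevity, and other marks such as the virama are left out. The names `STACK` and `is_plain_text_stack` are hypothetical.

```python
import re
import unicodedata

STACK = re.compile(
    "[\u0f40-\u0f6a]"    # exactly one base (nominal) consonant
    "[\u0f90-\u0fbc]*"   # any number of subjoined consonants
    "[\u0f71-\u0f7d]*"   # any number of vowel signs
)

def is_plain_text_stack(text):
    """True if the whole string is a single base + subjoined + vowel compound."""
    return STACK.fullmatch(text) is not None

def vowel_combining_classes(text):
    """List (code point label, canonical combining class) for each vowel sign."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.combining(ch))
        for ch in text
        if "\u0f71" <= ch <= "\u0f7d"
    ]

# A shorthand-style compound with two vowel signs on one consonant.
two_vowels = "\u0f40\u0f74\u0f72"
print(is_plain_text_stack(two_vowels))
print(vowel_combining_classes(two_vowels))
```

Printing the combining classes shows the distinct class values that the standard assigns to these vowel signs, which is what lets normalization order them consistently when more than one is applied to a stack.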
10.3 Phags-pa
Phags-pa: U+A840–U+A87F
The Phags-pa script is an historic script with some limited modern use. It bears some similarity to Tibetan and has no case distinctions. It is written vertically in columns running
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 11
Southeast Asian Scripts
The following scripts are discussed in this chapter:
Thai
Lao
Myanmar
Khmer
Tai Le
New Tai Lue
Philippine scripts
Buginese
Balinese
The scripts of Southeast Asia are written from left to right; many use no interword spacing but use spaces or marks between phrases. They are mostly abugidas, but with various idiosyncrasies that distinguish them from the scripts of South Asia. The four Philippine scripts included here operate on similar principles; each uses nonspacing vowel signs. In addition, the Tagalog script has a virama. The term “Tai” refers to a family of languages spoken in Southeast Asia, including Thai, Lao, and Shan. This term is also part of the name of a number of scripts encoded in the Unicode Standard. The Tai Le script is used to write the language of the same name, which is spoken in south central Yunnan (China). The New Tai Lue script, also known as Xishuang Banna Dai, is unrelated to the Tai Le script, but is also used in south Yunnan. Buginese and Balinese are scripts of Indonesia, and both are ultimately related to scripts of South Asia. Buginese is used in Sulawesi; Balinese is used on the island of Bali.
11.1 Thai
Thai: U+0E00–U+0E7F
The Thai script is used to write Thai and other Southeast Asian languages, such as Kuy, Lanna Tai, and Pali. It is a member of the Indic family of scripts descended from Brahmi. Thai modifies the original Brahmi letter shapes and extends the number of letters to accommodate features of the Thai language, including tone marks derived from superscript digits. At the same time, the Thai script lacks the conjunct consonant mechanism and independent vowel letters found in most other Brahmi-derived scripts. As in all scripts of this family, the predominant writing direction is from left to right.
Standards. Thai layout in the Unicode Standard is based on the Thai Industrial Standard 620-2529, and its updated version 620-2533.
Encoding Principles. In common with most Brahmi-derived scripts, each Thai consonant letter represents a syllable possessing an inherent vowel sound. For Thai, that inherent vowel is /o/ in the medial position and /a/ in the final position. The consonants are divided into classes that historically represented distinct sounds, but in modern Thai indicate tonal differences. The inherent vowel and tone of a syllable are then modified by addition of vowel signs and tone marks attached to the base consonant letter.
Some of the vowel signs and all of the tone marks are rendered in the script as diacritics attached above or below the base consonant. These combining signs and marks are encoded after the modified consonant in the memory representation.
Most of the Thai vowel signs are rendered by full letter-sized inline glyphs placed either before (that is, to the left of), after (to the right of), or around (on both sides of) the glyph for the base consonant letter. In the Thai encoding, the letter-sized glyphs that are placed before (left of) the base consonant letter, in full or partial representation of a vowel sign, are, in fact, encoded as separate characters that are typed and stored before the base consonant character. This encoding for left-side Thai vowel sign glyphs (and similarly in Lao) differs from the conventions for all other Indic scripts, which uniformly encode all vowels after the base consonant. The difference is necessitated by the encoding practice commonly employed with Thai character data as represented by the Thai Industrial Standard.
The glyph positions for Thai syllables are summarized in Table 11-1.
Table 11-1. Glyph Positions in Thai Syllables
Syllable   Glyphs   Code Point Sequence
ka         กะ       0E01 0E30
ka:        กา       0E01 0E32
ki         กิ       0E01 0E34
ki:        กี       0E01 0E35
ku         กุ       0E01 0E38
ku:        กู       0E01 0E39
ku’        กึ       0E01 0E36
ku’:       กื       0E01 0E37
ke         เกะ      0E40 0E01 0E30
ke:        เก       0E40 0E01
kae        แกะ      0E41 0E01 0E30
kae:       แก       0E41 0E01
ko         โกะ      0E42 0E01 0E30
ko:        โก       0E42 0E01
ko’        เกาะ     0E40 0E01 0E32 0E30
ko’:       กอ       0E01 0E2D
koe        เกอะ     0E40 0E01 0E2D 0E30
koe:       เกอ      0E40 0E01 0E2D
kia        เกีย     0E40 0E01 0E35 0E22
ku’a       เกือ     0E40 0E01 0E37 0E2D
kua        กัว      0E01 0E31 0E27
kaw        เกา      0E40 0E01 0E32
koe:y      เกย      0E40 0E01 0E22
kay        ไก       0E44 0E01
kay        ใก       0E43 0E01
kam        กำ       0E01 0E33
kri        กฤ       0E01 0E24
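The stored order described above can be checked with a short sketch. It assumes Python's standard `unicodedata` module; the helper names (`from_code_points`, `describe`, `starts_with_preposed_vowel`) are hypothetical and serve only as illustration. It turns code point sequences from Table 11-1 into strings and shows that a left-side vowel sign is stored before its consonant.

```python
import unicodedata

# Thai left-side vowel signs (U+0E40..U+0E44) are stored before the
# consonant, as described in the encoding principles above.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def from_code_points(seq):
    """Build a string from a space-separated hex sequence such as '0E40 0E01 0E30'."""
    return "".join(chr(int(token, 16)) for token in seq.split())


def describe(text):
    """Return the Unicode character names of text, in stored (memory) order."""
    return [unicodedata.name(ch) for ch in text]


def starts_with_preposed_vowel(text):
    """True if the first stored character is a left-side vowel sign."""
    return bool(text) and text[0] in PREPOSED_VOWELS


ke = from_code_points("0E40 0E01 0E30")  # syllable "ke" from Table 11-1
ka = from_code_points("0E01 0E30")       # syllable "ka" from Table 11-1

for name in describe(ke):
    print(name)
```

For "ke" the vowel part SARA E comes first in memory, whereas "ka" starts with the consonant KO KAI.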
Rendering of Thai Combining Marks. The combining classes assigned to tone marks (107) and to other combining characters displayed above (0) do not fully account for their typographic interaction. For the purpose of rendering, the Thai combining marks above (U+0E31, U+0E34..U+0E37, U+0E47..U+0E4E) should be displayed outward from the base character they modify, in the order in which they appear in the text. In particular, a sequence containing <U+0E48 thai character mai ek, U+0E4D thai character nikhahit> should be displayed with the nikhahit above the mai ek, and a sequence containing <U+0E4D thai character nikhahit, U+0E48 thai character mai ek> should be displayed with the mai ek above the nikhahit. This does not preclude input processors from helping the user by pointing out or correcting typing mistakes, perhaps taking into account the language. For example, because the string <mai ek, nikhahit> is not useful for the Thai language and is likely a typing mistake, an input processor could reject it or correct it to <nikhahit, mai ek>.
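As a minimal sketch of the point about combining classes and input correction, the following assumes Python's `unicodedata` module. The function `fix_mai_ek_nikhahit` is a hypothetical input-processor helper, not part of any standard API. Because nikhahit has combining class 0, normalization does not reorder the two marks, so any correction has to be an explicit step.

```python
import unicodedata

KO_KAI = "\u0e01"    # THAI CHARACTER KO KAI (base consonant)
MAI_EK = "\u0e48"    # THAI CHARACTER MAI EK (tone mark)
NIKHAHIT = "\u0e4d"  # THAI CHARACTER NIKHAHIT


def combining_classes(text):
    """Return the canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]


def fix_mai_ek_nikhahit(text):
    """Hypothetical input-method correction: rewrite <mai ek, nikhahit>
    as <nikhahit, mai ek>."""
    return text.replace(MAI_EK + NIKHAHIT, NIKHAHIT + MAI_EK)


typed = KO_KAI + MAI_EK + NIKHAHIT
print(combining_classes(MAI_EK + NIKHAHIT))
print(unicodedata.normalize("NFC", typed) == typed)
print([f"U+{ord(c):04X}" for c in fix_mai_ek_nikhahit(typed)])
```

Whether an input processor should reject or rewrite the sequence is a policy choice; the sketch only shows the rewrite.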
Thai Punctuation. Thai uses a variety of punctuation marks particular to this script. U+0E4F thai character fongman is the Thai bullet, which is used to mark items in lists or appears at the beginning of a verse, sentence, paragraph, or other textual segment. U+0E46 thai character maiyamok is used to mark repetition of preceding letters. U+0E2F thai character paiyannoi is used to indicate elision or abbreviation of letters; it is itself viewed as a kind of letter, however, and is used with considerable frequency because of its appearance in such words as the Thai name for Bangkok. Paiyannoi is also used in combination (U+0E2F U+0E25 U+0E2F) to create a construct called paiyanyai, which means “et cetera, and so forth.” The Thai paiyanyai is comparable to its analogue in the Khmer script: U+17D8 khmer sign beyyal.
U+0E5A thai character angkhankhu is used to mark the end of a long segment of text. It can be combined with a following U+0E30 thai character sara a to mark a larger segment of text; typically this usage can be seen at the end of a verse in poetry. U+0E5B thai character khomut marks the end of a chapter or document, where it always follows the angkhankhu + sara a combination. The Thai angkhankhu and its combination with sara a to mark breaks in text have analogues in many other Brahmi-derived scripts. For example, they are closely related to U+17D4 khmer sign khan and U+17D5 khmer sign bariyoosan, which are themselves ultimately related to the danda and double danda of Devanagari.
Thai words are not separated by spaces. Instead, text is laid out with spaces introduced at text segments where Western typography would typically make use of commas or periods. However, Latin-based punctuation such as comma, period, and colon is also used in text, particularly in conjunction with Latin letters or in formatting numbers, addresses, and so forth. If word boundary indications are desired (for example, for the use of automatic line layout algorithms), the character U+200B zero width space should be used to place invisible marks for such breaks. The zero width space can grow to have a visible width when justified. See Figure 16-2.
Thai Transcription of Pali and Sanskrit. The Thai script is frequently used to write Pali and Sanskrit. When so used, consonant clusters are represented by the explicit use of U+0E3A thai character phinthu (virama) to mark the removal of the inherent vowel. There is no conjoining behavior, unlike in other Indic scripts. U+0E4D thai character nikhahit is the Pali nigghahita and Sanskrit anusvara. U+0E30 thai character sara a is the Sanskrit visarga. U+0E24 thai character ru and U+0E26 thai character lu are vocalic /r/ and /l/, with U+0E45 thai character lakkhangyao used to indicate their lengthening.
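The use of U+200B zero width space as an invisible word-boundary mark can be sketched as follows. The helper names and the two sample words are assumptions chosen for illustration; a real system would obtain word boundaries from a dictionary-based segmenter, which is outside the scope of this sketch.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def mark_word_boundaries(words):
    """Join already-segmented words with invisible break opportunities."""
    return ZWSP.join(words)


def split_on_boundaries(text):
    """Recover the words from text that carries ZWSP boundary marks."""
    return [word for word in text.split(ZWSP) if word]


def strip_boundaries(text):
    """Remove the invisible marks, for example before comparison or search."""
    return text.replace(ZWSP, "")


# Two sample Thai words, given as escapes (assumed example data).
words = ["\u0e20\u0e32\u0e29\u0e32", "\u0e44\u0e17\u0e22"]
marked = mark_word_boundaries(words)

print(len(marked), split_on_boundaries(marked) == words)
```

Stripping the marks before string comparison keeps text with and without boundary hints equal for matching purposes.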
11.2 Lao
Lao: U+0E80–U+0EFF
The Lao language and script are closely related to Thai. The Unicode Standard encodes the characters of the Lao script in the same relative order as the Thai characters.
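Because the Lao characters are encoded in the same relative order as the Thai characters, a Thai code point can be mapped to the Lao position at the same offset within its block. The sketch below assumes Python's `unicodedata` module; `lao_counterpart` is a hypothetical helper. Positions with no assigned Lao character yield `None`.

```python
import unicodedata

# Distance between the start of the Thai block and the start of the Lao block.
LAO_OFFSET = 0x0E80 - 0x0E00


def lao_counterpart(thai_char):
    """Return the Lao character at the same relative position, or None
    if that position is unassigned."""
    candidate = chr(ord(thai_char) + LAO_OFFSET)
    return candidate if unicodedata.name(candidate, None) else None


for thai in ("\u0e01", "\u0e30", "\u0e48"):
    lao = lao_counterpart(thai)
    print(unicodedata.name(thai), "->", unicodedata.name(lao))
```

The mapping is positional only; it says nothing about whether the paired characters are used the same way in the two writing systems.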
Encoding Principles. Lao contains fewer letters than Thai because by 1960 it was simplified to be fairly phonemic, whereas Thai maintains many etymological spellings that are homonyms. Unlike in Thai, Lao consonant letters are conceived of as simply representing the consonant sound, rather than a syllable with an inherent vowel. The vowel [a] is always represented explicitly with U+0EB0 lao vowel sign a.
Punctuation. Regular word spacing is not used in Lao; spaces separate phrases or sentences instead.
Glyph Placement. The glyph placements for Lao syllables are summarized in Table 11-2.
Table 11-2. Glyph Positions in Lao Syllables
Syllable   Glyphs   Code Point Sequence
ka         ກະ       0E81 0EB0
ka:        ກາ       0E81 0EB2
ki         ກິ       0E81 0EB4
ki:        ກີ       0E81 0EB5
ku         ກຸ       0E81 0EB8
ku:        ກູ       0E81 0EB9
ku’        ກຶ       0E81 0EB6
ku’:       ກື       0E81 0EB7
ke         ເກະ      0EC0 0E81 0EB0
ke:        ເກ       0EC0 0E81
kae        ແກະ      0EC1 0E81 0EB0
kae:       ແກ       0EC1 0E81
ko         ໂກະ      0EC2 0E81 0EB0
ko:        ໂກ       0EC2 0E81
ko’        ເກາະ     0EC0 0E81 0EB2 0EB0
ko’:       ກໍ       0E81 0ECD
koe        ເກິ      0EC0 0E81 0EB4
koe:       ເກີ      0EC0 0E81 0EB5
kia        ເກັຽ     0EC0 0E81 0EB1 0EBD
           ເກຢ      0EC0 0E81 0EA2
ku’a       ເກືອ     0EC0 0E81 0EB7 0EAD
kua        ກົວ      0E81 0EBB 0EA7
kaw        ເກົາ     0EC0 0E81 0EBB 0EB2
koe:y      ເກີຽ     0EC0 0E81 0EB5 0EBD
           ເກີຢ     0EC0 0E81 0EB5 0EA2
kay        ໄກ       0EC4 0E81
kay        ໃກ       0EC3 0E81
kam        ກຳ       0E81 0EB3
Additional Letters. A few additional letters in Lao have no match in Thai:
U+0EBB lao vowel sign mai kon
U+0EBC lao semivowel sign lo
U+0EBD lao semivowel sign nyo
The preceding two semivowel signs are the last remnants of the system of subscript medials, which in Myanmar retains additional distinctions. Myanmar and Khmer include a full set of subscript consonant forms used for conjuncts. Thai no longer uses any of these forms; Lao has just the two.
Rendering of Lao Combining Marks. The combining classes assigned to tone marks (122) and to other combining characters displayed above (0) do not fully account for their typographic interaction. For the purpose of rendering, the Lao combining marks above (U+0EB1, U+0EB4..U+0EB7, U+0EBB, U+0EC8..U+0ECD) should be displayed outward from the base character they modify, in the order in which they appear in the text. In particular, a sequence containing <U+0EC8 lao tone mai ek, U+0ECD lao niggahita> should be displayed with the niggahita above the mai ek, and a sequence containing <U+0ECD lao niggahita, U+0EC8 lao tone mai ek> should be displayed with the mai ek above the niggahita. This does not preclude input processors from helping the user by pointing out or correcting typing mistakes, perhaps taking into account the language. For example, because the string <mai ek, niggahita> is not useful for the Lao language and is likely a typing mistake, an input processor could reject it or correct it to <niggahita, mai ek>.
11.3 Myanmar
Myanmar: U+1000–U+109F
The Myanmar script is used to write Burmese, the majority language of Myanmar (formerly called Burma). Variations and extensions of the script are used to write other languages of the region, such as Shan and Mon, as well as Pali and Sanskrit. The Myanmar script was formerly known as the Burmese script, but the term “Myanmar” is now preferred.
The Myanmar writing system derives from a Brahmi-related script borrowed from South India in about the eighth century to write the Mon language. The first inscription in the Myanmar script dates from the eleventh century and uses an alphabet almost identical to that of the Mon inscriptions. Aside from rounding of the originally square characters, this script has remained largely unchanged to the present. It is said that the rounder forms were developed to permit writing on palm leaves without tearing the writing surface of the leaf.
Because of its Brahmi origins, the Myanmar script shares the structural features of its Indic relatives: consonant symbols include an inherent “a” vowel; various signs are attached to a consonant to indicate a different vowel; ligatures and conjuncts are used to indicate consonant clusters; and the overall writing direction is from left to right. Thus, despite great differences in appearance and detail, the Myanmar script follows the same basic principles as, for example, Devanagari.
Standards. There is not yet an official national standard for the encoding of Myanmar/Burmese. The current encoding was prepared with the consultation of experts from the Myanmar Information Technology Standardization Committee (MITSC) in Yangon (Rangoon). The MITSC, formed by the government in 1997, consists of experts from the Myanmar Computer Scientists’ Association, Myanmar Language Commission, and Myanmar Historical Commission.
Encoding Principles. As with Indic scripts, the Myanmar encoding represents only the basic underlying characters; multiple glyphs and rendering transformations are required to assemble the final visual form for each syllable. Even some single characters, such as U+102C myanmar vowel sign aa, may assume variant forms depending on the other characters with which they combine. Conversely, characters and combinations that may appear visually identical in some fonts, such as U+101D myanmar letter wa and U+1040 myanmar digit zero, are distinguished by their underlying encoding.
Composite Characters. As is the case in many other scripts, some Myanmar letters or signs may be analyzed as composites of two or more other characters and are not encoded separately. The following are examples of Myanmar letters represented by combining character sequences:
myanmar vowel sign o: U+1000 ka + U+1031 vowel sign e + U+102C vowel sign aa → ko
myanmar vowel sign au: U+1000 ka + U+1031 vowel sign e + U+102C vowel sign aa + U+1039 virama + U+200C zero width non-joiner → kau
myanmar vowel sign ui: U+1000 ka + U+102F vowel sign u + U+102D vowel sign i → kui
Encoding Subranges. The basic consonants, independent vowels, and dependent vowel signs required for writing the Myanmar language are encoded at the beginning of the Myanmar range. Extensions of each of these categories for use in writing other languages, such as Pali and Sanskrit, are appended at the end of the range. In between these two sets lie the script-specific signs, punctuation, and digits.
Conjunct and Medial Consonants. As in other Indic-derived scripts, conjunction of two consonant letters is indicated by the insertion of a virama, U+1039 myanmar sign virama, between them. It causes ligation or other rendered combination of the consonants, although the virama itself is not rendered visibly.
The conjunct form of U+1004 myanmar letter nga is rendered as a superscript sign called kinzi. Kinzi is encoded in logical order as a conjunct consonant before the syllable to which it applies; this is similar to the treatment of the Devanagari ra. (See Section 9.1, Devanagari, rule R2.) For example, kinzi applied to U+1000 myanmar letter ka would be written via the following sequence:
U+1004 nga + U+1039 virama + U+1000 ka → ṅka
The Myanmar script traditionally distinguishes a set of subscript “medial” consonants: forms of ya, ra, wa, and ha that are considered to be modifiers of the syllable’s vowel. Graphically, these medial consonants are sometimes written as subscripts, but sometimes, as in the case of ra, they surround the base consonant instead. In the Myanmar encoding, the medial consonants are treated as conjuncts; that is, they are coded using the virama. For example, the word krwe [kjwei] (“to drop off”) would be written via the following sequence:
U+1000 ka + U+1039 virama + U+101B ra + U+1039 virama + U+101D wa + U+1031 vowel sign e → krwe
Explicit Virama. The virama U+1039 myanmar sign virama also participates in some common constructions where it appears as a visible sign, commonly termed killer. In this usage where it appears as a visible diacritic, U+1039 is followed by a U+200C zero width non-joiner, as with Devanagari (see Figure 9-3).
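The virama-based sequences above can be assembled programmatically. The following is a minimal sketch of the model as described in this section only; the helper names (`conjunct`, `kinzi`, `visible_virama`) are hypothetical, and other versions of the standard should be checked separately before reusing the pattern.

```python
import unicodedata

KA, NGA, RA, WA = "\u1000", "\u1004", "\u101b", "\u101d"
VIRAMA = "\u1039"   # MYANMAR SIGN VIRAMA
VOWEL_E = "\u1031"  # MYANMAR VOWEL SIGN E
ZWNJ = "\u200c"     # ZERO WIDTH NON-JOINER


def conjunct(*consonants):
    """Join consonants with an (invisible) virama between each pair."""
    return VIRAMA.join(consonants)


def kinzi(base):
    """Kinzi is nga + virama, stored before the consonant it sits above."""
    return NGA + VIRAMA + base


def visible_virama():
    """Virama followed by ZWNJ, for the visible 'killer' form."""
    return VIRAMA + ZWNJ


def hex_sequence(text):
    """Render text as a list of four-digit hex code points."""
    return [f"{ord(ch):04X}" for ch in text]


krwe = conjunct(KA, RA, WA) + VOWEL_E
print(hex_sequence(krwe))
print(hex_sequence(kinzi(KA)))
```

Note that vowel sign e is appended last in the stored sequence even though its glyph is drawn to the left of the consonant.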
Ordering of Syllable Components. Dependent vowels and other signs are encoded after the consonant to which they apply, except for kinzi, which precedes the consonant. Characters occur in the relative order shown in Table 11-3.
Table 11-3. Myanmar Syllabic Structure
Name                    Encoding
kinzi                   <U+1004, U+1039>
consonant               [U+1000..U+1021]
subscript consonant     <U+1039, [U+1000..U+1019, U+101C, U+101E, U+1020, U+1021]>
medial ya               <U+1039, U+101A>
medial ra               <U+1039, U+101B>
medial wa               <U+1039, U+101D>
medial ha               <U+1039, U+101F>
vowel sign e            U+1031
vowel sign u, uu        [U+102F, U+1030]
vowel sign i, ii, ai    [U+102D, U+102E, U+1032]
vowel sign aa           U+102C
anusvara                U+1036
atha (killer)           <U+1039, U+200C>
dot below               U+1037
visarga                 U+1038
U+1031 myanmar vowel sign e is encoded after its consonant (as in the earlier example), although in visual presentation its glyph appears before (to the left of) the consonant form.
Spacing. Myanmar does not use any whitespace between words. If word boundary indications are desired (for example, for the use of automatic line layout algorithms), the character U+200B zero width space should be used to place invisible marks for such breaks. The zero width space can grow to have a visible width when justified.
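The relative ordering in Table 11-3 can be expressed as a regular expression. This is only a sketch. It assumes that kinzi, subscript consonants, medials, and atha are the virama-based sequences described in the surrounding text, and that the subscript set is the consonant range minus the four medial letters. The names `PARTS`, `SYLLABLE`, and `is_ordered_syllable` are hypothetical.

```python
import re

CONSONANT = "[\u1000-\u1021]"
# Virama + consonant, excluding ya, ra, wa, ha (those are the medials).
SUBSCRIPT = "\u1039[\u1000-\u1019\u101c\u101e\u1020\u1021]"

PARTS = [
    "(?:\u1004\u1039)?",      # kinzi: nga + virama
    CONSONANT,                # base consonant
    f"(?:{SUBSCRIPT})?",      # subscript consonant
    "(?:\u1039\u101a)?",      # medial ya
    "(?:\u1039\u101b)?",      # medial ra
    "(?:\u1039\u101d)?",      # medial wa
    "(?:\u1039\u101f)?",      # medial ha
    "\u1031?",                # vowel sign e
    "[\u102f\u1030]?",        # vowel sign u, uu
    "[\u102d\u102e\u1032]?",  # vowel sign i, ii, ai
    "\u102c?",                # vowel sign aa
    "\u1036?",                # anusvara
    "(?:\u1039\u200c)?",      # atha (killer): virama + ZWNJ
    "\u1037?",                # dot below
    "\u1038?",                # visarga
]
SYLLABLE = re.compile("".join(PARTS))


def is_ordered_syllable(text):
    """True if text is a single syllable whose parts follow the table order."""
    return SYLLABLE.fullmatch(text) is not None


print(is_ordered_syllable("\u1000\u1039\u101b\u1039\u101d\u1031"))  # krwe
print(is_ordered_syllable("\u1000\u1031\u1039\u101b"))              # vowel before medial
```

A pattern like this checks ordering only; it does not decide whether a well-ordered sequence is a real word.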
11.4 Khmer
Khmer: U+1780–U+17FF
Khmer, also known as Cambodian, is the official language of the Kingdom of Cambodia. Mutually intelligible dialects are also spoken in northeastern Thailand and in the Mekong Delta region of Vietnam. Although Khmer is not an Indo-European language, it has borrowed much vocabulary from Sanskrit and Pali, and religious texts in those languages have been both transliterated and translated into Khmer. The Khmer script is also used to render a number of regional minority languages, such as Tampuan, Krung, and Cham.
The Khmer script, called aksaa khmae (“Khmer letters”), is also the official script of Cambodia. It is descended from the Brahmi script of South India, as are Thai, Lao, Myanmar, Old Mon, and others. The exact sources have not been determined, but there is a great similarity between the earliest inscriptions in the region and the Pallawa script of the Coromandel coast of India. Khmer has been a unique and independent script for more than 1,400 years. Modern Khmer has two basic styles of script: the aksaa crieng (“slanted script”) and the aksaa muul (“round script”). There is no fundamental structural difference between the two. The slanted script (in its “standing” variant) is chosen as representative in Chapter 17, Code Charts.
Principles of the Khmer Script
Structurally, the Khmer script has many features in common with other Brahmi-derived scripts, such as Devanagari and Myanmar. Consonant characters bear an inherent vowel sound, with additional signs placed before, above, below, and/or after the consonants to indicate a vowel other than the inherent one. The overall writing direction is left to right. In comparison with the Devanagari script, explained in detail in Section 9.1, Devanagari, the Khmer script has developed several distinctive features during its evolution.
Glottal Consonant. The Khmer script has a consonant character for a glottal stop (qa) that bears an inherent vowel sound and can have an optional vowel sign. While Khmer also has independent vowel characters like Devanagari, as shown in Table 11-4, in principle many of its sounds can be represented by using qa and a vowel sign. This does not mean these representations are always interchangeable in real words. Some words are written with one variant to the exclusion of others.
Subscript Consonants. Subscript consonant signs differ from independent consonant characters and are called coeng (literally, “foot, leg”) after their subscript position. While a consonant character can constitute an orthographic syllable by itself, a subscript consonant sign cannot. Note that U+17A1 khmer letter la does not have a corresponding subscript consonant sign in standard Khmer, but does have a subscript in the Khmer script used in Thailand.
Table 11-4. Independent Khmer Vowel Characters
Name   Independent Vowel
i      ឥ
ii     ឦ
u      ឧ
uk     ឨ
uu     ឩ
uuv    ឪ
ry     ឫ
ryy    ឬ
ly     ឭ
lyy    ឮ
e      ឯ
ai     ឰ
oo     ឱ, ឲ
au     ឳ
Qa with Vowel Sign: [glyph column not recoverable from the source]
Subscript consonant signs are used to represent any consonant following the first consonant in an orthographic syllable. They also have an inherent vowel sound, which may be suppressed if the syllable bears a vowel sign or another subscript consonant. The subscript consonant signs are often used to represent a consonant cluster. Two consecutive consonant characters cannot represent a consonant cluster because the inherent vowel sound in between is retained. To suppress the vowel, a subscript consonant sign (or rarely a subscript independent vowel) replaces the second consonant character. Theoretically, any consonant cluster composed of any number of consonant sounds without inherent vowel sounds in between can be represented systematically by a consonant character and as many subscript consonant signs as necessary. Examples of subscript consonant signs for a consonant cluster follow:
lo + coeng + ngo [lŋɔː] “sesame” (compare lo + ngo [lɔːŋ] “to haunt”)
lo + ka + coeng + sa + coeng + mo + ii [lĕəksmei] “beauty, luck”
ka + aa + ha + coeng + vo + e [kaːfeː] “coffee”
The subscript consonant signs in the Khmer script can be used to denote a final consonant, although this practice is uncommon.
Examples of subscript consonant signs for a closing consonant follow:
to + aa + nikahit + coeng + ngo [tĕəŋ] “both” (= to + aa + nikahit + ngo) (≠ *to + coeng + ngo + aa + nikahit [tŋɔəm])
ha + oe + coeng + yo [haəi] “already” (= ha + oe + yo) (≠ *ha + coeng + yo + oe [hyaə])
While these subscript consonant signs are usually attached to a consonant character, they can also be attached to an independent vowel character. Although this practice is relatively rare, it is used in one very common word, meaning “to give.” Examples of subscript consonant signs attached to an independent vowel character follow:
qoo-1 + coeng + yo [ʔaoi] “to give” (= qoo-1 + yo, and also qoo-2 + coeng + yo)
qoo-1 + coeng + mo [ʔaom] “exclamation of solemn affirmation” (= qoo-1 + mo)
Subscript Independent Vowel Signs. Some independent vowel characters also have corresponding subscript independent vowel signs, although these are rarely used today. Examples of subscript independent vowel signs follow:
pha + coeng + qe + mo [pʰʔaem] “sweet” (= pha + coeng + qa + ae + mo)
ha + coeng + ry + to + samyok sannya + yo [harɨtey] “heart” (royal) (= ha + ry + to + samyok sannya + yo)
Consonant Registers. The Khmer language has a richer set of vowels than the languages for which the ancestral script was used, although it has a smaller set of consonant sounds. The Khmer script takes advantage of this situation by assigning different characters to represent the same consonant using different inherent vowels. Khmer consonant characters and signs are organized into two series or registers, whose inherent vowels are nominally -a in the first register and -o in the second register, as shown in Table 11-5. The register of a consonant character is generally reflected in the last letter of its transliterated name. Some consonant characters and signs have a counterpart whose consonant sound is the same but whose register is different, as ka and ko in the first row of the table. For the other consonant characters and signs, two “shifter” signs are available. U+17C9 khmer sign muusikatoan converts a consonant character and sign from the second to the first register, while U+17CA khmer sign triisap converts a consonant from the first register to the second (rows 2–4). To represent pa, however, muusikatoan is attached not to po but to ba, in an exceptional use (row 5). The phonetic value of a dependent vowel sign may also change depending on the context of the consonant(s) to which it is attached (row 6).
Encoding Principles. Like other related scripts, the Khmer encoding represents only the basic underlying characters; multiple glyphs and rendering transformations are required to assemble the final visual form for each orthographic syllable. Individual characters, such as U+1789 khmer letter nyo, may assume variant forms depending on the other characters with which they combine.
Table 11-5. Two Registers of Khmer Consonants
Row   First Register                                Second Register
1     ka [kɑː] “neck”                               ko [kɔː] “mute”
4     ba + ka [bɑːk] “to return”
5     ba + muusikatoan + mo [pɑːm] “blockhouse”     po + mo [pɔːm] “to put into the mouth”
6     ka + u + ro [koː] “to stir”                   ko + u + ro [kuː] “to sketch”
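The register rules illustrated in Table 11-5 can be sketched in a few lines. The sketch leans on the statement that a consonant's register is generally reflected in the last letter of its name, so it is a heuristic and may misclassify individual letters. It also treats a shifter as simply forcing the target register. The function names are hypothetical, and Python's `unicodedata` module is assumed.

```python
import unicodedata

MUUSIKATOAN = "\u17c9"  # KHMER SIGN MUUSIKATOAN: second register -> first
TRIISAP = "\u17ca"      # KHMER SIGN TRIISAP: first register -> second


def base_register(consonant):
    """Guess the register from the final letter of the character name
    (A -> first register, O -> second register)."""
    name = unicodedata.name(consonant)
    if not name.startswith("KHMER LETTER "):
        raise ValueError(f"not a Khmer consonant letter: {name}")
    if name.endswith("A"):
        return 1
    if name.endswith("O"):
        return 2
    raise ValueError(f"cannot infer register from name: {name}")


def effective_register(consonant, shifter=None):
    """Register after applying an optional shifter sign."""
    if shifter == MUUSIKATOAN:
        return 1
    if shifter == TRIISAP:
        return 2
    return base_register(consonant)


KA, KO, BA, PO = "\u1780", "\u1782", "\u1794", "\u1796"
print(base_register(KA), base_register(KO))
print(effective_register(BA, MUUSIKATOAN))  # the exceptional "pa" spelling
```

The exceptional use in row 5 changes the consonant sound as well as the register, which a register-only function like this does not model.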
Subscript Consonant Signs. In the way that many Cambodians analyze Khmer today, subscript consonant signs are considered to be different entities from consonant characters. The Unicode Standard does not assign independent code points for the subscript consonant signs. Instead, each of these signs is represented by the sequence of two characters: a special control character (U+17D2 khmer sign coeng) and a corresponding consonant character. This is analogous to the virama model employed for representing conjuncts in other related scripts. Subscripted independent vowels are encoded in the same manner.
Because the coeng sign character does not exist as a letter or sign in the Khmer script, the Unicode model departs from the ordinary way that Khmer is conceived of and taught to native Khmer speakers. Consequently, the encoding may not be intuitive to a native user of the Khmer writing system, although it is able to represent Khmer correctly.
U+17D2 khmer sign coeng is not actually a coeng but a coeng generator, because coeng in Khmer refers to the subscript consonant sign. The glyph for U+17D2 khmer sign coeng shown in the code charts is arbitrary and is not actually rendered directly; the dotted box around the glyph indicates that special rendering is required. To aid Khmer script users, a listing of typical Khmer subscript consonant letters has been provided in Table 11-6 together with their descriptive names following preferred Khmer practice. While the Unicode encoding represents both the subscripts and the combined vowel letters with a pair of code points, they should be treated as a unit for most processing purposes. In other words, the sequence functions as if it had been encoded as a single character. A number of independent vowels also have subscript forms, as shown in Table 11-8.
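Treating each coeng pair as a unit can be sketched as a simple scan over the text. The function name `coeng_units` is hypothetical, and the sample strings are built from the component sequences given in the examples earlier in this section.

```python
COENG = "\u17d2"  # KHMER SIGN COENG


def coeng_units(text):
    """Split text into processing units, keeping each
    <coeng, following character> pair together as one unit."""
    units = []
    i = 0
    while i < len(text):
        if text[i] == COENG and i + 1 < len(text):
            units.append(text[i:i + 2])  # coeng + the character it subscripts
            i += 2
        else:
            units.append(text[i])        # any other character stands alone
            i += 1
    return units


sesame = "\u179b\u17d2\u1784"              # lo + coeng + ngo
coffee = "\u1780\u17b6\u17a0\u17d2\u179c\u17c1"  # ka + aa + ha + coeng + vo + e

print(len(sesame), len(coeng_units(sesame)))
print(len(coffee), len(coeng_units(coffee)))
```

Operations such as cursor movement or deletion could then work on these units rather than on individual code points, which is closer to how the script is taught.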
Table 11-6. Khmer Subscript Consonant Signs

Code         Name
17D2 1780    khmer consonant sign coeng ka
17D2 1781    khmer consonant sign coeng kha
17D2 1782    khmer consonant sign coeng ko
17D2 1783    khmer consonant sign coeng kho
17D2 1784    khmer consonant sign coeng ngo
17D2 1785    khmer consonant sign coeng ca
17D2 1786    khmer consonant sign coeng cha
17D2 1787    khmer consonant sign coeng co
17D2 1788    khmer consonant sign coeng cho
17D2 1789    khmer consonant sign coeng nyo
17D2 178A    khmer consonant sign coeng da
17D2 178B    khmer consonant sign coeng ttha
17D2 178C    khmer consonant sign coeng do
17D2 178D    khmer consonant sign coeng ttho
17D2 178E    khmer consonant sign coeng na
17D2 178F    khmer consonant sign coeng ta
17D2 1790    khmer consonant sign coeng tha
17D2 1791    khmer consonant sign coeng to
17D2 1792    khmer consonant sign coeng tho
17D2 1793    khmer consonant sign coeng no
17D2 1794    khmer consonant sign coeng ba
17D2 1795    khmer consonant sign coeng pha
17D2 1796    khmer consonant sign coeng po
17D2 1797    khmer consonant sign coeng pho
17D2 1798    khmer consonant sign coeng mo
17D2 1799    khmer consonant sign coeng yo
17D2 179A    khmer consonant sign coeng ro
17D2 179B    khmer consonant sign coeng lo
17D2 179C    khmer consonant sign coeng vo
17D2 179D    khmer consonant sign coeng sha
17D2 179E    khmer consonant sign coeng ssa
17D2 179F    khmer consonant sign coeng sa
17D2 17A0    khmer consonant sign coeng ha
17D2 17A1    khmer consonant sign coeng la
17D2 17A2    khmer vowel sign coeng qa
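The advice to treat each coeng sequence as a unit can be sketched in Python. This is only a minimal illustration, not a complete Khmer cluster segmenter: the helper names subscript_sign and split_units are hypothetical, and the only rule applied is the one stated above, that U+17D2 together with the character that follows it behaves as a single unit.

```python
COENG = "\u17D2"  # KHMER SIGN COENG (the "coeng generator")


def subscript_sign(letter):
    """Return the two-character sequence for the subscript form of a letter."""
    return COENG + letter


def split_units(text):
    """Split text into units, keeping each coeng + letter pair together.

    Assumption: every other character is its own unit; real text
    segmentation involves more rules than this sketch applies.
    """
    units = []
    i = 0
    while i < len(text):
        if text[i] == COENG and i + 1 < len(text):
            units.append(text[i:i + 2])
            i += 2
        else:
            units.append(text[i])
            i += 1
    return units


# kha, coeng + mo, vowel sign ae, ro
word = "\u1781" + subscript_sign("\u1798") + "\u17C2\u179A"
units = split_units(word)
print([" ".join(f"{ord(c):04X}" for c in unit) for unit in units])
```

The word has five code points but four units, because the pair 17D2 1798 (coeng mo in Table 11-6) is kept together.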
As noted earlier, the sequence U+17D2 U+17A1 represents a subscript form of la that is not used in Cambodia, although it is employed in Thailand.
Dependent Vowel Signs. Most of the Khmer dependent vowel signs are represented with a single character that is applied after the base consonant character and optional subscript consonant signs. Three of these Khmer vowel signs are not encoded as single characters in the Unicode Standard. The vowel sign am is encoded as a nasalization sign, U+17C6 khmer sign nikahit. Two vowel signs, om and aam, have not been assigned independent code points. They are represented by the sequence of a vowel (U+17BB khmer vowel sign u and U+17B6 khmer vowel sign aa, respectively) and U+17C6 khmer sign nikahit. The nikahit is superficially similar to anusvara, the nasalization sign in the Devanagari script, although in Khmer it is usually regarded as a vowel sign am. Anusvara not only represents a special nasal sound, but also can be used in place of one of the five nasal consonants homorganic to the subsequent consonant (velar, palatal, retroflex, dental, or labial, respectively). Anusvara can be used concurrently with any vowel sign in the same orthographic syllable. Nikahit, in contrast, functions differently. Its final sound is [m], irrespective of the type of the subsequent consonant. It is not used concurrently with the vowels ii, e, ua, oe, oo, and so on, although it is used with the vowel signs aa and u. In these cases the combination is sometimes regarded as a unit: aam and om, respectively. The sound that aam represents is [m'm], not [aqm]. The sequences used for these combinations are shown in Table 11-7.
Table 11-7. Khmer Composite Dependent Vowel Signs with Nikahit

Code         Name
17BB 17C6    khmer vowel sign om
17B6 17C6    khmer vowel sign aam
Examples of dependent vowel signs ending with [m] follow:
U+178A U+17C6: da + nikahit [dtm] “to pound” (compare U+178A U+1798: da + mo [dtqm] “nectar”)
U+1796 U+17B6 U+17C6: po + aa + nikahit [pm'm] “to carry in the beak” (compare U+1796 U+17B6 U+1798: po + aa + mo [pè'm] “mouth of a river”)
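The two-character sequences of Table 11-7 can be recognized with a small lookup. The following Python sketch is illustrative only: the table name COMPOSITE_VOWELS and the function label_vowel_signs are hypothetical, and the sketch labels nothing beyond the three nikahit-based vowel signs described above.

```python
NIKAHIT = "\u17C6"  # KHMER SIGN NIKAHIT

# Sequences from Table 11-7: a vowel sign followed by nikahit.
COMPOSITE_VOWELS = {
    "\u17BB" + NIKAHIT: "om",   # vowel sign u + nikahit
    "\u17B6" + NIKAHIT: "aam",  # vowel sign aa + nikahit
}


def label_vowel_signs(text):
    """Return (sequence, label) pairs for nikahit-based vowel signs in text."""
    found = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in COMPOSITE_VOWELS:
            # Treat the vowel sign and the nikahit as one composite sign.
            found.append((pair, COMPOSITE_VOWELS[pair]))
            i += 2
        elif text[i] == NIKAHIT:
            # A bare nikahit is the vowel sign am.
            found.append((NIKAHIT, "am"))
            i += 1
        else:
            i += 1
    return found


print(label_vowel_signs("\u178A\u17C6"))        # da + nikahit
print(label_vowel_signs("\u1796\u17B6\u17C6"))  # po + aa + nikahit
```

Checking the two-character window before the single nikahit is what keeps aa + nikahit from being reported as a plain am.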
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 12
East Asian Scripts
This chapter presents the following scripts:
• Han
• Hiragana
• Hangul
• Bopomofo
• Katakana
• Yi
The characters that are now called East Asian ideographs, and known as Han ideographs in the Unicode Standard, were developed in China in the second millennium bce. The basic system of writing Chinese using ideographs has not changed since that time, although the set of ideographs used, their specific shapes, and the technologies involved have developed over the centuries. The encoding of Chinese ideographs in the Unicode Standard is described in Section 12.1, Han. As civilizations developed surrounding China, they frequently adapted China’s ideographs for writing their own languages. Japan, Korea, and Vietnam all borrowed and modified Chinese ideographs for their own languages. Chinese is an isolating language, monosyllabic and noninflecting, and ideographic writing suits it well. As Han ideographs were adopted for unrelated languages, however, extensive modifications were required. Chinese ideographs were originally used to write Japanese, for which they are, in fact, ill suited. As an adaptation, the Japanese developed two syllabaries, hiragana and katakana, whose shapes are simplified or stylized versions of certain ideographs. (See Section 12.4, Hiragana and Katakana.) Chinese ideographs are called kanji in Japanese and are still used, in combination with hiragana and katakana, in modern Japanese. In Korea, Chinese ideographs were originally used to write Korean, for which they are also ill suited. The Koreans developed an alphabetic system, Hangul, discussed in Section 12.6, Hangul. The shapes of Hangul syllables or the letter-like jamos from which they are composed are not directly influenced by Chinese ideographs. However, the individual jamos are grouped into syllabic blocks that resemble ideographs both visually and in the relationship they have to the spoken language (one syllable per block). Chinese ideographs are called hanja in Korean and are still used together with Hangul in South Korea for modern Korean. 
The Unicode Standard includes a complete set of Korean Hangul syllables as well as the individual jamos, which can also be used to write Korean. Section 3.12, Conjoining Jamo Behavior, describes how to use the conjoining jamos and how to convert between the two methods for representing Korean.
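The two representations of Korean can be compared with a short Python sketch. It relies on the standard library's unicodedata module, whose NFD and NFC forms decompose and recompose Hangul syllables, and on the well-known arithmetic for composing a precomposed syllable from conjoining jamos. The function name compose_syllable is hypothetical, and the sketch does no validation of its inputs.

```python
import unicodedata

# Constants of the Hangul syllable composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose_syllable(lead, vowel, trail=None):
    """Compose one precomposed Hangul syllable from conjoining jamos.

    Assumption: the arguments are valid leading, vowel, and (optional)
    trailing conjoining jamos; nothing is checked here.
    """
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


syllable = compose_syllable("\u1112", "\u1161", "\u11AB")  # hieuh, a, nieun
jamos = unicodedata.normalize("NFD", syllable)

print(f"U+{ord(syllable):04X}")                      # the precomposed syllable
print([f"U+{ord(j):04X}" for j in jamos])            # its conjoining jamos
print(unicodedata.normalize("NFC", jamos) == syllable)
```

Normalization gives the same round trip without any arithmetic, which is usually the safer choice in application code; the explicit formula is shown only to make the relationship between the two methods visible.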
In Vietnam, a set of native ideographs was created for Vietnamese based on the same principles used to create new ideographs for Chinese. These Vietnamese ideographs were used through the beginning of the twentieth century and are occasionally used in more recent signage and other limited contexts. Yi was originally written using a set of ideographs invented in imitation of the Chinese. Modern Yi as encoded in the Unicode Standard is a syllabary derived from these ideographs and is discussed in Section 12.7, Yi. Bopomofo, discussed in Section 12.3, Bopomofo, is another recently invented syllabic system, used to represent Chinese phonetics. In all these East Asian scripts, the characters (Chinese ideographs, Japanese kana, Korean Hangul syllables, and Yi syllables) are written within uniformly sized rectangles, usually squares. Traditionally, the basic writing direction followed the conventions of Chinese handwriting, in top-down vertical lines arranged from right to left across the page. Under the influence of Western printing technologies, a horizontal, left-to-right directionality has become common, and proportional fonts are seeing increased use, particularly in Japan. Horizontal, right-to-left text is also found on occasion, usually for shorter texts such as inscriptions or store signs. Diacritical marks are rarely used, although phonetic annotations are not uncommon. Older editions of the Chinese classics sometimes use the ideographic tone marks (U+302A..U+302D) to indicate unusual pronunciations of characters. Many older character sets include characters intended to simplify the implementation of East Asian scripts, such as variant punctuation forms for text written vertically, halfwidth forms (which occupy only half a rectangle), and fullwidth forms (which allow Latin letters to occupy a full rectangle). These characters are included in the Unicode Standard for compatibility with older standards. 
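The compatibility role of the halfwidth and fullwidth forms can be seen with Python's unicodedata module. This is a small sketch rather than a recommendation: width_report is a hypothetical helper, and whether folding these forms with NFKC is appropriate depends on the application.

```python
import unicodedata


def width_report(text):
    """Pair each character's code point with its East Asian Width property."""
    return [(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch)) for ch in text]


# Fullwidth Latin A, halfwidth katakana ka, ordinary Latin A.
sample = "\uFF21\uFF76A"

print(width_report(sample))
folded = unicodedata.normalize("NFKC", sample)
print([f"U+{ord(ch):04X}" for ch in folded])
```

The property values F, H, and Na stand for fullwidth, halfwidth, and narrow. NFKC maps the two compatibility forms to the ordinary Latin letter and the ordinary katakana letter, which loses the width distinction that the older character sets encoded.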
Appendix E, Han Unification History, describes how the diverse typographic traditions of mainland China, Taiwan, Japan, Korea, and Vietnam have been reconciled to provide a common set of ideographs in the Unicode Standard for all these languages and regions.
12.1 Han
CJK Unified Ideographs
The Unicode Standard contains a set of unified Han ideographic characters used in the written Chinese, Japanese, and Korean languages.1 The term Han, derived from the Chinese Han Dynasty, refers generally to Chinese traditional culture. The Han ideographic characters make up a coherent script, which was traditionally written vertically, with the vertical lines ordered from right to left. In modern usage, especially in technical works and in computer-rendered text, the Han script is written horizontally from left to right and is freely mixed with Latin or other scripts. When used in writing Japanese or Korean, the Han characters are interspersed with other scripts unique to those languages (Hiragana and Katakana for Japanese; Hangul syllables for Korean).
1. Although the term “CJK” (Chinese, Japanese, and Korean) is used throughout this text to describe the languages that currently use Han ideographic characters, it should be noted that earlier Vietnamese writing systems were based on Han ideographs. Consequently, the term “CJKV” would be more accurate in a historical sense. Han ideographs are still used for historical, religious, and pedagogical purposes in Vietnam.
The term “Han ideographic characters” is used within the Unicode Standard as a common term traditionally used in Western texts, although “sinogram” is preferred by professional linguists. Taken literally, the word “ideograph” applies only to some of the ancient original character forms, which indeed arose as ideographic depictions. The vast majority of Han characters were developed later via composition, borrowing, and other non-ideographic principles, but the term “Han ideographs” remains in English usage as a conventional cover term for the script as a whole.
The Han ideographic characters constitute a very large set, numbering in the tens of thousands. They have a long history of use in East Asia. Enormous compendia of Han ideographic characters exist because of a continuous, millennia-long scholarly tradition of collecting all Han character citations, including variant, mistaken, and nonce forms, into annotated character dictionaries. Because of the large size of the Han ideographic character repertoire, and because of the particular problems that the characters pose for standardizing their encoding, this character block description is more extended than that for other scripts and is divided into subsections. The first two subsections, “CJK Standards” and “Blocks Containing Han Ideographs,” describe the character set standards used as sources and the way in which the Unicode Standard divides Han ideographs into blocks.
These subsections are followed by an extended discussion of the characteristics of Han characters, with particular attention being paid to the problem of unification of encoding for characters used for different languages. There is a formal statement of the principles behind the Unified Han character encoding adopted in the Unicode Standard and the order of its arrangement. For a detailed account of the background and history of development of the Unified Han character encoding, see Appendix E, Han Unification History.
CJK Standards
The Unicode Standard draws its unified Han character repertoire of 70,229 characters from a number of character set standards. These standards are grouped into seven initial sources, as indicated in Table 12-1. The primary work of unifying and ordering the characters from these sources was done by the Ideographic Rapporteur Group (IRG), a subgroup of ISO/IEC JTC1/SC2/WG2.
The G, T, J, K, KP, and V sources represent the characters submitted to the IRG by its member bodies. The G source consists of submissions from mainland China, the Hong Kong SAR, and Singapore. The other five sources are the submissions from Taiwan, Japan, South and North Korea, and Vietnam, respectively. The U source represents character set standards that were not submitted to the IRG by any member body but that were used by the Unicode Consortium.
For each of the IRG sources, the table contains an abbreviated source name in the second column and a descriptive source name in the third column. The abbreviated names are used in various data files published by the Unicode Consortium and ISO/IEC to identify the specific IRG sources.

Table 12-1. Initial Sources for Unified Han

G source:
    G0     GB 2312-80
    G1     GB 12345-90 with 58 Hong Kong and 92 Korean “Idu” characters
    G3     GB 7589-87 unsimplified forms
    G5     GB 7590-87 unsimplified forms
    G7     General Purpose Hanzi List for Modern Chinese Language, and General List of Simplified Hanzi
    GS     Singapore Characters
    G8     GB 8565-88
    GE     GB 16500-95
T source:
    T1     CNS 11643-1992 1st plane
    T2     CNS 11643-1992 2nd plane
    T3     CNS 11643-1992 3rd plane with some additional characters
    T4     CNS 11643-1992 4th plane
    T5     CNS 11643-1992 5th plane
    T6     CNS 11643-1992 6th plane
    T7     CNS 11643-1992 7th plane
    TF     CNS 11643-1992 15th plane
J source:
    J0     JIS X 0208-1990
    J1     JIS X 0212-1990
    JA     Unified Japanese IT Vendors Contemporary Ideographs, 1993
K source:
    K0     KS C 5601-1987 (unique ideographs)
    K1     KS C 5657-1991
    K2     PKS C 5700-1 1994
    K3     PKS C 5700-2 1994
KP source:
    KP0    KPS 9566-97
    KP1    KPS 10721-2000
V source:
    V0     TCVN 5773:1993
    V1     TCVN 6056:1995
U source:
           KS C 5601-1987 (duplicate ideographs)
           ANSI Z39.64-1989 (EACC)
           Big-5 (Taiwan)
           CCCII, level 1
           GB 12052-89 (Korean)
           JEF (Fujitsu)
           PRC Telegraph Code
           Taiwan Telegraph Code (CCDC)
           Xerox Chinese
           Han Character Shapes Permitted for Personal Names (Japan)
           IBM Selected Japanese and Korean Ideographs
In some cases, the entire ideographic repertoire of the original character set standards was not included in the corresponding source. Three reasons explain this decision:
1. Where the repertoires of two of the character set standards within a single source have considerable overlap, the characters in the overlap might be included only once in the source. This approach is used, for example, with GB 2312-80 and GB 12345-90, which have many ideographs in common. Characters in GB 12345-90 that are duplicates of characters in GB 2312-80 are not included in the G source.
2. Where a character set standard is based on unification rules that differ substantially from those used by the IRG, many variant characters found in the character set standard will not be included in the source. This situation is the case with CNS 11643-1992, EACC, and CCCII. It is the only case where full roundtrip compatibility with the Han ideograph repertoire of the relevant character set standards is not guaranteed.
3. KS C 5601-1987 contains numerous duplicate ideographs included because they have multiple pronunciations in Korean. These multiply encoded ideographs are not included in the K source but are included in the U source. They are encoded in the CJK Compatibility Ideographs block to provide full roundtrip compatibility with KS C 5601-1987 (now known as KS X 1001:1998).
Blocks Containing Han Ideographs
Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2.

Table 12-2. Blocks Containing Han Ideographs

Block                                      Range          Comment
CJK Unified Ideographs                     4E00-9FFF      Common
CJK Unified Ideographs Extension A         3400-4DBF      Rare
CJK Unified Ideographs Extension B         20000-2A6DF    Rare, historic
CJK Compatibility Ideographs               F900-FAFF      Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement    2F800-2FA1F    Unifiable variants
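A lookup over these five blocks might be sketched as follows in Python. The names HAN_BLOCKS and han_block are hypothetical. The ranges are block boundaries, not lists of assigned characters, and Extension A is given as 3400..4DBF, the range listed in Blocks.txt of the Unicode Character Database (4DC0..4DFF belongs to the Yijing Hexagram Symbols block). Because later versions of the standard may add further blocks, a production implementation would more safely read Blocks.txt than hard-code a table like this one.

```python
# (first code point, last code point, block name)
HAN_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]


def han_block(ch):
    """Return the name of the Han block containing ch, or None.

    This tests block membership only; a code point inside a block
    may still be unassigned (reserved).
    """
    cp = ord(ch)
    for first, last, name in HAN_BLOCKS:
        if first <= cp <= last:
            return name
    return None


for ch in ("\u4E2D", "\u3400", "\U00020000", "\uF900", "A"):
    print(f"U+{ord(ch):04X}", han_block(ch))
```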
Characters in the three unified ideographs blocks are defined by the IRG, based on Han unification principles explained later in this section. The two compatibility ideographs blocks contain various duplicate or unifiable variant characters encoded for round-trip compatibility with various legacy standards. The initial repertoire of the CJK Unified Ideographs block contains characters submitted to the IRG prior to 1992, consisting of commonly used characters. That initial repertoire was
derived entirely from the G, T, J, and K sources. It has subsequently been extended with small sets of unified ideographs or ideographic components needed for interoperability with the HKSCS standard (U+9FA6..U+9FB3) and with the GB 18030 standard (U+9FB4..U+9FBB). Characters in the CJK Unified Ideographs Extension A block are rare and are not unifiable with characters in the CJK Unified Ideographs block. They were submitted to the IRG during 1992–1998 and are derived entirely from the G, T, J, K, and V sources. The CJK Unified Ideographs Extension B block contains rare and historic characters that are also not unifiable with characters in the CJK Unified Ideographs block. They were submitted to the IRG during 1998–2002 and are derived from a long list of additional sources, including major dictionaries, as documented in Table 12-8. The only principled difference in the unification work done by the IRG on the three unified ideograph blocks is that the Source Separation Rule (rule R1) was applied only to the original CJK Unified Ideographs block and not to the two extension blocks. The Source Separation Rule states that ideographs that are distinctly encoded in a source must not be unified. (For further discussion, see “Principles of Han Unification” later in this section.) The three unified ideograph blocks are not closed repertoires. Each contains a small range of reserved code points at the end of the block. Additional unified ideographs may eventually be encoded in those ranges—as has already occurred in the CJK Unified Ideographs block itself. There is no guarantee that any such Han ideographic additions would be of the same types or from the same sources as preexisting characters in the block, and implementations should be careful not to make hard-coded assumptions regarding the range of assignments within the Han ideographic blocks in general. Unifiable Han characters unique to the U source are found in the CJK Compatibility Ideographs block. 
There are 12 of these characters: U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28, and U+FA29. The remaining characters in the CJK Compatibility Ideographs block and the CJK Compatibility Ideographs Supplement block are either duplicates or unifiable variants of a character in one of the blocks of unified ideographs.
IICore. IICore (International Ideograph Core) is an important set of Han ideographs, incorporating characters from all the defined blocks. This set of nearly 10,000 characters has been developed by the IRG and represents the set of characters in everyday use throughout East Asia. By covering the characters in IICore, developers guarantee that they can handle all the needs of almost all of their customers. This coverage is of particular use on devices such as cell phones or PDAs, which have relatively stringent resource limitations. Characters in IICore are explicitly tagged as such in the Unihan Database (see “Unihan Database” in Section 4.1, Unicode Character Database).
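The special status of the 12 unified characters in the CJK Compatibility Ideographs block can be made concrete with a short Python sketch. The set and function names are hypothetical. The normalization behavior shown relies on the canonical decompositions recorded in the Unicode Character Database, as exposed by Python's unicodedata module: a duplicate such as U+F900 is replaced by its unified counterpart under normalization, while the 12 characters listed above are left unchanged.

```python
import unicodedata

# The 12 unified ideographs located in the CJK Compatibility Ideographs block.
UNIFIED_IN_COMPAT_BLOCK = {
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
}


def is_duplicate_or_variant(ch):
    """True if ch lies in a compatibility ideographs block and is not one
    of the 12 unified characters. Assignment of the code point is not
    checked in this sketch."""
    cp = ord(ch)
    in_compat_block = 0xF900 <= cp <= 0xFAFF or 0x2F800 <= cp <= 0x2FA1F
    return in_compat_block and cp not in UNIFIED_IN_COMPAT_BLOCK


duplicate = "\uF900"
unified = "\uFA0E"

print(is_duplicate_or_variant(duplicate), is_duplicate_or_variant(unified))
print(f"U+{ord(unicodedata.normalize('NFC', duplicate)):04X}")  # unified counterpart
print(f"U+{ord(unicodedata.normalize('NFC', unified)):04X}")    # unchanged
```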
General Characteristics of Han Ideographs The authoritative Japanese dictionary Koujien defines Han characters to be characters that originated among the Chinese to write the Chinese language. They are now used in China, Japan, and Korea. They are logographic (each character represents a word, not just a sound) characters that developed from pictographic and ideographic principles. They are also used phonetically. In Japan they are generally called kanji (Han, that is, Chinese, characters) including the “national characters” (kokuji) such as touge (mountain pass), which have been created using the same principles. They are also called mana (true names, as opposed to kana, false or borrowed names).2 For many centuries, written Chinese was the accepted written standard throughout East Asia. The influence of the Chinese language and its written form on the modern East Asian languages is similar to the influence of Latin on the vocabulary and written forms of languages in the West. This influence is immediately visible in the mixture of Han characters and native phonetic scripts (kana in Japan, hangul in Korea) as now used in the orthographies of Japan and Korea (see Table 12-3).
Table 12-3. Common Han Characters

Han Character    Chinese    Japanese       Korean    English Translation
天               tiān       ten, ame       chen      heaven, sky
地               dì         chi, tsuchi    ci        earth, ground
人               rén        jin, hito      in        man, person
山               shān       san, yama      san       mountain
水               shuǐ       sui, mizu      swu       water
上               shàng      jou, ue        sang      above
下               xià        ka, shita      ha        below
2. Lee Collins’ translation from the Japanese, Koujien, Izuru, Shinmura, ed. (Tokyo: Iwanami Shoten, 1983).
The evolution of character shapes and semantic drift over the centuries has resulted in changes to the original forms and meanings. For example, the Chinese character 湯 tāng (Japanese tou or yu, Korean thang), which originally meant “hot water,” has come to mean “soup” in Chinese. “Hot water” remains the primary meaning in Japanese and Korean, whereas “soup” appears in more recent borrowings from Chinese, such as “soup noodles” (Japanese tanmen; Korean thangmyen). Still, the identical appearance and similarities in meaning are dramatic and more than justify the concept of a unified Han script that transcends language. The “nationality” of the Han characters became an issue only when each country began to create coded character sets (for example, China’s GB 2312-80, Japan’s JIS X 0208-1978, and Korea’s KS C 5601-87) based on purely local needs. This problem appears to have arisen more from the priority placed on local requirements and lack of coordination with other countries than from conscious design. Nevertheless, the identity of the Han characters is fundamentally independent of language, as shown by dictionary definitions, vocabulary lists, and encoding standards.
Terminology. Several standard romanizations of the term used to refer to East Asian ideographic characters are commonly used. They include hànzì (Chinese), kanzi (Japanese), kanji (colloquial Japanese), hanja (Korean), and Chữ hán (Vietnamese). The standard English translations for these terms are interchangeable: Han character, Han ideographic character, East Asian ideographic character, or CJK ideographic character. For the purpose of clarity, the Unicode Standard uses some subset of the English terms when referring to these characters. The term Kanzi is used in reference to a specific Japanese government publication. The unrelated term KangXi (which is a Chinese reign name, rather than another romanization of “Han character”) is used only when referring to the primary dictionary used for determining Han character arrangement in the Unicode Standard. (See Table 12-7.)
Distinguishing Han Character Usage Between Languages. There is some concern that unifying the Han characters may lead to confusion because they are sometimes used differently by the various East Asian languages.
Computationally, Han character unification presents no more difficulty than employing a single Latin character set that is used to write languages as different as English and French. Programmers do not expect the characters “c”, “h”, “a”, and “t” alone to tell us whether chat is a French word for cat or an English word meaning “informal talk.” Likewise, we depend on context to identify the American hood (of a car) with the British bonnet. Few computer users are confused by the fact that ASCII can also be used to represent such words as the Welsh word ynghyd, which are strange looking to English eyes. Although it would be convenient to identify words by language for programs such as spell-checkers, it is neither practical nor productive to encode a separate Latin character set for every language that uses it. Similarly, the Han characters are often combined to “spell” words whose meaning may not be evident from the constituent characters. For example, the two characters “to cut” and “hand” mean “postage stamp” in Japanese, but the compound may appear to be nonsense to a speaker of Chinese or Korean (see Figure 12-1).
Figure 12-1. Han Spelling: 切 “to cut” + 手 “hand” = 切手 (1. Japanese “stamp”; 2. Chinese “cut hand”)
Even within one language, a computer requires context to distinguish the meanings of words represented by coded characters. The word chuugoku in Japanese, for example, may refer to China or to a district in central west Honshuu (see Figure 12-2).
Figure 12-2. Semantic Context for Han Characters: 中 “middle” + 国 “country” = 中国 (1. China; 2. Chuugoku district of Honshuu)
Coding these two characters as four so as to capture this distinction would probably cause more confusion and still not provide a general solution. The Unicode Standard leaves the issues of language tagging and word recognition up to a higher level of software and does not attempt to encode the language of the Han characters.
Simplified and Traditional Chinese. There are currently two main varieties of written Chinese: “simplified Chinese” (jiǎntǐzì), used in most parts of the People’s Republic of China (PRC) and Singapore, and “traditional Chinese” (fántǐzì), used predominantly in the Hong Kong and Macao SARs, Taiwan, and overseas Chinese communities. The process of interconverting between the two is a complex one. This complexity arises largely because a single simplified form may correspond to multiple traditional forms, such as U+53F0, which is a traditional character in its own right and the simplified form for U+6AAF, U+81FA, and U+98B1. Moreover, vocabulary differences have arisen between Mandarin as spoken in Taiwan and Mandarin as spoken in the PRC, the most notable of which is the usual name of the language itself: guóyǔ (the National Language) in Taiwan and pǔtōnghuà (the Common Speech) in the PRC. Merely converting the character content of a text from simplified Chinese to the appropriate traditional counterpart is insufficient to change a simplified Chinese document to traditional Chinese, or vice versa. (The vast majority of Chinese characters are the same in both simplified and traditional Chinese.)
There are two PRC national standards, GB 2312-80 and GB 12345-90, which are intended to represent simplified and traditional Chinese, respectively. The character repertoires of the two are the same, but the simplified forms occur in GB 2312-80 and the traditional ones in GB 12345-90. These are both part of the IRG G source, with traditional forms and simplified forms separated where they differ. As a result, the Unicode Standard contains a number of distinct simplifications for characters, such as U+8AAC and U+8BF4.
While there are lists of official simplifications published by the PRC, most of these are obtained by applying a few general principles to specific areas. In particular, there is a set of radicals (such as U+2F94 kangxi radical speech, U+2F99 kangxi radical shell, U+2FA8 kangxi radical gate, and U+2FC3 kangxi radical bird) for which simplifications exist (U+2EC8 cjk radical c-simplified speech, U+2EC9 cjk radical c-simplified shell, U+2ED4 cjk radical c-simplified gate, and U+2EE6 cjk radical c-simplified bird). The basic technique for simplifying a character containing one of these radicals is to substitute the simplified radical, as in the previous example.
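The one-to-many correspondence described above is the reason a character-by-character table cannot convert simplified text to traditional text on its own. The Python sketch below uses only the two examples named in this section; the table and function names are hypothetical, and a real converter would need a full mapping plus word-level context to choose among candidates.

```python
# Simplified form -> possible traditional forms (examples from the text only).
TRADITIONAL_CANDIDATES = {
    "\u53F0": ["\u53F0", "\u6AAF", "\u81FA", "\u98B1"],  # one-to-many
    "\u8BF4": ["\u8AAC"],                                # one-to-one
}


def candidates(ch):
    """Return the traditional forms that a character may correspond to."""
    # Characters absent from the table are assumed to be unchanged.
    return TRADITIONAL_CANDIDATES.get(ch, [ch])


def ambiguous_positions(text):
    """Return the indexes where a table lookup alone cannot decide."""
    return [i for i, ch in enumerate(text) if len(candidates(ch)) > 1]


text = "\u8BF4\u53F0"
print([[f"U+{ord(c):04X}" for c in candidates(ch)] for ch in text])
print(ambiguous_positions(text))
```

Only the second character is reported as ambiguous: U+8BF4 has a single counterpart, while U+53F0 has four candidates, including itself.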
The Unicode Standard does not explicitly encode all simplified forms for traditional Chinese characters. Where the simplified and traditional forms exist as different encoded characters, each should be used as appropriate. The Unicode Standard does not specify how to represent a new simplified form (or, more rarely, a new traditional form) that can be derived algorithmically from an encoded traditional form (simplified form).
Dialects of Chinese. Chinese is not a single language, but a complex of spoken forms that share a single written form. Although these spoken forms are referred to as dialects, they are actually mutually unintelligible and distinct languages. Virtually all modern written Chinese is Mandarin, the dominant language in both the PRC and Taiwan. Speakers of other Chinese languages learn to read and write Mandarin, although they pronounce it using the rules of their own language. (This would be like having Spanish children read and write only French, but pronouncing it as if it were Spanish.) The major non-Mandarin Chinese languages are Cantonese (spoken in the Hong Kong and Macao SARs, in many overseas Chinese communities, and in much of Guangdong province), Wu, Min, Hakka, Gan, and Xiang. Prior to the twentieth century, the standard form of written Chinese was literary Chinese, a form derived from the classical Chinese written, but probably not spoken, by Confucius in the sixth century bce. The ideographic repertoire of the Unicode Standard is sufficient for all but the most specialized texts of modern Chinese, literary Chinese, and classical Chinese. Preclassical Chinese, written using seal forms or oracle bone forms, has not been systematically incorporated into the Unicode Standard. Of Chinese languages, Cantonese is occasionally found in printed materials; the others are almost never seen in printed form.
There is less standardization for the ideographic repertoires of these languages, and no fully systematic effort has been undertaken to catalog the nonstandard ideographs they use. Because of efforts on the part of the government of the Hong Kong SAR, however, the current ideographic repertoire of the Unicode Standard should be adequate for many—but not all—written Cantonese texts. Sorting Han Ideographs. The Unicode Standard does not define a method by which ideographic characters are sorted; the requirements for sorting differ by locale and application. Possible collating sequences include phonetic, radical-stroke (KangXi, Xinhua Zidian, and so on), four-corner, and total stroke count. Raw character codes alone are seldom sufficient to achieve a usable ordering in any of these schemes; ancillary data are usually required. (See Table 12-7.) Character Glyphs. In form, Han characters are monospaced. Every character takes the same vertical and horizontal space, regardless of how simple or complex its particular form is. This practice follows from the long history of printing and typographical practice in China, which traditionally placed each character in a square cell. When written vertically, there are also a number of named cursive styles for Han characters, but the cursive forms of the characters tend to be quite idiosyncratic and are not implemented in general-purpose Han character fonts for computers.
There may be a wide variation in the glyphs used in different countries and for different applications. The most commonly used typefaces in one country may not be used in others. The types of glyphs used to depict characters in the Han ideographic repertoire of the Unicode Standard have been constrained by available fonts. Users are advised to consult authoritative sources for the appropriate glyphs for individual markets and applications. It is assumed that most Unicode implementations will provide users with the ability to select the font (or mixture of fonts) that is most appropriate for a given locale.
Principles of Han Unification
Three-Dimensional Conceptual Model. To develop the explicit rules for unification, a conceptual framework was developed to model the nature of Han ideographic characters. This model expresses written elements in terms of three primary attributes: semantic (meaning, function), abstract shape (general form), and actual shape (instantiated, typeface form). These attributes are graphically represented in three dimensions according to the X, Y, and Z axes (see Figure 12-3).
Figure 12-3. Three-Dimensional Conceptual Model (axes: X = semantic, Y = abstract shape, Z = typeface)
The semantic attribute (represented along the X axis) distinguishes characters by meaning and usage. Distinctions are made between entirely unrelated characters, such as the ideographs meaning “marsh” and “machine,” as well as between extensions or borrowings beyond the original semantic cluster, such as a single ideograph used both as a phonetic borrowing (a simplified form of the ideograph meaning “machine”) and in its original meaning, “table.”
The abstract shape attribute (the Y axis) distinguishes the variant forms of a single character with a single semantic attribute (that is, a character with a single position on the X axis). The actual shape (typeface) attribute (the Z axis) is for differences of type design (the actual shape used in imaging) of each variant form. Only characters that have the same abstract shape (that is, occupy a single point on the X and Y axes) are potential candidates for unification. Z-axis typeface and stylistic differences are generally ignored.
Unification Rules
The following rules were applied during the process of merging Han characters from the different source character sets.
R1 Source Separation Rule. If two ideographs are distinct in a primary source standard, then they are not unified.
• This rule is sometimes called the round-trip rule because its goal is to facilitate a round-trip conversion of character data between an IRG source standard and the Unicode Standard without loss of information.
• This rule was applied only for the work on the original CJK Unified Ideographs block [also known as the Unified Repertoire and Ordering (URO)]. The IRG dropped this rule in 1992 and will not use it in future work.
Figure 12-4 illustrates six variants of the CJK ideograph meaning “sword.”
Figure 12-4. CJK Source Separation (six variants of the ideograph meaning “sword”)
Each of the six variants in Figure 12-4 is separately encoded in one of the primary source standards, in this case J0 (JIS X 0208-1990), as shown in Table 12-4.
Table 12-4. Source Encoding for Sword Variants
Unicode   JIS
U+5263    J0-3775
U+528D    J0-5178
U+5271    J0-517B
U+5294    J0-5179
U+5292    J0-517A
U+91FC    J0-6E5F
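The round-trip goal behind the Source Separation Rule can be illustrated with the six pairs in Table 12-4. The following is only a minimal sketch: the helper name `build_round_trip_tables` is hypothetical, and a real converter would load complete mapping data rather than six hand-copied rows. It shows that, because each J0 code has its own code point, converting to Unicode and back returns the original codes.

```python
# (Unicode code point, J0 source code) pairs copied from Table 12-4.
SWORD_VARIANTS = [
    (0x5263, "J0-3775"),
    (0x528D, "J0-5178"),
    (0x5271, "J0-517B"),
    (0x5294, "J0-5179"),
    (0x5292, "J0-517A"),
    (0x91FC, "J0-6E5F"),
]


def build_round_trip_tables(pairs):
    """Return (source -> Unicode, Unicode -> source) dicts.

    Raises ValueError if two source codes share one code point (or the
    reverse), because such a mapping could not be inverted without loss.
    """
    to_unicode, to_source = {}, {}
    for code_point, source_code in pairs:
        if source_code in to_unicode or code_point in to_source:
            raise ValueError(f"not one-to-one at {source_code} / U+{code_point:04X}")
        to_unicode[source_code] = code_point
        to_source[code_point] = source_code
    return to_unicode, to_source


to_unicode, to_source = build_round_trip_tables(SWORD_VARIANTS)
# Convert every J0 code to Unicode and back again.
round_tripped = [to_source[to_unicode[code]] for _, code in SWORD_VARIANTS]
```

Had two of the variants been unified onto one code point, the helper would raise, which is the loss of information the rule was meant to prevent.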
Because the six sword characters are historically related, they are not subject to disunification by the Noncognate Rule (R2) and thus would ordinarily have been considered for possible abstract shape-based unification by R3. Under that rule, the fourth and fifth variants would probably have been unified for encoding. However, the Source Separation Rule required that all six variants be separately encoded, precluding them from any consideration of shape-based unification. Further variants of the “sword” ideograph, U+5251 and U+528E, are also separately encoded because of application of the Source Separation Rule—in that case applied to one or more Chinese primary source standards, rather than to the J0 Japanese primary source standard. R2 Noncognate Rule. In general, if two ideographs are unrelated in historical derivation (noncognate characters), then they are not unified. For example, the ideographs in Figure 12-5, although visually quite similar, are nevertheless not unified because they are historically unrelated and have distinct meanings.
Figure 12-5. Not Cognates, Not Unified (the ideograph meaning “earth” ≠ the ideograph meaning “warrior, scholar”)
R3 By means of a two-level classification (described next), the abstract shape of each ideograph is determined. Any two ideographs that possess the same abstract shape are then unified provided that their unification is not disallowed by either the Source Separation Rule or the Noncognate Rule.
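The way the three rules interact can be sketched as a small decision function. This is an illustration under stated assumptions, not the IRG's actual procedure: the `Ideograph` record, its field names, and the string labels for cognate group and abstract shape are all hypothetical, and the abstract shape is simply taken as given.

```python
from dataclasses import dataclass, field


@dataclass
class Ideograph:
    """Hypothetical record for one candidate ideograph."""
    label: str
    cognate_group: str    # historical derivation, used by R2
    abstract_shape: str   # Y-axis value, used by R3
    source_codes: dict = field(default_factory=dict)  # source name -> code


def should_unify(a, b):
    """Apply R1, then R2, then R3, in the order the rules are stated."""
    # R1: distinct codes in the same primary source block unification.
    for source, code in a.source_codes.items():
        other = b.source_codes.get(source)
        if other is not None and other != code:
            return False
    # R2: historically unrelated (noncognate) ideographs are not unified.
    if a.cognate_group != b.cognate_group:
        return False
    # R3: otherwise unify exactly when the abstract shapes agree.
    return a.abstract_shape == b.abstract_shape


# Fourth and fifth "sword" variants from Table 12-4; the shape label is made up.
fourth = Ideograph("U+5294", "sword", "shape-A", {"J0": "5179"})
fifth = Ideograph("U+5292", "sword", "shape-A", {"J0": "517A"})
blocked_by_r1 = should_unify(fourth, fifth)

# The same pair without the J0 distinction would fall through to R3.
without_sources = should_unify(
    Ideograph("x", "sword", "shape-A"), Ideograph("y", "sword", "shape-A")
)

# Look-alike but unrelated ideographs, as in Figure 12-5.
noncognate = should_unify(
    Ideograph("earth", "earth", "shape-B"),
    Ideograph("scholar", "scholar", "shape-B"),
)
```

Checking R1 first mirrors the text: source separation removes a pair from shape-based consideration altogether.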
Abstract Shape Two-Level Classification. Using the three-dimensional model, characters are analyzed in a two-level classification. The two-level classification distinguishes characters by abstract shape (Y axis) and actual shape of a particular typeface (Z axis). Variant forms are identified based on the difference of abstract shapes. To determine differences in abstract shape and actual shape, the structure and features of each component of an ideograph are analyzed as follows. Ideographic Component Structure. The component structure of each ideograph is examined. A component is a geometrical combination of primitive elements. Various ideographs can be configured with these components used in conjunction with other components. Some components can be combined to make a component more complicated in its structure. Therefore, an ideograph can be defined as a component tree with the entire ideograph as the root node and with the bottom nodes consisting of primitive elements (see Figure 12-6 and Figure 12-7). Ideograph Features. The following features of each ideograph to be compared are examined:
Figure 12-6. Ideographic Component Structure
Figure 12-7. The Most Superior Node of an Ideographic Component
• Number of components
• Relative positions of components in each complete ideograph
• Structure of a corresponding component
• Treatment in a source character set
• Radical contained in a component
Uniqueness or Unification. If one or more of these features are different between the ideographs compared, the ideographs are considered to have different abstract shapes and, therefore, are considered unique characters and are not unified. If all of these features are identical between the ideographs, the ideographs are considered to have the same abstract shape and are unified.
Spatial Positioning. Ideographs may exist as a unit or may be a component of more complex ideographs. A source standard may describe a requirement for a component with a specific spatial positioning that would be otherwise unified on the principle of having the same abstract shape as an existing full ideograph. Examples of spatial positioning for ideographic components are left half, top half, and so on.
Examples. Table 12-5 gives examples of some typical differences in abstract character shape, resulting in decisions not to unify characters. Also included in the table are all three instances of disunification based on distinctions in spatial positioning. Differences in the actual shapes of ideographs that have been unified are illustrated in Table 12-6.
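The five-feature comparison described above can be sketched as a field-by-field test. The `ShapeFeatures` tuple and the sample values are hypothetical; how each feature would actually be extracted from a glyph or a source standard is outside the scope of this sketch.

```python
from typing import NamedTuple


class ShapeFeatures(NamedTuple):
    """One field per feature in the list above (values are illustrative)."""
    component_count: int
    relative_positions: tuple
    component_structures: tuple
    source_treatment: str
    radical: str


def differing_features(a, b):
    """Names of the features on which two ideographs disagree."""
    return [name for name, x, y in zip(ShapeFeatures._fields, a, b) if x != y]


def same_abstract_shape(a, b):
    """All five features identical means the same abstract shape."""
    return not differing_features(a, b)


base = ShapeFeatures(2, ("left", "right"), ("c1", "c2"), "single code", "r1")
# Same components, but stacked instead of side by side.
moved = base._replace(relative_positions=("top", "bottom"))

difference = differing_features(base, moved)
```

Returning the list of differing features, rather than a bare boolean, makes it possible to report a reason in the style of Table 12-5.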
Table 12-5. Ideographs Not Unified
Reason (example character pairs not reproduced)
Different number of components
Same number of components placed in different relative positions
Same number and same relative position of components, corresponding components structured differently
Characters treated differently in a source character set
Characters with different radical in a component
Same abstract shape, different actual shape
Same abstract shape, different position (U+9FBB versus U+470C)
Same abstract shape, different position (U+9FB9 versus U+20509)
Same abstract shape, different position (U+9FBA versus U+2099D)
Table 12-6. Ideographs Unified
Reason (example character pairs not reproduced)
Different writing sequence
Differences in overshoot at the stroke termination
Differences in contact of strokes
Differences in protrusion at the folded corner of strokes
Differences in bent strokes
Differences in stroke termination
Differences in accent at the stroke initiation
Difference in rooftop modification
Difference in rotated strokes/dots (a)
a. These ideographs (having the same abstract shape) would have been unified except for the Source Separation Rule.
Han Ideograph Arrangement
The arrangement of the Unicode Han characters is based on the positions of characters as they are listed in four major dictionaries. The KangXi Zidian was chosen as primary because it contains most of the source characters and because the dictionary itself and the principles of character ordering it employs are commonly used throughout East Asia. The Han ideograph arrangement follows the index (page and position) of the dictionaries listed in Table 12-7 with their priorities. When a character is found in the KangXi Zidian, it follows the KangXi Zidian order. When it is not found in the KangXi Zidian and it is found in Dai Kan-Wa Jiten, it is given a position extrapolated from the KangXi position of the preceding character in Dai Kan-Wa Jiten.
Table 12-7. Han Ideograph Arrangement
Priority  Dictionary        City     Publisher                          Version
1         KangXi Zidian     Beijing  Zhonghua Bookstore, 1989           Seventh edition
2         Dai Kan-Wa Jiten  Tokyo    Taishuukan Shoten, 1986            Revised edition
3         Hanyu Da Zidian   Chengdu  Sichuan Cishu Publishing, 1986     First edition
4         Dae Jaweon        Seoul    Samseong Publishing Co. Ltd, 1988  First edition
When it is not found in either KangXi or Dai Kan-Wa, then the Hanyu Da Zidian and Dae Jaweon dictionaries are consulted in a similar manner. Ideographs with simplified KangXi radicals are placed in a group following the traditional KangXi radical from which the simplified radical is derived. For example, characters with the simplified radical + corresponding to KangXi radical / follow the last nonsimplified character having / as a radical. The arrangement for these simplified characters is that of the Hanyu Da Zidian. The few characters that are not found in any of the four dictionaries are placed following characters with the same KangXi radical and stroke count. Radical-Stroke Order. The radical-stroke order that results is a culturally neutral order. It does not exactly match the order found in common dictionaries. Information for sorting all CJK ideographs by the radical-stroke method is found in the Unihan Database (see “Unihan Database” in Section 4.1, Unicode Character Database). It should be used if characters from the various blocks containing ideographs (see Table 12-2) are to be properly interleaved. Note, however, that there is no standard way of ordering characters with the same radical-stroke count; for most purposes, Unicode code point order would be as acceptable as any other way. A radical-stroke index to the IICore subset of the CJK unified ideographs is provided in Chapter 18, Han Radical-Stroke Index, to help locate the most useful and common Han characters in the standard. A full radical-stroke index of all CJK unified ideographs, together with a complete chart listing, can be found on the Unicode Web site. Details regarding the form of the online charts for the CJK unified ideographs are discussed in Section 17.2, CJK Unified Ideographs.
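A radical-stroke sort of the kind described above might be sketched as follows. The values are written in the "radical.additional-strokes" style of the Unihan kRSUnicode field, with an apostrophe marking a simplified radical. The five sample entries were typed in by hand for illustration and should be checked against the actual Unihan data, and `radical_stroke_key` is a hypothetical helper name.

```python
import re

# Hand-typed sample values (assumption: verify against the Unihan Database).
SAMPLE_RS = {
    0x4E0B: "1.2",
    0x4EBA: "9.0",
    0x4E00: "1.0",
    0x4E09: "1.2",
    0x4E01: "1.1",
}

RS_PATTERN = re.compile(r"^(\d+)('*)\.(-?\d+)$")


def radical_stroke_key(code_point, rs_value):
    """Sort key: radical, simplified-radical flag, extra strokes, code point."""
    match = RS_PATTERN.match(rs_value)
    if match is None:
        raise ValueError(f"unrecognized radical-stroke value: {rs_value!r}")
    radical = int(match.group(1))
    # Characters with a simplified radical follow the whole traditional group.
    simplified = 1 if match.group(2) else 0
    strokes = int(match.group(3))
    # Ties on radical and stroke count fall back to code point order.
    return (radical, simplified, strokes, code_point)


ordered = sorted(SAMPLE_RS, key=lambda cp: radical_stroke_key(cp, SAMPLE_RS[cp]))
```

The code point tie-break follows the remark that, for characters with the same radical-stroke count, code point order is as acceptable as any other.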
Mappings for Han Ideographs The mappings defined by the IRG between the ideographs in the Unicode Standard and the IRG sources are specified in the Unihan Database. These mappings are considered to be normative parts of ISO/IEC 10646 and of the Unicode Standard; that is, the characters are defined to be the targets for conversion of these characters in these character set standards. These mappings have been derived from editions of the source standards provided directly to the IRG by its member bodies, and they may not match mappings derived from the published editions of these standards. For this reason, developers may choose to use alternative mappings more directly correlated with published editions.
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
Purchasing the Book For convenient access to the full text of the standard as a useful reference book, we recommend purchasing the printed version. The book is available from the Unicode Consortium, the publisher, and booksellers. Purchase of the standard in book format contributes to the ongoing work of the Unicode Consortium. Details about the book publication and ordering information may be found at http://www.unicode.org/book/aboutbook.html
Joining Unicode You or your organization may benefit by joining the Unicode Consortium: for more information, see Joining the Unicode Consortium at http://www.unicode.org/consortium/join.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 13
Additional Modern Scripts
This chapter contains a collection of additional scripts in modern use that do not fit well into the script categories featured in other chapters:
• Ethiopic
• Tifinagh
• Canadian Aboriginal Syllabics
• Mongolian
• N’Ko
• Deseret
• Osmanya
• Cherokee
• Shavian
Ethiopic, Mongolian, and Tifinagh are scripts with long histories. Although their roots can be traced back to the original Semitic and North African writing systems, they would not be classified as Middle Eastern scripts today. The remaining scripts in this chapter have been developed relatively recently. Some of them show roots in Latin and other letterforms, including shorthand. They are all original creative contributions intended specifically to serve the linguistic communities that use them.
13.1 Ethiopic Ethiopic: U+1200–U+137F The Ethiopic syllabary originally evolved for writing the Semitic language Ge’ez. Indeed, the English noun “Ethiopic” simply means “the Ge’ez language.” Ge’ez itself is now limited to liturgical usage, but its script has been adopted for modern use in writing several languages of central east Africa, including Amharic, Tigre, and Oromo. Basic and Extended Ethiopic. The Ethiopic characters encoded here are the basic set that has become established in common usage for writing major languages. As with other productive scripts, the basic Ethiopic forms are sometimes modified to produce an extended range of characters for writing additional languages. Encoding Principles. The syllables of the Ethiopic script are traditionally presented as a two-dimensional matrix of consonant-vowel combinations. The encoding follows this structure; in particular, the codespace range U+1200..U+1357 is interpreted as a matrix of 43 consonants crossed with 8 vowels, making 344 conceptual syllables. Most of these consonant-vowel syllables are represented by characters in the script, but some of them happen to be unused, accounting for the blank cells in the matrix.
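The matrix interpretation of U+1200..U+1357 amounts to simple arithmetic on code points: 43 rows of 8 cells each. The sketch below assumes zero-based row and column indices, and the function names `to_cell` and `from_cell` are hypothetical. It does not know which cells are blank, only where a cell would sit.

```python
ETHIOPIC_BASE = 0x1200
VOWELS_PER_ROW = 8     # columns of the matrix
CONSONANT_ROWS = 43    # rows of the matrix
MATRIX_SIZE = CONSONANT_ROWS * VOWELS_PER_ROW  # 344 conceptual syllables


def to_cell(code_point):
    """Map a code point in U+1200..U+1357 to (consonant row, vowel column)."""
    offset = code_point - ETHIOPIC_BASE
    if not 0 <= offset < MATRIX_SIZE:
        raise ValueError(f"U+{code_point:04X} is outside the syllable matrix")
    return divmod(offset, VOWELS_PER_ROW)


def from_cell(row, column):
    """Inverse of to_cell: matrix coordinates back to a code point."""
    if not (0 <= row < CONSONANT_ROWS and 0 <= column < VOWELS_PER_ROW):
        raise ValueError(f"cell ({row}, {column}) is outside the matrix")
    return ETHIOPIC_BASE + row * VOWELS_PER_ROW + column


first_cell = to_cell(0x1200)
last_cell = to_cell(0x1357)
```

An input method that collects a consonant code and a vowel code could resolve them with `from_cell` before the syllable enters stored text.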
Variant Glyph Forms. A given Ethiopic syllable may be represented by different glyph forms, analogous to the glyph variants of Latin lowercase “a” or “g”, which do not coexist in the same font. Thus the particular glyph shown in the code chart for each position in the matrix is merely one representation of that conceptual syllable, and the glyph itself is not the object that is encoded.
Labialized Subseries. A few Ethiopic consonants have labialized (“W”) forms that are traditionally allotted their own consonant series in the syllable matrix, although only a subset of the possible vowel forms are realized. Each of these derivative series is encoded immediately after the corresponding main consonant series. Because the standard vowel series includes both “AA” and “WAA”, two different cells of the syllable matrix might represent the “consonant + W + AA” syllable. For example:
U+1257 = QH + WAA: potential but unused version of qhwaa
U+125B = QHW + AA: ethiopic syllable qhwaa
In these cases, where the two conceptual syllables are equivalent, the entry in the labialized subseries is encoded and not the “consonant + WAA” entry in the main syllable series. The six specific cases are enumerated in Table 13-1. In three of these cases, the -WAA position in the syllable matrix has been reanalyzed and used for encoding a syllable in -OA for extended Ethiopic.
Table 13-1. Labialized Forms in Ethiopic -WAA
-WAA Form  Encoded as  Not Used  Contrast
QWAA       U+124B      1247      U+1247 QOA
QHWAA      U+125B      1257
XWAA       U+128B      1287      U+1287 XOA
KWAA       U+12B3      12AF      U+12AF KOA
KXWAA      U+12C3      12BF
GWAA       U+1313      130F
Also, within the labialized subseries, the sixth vowel (“-E”) forms are sometimes considered to be second vowel (“-U”) forms. For example:
U+1249 = QW + U: unused version of qwe
U+124D = QW + E: ethiopic syllable qwe
In these cases, where the two syllables are nearly equivalent, the “-E” entry is encoded and not the “-U” entry. The six specific cases are enumerated in Table 13-2.
Keyboard Input. Because the Ethiopic script includes more than 300 characters, the units of keyboard input must constitute some smaller set of entities, typically 43+8 codes interpreted as the coordinates of the syllable matrix. Because these keyboard input codes are expected to be transient entities that are resolved into syllabic characters before they enter stored text, keyboard input codes are not specified in this standard.
Table 13-2. Labialized Forms in Ethiopic -WE
-WE Form  Encoded as  Not Used
QWE       U+124D      1249
QHWE      U+125D      1259
XWE       U+128D      1289
KWE       U+12B5      12B1
KXWE      U+12C5      12C1
GWE       U+1315      1311
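The pairs in Tables 13-1 and 13-2 follow a regular pattern in the syllable matrix, which can be checked with a little arithmetic. The sketch below assumes the eight-column layout starting at U+1200 described under Encoding Principles. The helper names are hypothetical and the code points are copied from the two tables.

```python
# (not used, encoded as) pairs from Table 13-1 and Table 13-2.
WAA_PAIRS = [(0x1247, 0x124B), (0x1257, 0x125B), (0x1287, 0x128B),
             (0x12AF, 0x12B3), (0x12BF, 0x12C3), (0x130F, 0x1313)]
WE_PAIRS = [(0x1249, 0x124D), (0x1259, 0x125D), (0x1289, 0x128D),
            (0x12B1, 0x12B5), (0x12C1, 0x12C5), (0x1311, 0x1315)]


def cell(code_point):
    """Zero-based (row, column) of a code point in the 8-column matrix."""
    return divmod(code_point - 0x1200, 8)


def relation(unused, encoded):
    """(rows moved, column of the unused cell, column of the encoded cell)."""
    (row_u, col_u), (row_e, col_e) = cell(unused), cell(encoded)
    return (row_e - row_u, col_u, col_e)


# Each set collapses to a single element if the table is regular.
waa_relations = {relation(u, e) for u, e in WAA_PAIRS}
we_relations = {relation(u, e) for u, e in WE_PAIRS}
```

For the -WAA cases the encoded cell sits one row down, in the fourth (AA) column rather than the eighth (WAA) column. For the -WE cases it stays in the same row, in the sixth (E) column rather than the second (U) column.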
Syllable Names. The Ethiopic script often has multiple syllables corresponding to the same Latin letter, making it difficult to assign unique Latin names. Therefore the names list makes use of certain devices (such as doubling a Latin letter in the name) merely to create uniqueness; this device has no relation to the phonetics of these syllables in any particular language.
Encoding Order and Sorting. The order of the consonants in the encoding is based on the traditional alphabetical order. It may differ from the sort order used for one or another language, if only because in many languages various pairs or triplets of syllables are treated as equivalent in the first sorting pass. For example, an Amharic dictionary may start out with a section headed by three H-like syllables:
U+1200 ethiopic syllable ha
U+1210 ethiopic syllable hha
U+1280 ethiopic syllable xa
Thus the encoding order cannot and does not implement a collation procedure for any particular language using this script.
Word Separators. The traditional word separator is U+1361 ethiopic wordspace ( : ). In modern usage, a plain white wordspace (U+0020 space) is becoming common.
Section Mark. One or more section marks are typically used on a separate line to mark the separation of sections. Commonly, an odd number is used and they are separated by spaces.
Diacritical Marks. The Ethiopic script generally makes no use of diacritical marks, but they are sometimes employed for scholarly or didactic purposes. In particular, U+135F ethiopic combining gemination mark and U+030E combining double vertical line above are sometimes used to indicate emphasis or gemination (consonant doubling).
Numbers. Ethiopic digit glyphs are derived from the Greek alphabet, possibly borrowed from Coptic letterforms. In modern use, European digits are often used. The Ethiopic number system does not use a zero, nor is it based on digital-positional notation.
A number is denoted as a sequence of powers of 100, each preceded by a coefficient (2 through 99). In each term of the series, the power 100^n is indicated by n HUNDRED characters (merged to a digraph when n = 2). The coefficient is indicated by a tens digit and a ones digit, either of which is absent if its value is zero.
For example, the number 2345 is represented by
2345 = (20 + 3)*100^1 + (40 + 5)*100^0
= 20 3 100 40 5
= TWENTY THREE HUNDRED FORTY FIVE
= 1373 136B 137B 1375 136D
A language using the Ethiopic script may have a word for “thousand,” such as Amharic “SHI” (U+123A), and a quantity such as 2,345 may also be written as it is spoken in that language, which in the case of Amharic happens to parallel English:
2,345 = TWO thousand THREE HUNDRED FORTY FIVE
= 136A 123A 136B 137B 1375 136D
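The first (powers of 100) representation above can be sketched for small numbers. This is a limited illustration, not a complete Ethiopic numeral converter: it covers only 1 through 9999, and it assumes that a coefficient of 1 before HUNDRED is left implicit, which the text suggests by giving the coefficient range as 2 through 99. The digit and tens code point ranges are taken from the Ethiopic block and agree with the code points in the example.

```python
ETHIOPIC_DIGIT_ONE = 0x1369       # ones 1..9 occupy U+1369..U+1371
ETHIOPIC_NUMBER_TEN = 0x1372      # tens 10..90 occupy U+1372..U+137A
ETHIOPIC_NUMBER_HUNDRED = 0x137B


def coefficient(value):
    """Code points for 1..99: tens digit, then ones digit, zeros omitted."""
    tens, ones = divmod(value, 10)
    code_points = []
    if tens:
        code_points.append(ETHIOPIC_NUMBER_TEN + tens - 1)
    if ones:
        code_points.append(ETHIOPIC_DIGIT_ONE + ones - 1)
    return code_points


def to_ethiopic(number):
    """Code points for 1..9999 as coefficient * 100^1 + coefficient * 100^0."""
    if not 1 <= number <= 9999:
        raise ValueError("this sketch only handles 1 through 9999")
    hundreds, units = divmod(number, 100)
    code_points = []
    if hundreds:
        if hundreds > 1:  # assumption: a coefficient of 1 is not written
            code_points.extend(coefficient(hundreds))
        code_points.append(ETHIOPIC_NUMBER_HUNDRED)
    code_points.extend(coefficient(units))
    return code_points


example = [f"{cp:04X}" for cp in to_ethiopic(2345)]
```

For 2345 this yields the same five code points as the worked example: 1373 136B 137B 1375 136D.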
Ethiopic Extensions: U+1380–U+139F, U+2D80–U+2DDF The Ethiopic script is used for a large number of languages and dialects in Ethiopia and in some instances has been extended significantly beyond the set of characters used for major languages such as Amharic and Tigre. There are two blocks of extensions to the Ethiopic script: Ethiopic Supplement U+1380..U+139F and Ethiopic Extended U+2D80..U+2DDF. Those extensions cover such languages as Me’en, Blin, and Sebatbeit, which use many additional characters. Several other characters for Ethiopic script extensions can be found in the main Ethiopic script block in the range U+1200..U+137F. The Ethiopic Supplement block also contains a set of tonal marks. They are used in multiline scored layout. Like other musical (an)notational systems of this type, these tonal marks require a higher-level protocol to enable proper rendering.
13.2 Mongolian Mongolian: U+1800–U+18AF The Mongolians are key representatives of a cultural-linguistic group known as Altaic, after the Altai mountains of central Asia. In the past, these peoples have dominated the vast expanses of Asia and beyond, from the Baltic to the Sea of Japan. Echoes of Altaic languages remain from Finland, Hungary, and Turkey, across central Asia, to Korea and Japan. Today the Mongolians are represented politically in Mongolia proper (formally the Mongolian People’s Republic, also known as Outer Mongolia) and Inner Mongolia (formally the Inner Mongolia Autonomous Region, China), with Mongolian populations also living in other areas of China. The Mongolian block unifies Mongolian and the three derivative scripts Todo, Manchu, and Sibe. Each of the three derivative scripts shares some common letters with Mongolian,
and these letters are encoded only once. Each derivative script also has a number of modified letter forms or new letters, which are encoded separately. Mongolian, Todo, and Manchu also have a number of special “Ali Gali” letters that are used for transcribing Tibetan and Sanskrit in Buddhist texts.
History. The Mongolian script was derived from the Uighur script around the beginning of the thirteenth century, during the reign of Genghis Khan. The Uighur script, which was in use from about the eighth to the fifteenth centuries, was derived from Sogdian Aramaic, a Semitic script written horizontally from right to left. Probably under the influence of the Chinese script, the Uighur script became rotated 90 degrees counterclockwise so that the lines of text read vertically in columns running from left to right. The Mongolian script inherited this directionality from the Uighur script. The Mongolian script has remained in continuous use for writing Mongolian within the Inner Mongolia Autonomous Region of the People’s Republic of China and elsewhere in China. However, in the Mongolian People’s Republic (Outer Mongolia), the traditional script was replaced by a Cyrillic orthography in the early 1940s. The traditional script has been revived to an extent since the early 1990s, so that now both the Cyrillic and the Mongolian scripts are used. The spelling used with the traditional Mongolian script represents the literary language of the seventeenth and early eighteenth centuries, whereas the Cyrillic script is used to represent the modern, colloquial pronunciation of words. As a consequence, there is no one-to-one relationship between the traditional Mongolian orthography and Cyrillic orthography. Approximate correspondence mappings are indicated in the code charts, but are not necessarily unique in either direction. All of the Cyrillic characters needed to write Mongolian are included in the Cyrillic block of the Unicode Standard.
In addition to the traditional Mongolian script of Mongolia, several historical modifications and adaptations of the Mongolian script have emerged elsewhere. These adaptations are often referred to as scripts in their own right, although for the purposes of character encoding in the Unicode Standard they are treated as styles of the Mongolian script and share encoding of their basic letters. The Todo script is a modified and improved version of the Mongolian script, devised in 1648 by Zaya Pandita for use by the Kalmyk Mongolians, who had migrated to Russia in the sixteenth century, and who now inhabit the Republic of Kalmykia in the Russian Federation. The name Todo means “clear” in Mongolian; it refers to the fact that the new script eliminates the ambiguities inherent in the original Mongolian script. The orthography of the Todo script also reflects the Oirat-Kalmyk dialects of Mongolian rather than literary Mongolian. In Kalmykia, the Todo script was replaced by a succession of Cyrillic and Latin orthographies from the mid-1920s and is no longer in active use. Until very recently the Todo script was still used by speakers of the Oirat and Kalmyk dialects within Xinjiang and Qinghai in China. The Manchu script is an adaptation of the Mongolian script used to write Manchu, a Tungusic language that is not closely related to Mongolian. The Mongolian script was first adapted for writing Manchu in 1599 under the orders of the Manchu leader Nurhachi, but few examples of this early form of the Manchu script survive. In 1632, the Manchu scholar
Dahai reformed the script by adding circles and dots to certain letters in an effort to distinguish their different sounds and by devising new letters to represent the sounds of the Chinese language. When the Manchu people conquered China to rule as the Qing dynasty (1644–1911), Manchu became the language of state. The ensuing systematic program of translation from Chinese created a large and important corpus of books written in Manchu. Over time the Manchu people became completely sinified, and as a spoken language Manchu is now almost extinct. The Sibe (also spelled Sibo, Xibe, or Xibo) people are closely related to the Manchus, and their language is often classified as a dialect of Manchu. The Sibe people are widely dispersed across northwest and northeast China due to deliberate programs of ethnic dispersal during the Qing dynasty. The majority have become assimilated into the local population and no longer speak the Sibe language. However, there is a substantial Sibe population in the Sibe Autonomous County in the Ili River valley in Western Xinjiang, the descendants of border guards posted to Xinjiang in 1764, who still speak and write the Sibe language. The Sibe script is based on the Manchu script, with a few modified letters.
Directionality. The Mongolian script is written vertically from top to bottom in columns running from left to right. In modern contexts, words or phrases may be embedded in horizontal scripts. In such a case, the Mongolian text will be rotated 90 degrees counterclockwise so that it reads from left to right. When rendering Mongolian text in a system that does not support vertical layout, the text should be laid out in horizontal lines running left to right, with the glyphs rotated 90 degrees counterclockwise with respect to their orientation in the code charts. If such text is viewed sideways, the usual Mongolian column order appears reversed, but this orientation can be workable for short stretches of text.
There are no bidirectional effects in such a layout because all text is horizontal left to right. Encoding Principles. The encoding model for Mongolian is somewhat different from that for any other script within Unicode, and in many respects it is the most complicated. For this reason, only the essential features of Mongolian shaping behavior are presented here; the precise details are to be presented in a separate technical report. The Semitic alphabet from which the Mongolian script was ultimately derived is fundamentally inadequate for representing the sounds of the Mongolian language. As a result, many of the Mongolian letters are used to represent two different sounds, and the correct pronunciation of a letter may be known only from the context. In this respect, Mongolian orthography is similar to English spelling, in which the pronunciation of a letter such as c may be known only from the context. Unlike in the Latin script, in which c /k/ and c /s/ are treated as the same letter and encoded as a single character, in the Mongolian script different phonetic values of the same glyph may be encoded as distinct characters. Modern Mongolian grammars consider the phonetic value of a letter to be its distinguishing feature, rather than its glyph shape. For example, the four Mongolian vowels o, u, ö, and ü are considered four distinct letters and are encoded as four characters (U+1823, U+1824, U+1825, and U+1826, respectively), even though o is written identically to u in all positional forms, ö is written identically to ü in all
positional forms, o and u are normally distinguished from ö and ü only in the first syllable of a word. Likewise, the letters t (U+1832) and d (U+1833) are often indistinguishable. For example, pairs of Mongolian words such as urtu “long” and ordu “palace, camp, horde” or ende “here” and ada “devil” are written identically, but are represented using different sequences of Unicode characters, as shown in Figure 13-1. There are many such examples in Mongolian, but not in Todo, Manchu, or Sibe, which have largely eliminated ambiguous letters.
Figure 13-1. Mongolian Glyph Convergence
urtu 1824 1837 1832 1824
ordu 1823 1837 1833 1824
ende 1821 1828 1833 1821
ada 1820 1833 1820
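The glyph convergence shown in Figure 13-1 can be checked with a short Python sketch. The code point sequences are the ones listed in the figure; the helper names `encode` and `letter_names` are illustrative only, and the letter names come from Python's `unicodedata` module rather than from this section.

```python
import unicodedata

# Code point sequences as given in Figure 13-1.
WORDS = {
    "urtu": (0x1824, 0x1837, 0x1832, 0x1824),
    "ordu": (0x1823, 0x1837, 0x1833, 0x1824),
    "ende": (0x1821, 0x1828, 0x1833, 0x1821),
    "ada": (0x1820, 0x1833, 0x1820),
}


def encode(code_points):
    """Build the encoded string for a sequence of code points."""
    return "".join(chr(cp) for cp in code_points)


def letter_names(code_points):
    """Return short letter names (for example 'TA') from the character database."""
    prefix = "MONGOLIAN LETTER "
    return [unicodedata.name(chr(cp))[len(prefix):] for cp in code_points]


for word in ("urtu", "ordu"):
    print(word, letter_names(WORDS[word]))

# The two words are written alike but compare unequal as encoded text.
print(encode(WORDS["urtu"]) == encode(WORDS["ordu"]))
```

Because the distinction lives in the encoding and not in the glyphs, searching and sorting can tell such words apart even where a reader of the rendered text cannot.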
Cursive Joining. The Mongolian script is cursive, and the letters constituting a word are normally joined together. In most cases the letters join together naturally along a vertical stem, but in the case of certain “bowed” consonants (for example, U+182A mongolian letter ba and the feminine form of U+182C mongolian letter qa), which lack a trailing vertical stem, they may form ligatures with a following vowel. This is illustrated in Figure 13-2, where the letter ba combines with the letter u to form a ligature in the Mongolian word abu “father.”
Figure 13-2. Mongolian Consonant Ligation
abu 1820 182A 1824
Many letters also have distinct glyph forms depending on their position within a word. These positional forms are classified as initial, medial, final, or isolate. The medial form is often the same as the initial form, but the final form is always distinct from the initial or medial form. Figure 13-3 shows the Mongolian letters U+1823 o and U+1821 e, rendered with distinct positional forms initially and finally in the Mongolian words odo “now” and ene “this.”
Figure 13-3. Mongolian Positional Forms
odo 1823 1833 1823
ene 1821 1828 1821
U+200C zero width non-joiner (ZWNJ) and U+200D zero width joiner (ZWJ) may be used to select a particular positional form of a letter in isolation or to override the expected positional form within a word. Basically, they evoke the same contextual selection effects in neighboring letters as do non-joining or joining regular letters, but are themselves invisible (see Chapter 16, Special Areas and Format Characters). For example, the various positional forms of U+1820 mongolian letter a may be selected by means of the following character sequences: <1820> selects the isolate form. <1820 200D> selects the initial form. <200D 1820> selects the final form. <200D 1820 200D> selects the medial form. Some letters have additional variant forms that do not depend on their position within a word, but instead reflect differences between modern versus traditional orthographic practice or lexical considerations—for example, special forms used for writing foreign words. On occasion, other contextual rules may condition a variant form selection. For example, a certain variant of a letter may be required when it occurs in the first syllable of a word or when it occurs immediately after a particular letter. The various positional and variant glyph forms of a letter are considered presentation forms and are not encoded separately. It is the responsibility of the rendering system to select the correct glyph form for a letter according to its context. Free Variation Selectors. When a glyph form that cannot be predicted algorithmically is required (for example, when writing a foreign word), the user needs to append an appropriate variation selector to the letter to indicate to the rendering system which glyph form is required. The following free variation selectors are provided for use specifically with the Mongolian block:
U+180B mongolian free variation selector one (FVS1)
U+180C mongolian free variation selector two (FVS2)
U+180D mongolian free variation selector three (FVS3)
These format characters normally have no visual appearance. When required, a free variation selector immediately follows the base character it modifies. This combination of base character and variation selector is known as a standardized variant. The table of standardized variants, StandardizedVariants.txt, in the Unicode Character Database exhaustively lists all currently defined standardized variants. All combinations not listed in the table are unspecified and are reserved for future standardization; no conformant process may interpret them as standardized variants. Therefore, any free variation selector not immediately preceded by one of its defined base characters will be ignored.
Figure 13-4 gives an example of how a free variation selector may be used to select a particular glyph variant. In modern orthography, the initial letter ga in the Mongolian word gal “fire” is written with two dots; in traditional orthography, the letter ga is written without any dots. By default, the dotted form of the letter ga is selected, but this behavior may be overridden by means of FVS1, so that ga plus FVS1 selects the undotted form of the letter ga.
Figure 13-4. Mongolian Free Variation Selector
gal 182D 1820 182F
gal 182D 180B 1820 182F
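The ZWJ sequences for the positional forms of U+1820 and the two spellings of gal in Figure 13-4 can be written out as strings. The following Python sketch only builds and compares the sequences; it does not render anything, and the names `POSITIONAL_A`, `strip_selectors`, and `code_points` are hypothetical helpers introduced for illustration.

```python
ZWJ = "\u200D"    # zero width joiner
FVS1 = "\u180B"   # mongolian free variation selector one
LETTER_A = "\u1820"

# Sequences that request each positional form of the letter a.
POSITIONAL_A = {
    "isolate": LETTER_A,
    "initial": LETTER_A + ZWJ,
    "final": ZWJ + LETTER_A,
    "medial": ZWJ + LETTER_A + ZWJ,
}

# The word gal: default (dotted ga) and with FVS1 (undotted ga).
GAL_DEFAULT = "\u182D\u1820\u182F"
GAL_FVS1 = "\u182D" + FVS1 + "\u1820\u182F"


def strip_selectors(text):
    """Drop FVS1..FVS3 (U+180B..U+180D) to compare spellings by letters only."""
    return "".join(ch for ch in text if not 0x180B <= ord(ch) <= 0x180D)


def code_points(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


for form, sequence in POSITIONAL_A.items():
    print(form, code_points(sequence))
print(code_points(GAL_FVS1))
print(strip_selectors(GAL_FVS1) == GAL_DEFAULT)
```

Stripping the selectors before comparison is one possible way to treat the two spellings of gal as the same word; whether that is appropriate depends on the application.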
It is important to appreciate that even though a particular standardized variant may be defined for a letter, the user needs to apply the appropriate free variation selector only if the correct glyph form cannot be predicted automatically by the rendering system. In most cases, in running text, there will be few occasions when a free variation selector is required to disambiguate the glyph form.
Older documentation, external to the Unicode Standard, listed the action of the free variation selectors by using ZWJ to explicitly indicate the shaping environment affected by the variation selector. The relative order of the ZWJ and the free variation selector in these documents was different from the one required by Section 16.4, Variation Selectors. Older implementations of Mongolian free variation selectors may therefore interpret a sequence such as a base character followed first by ZWJ and then by FVS1 as if it were a base character followed first by FVS1 and then by ZWJ.
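A process that receives text in the older order could rewrite it before further handling. The Python sketch below is one possible approach and not a procedure defined by the standard: it moves a free variation selector in front of a ZWJ only when the pair directly follows a Mongolian letter. The letter range U+1820..U+18AA and the function name `to_standard_order` are assumptions made for this example.

```python
import re

ZWJ = "\u200D"

# <letter, ZWJ, FVS>: the older order described above.
# Assumption: letters of the block fall in U+1820..U+18AA.
_LEGACY_ORDER = re.compile("([\u1820-\u18AA])\u200D([\u180B-\u180D])")


def to_standard_order(text):
    """Rewrite <letter, ZWJ, FVS> as <letter, FVS, ZWJ>."""
    return _LEGACY_ORDER.sub(lambda m: m.group(1) + m.group(2) + ZWJ, text)


legacy = "\u182D\u200D\u180B"
print([f"{ord(ch):04X}" for ch in to_standard_order(legacy)])
```

Text that is already in the required order is left unchanged, as is a ZWJ plus selector pair that has no preceding letter.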
Representative Glyphs. The representative glyph in the code charts is generally the isolate form for the vowels and the initial form for the consonants. Letters that share the same glyph forms are distinguished by using different positional forms for the representative glyph. For example, the representative glyph for U+1823 mongolian letter o is the isolate form, whereas the representative glyph for U+1824 mongolian letter u is the initial form. However, this distinction is only nominal, as the glyphs for the two characters are identical for the same positional form. Likewise, the representative glyphs for U+1863 mongolian letter sibe ka and U+1874 mongolian letter manchu ka both take the final form, as their initial forms are identical to the representative glyph for U+182C mongolian letter qa (the initial form). Vowel Harmony. Mongolian has a system of vowel harmony, whereby the vowels in a word are either all “masculine” and “neuter” vowels (that is, back vowels plus /i/) or all “feminine” and “neuter” vowels (that is, front vowels plus /i/). Words that are written with masculine/neuter vowels are considered to be masculine, and words that are written with feminine/neuter vowels are considered to be feminine. Words with only neuter vowels behave as feminine words (for example, take feminine suffixes). Manchu and Sibe have a similar system of vowel harmony, although it is not so strict. Some words in these two scripts may include both masculine and feminine vowels, and separated suffixes with masculine or feminine vowels may be applied to a stem irrespective of its gender. Vowel harmony is an important element of the encoding model, as the gender of a word determines the glyph form of the velar series of consonant letters for Mongolian, Todo, Sibe, and Manchu. In each script, the velar letters have both masculine and feminine forms. For Mongolian and Todo, the masculine and feminine forms of these letters have different pronunciations. 
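The gender rule for Mongolian words can be sketched in Python. The grouping of a, o, u as masculine (back) vowels and e, ö, ü as feminine (front) vowels is an assumption stated here for the example, since this section gives only the code points; `word_gender` is a hypothetical helper. The two test words are jarlig and chirig, which this section uses to illustrate gender forms.

```python
# Assumed vowel classes, by code point.
MASCULINE = {0x1820, 0x1823, 0x1824}  # a, o, u (back vowels)
FEMININE = {0x1821, 0x1825, 0x1826}   # e, ö, ü (front vowels)
NEUTER = {0x1822}                     # i


def word_gender(word):
    """Classify a word by its first vowel that is not neuter.

    A word with only neuter vowels behaves as feminine.
    """
    for ch in word:
        cp = ord(ch)
        if cp in MASCULINE:
            return "masculine"
        if cp in FEMININE:
            return "feminine"
    return "feminine"


JARLIG = "\u1835\u1820\u1837\u182F\u1822\u182D"
CHIRIG = "\u1834\u1822\u1837\u1822\u182D"
print(word_gender(JARLIG), word_gender(CHIRIG))
```

This sketch covers only the strict harmony of Mongolian. Manchu and Sibe words may mix masculine and feminine vowels, so a per-word classification like this one would not be sufficient for them.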
When one of the velar consonants precedes a vowel, it takes the masculine form before masculine vowels, and the feminine form before feminine or neuter vowels. In the latter case, a ligature of the consonant and vowel is required. When one of these consonants precedes another consonant or is the final letter in a word, it may take either a masculine or feminine glyph form, depending on its context. The rendering system should automatically select the correct gender form for these letters based on the gender of the word (in Mongolian and Todo) or the gender of the preceding vowel (in Manchu and Sibe). This is illustrated by Figure 13-5, where U+182D mongolian letter ga takes a masculine glyph form when it occurs finally in the masculine word jarlig “order,” but takes a feminine glyph form when it occurs finally in the feminine word chirig “soldier.” In this example, the gender form of the final letter ga depends on whether the first vowel in the word is a back (masculine) vowel or a front (feminine or neuter) vowel. Where the gender is ambiguous or a form not derivable from the context is required, the user needs to specify which form is required by means of the appropriate free variation selector.
Figure 13-5. Mongolian Gender Forms
jarlig 1835 1820 1837 182F 1822 182D
chirig 1834 1822 1837 1822 182D
Narrow No-Break Space. In Mongolian, Todo, Manchu, and Sibe, certain grammatical suffixes are separated from the stem of a word or from other suffixes by a narrow gap. There are many such suffixes in Mongolian, usually occurring in masculine and feminine pairs (for example, the dative suffixes -dur and -dür), and a stem may take multiple suffixes. In contrast, there are only six separated suffixes for Manchu and Sibe, and stems do not take more than one suffix at a time. As any suffixes are considered to be an integral part of the word as a whole, a line break opportunity does not occur before a suffix, and the whitespace is represented using U+202F narrow no-break space (NNBSP). For a Mongolian font it is recommended that the width of NNBSP should be one-third the width of an ordinary space (U+0020 space). NNBSP affects the form of the preceding and following letters. The final letter of the stem or suffix preceding the NNBSP takes the final positional form, whereas the first letter of the suffix following NNBSP may take the normal initial form, a variant initial form, a medial form, or a final form, depending on the particular suffix.
Mongolian Vowel Separator. In Mongolian, the letters a (U+1820) and e (U+1821) in a word-final position may take a “forward tail” form or a “backward tail” form depending on the preceding consonant that they are attached to. In some words, a final letter a or e is separated from the preceding consonant by a narrow gap, in which case the vowel always takes the “forward tail” form. U+180E mongolian vowel separator (MVS) is used to represent the whitespace that separates a final letter a or e from the rest of the word. MVS is very similar in function to NNBSP, as it divides a word with a narrow non-breaking whitespace. Whereas NNBSP marks off a grammatical suffix, however, the a or e following MVS is not a suffix but an integral part of the word stem. Whether a final letter a or e is joined or separated is purely lexical and is not a question of varying orthography. For example, the word qana <182C, 1820, 1828, 1820> without a gap before the final letter a means “the outer casing of a vein,” whereas the word qana <182C, 1820, 1828, 180E, 1820> with a gap before the final letter a means “the wall of a tent,” as shown in Figure 13-6.
Figure 13-6. Mongolian Vowel Separator
Qana with Connected Final
Qana with Separated Final
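The roles of NNBSP and MVS can be illustrated with a small Python sketch that works on code points only. The two qana sequences are the ones given in the text; the helper names `split_suffixes` and `has_separated_final` are hypothetical, and the stem plus -dur combination at the end is built purely to exercise the function, not as an attested word form.

```python
NNBSP = "\u202F"  # narrow no-break space, marks off a grammatical suffix
MVS = "\u180E"    # mongolian vowel separator, marks off a final a or e

QANA_JOINED = "\u182C\u1820\u1828\u1820"
QANA_SEPARATED = "\u182C\u1820\u1828" + MVS + "\u1820"


def split_suffixes(word):
    """Split a word at NNBSP into its stem and a list of separated suffixes."""
    stem, *suffixes = word.split(NNBSP)
    return stem, suffixes


def has_separated_final(word):
    """Return True if the word ends in MVS followed by the letter a or e."""
    return len(word) >= 2 and word[-2] == MVS and word[-1] in "\u1820\u1821"


print(has_separated_final(QANA_JOINED), has_separated_final(QANA_SEPARATED))

# Illustration only: a stem followed by NNBSP and the letters d, u, r.
DUR = "\u1833\u1824\u1837"
stem, suffixes = split_suffixes(QANA_JOINED + NNBSP + DUR)
print(len(stem), len(suffixes))
```

Because neither character offers a line break opportunity, a line breaker would treat each of these strings as a single unbreakable word.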
Chapter 14
Archaic Scripts
The following historic scripts are encoded in Version 5.0 of the Unicode Standard:
Ogham
Linear B
Ugaritic
Old Italic
Cypriot
Old Persian
Runic
Phoenician
Sumero-Akkadian
Gothic
Unicode encodes a number of historic scripts. Although they are no longer used to write living languages, documents and inscriptions using these scripts exist, both for extinct languages and for precursors of modern languages. The primary user communities for these scripts are scholars interested in studying the scripts and the languages written in them.
Some of the historical scripts are related to each other and to modern alphabets. The Ogham script is indigenous to Ireland. While its originators may have been aware of the Latin or Greek scripts, it seems clear that the sound values of Ogham letters were suited to the phonology of a form of Primitive Irish. Old Italic was derived from Greek and was used to write Etruscan and other languages in Italy. It was borrowed by the Romans and is the immediate ancestor of the Latin script now used worldwide. Old Italic had other descendants, too: The Alpine alphabets seem to have been influential in devising the Runic script, which has a distinct angular appearance owing to its use in carving inscriptions in stone and wood. Gothic, like Cyrillic, was developed on the basis of Greek at a much later date than Old Italic.
The two historic scripts of northwestern Europe, Runic and Ogham, have a distinct appearance owing to their primary use in carving inscriptions in stone and wood. They are conventionally rendered from left to right in scholarly literature, but on the original stone carvings often proceeded in an arch tracing the outline of the stone.
Both Linear B and Cypriot are syllabaries that were used to write Greek. Linear B is the older of the two scripts, and there are some similarities between a few of the characters that may not be accidental. Cypriot may descend from Cypro-Minoan, which in turn may descend from Linear B.
The Phoenician alphabet was used in various forms around the Mediterranean. It is ancestral to Latin, Greek, Hebrew, and many other scripts both modern and historical.
Three ancient cuneiform scripts are described in this chapter: Ugaritic, Old Persian, and Sumero-Akkadian. The largest and oldest of these is Sumero-Akkadian. The other two scripts are not derived directly from the Sumero-Akkadian tradition but had common writing technology, consisting of wedges indented into clay tablets with reed styluses. Ugaritic texts are about as old as the earliest extant Biblical texts. Old Persian texts are newer, dating from the fifth century bce.
14.1 Ogham
Ogham: U+1680–U+169F
Ogham is an alphabetic script devised to write a very early form of Irish. Monumental Ogham inscriptions are found in Ireland, Wales, Scotland, England, and on the Isle of Man. Many of the Scottish inscriptions are undeciphered and may be in Pictish. It is probable that Ogham (Old Irish “Ogam”) was widely written in wood in early times. The main flowering of “classical” Ogham, rendered in monumental stone, was in the fifth and sixth centuries ce. Such inscriptions were mainly employed as territorial markers and memorials; the more ancient examples are standing stones.
The script was originally written along the edges of stone where two faces meet; when written on paper, the central “stemlines” of the script can be said to represent the edge of the stone. Inscriptions written on stemlines cut into the face of the stone, instead of along its edge, are known as “scholastic” and are of a later date (post-seventh century). Notes were also commonly written in Ogham in manuscripts as recently as the sixteenth century.
Structure. The Ogham alphabet consists of 26 distinct characters (feda), the first 20 of which are considered to be primary and the last 6 (forfeda) supplementary. The four primary series are called aicmí (plural of aicme, meaning “family”). Each aicme was named after its first character (Aicme Beithe, Aicme Uatha, meaning “the B Family,” “the H Family,” and so forth). The character names used in this standard reflect the spelling of the names in modern Irish Gaelic, except that the acute accent is stripped from Úr, Éabhadh, Ór, and Ifín, and the mutation of nGéadal is not reflected.
Rendering. Ogham text is read beginning from the bottom left side of a stone, continuing upward, across the top, and down the right side (in the case of long inscriptions). Monumental Ogham was incised chiefly in a bottom-to-top direction, though there are examples of left-to-right bilingual inscriptions in Irish and Latin.
Manuscript Ogham accommodated the horizontal left-to-right direction of the Latin script, and the vowels were written as vertical strokes as opposed to the incised notches of the inscriptions. Ogham should therefore be rendered on computers from left to right or from bottom to top (never starting from top to bottom). Forfeda (Supplementary Characters). In printed and in manuscript Ogham, the fonts are conventionally designed with a central stemline, but this convention is not necessary. In implementations without the stemline, the character U+1680 ogham space mark should
be given its conventional width and simply left blank like U+0020 space. U+169B ogham feather mark and U+169C ogham reversed feather mark are used at the beginning and the end of Ogham text, particularly in manuscript Ogham. In some cases, only the Ogham feather mark is used, which can indicate the direction of the text. The word latheirt ᚛ᚂᚐᚈᚆᚓᚔᚏᚈ᚜ shows the use of the feather marks. This word was written in the margin of a ninth-century Latin grammar and means “massive hangover,” which may be the scribe’s apology for any errors in his text.
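The structure of the Ogham alphabet and the use of the feather marks can be sketched in Python. The sketch assumes that the code chart lists the 20 primary letters at U+1681..U+1694 in four runs of five, one run per aicme, followed by the six forfeda at U+1695..U+169A; `classify` and `wrap` are hypothetical helper names. The code points used for latheirt are one reading of that word and should be treated as an assumption.

```python
import unicodedata

FEATHER_MARK = "\u169B"
REVERSED_FEATHER_MARK = "\u169C"

# Assumed layout of the letters within the block.
FIRST_LETTER = 0x1681
LAST_PRIMARY = 0x1694
LAST_LETTER = 0x169A


def classify(ch):
    """Name the aicme of a primary letter by its first character, or report forfeda."""
    cp = ord(ch)
    if FIRST_LETTER <= cp <= LAST_PRIMARY:
        leader = chr(FIRST_LETTER + 5 * ((cp - FIRST_LETTER) // 5))
        return "aicme of " + unicodedata.name(leader)
    if LAST_PRIMARY < cp <= LAST_LETTER:
        return "forfeda"
    return "not an Ogham letter"


def wrap(text):
    """Enclose Ogham text in the feather mark and the reversed feather mark."""
    return FEATHER_MARK + text + REVERSED_FEATHER_MARK


# Assumed spelling of latheirt: l, a, t, h, e, i, r, t.
LATHEIRT = "\u1682\u1690\u1688\u1686\u1693\u1694\u168F\u1688"
print(wrap(LATHEIRT))
print(classify("\u1688"))
```

Naming each family through the character database avoids hard-coding family names that this section does not list.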
14.2 Old Italic
Old Italic: U+10300–U+1032F
The Old Italic script unifies a number of related historical alphabets located on the Italian peninsula. Some of these were used for non-Indo-European languages (Etruscan and probably North Picene), and some for various Indo-European languages belonging to the Italic branch (Faliscan and members of the Sabellian group, including Oscan, Umbrian, and South Picene). The ultimate source for the alphabets in ancient Italy is Euboean Greek used at Ischia and Cumae in the bay of Naples in the eighth century bce. Unfortunately, no Greek abecedaries from southern Italy have survived. Faliscan, Oscan, Umbrian, North Picene, and South Picene all derive from an Etruscan form of the alphabet.
There are some 10,000 inscriptions in Etruscan. By the time of the earliest Etruscan inscriptions, circa 700 bce, local distinctions are already found in the use of the alphabet. Three major stylistic divisions are identified: the Northern, Southern, and Caere/Veii. Use of Etruscan can be divided into two stages, owing largely to the phonological changes that occurred: the “archaic Etruscan alphabet,” used from the seventh to the fifth centuries bce, and the “neo-Etruscan alphabet,” used from the fourth to the first centuries bce. Glyphs for eight of the letters differ between the two periods; additionally, neo-Etruscan abandoned the letters ka, ku, and eks.
The unification of these alphabets into a single Old Italic script requires language-specific fonts because the glyphs most commonly used may differ somewhat depending on the language being represented. Most of the languages have added characters to the common repertoire: Etruscan and Faliscan add letter ef; Oscan adds letter ef, letter ii, and letter uu; Umbrian adds letter ef, letter ers, and letter che; North Picene adds letter uu; and Adriatic adds letter ii and letter uu.
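The per-language additions listed above can be held in a small lookup table. The Python sketch below resolves each letter through `unicodedata.lookup`, assuming the character names follow the pattern OLD ITALIC LETTER plus the short name used in the text; the table name `ADDED_LETTERS` and the function `added_characters` are illustrative only.

```python
import unicodedata

# Letters each language adds to the common repertoire, as listed in the text.
ADDED_LETTERS = {
    "Etruscan": ["EF"],
    "Faliscan": ["EF"],
    "Oscan": ["EF", "II", "UU"],
    "Umbrian": ["EF", "ERS", "CHE"],
    "North Picene": ["UU"],
    "Adriatic": ["II", "UU"],
}


def added_characters(language):
    """Return the added letters of a language as encoded characters."""
    return [
        unicodedata.lookup("OLD ITALIC LETTER " + name)
        for name in ADDED_LETTERS[language]
    ]


for language in ADDED_LETTERS:
    print(language, [f"U+{ord(ch):05X}" for ch in added_characters(language)])
```

A font for one language could use such a table to decide which glyphs beyond the common repertoire it needs to supply.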
The Latin script itself derives from a south Etruscan model, probably from Caere or Veii, around the mid-seventh century bce or a bit earlier. However, Latin and Faliscan of the seventh and sixth centuries bce differ significantly both in form (glyph shapes, directionality) and in the repertoire of letters used, and these differences warrant a distinctive character block. Fonts for early Latin should use the uppercase code positions U+0041..U+005A. The unified Alpine script, which includes the
Venetic, Rhaetic, Lepontic, and Gallic alphabets, has not yet been proposed for addition to the Unicode Standard but is considered to differ enough from both Old Italic and Latin to warrant independent encoding. The Alpine script is thought to be the source for Runic, which is encoded at U+16A0..U+16FF. (See Section 14.3, Runic.)
Character names assigned to the Old Italic block are unattested but have been reconstructed according to the analysis made by Sampson (1985). While the Greek character names (alpha, beta, gamma, and so on) were borrowed directly from the Phoenician names (modified to Greek phonology), the Etruscans are thought to have abandoned the Greek names in favor of a phonetically based nomenclature, where stops were pronounced with a following -e sound, and liquids and sibilants (which can be pronounced more or less on their own) were pronounced with a leading e- sound (so [k], [d] became [ke:], [de:], while [l:], [m:] became [el], [em]). It is these names, according to Sampson, which were borrowed by the Romans when they took their script from the Etruscans.
Directionality. Most early Etruscan texts have right-to-left directionality. From the third century bce, left-to-right texts appear, showing the influence of Latin. Oscan, Umbrian, and Faliscan also generally have right-to-left directionality. Boustrophedon appears rarely, and not especially early (for instance, the Forum inscription dates to 550–500 bce). Despite this, for reasons of implementation simplicity, many scholars prefer left-to-right presentation of texts, as this is also their practice when transcribing the texts into Latin script. Accordingly, the Old Italic script has a default directionality of strong left-to-right in this standard. If the default directionality of the script is overridden to produce a right-to-left presentation, the glyphs in Old Italic fonts should also be mirrored from the representative glyphs shown in the code charts.
This kind of behavior is not uncommon in archaic scripts; for example, archaic Greek letters may be mirrored when written from right to left in boustrophedon. Punctuation. The earliest inscriptions are written with no space between words in what is called scriptio continua. There are numerous Etruscan inscriptions with dots separating word forms, attested as early as the second quarter of the seventh century bce. This punctuation is sometimes, but only rarely, used to separate syllables rather than words. From the sixth century bce, words were often separated by one, two, or three dots spaced vertically above each other. Numerals. Etruscan numerals are not well attested in the available materials, but are employed in the same fashion as Roman numerals. Several additional numerals are attested, but as their use is at present uncertain, they are not yet encoded in the Unicode Standard. Glyphs. The default glyphs in the code charts are based on the most common shapes found for each letter. Most of these are similar to the Marsiliana abecedary (mid-seventh century bce). Note that the phonetic values for U+10317 old italic letter eks [ks] and U+10319 old italic letter khe [kh] show the influence of western, Euboean Greek; eastern Greek has U+03A7 greek capital letter chi [x] and U+03A8 greek capital letter psi [ps] instead.
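The default directionality described under Directionality can be read from the character database, and a right-to-left presentation can be requested with explicit bidirectional controls. The Python sketch below assumes that U+202E right-to-left override and U+202C pop directional formatting are the override mechanism in use, which is only one possibility; `is_strong_ltr` and `force_rtl` are hypothetical helper names. Mirroring of the glyphs remains the job of the font and rendering system and is not shown.

```python
import unicodedata

RLO = "\u202E"  # right-to-left override (assumed override mechanism)
PDF = "\u202C"  # pop directional formatting

OLD_ITALIC_LETTER_A = "\U00010300"


def is_strong_ltr(ch):
    """Return True if the character has the strong left-to-right bidi class."""
    return unicodedata.bidirectional(ch) == "L"


def force_rtl(text):
    """Wrap text in an explicit right-to-left override."""
    return RLO + text + PDF


print(is_strong_ltr(OLD_ITALIC_LETTER_A))
print([f"{ord(ch):04X}" for ch in force_rtl(OLD_ITALIC_LETTER_A)])
```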
The geographic distribution of the Old Italic script is shown in Figure 14-1. In the figure, the approximate distribution of the ancient languages that used Old Italic alphabets is shown in white. Areas for the ancient languages that used other scripts are shown in gray, and the labels for those languages are shown in oblique type. In particular, note that the ancient Greek colonies of the southern Italian and Sicilian coasts used the Greek script proper. Also, languages such as Ligurian, Venetic, and so on, of the far north of Italy made use of alphabets of the Alpine script. Rome, of course, is shown in gray, because Latin was written with the Latin alphabet, now encoded in the Latin script.
Figure 14-1. Distribution of Old Italic
[Map labels: Rhaetic, Venetic, Lepontic, Gallic, Etruscan, N. Picene, Umbrian, S. Picene, Central Sabellian languages, Oscan, Ligurian, Faliscan, Latin (Rome), Messapic, Volscian, Elimian, Sicanian, Greek, Siculan]
14.3 Runic
Runic: U+16A0–U+16F0
The Runic script was historically used to write the languages of the early and medieval societies in the German, Scandinavian, and Anglo-Saxon areas. Use of the Runic script in various forms covers a period from the first century to the nineteenth century. Some 6,000 Runic inscriptions are known. They form an indispensable source of information about the development of the Germanic languages.
Historical Script. The Runic script is an historical script, whose most important use today is in scholarly and popular works about the old Runic inscriptions and their interpretation. The Runic script illustrates many technical problems that are typical for this kind of script. Unlike many other scripts in the Unicode Standard, which predominantly serve the needs of the modern user community, with occasional extensions for historic forms, the encoding of the Runic script attempts to suit the needs of texts from different periods of time and from distinct societies that had little contact with one another.
Direction. Like other early writing systems, runes could be written either from left to right or from right to left, or moving first in one direction and then the other (boustrophedon), or following the outlines of the inscribed object. At times, characters appear in mirror image, or upside down, or both. In modern scholarly literature, Runic is written from left to right. Therefore, the letters of the Runic script have a default directionality of strong left-to-right in this standard.
The Runic Alphabet. Present-day knowledge about runes is incomplete. The set of graphemically distinct units shows greater variation in its graphical shapes than most modern scripts. The Runic alphabet changed several times during its history, both in the number and the shapes of the letters contained in it. The shapes of most runes can be related to some Latin capital letter, but not necessarily to a letter representing the same sound. The most conspicuous difference between the Latin and the Runic alphabets is the order of the letters. The Runic alphabet is known as the futhark from the name of its first six letters. The original old futhark contained 24 runes:
ᚠ ᚢ ᚦ ᚨ ᚱ ᚲ ᚷ ᚹ
ᚺ ᚾ ᛁ ᛃ ᛇ ᛈ ᛉ ᛊ
ᛏ ᛒ ᛖ ᛗ ᛚ ᛜ ᛞ ᛟ
They are usually transliterated in this way:
f u þ a r k g w
h n i j ï p z s
t b e m l ŋ d o
In England and Friesland, seven more runes were added from the fifth to the ninth century. In the Scandinavian countries, the futhark changed in a different way; in the eighth century, the simplified younger futhark appeared. It consists of only 16 runes, some of which are used in two different forms. The long-branch form is shown here:
ᚠ ᚢ ᚦ ᚬ ᚱ ᚴ
ᚼ ᚾ ᛁ ᛅ ᛋ
ᛏ ᛒ ᛘ ᛚ ᛦ
f u þ o r k
h n i a s
t b m l ʀ
The use of runes continued in Scandinavia during the Middle Ages. During that time, the futhark was influenced by the Latin alphabet and new runes were invented so that there was full correspondence with the Latin letters.
Representative Glyphs. The known inscriptions can include considerable variations of shape for a given rune, sometimes to the point where the nonspecialist will mistake the shape for a different rune. There is no dominant main form for some runes, particularly for many runes added in the Anglo-Friesian and medieval Nordic systems. When transcribing a Runic inscription into its Unicode-encoded form, one cannot rely on the idealized representative glyph shape in the character charts alone. One must take into account to which of the four Runic systems an inscription belongs and be knowledgeable about the permitted form variations within each system. The representative glyphs were chosen to provide an image that distinguishes each rune visually from all other runes in the same system. For actual use, it might be advisable to use a separate font for each Runic system. Of particular note is the fact that the glyph for U+16C4 ᛄ runic letter ger is actually a rare form, as the more common form is already used for U+16E1 ᛡ runic letter ior.
Unifications. When a rune in an earlier writing system evolved into several different runes in a later system, the unification of the earlier rune with one of the later runes was based on similarity in graphic form rather than similarity in sound value. In cases where a substantial change in the typical graphical form has occurred, though the historical continuity is undisputed, unification has not been attempted. When runes from different writing systems have the same graphic form but different origins and denote different sounds, they have been coded as separate characters. Long-Branch and Short-Twig. Two sharply different graphic forms, the long-branch and the short-twig form, were used for 9 of the 16 Viking Age Nordic runes. Although only one form is used in a given inscription, there are runologically important exceptions. In some cases, the two forms were used to convey different meanings in later use in the medieval system. Therefore the two forms have been separated in the Unicode Standard. Staveless Runes. Staveless runes are a third form of the Viking Age Nordic runes, a kind of Runic shorthand. The number of known inscriptions is small and the graphic forms of many of the runes show great variability between inscriptions. For this reason, staveless runes have been unified with the corresponding Viking Age Nordic runes. The corresponding Viking Age Nordic runes must be used to encode these characters—specifically the short-twig characters, where both short-twig and long-branch characters exist. Punctuation Marks. The wide variety of Runic punctuation marks has been reduced to three distinct characters based on simple aspects of their graphical form, as very little is known about any difference in intended meaning between marks that look different. Any other punctuation marks have been unified with shared punctuation marks elsewhere in the Unicode Standard. Golden Numbers. 
Runes were used as symbols for Sunday letters and golden numbers on calendar staves used in Scandinavia during the Middle Ages. To complete the number series 1–19, three more calendar runes were added. They are included after the punctuation marks. Encoding. A total of 81 characters of the Runic script are included in the Unicode Standard. Of these, 75 are Runic letters, 3 are punctuation marks, and 3 are Runic symbols. The order of the Runic characters follows the traditional futhark order, with variants and derived runes being inserted directly after the corresponding ancestor. Runic character names are based as much as possible on the sometimes several traditional names for each rune, often with the Latin transliteration at the end of the name.
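The character counts in the Encoding paragraph can be checked against a Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module. The helper name `runic_category_counts` is hypothetical, and the General_Category values it reports (Lo, Po, Nl) come from the database bundled with Python, not from this section. The range is limited to U+16A0..U+16F0 as given above, because later versions of the standard assign further characters in the block.

```python
import unicodedata
from collections import Counter

# Runic range as stated in this section (Unicode 5.0).
RUNIC_FIRST, RUNIC_LAST = 0x16A0, 0x16F0


def runic_category_counts():
    """Count General_Category values over U+16A0..U+16F0."""
    return Counter(
        unicodedata.category(chr(cp))
        for cp in range(RUNIC_FIRST, RUNIC_LAST + 1)
    )


counts = runic_category_counts()
print(sum(counts.values()), dict(counts))

# Default directionality of a Runic letter (Bidi_Class).
print(unicodedata.bidirectional("\u16A0"))
```

The letters, punctuation marks, and golden-number symbols fall into three different general categories, which is one way to separate them programmatically.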
14.4 Gothic Gothic: U+10330–U+1034F The Gothic script was devised in the fourth century by the Gothic bishop, Wulfila (311–383 ce), to provide his people with a written language and a means of reading his translation of
Chapter 15
Symbols
The universe of symbols is rich and open-ended. The collection of encoded symbols in the Unicode Standard encompasses the following:
• Currency symbols
• Letterlike symbols
• Mathematical alphabets
• Number forms
• Mathematical symbols
• Invisible mathematical operators
• Technical symbols
• Geometrical symbols
• Miscellaneous symbols and dingbats
• Enclosed and square symbols
• Braille patterns
• Western and Byzantine musical symbols
• Ancient Greek musical notation
There are other notational systems not covered by the Unicode Standard. Some symbols mark the transition between pictorial items and text elements; because they do not have a well-defined place in plain text, they are not encoded here. Combining marks may be used with symbols, particularly the set encoded at U+20D0..U+20FF (see Section 7.9, Combining Marks). Letterlike and currency symbols, as well as number forms including superscripts and subscripts, are typically subject to the same font and style changes as the surrounding text. Where square and enclosed symbols occur in East Asian contexts, they generally follow the prevailing type styles. Other symbols have an appearance that is independent of type style, or a more limited or altogether different range of type style variation than the regular text surrounding them. For example, mathematical alphanumeric symbols are typically used for mathematical variables; those letterlike symbols that are part of this set carry semantic information in their type style. This fact restricts, but does not completely eliminate, possible style variations. However, symbols such as mathematical operators can be used with any script or independent of any script. Special invisible operator characters can be used to explicitly encode some mathematical operations, such as multiplication, which are normally implied by juxtaposition. This aids in automatic interpretation of mathematical notation.
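As an illustration of the invisible operators mentioned above, the sketch below inserts U+2062 invisible times between factors so that a program can tell a product from a multi-letter name. The function names `explicit_product` and `strip_invisible_operators` are hypothetical, and the range U+2061..U+2063 is an assumption based on the invisible operators assigned in this version of the standard.

```python
import unicodedata

INVISIBLE_TIMES = "\u2062"  # multiplication implied by juxtaposition


def explicit_product(*factors):
    """Join factor strings with U+2062 so the product is machine-readable."""
    return INVISIBLE_TIMES.join(factors)


def strip_invisible_operators(text):
    """Drop U+2061..U+2063, leaving only the visible characters."""
    return "".join(ch for ch in text if not "\u2061" <= ch <= "\u2063")


expr = explicit_product("a", "b")
print(len(expr), unicodedata.name(expr[1]), unicodedata.category(expr[1]))
print(strip_invisible_operators(expr))
```

The rendered text is unchanged, but a parser can now distinguish the product of a and b from a two-letter identifier.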
In a bidirectional context (see Unicode Standard Annex #9, “The Bidirectional Algorithm”), symbol characters have no inherent directionality but resolve according to the Unicode Bidirectional Algorithm. Where the image of a symbol is not bilaterally symmetric, the mirror image is used when the character is part of the right-to-left text stream (see Section 4.7, Bidi Mirrored—Normative). Dingbats and optical character recognition characters are different from all other characters in the standard, in that they are encoded based on their precise appearance. Braille patterns are a special case, because they can be used to write text. They are included as symbols, as the Unicode Standard encodes only their shapes; the association of letters to patterns is left to other standards. When a character stream is intended primarily to convey text information, it should be coded using one of the scripts. Only when it is intended to convey a particular binding of text to Braille pattern sequence should it be coded using the Braille patterns. Musical notation—particularly Western musical notation—is different from ordinary text in the way it is laid out, especially the representation of pitch and duration in Western musical notation. However, ordinary text commonly refers to the basic graphical elements that are used in musical notation, and it is primarily those symbols that are encoded in the Unicode Standard. Additional sets of symbols are encoded to support historical systems of musical notation. Many symbols encoded in the Unicode Standard are intended to support legacy implementations and obsolescent practices, such as terminal emulation or other character mode user interfaces. Examples include box drawing components and control pictures. Many of the symbols encoded in Unicode can be used as operators or given some other syntactical function in a formal language syntax. For more information, see Unicode Standard Annex #31, “Identifier and Pattern Syntax.”
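The bidirectional behavior described at the start of this passage can be inspected per character. The following sketch uses Python's `unicodedata` module to read the Bidi_Class and Bidi_Mirrored values. The helper name `describe` and the choice of sample characters are assumptions made for illustration only.

```python
import unicodedata


def describe(ch):
    """Return (name, Bidi_Class, Bidi_Mirrored) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )


# ELEMENT OF and INTEGRAL are not bilaterally symmetric; INFINITY is.
for ch in "\u2208\u222B\u221E":
    print(describe(ch))
```

A class of ON (other neutral) means the symbol takes its direction from context, and the mirrored flag tells a renderer whether to use the mirror-image glyph in right-to-left runs.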
15.1 Currency Symbols Currency symbols are intended to encode the customary symbolic signs used to indicate certain currencies in general text. These signs vary in shape and are often used for more than one currency. Not all currencies are represented by a special currency symbol; some use multiple-letter strings instead, such as “Sfr” for Swiss franc. Moreover, the abbreviations for currencies can vary by language. The Common Locale Data Repository (CLDR) provides further information; see Section B.6, Other Unicode Online Resources. Therefore, implementations that are concerned with the exact identity of a currency should not depend on an encoded currency sign character. Instead, they should follow standards such as the ISO 4217 three-letter currency codes, which are specific to currencies: for example, USD for U.S. dollar, CAD for Canadian dollar. Unification. The Unicode Standard does not duplicate encodings where more than one currency is expressed with the same symbol. Many currency symbols are overstruck letters.
There are therefore many minor variants, such as the U+0024 dollar sign $, with one or two vertical bars, or other graphical variation, as shown in Figure 15-1.
Figure 15-1. Alternative Glyphs for Dollar Sign
Claims that glyph variants of a certain currency symbol are used consistently to indicate a particular currency could not be substantiated upon further research. Therefore, the Unicode Standard considers these variants to be typographical and provides a single encoding for them. See ISO/IEC 10367, Annex B (informative), for an example of multiple renderings for U+00A3 pound sign.
Fonts. Currency symbols are commonly designed to display at the same width as a digit (most often a European digit, U+0030..U+0039) to assist in alignment of monetary values in tabular displays. Like letters, they tend to follow the stylistic design features of particular fonts because they are used often and need to harmonize with body text. In particular, even though there may be more or less normative designs for the currency sign per se, as for the euro sign, type designers freely adapt such designs to make them fit the logic of the rest of their fonts. This partly explains why currency signs show more glyph variation than other types of symbols.
Currency Symbols: U+20A0–U+20CF This block contains currency symbols that are not encoded in other blocks. Common currency symbols encoded in other blocks are listed in Table 15-1.
Table 15-1. Currency Symbols Encoded in Other Blocks
Currency                        Unicode Code Point
Dollar, milreis, escudo, peso   U+0024  dollar sign
Cent                            U+00A2  cent sign
Pound and lira                  U+00A3  pound sign
General currency                U+00A4  currency sign
Yen or yuan                     U+00A5  yen sign
Dutch florin                    U+0192  latin small letter f with hook
Afghani                         U+060B  afghani sign
Rupee                           U+09F2  bengali rupee mark
Rupee                           U+09F3  bengali rupee sign
Rupee                           U+0AF1  gujarati rupee sign
Rupee                           U+0BF9  tamil rupee sign
Baht                            U+0E3F  thai currency symbol baht
Riel                            U+17DB  khmer currency symbol riel
German mark (historic)          U+2133  script capital m
Yuan, yen, won, HKD             U+5143  cjk unified ideograph-5143
Yen                             U+5186  cjk unified ideograph-5186
Yuan                            U+5706  cjk unified ideograph-5706
Yuan, yen, won, HKD, NTD        U+5713  cjk unified ideograph-5713
Rial                            U+FDFC  rial sign
Lira Sign. A separate currency sign U+20A4 lira sign is encoded for compatibility with the HP Roman-8 character set, which is still widely implemented in printers. In general, U+00A3 pound sign should be used for both the various currencies known as pound (or punt) and the various currencies known as lira—for example, the former currency of Italy and the lira still in use in Turkey. Widespread implementation practice in Italian and Turkish systems has long made use of U+00A3 as the currency sign for the lira. As in the case of the dollar sign, the glyphic distinction between single- and double-bar versions of the sign is not indicative of a systematic difference in the currency. Yen and Yuan. Like the dollar sign and the pound sign, U+00A5 yen sign has been used as the currency sign for more than one currency. While there may be some preferences to use a double-bar glyph for the yen currency of Japan (JPY) and a single-bar glyph for the yuan (renminbi) currency of China (CNY), this distinction is not systematic in all font designs, and there is considerable overlap in usage. As listed in Table 15-1, there are also a number of CJK ideographs to represent the words yen (or en) and yuan, as well as the Korean word won, and these also tend to overlap in use as currency symbols. In the Unicode Standard, U+00A5 yen sign is intended to be the character for the currency sign for both the yen and the yuan, with details of glyphic presentation left to font choice and local preferences. Euro Sign. The single currency for member countries of the European Economic and Monetary Union is the euro (EUR). The euro character is encoded in the Unicode Standard as U+20AC euro sign. For additional forms of currency symbols, see Fullwidth Forms (U+FFE0..U+FFE6).
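One practical consequence of Table 15-1 is that a currency sign cannot be recognized by a single character property, and a sign alone does not identify a currency. The sketch below checks the General_Category of a few table entries with Python's `unicodedata` module. The names `TABLE_15_1_SAMPLE`, `is_currency_symbol`, `price_jp`, and `price_cn` are hypothetical, and the record layout is only an assumed way of storing an ISO 4217 code next to an amount.

```python
import unicodedata

# A few rows of Table 15-1, plus U+20AC from the Currency Symbols block.
TABLE_15_1_SAMPLE = {
    "\u0024": "dollar sign",
    "\u00A3": "pound sign",
    "\u00A5": "yen sign",
    "\u0192": "latin small letter f with hook (Dutch florin)",
    "\u2133": "script capital m (German mark, historic)",
    "\u5143": "cjk unified ideograph-5143",
    "\u20AC": "euro sign",
}


def is_currency_symbol(ch):
    """True when General_Category is Sc (Symbol, currency)."""
    return unicodedata.category(ch) == "Sc"


for ch, label in TABLE_15_1_SAMPLE.items():
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {label}")

# Assumed record layout: the ISO 4217 code carries the currency identity,
# so U+00A5 can be displayed for both records without ambiguity in the data.
price_jp = {"amount": "1200", "currency": "JPY"}
price_cn = {"amount": "1200", "currency": "CNY"}

# Fullwidth forms fold to the ordinary signs under compatibility normalization.
print(unicodedata.normalize("NFKC", "\uFFE5") == "\u00A5")
```

The florin, the historic mark, and the ideographs are letters by category, so a category test finds only some of the characters used as currency signs.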
15.2 Letterlike Symbols Letterlike Symbols: U+2100–U+214F Letterlike symbols are symbols derived in some way from ordinary letters of an alphabetic script. This block includes symbols based on Latin, Greek, and Hebrew letters. Stylistic variations of single letters are used for semantics in mathematical notation. See “Mathematical Alphanumeric Symbols” in this section for the use of letterlike symbols in mathematical formulas. Some letterforms have given rise to specialized symbols, such as U+211E prescription take. Numero Sign. U+2116 numero sign is provided both for Cyrillic use and for compatibility with Asian standards; the customary glyph differs between the two. Figure 15-2 illustrates a number of alternative glyphs for this sign. Instead of using a special symbol, French practice is to use an “N” or an “n”, according to context, followed by a superscript small letter “o” (Nᵒ or nᵒ; plural Nᵒˢ or nᵒˢ). Legacy data encoded in ISO/IEC 8859-1 (Latin-1) or
other 8-bit character sets may also have represented the numero sign by a sequence of “N” followed by the degree sign (U+00B0 degree sign). Implementations interworking with legacy data should be aware of such alternative representations for the numero sign when converting data.
Figure 15-2. Alternative Glyphs for Numero Sign
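A converter that handles the legacy representation described above might look like the following sketch. It is a heuristic under stated assumptions, not a prescribed conversion: the function name `upgrade_legacy_numero` is hypothetical, and the rule that “N” plus U+00B0 counts as a numero sign only when a digit follows is an assumption chosen to avoid touching unrelated uses of the degree sign.

```python
import re
import unicodedata

NUMERO = "\u2116"

# Assumed heuristic: "N" + DEGREE SIGN, optionally one space, then a digit.
_LEGACY_NUMERO = re.compile(r"N\u00B0(?=\s?\d)")


def upgrade_legacy_numero(text):
    """Replace the legacy two-character sequence with U+2116."""
    return _LEGACY_NUMERO.sub(NUMERO, text)


print(upgrade_legacy_numero("N\u00B0 5"))

# Going the other way, the compatibility decomposition of U+2116 is "No".
print(unicodedata.normalize("NFKD", NUMERO))
```

Real data may need a stricter or looser rule; the point is only that both representations should be anticipated when converting.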
Unit Symbols. Several letterlike symbols are used to indicate units. In most cases, however, such as for SI units (Système International), the use of regular letters or other symbols is preferred. U+2113 script small l is commonly used as a non-SI symbol for the liter. Official SI usage prefers the regular lowercase letter l. Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 ohm sign, U+212A kelvin sign, and U+212B angstrom sign. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents. In normal use, it is better to represent degrees Celsius “°C” with a sequence of U+00B0 degree sign + U+0043 latin capital letter c, rather than U+2103 degree celsius. For searching, treat these two sequences as identical. Similarly, the sequence U+00B0 degree sign + U+0046 latin capital letter f is preferred over U+2109 degree fahrenheit, and those two sequences should be treated as identical for searching. Compatibility. Some symbols are composites of several letters. Many of these composite symbols are encoded for compatibility with Asian and other legacy encodings. (See also “CJK Compatibility Ideographs” in Section 12.1, Han.) The use of these composite symbols is discouraged where their presence is not required by compatibility. For example, in normal use, the symbols U+2121 TEL telephone sign and U+213B FAX facsimile sign are simply spelled out. In the context of East Asian typography, many letterlike symbols, and in particular composites, form part of a collection of compatibility symbols, the larger part of which is located in the CJK Compatibility block (see Section 15.9, Enclosed and Square). When used in this way, these symbols are rendered as “wide” characters occupying a full cell. 
They remain upright in vertical layout, contrary to the rotated rendering of their regular letter equivalents. See Unicode Standard Annex #11, “East Asian Width,” for more information. Where the letterlike symbols have alphabetic equivalents, they collate in alphabetic sequence; otherwise, they should be treated as neutral symbols. The letterlike symbols may have different directional properties than normal letters. For example, the four transfinite cardinal symbols (U+2135..U+2138) are used in ordinary mathematical text and do not share the strong right-to-left directionality of the Hebrew letters from which they are derived.
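The equivalences discussed under Unit Symbols and the directional note above can be observed with Python's `unicodedata` module. This is a sketch under assumptions: the helper names `canonical` and `search_key` are hypothetical, and using NFKC as a search key is one possible way to treat the sequences as identical for searching. NFKC folds many other compatibility characters as well, which may or may not suit a given application.

```python
import unicodedata


def canonical(text):
    """Apply NFC; the ohm, kelvin, and angstrom signs become regular letters."""
    return unicodedata.normalize("NFC", text)


def search_key(text):
    """Assumed search folding: NFKC maps U+2103 to U+00B0 + 'C', and so on."""
    return unicodedata.normalize("NFKC", text)


for ch in "\u2126\u212A\u212B":
    folded = canonical(ch)
    print(f"U+{ord(ch):04X} -> U+{ord(folded):04X} {unicodedata.name(folded)}")

print(search_key("25\u2103") == search_key("25\u00B0C"))

# ALEF SYMBOL versus HEBREW LETTER ALEF: different Bidi_Class values.
print(unicodedata.bidirectional("\u2135"), unicodedata.bidirectional("\u05D0"))
```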
Styles. The letterlike symbols include some of the few instances in which the Unicode Standard encodes stylistic variants of letters as distinct characters. For example, there are instances of blackletter (Fraktur), double-struck, italic, and script styles for certain Latin letters used as mathematical symbols. The choice of these stylistic variants for encoding reflects their common use as distinct symbols. They form part of the larger set of mathematical alphanumeric symbols. For the complete set and more information on its use, see “Mathematical Alphanumeric Symbols” in this section. These symbols should not be used in ordinary, nonscientific texts. Despite its name, U+2118 script capital p is neither script nor capital—it is uniquely the Weierstrass elliptic function symbol derived from a calligraphic lowercase p. U+2113 script small l is derived from a special italic form of the lowercase letter l and, when it occurs in mathematical notation, is known as the symbol ell. Use U+1D4C1 mathematical script small l as the lowercase script l for mathematical notation. Standards. The Unicode Standard encodes letterlike symbols from many different national standards and corporate collections.
Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF The Mathematical Alphanumeric Symbols block contains a large extension of letterlike symbols used in mathematical notation, typically for variables. The characters in this block are intended for use only in mathematical or technical notation; they are not intended for use in nontechnical text. When used with markup languages—for example, with Mathematical Markup Language (MathML)—the characters are expected to be used directly, instead of indirectly via entity references or by composing them from base letters and style markup. Words Used as Variables. In some specialties, whole words are used as variables, not just single letters. For these cases, style markup is preferred because in ordinary mathematical notation the juxtaposition of variables generally implies multiplication, not word formation as in ordinary text. Markup not only provides the necessary scoping in these cases, but also allows the use of a more extended alphabet.
Mathematical Alphabets
Basic Set of Alphanumeric Characters. Mathematical notation uses a basic set of mathematical alphanumeric characters, which consists of the following:
• The set of basic Latin digits (0–9) (U+0030..U+0039)
• The set of basic uppercase and lowercase Latin letters (a–z, A–Z)
• The uppercase Greek letters Α–Ω (U+0391..U+03A9), plus the nabla ∇ (U+2207) and the variant of theta ϴ given by U+03F4
• The lowercase Greek letters α–ω (U+03B1..U+03C9), plus the partial differential sign ∂ (U+2202), and the six glyph variants ϵ, ϑ, ϰ, ϕ, ϱ, and ϖ, given by U+03F5, U+03D1, U+03F0, U+03D5, U+03F1, and U+03D6, respectively
Only unaccented forms of the letters are used for mathematical notation, because general accents such as the acute accent would interfere with common mathematical diacritics. Examples of common mathematical diacritics that can interfere with general accents are the circumflex, macron, or the single or double dot above, the latter two of which are used in physics to denote derivatives with respect to the time variable. Mathematical symbols with diacritics are always represented by combining character sequences. For some characters in the basic set of Greek characters, two variants of the same character are included. This is because they can appear in the same mathematical document with different meanings, even though they would have the same meaning in Greek text. (See “Variant Letterforms” in Section 7.2, Greek.) Additional Characters. In addition to this basic set, mathematical notation uses the uppercase and lowercase digamma, in regular (U+03DC and U+03DD) and bold (U+1D7CA and U+1D7CB), and the four Hebrew-derived characters (U+2135..U+2138). Occasional uses of other alphabetic and numeric characters are known. Examples include U+0428 cyrillic capital letter sha, U+306E hiragana letter no, and Eastern Arabic-Indic digits (U+06F0..U+06F9). However, these characters are used only in their basic forms, rather than in multiple mathematical styles. Dotless Characters. In the Unicode Standard, the characters “i” and “j”, including their variations in the mathematical alphabets, have the Soft_Dotted property. Any conformant renderer will remove the dot when the character is followed by a nonspacing combining mark above. Therefore, using an individual mathematical italic i or j with math accents would result in the intended display. However, in mathematical equations an entire subexpression can be placed underneath a math accent—for example, when a “wide hat” is placed on top of i+j, as shown in Figure 15-3.
Figure 15-3. Wide Mathematical Accents
$\widehat{i+j} = \hat{\imath} + \hat{\jmath}$
In such a situation, a renderer can no longer rely simply on the presence of an adjacent combining character to substitute for the un-dotted glyph, and whether the dots should be removed in such a situation is no longer predictable. Authors differ in whether they expect the dotted or dotless forms in that case. In some documents mathematical italic dotless i or j is used explicitly without any combining marks, or even in contrast to the dotted versions. Therefore, the Unicode Standard provides the explicitly dotless characters U+1D6A4 mathematical italic small dotless i and U+1D6A5 mathematical italic small dotless j. These two characters map to the ISOAMSO entities imath and jmath or the TeX macros \imath and \jmath. These entities are, by default, always italic. The appearance of these two characters in the code charts is similar to the shapes of the entities documented in the ISO 9573-13 entity sets and used by TeX. The mathematical dotless characters do not have case mappings.
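The properties of the two dotless characters can be confirmed with a short check. The sketch below uses Python's `unicodedata` module and built-in case conversion. The dictionary `TEX_MACRO` is a hypothetical lookup table built from the macro names given above, not part of any library.

```python
import unicodedata

DOTLESS_I = "\U0001D6A4"
DOTLESS_J = "\U0001D6A5"

# Macro names as given in the text; the table itself is illustrative.
TEX_MACRO = {DOTLESS_I: r"\imath", DOTLESS_J: r"\jmath"}

for ch in (DOTLESS_I, DOTLESS_J):
    # No case mappings: upper() and lower() leave the character unchanged.
    unchanged = ch.upper() == ch and ch.lower() == ch
    print(f"U+{ord(ch):05X}", unicodedata.name(ch), TEX_MACRO[ch], unchanged)
```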
Semantic Distinctions. Mathematical notation requires a number of Latin and Greek alphabets that initially appear to be mere font variations of one another. The letter H can appear as plain or upright (H), bold (𝐇), italic (𝐻), as well as script, Fraktur, and other styles. However, in any given document, these characters have distinct, and usually unrelated, mathematical semantics. For example, a normal H represents a different variable from a bold H, and so on. If these attributes are dropped in plain text, the distinctions are lost and the meaning of the text is altered. Without the distinctions, the well-known Hamiltonian formula turns into the integral equation in the variable H as shown in Figure 15-4.
Figure 15-4. Style Variants and Semantic Distinctions in Mathematics
Hamiltonian formula: $\mathcal{H} = \int d\tau\,(\epsilon E^2 + \mu H^2)$
Integral equation: $H = \int d\tau\,(\varepsilon E^2 + \mu H^2)$
Mathematicians will object that a properly formatted integral equation requires all the letters in this example (except for the “d”) to be in italics. However, because the distinction between ℋ and H has been lost, they would recognize it as a fallback representation of an integral equation, and not as a fallback representation of the Hamiltonian. By encoding a separate set of alphabets, it is possible to preserve such distinctions in plain text.
Mathematical Alphabets. The alphanumeric symbols encountered in mathematics and encoded in the Unicode Standard are given in Table 15-2.
Table 15-2. Mathematical Alphanumeric Symbols
Math Style                  Characters from Basic Set   Location
plain (upright, serifed)    Latin, Greek, and digits    BMP
bold                        Latin, Greek, and digits    Plane 1
italic                      Latin and Greek             Plane 1
bold italic                 Latin and Greek             Plane 1
script (calligraphic)       Latin                       Plane 1
bold script (calligraphic)  Latin                       Plane 1
Fraktur                     Latin                       Plane 1
bold Fraktur                Latin                       Plane 1
double-struck               Latin and digits            Plane 1
sans-serif                  Latin and digits            Plane 1
sans-serif bold             Latin, Greek, and digits    Plane 1
sans-serif italic           Latin                       Plane 1
sans-serif bold italic      Latin and Greek             Plane 1
monospace                   Latin and digits            Plane 1
The plain letters have been unified with the existing characters in the Basic Latin and Greek blocks. There are 24 double-struck, italic, Fraktur, and script characters that already exist in the Letterlike Symbols block (U+2100..U+214F). These are explicitly unified with the characters in this block, and corresponding holes have been left in the mathematical alphabets. The alphabets in this block encode only semantic distinction, but not which specific font will be used to supply the actual plain, script, Fraktur, double-struck, sans-serif, or monospace glyphs. Especially the script and double-struck styles can show considerable variation across fonts. Characters from the Mathematical Alphanumeric Symbols block are not to be used for nonmathematical styled text. Compatibility Decompositions. All mathematical alphanumeric symbols have compatibility decompositions to the base Latin and Greek letters. This does not imply that the use of these characters is discouraged for mathematical use. Folding away such distinctions by applying the compatibility mappings is usually not desirable, however, as it loses the semantic distinctions for which these characters were encoded. See Unicode Standard Annex #15, “Unicode Normalization Forms.”
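The loss of distinctions under compatibility folding can be checked with a short Python sketch. It assumes the standard-library unicodedata module; the helper name fold_math_styles is hypothetical, and the results reflect whatever version of the Unicode Character Database ships with the interpreter.

```python
import unicodedata


def fold_math_styles(text: str) -> str:
    """Apply NFKC, which maps styled mathematical letters to base letters."""
    return unicodedata.normalize("NFKC", text)


SCRIPT_H = "\u210B"      # SCRIPT CAPITAL H (Letterlike Symbols block)
ITALIC_H = "\U0001D43B"  # MATHEMATICAL ITALIC CAPITAL H (Plane 1)
PLAIN_H = "H"            # LATIN CAPITAL LETTER H

# Before folding, the three letters are distinct characters.
distinct_before = len({SCRIPT_H, ITALIC_H, PLAIN_H})

# After folding, only the base letter remains.
folded = {fold_math_styles(ch) for ch in (SCRIPT_H, ITALIC_H, PLAIN_H)}

# The script capital H already existed at U+210B, so the matching slot in the
# mathematical script alphabet (U+1D4A3) was left as an unassigned hole.
hole_category = unicodedata.category("\U0001D4A3")

print(distinct_before, folded, hole_category)
```

A process that needs to keep a Hamiltonian distinct from a plain variable H would therefore avoid NFKC and NFKD on mathematical text.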
Fonts Used for Mathematical Alphabets

Mathematicians place strict requirements on the specific fonts used to represent mathematical variables. Readers of a mathematical text need to be able to distinguish single-letter variables from each other, even when they do not appear in close proximity. They must be able to recognize the letter itself, whether it is part of the text or is a mathematical variable, and lastly which mathematical alphabet it is from.

Fraktur. The blackletter style is often referred to as Fraktur or Gothic in various sources. Technically, Fraktur and Gothic typefaces are distinct designs from blackletter, but any of several font styles similar in appearance to the forms shown in the charts can be used. Note that in East Asian typography, the term Gothic is commonly used to indicate a sans-serif type style.

Math Italics. Mathematical variables are most commonly set in a form of italics, but not all italic fonts can be used successfully. For example, a math italic font should avoid a “tail” on the lowercase italic letter z because it clashes with subscripts. In common text fonts, the italic letter v and Greek letter nu are not very distinct. A rounded italic letter v is therefore preferred in a mathematical font. There are other characters that sometimes have similar shapes and require special attention to avoid ambiguity. Examples are shown in Figure 15-5.

Hard-to-Distinguish Letters. Not all sans-serif fonts allow an easy distinction between lowercase l and uppercase I, and not all monospaced (monowidth) fonts allow a distinction between the letter l and the digit one. Such fonts are not usable for mathematics. In Fraktur, the letters 𝔦 and 𝔧, in particular, must be made distinguishable. Overburdened blackletter forms are inappropriate for mathematical notation. Similarly, the digit zero must be distinct from the uppercase letter O for all mathematical alphanumeric sets.
Some characters are so similar that even mathematical fonts do not attempt to provide distinct glyphs for them. Their use is normally avoided in mathematical notation unless no confusion is possible in a given context; uppercase A and uppercase Alpha are one example.

Figure 15-5. Easily Confused Shapes for Mathematical Glyphs

italic a              alpha
italic v (pointed)    nu
italic v (rounded)    upsilon
script X              chi
plain Y               Upsilon

Font Support for Combining Diacritics. Mathematical equations require that characters be combined with diacritics (dots, tilde, circumflex, or arrows above are common), as well as followed or preceded by superscripted or subscripted letters or numbers. This requirement leads to designs for italic styles that are less inclined and script styles that have smaller overhangs and less slant than equivalent styles commonly used for text such as wedding invitations.

Type Style for Script Characters. In some instances, a deliberate unification with a nonmathematical symbol has been undertaken; for example, U+2133 is unified with the pre-1949 symbol for the German currency unit Mark. This unification restricts the range of glyphs that can be used for this character in the charts. Therefore the font used for the representative glyphs in the code charts is based on a simplified “English Script” style, as per recommendation by the American Mathematical Society. For consistency, other script characters in the Letterlike Symbols block are now shown in the same type style.

Double-Struck Characters. The double-struck glyphs shown in earlier editions of the standard attempted to match the design used for all the other Latin characters in the standard, which is based on Times. The current set of fonts was prepared in consultation with the American Mathematical Society and leading mathematical publishers; it shows much simpler forms that are derived from the forms written on a blackboard. However, both serifed and non-serifed forms can be used in mathematical texts, and inline fonts are found in works published by certain publishers.
15.3 Number Forms

Number Forms: U+2150–U+218F

Many number form characters are composite or duplicate forms encoded solely for compatibility with existing standards. The use of these composite symbols is discouraged where their presence is not required by compatibility.
Fractions. The Number Forms block contains a series of vulgar fraction characters, encoded for compatibility with legacy character encoding standards. These characters are intended to represent both of the common forms of vulgar fractions: forms with a right-slanted division slash, as shown in the code charts, and forms with a horizontal division line, which are considered to be alternative glyphs for the same fractions, as shown in Figure 15-6. A few other vulgar fraction characters are located in the Latin-1 block in the range U+00BC..U+00BE.

Figure 15-6. Alternate Forms of Vulgar Fractions

The vulgar fraction characters are given compatibility decompositions using U+2044 “⁄” fraction slash. Use of the fraction slash is the more generic way to represent fractions in text; it can be used to construct fractional number forms that are not included in the collections of vulgar fraction characters. For more information on the fraction slash, see “Other Punctuation” in Section 6.2, General Punctuation.

Roman Numerals. For most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters. However, the uppercase and lowercase variants of the Roman numerals through 12, plus L, C, D, and M, have been encoded for compatibility with East Asian standards. Unlike sequences of Latin letters, these symbols remain upright in vertical layout. Additionally, in certain locales, compact date formats use Roman numerals for the month, but may expect the use of a single character.

In identifiers, the use of Roman numeral symbols, particularly those based on a single letter of the Latin alphabet, can lead to spoofing. For more information, see Unicode Technical Report #36, “Unicode Security Considerations.”

U+2180 roman numeral one thousand c d and U+216F roman numeral one thousand can be considered to be glyphic variants of the same Roman numeral, but are distinguished because they are not generally interchangeable and because U+2180 cannot be considered to be a compatibility equivalent to the Latin letter M. U+2181 roman numeral five thousand and U+2182 roman numeral ten thousand are distinct characters used in Roman numerals; they do not have compatibility decompositions in the Unicode Standard. U+2183 roman numeral reversed one hundred is a form used in combinations with C and/or I to form large numbers, some of which vary with single character number forms such as D, M, U+2181, or others. U+2183 is also used for the Claudian letter antisigma.
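The decompositions described for fractions and Roman numerals can be inspected with a short Python sketch. It assumes the standard-library unicodedata module; make_fraction is a hypothetical helper introduced only for illustration.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # FRACTION SLASH


def make_fraction(numerator: int, denominator: int) -> str:
    """Build a plain-text fraction using U+2044 (hypothetical helper)."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


# U+00BD VULGAR FRACTION ONE HALF decomposes to 1, fraction slash, 2.
half_decomposed = unicodedata.normalize("NFKD", "\u00BD")
half_value = unicodedata.numeric("\u00BD")

# A fraction that has no single-character form can still be written.
seven_twelfths = make_fraction(7, 12)

# U+216B ROMAN NUMERAL TWELVE folds to the Latin letters XII.
twelve_folded = unicodedata.normalize("NFKC", "\u216B")

# U+216F folds to the letter M, whereas U+2180 has no decomposition.
thousand_folded = unicodedata.normalize("NFKC", "\u216F")
thousand_cd_decomposition = unicodedata.decomposition("\u2180")
thousand_cd_value = unicodedata.numeric("\u2180")

print(half_decomposed, half_value, seven_twelfths, twelve_folded)
```

How a sequence such as 7⁄12 is displayed depends on the rendering system; the sketch only builds the character sequence.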
CJK Number Forms

Chinese Counting-Rod Numerals. Counting-rod numerals were used in pre-modern East Asian mathematical texts in conjunction with counting rods used to represent and manipulate numbers. The counting rods were a set of small sticks, several centimeters long, that were arranged in patterns on a gridded counting board. Counting rods and the counting board provided a flexible system for mathematicians to manipulate numbers, allowing for considerable sophistication in mathematics.

The specifics of the patterns used to represent various numbers using counting rods varied, but there are two main constants: Two sets of numbers were used for alternate columns; one set was used for the ones, hundreds, and ten-thousands columns in the grid, while the other set was used for the tens and thousands. The shapes used for the counting-rod numerals in the Unicode Standard follow conventions from the Song dynasty in China, when traditional Chinese mathematics had reached its peak. Fragmentary material from many early Han dynasty texts shows different orientation conventions for the numerals, with horizontal and vertical marks swapped for the digits and tens places.

Zero was indicated by a blank square on the counting board and was either avoided in written texts or was represented with U+3007 ideographic number zero. (Historically, U+3007 ideographic number zero originated as a dot; as time passed, it increased in size until it became the same size as an ideograph. The actual size of U+3007 ideographic number zero in mathematical texts varies, but this variation should be considered a font difference.) Written texts could also take advantage of the alternating shapes for the numerals to avoid having to explicitly represent zero. Thus 6,708 can be distinguished from 678, because the former would be 𝍮𝍦𝍧, whereas the latter would be 𝍥𝍯𝍧.

Negative numbers were originally indicated on the counting board by using rods of a different color. In written texts, a diagonal slash from lower right to upper left is overlaid upon the rightmost digit. On occasion, the slash might not be actually overlaid. U+20E5 combining reverse solidus overlay should be used for this negative sign.

The predominant use of counting-rod numerals in texts was as part of diagrams of counting boards. They are, however, occasionally used in other contexts, and they may even occur within the body of modern texts.

Suzhou-Style Numerals. The Suzhou-style numerals (Mandarin su1zhou1ma3zi) are CJK ideographic number forms encoded in the CJK Symbols and Punctuation block in the ranges U+3021..U+3029 and U+3038..U+303A.

The Suzhou-style numerals are modified forms of CJK ideographic numerals that are used by shopkeepers in China to mark prices. They are also known as “commercial forms,” “shop units,” or “grass numbers.” They are encoded for compatibility with the CNS 11643-1992 and Big Five standards. The forms for ten, twenty, and thirty, encoded at U+3038..U+303A, are also encoded as CJK unified ideographs: U+5341, U+5344, and U+5345, respectively. (For twenty, see also U+5EFE and U+5EFF.)

These commercial forms of Chinese numerals should be distinguished from the use of other CJK unified ideographs as accounting numbers to deter fraud. See Table 4-9 in Section 4.6, Numeric Value—Normative, for a list of ideographs used as accounting numbers.

Why are the Suzhou numbers called Hangzhou numerals in the Unicode names? No one has been able to trace this back. Hangzhou is a district in China that is near the Suzhou district, but the name “Hangzhou” does not occur in other sources that discuss these number forms.
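Two points from the discussion of CJK number forms can be sketched in Python: the alternation of counting-rod digit sets, and the “Hangzhou” naming of the Suzhou-style numerals. The function to_counting_rods is a hypothetical helper, and it assumes that the characters named counting rod unit digit (from U+1D360) serve the ones and hundreds columns while those named counting rod tens digit (from U+1D369) serve the tens and thousands columns.

```python
import unicodedata

UNIT_ONE = 0x1D360  # COUNTING ROD UNIT DIGIT ONE
TENS_ONE = 0x1D369  # COUNTING ROD TENS DIGIT ONE


def to_counting_rods(number: int) -> str:
    """Write a positive integer with alternating rod sets, omitting zeros."""
    if number <= 0:
        raise ValueError("this sketch handles positive integers only")
    digits = str(number)
    rods = []
    for index, char in enumerate(digits):
        digit = int(char)
        if digit == 0:
            continue  # zero corresponds to a blank square on the board
        position = len(digits) - 1 - index  # 0 = ones, 1 = tens, 2 = hundreds
        base = UNIT_ONE if position % 2 == 0 else TENS_ONE
        rods.append(chr(base + digit - 1))
    return "".join(rods)


rods_6708 = to_counting_rods(6708)
rods_678 = to_counting_rods(678)

# Suzhou-style numerals carry "HANGZHOU" in their character names.
suzhou_one_name = unicodedata.name("\u3021")
suzhou_values = [unicodedata.numeric(chr(cp)) for cp in range(0x3021, 0x302A)]

# U+3038 (ten) has a compatibility mapping to the ideograph U+5341.
suzhou_ten_folded = unicodedata.normalize("NFKC", "\u3038")

print(rods_6708, rods_678, suzhou_one_name)
```

Dropping zeros does not remove every ambiguity (trailing zeros, for instance, leave no trace), which is consistent with the text's remark that zero was sometimes written explicitly with U+3007.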
Superscripts and Subscripts: U+2070–U+209F

In general, the Unicode Standard does not attempt to describe the positioning of a character above or below the baseline in typographical layout. Therefore, the preferred means to encode superscripted or subscripted letters or digits, such as “1st” (with a raised “st”) or “DC0016” (with the “16” lowered), is by style or markup in rich text. However, in some instances superscript or subscript letters are used as part of the plain text content of specialized phonetic alphabets, such as the Uralic Phonetic Alphabet. These superscript and subscript letters are mostly from the Latin or Greek scripts. These characters are encoded in other character blocks, along with other modifier letters or phonetic letters. In addition, superscript digits are used to indicate tone in transliteration of many languages. The use of superscript two and superscript three is common legacy practice when referring to units of area and volume in general texts.

A certain number of additional superscript and subscript characters are needed for roundtrip conversions to other standards and legacy code pages. Most such characters are encoded in this block and are considered compatibility characters.

Parsing of Superscript and Subscript Digits. In the Unicode Character Database, superscript and subscript digits have not been given the General_Category property value Decimal_Number (gc=Nd), so as to prevent expressions like 2³ from being interpreted like 23 by simplistic parsers. This should not be construed as preventing more sophisticated numeric parsers, such as general mathematical expression parsers, from correctly identifying these compatibility superscript and subscript characters as digits and interpreting them appropriately.

Standards. Many of the characters in the Superscripts and Subscripts block are from character sets registered in the ISO International Register of Coded Character Sets to be Used With Escape Sequences, under the registration standard ISO/IEC 2375, for use with ISO/IEC 2022.
Two MARC 21 character sets used by libraries include the digits, plus signs, minus signs, and parentheses. Superscripts and Subscripts in Other Blocks. The superscript digits one, two, and three are coded in the Latin-1 Supplement block to provide code point compatibility with ISO/IEC 8859-1. For a discussion of U+00AA feminine ordinal indicator and U+00BA masculine ordinal indicator, see “Letters of the Latin-1 Supplement” in Section 7.1, Latin. U+2120 service mark and U+2122 trade mark sign are commonly used symbols that are encoded in the Letterlike Symbols block (U+2100..U+214F); they consist of sequences of two superscripted letters each. For phonetic usage, there are a small number of superscript letters located in the Spacing Modifier Letters block (U+02B0..U+02FF) and a large number of superscript and subscript letters in the Phonetic Extensions block (U+1D00..U+1D7F) and in the Phonetic Extensions Supplement block (U+1D80..U+1DBF). The superscripted letters do not contain the word “superscript” in their character names, but are simply called modifier letters. Finally, a small set of superscripted CJK ideographs, used for the Japanese system of syntactic markup of Classical Chinese text for reading, is located in the Kanbun block (U+3190..U+319F).
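The parsing behavior described above for superscript digits can be observed in Python, whose built-in int() accepts only characters with gc=Nd. The sketch below assumes the standard-library unicodedata module; superscript_value and parse_power are hypothetical helpers showing how a more sophisticated parser might treat a trailing run of superscript digits as an exponent.

```python
import unicodedata


def superscript_value(chars: str) -> int:
    """Read a run of superscript digits as an ordinary integer."""
    value = 0
    for ch in chars:
        # Superscript digits carry a <super> compatibility decomposition.
        if not unicodedata.decomposition(ch).startswith("<super>"):
            raise ValueError(f"not a superscript character: {ch!r}")
        value = value * 10 + unicodedata.digit(ch)
    return value


def parse_power(text: str) -> int:
    """Parse decimal digits followed by an optional superscript exponent."""
    split = 0
    while split < len(text) and text[split].isdecimal():
        split += 1
    base = int(text[:split])
    exponent = superscript_value(text[split:]) if split < len(text) else 1
    return base ** exponent


category = unicodedata.category("\u00B3")  # SUPERSCRIPT THREE
is_decimal = "\u00B3".isdecimal()

# A simplistic parser rejects the superscript instead of reading "23".
try:
    int("2\u00B3")
    simple_parse_failed = False
except ValueError:
    simple_parse_failed = True

power = parse_power("2\u00B3")
print(category, is_decimal, simple_parse_failed, power)
```

The same approach could be extended to subscript digits, whose decompositions are tagged with <sub> rather than <super>.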
15.4 Mathematical Symbols

The Unicode Standard provides a large set of standard mathematical characters to support publications of scientific, technical, and mathematical texts on and off the Web. In addition to the mathematical symbols and arrows contained in the blocks described in this section, mathematical operators are found in the Basic Latin (ASCII) and Latin-1 Supplement blocks. A few of the symbols from the Miscellaneous Technical, Miscellaneous Symbols, and Dingbats blocks, as well as characters from General Punctuation, are also used in mathematical notation. For Latin and Greek letters in special font styles that are used as mathematical variables, such as U+210B ℋ script capital h, as well as the Hebrew letter alef used as the first transfinite cardinal symbol encoded by U+2135 ℵ alef symbol, see “Letterlike Symbols” and “Mathematical Alphanumeric Symbols” in Section 15.2, Letterlike Symbols.

The repertoire of mathematical symbols in Unicode enables the display of virtually all standard mathematical symbols. Nevertheless, no collection of mathematical symbols can ever be considered complete; mathematicians and other scientists are continually inventing new mathematical symbols. More symbols will be added as they become widely accepted in the scientific communities.

Semantics. The same mathematical symbol may have different meanings in different subdisciplines or different contexts. The Unicode Standard encodes only a single character for a single symbolic form. For example, the “+” symbol normally denotes addition in a mathematical context, but it might refer to concatenation in a computer science context dealing with strings, indicate incrementation, or have any number of other functions in given contexts. It is up to the application to distinguish such meanings according to the appropriate context.
Where information is available about the usage (or usages) of particular symbols, it has been indicated in the character annotations in Chapter 17, Code Charts. Mathematical Property. The mathematical (math) property is an informative property of characters that are used as operators in mathematical formulas. The mathematical property may be useful in identifying characters commonly used in mathematical text and formulas. However, a number of these characters have multiple usages and may occur with nonmathematical semantics. For example, U+002D hyphen-minus may also be used as a hyphen—and not as a mathematical minus sign. Other characters, including some alphabetic, numeric, punctuation, spaces, arrows, and geometric shapes, are used in mathematical expressions as well, but are even more dependent on the context for their identification. A list of characters with the mathematical property is provided in the Unicode Character Database. For a classification of mathematical characters by typographical behavior and mapping to ISO 9573-13 entity sets, see Unicode Technical Report #25, “Unicode Support for Mathematics.”
Mathematical Operators: U+2200–U+22FF

The Mathematical Operators block includes character encodings for operators, relations, geometric symbols, and a few other symbols with special usages confined largely to mathematical contexts.

Standards. Many national standards’ mathematical operators are covered by the characters encoded in this block. These standards include such special collections as ANSI Y10.20, ISO 6862, ISO 8879, and portions of the collection of the American Mathematical Society, as well as the original repertoire of TeX.

Encoding Principles. Mathematical operators often have more than one meaning. Therefore the encoding of this block is intentionally rather shape-based, with numerous instances in which several semantic values can be attributed to the same Unicode code point. For example, U+2218 ∘ ring operator may be the equivalent of white small circle or composite function or apl jot. The Unicode Standard does not attempt to distinguish all possible semantic values that may be applied to mathematical operators or relation symbols.

The Unicode Standard does include many characters that appear to be quite similar to one another, but that may well convey different meanings in a given context. Conversely, mathematical operators, and especially relation symbols, may appear in various standards, handbooks, and fonts with a large number of purely graphical variants. Where variants were recognizable as such from the sources, they were not encoded separately. For relation symbols, the choice of a vertical or forward-slanting stroke typically seems to be an aesthetic one, but both slants might appear in a given context. However, a back-slanted stroke almost always has a distinct meaning compared to the forward-slanted stroke. See Section 16.4, Variation Selectors, for more information on some particular variants.

Unifications.
Mathematical operators such as implies ⇒ and if and only if ↔ have been unified with the corresponding arrows (U+21D2 rightwards double arrow and U+2194 left right arrow, respectively) in the Arrows block.

The operator U+2208 element of is occasionally rendered with a taller shape than shown in the code charts. Mathematical handbooks and standards consulted treat these characters as variants of the same glyph. U+220A small element of is a distinctively small version of the element of that originates in mathematical pi fonts.

The operators U+226B much greater-than and U+226A much less-than are sometimes rendered in a nested shape. The nested shapes are encoded separately as U+2AA2 double nested greater-than and U+2AA1 double nested less-than.

A large class of unifications applies to variants of relation symbols involving negation. Variants involving vertical or slanted negation slashes and negation slashes of different lengths are not separately encoded. For example, U+2288 neither a subset of nor equal to is the archetype for several different glyph variants noted in various collections.

In two instances in this block, essentially stylistic variants are separately encoded: U+2265 greater-than or equal to is distinguished from U+2267 greater-than over equal to; the same distinction applies to U+2264 less-than or equal to and U+2266 less-than over equal to. Further instances of the encoding of such stylistic variants can be found in the supplemental blocks of mathematical operators. The primary reason for such duplication is for compatibility with existing standards.

Greek-Derived Symbols. Several mathematical operators derived from Greek characters have been given separate encodings because they are used differently from the corresponding letters. These operators may occasionally occur in context with Greek-letter variables. They include U+2206 ∆ increment, U+220F ∏ n-ary product, and U+2211 ∑ n-ary summation. The latter two are large operators that take limits.

Other duplicated Greek characters are those for U+00B5 µ micro sign in the Latin-1 Supplement block, U+2126 Ω ohm sign in Letterlike Symbols, and several characters among the APL functional symbols in the Miscellaneous Technical block. Most other Greek characters with special mathematical semantics are found in the Greek block because duplicates were not required for compatibility. Additional sets of mathematical-style Greek alphabets are found in the Mathematical Alphanumeric Symbols block.

N-ary Operators. N-ary operators are distinguished from binary operators by their larger size and by the fact that in mathematical layout, they take limit expressions.

Invisible Operators. In mathematics, some operators or punctuation are often implied but not displayed. For a set of invisible operators that can be used to mark these implied operators in the text, see Section 15.5, Invisible Mathematical Operators.

Minus Sign. U+2212 “−” minus sign is a mathematical operator, to be distinguished from the ASCII-derived U+002D “-” hyphen-minus, which may look the same as a minus sign or be shorter in length. (For a complete list of dashes in the Unicode Standard, see Table 6-3.) U+22EE..U+22F1 are a set of ellipses used in matrix notation. U+2052 “⁒” commercial minus sign is a specialized form of the minus sign.
Its use is described in Section 6.2, General Punctuation.

Delimiters. Many mathematical delimiters are unified with punctuation characters. See Section 6.2, General Punctuation, for more information. Some of the set of ornamental brackets in the range U+2768..U+2775 are also used as mathematical delimiters. See Section 15.8, Miscellaneous Symbols and Dingbats. See also Section 15.6, Technical Symbols, for specialized characters used for large vertical or horizontal delimiters.

Bidirectional Layout. In a bidirectional context, with the exception of arrows, the glyphs for mathematical operators and delimiters are adjusted as described in Unicode Standard Annex #9, “The Bidirectional Algorithm.” See Section 4.7, Bidi Mirrored—Normative, and “Semantics of Paired Punctuation” in Section 6.2, General Punctuation.

Other Elements of Mathematical Notation. In addition to the symbols in these blocks, mathematical and scientific notation makes frequent use of arrows, punctuation characters, letterlike symbols, geometrical shapes, and miscellaneous and technical symbols.

For an extensive discussion of mathematical alphanumeric symbols, see Section 15.2, Letterlike Symbols. For additional information on all the mathematical operators and other symbols, see Unicode Technical Report #25, “Unicode Support for Mathematics.”
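The distinction between the minus sign and the hyphen-minus matters in practice because many numeric parsers accept only the ASCII character. The Python sketch below assumes the standard-library unicodedata module; parse_signed_int is a hypothetical helper, not part of any library.

```python
import unicodedata

MINUS_SIGN = "\u2212"    # MINUS SIGN
HYPHEN_MINUS = "\u002D"  # HYPHEN-MINUS


def parse_signed_int(text: str) -> int:
    """Parse an integer, accepting U+2212 as well as U+002D (hypothetical)."""
    return int(text.replace(MINUS_SIGN, HYPHEN_MINUS))


# The two characters differ in name and in General_Category.
minus_info = (unicodedata.name(MINUS_SIGN), unicodedata.category(MINUS_SIGN))
hyphen_info = (unicodedata.name(HYPHEN_MINUS), unicodedata.category(HYPHEN_MINUS))

value = parse_signed_int("\u22125")
print(minus_info, hyphen_info, value)
```

Mapping U+2212 to U+002D is appropriate only at the parsing boundary; in stored text the mathematical character keeps the intended meaning.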
Supplements to Mathematical Symbols and Arrows

The Unicode Standard defines a number of additional blocks to supplement the repertoire of mathematical operators and arrows. These additions are intended to extend the Unicode repertoire sufficiently to cover the needs of such applications as MathML, modern mathematical formula editing and presentation software, and symbolic algebra systems.

Standards. MathML, an XML application, is intended to support the full legacy collection of the ISO mathematical entity sets. Accordingly, the repertoire of mathematical symbols for the Unicode Standard has been supplemented by the full list of mathematical entity sets in ISO TR 9573-13, Public entity sets for mathematics and science. An additional repertoire was provided from the amalgamated collection of the STIX Project (Scientific and Technical Information Exchange). That collection includes, but is not limited to, symbols gleaned from mathematical publications by experts of the American Mathematical Society and symbol sets provided by Elsevier Publishing and by the American Physical Society.

Supplemental Mathematical Operators: U+2A00–U+2AFF

The Supplemental Mathematical Operators block contains many additional symbols to supplement the collection of mathematical operators. In addition, the Miscellaneous Symbols and Arrows block (U+2B00..U+2BFF) has been set aside to encode additional mathematical symbols, arrows, and geometric shapes.
Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF

The Miscellaneous Mathematical Symbols-A block contains symbols that are used mostly as operators or delimiters in mathematical notation.

Mathematical Brackets. The mathematical white square brackets, angle brackets, and double angle brackets encoded at U+27E6..U+27EB are intended for ordinary mathematical use of these particular bracket types. They are unambiguously narrow, for use in mathematical and scientific notation, and should be distinguished from the corresponding wide forms of white square brackets, angle brackets, and double angle brackets used in CJK typography. (See the discussion of the CJK Symbols and Punctuation block in Section 6.2, General Punctuation.) Note especially that the “bra” and “ket” angle brackets (U+2329 left-pointing angle bracket and U+232A right-pointing angle bracket, respectively) are now deprecated for use with mathematics because of their canonical equivalence to CJK angle brackets, which is likely to result in unintended spacing problems if used in mathematical formulae.
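The canonical equivalence that motivates the deprecation can be seen by normalizing the characters. The Python sketch below assumes the standard-library unicodedata module; prefer_math_brackets and its mapping table are hypothetical, shown only as one way an application might migrate existing text.

```python
import unicodedata

# Hypothetical migration table: deprecated bra/ket brackets to the
# mathematical angle brackets at U+27E8 and U+27E9.
DEPRECATED_TO_MATH = {
    "\u2329": "\u27E8",  # LEFT-POINTING ANGLE BRACKET
    "\u232A": "\u27E9",  # RIGHT-POINTING ANGLE BRACKET
}


def prefer_math_brackets(text: str) -> str:
    """Replace U+2329 and U+232A with the mathematical angle brackets."""
    return text.translate(str.maketrans(DEPRECATED_TO_MATH))


# Normalization turns U+2329 into the wide CJK bracket U+3008 ...
bra_normalized = unicodedata.normalize("NFC", "\u2329")
# ... whereas the mathematical bracket is left unchanged.
math_normalized = unicodedata.normalize("NFC", "\u27E8")

fixed = prefer_math_brackets("\u2329\u03C8|\u03C6\u232A")
print(hex(ord(bra_normalized)), hex(ord(math_normalized)), fixed)
```

Because any normalizing process may silently replace U+2329 with U+3008, converting before normalization keeps the narrow mathematical forms in the text.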
Miscellaneous Mathematical Symbols-B: U+2980–U+29FF

The Miscellaneous Mathematical Symbols-B block contains miscellaneous symbols used for mathematical notation, including fences and other delimiters. Some of the symbols in this block may also be used as operators in some contexts.
Wiggly Fence. U+29D8 left wiggly fence has a superficial similarity to U+FE34 presentation form for vertical wavy low line. The latter is a wiggly sidebar character, intended for legacy support as a style of underlining character in a vertical text layout context; it has a compatibility mapping to U+005F low line. This represents a very different usage from the standard use of fence characters in mathematical notation.
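The difference between the two look-alike characters shows up in their properties. The following Python sketch, which assumes the standard-library unicodedata module, compares their compatibility foldings and general categories.

```python
import unicodedata

WIGGLY_FENCE = "\u29D8"   # LEFT WIGGLY FENCE
WAVY_LOW_LINE = "\uFE34"  # PRESENTATION FORM FOR VERTICAL WAVY LOW LINE

# The fence has no compatibility mapping and survives NFKC unchanged.
fence_folded = unicodedata.normalize("NFKC", WIGGLY_FENCE)
# The presentation form folds to U+005F LOW LINE.
wavy_folded = unicodedata.normalize("NFKC", WAVY_LOW_LINE)

fence_category = unicodedata.category(WIGGLY_FENCE)
wavy_category = unicodedata.category(WAVY_LOW_LINE)

print(fence_folded == WIGGLY_FENCE, wavy_folded, fence_category, wavy_category)
```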
Arrows: U+2190–U+21FF

Arrows are used for a variety of purposes: to imply directional relation, to show logical derivation or implication, and to represent the cursor control keys.

Accordingly, the Unicode Standard includes a fairly extensive set of generic arrow shapes, especially those for which there are established usages with well-defined semantics. It does not attempt to encode every possible stylistic variant of arrows separately, especially where their use is mainly decorative. For most arrow variants, the Unicode Standard provides encodings in the two horizontal directions, often in the four cardinal directions. For the single and double arrows, the Unicode Standard provides encodings in eight directions.

Bidirectional Layout. In bidirectional layout, arrows are not automatically mirrored, because the direction of the arrow could be relative to the text direction or relative to an absolute direction. Therefore, if text is copied from a left-to-right to a right-to-left context, or vice versa, the character code for the desired arrow direction in the new context must be used. For example, it might be necessary to change U+21D2 rightwards double arrow to U+21D0 leftwards double arrow to maintain the semantics of “implies” in a right-to-left context. For more information on bidirectional layout, see Unicode Standard Annex #9, “The Bidirectional Algorithm.”

Standards. The Unicode Standard encodes arrows from many different international and national standards as well as corporate collections.

Unifications. Arrows expressing mathematical relations have been encoded in the Arrows block as well as in the supplemental arrows blocks. An example is U+21D2 ⇒ rightwards double arrow, which may be used to denote implies. Where available, such usage information is indicated in the annotations to individual characters in Chapter 17, Code Charts. However, because the arrows have such a wide variety of applications, there may be several semantic values for the same Unicode character value.
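Because arrows are not mirrored by the Bidirectional Algorithm, an application that moves text between directional contexts has to substitute arrow characters itself. The Python sketch below assumes the standard-library unicodedata module; the ARROW_SWAP table and swap_horizontal_arrows are hypothetical and cover only four arrows for illustration.

```python
import unicodedata

# Hypothetical table pairing each horizontal arrow with its opposite.
ARROW_SWAP = {
    "\u2190": "\u2192",  # LEFTWARDS ARROW -> RIGHTWARDS ARROW
    "\u2192": "\u2190",  # RIGHTWARDS ARROW -> LEFTWARDS ARROW
    "\u21D0": "\u21D2",  # LEFTWARDS DOUBLE ARROW -> RIGHTWARDS DOUBLE ARROW
    "\u21D2": "\u21D0",  # RIGHTWARDS DOUBLE ARROW -> LEFTWARDS DOUBLE ARROW
}


def swap_horizontal_arrows(text: str) -> str:
    """Exchange left- and right-pointing arrows listed in ARROW_SWAP."""
    return text.translate(str.maketrans(ARROW_SWAP))


# Arrows do not have the Bidi_Mirrored property; an operator such as
# U+2208 ELEMENT OF does.
implies_mirrored = unicodedata.mirrored("\u21D2")
element_of_mirrored = unicodedata.mirrored("\u2208")

swapped = swap_horizontal_arrows("p \u21D2 q")
print(implies_mirrored, element_of_mirrored, swapped)
```

Whether a given arrow should be swapped depends on whether its direction is relative to the text or absolute, so a substitution like this cannot be applied blindly.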
Supplemental Arrows

The Supplemental Arrows-A (U+27F0..U+27FF), Supplemental Arrows-B (U+2900..U+297F), and Miscellaneous Symbols and Arrows (U+2B00..U+2BFF) blocks contain a large repertoire of arrows to supplement the main set in the Arrows block.

Long Arrows. The long arrows encoded in the range U+27F5..U+27FF map to standard SGML entity sets supported by MathML. Long arrows represent distinct semantics from their short counterparts, rather than mere stylistic glyph differences. For example, the shorter forms of arrows are often used in connection with limits, whereas the longer ones are associated with mappings. The use of the long arrows is so common that they were assigned entity names in the ISOAMSA entity set, one of the suite of mathematical symbol entity sets covered by the Unicode Standard.
Standardized Variants of Mathematical Symbols

These mathematical variants are all produced with the addition of U+FE00 variation selector-1 (VS1) to mathematical operator base characters. The valid combinations are listed in the file StandardizedVariants.txt in the Unicode Character Database. All combinations not listed there are unspecified and are reserved for future standardization; no conformant process may interpret them as standardized variants.

Change in Representative Glyphs for U+2278 and U+2279. In Version 3.2 of the Unicode Standard, the representative glyphs for U+2278 neither less-than nor greater-than and U+2279 neither greater-than nor less-than were changed from using a vertical cancellation to using a slanted cancellation. This change was made to match the longstanding canonical decompositions for these characters, which use U+0338 combining long solidus overlay. The symmetric forms using the vertical stroke continue to be acceptable glyph variants. Using U+2278 or U+2279 with VS1 will request these variants explicitly, as will using U+2276 less-than or greater-than or U+2277 greater-than or less-than with U+20D2 combining long vertical line overlay. Unless fonts are created with the intention to add support for both forms (via VS1 for the upright forms), there is no need to revise the glyphs in existing fonts; the glyphic range implied by using the base character code alone encompasses both shapes. For more information, see Section 16.4, Variation Selectors.
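The canonical decomposition mentioned for U+2278 can be confirmed with the Python sketch below, which assumes the standard-library unicodedata module. The two sequences at the end are simply the character sequences the text describes; whether a font honors them is outside what the code can show.

```python
import unicodedata

VS1 = "\uFE00"                       # VARIATION SELECTOR-1
NEITHER_LESS_NOR_GREATER = "\u2278"  # NEITHER LESS-THAN NOR GREATER-THAN

# Canonical decomposition: base relation plus the slanted overlay.
decomposed = unicodedata.normalize("NFD", NEITHER_LESS_NOR_GREATER)
decomposed_names = [unicodedata.name(ch) for ch in decomposed]

# Sequences described as requesting the vertical-stroke form explicitly.
with_vs1 = NEITHER_LESS_NOR_GREATER + VS1
with_vertical_overlay = "\u2276" + "\u20D2"

print(decomposed_names)
print([f"U+{ord(ch):04X}" for ch in with_vs1])
print([f"U+{ord(ch):04X}" for ch in with_vertical_overlay])
```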
15.5 Invisible Mathematical Operators In mathematics, some operators and punctuation are often implied but not displayed. The General Punctuation block contains several special format control characters known as invisible operators, which can be used to make such operators explicit for use in machine interpretation of mathematical expressions. Use of invisible operators is optional and is intended for interchange with math-aware programs. A more complete discussion of mathematical notation can be found in Unicode Technical Report #25, “Unicode Support for Mathematics.” Invisible Separator. U+2063 invisible separator (also known as invisible comma) is intended for use in index expressions and other mathematical notation where two adjacent variables form a list and are not implicitly multiplied. In mathematical notation, commas are not always explicitly present, but they need to be indicated for symbolic calculation software to help it disambiguate a sequence from a multiplication. For example, the double ij subscript in the variable aij means ai, j —that is, the i and j are separate indices and not a single variable with the name ij or even the product of i and j. To represent the implied list separation in the subscript ij , one can insert a nondisplaying invisible separator between the
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
Purchasing the Book For convenient access to the full text of the standard as a useful reference book, we recommend purchasing the printed version. The book is available from the Unicode Consortium, the publisher, and booksellers. Purchase of the standard in book format contributes to the ongoing work of the Unicode Consortium. Details about the book publication and ordering information may be found at http://www.unicode.org/book/aboutbook.html
Joining Unicode You or your organization may benefit by joining the Unicode Consortium: for more information, see Joining the Unicode Consortium at http://www.unicode.org/consortium/join.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 16
Special Areas and Format Characters
This chapter describes several kinds of characters that have special properties as well as areas of the codespace that are set aside for special purposes:
• Control codes
• Surrogates area
• Noncharacters
• Layout controls
• Variation selectors
• Specials
• Deprecated format characters
• Private-use characters
• Tag characters
In addition to regular characters, the Unicode Standard contains a number of format characters. These characters are not normally rendered directly, but rather influence the layout of text or otherwise affect the operation of text processes. The Unicode Standard contains code positions for the 64 control characters and the DEL character found in ISO standards and many vendor character sets. The choice of control function associated with a given character code is outside the scope of the Unicode Standard, with the exception of those control characters specified in this chapter. Layout controls are not themselves rendered visibly, but influence the behavior of algorithms for line breaking, word breaking, glyph selection, and bidirectional ordering. Surrogate code points are reserved and are to be used in pairs—called surrogate pairs—to access 1,048,544 supplementary characters. Variation selectors allow the specification of standardized variants of characters. This ability is particularly useful where the majority of implementations would treat the two variants as two forms of the same character, but where some implementations need to differentiate between the two. By using a variation selector, such differentiation can be made explicit. Private-use characters are reserved for private use. Their meaning is defined by private agreement. Noncharacters are code points that are permanently reserved and will never have characters assigned to them.
The Specials block contains characters that are neither graphic characters nor traditional controls. Tag characters support a general scheme for the internal tagging of text streams in the absence of other mechanisms, such as markup languages. They are reserved for use with specific plain text-based protocols that specify their usage. Their use in ordinary text is strongly discouraged.
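The surrogate-pair figure quoted in this introduction can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the subtraction of 32 assumes that the difference between the 1,048,576 possible pairs and the stated 1,048,544 characters is the two noncharacters at the end of each of the 16 supplementary planes.

```python
# Surrogate code point ranges: high (leading) and low (trailing) halves.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogate_pair(high, low):
    """Recombine a surrogate pair into the supplementary code point."""
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# 1,024 high surrogates times 1,024 low surrogates.
pair_count = (HIGH_END - HIGH_START + 1) * (LOW_END - LOW_START + 1)

# Assumption: two noncharacters (xxFFFE, xxFFFF) in each of 16 planes.
supplementary_characters = pair_count - 16 * 2
print(pair_count, supplementary_characters)
```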
16.1 Control Codes
There are 65 code points set aside in the Unicode Standard for compatibility with the C0 and C1 control codes defined in the ISO/IEC 2022 framework. The ranges of these code points are U+0000..U+001F, U+007F, and U+0080..U+009F, which correspond to the 8-bit controls 00₁₆ to 1F₁₆ (C0 controls), 7F₁₆ (delete), and 80₁₆ to 9F₁₆ (C1 controls), respectively. For example, the 8-bit legacy control code character tabulation (or tab) is the byte value 09₁₆; the Unicode Standard encodes the corresponding control code at U+0009. The Unicode Standard provides for the intact interchange of these code points, neither adding to nor subtracting from their semantics.
The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992. In general, the use of control codes constitutes a higher-level protocol and is beyond the scope of the Unicode Standard. For example, the use of ISO/IEC 6429 control sequences for controlling bidirectional formatting would be a legitimate higher-level protocol layered on top of the plain text of the Unicode Standard. Higher-level protocols are not specified by the Unicode Standard; their existence cannot be assumed without a separate agreement between the parties interchanging such data.
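As a quick check of the ranges given above, the following minimal sketch (assuming Python's standard unicodedata module; the variable names are illustrative) enumerates the three ranges and confirms that they contain 65 code points, all with General_Category Cc.

```python
import unicodedata

# The three ranges quoted above: C0 controls, delete, and C1 controls.
CONTROL_RANGES = [(0x0000, 0x001F), (0x007F, 0x007F), (0x0080, 0x009F)]

control_code_points = [
    cp for first, last in CONTROL_RANGES for cp in range(first, last + 1)
]
print(len(control_code_points))

# Every one of them carries the General_Category value Cc (control).
all_cc = all(unicodedata.category(chr(cp)) == "Cc" for cp in control_code_points)

# An 8-bit control byte maps to the code point with the same numeric value.
tab_byte = 0x09
tab = chr(tab_byte)
```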
Representing Control Sequences
There is a simple, one-to-one mapping between 7-bit (and 8-bit) control codes and the Unicode control codes: every 7-bit (or 8-bit) control code is numerically equal to its corresponding Unicode code point. For example, if the ASCII line feed control code (0A₁₆) is to be used for line break control, then the text “WX
… control codes in the three Unicode encoding forms simply follows the rules for any other code points in the standard:
UTF-8: <01 54 69 6D 65 73 02>
UTF-16: <0001 0054 0069 006D 0065 0073 0002>
UTF-32: <00000001 00000054 00000069 0000006D 00000065 00000073 00000002>
Escape Sequences. Escape sequences are a particular type of protocol that consists of the use of some set of ASCII characters introduced by the escape control code, 1B₁₆, to convey extra-textual information. When converting escape sequences into and out of Unicode text, they should be converted on a character-by-character basis. For instance, “ESC-A” <1B 41> would be converted into the Unicode coded character sequence <001B, 0041>. Interpretation of U+0041 as part of the escape sequence, rather than as latin capital letter a, is the responsibility of the higher-level protocol that makes use of such escape sequences. This approach allows for low-level conversion processes to conformantly convert escape sequences into and out of the Unicode Standard without needing to actually recognize the escape sequences as such.
If a process uses escape sequences or other configurations of control code sequences to embed additional information about text (such as formatting attributes or structure), then such sequences constitute a higher-level protocol that is outside the scope of the Unicode Standard.
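The code unit sequences and the escape sequence conversion above can be reproduced with a short sketch. It assumes Python's built-in codecs; the helper code_units is hypothetical, and big-endian byte order is chosen only so that each printed unit reads as the code unit value.

```python
def code_units(text, encoding, width):
    """Return the code units of text as hex, grouped `width` bytes per unit."""
    data = text.encode(encoding)
    units = [data[i:i + width] for i in range(0, len(data), width)]
    return "<" + " ".join(unit.hex().upper() for unit in units) + ">"


# U+0001, the letters of "Times", then U+0002, as in the sequences above.
text = "\u0001Times\u0002"
print("UTF-8: ", code_units(text, "utf-8", 1))
print("UTF-16:", code_units(text, "utf-16-be", 2))
print("UTF-32:", code_units(text, "utf-32-be", 4))

# Character-by-character conversion of the bytes <1B 41> ("ESC-A").
# No attempt is made to recognize the bytes as an escape sequence.
esc_a = bytes([0x1B, 0x41]).decode("ascii")
esc_a_code_points = [f"{ord(c):04X}" for c in esc_a]
print(esc_a_code_points)
```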
Specification of Control Code Semantics Several control codes are commonly used in plain text, particularly those involved in line and paragraph formatting. The use of these control codes is widespread and important to interoperability. Therefore, the Unicode Standard specifies semantics for their use with the rest of the encoded characters in the standard. Table 16-1 lists those control codes.
Table 16-1. Control Codes Specified in the Unicode Standard
Code Point  Abbreviation  ISO/IEC 6429 Name
U+0009      HT            character tabulation (tab)
U+000A      LF            line feed
U+000B      VT            line tabulation (vertical tab)
U+000C      FF            form feed
U+000D      CR            carriage return
U+001C      FS            information separator four
U+001D      GS            information separator three
U+001E      RS            information separator two
U+001F      US            information separator one
U+0085      NEL           next line
Most of the control codes in Table 16-1 have the White_Space property. They have the directional property values of S, B, or WS, rather than the default of BN used for other control codes. (See Unicode Standard Annex #9, “The Bidirectional Algorithm.”) In addition, the separator semantics of the control codes U+001C..U+001F are recognized in the Bidirectional Algorithm. U+0009..U+000D and U+0085 also have line breaking property values that differ from the default CM value for other control codes. (See Unicode Standard Annex #14, “Line Breaking Properties.”)
U+0000 null may be used as a Unicode string terminator, as in the C language. Such usage is outside the scope of the Unicode Standard, which does not require any particular formal language representation of a string or any particular usage of null.
Newline Function. In particular, one or more of the control codes U+000A line feed, U+000D carriage return, and the Unicode equivalent of the EBCDIC next line can encode a newline function. A newline function can act like a line separator or a paragraph separator, depending on the application. See Section 16.2, Layout Controls, for information on how to interpret a line or paragraph separator. The exact encoding of a newline function depends on the application domain. For information on how to identify a newline function, see Section 5.8, Newline Guidelines.
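The directional property values of the control codes in Table 16-1 can be looked up directly. This is a minimal sketch assuming Python's standard unicodedata module; the dictionary name is illustrative, and the values come from the Unicode Character Database version bundled with the interpreter.

```python
import unicodedata

# Code points and abbreviations from Table 16-1.
TABLE_16_1 = {
    0x0009: "HT", 0x000A: "LF", 0x000B: "VT", 0x000C: "FF", 0x000D: "CR",
    0x001C: "FS", 0x001D: "GS", 0x001E: "RS", 0x001F: "US", 0x0085: "NEL",
}

# Bidirectional class of each control code: S, B, or WS is expected.
bidi_classes = {
    cp: unicodedata.bidirectional(chr(cp)) for cp in TABLE_16_1
}

for cp, abbreviation in TABLE_16_1.items():
    print(f"U+{cp:04X} {abbreviation:<3} {bidi_classes[cp]}")
```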
16.2 Layout Controls The effect of layout controls is specific to particular text processes. As much as possible, layout controls are transparent to those text processes for which they were not intended. In other words, their effects are mutually orthogonal.
Line and Word Breaking The following gives a brief summary of the intended behavior of certain layout controls. For a full description of line and word breaking layout controls, see Unicode Standard Annex #14, “Line Breaking Properties.” No-Break Space. U+00A0 no-break space has the same width as U+0020 space, but the no-break space indicates that, under normal circumstances, no line breaks are permitted between it and surrounding characters, unless the preceding or following character is a line or paragraph separator or space or zero width space. For a complete list of space characters in the Unicode Standard, see Table 6-2. Word Joiner. U+2060 word joiner behaves like U+00A0 no-break space in that it indicates the absence of word boundaries; however, the word joiner has no width. The function of the character is to indicate that line breaks are not allowed between the adjoining characters, except next to hard line breaks. For example, the word joiner can be inserted after the fourth character in the text “base+delta” to indicate that there should be no line break between the “e” and the “+”. The word joiner can be used to prevent line breaking with other characters that do not have nonbreaking variants, such as U+2009 thin space or U+2015 horizontal bar, by bracketing the character.
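The two uses of the word joiner described above, forbidding a break at one position and bracketing a character that has no nonbreaking variant, can be sketched as follows. The function names are hypothetical, and the sketch only builds the character sequences; whether a break is actually suppressed depends on the line breaking implementation that consumes the text.

```python
WORD_JOINER = "\u2060"


def forbid_break_at(text, index):
    """Insert a word joiner so that no line break occurs at text[index]."""
    return text[:index] + WORD_JOINER + text[index:]


def make_nonbreaking(ch):
    """Bracket a character with word joiners on both sides."""
    return WORD_JOINER + ch + WORD_JOINER


# After the fourth character of "base+delta", between "e" and "+".
joined = forbid_break_at("base+delta", 4)

# A thin space that should not provide a line break opportunity.
nonbreaking_thin_space = make_nonbreaking("\u2009")

print([f"{ord(c):04X}" for c in joined])
```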
The word joiner must not be confused with the zero width joiner or the combining grapheme joiner, which have very different functions. In particular, inserting a word joiner between two characters has no effect on their ligating and cursive joining behavior. The word joiner should be ignored in contexts other than word or line breaking. Zero Width No-Break Space. In addition to its primary meaning of byte order mark (see “Byte Order Mark” in Section 16.8, Specials), the code point U+FEFF possesses the semantics of zero width no-break space, which matches that of word joiner. Until Unicode 3.2, U+FEFF was the only code point with word joining semantics, but because it is more commonly used as byte order mark, the use of U+2060 word joiner to indicate word joining is strongly preferred for any new text. Implementations should continue to support the word joining semantics of U+FEFF for backward compatibility. Zero Width Space. The U+200B zero width space indicates a word boundary, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word breaks, such as Thai, Khmer, and Japanese. When text is justified, ZWSP has no effect on letter spacing—for example, in English or Japanese usage. There may be circumstances with other scripts, such as Thai, where extra space is applied around ZWSP as a result of justification, as shown in Table 16-2. This approach is unlike the use of fixed-width space characters, such as U+2002 en space, that have specified width and should not be automatically expanded during justification (see Section 6.2, General Punctuation).
Table 16-2. Letter Spacing
Type       Justification Examples  Explanation
Memory     the ISP®[ZWSP]Charts    The ZWSP is inserted to allow line break after ®
Display 1  the ISP®Charts          Without letter spacing
Display 2  the ISP®Char t s        Increased letter spacing
Display 3  the ISP®Charts          “Thai-style” letter spacing
Display 4  the I S P ®Cha r t s    incorrectly inhibiting letter spacing (after ®)
In some languages such as German and Russian, increased letter spacing is used to indicate emphasis. Implementers should be aware of this issue. Zero-Width Spaces and Joiner Characters. The zero-width spaces are not to be confused with the zero-width joiner characters. U+200C zero width non-joiner and U+200D zero width joiner have no effect on word boundaries, and zero width no-break space and zero width space have no effect on joining or linking behavior. In other words, the zero-width joiner characters should be ignored when determining word boundaries; zero
width space should be ignored when determining cursive joining behavior. See “Cursive Connection” later in this section. Hyphenation. U+00AD soft hyphen (SHY) indicates an intraword break point, where a line break is preferred if a word must be hyphenated or otherwise broken across lines. Such break points are generally determined by an automatic hyphenator. SHY can be used with any script, but its use is generally limited to situations where users need to override the behavior of such a hyphenator. The visible rendering of a line break at an intraword break point, whether automatically determined or indicated by a SHY, depends on the surrounding characters, the rules governing the script and language used, and, at times, the meaning of the word. The precise rules are outside the scope of this standard, but see Unicode Standard Annex #14, “Line Breaking Properties,” for additional information. A common default rendering is to insert a hyphen before the line break, but this is insufficient or even incorrect in many situations. Contrast this usage with U+2027 hyphenation point, which is used for a visible indication of the place of hyphenation in dictionaries. For a complete list of dash characters in the Unicode Standard, including all the hyphens, see Table 6-3. The Unicode Standard includes two nonbreaking hyphen characters: U+2011 non-breaking hyphen and U+0F0C tibetan mark delimiter tsheg bstar. See Section 10.2, Tibetan, for more discussion of the Tibetan-specific line breaking behavior. Line and Paragraph Separator. The Unicode Standard provides two unambiguous characters, U+2028 line separator and U+2029 paragraph separator, to separate lines and paragraphs. They are considered the default form of denoting line and paragraph boundaries in Unicode plain text. A new line is begun after each line separator. A new paragraph is begun after each paragraph separator. 
As these characters are separator codes, it is not necessary either to start the first line or paragraph or to end the last line or paragraph with them. Doing so would indicate that there was an empty paragraph or line following. The paragraph separator can be inserted between paragraphs of text. Its use allows the creation of plain text files, which can be laid out on a different line width at the receiving end. The line separator can be used to indicate an unconditional end of line. A paragraph separator indicates where a new paragraph should start. Any interparagraph formatting would be applied. This formatting could cause, for example, the line to be broken, any interparagraph line spacing to be applied, and the first line to be indented. A line separator indicates that a line break should occur at this point; although the text continues on the next line, it does not start a new paragraph—no interparagraph line spacing or paragraphic indentation is applied. For more information on line separators, see Section 5.8, Newline Guidelines.
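Because U+2028 and U+2029 are separators rather than terminators, splitting on them yields one more piece than there are separators. The following is a minimal sketch under that reading; the function name and sample strings are illustrative, and a real layout engine would do far more than split strings.

```python
LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"


def split_paragraphs(text):
    """Split plain text into paragraphs, each a list of its lines."""
    return [
        paragraph.split(LINE_SEPARATOR)
        for paragraph in text.split(PARAGRAPH_SEPARATOR)
    ]


sample = "First line\u2028second line\u2029Next paragraph"
print(split_paragraphs(sample))

# A trailing paragraph separator implies an empty paragraph after it.
print(split_paragraphs("Only paragraph\u2029"))
```

The second call illustrates why the last paragraph should not be ended with a separator: the result contains an additional, empty paragraph.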
Cursive Connection and Ligatures In some fonts for some scripts, consecutive characters in a text stream may be rendered via adjacent glyphs that cursively join to each other, so as to emulate connected handwriting.
For example, cursive joining is implemented in nearly all fonts for the Arabic scripts and in a few handwriting-like fonts for the Latin script. Cursive rendering is implemented by joining glyphs in the font and by using a process that selects the particular joining glyph to represent each individual character occurrence, based on the joining nature of its neighboring characters. This glyph selection is implemented in the rendering engine, typically using information in the font.
In many cases there is an even closer binding, where a sequence of characters is represented by a single glyph, called a ligature. Ligatures can occur in both cursive and noncursive fonts. Where ligatures are available, it is the task of the rendering system to select a ligature to create the most appropriate line layout. However, the rendering system cannot define the locations where ligatures are possible because there are many languages in which ligature formation requires more information. For example, in some languages, ligatures are never formed across syllable boundaries.
On occasion, an author may wish to override the normal automatic selection of connecting glyphs or ligatures. Typically, this choice is made to achieve one of the following effects:
• Cause nondefault joining appearance (for example, as is sometimes required in writing Persian using the Arabic script)
• Exhibit the joining-variant glyphs themselves in isolation
• Request a ligature to be formed where it normally would not be
• Request a ligature not to be formed where it normally would be
The Unicode Standard provides two characters that influence joining and ligature glyph selection: U+200C zero width non-joiner and U+200D zero width joiner. The zero width joiner and non-joiner request a rendering system to have more or less of a connection between characters than they would otherwise have. Such a connection may be a simple cursive link, or it may include control of ligatures.
The zero width joiner and non-joiner characters are designed for use in plain text; they should not be used where higher-level ligation and cursive control is available. (See Unicode Technical Report #20, “Unicode in XML and Other Markup Languages,” for more information.) Moreover, they are essentially requests for the rendering system to take into account when laying out the text; while a rendering system should consider them, it is perfectly acceptable for the system to disregard these requests.
The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures or cursive connections are required or prohibited. These characters are not to be used in all cases where ligatures or cursive connections are desired; instead, they are meant only for overriding the normal behavior of the text.
Joiner. U+200D zero width joiner is intended to produce a more connected rendering of adjacent characters than would otherwise be the case, if possible. In particular:
• If the two characters could form a ligature but do not normally, ZWJ requests that the ligature be used.
• Otherwise, if either of the characters could cursively connect but do not normally, ZWJ requests that each of the characters take a cursive-connection form where possible.
In a sequence like <X, ZWJ, Y>, where a cursive form exists for X but not for Y, the presence of ZWJ requests a cursive form for X. Otherwise, where neither a ligature nor a cursive connection is available, the ZWJ has no effect. In other words, given the three broad categories below, ZWJ requests that glyphs in the highest available category (for the given font) be used:
1. Ligated
2. Cursively connected
3. Unconnected
Non-joiner. U+200C zero width non-joiner is intended to break both cursive connections and ligatures in rendering. ZWNJ requests that glyphs in the lowest available category (for the given font) be used.
For those unusual circumstances where someone wants to forbid ligatures in a sequence XY but promote cursive connection, the sequence <X, ZWJ, ZWNJ, ZWJ, Y> can be used. The ZWNJ breaks ligatures, while the two adjacent joiners cause the X and Y to take adjacent cursive forms (where they exist). Similarly, if someone wanted to have X take a cursive form but Y be isolated, then the sequence <X, ZWJ, ZWNJ, Y> could be used (as in previous versions of the Unicode Standard). Examples are shown in Figure 16-3.
Cursive Connection. For cursive connection, the joiner and non-joiner characters typically do not modify the contextual selection process itself, but instead change the context of a particular character occurrence. By providing a non-joining adjacent character where the adjacent character otherwise would be joining, or vice versa, they indicate that the rendering process should select a different joining glyph. This process can be used in two ways: to prevent a cursive joining or to exhibit joining glyphs in isolation.
In Figure 16-1, the insertion of the ZWNJ overrides the normal cursive joining of sad and lam.
Figure 16-1. Prevention of Joining
[Figure: the sequence <0635, 0644> (sad, lam) is displayed cursively joined; the sequence <0635, 200C, 0644> is displayed with the two letters unjoined.]
In Figure 16-2, the normal display of ghain without ZWJ before or after it uses the nominal (isolated) glyph form. When preceded and followed by ZWJ characters, however, the ghain is rendered with its medial form glyph in isolation.
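Figure 16-2. Exhibition of Joining Glyphs in Isolation
[Figure: U+063A (ghain) alone is displayed in its nominal (isolated) form; the sequence <200D, 063A, 200D> is displayed as the medial form in isolation.]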
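The joiner sequences discussed in this subsection, including those of Figure 16-1 and Figure 16-2, can be assembled as plain character strings. The sketch below uses hypothetical helper names and only constructs the sequences; the resulting appearance depends entirely on the font and rendering system, which may disregard the requests.

```python
ZWNJ = "\u200C"  # zero width non-joiner
ZWJ = "\u200D"   # zero width joiner


def prevent_joining(x, y):
    """<X, ZWNJ, Y>: request that X and Y not connect (Figure 16-1)."""
    return x + ZWNJ + y


def exhibit_medial_form(ch):
    """<ZWJ, X, ZWJ>: request a joining form of X in isolation (Figure 16-2)."""
    return ZWJ + ch + ZWJ


def cursive_without_ligature(x, y):
    """<X, ZWJ, ZWNJ, ZWJ, Y>: forbid a ligature, promote cursive forms."""
    return x + ZWJ + ZWNJ + ZWJ + y


def code_points(text):
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text)


SAD, LAM, GHAIN = "\u0635", "\u0644", "\u063A"
print(code_points(prevent_joining(SAD, LAM)))
print(code_points(exhibit_medial_form(GHAIN)))
print(code_points(cursive_without_ligature("f", "i")))
```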
The examples in Figure 16-1 and Figure 16-2 are adapted from the Iranian national coded character set standard, ISIRI 3342, which defines ZWNJ and ZWJ as “pseudo space” and “pseudo connection,” respectively. Examples. Figure 16-3 provides samples of desired renderings when the joiner or nonjoiner is inserted between two characters. The examples presume that all of the glyphs are available in the font. If, for example, the ligatures are not available, the display would fall back to the unligated forms. Each of the entries in the first column of Figure 16-3 shows two characters in visual display order. The column headings show characters to be inserted between those two characters. The cells below show the respective display when the joiners in the heading row are inserted between the original two characters.
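Figure 16-3. Effect of Intervening Joiners
[Figure: for each of the character sequences 0066 0069 (f, i), 0627 0644, 062C 0645, and 062C 0648, shown in visual display order, the figure gives the display as is (for example, “f i” or “fi”) and the display when the joiners in each column heading are inserted between the two characters.]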
For backward compatibility, between Arabic characters a ZWJ acts just like the sequence
Transparency. The property value of Joining_Type=Transparent applies to characters that should not interfere with cursive connection, even when they occur in sequence between two characters that are connected cursively. These include all nonspacing marks and most format control characters, except for ZWJ and ZWNJ themselves. Note, in particular, that enclosing combining marks are also transparent as regards cursive connection. For example, using U+20DD combining enclosing circle to circle an Arabic letter in a sequence should not cause that Arabic letter to change its cursive connections to neighboring letters. See Section 8.2, Arabic, for more on joining classes and the details regarding Arabic cursive joining. Joiner and Non-joiner in Indic Scripts. In Indic text, the ZWJ and ZWNJ are used to request particular display forms. A ZWJ after a sequence of consonant plus virama requests what is called a “half-form” of that consonant. A ZWNJ after a sequence of consonant plus virama requests that conjunct formation be interrupted, usually resulting in an explicit virama on that consonant. There are a few more specialized uses as well. For more information, see the discussions in Chapter 9, South Asian Scripts-I. Implementation Notes. For modern font technologies, such as OpenType or AAT, font vendors should add ZWJ to their ligature mapping tables as appropriate. Thus, where a font had a mapping from “f ” + “i” to fi, the font designer should add the mapping from “f ” + ZWJ + “i” to fi. In contrast, ZWNJ will normally have the desired effect naturally for most fonts without any change, as it simply obstructs the normal ligature/cursive connection behavior. As with all other alternate format characters, fonts should use an invisible zero-width glyph for representation of both ZWJ and ZWNJ. Filtering Joiner and Non-joiner. zero width joiner and zero width non-joiner are format control characters. 
As such, and in common with other format control characters, they are ordinarily ignored by processes that analyze text content. For example, a spellchecker or a search operation should filter them out when checking for matches. There are exceptions, however. In particular scripts—most notably the Indic scripts—ZWJ and ZWNJ have specialized usages that may be of orthographic significance. In those contexts, blind filtering of all instances of ZWJ or ZWNJ may result in ignoring distinctions relevant to the user’s notion of text content. Implementers should be aware of these exceptional circumstances, so that searching and matching operations behave as expected for those scripts.
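One way to filter joiners for matching while respecting the exception noted above is sketched below. It is an illustration only: the function name is hypothetical, and treating a preceding mark with canonical combining class 9 as a stand-in for "consonant plus virama" is a simplifying assumption of this sketch, not a rule stated by the standard.

```python
import unicodedata

ZWNJ = "\u200C"
ZWJ = "\u200D"
VIRAMA_CCC = 9  # assumption: combining class 9 identifies virama-like marks


def fold_joiners(text):
    """Drop ZWJ and ZWNJ for matching, except directly after a virama."""
    kept = []
    for ch in text:
        if ch in (ZWJ, ZWNJ):
            # Keep the joiner only where it may be orthographically significant.
            if kept and unicodedata.combining(kept[-1]) == VIRAMA_CCC:
                kept.append(ch)
            continue
        kept.append(ch)
    return "".join(kept)


# Latin text: the non-joiner is irrelevant to a search for "fi".
print(fold_joiners("f\u200Ci") == "fi")

# Devanagari ka + virama + ZWJ + ssa: the joiner is preserved.
half_form_request = "\u0915\u094D\u200D\u0937"
print(fold_joiners(half_form_request) == half_form_request)
```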
Combining Grapheme Joiner U+034F combining grapheme joiner (CGJ) is used to affect the collation of adjacent characters for purposes of language-sensitive collation and searching. It is also used to distinguish sequences that would otherwise be canonically equivalent. Formally, the combining grapheme joiner is not a format control character, but rather a combining mark. It has the General_Category value gc=Mn and the canonical combining class value ccc=0.
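The properties just listed, and their effect on canonically equivalent sequences, can be observed with a minimal sketch. It assumes Python's standard unicodedata module; the two combining marks are an arbitrary illustrative choice (U+0301 has combining class 230 and U+0323 has combining class 220), not an example taken from the standard.

```python
import unicodedata

CGJ = "\u034F"  # combining grapheme joiner

print(unicodedata.category(CGJ), unicodedata.combining(CGJ))


def nfd(text):
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", text)


ACUTE, DOT_BELOW = "\u0301", "\u0323"

# Without CGJ, the two orderings normalize to the same sequence.
same_without_cgj = nfd("a" + ACUTE + DOT_BELOW) == nfd("a" + DOT_BELOW + ACUTE)

# With CGJ between the marks, the orderings remain distinct.
same_with_cgj = (
    nfd("a" + ACUTE + CGJ + DOT_BELOW) == nfd("a" + DOT_BELOW + CGJ + ACUTE)
)
print(same_without_cgj, same_with_cgj)
```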
As a result of these properties, the presence of a combining grapheme joiner in the midst of a combining character sequence does not interrupt the combining character sequence; any process that is accumulating and processing all the characters of a combining character sequence would include a combining grapheme joiner as part of that sequence. This differs from the behavior of most format control characters, whose presence would interrupt a combining character sequence. In addition, because the combining grapheme joiner has the canonical combining class of 0, canonical reordering will not reorder any adjacent combining marks around a combining grapheme joiner. (See the definition of canonical reordering in Section 3.11, Canonical Ordering Behavior.) In turn, this means that insertion of a combining grapheme joiner between two combining marks will prevent normalization from switching the positions of those two combining marks, regardless of their own combining classes. Blocking Reordering. The CGJ has no visible glyph and no other format effect on neighboring characters but simply blocks reordering of combining marks. It can therefore be used as a tool to distinguish two alternative orderings of a sequence of combining marks for some exceptional processing or rendering purpose, whenever normalization would otherwise eliminate the distinction between the two sequences. For example, using CGJ to block reordering is one way to maintain distinction between differently ordered sequences of certain Hebrew accents and marks. These distinctions are necessary for analytic and text representational purposes. However, these characters were assigned fixed-position combining classes despite the fact that they interact typographically. As a result, normalization treats differently ordered sequences as equivalent. In particular, the sequence
combining marks, giving the two strings distinct keys. That makes it possible to treat them distinctly in searching and sorting without having to tailor the weights for either the combining grapheme joiner or the combining marks. The CGJ can also be used to prevent the formation of contractions in the Unicode Collation Algorithm. For example, while “ch” is sorted as a single unit in a tailored Slovak collation, the sequence
Bidirectional Ordering Controls
Bidirectional ordering controls are used in the Bidirectional Algorithm, described in Unicode Standard Annex #9, “The Bidirectional Algorithm.” Systems that handle right-to-left scripts such as Arabic, Syriac, and Hebrew, for example, should interpret these format control characters. The bidirectional ordering controls are shown in Table 16-3.
Table 16-3. Bidirectional Ordering Controls
Code    Name                        Abbreviation
U+200E  left-to-right mark          LRM
U+200F  right-to-left mark          RLM
U+202A  left-to-right embedding     LRE
U+202B  right-to-left embedding     RLE
U+202C  pop directional formatting  PDF
U+202D  left-to-right override      LRO
U+202E  right-to-left override      RLO
As with other format control characters, bidirectional ordering controls affect the layout of the text in which they are contained but should be ignored for other text processes, such as sorting or searching. However, text processes that modify text content must maintain these characters correctly, because matching pairs of bidirectional ordering controls must be coordinated, so as not to disrupt the layout and interpretation of bidirectional text. Each instance of an LRE, RLE, LRO, or RLO is normally paired with a corresponding PDF.
U+200E left-to-right mark and U+200F right-to-left mark have the semantics of an invisible character of zero width, except that these characters have strong directionality. They are intended to be used to resolve cases of ambiguous directionality in the context of bidirectional texts; they are not paired. Unlike U+200B zero width space, these characters carry no word breaking semantics. (See Unicode Standard Annex #9, “The Bidirectional Algorithm,” for more information.)
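The two obligations described above (ignore the controls when searching or sorting, but keep embeddings and overrides paired when editing) can be sketched as follows. The helper names `strip_bidi_controls` and `embeddings_balanced` are hypothetical, introduced only for this illustration, and the balance check is a simple lint under the assumption that every PDF should close an earlier embedding or override; it is not an implementation of the Bidirectional Algorithm itself.

```python
import unicodedata

# Code points taken from Table 16-3.
BIDI_CONTROLS = {
    "\u200E": "LRM", "\u200F": "RLM",
    "\u202A": "LRE", "\u202B": "RLE", "\u202C": "PDF",
    "\u202D": "LRO", "\u202E": "RLO",
}
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}
PDF = "\u202C"


def strip_bidi_controls(text: str) -> str:
    """Drop the bidi ordering controls, e.g. before building a search key."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


def embeddings_balanced(text: str) -> bool:
    """Return True if each LRE/RLE/LRO/RLO is closed by a later PDF."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # PDF with nothing to pop
            depth -= 1
    return depth == 0


sample = "abc \u202B\u05E9\u05DC\u05D5\u05DD\u202C def"
truncated = sample[:6]  # cuts the text after RLE and one Hebrew letter

print(embeddings_balanced(sample), embeddings_balanced(truncated))
print(strip_bidi_controls("a\u200Eb"))
# The marks are strongly directional; the others have their own bidi classes.
print([unicodedata.bidirectional(ch) for ch in BIDI_CONTROLS])
```

A check of this kind is most useful after operations such as truncation or substring extraction, which can separate an embedding from its PDF.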
16.3 Deprecated Format Characters
Deprecated Format Characters: U+206A–U+206F
Three pairs of deprecated format characters are encoded in this block:
• Symmetric swapping format characters used to control the glyphs that depict characters such as “(” (The default state is activated.)
• Character shaping selectors used to control the shaping behavior of the Arabic compatibility characters (The default state is inhibited.)
• Numeric shape selectors used to override the normal shapes of the Western digits (The default state is nominal.)
The use of these character shaping selectors and codes for digit shapes is strongly discouraged in the Unicode Standard. Instead, the appropriate character codes should be used with the default state. For example, if contextual forms for Arabic characters are desired, then the nominal characters should be used, not the presentation forms with the shaping selectors. Similarly, if the Arabic digit forms are desired, then the explicit characters should be used, such as U+0660 arabic-indic digit zero.
Symmetric Swapping. The symmetric swapping format characters are used in conjunction with the class of left- and right-handed pairs of characters (symmetric characters), such as parentheses. The characters thus affected are listed in Section 4.7, Bidi Mirrored—Norma-
Electronic Edition This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html
Purchasing the Book For convenient access to the full text of the standard as a useful reference book, we recommend purchasing the printed version. The book is available from the Unicode Consortium, the publisher, and booksellers. Purchase of the standard in book format contributes to the ongoing work of the Unicode Consortium. Details about the book publication and ordering information may be found at http://www.unicode.org/book/aboutbook.html
Joining Unicode You or your organization may benefit by joining the Unicode Consortium: for more information, see Joining the Unicode Consortium at http://www.unicode.org/consortium/join.html
This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006
Chapter 17
Code Charts
Disclaimer
Character images shown in the code charts are not prescriptive. In actual fonts, considerable variations are to be expected.
The code charts that follow present the characters of the Unicode Standard. Characters are organized into related groups called blocks. Many scripts are fully contained within a single character block, but other scripts, including some of the most widely used scripts, have characters divided across several blocks. Separate blocks contain common punctuation characters and different types of symbols. A character names list follows each character chart. The character names list itemizes every character in the block and provides supplementary information in many cases. Charts for CJK Unified Ideographs and for Hangul syllables are not printed in this chapter, but are available online, as discussed in Section 17.2, CJK Unified Ideographs, and Section 17.3, Hangul Syllables. An index to distinctive character names is found at the back of this book; a full set of character names appears in the Unicode Character Database.
17.1 Character Names List
The following illustration identifies the components of typical entries in the character names list.
code  image  entry
00AE  ®      REGISTERED SIGN
             = registered trade mark sign (1.0)       (Version 1.0 name)
00AF  ¯      MACRON                                   (Unicode name)
             = overline, APL overbar                  (alternative names)
             • this is a spacing character            (informative note)
             → 02C9 ˉ modifier letter macron          (cross reference)
             → 0304 ◌̄ combining macron
             → 0305 ◌̅ combining overline
             ≈ 0020 0304 ◌̄                            (compatibility decomposition)
00E5  å      LATIN SMALL LETTER A WITH RING ABOVE
             • Danish, Norwegian, Swedish, Walloon    (sample of language use)
             ≡ 0061 a 030A ◌̊                          (canonical decomposition)
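Two of the components illustrated above, the Unicode name and the decomposition, are also available programmatically. The sketch below uses Python's standard `unicodedata` module; the helper `describe` is a hypothetical name used only here. Aliases, informative notes, and cross references belong to the names list and are not exposed by this module.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect the code, name, and decomposition of a single character."""
    decomposition = unicodedata.decomposition(ch)
    if not decomposition:
        kind = None                 # character does not decompose
    elif decomposition.startswith("<"):
        kind = "compatibility"      # tagged mapping, e.g. "<compat> 0020 0304"
    else:
        kind = "canonical"          # untagged mapping, e.g. "0061 030A"
    return {
        "code": f"{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "decomposition": decomposition,
        "kind": kind,
    }


# The three characters used in the illustration.
entries = [describe(ch) for ch in "\u00AE\u00AF\u00E5"]
for entry in entries:
    print(entry)
```

The compatibility decomposition of U+00AF and the canonical decomposition of U+00E5 reported here correspond to the last line of each of those entries in the illustration.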
Images in the Code Charts and Character Lists
Each character in these code charts is shown with a representative glyph. A representative glyph is not a prescriptive form of the character, but rather one that enables recognition of the intended character to a knowledgeable user and facilitates lookup of the character in the code charts. In many cases, there are more or less well-established alternative glyphic representations for the same character. Designers of high-quality fonts will do their own research into the preferred glyphic appearance of Unicode characters. In addition, many scripts require context-dependent glyph shaping, glyph positioning, or ligatures, none of which is shown in the code charts.
The representative glyphs for the Latin, Greek, and Cyrillic scripts in the code charts are based on a serifed, Times-like font. Some characters have alternative forms. For example, even the ASCII character U+0061 latin small letter a has two common alternative forms: the “a” used in Times and the “ɑ” that occurs in many other font styles. In a Times-like font, the character U+03A5 greek capital letter upsilon looks like “Y”; the form ϒ is common in other font styles.
The fonts used for other scripts are similar to Times in that each represents a common, widely used design, with variable stroke width and serifs or similar devices, where applicable, to show each character as distinctly as possible. Sans-serif fonts with uniform stroke width tend to have less visibly distinct characters. In the code charts, sans-serif fonts are used for archaic scripts that predate the invention of serifs, for example.
A different case is U+010F latin small letter d with caron, which is commonly typeset as ď instead of ď. In such cases, the code charts show the more common variant in preference to a more didactic archetypical shape.
Many characters have been unified and have different appearances in different language contexts.
The shape shown for U+2116 № numero sign is a fullwidth shape as it would be used in East Asian fonts. In Cyrillic usage, № is the universally recognized glyph. See Figure 15-2.
In certain cases, characters need to be represented by more or less condensed, shifted, or distorted glyphs to make them fit the format of the code charts. For example, U+0D10 ഐ malayalam letter ai is shown in a reduced size to fit the character cell.
Sometimes characters need to be given artificial shapes to make them recognizable in the code charts. Examples are the space characters and such characters as U+00AD soft hyphen and U+2011 non-breaking hyphen, where the special behavior of the hyphen is indicated by the dashed box and the letters. This use of a dashed box is not correlated with the General Category value of the character.
When characters are used in context, the surrounding text gives important clues as to identity, size, and positioning. In the code charts, these clues are absent. For example, U+2075 ⁵ superscript five is shown much smaller than it would be in a Times-like text font. Combining characters are shown with a dotted circle; for example, U+0940 ◌ी devanagari vowel sign ii. The relative position of the dotted circle gives an approximate indication of the location of the base character in relation to the combining mark. During rendering, additional adjustments are necessary. Accents such as U+0302 combining circumflex accent are adjusted vertically and horizontally based on the height and width of the base character, as in “î” versus “Ŵ”.
For non-European scripts, typical typefaces were selected that allow as much distinction as possible among the different characters. The Unicode Standard contains many characters that are used in writing minority languages or that are historical characters, often used primarily in manuscripts or inscriptions. Where there is no strong tradition of printed materials, the typography of a character may not be settled.
Character Names
The character names in the code charts precisely match the normative character names in the Unicode Character Database. Character names are unique and stable. By convention, they are in uppercase. Because character names are stable, mistaken names will not be revised, but may be annotated. For example:
2118  ℘  SCRIPT CAPITAL P
         = Weierstrass elliptic function
         • actually this has the form of a lowercase calligraphic p, despite its name
For more information on character names, see Section 4.8, Name—Normative.
Informative Aliases An informative alias (preceded by =) is an alternate name for a character. Characters may have several aliases, and aliases for different characters are not guaranteed to be unique. Aliases are informative and may be updated. By convention, aliases are in lowercase, except where they contain proper names. Where an alias matches the name of a character in The Unicode Standard, Version 1.0, it is listed first, followed by “1.0” in parentheses. Because the formal character names may differ in unexpected ways from commonly used names (for example, pilcrow sign = paragraph sign), some aliases may be useful alternate choices for indicating characters in user interfaces. In the Hangul Jamo block, U+1100..U+11FF, the normative short jamo names are given as aliases.
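Because character names are unique and stable, they can serve as keys in both directions. The following sketch, which assumes only Python's standard `unicodedata` module, looks up the two characters mentioned above by code point and by name. The variable names are illustrative, and the failed alias lookup reflects the fact that this module works from the normative names rather than from the informative aliases in the names list.

```python
import unicodedata

script_p = "\u2118"
pilcrow = "\u00B6"

# Code point -> normative name. The name of U+2118 is kept even though
# the annotation in the names list describes it as mistaken.
script_p_name = unicodedata.name(script_p)
pilcrow_name = unicodedata.name(pilcrow)

# Normative name -> code point; uniqueness makes this an exact inverse.
round_trip = unicodedata.lookup(script_p_name)

# Informative aliases such as "paragraph sign" are not normative names,
# so a lookup by alias is expected to fail with KeyError.
try:
    unicodedata.lookup("PARAGRAPH SIGN")
    alias_found = True
except KeyError:
    alias_found = False

print(script_p_name, pilcrow_name, round_trip == script_p, alias_found)
```

A user interface that wants to accept commonly used names would therefore need its own alias table in addition to the normative names.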
C0 Controls and Basic Latin
Range: 0000–007F
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.
See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.
Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.
See http://www.unicode.org/charts/fonts.html for a list.
Terms of Use You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site.
See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html. Copyright © 1991-2006 Unicode, Inc. All rights reserved.
[Code chart: C0 Controls and Basic Latin, 0000–007F. Columns 000–007 give the leading hexadecimal digits of each code point; rows 0–F give the final digit.]
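The layout of a code chart, in which each column is labeled with the leading hexadecimal digits of the code point and each row with the final digit, can be sketched as follows. The function name `chart_rows` and the `<XX>` placeholder for characters without a visible glyph are assumptions made for this illustration; the printed charts use dashed boxes with abbreviations instead.

```python
def chart_rows(first: int, last: int):
    """Yield (row_digit, cells) for the block first..last, one row per final hex digit."""
    columns = range(first >> 4, (last >> 4) + 1)
    for low in range(16):
        cells = []
        for column in columns:
            ch = chr((column << 4) | low)
            if ch.isprintable() and ch != " ":
                cells.append(ch)
            else:
                # Controls and the space have no visible glyph: show the code.
                cells.append(f"<{ord(ch):02X}>")
        yield f"{low:X}", cells


def format_chart(first: int, last: int) -> str:
    """Render the block as text with a header row of column labels."""
    columns = range(first >> 4, (last >> 4) + 1)
    lines = ["  " + "".join(f"{column:03X}".rjust(6) for column in columns)]
    for digit, cells in chart_rows(first, last):
        lines.append(digit + " " + "".join(cell.rjust(6) for cell in cells))
    return "\n".join(lines)


# The C0 Controls and Basic Latin block.
print(format_chart(0x0000, 0x007F))
```

Reading down a column gives consecutive code points, so the cell in column 004, row 1 is U+0041.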
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
C0 controls
Alias names are those for ISO/IEC 6429:1992. Commonly used alternative aliases are also shown.
ASCII punctuation and symbols
Based on ISO/IEC 646.
0020     SPACE
         • sometimes considered a control code
         • other space characters: 2000–200A
         → 00A0 no-break space
         → 200B zero width space
         → 2060 word joiner
         → 3000 ideographic space
         → FEFF zero width no-break space
0021  !  EXCLAMATION MARK
         = factorial
         = bang
         → 00A1 ¡ inverted exclamation mark
         → 01C3 ǃ latin letter retroflex click
         → 203C ‼ double exclamation mark
         → 203D ‽ interrobang
         → 2762 ❢ heavy exclamation mark ornament
0022  "  QUOTATION MARK
         • neutral (vertical), used as opening or closing quotation mark
         • preferred characters in English for paired quotation marks are 201C “ & 201D ”
         → 02BA ʺ modifier letter double prime
         → 030B ◌̋ combining double acute accent
         → 030E ◌̎ combining double vertical line above
         → 2033 ″ double prime
         → 3003 〃 ditto mark
0023  #  NUMBER SIGN
         = pound sign, hash, crosshatch, octothorpe
         → 2114 ℔ l b bar symbol
         → 266F ♯ music sharp sign
0024  $  DOLLAR SIGN
         = milreis, escudo
         • glyph may have one or two vertical bars
         • other currency symbol characters: 20A0 ₠–20B5 ₵
         → 00A4 ¤ currency sign
0025  %  PERCENT SIGN
         → 066A ٪ arabic percent sign
         → 2030 ‰ per mille sign
         → 2031 ‱ per ten thousand sign
         → 2052 ⁒ commercial minus sign
0026  &  AMPERSAND
         → 204A ⁊ tironian sign et
         → 214B ⅋ turned ampersand
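The difference between the two groups above is visible in character data: the C0 controls carry no normative character name, which is why the names list identifies them by alias, whereas the ASCII punctuation and symbols each have one. A minimal sketch with Python's standard `unicodedata` module (the variable names are illustrative):

```python
import unicodedata

C0_CONTROLS = range(0x00, 0x20)

# unicodedata.name() returns the supplied default when a character has no name.
controls_without_names = all(
    unicodedata.name(chr(cp), None) is None for cp in C0_CONTROLS
)

# All C0 controls share the General_Category value Cc.
control_categories = {unicodedata.category(chr(cp)) for cp in C0_CONTROLS}

# The characters U+0020..U+0026 listed above, keyed by code.
ascii_names = {f"{cp:04X}": unicodedata.name(chr(cp)) for cp in range(0x20, 0x27)}

print(controls_without_names, control_categories)
for code, name in ascii_names.items():
    print(code, name)
```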
C1 Controls and Latin-1 Supplement
Range: 0080–00FF
[Code chart: C1 Controls and Latin-1 Supplement, 0080–00FF. Columns 008–00F give the leading hexadecimal digits of each code point; rows 0–F give the final digit.]
Latin Extended-A
Range: 0100–017F
[Code chart: Latin Extended-A, 0100–017F. Columns 010–017 give the leading hexadecimal digits of each code point; rows 0–F give the final digit.]
Latin Extended-B
Range: 0180–024F
[Code chart: Latin Extended-B, 0180–024F]
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Latin Extended-B

Non-European and historic Latin
0180 ƀ LATIN SMALL LETTER B WITH STROKE
    • Americanist and Indo-Europeanist usage for phonetic beta
    • Americanist orthographies use an alternate glyph with the stroke through the bowl
    • Old Saxon
    • uppercase is 0243 Ƀ
    → 03B2 β greek small letter beta
    → 2422 ␢ blank symbol
0181 Ɓ LATIN CAPITAL LETTER B WITH HOOK
    • Zulu, Pan-Nigerian alphabet
    → 0253 ɓ latin small letter b with hook
0182 Ƃ LATIN CAPITAL LETTER B WITH TOPBAR
0183 ƃ LATIN SMALL LETTER B WITH TOPBAR
    • Zhuang (old orthography)
    • former Soviet minority language scripts
    → 0411 Б cyrillic capital letter be
0184 Ƅ LATIN CAPITAL LETTER TONE SIX
0185 ƅ LATIN SMALL LETTER TONE SIX
    • Zhuang (old orthography)
    • Zhuang tone three is Cyrillic ze
    • Zhuang tone four is Cyrillic che
    → 01A8 ƨ latin small letter tone two
    → 01BD ƽ latin small letter tone five
    → 0437 з cyrillic small letter ze
    → 0447 ч cyrillic small letter che
    → 044C ь cyrillic small letter soft sign
0186 Ɔ LATIN CAPITAL LETTER OPEN O
    • typographically a turned C
    • African
    → 0254 ɔ latin small letter open o
0187 Ƈ LATIN CAPITAL LETTER C WITH HOOK
0188 ƈ LATIN SMALL LETTER C WITH HOOK
    • African
0189 Ɖ LATIN CAPITAL LETTER AFRICAN D
    • Ewe
    → 00D0 Ð latin capital letter eth
    → 0110 Đ latin capital letter d with stroke
    → 0256 ɖ latin small letter d with tail
018A Ɗ LATIN CAPITAL LETTER D WITH HOOK
    • Pan-Nigerian alphabet
    → 0257 ɗ latin small letter d with hook
018B Ƌ LATIN CAPITAL LETTER D WITH TOPBAR
018C ƌ LATIN SMALL LETTER D WITH TOPBAR
    • former-Soviet minority language scripts
    • Zhuang (old orthography)
018D ƍ LATIN SMALL LETTER TURNED DELTA
    = reversed Polish-hook o
    • archaic phonetic for labialized alveolar fricative
    • recommended spellings 007A z 02B7 ʷ or 007A z 032B
018E Ǝ LATIN CAPITAL LETTER REVERSED E
    = turned e
    • Pan-Nigerian alphabet
    • lowercase is 01DD ǝ
018F Ə LATIN CAPITAL LETTER SCHWA
    • Azerbaijani, ...
    → 0259 ə latin small letter schwa
    → 04D8 Ә cyrillic capital letter schwa
0190 Ɛ LATIN CAPITAL LETTER OPEN E
    = epsilon
    • African
    → 025B ɛ latin small letter open e
    → 2107 ℇ euler constant
0191 Ƒ LATIN CAPITAL LETTER F WITH HOOK
    • African
0192 ƒ LATIN SMALL LETTER F WITH HOOK
    = script f
    = Florin currency symbol (Netherlands)
    = function symbol
    • used as abbreviation convention for folder
0193 Ɠ LATIN CAPITAL LETTER G WITH HOOK
    • African
    → 0260 ɠ latin small letter g with hook
0194 Ɣ LATIN CAPITAL LETTER GAMMA
    • African
    → 0263 ɣ latin small letter gamma
0195 ƕ LATIN SMALL LETTER HV
    • Gothic transliteration
    • uppercase is 01F6 Ƕ
0196 Ɩ LATIN CAPITAL LETTER IOTA
    • African
    → 0269 ɩ latin small letter iota
0197 Ɨ LATIN CAPITAL LETTER I WITH STROKE
    = barred i, i bar
    • African
    • ISO 6438 gives lowercase as 026A ɪ, not 0268 ɨ
    → 026A ɪ latin letter small capital i
0198 Ƙ LATIN CAPITAL LETTER K WITH HOOK
0199 ƙ LATIN SMALL LETTER K WITH HOOK
    • Hausa, Pan-Nigerian alphabet
019A ƚ LATIN SMALL LETTER L WITH BAR
    = barred l
    • Americanist phonetic usage for 026C ɬ
    → 0142 ł latin small letter l with stroke
    → 023D Ƚ latin capital letter l with bar
019B ƛ LATIN SMALL LETTER LAMBDA WITH STROKE
    = barred lambda, lambda bar
    • Americanist phonetic usage
019C Ɯ LATIN CAPITAL LETTER TURNED M
    • Zhuang (old orthography)
    → 026F ɯ latin small letter turned m
019D Ɲ LATIN CAPITAL LETTER N WITH LEFT HOOK
    • African
    → 0272 ɲ latin small letter n with left hook
019E ƞ LATIN SMALL LETTER N WITH LONG RIGHT LEG
    • archaic phonetic for Japanese 3093 ん
    • recommended spelling for syllabic n is 006E n 0329
    • Lakota (indicates nasalization of vowel)
    → 0220 Ƞ latin capital letter n with long right leg
019F Ɵ LATIN CAPITAL LETTER O WITH MIDDLE TILDE
    = barred o, o bar
    • lowercase is 0275 ɵ
    • African
    → 04E8 Ө cyrillic capital letter barred o
01A0 Ơ LATIN CAPITAL LETTER O WITH HORN
    ≡ 004F O 031B
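Several annotations in this names list state case pairs ("uppercase is 0243", "lowercase is 01DD"), and the entry for 01A0 gives a canonical decomposition (≡ 004F 031B). The following is a minimal sketch of how such statements could be checked mechanically, assuming Python's standard `unicodedata` module, whose data tracks a later Unicode version than 5.0. The table `CASE_NOTES` and the helpers `mismatches` and `decomposition_of` are hypothetical names introduced only for this illustration.

```python
import unicodedata

# Case pairs as stated in the annotations, written (lowercase, uppercase).
# CASE_NOTES is a hand-built, hypothetical table, not part of the standard.
CASE_NOTES = [
    ("\u0180", "\u0243"),  # b with stroke: "uppercase is 0243"
    ("\u01DD", "\u018E"),  # reversed E: "lowercase is 01DD"
    ("\u0195", "\u01F6"),  # hv: "uppercase is 01F6"
    ("\u0275", "\u019F"),  # O with middle tilde: "lowercase is 0275"
]


def mismatches(pairs):
    """Return the pairs whose case mapping disagrees with the runtime's data."""
    return [
        (lower, upper)
        for lower, upper in pairs
        if lower.upper() != upper or upper.lower() != lower
    ]


def decomposition_of(ch):
    """Return the canonical decomposition of ch as a list of code points."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility decompositions start with a tag such as "<compat>";
    # those are not canonical, so report them as "no canonical decomposition".
    if fields and fields[0].startswith("<"):
        return []
    return [int(field, 16) for field in fields]


if __name__ == "__main__":
    print(mismatches(CASE_NOTES))
    print([f"{cp:04X}" for cp in decomposition_of("\u01A0")])
```

The same kind of check illustrates the note on 0197: the runtime lowercases it to 0268, which is the pairing the annotation contrasts with the 026A given by ISO 6438.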
IPA Extensions Range: 0250–02AF This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
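Each chart header in this part of the book states the block's code point range. As a rough sketch, those ranges can be kept in a sorted table and searched to find the block a character belongs to. The table below is copied by hand from the chart headers and code charts of this section only, so it is deliberately incomplete; `BLOCKS` and `block_of` are hypothetical names, not an API of the standard.

```python
from bisect import bisect_right

# (first, last, name) for the blocks whose charts appear in this section.
# Hand-copied from the chart headers; other blocks are intentionally absent.
BLOCKS = [
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0500, 0x052F, "Cyrillic Supplement"),
    (0x0530, 0x058F, "Armenian"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(ch):
    """Return the block name for a single character, or None if not listed."""
    cp = ord(ch)
    # Find the last block whose first code point is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


if __name__ == "__main__":
    for ch in ("\u0259", "\u0401", "A"):
        print(f"{ord(ch):04X}", block_of(ch))
```

A binary search is used because block ranges are sorted and do not overlap; a linear scan would work equally well for a table this small.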
[Code chart: IPA Extensions, 0250–02AF]
Spacing Modifier Letters Range: 02B0–02FF
[Code chart: Spacing Modifier Letters, 02B0–02FF]
Combining Diacritical Marks Range: 0300–036F
[Code chart: Combining Diacritical Marks, 0300–036F]
Greek and Coptic Range: 0370–03FF
[Code chart: Greek and Coptic, 0370–03FF]
Cyrillic Range: 0400–04FF
[Code chart: Cyrillic, 0400–04FF]
Cyrillic

Cyrillic extensions
0400 Ѐ CYRILLIC CAPITAL LETTER IE WITH GRAVE
    ≡ 0415 Е 0300
0401 Ё CYRILLIC CAPITAL LETTER IO
    ≡ 0415 Е 0308
0402 Ђ CYRILLIC CAPITAL LETTER DJE
0403 Ѓ CYRILLIC CAPITAL LETTER GJE
    ≡ 0413 Г 0301
0404 Є CYRILLIC CAPITAL LETTER UKRAINIAN IE
0405 Ѕ CYRILLIC CAPITAL LETTER DZE
0406 І CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    → 0049 I latin capital letter i
    → 0456 і cyrillic small letter byelorussian-ukrainian i
    → 04C0 Ӏ cyrillic letter palochka
0407 Ї CYRILLIC CAPITAL LETTER YI
    ≡ 0406 І 0308
0408 Ј CYRILLIC CAPITAL LETTER JE
0409 Љ CYRILLIC CAPITAL LETTER LJE
040A Њ CYRILLIC CAPITAL LETTER NJE
040B Ћ CYRILLIC CAPITAL LETTER TSHE
040C Ќ CYRILLIC CAPITAL LETTER KJE
    ≡ 041A К 0301
040D Ѝ CYRILLIC CAPITAL LETTER I WITH GRAVE
    ≡ 0418 И 0300
040E Ў CYRILLIC CAPITAL LETTER SHORT U
    ≡ 0423 У 0306
040F Џ CYRILLIC CAPITAL LETTER DZHE

Basic Russian alphabet
0410 А CYRILLIC CAPITAL LETTER A
0411 Б CYRILLIC CAPITAL LETTER BE
    → 0183 ƃ latin small letter b with topbar
0412 В CYRILLIC CAPITAL LETTER VE
0413 Г CYRILLIC CAPITAL LETTER GHE
0414 Д CYRILLIC CAPITAL LETTER DE
0415 Е CYRILLIC CAPITAL LETTER IE
0416 Ж CYRILLIC CAPITAL LETTER ZHE
0417 З CYRILLIC CAPITAL LETTER ZE
0418 И CYRILLIC CAPITAL LETTER I
0419 Й CYRILLIC CAPITAL LETTER SHORT I
    ≡ 0418 И 0306
041A К CYRILLIC CAPITAL LETTER KA
041B Л CYRILLIC CAPITAL LETTER EL
041C М CYRILLIC CAPITAL LETTER EM
041D Н CYRILLIC CAPITAL LETTER EN
041E О CYRILLIC CAPITAL LETTER O
041F П CYRILLIC CAPITAL LETTER PE
0420 Р CYRILLIC CAPITAL LETTER ER
0421 С CYRILLIC CAPITAL LETTER ES
0422 Т CYRILLIC CAPITAL LETTER TE
0423 У CYRILLIC CAPITAL LETTER U
    → 0478 Ѹ cyrillic capital letter uk
    → 04AF ү cyrillic small letter straight u
0424 Ф CYRILLIC CAPITAL LETTER EF
0425 Х CYRILLIC CAPITAL LETTER HA
0426 Ц CYRILLIC CAPITAL LETTER TSE
0427 Ч CYRILLIC CAPITAL LETTER CHE
0428 Ш CYRILLIC CAPITAL LETTER SHA
0429 Щ CYRILLIC CAPITAL LETTER SHCHA
042A Ъ CYRILLIC CAPITAL LETTER HARD SIGN
042B Ы CYRILLIC CAPITAL LETTER YERU
042C Ь CYRILLIC CAPITAL LETTER SOFT SIGN
042D Э CYRILLIC CAPITAL LETTER E
042E Ю CYRILLIC CAPITAL LETTER YU
042F Я CYRILLIC CAPITAL LETTER YA
0430 а CYRILLIC SMALL LETTER A
0431 б CYRILLIC SMALL LETTER BE
0432 в CYRILLIC SMALL LETTER VE
0433 г CYRILLIC SMALL LETTER GHE
0434 д CYRILLIC SMALL LETTER DE
0435 е CYRILLIC SMALL LETTER IE
0436 ж CYRILLIC SMALL LETTER ZHE
0437 з CYRILLIC SMALL LETTER ZE
0438 и CYRILLIC SMALL LETTER I
0439 й CYRILLIC SMALL LETTER SHORT I
    ≡ 0438 и 0306
043A к CYRILLIC SMALL LETTER KA
043B л CYRILLIC SMALL LETTER EL
043C м CYRILLIC SMALL LETTER EM
043D н CYRILLIC SMALL LETTER EN
043E о CYRILLIC SMALL LETTER O
043F п CYRILLIC SMALL LETTER PE
0440 р CYRILLIC SMALL LETTER ER
0441 с CYRILLIC SMALL LETTER ES
0442 т CYRILLIC SMALL LETTER TE
0443 у CYRILLIC SMALL LETTER U
0444 ф CYRILLIC SMALL LETTER EF
0445 х CYRILLIC SMALL LETTER HA
0446 ц CYRILLIC SMALL LETTER TSE
0447 ч CYRILLIC SMALL LETTER CHE
0448 ш CYRILLIC SMALL LETTER SHA
0449 щ CYRILLIC SMALL LETTER SHCHA
044A ъ CYRILLIC SMALL LETTER HARD SIGN
044B ы CYRILLIC SMALL LETTER YERU
044C ь CYRILLIC SMALL LETTER SOFT SIGN
    → 0185 ƅ latin small letter tone six
044D э CYRILLIC SMALL LETTER E
044E ю CYRILLIC SMALL LETTER YU
044F я CYRILLIC SMALL LETTER YA

Cyrillic extensions
0450 ѐ CYRILLIC SMALL LETTER IE WITH GRAVE
    • Macedonian
    ≡ 0435 е 0300
0451 ё CYRILLIC SMALL LETTER IO
    • Russian, ...
    ≡ 0435 е 0308
0452 ђ CYRILLIC SMALL LETTER DJE
    • Serbian
    → 0111 đ latin small letter d with stroke
0453 ѓ CYRILLIC SMALL LETTER GJE
    • Macedonian
    ≡ 0433 г 0301
0454 є CYRILLIC SMALL LETTER UKRAINIAN IE
    = Old Cyrillic yest
0455 ѕ CYRILLIC SMALL LETTER DZE
    = Old Cyrillic zelo
    • Macedonian
0456 і CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    = Old Cyrillic i
0457 ї CYRILLIC SMALL LETTER YI
    • Ukrainian
    ≡ 0456 і 0308
0458 ј CYRILLIC SMALL LETTER JE
    • Serbian, Azerbaijani, Altay
0459 љ CYRILLIC SMALL LETTER LJE
    • Serbian, Macedonian
    → 01C9 lj latin small letter lj
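The ≡ lines in this list give canonical decompositions: for example, 0451 is canonically equivalent to 0435 followed by 0308. One practical consequence is that text containing these letters should be normalized before comparison. The sketch below, which assumes Python's standard `unicodedata` module (built on a later Unicode version than 5.0), checks each decomposition listed above in both directions. The table `LISTED` is copied by hand from the ≡ lines, and `disagreements` is a hypothetical helper name.

```python
import unicodedata

# Canonical decompositions as printed in the names list (hand-copied).
LISTED = {
    0x0400: (0x0415, 0x0300),
    0x0401: (0x0415, 0x0308),
    0x0403: (0x0413, 0x0301),
    0x0407: (0x0406, 0x0308),
    0x040C: (0x041A, 0x0301),
    0x040D: (0x0418, 0x0300),
    0x040E: (0x0423, 0x0306),
    0x0419: (0x0418, 0x0306),
    0x0439: (0x0438, 0x0306),
    0x0450: (0x0435, 0x0300),
    0x0451: (0x0435, 0x0308),
    0x0453: (0x0433, 0x0301),
    0x0457: (0x0456, 0x0308),
}


def disagreements(listed):
    """Return code points whose NFD/NFC behaviour differs from the table."""
    bad = []
    for cp, parts in listed.items():
        precomposed = chr(cp)
        decomposed = "".join(chr(part) for part in parts)
        # NFD should split the letter into base + combining mark ...
        splits = unicodedata.normalize("NFD", precomposed) == decomposed
        # ... and NFC should recombine the sequence into the single letter.
        joins = unicodedata.normalize("NFC", decomposed) == precomposed
        if not (splits and joins):
            bad.append(cp)
    return bad


if __name__ == "__main__":
    print(disagreements(LISTED))
    # The two spellings differ as raw strings but match once normalized.
    a, b = "\u0451", "\u0435\u0308"
    print(a == b, unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))
```

Choosing NFC or NFD as the comparison form is a design decision for the application; either works as long as both sides of the comparison use the same form.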
Cyrillic Supplement Range: 0500–052F
Armenian Range: 0530–058F
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Hebrew Range: 0590–05FF
[Code chart: Hebrew, columns 059–05F, rows 0–F]
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Arabic Range: 0600–06FF
[Code chart: Arabic, columns 060–06F, rows 0–F]
Subtending marks
0600  ARABIC NUMBER SIGN
0601  ARABIC SIGN SANAH
0602  ARABIC FOOTNOTE MARKER
0603  ARABIC SIGN SAFHA

Currency sign
060B  AFGHANI SIGN

Punctuation
060C  ،  ARABIC COMMA
      • also used with Thaana and Syriac in modern text
      → 002C , comma
060D  ARABIC DATE SEPARATOR

Poetic marks
060E  ARABIC POETIC VERSE SIGN
060F  ARABIC SIGN MISRA

Honorifics
0610  ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM
      • represents sallallahu alayhe wasallam “may God’s peace and blessings be upon him”
0611  ARABIC SIGN ALAYHE ASSALLAM
      • represents alayhe assalam “upon him be peace”
0612  ARABIC SIGN RAHMATULLAH ALAYHE
      • represents rahmatullah alayhe “may God have mercy upon him”
0613  ARABIC SIGN RADI ALLAHOU ANHU
      • represents radi allahu ’anhu “may God be pleased with him”
0614  ARABIC SIGN TAKHALLUS
      • sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names

Koranic annotation sign
0615  ARABIC SMALL HIGH TAH
      • marks a recommended pause position in some Korans published in Iran and Pakistan
      • should not be confused with the small TAH sign used as a diacritic for some letters such as 0679

Punctuation
061B  ؛  ARABIC SEMICOLON
      • also used with Thaana and Syriac in modern text
      → 003B ; semicolon
061C  <reserved>

Based on ISO 8859-6
0621  ARABIC LETTER HAMZA
      → 02BE modifier letter right half ring
0622  ARABIC LETTER ALEF WITH MADDA ABOVE
      ≡ 0627 0653
0623  ARABIC LETTER ALEF WITH HAMZA ABOVE
      ≡ 0627 0654
0624  ARABIC LETTER WAW WITH HAMZA ABOVE
      ≡ 0648 0654
0625  ARABIC LETTER ALEF WITH HAMZA BELOW
      ≡ 0627 0655
0626  ARABIC LETTER YEH WITH HAMZA ABOVE
      ≡ 064A 0654
0627  ARABIC LETTER ALEF
0628  ARABIC LETTER BEH
0629  ARABIC LETTER TEH MARBUTA
062A  ARABIC LETTER TEH
062B  ARABIC LETTER THEH
062C  ARABIC LETTER JEEM
062D  ARABIC LETTER HAH
062E  ARABIC LETTER KHAH
062F  ARABIC LETTER DAL
0630  ARABIC LETTER THAL
0631  ARABIC LETTER REH
0632  ARABIC LETTER ZAIN
0633  ARABIC LETTER SEEN
0634  ARABIC LETTER SHEEN
0635  ARABIC LETTER SAD
0636  ARABIC LETTER DAD
0637  ARABIC LETTER TAH
0638  ARABIC LETTER ZAH
0639  ARABIC LETTER AIN
      → 01B9 latin small letter ezh reversed
      → 02BF modifier letter left half ring
063A  ARABIC LETTER GHAIN
063B–063F  <reserved>
0640  ARABIC TATWEEL
      = kashida
      • inserted to stretch characters
      • also used with Syriac
0641  ARABIC LETTER FEH
0642  ARABIC LETTER QAF
0643  ARABIC LETTER KAF
0644  ARABIC LETTER LAM
0645  ARABIC LETTER MEEM
0646  ARABIC LETTER NOON
0647  ARABIC LETTER HEH
0648  ARABIC LETTER WAW
0649  ARABIC LETTER ALEF MAKSURA
      • represents YEH-shaped letter with no dots in any positional form
064A  ARABIC LETTER YEH

Points from ISO 8859-6
064B  ARABIC FATHATAN
064C  ARABIC DAMMATAN
064D  ARABIC KASRATAN
064E  ARABIC FATHA
064F  ARABIC DAMMA
0650  ARABIC KASRA
0651  ARABIC SHADDA
0652  ARABIC SUKUN
      • marks absence of a vowel after the base consonant
      • used in some Korans to mark a long vowel as ignored
      • can have a variety of shapes, including a circular one and a shape that looks like ‘’
      → 06E1 arabic small high dotless head of khah
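The “≡” entries in the Arabic names list (for 0622 through 0626) give canonical decompositions: each precomposed letter is equivalent to a base letter followed by a combining mark. The following is a minimal sketch in Python of how those five entries could be checked against a normalization library. It assumes the standard `unicodedata` module, whose data tracks a later Unicode version than 5.0; the helper names `EQUIVALENCES`, `decomposition`, and `check_equivalences` are illustrative and are not part of the standard.

```python
import unicodedata

# Precomposed letters and the sequences printed after "≡" in the names list.
# The table is transcribed from the list; the names used here are hypothetical.
EQUIVALENCES = {
    0x0622: (0x0627, 0x0653),  # ALEF WITH MADDA ABOVE
    0x0623: (0x0627, 0x0654),  # ALEF WITH HAMZA ABOVE
    0x0624: (0x0648, 0x0654),  # WAW WITH HAMZA ABOVE
    0x0625: (0x0627, 0x0655),  # ALEF WITH HAMZA BELOW
    0x0626: (0x064A, 0x0654),  # YEH WITH HAMZA ABOVE
}


def decomposition(code_point):
    """Return the NFD form of one code point as a tuple of code points."""
    decomposed = unicodedata.normalize("NFD", chr(code_point))
    return tuple(ord(ch) for ch in decomposed)


def check_equivalences(table=EQUIVALENCES):
    """Map each precomposed code point to True if NFD yields the listed sequence."""
    return {cp: decomposition(cp) == seq for cp, seq in table.items()}


if __name__ == "__main__":
    for cp, ok in check_equivalences().items():
        status = "matches" if ok else "differs"
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {status}")
```

Comparing text only after normalizing both sides to the same form (NFD or NFC) is one way to treat the precomposed and decomposed spellings as the same string.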
Syriac Range: 0700–074F
[Code chart: Syriac, columns 070–074, rows 0–F]
Arabic Supplement Range: 0750–077F
Thaana Range: 0780–07BF
NKo Range: 07C0–07FF
Devanagari Range: 0900–097F
[Code chart: Devanagari, columns 090–097, rows 0–F]
Bengali Range: 0980–09FF
[Code chart: Bengali (0980–09FF), columns 098–09F, rows 0–F]
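The block range stated for this chart (0980–09FF) can also be walked programmatically. The following is a minimal Python sketch, not part of the Unicode Standard. The helper name `named_characters` is hypothetical, and the sketch relies on Python's standard `unicodedata` module, whose character data reflects whatever Unicode version the running interpreter ships with (likely newer than Version 5.0). Code points that have no character name in that data, such as reserved positions, are skipped.

```python
import unicodedata


def named_characters(first, last):
    """Return (code point, name) pairs for every named character in [first, last]."""
    pairs = []
    for cp in range(first, last + 1):
        # unicodedata.name() returns the supplied default (None here) when
        # the code point has no name in the interpreter's Unicode data.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            pairs.append((cp, name))
    return pairs


# Bengali block range as given in the chart heading: 0980-09FF.
bengali = named_characters(0x0980, 0x09FF)

# Show the first few entries as "U+XXXX  NAME".
for cp, name in bengali[:4]:
    print(f"U+{cp:04X}  {name}")
```

The same helper should apply to any other block range in these charts, for example `named_characters(0x0A00, 0x0A7F)` for Gurmukhi.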
Gurmukhi Range: 0A00–0A7F
[Code chart: Gurmukhi (0A00–0A7F), columns 0A0–0A7, rows 0–F]
Gujarati Range: 0A80–0AFF
Oriya Range: 0B00–0B7F
[Code chart: Oriya (0B00–0B7F), columns 0B0–0B7, rows 0–F]
Tamil Range: 0B80–0BFF
[Code chart: Tamil (0B80–0BFF), columns 0B8–0BF, rows 0–F]
Telugu Range: 0C00–0C7F
Kannada Range: 0C80–0CFF
Malayalam Range: 0D00–0D7F
[Code chart: Malayalam (0D00–0D7F), columns 0D0–0D7, rows 0–F]
Sinhala Range: 0D80–0DFF
[Code chart: Sinhala (0D80–0DFF), columns 0D8–0DF, rows 0–F]
Thai Range: 0E00–0E7F
Thai code chart: 0E00–0E7F
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Lao Range: 0E80–0EFF This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
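The block range and the character names mentioned above can also be inspected programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module, which carries the Unicode Character Database for whatever Unicode version the interpreter was built with (usually later than 5.0, so it may report characters that the Version 5.0 chart does not contain). The helper name `names_in_range` is hypothetical and chosen only for illustration.

```python
import unicodedata

# Lao block range as stated in the chart header: 0E80-0EFF.
LAO_FIRST, LAO_LAST = 0x0E80, 0x0EFF


def names_in_range(first, last):
    """Return (code point, name) pairs for named characters in [first, last].

    Code points without a name in the interpreter's database
    (for example, unassigned ones) are skipped.
    """
    pairs = []
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), None)  # None when no name is defined
        if name is not None:
            pairs.append((cp, name))
    return pairs


if __name__ == "__main__":
    # Print one "code point, name" line per named character in the Lao block.
    for cp, name in names_in_range(LAO_FIRST, LAO_LAST):
        print(f"{cp:04X} {name}")
```

Because the names come from the interpreter's own database, the output is only an approximation of the Version 5.0 names list; the published chart and the Unicode Character Database remain the reference.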
Tibetan Range: 0F00–0FFF
Myanmar Range: 1000–109F
Georgian Range: 10A0–10FF
Hangul Jamo Range: 1100–11FF
Ethiopic Range: 1200–137F
Ethiopic Supplement Range: 1380–139F
Cherokee Range: 13A0–13FF
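The block ranges stated in the chart headers above lend themselves to a simple lookup from code point to block name. The sketch below is illustrative only: the table holds just the ranges quoted in this part of the charts (Lao through Cherokee), and the names `BLOCKS` and `block_of` are hypothetical. A complete implementation would normally read the block list from the Unicode Character Database rather than hard-code it.

```python
from bisect import bisect_right

# (first, last, block name), sorted by first code point.
# Ranges are copied from the chart headers in this section.
BLOCKS = [
    (0x0E80, 0x0EFF, "Lao"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x1000, 0x109F, "Myanmar"),
    (0x10A0, 0x10FF, "Georgian"),
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x1200, 0x137F, "Ethiopic"),
    (0x1380, 0x139F, "Ethiopic Supplement"),
    (0x13A0, 0x13FF, "Cherokee"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp):
    """Return the name of the block containing cp, or None if it is not in the table."""
    # Find the last block whose first code point is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for ch in ("\u0e81", "\u0f40", "\u10d0", "\u1100", "\u13a0", "A"):
        print(f"U+{ord(ch):04X} -> {block_of(ord(ch))}")
```

A binary search over the sorted start points keeps the lookup logarithmic in the number of blocks, which matters little for eight entries but carries over unchanged to the full block list.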
Unified Canadian Aboriginal Syllabics Range: 1400–167F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.
See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.
Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.
See http://www.unicode.org/charts/fonts.html for a list.
[Code chart: Unified Canadian Aboriginal Syllabics, 1400–14DF]
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
[Code chart: Unified Canadian Aboriginal Syllabics, 14E0–15AF]
[Code chart: Unified Canadian Aboriginal Syllabics, 15B0–167F]
Unified Canadian Aboriginal Syllabics
Syllables
1401 CANADIAN SYLLABICS E
     • Inuktitut (AI), Carrier (U)
1402 CANADIAN SYLLABICS AAI
     • Inuktitut
1403 CANADIAN SYLLABICS I
     • Carrier (O)
1404 CANADIAN SYLLABICS II
1405 CANADIAN SYLLABICS O
     • Inuktitut (U), Carrier (E)
1406 CANADIAN SYLLABICS OO
     • Inuktitut (UU)
1407 CANADIAN SYLLABICS Y-CREE OO
1408 CANADIAN SYLLABICS CARRIER EE
1409 CANADIAN SYLLABICS CARRIER I
140A CANADIAN SYLLABICS A
140B CANADIAN SYLLABICS AA
140C CANADIAN SYLLABICS WE
140D CANADIAN SYLLABICS WEST-CREE WE
140E CANADIAN SYLLABICS WI
140F CANADIAN SYLLABICS WEST-CREE WI
1410 CANADIAN SYLLABICS WII
1411 CANADIAN SYLLABICS WEST-CREE WII
1412 CANADIAN SYLLABICS WO
1413 CANADIAN SYLLABICS WEST-CREE WO
1414 CANADIAN SYLLABICS WOO
1415 CANADIAN SYLLABICS WEST-CREE WOO
1416 CANADIAN SYLLABICS NASKAPI WOO
1417 CANADIAN SYLLABICS WA
1418 CANADIAN SYLLABICS WEST-CREE WA
1419 CANADIAN SYLLABICS WAA
141A CANADIAN SYLLABICS WEST-CREE WAA
141B CANADIAN SYLLABICS NASKAPI WAA
141C CANADIAN SYLLABICS AI
     • East Cree
141D CANADIAN SYLLABICS Y-CREE W
141E CANADIAN SYLLABICS GLOTTAL STOP
     • Moose Cree (Y), Algonquian (GLOTTAL STOP)
141F CANADIAN SYLLABICS FINAL ACUTE
     • West Cree (T), East Cree (Y), Inuktitut (GLOTTAL STOP)
     • Athapascan (B/P), Sayisi (I), Carrier (G)
1420 CANADIAN SYLLABICS FINAL GRAVE
     • West Cree (K), Athapascan (K), Carrier (KH)
1421 CANADIAN SYLLABICS FINAL BOTTOM HALF RING
     • N Cree (SH), Sayisi (R), Carrier (NG)
1422 CANADIAN SYLLABICS FINAL TOP HALF RING
     • Algonquian (S), Chipewyan (R), Sayisi (S)
1423 CANADIAN SYLLABICS FINAL RIGHT HALF RING
     • West Cree (N), Athapascan (D/T), Sayisi (N), Carrier (N)
1424 CANADIAN SYLLABICS FINAL RING
     • West Cree (W), Sayisi (O)
1425 CANADIAN SYLLABICS FINAL DOUBLE ACUTE
     • Chipewyan (TT), South Slavey (GH)
1426 CANADIAN SYLLABICS FINAL DOUBLE SHORT VERTICAL STROKES
     • Algonquian (H), Carrier (R)
1427 CANADIAN SYLLABICS FINAL MIDDLE DOT
     • Moose Cree (W), Athapascan (Y), Sayisi (YU)
1428 CANADIAN SYLLABICS FINAL SHORT HORIZONTAL STROKE
     • West Cree (C), Sayisi (D)
1429 CANADIAN SYLLABICS FINAL PLUS
     • Athapascan (N), Sayisi (AI)
142A CANADIAN SYLLABICS FINAL DOWN TACK
     • N Cree (L), Carrier (D)
     → 22A4 ⊤ down tack
142B CANADIAN SYLLABICS EN
142C CANADIAN SYLLABICS IN
142D CANADIAN SYLLABICS ON
142E CANADIAN SYLLABICS AN
142F CANADIAN SYLLABICS PE
     • Inuktitut (PAI), Athapascan (BE), Carrier (HU)
1430 CANADIAN SYLLABICS PAAI
     • Inuktitut
1431 CANADIAN SYLLABICS PI
1432 CANADIAN SYLLABICS PII
1433 CANADIAN SYLLABICS PO
     • Inuktitut (PU), Athapascan (BO), Carrier (HE)
1434 CANADIAN SYLLABICS POO
     • Inuktitut (PUU)
1435 CANADIAN SYLLABICS Y-CREE POO
1436 CANADIAN SYLLABICS CARRIER HEE
1437 CANADIAN SYLLABICS CARRIER HI
1438 CANADIAN SYLLABICS PA
     • Athapascan (BA), Carrier (HA)
1439 CANADIAN SYLLABICS PAA
143A CANADIAN SYLLABICS PWE
143B CANADIAN SYLLABICS WEST-CREE PWE
143C CANADIAN SYLLABICS PWI
143D CANADIAN SYLLABICS WEST-CREE PWI
143E CANADIAN SYLLABICS PWII
143F CANADIAN SYLLABICS WEST-CREE PWII
1440 CANADIAN SYLLABICS PWO
1441 CANADIAN SYLLABICS WEST-CREE PWO
1442 CANADIAN SYLLABICS PWOO
1443 CANADIAN SYLLABICS WEST-CREE PWOO
1444 CANADIAN SYLLABICS PWA
1445 CANADIAN SYLLABICS WEST-CREE PWA
1446 CANADIAN SYLLABICS PWAA
1447 CANADIAN SYLLABICS WEST-CREE PWAA
1448 CANADIAN SYLLABICS Y-CREE PWAA
1449 CANADIAN SYLLABICS P
144A CANADIAN SYLLABICS WEST-CREE P
     • Sayisi (G)
144B CANADIAN SYLLABICS CARRIER H
144C CANADIAN SYLLABICS TE
     • Inuktitut (TAI), Athapascan (DI), Carrier (DU)
144D CANADIAN SYLLABICS TAAI
     • Inuktitut
144E CANADIAN SYLLABICS TI
     • Athapascan (DE), Carrier (DO)
144F CANADIAN SYLLABICS TII
1450 CANADIAN SYLLABICS TO
     • Inuktitut (TU), Athapascan (DO), Carrier (DE), Sayisi (DU)
1451 CANADIAN SYLLABICS TOO
     • Inuktitut (TUU)
1452 CANADIAN SYLLABICS Y-CREE TOO
1453 CANADIAN SYLLABICS CARRIER DEE
1454 CANADIAN SYLLABICS CARRIER DI
1455 CANADIAN SYLLABICS TA
     • Athapascan (DA)
1456 CANADIAN SYLLABICS TAA
1457 CANADIAN SYLLABICS TWE
1458 CANADIAN SYLLABICS WEST-CREE TWE
1459 CANADIAN SYLLABICS TWI
145A CANADIAN SYLLABICS WEST-CREE TWI
145B CANADIAN SYLLABICS TWII
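The pairing of code points with character names in this list can be checked mechanically. The following is a minimal sketch in Python, assuming the standard-library `unicodedata` module; the helper name `names_in_range` is hypothetical and not part of the standard. The module reflects a later version of the Unicode Character Database than Version 5.0, but character names are stable across versions, so the names for this range should agree with the list.

```python
import unicodedata

def names_in_range(first, last):
    """Return (code point, name) pairs for named characters in [first, last]."""
    rows = []
    for cp in range(first, last + 1):
        # unicodedata.name returns the default (None here) for unnamed code points
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            rows.append((f"{cp:04X}", name))
    return rows

# The portion of the names list covered on this page: 1401 through 145B.
rows = names_in_range(0x1401, 0x145B)
for code, name in rows[:3]:
    print(code, name)
```

Annotations such as "Inuktitut (AI), Carrier (U)" are informative notes from the names list and are not exposed by `unicodedata`; only the normative character name is returned.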
Ogham Range: 1680–169F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0.
Runic Range: 16A0–16FF This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0.
Tagalog Range: 1700–171F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0.
Hanunoo Range: 1720–173F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0.
Buhid Range: 1740–175F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0.
Tagbanwa Range: 1760–177F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0.
Khmer Range: 1780–17FF This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0.
[Code chart: Khmer, 1780–17FF]
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Mongolian Range: 1800–18AF
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.
See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.
[Code chart: Mongolian, 1800–18AF]
Limbu Range: 1900–194F
Tai Le Range: 1950–197F
New Tai Lue Range: 1980–19DF
Khmer Symbols Range: 19E0–19FF
Buginese Range: 1A00–1A1F
Balinese Range: 1B00–1B7F
[Code chart: Balinese, 1B00–1B7F]
Phonetic Extensions Range: 1D00–1D7F
[Code chart: Phonetic Extensions, 1D00–1D7F]
Phonetic Extensions Supplement Range: 1D80–1DBF
Code chart: Phonetic Extensions Supplement, 1D80–1DBF

     1D8  1D9  1DA  1DB
0    ᶀ    ᶐ    ᶠ    ᶰ
1    ᶁ    ᶑ    ᶡ    ᶱ
2    ᶂ    ᶒ    ᶢ    ᶲ
3    ᶃ    ᶓ    ᶣ    ᶳ
4    ᶄ    ᶔ    ᶤ    ᶴ
5    ᶅ    ᶕ    ᶥ    ᶵ
6    ᶆ    ᶖ    ᶦ    ᶶ
7    ᶇ    ᶗ    ᶧ    ᶷ
8    ᶈ    ᶘ    ᶨ    ᶸ
9    ᶉ    ᶙ    ᶩ    ᶹ
A    ᶊ    ᶚ    ᶪ    ᶺ
B    ᶋ    ᶛ    ᶫ    ᶻ
C    ᶌ    ᶜ    ᶬ    ᶼ
D    ᶍ    ᶝ    ᶭ    ᶽ
E    ᶎ    ᶞ    ᶮ    ᶾ
F    ᶏ    ᶟ    ᶯ    ᶿ
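The block ranges given in the chart headings of this part can be collected into a small lookup table. The following is a minimal Python sketch, not part of the Unicode Standard: the names BLOCKS, block_of, and chart_cell are hypothetical, and the table covers only the blocks listed in this part (the Khmer range is taken from the first and last cells of its chart). It also illustrates how a chart cell is addressed, assuming the layout visible in the charts: the column label is the code point without its last hexadecimal digit, and the row label is that last digit.

```python
from typing import Optional, Tuple

# (first, last, name) for the blocks whose ranges appear in this part.
# Illustrative subset only; it is not a complete list of Unicode blocks.
BLOCKS = [
    (0x1780, 0x17FF, "Khmer"),
    (0x1800, 0x18AF, "Mongolian"),
    (0x1900, 0x194F, "Limbu"),
    (0x1950, 0x197F, "Tai Le"),
    (0x1980, 0x19DF, "New Tai Lue"),
    (0x19E0, 0x19FF, "Khmer Symbols"),
    (0x1A00, 0x1A1F, "Buginese"),
    (0x1B00, 0x1B7F, "Balinese"),
    (0x1D00, 0x1D7F, "Phonetic Extensions"),
    (0x1D80, 0x1DBF, "Phonetic Extensions Supplement"),
    (0x1DC0, 0x1DFF, "Combining Diacritical Marks Supplement"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def block_of(code_point: int) -> Optional[str]:
    """Return the name of the listed block containing code_point, or None."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def chart_cell(code_point: int) -> Tuple[str, str]:
    """Return the (column label, row label) of a code point in a code chart.

    Assumes the 16-row chart layout: the column is the code point with its
    last hex digit dropped, and the row is that last hex digit.
    """
    return f"{code_point >> 4:X}", f"{code_point & 0xF:X}"


if __name__ == "__main__":
    for cp in (0x17D2, 0x1820, 0x1D80, 0x18B0):
        print(f"U+{cp:04X}", block_of(cp), chart_cell(cp))
```

For complete and current block data, the Blocks.txt file of the Unicode Character Database is the authoritative source; a hard-coded table like this one is only suitable for illustration.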
Combining Diacritical Marks Supplement Range: 1DC0–1DFF
Latin Extended Additional Range: 1E00–1EFF
[Code chart: Latin Extended Additional, 1E00–1EFF (columns 1E0–1EF, rows 0–F)]
In this block the names "WITH LINE BELOW" refer to a macron below the letter.
Latin general use extensions
1E00 Ḁ LATIN CAPITAL LETTER A WITH RING BELOW ≡ 0041 A 0325
1E01 ḁ LATIN SMALL LETTER A WITH RING BELOW ≡ 0061 a 0325
1E02 Ḃ LATIN CAPITAL LETTER B WITH DOT ABOVE ≡ 0042 B 0307
1E03 ḃ LATIN SMALL LETTER B WITH DOT ABOVE • Irish Gaelic (old orthography) ≡ 0062 b 0307
1E04 Ḅ LATIN CAPITAL LETTER B WITH DOT BELOW ≡ 0042 B 0323
1E05 ḅ LATIN SMALL LETTER B WITH DOT BELOW ≡ 0062 b 0323
1E06 Ḇ LATIN CAPITAL LETTER B WITH LINE BELOW ≡ 0042 B 0331
1E07 ḇ LATIN SMALL LETTER B WITH LINE BELOW ≡ 0062 b 0331
1E08 Ḉ LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE ≡ 00C7 Ç 0301
1E09 ḉ LATIN SMALL LETTER C WITH CEDILLA AND ACUTE ≡ 00E7 ç 0301
1E0A Ḋ LATIN CAPITAL LETTER D WITH DOT ABOVE ≡ 0044 D 0307
1E0B ḋ LATIN SMALL LETTER D WITH DOT ABOVE • Irish Gaelic (old orthography) ≡ 0064 d 0307
1E0C Ḍ LATIN CAPITAL LETTER D WITH DOT BELOW ≡ 0044 D 0323
1E0D ḍ LATIN SMALL LETTER D WITH DOT BELOW • Indic transliteration ≡ 0064 d 0323
1E0E Ḏ LATIN CAPITAL LETTER D WITH LINE BELOW ≡ 0044 D 0331
1E0F ḏ LATIN SMALL LETTER D WITH LINE BELOW ≡ 0064 d 0331
1E10 Ḑ LATIN CAPITAL LETTER D WITH CEDILLA ≡ 0044 D 0327
1E11 ḑ LATIN SMALL LETTER D WITH CEDILLA • Livonian ≡ 0064 d 0327
1E12 Ḓ LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW ≡ 0044 D 032D
1E13 ḓ LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW ≡ 0064 d 032D
1E14 Ḕ LATIN CAPITAL LETTER E WITH MACRON AND GRAVE ≡ 0112 Ē 0300
1E15 ḕ LATIN SMALL LETTER E WITH MACRON AND GRAVE ≡ 0113 ē 0300
1E16 Ḗ LATIN CAPITAL LETTER E WITH MACRON AND ACUTE ≡ 0112 Ē 0301
1E17 ḗ LATIN SMALL LETTER E WITH MACRON AND ACUTE ≡ 0113 ē 0301
1E18 Ḙ LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW ≡ 0045 E 032D
1E19 ḙ LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW ≡ 0065 e 032D
1E1A Ḛ LATIN CAPITAL LETTER E WITH TILDE BELOW ≡ 0045 E 0330
1E1B ḛ LATIN SMALL LETTER E WITH TILDE BELOW ≡ 0065 e 0330
1E1C Ḝ LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE ≡ 0228 Ȩ 0306
1E1D ḝ LATIN SMALL LETTER E WITH CEDILLA AND BREVE ≡ 0229 ȩ 0306
1E1E Ḟ LATIN CAPITAL LETTER F WITH DOT ABOVE ≡ 0046 F 0307
1E1F ḟ LATIN SMALL LETTER F WITH DOT ABOVE • Irish Gaelic (old orthography) ≡ 0066 f 0307
1E20 Ḡ LATIN CAPITAL LETTER G WITH MACRON ≡ 0047 G 0304
1E21 ḡ LATIN SMALL LETTER G WITH MACRON ≡ 0067 g 0304
1E22 Ḣ LATIN CAPITAL LETTER H WITH DOT ABOVE ≡ 0048 H 0307
1E23 ḣ LATIN SMALL LETTER H WITH DOT ABOVE ≡ 0068 h 0307
1E24 Ḥ LATIN CAPITAL LETTER H WITH DOT BELOW ≡ 0048 H 0323
1E25 ḥ LATIN SMALL LETTER H WITH DOT BELOW • Indic transliteration ≡ 0068 h 0323
1E26 Ḧ LATIN CAPITAL LETTER H WITH DIAERESIS ≡ 0048 H 0308
1E27 ḧ LATIN SMALL LETTER H WITH DIAERESIS ≡ 0068 h 0308
1E28 Ḩ LATIN CAPITAL LETTER H WITH CEDILLA ≡ 0048 H 0327
1E29 ḩ LATIN SMALL LETTER H WITH CEDILLA ≡ 0068 h 0327
1E2A Ḫ LATIN CAPITAL LETTER H WITH BREVE BELOW ≡ 0048 H 032E
1E2B ḫ LATIN SMALL LETTER H WITH BREVE BELOW • Semitic transliteration ≡ 0068 h 032E
1E2C Ḭ LATIN CAPITAL LETTER I WITH TILDE BELOW ≡ 0049 I 0330
1E2D ḭ LATIN SMALL LETTER I WITH TILDE BELOW ≡ 0069 i 0330
1E2E Ḯ LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE ≡ 00CF Ï 0301
1E2F ḯ LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE ≡ 00EF ï 0301
1E30 Ḱ LATIN CAPITAL LETTER K WITH ACUTE ≡ 004B K 0301
1E31 ḱ LATIN SMALL LETTER K WITH ACUTE • Macedonian transliteration ≡ 006B k 0301
1E32 Ḳ LATIN CAPITAL LETTER K WITH DOT BELOW ≡ 004B K 0323
1E33 ḳ LATIN SMALL LETTER K WITH DOT BELOW ≡ 006B k 0323
1E34 Ḵ LATIN CAPITAL LETTER K WITH LINE BELOW ≡ 004B K 0331
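The ≡ entries in the Latin Extended Additional names list are one-level canonical decomposition mappings. As a rough illustration only, the following Python sketch reads the same mappings from the standard `unicodedata` module. The helper names `canonical_mapping` and `hex_codes` are hypothetical, and the data comes from whatever Unicode version the Python build ships with, which is not necessarily Version 5.0.

```python
import unicodedata

def canonical_mapping(ch):
    """Return the one-level canonical decomposition of ch as hex strings.

    Returns an empty list when ch has no decomposition or only a
    compatibility mapping (those start with a tag such as "<compat>").
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return []
    return mapping.split()

def hex_codes(text):
    """Format each code point of text as a hex string of at least four digits."""
    return [f"{ord(c):04X}" for c in text]

for ch in "\u1E08\u1E14\u1E34":
    one_level = " ".join(canonical_mapping(ch))
    full = " ".join(hex_codes(unicodedata.normalize("NFD", ch)))
    print(f"{ord(ch):04X} {unicodedata.name(ch)}: one level = {one_level}; NFD = {full}")

# "WITH LINE BELOW" characters decompose to 0331, whose own name says macron.
print(unicodedata.name("\u0331"))
```

The one-level mapping for 1E08 stops at 00C7 0301, as the names list shows, while NFD continues until no code point decomposes further. The last line ties back to the note that "WITH LINE BELOW" refers to a macron below the letter.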
Greek Extended Range: 1F00–1FFF
[Code chart: Greek Extended, 1F00–1FFF (columns 1F0–1FF, rows 0–F)]
Precomposed polytonic Greek
1F00 ἀ GREEK SMALL LETTER ALPHA WITH PSILI ≡ 03B1 α 0313
1F01 ἁ GREEK SMALL LETTER ALPHA WITH DASIA ≡ 03B1 α 0314
1F02 ἂ GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA ≡ 1F00 ἀ 0300
1F03 ἃ GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA ≡ 1F01 ἁ 0300
1F04 ἄ GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA ≡ 1F00 ἀ 0301
1F05 ἅ GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA ≡ 1F01 ἁ 0301
1F06 ἆ GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI ≡ 1F00 ἀ 0342
1F07 ἇ GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI ≡ 1F01 ἁ 0342
1F08 Ἀ GREEK CAPITAL LETTER ALPHA WITH PSILI ≡ 0391 Α 0313
1F09 Ἁ GREEK CAPITAL LETTER ALPHA WITH DASIA ≡ 0391 Α 0314
1F0A Ἂ GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA ≡ 1F08 Ἀ 0300
1F0B Ἃ GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA ≡ 1F09 Ἁ 0300
1F0C Ἄ GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA ≡ 1F08 Ἀ 0301
1F0D Ἅ GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA ≡ 1F09 Ἁ 0301
1F0E Ἆ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI ≡ 1F08 Ἀ 0342
1F0F Ἇ GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI ≡ 1F09 Ἁ 0342
1F10 ἐ GREEK SMALL LETTER EPSILON WITH PSILI ≡ 03B5 ε 0313
1F11 ἑ GREEK SMALL LETTER EPSILON WITH DASIA ≡ 03B5 ε 0314
1F12 ἒ GREEK SMALL LETTER EPSILON WITH PSILI AND VARIA ≡ 1F10 ἐ 0300
1F13 ἓ GREEK SMALL LETTER EPSILON WITH DASIA AND VARIA ≡ 1F11 ἑ 0300
1F14 ἔ GREEK SMALL LETTER EPSILON WITH PSILI AND OXIA ≡ 1F10 ἐ 0301
1F15 ἕ GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA ≡ 1F11 ἑ 0301
1F16 <reserved>
1F18 Ἐ GREEK CAPITAL LETTER EPSILON WITH PSILI ≡ 0395 Ε 0313
1F19 Ἑ GREEK CAPITAL LETTER EPSILON WITH DASIA ≡ 0395 Ε 0314
1F1A Ἒ GREEK CAPITAL LETTER EPSILON WITH PSILI AND VARIA ≡ 1F18 Ἐ 0300
1F1B Ἓ GREEK CAPITAL LETTER EPSILON WITH DASIA AND VARIA ≡ 1F19 Ἑ 0300
1F1C Ἔ GREEK CAPITAL LETTER EPSILON WITH PSILI AND OXIA ≡ 1F18 Ἐ 0301
1F1D Ἕ GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA ≡ 1F19 Ἑ 0301
1F1E <reserved>
1F20 ἠ GREEK SMALL LETTER ETA WITH PSILI ≡ 03B7 η 0313
1F21 ἡ GREEK SMALL LETTER ETA WITH DASIA ≡ 03B7 η 0314
1F22 ἢ GREEK SMALL LETTER ETA WITH PSILI AND VARIA ≡ 1F20 ἠ 0300
1F23 ἣ GREEK SMALL LETTER ETA WITH DASIA AND VARIA ≡ 1F21 ἡ 0300
1F24 ἤ GREEK SMALL LETTER ETA WITH PSILI AND OXIA ≡ 1F20 ἠ 0301
1F25 ἥ GREEK SMALL LETTER ETA WITH DASIA AND OXIA ≡ 1F21 ἡ 0301
1F26 ἦ GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI ≡ 1F20 ἠ 0342
1F27 ἧ GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI ≡ 1F21 ἡ 0342
1F28 Ἠ GREEK CAPITAL LETTER ETA WITH PSILI ≡ 0397 Η 0313
1F29 Ἡ GREEK CAPITAL LETTER ETA WITH DASIA ≡ 0397 Η 0314
1F2A Ἢ GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA ≡ 1F28 Ἠ 0300
1F2B Ἣ GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA ≡ 1F29 Ἡ 0300
1F2C Ἤ GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA ≡ 1F28 Ἠ 0301
1F2D Ἥ GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA ≡ 1F29 Ἡ 0301
1F2E Ἦ GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI ≡ 1F28 Ἠ 0342
1F2F Ἧ GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI ≡ 1F29 Ἡ 0342
1F30 ἰ GREEK SMALL LETTER IOTA WITH PSILI ≡ 03B9 ι 0313
1F31 ἱ GREEK SMALL LETTER IOTA WITH DASIA ≡ 03B9 ι 0314
1F32 ἲ GREEK SMALL LETTER IOTA WITH PSILI AND VARIA ≡ 1F30 ἰ 0300
1F33 ἳ GREEK SMALL LETTER IOTA WITH DASIA AND VARIA ≡ 1F31 ἱ 0300
1F34 ἴ GREEK SMALL LETTER IOTA WITH PSILI AND OXIA ≡ 1F30 ἰ 0301
1F35 ἵ GREEK SMALL LETTER IOTA WITH DASIA AND OXIA ≡ 1F31 ἱ 0301
1F36 ἶ GREEK SMALL LETTER IOTA WITH PSILI AND PERISPOMENI ≡ 1F30 ἰ 0342
1F37 ἷ GREEK SMALL LETTER IOTA WITH DASIA AND PERISPOMENI ≡ 1F31 ἱ 0342
1F38 Ἰ GREEK CAPITAL LETTER IOTA WITH PSILI ≡ 0399 Ι 0313
1F39 Ἱ GREEK CAPITAL LETTER IOTA WITH DASIA ≡ 0399 Ι 0314
1F3A Ἲ GREEK CAPITAL LETTER IOTA WITH PSILI AND VARIA ≡ 1F38 Ἰ 0300
1F3B Ἳ GREEK CAPITAL LETTER IOTA WITH DASIA AND VARIA ≡ 1F39 Ἱ 0300
1F3C Ἴ GREEK CAPITAL LETTER IOTA WITH PSILI AND OXIA ≡ 1F38 Ἰ 0301
1F3D Ἵ GREEK CAPITAL LETTER IOTA WITH DASIA AND OXIA ≡ 1F39 Ἱ 0301
1F3E Ἶ GREEK CAPITAL LETTER IOTA WITH PSILI AND PERISPOMENI ≡ 1F38 Ἰ 0342
1F3F Ἷ GREEK CAPITAL LETTER IOTA WITH DASIA AND PERISPOMENI ≡ 1F39 Ἱ 0342
1F40 ὀ GREEK SMALL LETTER OMICRON WITH PSILI ≡ 03BF ο 0313
1F41 ὁ GREEK SMALL LETTER OMICRON WITH DASIA ≡ 03BF ο 0314
1F42 ὂ GREEK SMALL LETTER OMICRON WITH PSILI AND VARIA ≡ 1F40 ὀ 0300
1F43 ὃ GREEK SMALL LETTER OMICRON WITH DASIA AND VARIA ≡ 1F41 ὁ 0300
1F44 ὄ GREEK SMALL LETTER OMICRON WITH PSILI AND OXIA ≡ 1F40 ὀ 0301
1F45 ὅ GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA ≡ 1F41 ὁ 0301
1F46 <reserved>
1F48 Ὀ GREEK CAPITAL LETTER OMICRON WITH PSILI ≡ 039F Ο 0313
1F49 Ὁ GREEK CAPITAL LETTER OMICRON WITH DASIA ≡ 039F Ο 0314
1F4A Ὂ GREEK CAPITAL LETTER OMICRON WITH PSILI AND VARIA ≡ 1F48 Ὀ 0300
1F4B Ὃ GREEK CAPITAL LETTER OMICRON WITH DASIA AND VARIA ≡ 1F49 Ὁ 0300
1F4C Ὄ GREEK CAPITAL LETTER OMICRON WITH PSILI AND OXIA ≡ 1F48 Ὀ 0301
1F4D Ὅ GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA ≡ 1F49 Ὁ 0301
1F4E <reserved>
1F50 ὐ GREEK SMALL LETTER UPSILON WITH PSILI ≡ 03C5 υ 0313
1F51 ὑ GREEK SMALL LETTER UPSILON WITH DASIA ≡ 03C5 υ 0314
1F52 ὒ GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA ≡ 1F50 ὐ 0300
1F53 ὓ GREEK SMALL LETTER UPSILON WITH DASIA AND VARIA ≡ 1F51 ὑ 0300
1F54 ὔ GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA ≡ 1F50 ὐ 0301
1F55 ὕ GREEK SMALL LETTER UPSILON WITH DASIA AND OXIA ≡ 1F51 ὑ 0301
1F56 ὖ GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI ≡ 1F50 ὐ 0342
1F57 ὗ GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI ≡ 1F51 ὑ 0342
1F58 <reserved>
1F59 Ὑ GREEK CAPITAL LETTER UPSILON WITH DASIA ≡ 03A5 Υ 0314
1F5A <reserved>
1F5B Ὓ GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA ≡ 1F59 Ὑ 0300
1F5C <reserved>
1F5D Ὕ GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA ≡ 1F59 Ὑ 0301
1F5E <reserved>
1F5F Ὗ GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI ≡ 1F59 Ὑ 0342
1F60 ὠ GREEK SMALL LETTER OMEGA WITH PSILI ≡ 03C9 ω 0313
1F61 ὡ GREEK SMALL LETTER OMEGA WITH DASIA ≡ 03C9 ω 0314
1F62 ὢ GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA ≡ 1F60 ὠ 0300
1F63 ὣ GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA ≡ 1F61 ὡ 0300
1F64 ὤ GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA ≡ 1F60 ὠ 0301
1F65 ὥ GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA ≡ 1F61 ὡ 0301
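Many Greek Extended entries decompose to another precomposed character plus one combining mark, so a full decomposition takes more than one step. The sketch below, offered only as an illustration, follows that chain using Python's standard `unicodedata` module. The function name `decomposition_chain` is hypothetical, and the results reflect the Unicode data bundled with the Python build rather than Version 5.0 specifically.

```python
import unicodedata

def decomposition_chain(ch):
    """Follow one-level canonical mappings until the base stops decomposing.

    Each step is a list of hex code points. Only the first code point is
    expanded further, which is enough for the base-plus-marks entries in
    this block.
    """
    steps = [[f"{ord(ch):04X}"]]
    while True:
        current = steps[-1]
        mapping = unicodedata.decomposition(chr(int(current[0], 16)))
        # Stop when there is no mapping or only a tagged compatibility mapping.
        if not mapping or mapping.startswith("<"):
            return steps
        steps.append(mapping.split() + current[1:])

# 1F02 GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA
for step in decomposition_chain("\u1F02"):
    print(" ".join(step))

# Recomposing the fully decomposed sequence gives back the precomposed letter.
recomposed = unicodedata.normalize("NFC", "\u03B1\u0313\u0300")
print(f"{ord(recomposed):04X}")
```

For 1F02 the chain passes through 1F00 0300 before reaching 03B1 0313 0300, matching the two names-list entries involved.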
General Punctuation Range: 2000–206F
[Code chart: General Punctuation, 2000–206F (columns 200–206, rows 0–F)]
For additional general punctuation characters see also Basic Latin, Latin-1, Supplemental Punctuation and CJK Symbols and Punctuation.
Spaces
2000 EN QUAD ≡ 2002 en space
2001 EM QUAD = mutton quad ≡ 2003 em space
2002 EN SPACE = nut • half an em ≈ 0020 space
2003 EM SPACE = mutton • nominally, a space equal to the type size in points • may scale by the condensation factor of a font ≈ 0020 space
2004 THREE-PER-EM SPACE = thick space ≈ 0020 space
2005 FOUR-PER-EM SPACE = mid space ≈ 0020 space
2006 SIX-PER-EM SPACE • in computer typography sometimes equated to thin space ≈ 0020 space
2007 FIGURE SPACE • space equal to tabular width of a font • this is equivalent to the digit width of fonts with fixed-width digits ≈ <noBreak> 0020
2008 PUNCTUATION SPACE • space equal to narrow punctuation of a font ≈ 0020 space
2009 THIN SPACE • a fifth of an em (or sometimes a sixth) ≈ 0020 space
200A HAIR SPACE • thinner than a thin space • in traditional typography, the thinnest space available ≈ 0020 space
200B ZERO WIDTH SPACE • commonly abbreviated ZWSP • this character is intended for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification
Format characters
200C ZERO WIDTH NON-JOINER • commonly abbreviated ZWNJ
200D ZERO WIDTH JOINER • commonly abbreviated ZWJ
200E LEFT-TO-RIGHT MARK • commonly abbreviated LRM
200F RIGHT-TO-LEFT MARK • commonly abbreviated RLM
Dashes
2010 ‐ HYPHEN → 002D - hyphen-minus → 00AD soft hyphen
2011 ‑ NON-BREAKING HYPHEN → 002D - hyphen-minus → 00AD soft hyphen ≈ <noBreak> 2010
2012 ‒ FIGURE DASH
2013 – EN DASH
2014 — EM DASH • may be used in pairs to offset parenthetical text → 30FC ー katakana-hiragana prolonged sound mark
2015 ― HORIZONTAL BAR = quotation dash • long dash introducing quoted text
General punctuation
2016 ‖ DOUBLE VERTICAL LINE • used in pairs to indicate norm of a matrix → 20E6 combining double vertical stroke overlay → 2225 parallel to
2017 ‗ DOUBLE LOW LINE • this is a spacing character → 005F _ low line → 0333 combining double low line ≈ 0020 0333
2018 ‘ LEFT SINGLE QUOTATION MARK = single turned comma quotation mark • this is the preferred character (as opposed to 201B) → 0027 ' apostrophe → 02BB modifier letter turned comma → 275B heavy single turned comma quotation mark ornament
2019 ’ RIGHT SINGLE QUOTATION MARK = single comma quotation mark • this is the preferred character to use for apostrophe → 0027 ' apostrophe → 02BC modifier letter apostrophe → 275C heavy single comma quotation mark ornament
201A ‚ SINGLE LOW-9 QUOTATION MARK = low single comma quotation mark • used as opening single quotation mark in some languages
201B ‛ SINGLE HIGH-REVERSED-9 QUOTATION MARK = single reversed comma quotation mark • has same semantic as 2018 ‘, but differs in appearance → 02BD modifier letter reversed comma
201C “ LEFT DOUBLE QUOTATION MARK = double turned comma quotation mark • this is the preferred character (as opposed to 201F) → 0022 " quotation mark → 275D heavy double turned comma quotation mark ornament → 301D reversed double prime quotation mark
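The space and dash entries mix two kinds of mapping: canonical equivalence (≡, as for EN QUAD) and compatibility equivalence (tagged in the character database, as in `<noBreak> 0020` for FIGURE SPACE). A minimal sketch, assuming Python's standard `unicodedata` module and a hypothetical helper named `describe`, shows how the two appear in practice. The values come from the Unicode data bundled with the Python build, not necessarily Version 5.0.

```python
import unicodedata

def describe(cp):
    """Return (hex code, name, general category, raw decomposition field)."""
    ch = chr(cp)
    return (
        f"{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch) or "(none)",
    )

for cp in (0x2000, 0x2002, 0x2007, 0x200B, 0x2011):
    print(" | ".join(describe(cp)))

# Canonical normalization replaces EN QUAD by EN SPACE but leaves EN SPACE alone.
nfc = unicodedata.normalize("NFC", "\u2000\u2002")
print([f"{ord(c):04X}" for c in nfc])

# Compatibility normalization folds the fixed-width spaces to U+0020 and the
# non-breaking hyphen to U+2010, discarding the width and no-break behavior.
nfkc = unicodedata.normalize("NFKC", "a\u2003b\u2011c")
print([f"{ord(c):04X}" for c in nfkc])
```

An untagged decomposition field is canonical, so it is applied by every normalization form. A tagged field is applied only by the compatibility forms NFKC and NFKD, which is why a process that must preserve spacing or line-break behavior would typically avoid those forms.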
Superscripts and Subscripts Range: 2070–209F
Currency Symbols Range: 20A0–20CF
Combining Diacritical Marks for Symbols Range: 20D0–20FF This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Letterlike Symbols Range: 2100–214F
[Code chart: Letterlike Symbols, 2100–214F]
Number Forms Range: 2150–218F
Arrows Range: 2190–21FF
[Code chart: Arrows, 2190–21FF]
Mathematical Operators Range: 2200–22FF
[Code chart: Mathematical Operators, 2200–22FF]
Mathematical Operators 2200–222A

Miscellaneous mathematical symbols
2200 ∀ FOR ALL
     = universal quantifier
2201 ∁ COMPLEMENT
     → 0297 ʗ latin letter stretched c
2202 ∂ PARTIAL DIFFERENTIAL
2203 ∃ THERE EXISTS
     = existential quantifier
2204 ∄ THERE DOES NOT EXIST
     ≡ 2203 ∃ 0338 ◌̸
2205 ∅ EMPTY SET
     = null set
     • used in linguistics to indicate a null morpheme or phonological “zero”
     → 00D8 Ø latin capital letter o with stroke
     → 2300 ⌀ diameter sign
2206 ∆ INCREMENT
     = Laplace operator
     = forward difference
     = symmetric difference of sets
     → 0394 Δ greek capital letter delta
     → 25B3 △ white up-pointing triangle
2207 ∇ NABLA
     = backward difference
     = gradient, del
     • used for Laplacian operator (written with superscript 2)
     → 25BD ▽ white down-pointing triangle

Set membership
2208 ∈ ELEMENT OF
2209 ∉ NOT AN ELEMENT OF
     ≡ 2208 ∈ 0338 ◌̸
220A ∊ SMALL ELEMENT OF
     • originates in math pi fonts; not the straight epsilon
     → 03F5 ϵ greek lunate epsilon symbol
220B ∋ CONTAINS AS MEMBER
     = such that
220C ∌ DOES NOT CONTAIN AS MEMBER
     ≡ 220B ∋ 0338 ◌̸
220D ∍ SMALL CONTAINS AS MEMBER
     → 03F6 ϶ greek reversed lunate epsilon symbol

Miscellaneous mathematical symbol
220E ∎ END OF PROOF
     = q.e.d.
     → 2023 ‣ triangular bullet
     → 25AE ▮ black vertical rectangle

N-ary operators
220F ∏ N-ARY PRODUCT
     = product sign
     → 03A0 Π greek capital letter pi
2210 ∐ N-ARY COPRODUCT
     = coproduct sign
2211 ∑ N-ARY SUMMATION
     = summation sign
     → 03A3 Σ greek capital letter sigma
     → 2140 ⅀ double-struck n-ary summation

Operators
2212 − MINUS SIGN
     → 002D - hyphen-minus
2213 ∓ MINUS-OR-PLUS SIGN
     → 00B1 ± plus-minus sign
2214 ∔ DOT PLUS
2215 ∕ DIVISION SLASH
     • generic division operator
     → 002F / solidus
     → 2044 ⁄ fraction slash
2216 ∖ SET MINUS
     → 005C \ reverse solidus
2217 ∗ ASTERISK OPERATOR
     → 002A * asterisk
2218 ∘ RING OPERATOR
     = composite function
     = APL jot
     → 00B0 ° degree sign
     → 25E6 ◦ white bullet
2219 ∙ BULLET OPERATOR
     → 00B7 · middle dot
     → 2022 • bullet
     → 2024 ․ one dot leader
221A √ SQUARE ROOT
     = radical sign
     → 2713 ✓ check mark
221B ∛ CUBE ROOT
221C ∜ FOURTH ROOT
221D ∝ PROPORTIONAL TO
     → 03B1 α greek small letter alpha

Miscellaneous mathematical symbols
221E ∞ INFINITY
221F ∟ RIGHT ANGLE
2220 ∠ ANGLE
2221 ∡ MEASURED ANGLE
2222 ∢ SPHERICAL ANGLE
     = angle arc

Operators
2223 ∣ DIVIDES
     = such that
     = APL stile
     → 007C | vertical line
     → 01C0 ǀ latin letter dental click
2224 ∤ DOES NOT DIVIDE
     ≡ 2223 ∣ 0338 ◌̸
2225 ∥ PARALLEL TO
     → 01C1 ǁ latin letter lateral click
     → 2016 ‖ double vertical line
2226 ∦ NOT PARALLEL TO
     ≡ 2225 ∥ 0338 ◌̸

Logical and set operators
2227 ∧ LOGICAL AND
     = wedge, conjunction
     → 22C0 ⋀ n-ary logical and
     → 2303 ⌃ up arrowhead
2228 ∨ LOGICAL OR
     = vee, disjunction
     → 22C1 ⋁ n-ary logical or
     → 2304 ⌄ down arrowhead
2229 ∩ INTERSECTION
     = cap, hat
     → 22C2 ⋂ n-ary intersection
222A ∪ UNION
     = cup
     → 22C3 ⋃ n-ary union
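The "≡" lines in the list above record canonical decompositions: 2204, for example, is canonically equivalent to 2203 followed by the combining overlay 0338. The sketch below shows one way to check those entries programmatically. It assumes Python's standard `unicodedata` module, whose data comes from a later Unicode version than 5.0; these particular decompositions are expected to be identical because normalization is stable across versions. The helper name `canonical_decomposition` is illustrative, not part of any standard API.

```python
import unicodedata


def canonical_decomposition(ch: str) -> list[str]:
    """Return the NFD form of one character as a list of 4-digit hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


# Negated operators from the names list that carry a "≡" line.
NEGATED = ["\u2204", "\u2209", "\u220C", "\u2224", "\u2226"]

for ch in NEGATED:
    parts = " ".join(canonical_decomposition(ch))
    print(f"{ord(ch):04X} {unicodedata.name(ch)} ≡ {parts}")
```

A character with no "≡" line, such as 2200 FOR ALL, comes back from the helper as a single code point.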
Miscellaneous Technical Range: 2300–23FF
[Code chart: Miscellaneous Technical, 2300–23FF]
Miscellaneous Technical 2300–2330

Miscellaneous technical
2300 ⌀ DIAMETER SIGN
     → 2205 ∅ empty set
2301 ⌁ ELECTRIC ARROW
     • from ISO 2047
     • symbol for End of Transmission
2302 ⌂ HOUSE
2303 ⌃ UP ARROWHEAD
     → 005E ^ circumflex accent
     → 02C4 ˄ modifier letter up arrowhead
     → 2038 ‸ caret
     → 2227 ∧ logical and
2304 ⌄ DOWN ARROWHEAD
     → 02C5 ˅ modifier letter down arrowhead
     → 2228 ∨ logical or
     → 2335 ⌵ countersink
2305 ⌅ PROJECTIVE
     → 22BC ⊼ nand
2306 ⌆ PERSPECTIVE
2307 ⌇ WAVY LINE
     → 3030 〰 wavy dash

Corner brackets
The ceiling and floor characters are recommended for general-purpose corner brackets, rather than the CJK corner brackets, which are wide quotation marks.
2308 ⌈ LEFT CEILING
     = APL upstile
     → 300C 「 left corner bracket
2309 ⌉ RIGHT CEILING
     → 20E7 ◌⃧ combining annuity symbol
230A ⌊ LEFT FLOOR
     = APL downstile
230B ⌋ RIGHT FLOOR
     → 300D 」 right corner bracket

Crops
230C ⌌ BOTTOM RIGHT CROP
     • set of four “crop” corners, arranged facing outward
230D ⌍ BOTTOM LEFT CROP
230E ⌎ TOP RIGHT CROP
230F ⌏ TOP LEFT CROP

Miscellaneous technical
2310 ⌐ REVERSED NOT SIGN
     = beginning of line
     → 00AC ¬ not sign
2311 ⌑ SQUARE LOZENGE
     = Kissen (pillow)
     • used as a command delimiter in some very old computers
2312 ⌒ ARC
     → 25E0 ◠ upper half circle
2313 ⌓ SEGMENT
2314 ⌔ SECTOR
2315 ⌕ TELEPHONE RECORDER
2316 ⌖ POSITION INDICATOR
2317 ⌗ VIEWDATA SQUARE
     → 22D5 ⋕ equal and parallel to
2318 ⌘ PLACE OF INTEREST SIGN
     = command key (1.0)
2319 ⌙ TURNED NOT SIGN
     = line marker

GUI icons
231A ⌚ WATCH
231B ⌛ HOURGLASS

Quine corners
231C ⌜ TOP LEFT CORNER
     • set of four “quine” corners, for quincuncial arrangement
     • these are also used in mathematics in upper and lower pairs
     → 2E00 ⸀ right angle substitution marker
231D ⌝ TOP RIGHT CORNER
231E ⌞ BOTTOM LEFT CORNER
231F ⌟ BOTTOM RIGHT CORNER

Integral pieces
2320 ⌠ TOP HALF INTEGRAL
     → 23AE ⎮ integral extension
2321 ⌡ BOTTOM HALF INTEGRAL

Frown and smile
2322 ⌢ FROWN
     → 2040 ⁀ character tie
2323 ⌣ SMILE
     → 203F ‿ undertie

Keyboard symbols
2324 ⌤ UP ARROWHEAD BETWEEN TWO HORIZONTAL BARS
     = enter key
2325 ⌥ OPTION KEY
2326 ⌦ ERASE TO THE RIGHT
     = delete to the right key
2327 ⌧ X IN A RECTANGLE BOX
     = clear key
2328 ⌨ KEYBOARD

Angle brackets
These are discouraged for mathematical use because of their canonical equivalence to CJK punctuation.
2329 〈 LEFT-POINTING ANGLE BRACKET
     → 003C < less-than sign
     → 2039 ‹ single left-pointing angle quotation mark
     → 27E8 ⟨ mathematical left angle bracket
     ≡ 3008 〈 left angle bracket
232A 〉 RIGHT-POINTING ANGLE BRACKET
     → 003E > greater-than sign
     → 203A › single right-pointing angle quotation mark
     → 27E9 ⟩ mathematical right angle bracket
     ≡ 3009 〉 right angle bracket

Keyboard symbol
232B ⌫ ERASE TO THE LEFT
     = delete to the left key

Chemistry symbol
232C ⌬ BENZENE RING

Drafting symbols
232D ⌭ CYLINDRICITY
232E ⌮ ALL AROUND-PROFILE
232F ⌯ SYMMETRY
2330 ⌰ TOTAL RUNOUT
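The angle-bracket note above says 2329 and 232A are discouraged for mathematical use because they are canonically equivalent to the CJK brackets 3008 and 3009. In practice, any normalization pass replaces them, whereas the mathematical brackets 27E8 and 27E9 pass through unchanged. The following is a minimal sketch of that behavior, assuming Python's standard `unicodedata` module (built on a later Unicode version than 5.0); the helper name `survives_normalization` is hypothetical.

```python
import unicodedata


def survives_normalization(ch: str, form: str = "NFC") -> bool:
    """True if the character is left unchanged by the given normalization form."""
    return unicodedata.normalize(form, ch) == ch


# Technical angle brackets versus mathematical angle brackets.
for cp in (0x2329, 0x232A, 0x27E8, 0x27E9):
    ch = chr(cp)
    normalized = unicodedata.normalize("NFC", ch)
    status = "unchanged" if survives_normalization(ch) else "replaced"
    print(f"{cp:04X} -> {ord(normalized):04X} ({status})")
```

Text that must keep its angle brackets intact through normalization is therefore safer using 27E8 and 27E9.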
Control Pictures Range: 2400–243F
Optical Character Recognition Range: 2440–245F
See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html. Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Enclosed Alphanumerics Range: 2460–24FF
[Code chart: Enclosed Alphanumerics, columns 246 to 24F, rows 0 to F]
Box Drawing Range: 2500–257F
Box Drawing
    250 251 252 253 254 255 256 257
0    ─   ┐   ┠   ┰   ╀   ═   ╠   ╰
1    ━   ┑   ┡   ┱   ╁   ║   ╡   ╱
2    │   ┒   ┢   ┲   ╂   ╒   ╢   ╲
3    ┃   ┓   ┣   ┳   ╃   ╓   ╣   ╳
4    ┄   └   ┤   ┴   ╄   ╔   ╤   ╴
5    ┅   ┕   ┥   ┵   ╅   ╕   ╥   ╵
6    ┆   ┖   ┦   ┶   ╆   ╖   ╦   ╶
7    ┇   ┗   ┧   ┷   ╇   ╗   ╧   ╷
8    ┈   ┘   ┨   ┸   ╈   ╘   ╨   ╸
9    ┉   ┙   ┩   ┹   ╉   ╙   ╩   ╹
A    ┊   ┚   ┪   ┺   ╊   ╚   ╪   ╺
B    ┋   ┛   ┫   ┻   ╋   ╛   ╫   ╻
C    ┌   ├   ┬   ┼   ╌   ╜   ╬   ╼
D    ┍   ┝   ┭   ┽   ╍   ╝   ╭   ╽
E    ┎   ┞   ┮   ┾   ╎   ╞   ╮   ╾
F    ┏   ┟   ┯   ┿   ╏   ╟   ╯   ╿
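The layout of a code chart follows from the range alone: each column label is the code point with its last hexadecimal digit dropped (250 through 257 for Box Drawing), and each row label is that last digit (0 through F). The following Python sketch regenerates such a grid. The function names chart_rows and format_chart are illustrative and not part of the Unicode Standard, and the sketch assumes the range starts on a multiple of 16 and ends on a code point whose last digit is F. The glyphs shown depend on the fonts available where it is run.

```python
def chart_rows(first, last):
    """Lay out the code points first..last as a 16-row chart.

    Row r holds every code point whose last hex digit is r; the columns
    run over the leading digits (for 2500..257F: 250, 251, ..., 257).
    Assumes first is a multiple of 16 and last ends in hex digit F.
    """
    columns = range(first >> 4, (last >> 4) + 1)
    return [[chr((col << 4) | low) for col in columns] for low in range(16)]


def format_chart(first, last):
    """Render the chart as plain text with column and row labels."""
    columns = range(first >> 4, (last >> 4) + 1)
    lines = ["   " + " ".join(f"{col:03X}" for col in columns)]
    for low, row in enumerate(chart_rows(first, last)):
        lines.append(f"{low:X}   " + "   ".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Box Drawing block, as given by the range in the chart heading.
    print(format_chart(0x2500, 0x257F))
```

Calling chart_rows(0x2500, 0x257F) gives 16 rows of 8 characters each, with U+2500 in the first cell of row 0 and U+257F in the last cell of row F.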
Block Elements Range: 2580–259F
Geometric Shapes Range: 25A0–25FF
[Code chart: Geometric Shapes, columns 25A to 25F, rows 0 to F]
Miscellaneous Symbols Range: 2600–26FF
[Code chart: Miscellaneous Symbols, columns 260 to 26F, rows 0 to F]
Dingbats Range: 2700–27BF
[Code chart: Dingbats, columns 270 to 27B, rows 0 to F]
Miscellaneous Mathematical Symbols-A Range: 27C0–27EF
Supplemental Arrows-A Range: 27F0–27FF
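Because every chart heading states a contiguous range, the block a character belongs to can be found with a sorted range table. The sketch below is only an illustration: it hard-codes the nine ranges given in the chart headings from Optical Character Recognition through Supplemental Arrows-A, and the names BLOCKS, STARTS, and block_of are hypothetical. A real implementation would more likely read the complete block list from the Unicode Character Database.

```python
from bisect import bisect_right

# (first, last, name) taken from the chart headings in this part of the
# code charts; this is only a fragment of the full block list.
BLOCKS = [
    (0x2440, 0x245F, "Optical Character Recognition"),
    (0x2460, 0x24FF, "Enclosed Alphanumerics"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x2580, 0x259F, "Block Elements"),
    (0x25A0, 0x25FF, "Geometric Shapes"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x2700, 0x27BF, "Dingbats"),
    (0x27C0, 0x27EF, "Miscellaneous Mathematical Symbols-A"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(ch):
    """Return the block name for a one-character string, or None if the
    character lies outside the ranges in the table above."""
    cp = ord(ch)
    # Index of the last block whose first code point is <= cp.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None
```

For example, block_of("\u2500") returns "Box Drawing", while block_of("A") returns None because U+0041 lies outside the table.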
Braille Patterns
Range: 2800–28FF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer
These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts
The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list.
[Code chart: Braille Patterns, code points 2800–28FF. Columns 280–28F, rows 0–F. Glyph grid not reproduced.]
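The code chart for this block arranges code points in a grid: each column is labelled with the code point minus its last hexadecimal digit (280 to 28F) and each row with that last digit (0 to F). As a minimal sketch of that layout, the following Python function rebuilds such a grid as text. The function name `chart_rows` and the column spacing are illustrative assumptions, not part of the standard, and the glyphs shown depend entirely on the fonts available to the reader.

```python
def chart_rows(first: int, last: int) -> list:
    """Return text rows of a code chart for the inclusive range first..last.

    Columns are the code point without its last hex digit (280, 281, ...);
    rows are the last hex digit (0..F), as in the printed charts.
    """
    columns = range(first >> 4, (last >> 4) + 1)
    # Header row: one three-digit label per column.
    rows = ["  " + " ".join(f"{col:03X}" for col in columns)]
    for low in range(16):
        cells = []
        for col in columns:
            cp = (col << 4) | low
            # Pad each glyph to the width of its column label;
            # leave the cell blank if the code point is outside the range.
            cells.append(f" {chr(cp)} " if first <= cp <= last else "   ")
        rows.append(f"{low:X} " + " ".join(cells))
    return rows


if __name__ == "__main__":
    # Braille Patterns: 2800–28FF
    print("\n".join(chart_rows(0x2800, 0x28FF)))
```

The same call with a different range, for example `chart_rows(0x2900, 0x297F)`, would lay out another block in the same way. The sketch does not consult the Unicode Character Database, so it prints a cell for every code point in the range whether or not a character is assigned there.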
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Supplemental Arrows-B
Range: 2900–297F
[Code chart: Supplemental Arrows-B, code points 2900–297F. Columns 290–297, rows 0–F. Glyph grid not reproduced.]
Miscellaneous Mathematical Symbols-B
Range: 2980–29FF
[Code chart: Miscellaneous Mathematical Symbols-B, code points 2980–29FF. Columns 298–29F, rows 0–F. Glyph grid not reproduced.]
Supplemental Mathematical Operators
Range: 2A00–2AFF
[Code chart: Supplemental Mathematical Operators, code points 2A00–2AFF. Columns 2A0–2AF, rows 0–F. Glyph grid not reproduced.]
Supplemental Mathematical Operators 2A00–2A46

N-ary operators
2A00 ⨀ N-ARY CIRCLED DOT OPERATOR
     → 2299 ⊙ circled dot operator
     → 25C9 ◉ fisheye
2A01 ⨁ N-ARY CIRCLED PLUS OPERATOR
     → 2295 ⊕ circled plus
2A02 ⨂ N-ARY CIRCLED TIMES OPERATOR
     → 2297 ⊗ circled times
2A03 ⨃ N-ARY UNION OPERATOR WITH DOT
2A04 ⨄ N-ARY UNION OPERATOR WITH PLUS
     → 228E ⊎ multiset union
2A05 ⨅ N-ARY SQUARE INTERSECTION OPERATOR
     → 2293 ⊓ square cap
2A06 ⨆ N-ARY SQUARE UNION OPERATOR
     → 2294 ⊔ square cup
2A07 ⨇ TWO LOGICAL AND OPERATOR
     = merge
     → 2A55 ⩕ two intersecting logical and
2A08 ⨈ TWO LOGICAL OR OPERATOR
     → 2A56 ⩖ two intersecting logical or
2A09 ⨉ N-ARY TIMES OPERATOR
     → 00D7 × multiplication sign

Summations and integrals
2A0A ⨊ MODULO TWO SUM
     → 2211 ∑ n-ary summation
2A0B ⨋ SUMMATION WITH INTEGRAL
2A0C ⨌ QUADRUPLE INTEGRAL OPERATOR
     → 222D ∭ triple integral
     ≈ 222B ∫ 222B ∫ 222B ∫ 222B ∫
2A0D ⨍ FINITE PART INTEGRAL
2A0E ⨎ INTEGRAL WITH DOUBLE STROKE
2A0F ⨏ INTEGRAL AVERAGE WITH SLASH
2A10 ⨐ CIRCULATION FUNCTION
2A11 ⨑ ANTICLOCKWISE INTEGRATION
2A12 ⨒ LINE INTEGRATION WITH RECTANGULAR PATH AROUND POLE
2A13 ⨓ LINE INTEGRATION WITH SEMICIRCULAR PATH AROUND POLE
2A14 ⨔ LINE INTEGRATION NOT INCLUDING THE POLE
2A15 ⨕ INTEGRAL AROUND A POINT OPERATOR
     → 222E ∮ contour integral
2A16 ⨖ QUATERNION INTEGRAL OPERATOR
2A17 ⨗ INTEGRAL WITH LEFTWARDS ARROW WITH HOOK
2A18 ⨘ INTEGRAL WITH TIMES SIGN
2A19 ⨙ INTEGRAL WITH INTERSECTION
2A1A ⨚ INTEGRAL WITH UNION
2A1B ⨛ INTEGRAL WITH OVERBAR
     = upper integral
2A1C ⨜ INTEGRAL WITH UNDERBAR
     = lower integral

Miscellaneous large operators
2A1D ⨝ JOIN
     = large bowtie
     • relational database theory
     → 22C8 ⋈ bowtie
     → 27D7 ⟗ full outer join
2A1E ⨞ LARGE LEFT TRIANGLE OPERATOR
     • relational database theory
     → 25C1 ◁ white left-pointing triangle
2A1F ⨟ Z NOTATION SCHEMA COMPOSITION
     → 2A3E ⨾ z notation relational composition
2A20 ⨠ Z NOTATION SCHEMA PIPING
     → 226B ≫ much greater-than
2A21 ⨡ Z NOTATION SCHEMA PROJECTION
     → 21BE ↾ upwards harpoon with barb rightwards

Plus and minus sign operators
2A22 ⨢ PLUS SIGN WITH SMALL CIRCLE ABOVE
2A23 ⨣ PLUS SIGN WITH CIRCUMFLEX ACCENT ABOVE
2A24 ⨤ PLUS SIGN WITH TILDE ABOVE
     = positive difference or sum
2A25 ⨥ PLUS SIGN WITH DOT BELOW
     → 2214 ∔ dot plus
2A26 ⨦ PLUS SIGN WITH TILDE BELOW
     = sum or positive difference
2A27 ⨧ PLUS SIGN WITH SUBSCRIPT TWO
     = nim-addition
2A28 ⨨ PLUS SIGN WITH BLACK TRIANGLE
2A29 ⨩ MINUS SIGN WITH COMMA ABOVE
2A2A ⨪ MINUS SIGN WITH DOT BELOW
     → 2238 ∸ dot minus
2A2B ⨫ MINUS SIGN WITH FALLING DOTS
2A2C ⨬ MINUS SIGN WITH RISING DOTS
2A2D ⨭ PLUS SIGN IN LEFT HALF CIRCLE
2A2E ⨮ PLUS SIGN IN RIGHT HALF CIRCLE

Multiplication and division sign operators
2A2F ⨯ VECTOR OR CROSS PRODUCT
     → 00D7 × multiplication sign
2A30 ⨰ MULTIPLICATION SIGN WITH DOT ABOVE
2A31 ⨱ MULTIPLICATION SIGN WITH UNDERBAR
2A32 ⨲ SEMIDIRECT PRODUCT WITH BOTTOM CLOSED
2A33 ⨳ SMASH PRODUCT
2A34 ⨴ MULTIPLICATION SIGN IN LEFT HALF CIRCLE
2A35 ⨵ MULTIPLICATION SIGN IN RIGHT HALF CIRCLE
2A36 ⨶ CIRCLED MULTIPLICATION SIGN WITH CIRCUMFLEX ACCENT
2A37 ⨷ MULTIPLICATION SIGN IN DOUBLE CIRCLE
2A38 ⨸ CIRCLED DIVISION SIGN

Miscellaneous mathematical operators
2A39 ⨹ PLUS SIGN IN TRIANGLE
2A3A ⨺ MINUS SIGN IN TRIANGLE
2A3B ⨻ MULTIPLICATION SIGN IN TRIANGLE
2A3C ⨼ INTERIOR PRODUCT
     → 230B ⌋ right floor
2A3D ⨽ RIGHTHAND INTERIOR PRODUCT
     → 230A ⌊ left floor
     → 2319 ⌙ turned not sign
2A3E ⨾ Z NOTATION RELATIONAL COMPOSITION
     → 2A1F ⨟ z notation schema composition
2A3F ⨿ AMALGAMATION OR COPRODUCT
     → 2210 ∐ n-ary coproduct

Intersections and unions
2A40 ⩀ INTERSECTION WITH DOT
     → 2227 ∧ logical and
     → 27D1 ⟑ and with dot
2A41 ⩁ UNION WITH MINUS SIGN
     = z notation bag subtraction
     → 228E ⊎ multiset union
2A42 ⩂ UNION WITH OVERBAR
2A43 ⩃ INTERSECTION WITH OVERBAR
2A44 ⩄ INTERSECTION WITH LOGICAL AND
2A45 ⩅ UNION WITH LOGICAL OR
2A46 ⩆ UNION ABOVE INTERSECTION
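The character names and the compatibility mapping listed for 2A0C can be cross-checked programmatically. The following is a minimal sketch using Python's standard `unicodedata` module. The helper names `names_list` and `compat_mapping` are illustrative assumptions, and the module reflects whichever Unicode version ships with the interpreter rather than Version 5.0 specifically, so the results are only expected to agree with this list where the data has not changed between versions.

```python
import unicodedata


def names_list(first: int, last: int) -> list:
    """Return (code point, character, name) for each named character in first..last."""
    entries = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" when the code point has no name
        if name:
            entries.append((f"{cp:04X}", ch, name))
    return entries


def compat_mapping(ch: str) -> str:
    """Return the <compat> decomposition of ch as a string, or "" if it has none."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] != "<compat>":
        return ""
    return "".join(chr(int(field, 16)) for field in fields[1:])


if __name__ == "__main__":
    for code, ch, name in names_list(0x2A00, 0x2A46):
        print(code, ch, name)
    # 2A0C QUADRUPLE INTEGRAL OPERATOR maps to four copies of 222B INTEGRAL
    print([f"{ord(c):04X}" for c in compat_mapping("\u2A0C")])
```

Only formal names and decomposition mappings are available this way. The aliases (lines beginning with =), notes (•) and cross references (→) in the list above come from the names list itself and are not exposed by `unicodedata`.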
Miscellaneous Symbols and Arrows
Range: 2B00–2BFF
Glagolitic
Range: 2C00–2C5F
Latin Extended-C
Range: 2C60–2C7F
Coptic
Range: 2C80–2CFF
[Code chart: Coptic, code points 2C80–2CFF. Columns 2C8–2CF, rows 0–F. Glyph grid not reproduced.]
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Georgian Supplement Range: 2D00–2D2F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Tifinagh Range: 2D30–2D7F
Ethiopic Extended Range: 2D80–2DDF
Supplemental Punctuation Range: 2E00–2E7F
CJK Radicals Supplement Range: 2E80–2EFF
CJK Radicals Supplement code chart (columns: first three hex digits of the code point; rows: last hex digit; -- marks an unassigned position)

    2E8  2E9  2EA  2EB  2EC  2ED  2EE  2EF
0   ⺀   ⺐   ⺠   ⺰   ⻀   ⻐   ⻠   ⻰
1   ⺁   ⺑   ⺡   ⺱   ⻁   ⻑   ⻡   ⻱
2   ⺂   ⺒   ⺢   ⺲   ⻂   ⻒   ⻢   ⻲
3   ⺃   ⺓   ⺣   ⺳   ⻃   ⻓   ⻣   ⻳
4   ⺄   ⺔   ⺤   ⺴   ⻄   ⻔   ⻤   --
5   ⺅   ⺕   ⺥   ⺵   ⻅   ⻕   ⻥   --
6   ⺆   ⺖   ⺦   ⺶   ⻆   ⻖   ⻦   --
7   ⺇   ⺗   ⺧   ⺷   ⻇   ⻗   ⻧   --
8   ⺈   ⺘   ⺨   ⺸   ⻈   ⻘   ⻨   --
9   ⺉   ⺙   ⺩   ⺹   ⻉   ⻙   ⻩   --
A   ⺊   --   ⺪   ⺺   ⻊   ⻚   ⻪   --
B   ⺋   ⺛   ⺫   ⺻   ⻋   ⻛   ⻫   --
C   ⺌   ⺜   ⺬   ⺼   ⻌   ⻜   ⻬   --
D   ⺍   ⺝   ⺭   ⺽   ⻍   ⻝   ⻭   --
E   ⺎   ⺞   ⺮   ⺾   ⻎   ⻞   ⻮   --
F   ⺏   ⺟   ⺯   ⺿   ⻏   ⻟   ⻯   --
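The code charts arrange each block as a grid in which the column label is the code point without its last hexadecimal digit and the row label is that last digit. The following is a minimal sketch of that addressing scheme, not part of the Unicode Standard itself; the function names `chart_cell` and `chart_rows` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def chart_cell(code_point: int) -> Tuple[str, str]:
    """Return the (column label, row label) of a code point in a 16-row chart."""
    column = f"{code_point >> 4:03X}"  # drop the last hex digit, e.g. 0x2E8A -> "2E8"
    row = f"{code_point & 0xF:X}"      # keep only the last hex digit, e.g. 0x2E8A -> "A"
    return column, row


def chart_rows(first: int, last: int) -> List[List[int]]:
    """Group the code points first..last into 16 rows, one per final hex digit."""
    rows: List[List[int]] = [[] for _ in range(16)]
    for code_point in range(first, last + 1):
        rows[code_point & 0xF].append(code_point)
    return rows


# Example: the CJK Radicals Supplement block, 2E80-2EFF.
rows = chart_rows(0x2E80, 0x2EFF)
print(chart_cell(0x2E8A))               # column "2E8", row "A"
print([f"{cp:04X}" for cp in rows[0]])  # the code points in row 0
```

This sketch only computes positions. Whether a position holds an assigned character still has to be checked against the Unicode Character Database.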
Kangxi Radicals Range: 2F00–2FDF
[Code chart: Kangxi Radicals, 2F00–2FDF]
Kangxi Radicals
Each entry gives the code point and name of the radical, followed by the code point and glyph of the corresponding unified ideograph.
2F00 KANGXI RADICAL ONE 4E00 一
2F01 KANGXI RADICAL LINE 4E28 丨
2F02 KANGXI RADICAL DOT 4E36 丶
2F03 KANGXI RADICAL SLASH 4E3F 丿
2F04 KANGXI RADICAL SECOND 4E59 乙
2F05 KANGXI RADICAL HOOK 4E85 亅
2F06 KANGXI RADICAL TWO 4E8C 二
2F07 KANGXI RADICAL LID 4EA0 亠
2F08 KANGXI RADICAL MAN 4EBA 人
2F09 KANGXI RADICAL LEGS 513F 儿
2F0A KANGXI RADICAL ENTER 5165 入
2F0B KANGXI RADICAL EIGHT 516B 八
2F0C KANGXI RADICAL DOWN BOX 5182 冂
2F0D KANGXI RADICAL COVER 5196 冖
2F0E KANGXI RADICAL ICE 51AB 冫
2F0F KANGXI RADICAL TABLE 51E0 几
2F10 KANGXI RADICAL OPEN BOX 51F5 凵
2F11 KANGXI RADICAL KNIFE 5200 刀
2F12 KANGXI RADICAL POWER 529B 力
2F13 KANGXI RADICAL WRAP 52F9 勹
2F14 KANGXI RADICAL SPOON 5315 匕
2F15 KANGXI RADICAL RIGHT OPEN BOX 531A 匚
2F16 KANGXI RADICAL HIDING ENCLOSURE 5338 匸
2F17 KANGXI RADICAL TEN 5341 十
2F18 KANGXI RADICAL DIVINATION 535C 卜
2F19 KANGXI RADICAL SEAL 5369 卩
2F1A KANGXI RADICAL CLIFF 5382 厂
2F1B KANGXI RADICAL PRIVATE 53B6 厶
2F1C KANGXI RADICAL AGAIN 53C8 又
2F1D KANGXI RADICAL MOUTH 53E3 口
2F1E KANGXI RADICAL ENCLOSURE 56D7 囗
2F1F KANGXI RADICAL EARTH 571F 土
2F20 KANGXI RADICAL SCHOLAR 58EB 士
2F21 KANGXI RADICAL GO 5902 夂
2F22 KANGXI RADICAL GO SLOWLY 590A 夊
2F23 KANGXI RADICAL EVENING 5915 夕
2F24 KANGXI RADICAL BIG 5927 大
2F25 KANGXI RADICAL WOMAN 5973 女
2F26 KANGXI RADICAL CHILD 5B50 子
2F27 KANGXI RADICAL ROOF 5B80 宀
2F28 KANGXI RADICAL INCH 5BF8 寸
2F29 KANGXI RADICAL SMALL 5C0F 小
2F2A KANGXI RADICAL LAME 5C22 尢
2F2B KANGXI RADICAL CORPSE 5C38 尸
2F2C KANGXI RADICAL SPROUT 5C6E 屮
2F2D KANGXI RADICAL MOUNTAIN 5C71 山
2F2E KANGXI RADICAL RIVER 5DDB 巛
2F2F KANGXI RADICAL WORK 5DE5 工
2F30 KANGXI RADICAL ONESELF 5DF1 己
2F31 KANGXI RADICAL TURBAN 5DFE 巾
2F32 KANGXI RADICAL DRY 5E72 干
2F33 KANGXI RADICAL SHORT THREAD 5E7A 幺
2F34 KANGXI RADICAL DOTTED CLIFF 5E7F 广
2F35 KANGXI RADICAL LONG STRIDE 5EF4 廴
2F36 KANGXI RADICAL TWO HANDS 5EFE 廾
2F37 KANGXI RADICAL SHOOT 5F0B 弋
2F38 KANGXI RADICAL BOW 5F13 弓
2F39 KANGXI RADICAL SNOUT 5F50 彐
2F3A KANGXI RADICAL BRISTLE 5F61 彡
2F3B KANGXI RADICAL STEP 5F73 彳
2F3C KANGXI RADICAL HEART 5FC3 心
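The pairing of each Kangxi radical with a unified ideograph in the list above is recorded in the Unicode Character Database as a compatibility decomposition. The following is a minimal sketch, assuming Python's standard `unicodedata` module, of how that mapping can be inspected and applied. The helper name `to_unified` is hypothetical, and NFKC normalization also folds many other compatibility characters, so it may change more than the radicals in arbitrary text.

```python
import unicodedata


def to_unified(text: str) -> str:
    """Replace compatibility characters (including Kangxi radicals) via NFKC."""
    return unicodedata.normalize("NFKC", text)


radical_one = "\u2F00"    # KANGXI RADICAL ONE
radical_heart = "\u2F3C"  # KANGXI RADICAL HEART

# The character name and decomposition field come from the Unicode Character Database.
print(unicodedata.name(radical_one))
print(unicodedata.decomposition(radical_one))

# After normalization the radicals become the unified ideographs listed above.
print(f"{ord(to_unified(radical_one)):04X}")
print(f"{ord(to_unified(radical_heart)):04X}")
```

A search or collation layer might apply such folding so that a radical and its unified ideograph match each other. Whether that is appropriate depends on the application.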
Ideographic Description Characters Range: 2FF0–2FFF
CJK Symbols and Punctuation Range: 3000–303F
[Code chart: CJK Symbols and Punctuation, 3000–303F]
Hiragana Range: 3040–309F
Hiragana

Code chart, 3040–309F. Each column label gives the first three hexadecimal digits of the code point and each row label gives the last digit. "--" marks a code point with no character in this chart.

     304  305  306  307  308  309
0    --   ぐ   だ   ば   む   ゐ
1    ぁ   け   ち   ぱ   め   ゑ
2    あ   げ   ぢ   ひ   も   を
3    ぃ   こ   っ   び   ゃ   ん
4    い   ご   つ   ぴ   や   ゔ
5    ぅ   さ   づ   ふ   ゅ   ゕ
6    う   ざ   て   ぶ   ゆ   ゖ
7    ぇ   し   で   ぷ   ょ   --
8    え   じ   と   へ   よ   --
9    ぉ   す   ど   べ   ら   ◌゙
A    お   ず   な   ぺ   り   ◌゚
B    か   せ   に   ほ   る   ゛
C    が   ぜ   ぬ   ぼ   れ   ゜
D    き   そ   ね   ぽ   ろ   ゝ
E    ぎ   ぞ   の   ま   ゎ   ゞ
F    く   た   は   み   わ   ゟ
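The chart's labels can be read mechanically: a column label such as 304 supplies the leading hexadecimal digits and a row label such as 2 supplies the final digit, so the cell holds code point 3042. The following is a minimal sketch of that lookup. The function name `chart_cell` is a hypothetical helper introduced here for illustration, not something defined by the standard.

```python
def chart_cell(column: str, row: str) -> str:
    """Return the character at a code chart cell.

    `column` is the column label (e.g. "304") and `row` is the
    single hexadecimal row label ("0" through "F"). Concatenating
    them yields the code point in hexadecimal.
    """
    if len(row) != 1 or row.upper() not in "0123456789ABCDEF":
        raise ValueError("row must be one hexadecimal digit")
    return chr(int(column + row, 16))


if __name__ == "__main__":
    # Column 304, row 2 is U+3042 HIRAGANA LETTER A.
    print(chart_cell("304", "2"))
```

The helper does not check whether a code point is actually assigned, so it returns a value even for cells that are empty in the chart.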
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Katakana Range: 30A0–30FF
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Katakana

Code chart, 30A0–30FF. Each column label gives the first three hexadecimal digits of the code point and each row label gives the last digit.

     30A  30B  30C  30D  30E  30F
0    ゠   グ   ダ   バ   ム   ヰ
1    ァ   ケ   チ   パ   メ   ヱ
2    ア   ゲ   ヂ   ヒ   モ   ヲ
3    ィ   コ   ッ   ビ   ャ   ン
4    イ   ゴ   ツ   ピ   ヤ   ヴ
5    ゥ   サ   ヅ   フ   ュ   ヵ
6    ウ   ザ   テ   ブ   ユ   ヶ
7    ェ   シ   デ   プ   ョ   ヷ
8    エ   ジ   ト   ヘ   ヨ   ヸ
9    ォ   ス   ド   ベ   ラ   ヹ
A    オ   ズ   ナ   ペ   リ   ヺ
B    カ   セ   ニ   ホ   ル   ・
C    ガ   ゼ   ヌ   ボ   レ   ー
D    キ   ソ   ネ   ポ   ロ   ヽ
E    ギ   ゾ   ノ   マ   ヮ   ヾ
F    ク   タ   ハ   ミ   ワ   ヿ
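Comparing the Hiragana and Katakana charts shows that corresponding letters sit at the same row and relative column: ぁ is at 3041 and ァ is at 30A1, a constant difference of 0x60. The sketch below uses that observation to convert hiragana letters to katakana. It is an illustration only: the function name is hypothetical, and it deliberately limits itself to the letters 3041–3096, leaving marks and all other characters untouched.

```python
# Offset between a hiragana letter and the katakana letter in the
# same chart position (U+30A1 minus U+3041).
KANA_OFFSET = 0x30A1 - 0x3041  # 0x60

# Hiragana letters that have a katakana counterpart at +0x60.
HIRAGANA_FIRST = 0x3041
HIRAGANA_LAST = 0x3096


def hiragana_to_katakana(text: str) -> str:
    """Shift each hiragana letter to its katakana chart position."""
    out = []
    for ch in text:
        cp = ord(ch)
        if HIRAGANA_FIRST <= cp <= HIRAGANA_LAST:
            out.append(chr(cp + KANA_OFFSET))
        else:
            out.append(ch)  # leave everything else as-is
    return "".join(out)


if __name__ == "__main__":
    print(hiragana_to_katakana("ひらがな"))
```

A fixed offset is a convenience of the chart layout, not a general transliteration rule; real text processing usually needs the fuller guidance in the standard.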
Bopomofo Range: 3100–312F
Hangul Compatibility Jamo Range: 3130–318F
Hangul Compatibility Jamo

Code chart, 3130–318F. Each column label gives the first three hexadecimal digits of the code point and each row label gives the last digit. "--" marks a code point with no character in this chart. "HF" stands for 3164 HANGUL FILLER, which has no visible glyph.

     313  314  315  316  317  318
0    --   ㅀ   ㅐ   ㅠ   ㅰ   ㆀ
1    ㄱ   ㅁ   ㅑ   ㅡ   ㅱ   ㆁ
2    ㄲ   ㅂ   ㅒ   ㅢ   ㅲ   ㆂ
3    ㄳ   ㅃ   ㅓ   ㅣ   ㅳ   ㆃ
4    ㄴ   ㅄ   ㅔ   HF   ㅴ   ㆄ
5    ㄵ   ㅅ   ㅕ   ㅥ   ㅵ   ㆅ
6    ㄶ   ㅆ   ㅖ   ㅦ   ㅶ   ㆆ
7    ㄷ   ㅇ   ㅗ   ㅧ   ㅷ   ㆇ
8    ㄸ   ㅈ   ㅘ   ㅨ   ㅸ   ㆈ
9    ㄹ   ㅉ   ㅙ   ㅩ   ㅹ   ㆉ
A    ㄺ   ㅊ   ㅚ   ㅪ   ㅺ   ㆊ
B    ㄻ   ㅋ   ㅛ   ㅫ   ㅻ   ㆋ
C    ㄼ   ㅌ   ㅜ   ㅬ   ㅼ   ㆌ
D    ㄽ   ㅍ   ㅝ   ㅭ   ㅽ   ㆍ
E    ㄾ   ㅎ   ㅞ   ㅮ   ㅾ   ㆎ
F    ㄿ   ㅏ   ㅟ   ㅯ   ㅿ   --
Kanbun Range: 3190–319F
Bopomofo Extended Range: 31A0–31BF
CJK Strokes Range: 31C0–31EF
Katakana Phonetic Extensions Range: 31F0–31FF
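The block ranges stated in these chart headings are contiguous and non-overlapping, so they can be used directly to classify a code point. The following is a minimal sketch under that assumption. The table `BLOCKS` and the function `block_of` are hypothetical names introduced here, and the table covers only the ranges 3040–31FF that appear in this part of the charts (the Hiragana entry is taken from the extent of its code chart, 3040–309F).

```python
from typing import Optional

# (first code point, last code point, block name), copied from the
# chart headings for the range 3040-31FF. This is not a complete
# list of Unicode blocks.
BLOCKS = [
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3100, 0x312F, "Bopomofo"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0x3190, 0x319F, "Kanbun"),
    (0x31A0, 0x31BF, "Bopomofo Extended"),
    (0x31C0, 0x31EF, "CJK Strokes"),
    (0x31F0, 0x31FF, "Katakana Phonetic Extensions"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for a character, or None if the
    character falls outside the ranges in BLOCKS."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for sample in ("あ", "ア", "ㄱ", "A"):
        print(sample, block_of(sample))
```

A linear scan is adequate for a table this small; a binary search over the range starts would be the usual choice for the full block list.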
Enclosed CJK Letters and Months Range: 3200–32FF
Enclosed CJK Letters and Months

Code chart, 3200–32FF. Each column label gives the first three hexadecimal digits of the code point and each row label gives the last digit. "--" marks a code point with no character in this chart.

     320 321 322 323 324 325 326 327 328 329 32A 32B 32C 32D 32E 32F
0    ㈀  ㈐  ㈠  ㈰  ㉀  ㉐  ㉠  ㉰  ㊀  ㊐  ㊠  ㊰  ㋀  ㋐  ㋠  ㋰
1    ㈁  ㈑  ㈡  ㈱  ㉁  ㉑  ㉡  ㉱  ㊁  ㊑  ㊡  ㊱  ㋁  ㋑  ㋡  ㋱
2    ㈂  ㈒  ㈢  ㈲  ㉂  ㉒  ㉢  ㉲  ㊂  ㊒  ㊢  ㊲  ㋂  ㋒  ㋢  ㋲
3    ㈃  ㈓  ㈣  ㈳  ㉃  ㉓  ㉣  ㉳  ㊃  ㊓  ㊣  ㊳  ㋃  ㋓  ㋣  ㋳
4    ㈄  ㈔  ㈤  ㈴  --  ㉔  ㉤  ㉴  ㊄  ㊔  ㊤  ㊴  ㋄  ㋔  ㋤  ㋴
5    ㈅  ㈕  ㈥  ㈵  --  ㉕  ㉥  ㉵  ㊅  ㊕  ㊥  ㊵  ㋅  ㋕  ㋥  ㋵
6    ㈆  ㈖  ㈦  ㈶  --  ㉖  ㉦  ㉶  ㊆  ㊖  ㊦  ㊶  ㋆  ㋖  ㋦  ㋶
7    ㈇  ㈗  ㈧  ㈷  --  ㉗  ㉧  ㉷  ㊇  ㊗  ㊧  ㊷  ㋇  ㋗  ㋧  ㋷
8    ㈈  ㈘  ㈨  ㈸  --  ㉘  ㉨  ㉸  ㊈  ㊘  ㊨  ㊸  ㋈  ㋘  ㋨  ㋸
9    ㈉  ㈙  ㈩  ㈹  --  ㉙  ㉩  ㉹  ㊉  ㊙  ㊩  ㊹  ㋉  ㋙  ㋩  ㋹
A    ㈊  ㈚  ㈪  ㈺  --  ㉚  ㉪  ㉺  ㊊  ㊚  ㊪  ㊺  ㋊  ㋚  ㋪  ㋺
B    ㈋  ㈛  ㈫  ㈻  --  ㉛  ㉫  ㉻  ㊋  ㊛  ㊫  ㊻  ㋋  ㋛  ㋫  ㋻
C    ㈌  ㈜  ㈬  ㈼  --  ㉜  ㉬  ㉼  ㊌  ㊜  ㊬  ㊼  ㋌  ㋜  ㋬  ㋼
D    ㈍  ㈝  ㈭  ㈽  --  ㉝  ㉭  ㉽  ㊍  ㊝  ㊭  ㊽  ㋍  ㋝  ㋭  ㋽
E    ㈎  ㈞  ㈮  ㈾  --  ㉞  ㉮  ㉾  ㊎  ㊞  ㊮  ㊾  ㋎  ㋞  ㋮  ㋾
F    ㈏  --  ㈯  ㈿  --  ㉟  ㉯  ㉿  ㊏  ㊟  ㊯  ㊿  ㋏  ㋟  ㋯  --
Enclosed CJK Letters and Months: character names, 3200–3235

Each entry gives the code point, the character, its name, any informative note (marked •), and its compatibility decomposition (marked ≈).

Parenthesized Hangul elements
3200 ㈀ PARENTHESIZED HANGUL KIYEOK ≈ 0028 ( 1100 ᄀ 0029 )
3201 ㈁ PARENTHESIZED HANGUL NIEUN ≈ 0028 ( 1102 ᄂ 0029 )
3202 ㈂ PARENTHESIZED HANGUL TIKEUT ≈ 0028 ( 1103 ᄃ 0029 )
3203 ㈃ PARENTHESIZED HANGUL RIEUL ≈ 0028 ( 1105 ᄅ 0029 )
3204 ㈄ PARENTHESIZED HANGUL MIEUM ≈ 0028 ( 1106 ᄆ 0029 )
3205 ㈅ PARENTHESIZED HANGUL PIEUP ≈ 0028 ( 1107 ᄇ 0029 )
3206 ㈆ PARENTHESIZED HANGUL SIOS ≈ 0028 ( 1109 ᄉ 0029 )
3207 ㈇ PARENTHESIZED HANGUL IEUNG ≈ 0028 ( 110B ᄋ 0029 )
3208 ㈈ PARENTHESIZED HANGUL CIEUC ≈ 0028 ( 110C ᄌ 0029 )
3209 ㈉ PARENTHESIZED HANGUL CHIEUCH ≈ 0028 ( 110E ᄎ 0029 )
320A ㈊ PARENTHESIZED HANGUL KHIEUKH ≈ 0028 ( 110F ᄏ 0029 )
320B ㈋ PARENTHESIZED HANGUL THIEUTH ≈ 0028 ( 1110 ᄐ 0029 )
320C ㈌ PARENTHESIZED HANGUL PHIEUPH ≈ 0028 ( 1111 ᄑ 0029 )
320D ㈍ PARENTHESIZED HANGUL HIEUH ≈ 0028 ( 1112 ᄒ 0029 )

Parenthesized Hangul syllables
320E ㈎ PARENTHESIZED HANGUL KIYEOK A ≈ 0028 ( 1100 ᄀ 1161 ᅡ 0029 )
320F ㈏ PARENTHESIZED HANGUL NIEUN A ≈ 0028 ( 1102 ᄂ 1161 ᅡ 0029 )
3210 ㈐ PARENTHESIZED HANGUL TIKEUT A ≈ 0028 ( 1103 ᄃ 1161 ᅡ 0029 )
3211 ㈑ PARENTHESIZED HANGUL RIEUL A ≈ 0028 ( 1105 ᄅ 1161 ᅡ 0029 )
3212 ㈒ PARENTHESIZED HANGUL MIEUM A ≈ 0028 ( 1106 ᄆ 1161 ᅡ 0029 )
3213 ㈓ PARENTHESIZED HANGUL PIEUP A ≈ 0028 ( 1107 ᄇ 1161 ᅡ 0029 )
3214 ㈔ PARENTHESIZED HANGUL SIOS A ≈ 0028 ( 1109 ᄉ 1161 ᅡ 0029 )
3215 ㈕ PARENTHESIZED HANGUL IEUNG A ≈ 0028 ( 110B ᄋ 1161 ᅡ 0029 )
3216 ㈖ PARENTHESIZED HANGUL CIEUC A ≈ 0028 ( 110C ᄌ 1161 ᅡ 0029 )
3217 ㈗ PARENTHESIZED HANGUL CHIEUCH A ≈ 0028 ( 110E ᄎ 1161 ᅡ 0029 )
3218 ㈘ PARENTHESIZED HANGUL KHIEUKH A ≈ 0028 ( 110F ᄏ 1161 ᅡ 0029 )
3219 ㈙ PARENTHESIZED HANGUL THIEUTH A ≈ 0028 ( 1110 ᄐ 1161 ᅡ 0029 )
321A ㈚ PARENTHESIZED HANGUL PHIEUPH A ≈ 0028 ( 1111 ᄑ 1161 ᅡ 0029 )
321B ㈛ PARENTHESIZED HANGUL HIEUH A ≈ 0028 ( 1112 ᄒ 1161 ᅡ 0029 )
321C ㈜ PARENTHESIZED HANGUL CIEUC U ≈ 0028 ( 110C ᄌ 116E ᅮ 0029 )

Parenthesized Korean words
321D ㈝ PARENTHESIZED KOREAN CHARACTER OJEON ≈ 0028 ( 110B ᄋ 1169 ᅩ 110C ᄌ 1165 ᅥ 11AB ᆫ 0029 )
321E ㈞ PARENTHESIZED KOREAN CHARACTER O HU ≈ 0028 ( 110B ᄋ 1169 ᅩ 1112 ᄒ 116E ᅮ 0029 )

Parenthesized ideographs
3220 ㈠ PARENTHESIZED IDEOGRAPH ONE ≈ 0028 ( 4E00 一 0029 )
3221 ㈡ PARENTHESIZED IDEOGRAPH TWO ≈ 0028 ( 4E8C 二 0029 )
3222 ㈢ PARENTHESIZED IDEOGRAPH THREE ≈ 0028 ( 4E09 三 0029 )
3223 ㈣ PARENTHESIZED IDEOGRAPH FOUR ≈ 0028 ( 56DB 四 0029 )
3224 ㈤ PARENTHESIZED IDEOGRAPH FIVE ≈ 0028 ( 4E94 五 0029 )
3225 ㈥ PARENTHESIZED IDEOGRAPH SIX ≈ 0028 ( 516D 六 0029 )
3226 ㈦ PARENTHESIZED IDEOGRAPH SEVEN ≈ 0028 ( 4E03 七 0029 )
3227 ㈧ PARENTHESIZED IDEOGRAPH EIGHT ≈ 0028 ( 516B 八 0029 )
3228 ㈨ PARENTHESIZED IDEOGRAPH NINE ≈ 0028 ( 4E5D 九 0029 )
3229 ㈩ PARENTHESIZED IDEOGRAPH TEN ≈ 0028 ( 5341 十 0029 )
322A ㈪ PARENTHESIZED IDEOGRAPH MOON • Monday ≈ 0028 ( 6708 月 0029 )
322B ㈫ PARENTHESIZED IDEOGRAPH FIRE • Tuesday ≈ 0028 ( 706B 火 0029 )
322C ㈬ PARENTHESIZED IDEOGRAPH WATER • Wednesday ≈ 0028 ( 6C34 水 0029 )
322D ㈭ PARENTHESIZED IDEOGRAPH WOOD • Thursday ≈ 0028 ( 6728 木 0029 )
322E ㈮ PARENTHESIZED IDEOGRAPH METAL • Friday ≈ 0028 ( 91D1 金 0029 )
322F ㈯ PARENTHESIZED IDEOGRAPH EARTH • Saturday ≈ 0028 ( 571F 土 0029 )
3230 ㈰ PARENTHESIZED IDEOGRAPH SUN • Sunday ≈ 0028 ( 65E5 日 0029 )
3231 ㈱ PARENTHESIZED IDEOGRAPH STOCK • incorporated ≈ 0028 ( 682A 株 0029 )
3232 ㈲ PARENTHESIZED IDEOGRAPH HAVE • limited ≈ 0028 ( 6709 有 0029 )
3233 ㈳ PARENTHESIZED IDEOGRAPH SOCIETY • company ≈ 0028 ( 793E 社 0029 )
3234 ㈴ PARENTHESIZED IDEOGRAPH NAME ≈ 0028 ( 540D 名 0029 )
3235 ㈵ PARENTHESIZED IDEOGRAPH SPECIAL ≈ 0028 ( 7279 特 0029 )
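The code point sequences listed with each parenthesized character (for example 0028 1100 0029 for 3200) are compatibility decompositions. As a rough illustration, Python's standard `unicodedata` module exposes the same mappings. Note the assumption: the module reflects whatever Unicode version ships with the interpreter, not necessarily Version 5.0, although these particular mappings are expected to match the entries above. The helper name `compat_decomposition` is hypothetical.

```python
import unicodedata


def compat_decomposition(ch: str) -> str:
    """Return the full compatibility decomposition (NFKD) of `ch`."""
    return unicodedata.normalize("NFKD", ch)


if __name__ == "__main__":
    for ch in ("\u3200", "\u320E", "\u3220"):
        decomposed = compat_decomposition(ch)
        code_points = " ".join(f"{ord(c):04X}" for c in decomposed)
        # unicodedata.decomposition returns the raw mapping field,
        # including the <compat> tag.
        print(f"{ord(ch):04X}", code_points, unicodedata.decomposition(ch))
```

NFKD is used rather than NFKC so that the output keeps the individual jamo shown in the list; NFKC would recompose a leading consonant and vowel pair into a single Hangul syllable.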
CJK Compatibility Range: 3300–33FF
CJK Compatibility code chart, 3300–33FF
    330 331 332 333 334 335 336 337 338 339 33A 33B 33C 33D 33E 33F
0   ㌀  ㌐  ㌠  ㌰  ㍀  ㍐  ㍠  ㍰  ㎀  ㎐  ㎠  ㎰  ㏀  ㏐  ㏠  ㏰
1   ㌁  ㌑  ㌡  ㌱  ㍁  ㍑  ㍡  ㍱  ㎁  ㎑  ㎡  ㎱  ㏁  ㏑  ㏡  ㏱
2   ㌂  ㌒  ㌢  ㌲  ㍂  ㍒  ㍢  ㍲  ㎂  ㎒  ㎢  ㎲  ㏂  ㏒  ㏢  ㏲
3   ㌃  ㌓  ㌣  ㌳  ㍃  ㍓  ㍣  ㍳  ㎃  ㎓  ㎣  ㎳  ㏃  ㏓  ㏣  ㏳
4   ㌄  ㌔  ㌤  ㌴  ㍄  ㍔  ㍤  ㍴  ㎄  ㎔  ㎤  ㎴  ㏄  ㏔  ㏤  ㏴
5   ㌅  ㌕  ㌥  ㌵  ㍅  ㍕  ㍥  ㍵  ㎅  ㎕  ㎥  ㎵  ㏅  ㏕  ㏥  ㏵
6   ㌆  ㌖  ㌦  ㌶  ㍆  ㍖  ㍦  ㍶  ㎆  ㎖  ㎦  ㎶  ㏆  ㏖  ㏦  ㏶
7   ㌇  ㌗  ㌧  ㌷  ㍇  ㍗  ㍧  ㍷  ㎇  ㎗  ㎧  ㎷  ㏇  ㏗  ㏧  ㏷
8   ㌈  ㌘  ㌨  ㌸  ㍈  ㍘  ㍨  ㍸  ㎈  ㎘  ㎨  ㎸  ㏈  ㏘  ㏨  ㏸
9   ㌉  ㌙  ㌩  ㌹  ㍉  ㍙  ㍩  ㍹  ㎉  ㎙  ㎩  ㎹  ㏉  ㏙  ㏩  ㏹
A   ㌊  ㌚  ㌪  ㌺  ㍊  ㍚  ㍪  ㍺  ㎊  ㎚  ㎪  ㎺  ㏊  ㏚  ㏪  ㏺
B   ㌋  ㌛  ㌫  ㌻  ㍋  ㍛  ㍫  ㍻  ㎋  ㎛  ㎫  ㎻  ㏋  ㏛  ㏫  ㏻
C   ㌌  ㌜  ㌬  ㌼  ㍌  ㍜  ㍬  ㍼  ㎌  ㎜  ㎬  ㎼  ㏌  ㏜  ㏬  ㏼
D   ㌍  ㌝  ㌭  ㌽  ㍍  ㍝  ㍭  ㍽  ㎍  ㎝  ㎭  ㎽  ㏍  ㏝  ㏭  ㏽
E   ㌎  ㌞  ㌮  ㌾  ㍎  ㍞  ㍮  ㍾  ㎎  ㎞  ㎮  ㎾  ㏎  ㏞  ㏮  ㏾
F   ㌏  ㌟  ㌯  ㌿  ㍏  ㍟  ㍯  ㍿  ㎏  ㎟  ㎯  ㎿  ㏏  ㏟  ㏯  ㏿
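The chart layout follows directly from the code points: each column label gives the first three hexadecimal digits and each row label gives the last digit, so the cell at column 33A, row C is U+33AC. The sketch below regenerates one 256-code-point chart page from its starting value. The function name `chart_rows` and the spacing are assumptions for illustration, and it presumes every code point on the page is assigned, as is the case for 3300–33FF.

```python
def chart_rows(first: int) -> list[str]:
    """Build rows 0..F of a 16 x 16 chart page that starts at code point `first`.

    Hypothetical helper: cell (column, row) holds chr(first + 16 * column + row).
    """
    rows = []
    for low in range(16):
        cells = (chr(first + 16 * col + low) for col in range(16))
        rows.append(f"{low:X}   " + "  ".join(cells))
    return rows


# Print the CJK Compatibility page, one row per final hex digit.
for row in chart_rows(0x3300):
    print(row)
```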
Squared Katakana words
3300 ㌀ SQUARE APAATO • apartment <square> 30A2 ア 30D1 パ 30FC ー 30C8 ト
3301 ㌁ SQUARE ARUHUA • alpha <square> 30A2 ア 30EB ル 30D5 フ 30A1 ァ
3302 ㌂ SQUARE ANPEA • ampere <square> 30A2 ア 30F3 ン 30DA ペ 30A2 ア
3303 ㌃ SQUARE AARU • are (unit of area) <square> 30A2 ア 30FC ー 30EB ル
3304 ㌄ SQUARE ININGU • inning <square> 30A4 イ 30CB ニ 30F3 ン 30B0 グ
3305 ㌅ SQUARE INTI • inch <square> 30A4 イ 30F3 ン 30C1 チ
3306 ㌆ SQUARE UON • won (Korean currency) <square> 30A6 ウ 30A9 ォ 30F3 ン
3307 ㌇ SQUARE ESUKUUDO • escudo (Portuguese currency) <square> 30A8 エ 30B9 ス 30AF ク 30FC ー 30C9 ド
3308 ㌈ SQUARE EEKAA • acre <square> 30A8 エ 30FC ー 30AB カ 30FC ー
3309 ㌉ SQUARE ONSU • ounce <square> 30AA オ 30F3 ン 30B9 ス
330A ㌊ SQUARE OOMU • ohm <square> 30AA オ 30FC ー 30E0 ム
330B ㌋ SQUARE KAIRI • kai-ri: nautical mile <square> 30AB カ 30A4 イ 30EA リ
330C ㌌ SQUARE KARATTO • carat <square> 30AB カ 30E9 ラ 30C3 ッ 30C8 ト
330D ㌍ SQUARE KARORII • calorie <square> 30AB カ 30ED ロ 30EA リ 30FC ー
330E ㌎ SQUARE GARON • gallon <square> 30AC ガ 30ED ロ 30F3 ン
330F ㌏ SQUARE GANMA • gamma <square> 30AC ガ 30F3 ン 30DE マ
3310 ㌐ SQUARE GIGA • giga <square> 30AE ギ 30AC ガ
3311 ㌑ SQUARE GINII • guinea <square> 30AE ギ 30CB ニ 30FC ー
3312 ㌒ SQUARE KYURII • curie <square> 30AD キ 30E5 ュ 30EA リ 30FC ー
3313 ㌓ SQUARE GIRUDAA • guilder <square> 30AE ギ 30EB ル 30C0 ダ 30FC ー
3314 ㌔ SQUARE KIRO • kilo <square> 30AD キ 30ED ロ
3315 ㌕ SQUARE KIROGURAMU • kilogram <square> 30AD キ 30ED ロ 30B0 グ 30E9 ラ 30E0 ム
3316 ㌖ SQUARE KIROMEETORU • kilometer <square> 30AD キ 30ED ロ 30E1 メ 30FC ー 30C8 ト 30EB ル
3317 ㌗ SQUARE KIROWATTO • kilowatt <square> 30AD キ 30ED ロ 30EF ワ 30C3 ッ 30C8 ト
3318 ㌘ SQUARE GURAMU • gram <square> 30B0 グ 30E9 ラ 30E0 ム
3319 ㌙ SQUARE GURAMUTON • gram ton <square> 30B0 グ 30E9 ラ 30E0 ム 30C8 ト 30F3 ン
331A ㌚ SQUARE KURUZEIRO • cruzeiro (Brazilian currency) <square> 30AF ク 30EB ル 30BC ゼ 30A4 イ 30ED ロ
331B ㌛ SQUARE KUROONE • krone <square> 30AF ク 30ED ロ 30FC ー 30CD ネ
331C ㌜ SQUARE KEESU • case <square> 30B1 ケ 30FC ー 30B9 ス
331D ㌝ SQUARE KORUNA • koruna (Czech currency) <square> 30B3 コ 30EB ル 30CA ナ
331E ㌞ SQUARE KOOPO • co-op <square> 30B3 コ 30FC ー 30DD ポ
331F ㌟ SQUARE SAIKURU • cycle <square> 30B5 サ 30A4 イ 30AF ク 30EB ル
3320 ㌠ SQUARE SANTIIMU • centime <square> 30B5 サ 30F3 ン 30C1 チ 30FC ー 30E0 ム
3321 ㌡ SQUARE SIRINGU • shilling <square> 30B7 シ 30EA リ 30F3 ン 30B0 グ
3322 ㌢ SQUARE SENTI • centi <square> 30BB セ 30F3 ン 30C1 チ
3323 ㌣ SQUARE SENTO • cent <square> 30BB セ 30F3 ン 30C8 ト
3324 ㌤ SQUARE DAASU • dozen <square> 30C0 ダ 30FC ー 30B9 ス
3325 ㌥ SQUARE DESI • deci <square> 30C7 デ 30B7 シ
3326 ㌦ SQUARE DORU • dollar <square> 30C9 ド 30EB ル
3327 ㌧ SQUARE TON • ton <square> 30C8 ト 30F3 ン
3328 ㌨ SQUARE NANO • nano <square> 30CA ナ 30CE ノ
3329 ㌩ SQUARE NOTTO • knot, nautical mile <square> 30CE ノ 30C3 ッ 30C8 ト
332A ㌪ SQUARE HAITU • heights <square> 30CF ハ 30A4 イ 30C4 ツ
332B ㌫ SQUARE PAASENTO • percent <square> 30D1 パ 30FC ー 30BB セ 30F3 ン 30C8 ト
332C ㌬ SQUARE PAATU • parts <square> 30D1 パ 30FC ー 30C4 ツ
332D ㌭ SQUARE BAARERU • barrel <square> 30D0 バ 30FC ー 30EC レ 30EB ル
332E ㌮ SQUARE PIASUTORU • piaster <square> 30D4 ピ 30A2 ア 30B9 ス 30C8 ト 30EB ル
332F ㌯ SQUARE PIKURU • picul (unit of weight) <square> 30D4 ピ 30AF ク 30EB ル
3330 ㌰ SQUARE PIKO • pico <square> 30D4 ピ 30B3 コ
3331 ㌱ SQUARE BIRU • building <square> 30D3 ビ 30EB ル
3332 ㌲ SQUARE HUARADDO • farad <square> 30D5 フ 30A1 ァ 30E9 ラ 30C3 ッ 30C9 ド
3333 ㌳ SQUARE HUIITO • feet <square> 30D5 フ 30A3 ィ 30FC ー 30C8 ト
3334 ㌴ SQUARE BUSSYERU • bushel <square> 30D6 ブ 30C3 ッ 30B7 シ 30A7 ェ 30EB ル
3335 ㌵ SQUARE HURAN • franc <square> 30D5 フ 30E9 ラ 30F3 ン
3336 ㌶ SQUARE HEKUTAARU • hectare <square> 30D8 ヘ 30AF ク 30BF タ 30FC ー 30EB ル
3337 ㌷ SQUARE PESO • peso <square> 30DA ペ 30BD ソ
3338 ㌸ SQUARE PENIHI • pfennig <square> 30DA ペ 30CB ニ 30D2 ヒ
3339 ㌹ SQUARE HERUTU • hertz <square> 30D8 ヘ 30EB ル 30C4 ツ
333A ㌺ SQUARE PENSU • pence <square> 30DA ペ 30F3 ン 30B9 ス
333B ㌻ SQUARE PEEZI • page <square> 30DA ペ 30FC ー 30B8 ジ
333C ㌼ SQUARE BEETA • beta <square> 30D9 ベ 30FC ー 30BF タ
333D ㌽ SQUARE POINTO • point <square> 30DD ポ 30A4 イ 30F3 ン 30C8 ト
333E ㌾ SQUARE BORUTO • volt, bolt <square> 30DC ボ 30EB ル 30C8 ト
333F ㌿ SQUARE HON • hon: volume <square> 30DB ホ 30F3 ン
3340 ㍀ SQUARE PONDO • pound <square> 30DD ポ 30F3 ン 30C9 ド
3341 ㍁ SQUARE HOORU • hall <square> 30DB ホ 30FC ー 30EB ル
3342 ㍂ SQUARE HOON • horn <square> 30DB ホ 30FC ー 30F3 ン
3343 ㍃ SQUARE MAIKURO • micro <square> 30DE マ 30A4 イ 30AF ク 30ED ロ
3344 ㍄ SQUARE MAIRU • mile <square> 30DE マ 30A4 イ 30EB ル
3345 ㍅ SQUARE MAHHA • mach <square> 30DE マ 30C3 ッ 30CF ハ
3346 ㍆ SQUARE MARUKU • mark <square> 30DE マ 30EB ル 30AF ク
3347 ㍇ SQUARE MANSYON • mansion (i.e. better quality apartment) <square> 30DE マ 30F3 ン 30B7 シ 30E7 ョ 30F3 ン
3348 ㍈ SQUARE MIKURON • micron <square> 30DF ミ 30AF ク 30ED ロ 30F3 ン
3349 ㍉ SQUARE MIRI • milli <square> 30DF ミ 30EA リ
334A ㍊ SQUARE MIRIBAARU • millibar <square> 30DF ミ 30EA リ 30D0 バ 30FC ー 30EB ル
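The `<square>` entries in this list are compatibility mappings from one squared character to an ordinary Katakana word. A small sketch, assuming Python's standard `unicodedata` module and a hypothetical helper name `expand_compat`, shows how such characters could be expanded in running text. NFKC is used here rather than NFKD because NFKD would additionally split voiced and semi-voiced kana such as パ into a base letter plus a combining mark.

```python
import unicodedata


def expand_compat(text: str) -> str:
    """Replace compatibility characters with their NFKC equivalents.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFKC", text)


# ㌀ SQUARE APAATO, ㌧ SQUARE TON, ㍉ SQUARE MIRI
for ch in "\u3300\u3327\u3349":
    mapping = unicodedata.decomposition(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping} -> {expand_compat(ch)}")
```

The expansion is one-way: once ㌀ has become アパート, nothing in the text records that it was originally a single squared character.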
CJK Unified Ideographs Extension A
Range: 3400–4DBF
CJK Unified Ideographs Extension A code chart, 3400–34FF
    340 341 342 343 344 345 346 347 348 349 34A 34B 34C 34D 34E 34F
0   㐀  㐐  㐠  㐰  㑀  㑐  㑠  㑰  㒀  㒐  㒠  㒰  㓀  㓐  㓠  㓰
1   㐁  㐑  㐡  㐱  㑁  㑑  㑡  㑱  㒁  㒑  㒡  㒱  㓁  㓑  㓡  㓱
2   㐂  㐒  㐢  㐲  㑂  㑒  㑢  㑲  㒂  㒒  㒢  㒲  㓂  㓒  㓢  㓲
3   㐃  㐓  㐣  㐳  㑃  㑓  㑣  㑳  㒃  㒓  㒣  㒳  㓃  㓓  㓣  㓳
4   㐄  㐔  㐤  㐴  㑄  㑔  㑤  㑴  㒄  㒔  㒤  㒴  㓄  㓔  㓤  㓴
5   㐅  㐕  㐥  㐵  㑅  㑕  㑥  㑵  㒅  㒕  㒥  㒵  㓅  㓕  㓥  㓵
6   㐆  㐖  㐦  㐶  㑆  㑖  㑦  㑶  㒆  㒖  㒦  㒶  㓆  㓖  㓦  㓶
7   㐇  㐗  㐧  㐷  㑇  㑗  㑧  㑷  㒇  㒗  㒧  㒷  㓇  㓗  㓧  㓷
8   㐈  㐘  㐨  㐸  㑈  㑘  㑨  㑸  㒈  㒘  㒨  㒸  㓈  㓘  㓨  㓸
9   㐉  㐙  㐩  㐹  㑉  㑙  㑩  㑹  㒉  㒙  㒩  㒹  㓉  㓙  㓩  㓹
A   㐊  㐚  㐪  㐺  㑊  㑚  㑪  㑺  㒊  㒚  㒪  㒺  㓊  㓚  㓪  㓺
B   㐋  㐛  㐫  㐻  㑋  㑛  㑫  㑻  㒋  㒛  㒫  㒻  㓋  㓛  㓫  㓻
C   㐌  㐜  㐬  㐼  㑌  㑜  㑬  㑼  㒌  㒜  㒬  㒼  㓌  㓜  㓬  㓼
D   㐍  㐝  㐭  㐽  㑍  㑝  㑭  㑽  㒍  㒝  㒭  㒽  㓍  㓝  㓭  㓽
E   㐎  㐞  㐮  㐾  㑎  㑞  㑮  㑾  㒎  㒞  㒮  㒾  㓎  㓞  㓮  㓾
F   㐏  㐟  㐯  㐿  㑏  㑟  㑯  㑿  㒏  㒟  㒯  㒿  㓏  㓟  㓯  㓿
CJK Unified Ideographs Extension A code chart, 3500–35FF
    350 351 352 353 354 355 356 357 358 359 35A 35B 35C 35D 35E 35F
0   㔀  㔐  㔠  㔰  㕀  㕐  㕠  㕰  㖀  㖐  㖠  㖰  㗀  㗐  㗠  㗰
1   㔁  㔑  㔡  㔱  㕁  㕑  㕡  㕱  㖁  㖑  㖡  㖱  㗁  㗑  㗡  㗱
2   㔂  㔒  㔢  㔲  㕂  㕒  㕢  㕲  㖂  㖒  㖢  㖲  㗂  㗒  㗢  㗲
3   㔃  㔓  㔣  㔳  㕃  㕓  㕣  㕳  㖃  㖓  㖣  㖳  㗃  㗓  㗣  㗳
4   㔄  㔔  㔤  㔴  㕄  㕔  㕤  㕴  㖄  㖔  㖤  㖴  㗄  㗔  㗤  㗴
5   㔅  㔕  㔥  㔵  㕅  㕕  㕥  㕵  㖅  㖕  㖥  㖵  㗅  㗕  㗥  㗵
6   㔆  㔖  㔦  㔶  㕆  㕖  㕦  㕶  㖆  㖖  㖦  㖶  㗆  㗖  㗦  㗶
7   㔇  㔗  㔧  㔷  㕇  㕗  㕧  㕷  㖇  㖗  㖧  㖷  㗇  㗗  㗧  㗷
8   㔈  㔘  㔨  㔸  㕈  㕘  㕨  㕸  㖈  㖘  㖨  㖸  㗈  㗘  㗨  㗸
9   㔉  㔙  㔩  㔹  㕉  㕙  㕩  㕹  㖉  㖙  㖩  㖹  㗉  㗙  㗩  㗹
A   㔊  㔚  㔪  㔺  㕊  㕚  㕪  㕺  㖊  㖚  㖪  㖺  㗊  㗚  㗪  㗺
B   㔋  㔛  㔫  㔻  㕋  㕛  㕫  㕻  㖋  㖛  㖫  㖻  㗋  㗛  㗫  㗻
C   㔌  㔜  㔬  㔼  㕌  㕜  㕬  㕼  㖌  㖜  㖬  㖼  㗌  㗜  㗬  㗼
D   㔍  㔝  㔭  㔽  㕍  㕝  㕭  㕽  㖍  㖝  㖭  㖽  㗍  㗝  㗭  㗽
E   㔎  㔞  㔮  㔾  㕎  㕞  㕮  㕾  㖎  㖞  㖮  㖾  㗎  㗞  㗮  㗾
F   㔏  㔟  㔯  㔿  㕏  㕟  㕯  㕿  㖏  㖟  㖯  㖿  㗏  㗟  㗯  㗿
CJK Unified Ideographs Extension A code chart, 3600–36FF
    360 361 362 363 364 365 366 367 368 369 36A 36B 36C 36D 36E 36F
0   㘀  㘐  㘠  㘰  㙀  㙐  㙠  㙰  㚀  㚐  㚠  㚰  㛀  㛐  㛠  㛰
1   㘁  㘑  㘡  㘱  㙁  㙑  㙡  㙱  㚁  㚑  㚡  㚱  㛁  㛑  㛡  㛱
2   㘂  㘒  㘢  㘲  㙂  㙒  㙢  㙲  㚂  㚒  㚢  㚲  㛂  㛒  㛢  㛲
3   㘃  㘓  㘣  㘳  㙃  㙓  㙣  㙳  㚃  㚓  㚣  㚳  㛃  㛓  㛣  㛳
4   㘄  㘔  㘤  㘴  㙄  㙔  㙤  㙴  㚄  㚔  㚤  㚴  㛄  㛔  㛤  㛴
5   㘅  㘕  㘥  㘵  㙅  㙕  㙥  㙵  㚅  㚕  㚥  㚵  㛅  㛕  㛥  㛵
6   㘆  㘖  㘦  㘶  㙆  㙖  㙦  㙶  㚆  㚖  㚦  㚶  㛆  㛖  㛦  㛶
7   㘇  㘗  㘧  㘷  㙇  㙗  㙧  㙷  㚇  㚗  㚧  㚷  㛇  㛗  㛧  㛷
8   㘈  㘘  㘨  㘸  㙈  㙘  㙨  㙸  㚈  㚘  㚨  㚸  㛈  㛘  㛨  㛸
9   㘉  㘙  㘩  㘹  㙉  㙙  㙩  㙹  㚉  㚙  㚩  㚹  㛉  㛙  㛩  㛹
A   㘊  㘚  㘪  㘺  㙊  㙚  㙪  㙺  㚊  㚚  㚪  㚺  㛊  㛚  㛪  㛺
B   㘋  㘛  㘫  㘻  㙋  㙛  㙫  㙻  㚋  㚛  㚫  㚻  㛋  㛛  㛫  㛻
C   㘌  㘜  㘬  㘼  㙌  㙜  㙬  㙼  㚌  㚜  㚬  㚼  㛌  㛜  㛬  㛼
D   㘍  㘝  㘭  㘽  㙍  㙝  㙭  㙽  㚍  㚝  㚭  㚽  㛍  㛝  㛭  㛽
E   㘎  㘞  㘮  㘾  㙎  㙞  㙮  㙾  㚎  㚞  㚮  㚾  㛎  㛞  㛮  㛾
F   㘏  㘟  㘯  㘿  㙏  㙟  㙯  㙿  㚏  㚟  㚯  㚿  㛏  㛟  㛯  㛿
CJK Unified Ideographs Extension A code chart, 3700–37FF
    370 371 372 373 374 375 376 377 378 379 37A 37B 37C 37D 37E 37F
0   㜀  㜐  㜠  㜰  㝀  㝐  㝠  㝰  㞀  㞐  㞠  㞰  㟀  㟐  㟠  㟰
1   㜁  㜑  㜡  㜱  㝁  㝑  㝡  㝱  㞁  㞑  㞡  㞱  㟁  㟑  㟡  㟱
2   㜂  㜒  㜢  㜲  㝂  㝒  㝢  㝲  㞂  㞒  㞢  㞲  㟂  㟒  㟢  㟲
3   㜃  㜓  㜣  㜳  㝃  㝓  㝣  㝳  㞃  㞓  㞣  㞳  㟃  㟓  㟣  㟳
4   㜄  㜔  㜤  㜴  㝄  㝔  㝤  㝴  㞄  㞔  㞤  㞴  㟄  㟔  㟤  㟴
5   㜅  㜕  㜥  㜵  㝅  㝕  㝥  㝵  㞅  㞕  㞥  㞵  㟅  㟕  㟥  㟵
6   㜆  㜖  㜦  㜶  㝆  㝖  㝦  㝶  㞆  㞖  㞦  㞶  㟆  㟖  㟦  㟶
7   㜇  㜗  㜧  㜷  㝇  㝗  㝧  㝷  㞇  㞗  㞧  㞷  㟇  㟗  㟧  㟷
8   㜈  㜘  㜨  㜸  㝈  㝘  㝨  㝸  㞈  㞘  㞨  㞸  㟈  㟘  㟨  㟸
9   㜉  㜙  㜩  㜹  㝉  㝙  㝩  㝹  㞉  㞙  㞩  㞹  㟉  㟙  㟩  㟹
A   㜊  㜚  㜪  㜺  㝊  㝚  㝪  㝺  㞊  㞚  㞪  㞺  㟊  㟚  㟪  㟺
B   㜋  㜛  㜫  㜻  㝋  㝛  㝫  㝻  㞋  㞛  㞫  㞻  㟋  㟛  㟫  㟻
C   㜌  㜜  㜬  㜼  㝌  㝜  㝬  㝼  㞌  㞜  㞬  㞼  㟌  㟜  㟬  㟼
D   㜍  㜝  㜭  㜽  㝍  㝝  㝭  㝽  㞍  㞝  㞭  㞽  㟍  㟝  㟭  㟽
E   㜎  㜞  㜮  㜾  㝎  㝞  㝮  㝾  㞎  㞞  㞮  㞾  㟎  㟞  㟮  㟾
F   㜏  㜟  㜯  㜿  㝏  㝟  㝯  㝿  㞏  㞟  㞯  㞿  㟏  㟟  㟯  㟿
CJK Unified Ideographs Extension A code chart, 3800–38FF
    380 381 382 383 384 385 386 387 388 389 38A 38B 38C 38D 38E 38F
0   㠀  㠐  㠠  㠰  㡀  㡐  㡠  㡰  㢀  㢐  㢠  㢰  㣀  㣐  㣠  㣰
1   㠁  㠑  㠡  㠱  㡁  㡑  㡡  㡱  㢁  㢑  㢡  㢱  㣁  㣑  㣡  㣱
2   㠂  㠒  㠢  㠲  㡂  㡒  㡢  㡲  㢂  㢒  㢢  㢲  㣂  㣒  㣢  㣲
3   㠃  㠓  㠣  㠳  㡃  㡓  㡣  㡳  㢃  㢓  㢣  㢳  㣃  㣓  㣣  㣳
4   㠄  㠔  㠤  㠴  㡄  㡔  㡤  㡴  㢄  㢔  㢤  㢴  㣄  㣔  㣤  㣴
5   㠅  㠕  㠥  㠵  㡅  㡕  㡥  㡵  㢅  㢕  㢥  㢵  㣅  㣕  㣥  㣵
6   㠆  㠖  㠦  㠶  㡆  㡖  㡦  㡶  㢆  㢖  㢦  㢶  㣆  㣖  㣦  㣶
7   㠇  㠗  㠧  㠷  㡇  㡗  㡧  㡷  㢇  㢗  㢧  㢷  㣇  㣗  㣧  㣷
8   㠈  㠘  㠨  㠸  㡈  㡘  㡨  㡸  㢈  㢘  㢨  㢸  㣈  㣘  㣨  㣸
9   㠉  㠙  㠩  㠹  㡉  㡙  㡩  㡹  㢉  㢙  㢩  㢹  㣉  㣙  㣩  㣹
A   㠊  㠚  㠪  㠺  㡊  㡚  㡪  㡺  㢊  㢚  㢪  㢺  㣊  㣚  㣪  㣺
B   㠋  㠛  㠫  㠻  㡋  㡛  㡫  㡻  㢋  㢛  㢫  㢻  㣋  㣛  㣫  㣻
C   㠌  㠜  㠬  㠼  㡌  㡜  㡬  㡼  㢌  㢜  㢬  㢼  㣌  㣜  㣬  㣼
D   㠍  㠝  㠭  㠽  㡍  㡝  㡭  㡽  㢍  㢝  㢭  㢽  㣍  㣝  㣭  㣽
E   㠎  㠞  㠮  㠾  㡎  㡞  㡮  㡾  㢎  㢞  㢮  㢾  㣎  㣞  㣮  㣾
F   㠏  㠟  㠯  㠿  㡏  㡟  㡯  㡿  㢏  㢟  㢯  㢿  㣏  㣟  㣯  㣿
CJK Unified Ideographs Extension A code chart, 3900–39FF
    390 391 392 393 394 395 396 397 398 399 39A 39B 39C 39D 39E 39F
0   㤀  㤐  㤠  㤰  㥀  㥐  㥠  㥰  㦀  㦐  㦠  㦰  㧀  㧐  㧠  㧰
1   㤁  㤑  㤡  㤱  㥁  㥑  㥡  㥱  㦁  㦑  㦡  㦱  㧁  㧑  㧡  㧱
2   㤂  㤒  㤢  㤲  㥂  㥒  㥢  㥲  㦂  㦒  㦢  㦲  㧂  㧒  㧢  㧲
3   㤃  㤓  㤣  㤳  㥃  㥓  㥣  㥳  㦃  㦓  㦣  㦳  㧃  㧓  㧣  㧳
4   㤄  㤔  㤤  㤴  㥄  㥔  㥤  㥴  㦄  㦔  㦤  㦴  㧄  㧔  㧤  㧴
5   㤅  㤕  㤥  㤵  㥅  㥕  㥥  㥵  㦅  㦕  㦥  㦵  㧅  㧕  㧥  㧵
6   㤆  㤖  㤦  㤶  㥆  㥖  㥦  㥶  㦆  㦖  㦦  㦶  㧆  㧖  㧦  㧶
7   㤇  㤗  㤧  㤷  㥇  㥗  㥧  㥷  㦇  㦗  㦧  㦷  㧇  㧗  㧧  㧷
8   㤈  㤘  㤨  㤸  㥈  㥘  㥨  㥸  㦈  㦘  㦨  㦸  㧈  㧘  㧨  㧸
9   㤉  㤙  㤩  㤹  㥉  㥙  㥩  㥹  㦉  㦙  㦩  㦹  㧉  㧙  㧩  㧹
A   㤊  㤚  㤪  㤺  㥊  㥚  㥪  㥺  㦊  㦚  㦪  㦺  㧊  㧚  㧪  㧺
B   㤋  㤛  㤫  㤻  㥋  㥛  㥫  㥻  㦋  㦛  㦫  㦻  㧋  㧛  㧫  㧻
C   㤌  㤜  㤬  㤼  㥌  㥜  㥬  㥼  㦌  㦜  㦬  㦼  㧌  㧜  㧬  㧼
D   㤍  㤝  㤭  㤽  㥍  㥝  㥭  㥽  㦍  㦝  㦭  㦽  㧍  㧝  㧭  㧽
E   㤎  㤞  㤮  㤾  㥎  㥞  㥮  㥾  㦎  㦞  㦮  㦾  㧎  㧞  㧮  㧾
F   㤏  㤟  㤯  㤿  㥏  㥟  㥯  㥿  㦏  㦟  㦯  㦿  㧏  㧟  㧯  㧿
CJK Unified Ideographs Extension A code chart, 3A00–3AFF
    3A0 3A1 3A2 3A3 3A4 3A5 3A6 3A7 3A8 3A9 3AA 3AB 3AC 3AD 3AE 3AF
0   㨀  㨐  㨠  㨰  㩀  㩐  㩠  㩰  㪀  㪐  㪠  㪰  㫀  㫐  㫠  㫰
1   㨁  㨑  㨡  㨱  㩁  㩑  㩡  㩱  㪁  㪑  㪡  㪱  㫁  㫑  㫡  㫱
2   㨂  㨒  㨢  㨲  㩂  㩒  㩢  㩲  㪂  㪒  㪢  㪲  㫂  㫒  㫢  㫲
3   㨃  㨓  㨣  㨳  㩃  㩓  㩣  㩳  㪃  㪓  㪣  㪳  㫃  㫓  㫣  㫳
4   㨄  㨔  㨤  㨴  㩄  㩔  㩤  㩴  㪄  㪔  㪤  㪴  㫄  㫔  㫤  㫴
5   㨅  㨕  㨥  㨵  㩅  㩕  㩥  㩵  㪅  㪕  㪥  㪵  㫅  㫕  㫥  㫵
6   㨆  㨖  㨦  㨶  㩆  㩖  㩦  㩶  㪆  㪖  㪦  㪶  㫆  㫖  㫦  㫶
7   㨇  㨗  㨧  㨷  㩇  㩗  㩧  㩷  㪇  㪗  㪧  㪷  㫇  㫗  㫧  㫷
8   㨈  㨘  㨨  㨸  㩈  㩘  㩨  㩸  㪈  㪘  㪨  㪸  㫈  㫘  㫨  㫸
9   㨉  㨙  㨩  㨹  㩉  㩙  㩩  㩹  㪉  㪙  㪩  㪹  㫉  㫙  㫩  㫹
A   㨊  㨚  㨪  㨺  㩊  㩚  㩪  㩺  㪊  㪚  㪪  㪺  㫊  㫚  㫪  㫺
B   㨋  㨛  㨫  㨻  㩋  㩛  㩫  㩻  㪋  㪛  㪫  㪻  㫋  㫛  㫫  㫻
C   㨌  㨜  㨬  㨼  㩌  㩜  㩬  㩼  㪌  㪜  㪬  㪼  㫌  㫜  㫬  㫼
D   㨍  㨝  㨭  㨽  㩍  㩝  㩭  㩽  㪍  㪝  㪭  㪽  㫍  㫝  㫭  㫽
E   㨎  㨞  㨮  㨾  㩎  㩞  㩮  㩾  㪎  㪞  㪮  㪾  㫎  㫞  㫮  㫾
F   㨏  㨟  㨯  㨿  㩏  㩟  㩯  㩿  㪏  㪟  㪯  㪿  㫏  㫟  㫯  㫿
CJK Unified Ideographs Extension A code chart, 3B00–3BFF
    3B0 3B1 3B2 3B3 3B4 3B5 3B6 3B7 3B8 3B9 3BA 3BB 3BC 3BD 3BE 3BF
0   㬀  㬐  㬠  㬰  㭀  㭐  㭠  㭰  㮀  㮐  㮠  㮰  㯀  㯐  㯠  㯰
1   㬁  㬑  㬡  㬱  㭁  㭑  㭡  㭱  㮁  㮑  㮡  㮱  㯁  㯑  㯡  㯱
2   㬂  㬒  㬢  㬲  㭂  㭒  㭢  㭲  㮂  㮒  㮢  㮲  㯂  㯒  㯢  㯲
3   㬃  㬓  㬣  㬳  㭃  㭓  㭣  㭳  㮃  㮓  㮣  㮳  㯃  㯓  㯣  㯳
4   㬄  㬔  㬤  㬴  㭄  㭔  㭤  㭴  㮄  㮔  㮤  㮴  㯄  㯔  㯤  㯴
5   㬅  㬕  㬥  㬵  㭅  㭕  㭥  㭵  㮅  㮕  㮥  㮵  㯅  㯕  㯥  㯵
6   㬆  㬖  㬦  㬶  㭆  㭖  㭦  㭶  㮆  㮖  㮦  㮶  㯆  㯖  㯦  㯶
7   㬇  㬗  㬧  㬷  㭇  㭗  㭧  㭷  㮇  㮗  㮧  㮷  㯇  㯗  㯧  㯷
8   㬈  㬘  㬨  㬸  㭈  㭘  㭨  㭸  㮈  㮘  㮨  㮸  㯈  㯘  㯨  㯸
9   㬉  㬙  㬩  㬹  㭉  㭙  㭩  㭹  㮉  㮙  㮩  㮹  㯉  㯙  㯩  㯹
A   㬊  㬚  㬪  㬺  㭊  㭚  㭪  㭺  㮊  㮚  㮪  㮺  㯊  㯚  㯪  㯺
B   㬋  㬛  㬫  㬻  㭋  㭛  㭫  㭻  㮋  㮛  㮫  㮻  㯋  㯛  㯫  㯻
C   㬌  㬜  㬬  㬼  㭌  㭜  㭬  㭼  㮌  㮜  㮬  㮼  㯌  㯜  㯬  㯼
D   㬍  㬝  㬭  㬽  㭍  㭝  㭭  㭽  㮍  㮝  㮭  㮽  㯍  㯝  㯭  㯽
E   㬎  㬞  㬮  㬾  㭎  㭞  㭮  㭾  㮎  㮞  㮮  㮾  㯎  㯞  㯮  㯾
F   㬏  㬟  㬯  㬿  㭏  㭟  㭯  㭿  㮏  㮟  㮯  㮿  㯏  㯟  㯯  㯿
CJK Unified Ideographs Extension A code chart, 3C00–3CFF
    3C0 3C1 3C2 3C3 3C4 3C5 3C6 3C7 3C8 3C9 3CA 3CB 3CC 3CD 3CE 3CF
0   㰀  㰐  㰠  㰰  㱀  㱐  㱠  㱰  㲀  㲐  㲠  㲰  㳀  㳐  㳠  㳰
1   㰁  㰑  㰡  㰱  㱁  㱑  㱡  㱱  㲁  㲑  㲡  㲱  㳁  㳑  㳡  㳱
2   㰂  㰒  㰢  㰲  㱂  㱒  㱢  㱲  㲂  㲒  㲢  㲲  㳂  㳒  㳢  㳲
3   㰃  㰓  㰣  㰳  㱃  㱓  㱣  㱳  㲃  㲓  㲣  㲳  㳃  㳓  㳣  㳳
4   㰄  㰔  㰤  㰴  㱄  㱔  㱤  㱴  㲄  㲔  㲤  㲴  㳄  㳔  㳤  㳴
5   㰅  㰕  㰥  㰵  㱅  㱕  㱥  㱵  㲅  㲕  㲥  㲵  㳅  㳕  㳥  㳵
6   㰆  㰖  㰦  㰶  㱆  㱖  㱦  㱶  㲆  㲖  㲦  㲶  㳆  㳖  㳦  㳶
7   㰇  㰗  㰧  㰷  㱇  㱗  㱧  㱷  㲇  㲗  㲧  㲷  㳇  㳗  㳧  㳷
8   㰈  㰘  㰨  㰸  㱈  㱘  㱨  㱸  㲈  㲘  㲨  㲸  㳈  㳘  㳨  㳸
9   㰉  㰙  㰩  㰹  㱉  㱙  㱩  㱹  㲉  㲙  㲩  㲹  㳉  㳙  㳩  㳹
A   㰊  㰚  㰪  㰺  㱊  㱚  㱪  㱺  㲊  㲚  㲪  㲺  㳊  㳚  㳪  㳺
B   㰋  㰛  㰫  㰻  㱋  㱛  㱫  㱻  㲋  㲛  㲫  㲻  㳋  㳛  㳫  㳻
C   㰌  㰜  㰬  㰼  㱌  㱜  㱬  㱼  㲌  㲜  㲬  㲼  㳌  㳜  㳬  㳼
D   㰍  㰝  㰭  㰽  㱍  㱝  㱭  㱽  㲍  㲝  㲭  㲽  㳍  㳝  㳭  㳽
E   㰎  㰞  㰮  㰾  㱎  㱞  㱮  㱾  㲎  㲞  㲮  㲾  㳎  㳞  㳮  㳾
F   㰏  㰟  㰯  㰿  㱏  㱟  㱯  㱿  㲏  㲟  㲯  㲿  㳏  㳟  㳯  㳿
CJK Unified Ideographs Extension A code chart, 3D00–3DFF
    3D0 3D1 3D2 3D3 3D4 3D5 3D6 3D7 3D8 3D9 3DA 3DB 3DC 3DD 3DE 3DF
0   㴀  㴐  㴠  㴰  㵀  㵐  㵠  㵰  㶀  㶐  㶠  㶰  㷀  㷐  㷠  㷰
1   㴁  㴑  㴡  㴱  㵁  㵑  㵡  㵱  㶁  㶑  㶡  㶱  㷁  㷑  㷡  㷱
2   㴂  㴒  㴢  㴲  㵂  㵒  㵢  㵲  㶂  㶒  㶢  㶲  㷂  㷒  㷢  㷲
3   㴃  㴓  㴣  㴳  㵃  㵓  㵣  㵳  㶃  㶓  㶣  㶳  㷃  㷓  㷣  㷳
4   㴄  㴔  㴤  㴴  㵄  㵔  㵤  㵴  㶄  㶔  㶤  㶴  㷄  㷔  㷤  㷴
5   㴅  㴕  㴥  㴵  㵅  㵕  㵥  㵵  㶅  㶕  㶥  㶵  㷅  㷕  㷥  㷵
6   㴆  㴖  㴦  㴶  㵆  㵖  㵦  㵶  㶆  㶖  㶦  㶶  㷆  㷖  㷦  㷶
7   㴇  㴗  㴧  㴷  㵇  㵗  㵧  㵷  㶇  㶗  㶧  㶷  㷇  㷗  㷧  㷷
8   㴈  㴘  㴨  㴸  㵈  㵘  㵨  㵸  㶈  㶘  㶨  㶸  㷈  㷘  㷨  㷸
9   㴉  㴙  㴩  㴹  㵉  㵙  㵩  㵹  㶉  㶙  㶩  㶹  㷉  㷙  㷩  㷹
A   㴊  㴚  㴪  㴺  㵊  㵚  㵪  㵺  㶊  㶚  㶪  㶺  㷊  㷚  㷪  㷺
B   㴋  㴛  㴫  㴻  㵋  㵛  㵫  㵻  㶋  㶛  㶫  㶻  㷋  㷛  㷫  㷻
C   㴌  㴜  㴬  㴼  㵌  㵜  㵬  㵼  㶌  㶜  㶬  㶼  㷌  㷜  㷬  㷼
D   㴍  㴝  㴭  㴽  㵍  㵝  㵭  㵽  㶍  㶝  㶭  㶽  㷍  㷝  㷭  㷽
E   㴎  㴞  㴮  㴾  㵎  㵞  㵮  㵾  㶎  㶞  㶮  㶾  㷎  㷞  㷮  㷾
F   㴏  㴟  㴯  㴿  㵏  㵟  㵯  㵿  㶏  㶟  㶯  㶿  㷏  㷟  㷯  㷿
CJK Unified Ideographs Extension A code chart, 3E00–3EFF
    3E0 3E1 3E2 3E3 3E4 3E5 3E6 3E7 3E8 3E9 3EA 3EB 3EC 3ED 3EE 3EF
0   㸀  㸐  㸠  㸰  㹀  㹐  㹠  㹰  㺀  㺐  㺠  㺰  㻀  㻐  㻠  㻰
1   㸁  㸑  㸡  㸱  㹁  㹑  㹡  㹱  㺁  㺑  㺡  㺱  㻁  㻑  㻡  㻱
2   㸂  㸒  㸢  㸲  㹂  㹒  㹢  㹲  㺂  㺒  㺢  㺲  㻂  㻒  㻢  㻲
3   㸃  㸓  㸣  㸳  㹃  㹓  㹣  㹳  㺃  㺓  㺣  㺳  㻃  㻓  㻣  㻳
4   㸄  㸔  㸤  㸴  㹄  㹔  㹤  㹴  㺄  㺔  㺤  㺴  㻄  㻔  㻤  㻴
5   㸅  㸕  㸥  㸵  㹅  㹕  㹥  㹵  㺅  㺕  㺥  㺵  㻅  㻕  㻥  㻵
6   㸆  㸖  㸦  㸶  㹆  㹖  㹦  㹶  㺆  㺖  㺦  㺶  㻆  㻖  㻦  㻶
7   㸇  㸗  㸧  㸷  㹇  㹗  㹧  㹷  㺇  㺗  㺧  㺷  㻇  㻗  㻧  㻷
8   㸈  㸘  㸨  㸸  㹈  㹘  㹨  㹸  㺈  㺘  㺨  㺸  㻈  㻘  㻨  㻸
9   㸉  㸙  㸩  㸹  㹉  㹙  㹩  㹹  㺉  㺙  㺩  㺹  㻉  㻙  㻩  㻹
A   㸊  㸚  㸪  㸺  㹊  㹚  㹪  㹺  㺊  㺚  㺪  㺺  㻊  㻚  㻪  㻺
B   㸋  㸛  㸫  㸻  㹋  㹛  㹫  㹻  㺋  㺛  㺫  㺻  㻋  㻛  㻫  㻻
C   㸌  㸜  㸬  㸼  㹌  㹜  㹬  㹼  㺌  㺜  㺬  㺼  㻌  㻜  㻬  㻼
D   㸍  㸝  㸭  㸽  㹍  㹝  㹭  㹽  㺍  㺝  㺭  㺽  㻍  㻝  㻭  㻽
E   㸎  㸞  㸮  㸾  㹎  㹞  㹮  㹾  㺎  㺞  㺮  㺾  㻎  㻞  㻮  㻾
F   㸏  㸟  㸯  㸿  㹏  㹟  㹯  㹿  㺏  㺟  㺯  㺿  㻏  㻟  㻯  㻿
3F00 CJK Unified Ideographs Extension A 3FFF
3F0 3F1 3F2 3F3 3F4 3F5 3F6 3F7 3F8 3F9 3FA 3FB 3FC 3FD 3FE 3FF
0 㼀㼐㼠㼰㽀㽐㽠㽰㾀㾐㾠㾰㿀㿐㿠㿰
1 㼁㼑㼡㼱㽁㽑㽡㽱㾁㾑㾡㾱㿁㿑㿡㿱
2 㼂㼒㼢㼲㽂㽒㽢㽲㾂㾒㾢㾲㿂㿒㿢㿲
3 㼃㼓㼣㼳㽃㽓㽣㽳㾃㾓㾣㾳㿃㿓㿣㿳
4 㼄㼔㼤㼴㽄㽔㽤㽴㾄㾔㾤㾴㿄㿔㿤㿴
5 㼅㼕㼥㼵㽅㽕㽥㽵㾅㾕㾥㾵㿅㿕㿥㿵
6 㼆㼖㼦㼶㽆㽖㽦㽶㾆㾖㾦㾶㿆㿖㿦㿶
7 㼇㼗㼧㼷㽇㽗㽧㽷㾇㾗㾧㾷㿇㿗㿧㿷
8 㼈㼘㼨㼸㽈㽘㽨㽸㾈㾘㾨㾸㿈㿘㿨㿸
9 㼉㼙㼩㼹㽉㽙㽩㽹㾉㾙㾩㾹㿉㿙㿩㿹
A 㼊㼚㼪㼺㽊㽚㽪㽺㾊㾚㾪㾺㿊㿚㿪㿺
B 㼋㼛㼫㼻㽋㽛㽫㽻㾋㾛㾫㾻㿋㿛㿫㿻
C 㼌㼜㼬㼼㽌㽜㽬㽼㾌㾜㾬㾼㿌㿜㿬㿼
D 㼍㼝㼭㼽㽍㽝㽭㽽㾍㾝㾭㾽㿍㿝㿭㿽
E 㼎㼞㼮㼾㽎㽞㽮㽾㾎㾞㾮㾾㿎㿞㿮㿾
F 㼏㼟㼯㼿㽏㽟㽯㽿㾏㾟㾯㾿㿏㿟㿯㿿
Yijing Hexagram Symbols
Range: 4DC0–4DFF
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata.
See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts.
See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0.
See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Disclaimer
These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.
See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/
A thorough understanding of the information contained in these additional sources is required for a successful implementation.
Fonts
The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.
See http://www.unicode.org/charts/fonts.html for a list.
Terms of Use
You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site.
See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.
Copyright © 1991-2006 Unicode, Inc. All rights reserved.
CJK Unified Ideographs
Range: 4E00–9FBF
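The block range stated above and the chart layout can be illustrated with a short sketch. In the code charts, each column is labelled with the leading hexadecimal digits of the code point (for example 4E2) and each row with the final hexadecimal digit. The function and constant names below are hypothetical and chosen only for illustration; the range end 9FBF is the Unicode 5.0 value quoted in the header and differs in later versions.

```python
from typing import Tuple

# Block range as quoted in the Unicode 5.0 chart header (assumption: 5.0 only).
CJK_UNIFIED_START = 0x4E00
CJK_UNIFIED_END = 0x9FBF


def in_cjk_unified_block(ch: str) -> bool:
    """Return True if the single character lies in the 4E00-9FBF block range."""
    return CJK_UNIFIED_START <= ord(ch) <= CJK_UNIFIED_END


def chart_position(ch: str) -> Tuple[str, str]:
    """Return the (column label, row label) under which the chart lists ch."""
    cp = ord(ch)
    column = f"{cp >> 4:X}"  # all hex digits except the last, e.g. "4E2"
    row = f"{cp & 0xF:X}"    # final hex digit, e.g. "D"
    return column, row


if __name__ == "__main__":
    ch = "\u4e2d"  # U+4E2D
    print(in_cjk_unified_block(ch), chart_position(ch))
```

Splitting the code point with a shift and a mask mirrors how the 16-by-16 grids are organized, so the same helper applies to any block printed in this layout.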
4E00 CJK Unified Ideographs 4EFF
4E0 4E1 4E2 4E3 4E4 4E5 4E6 4E7 4E8 4E9 4EA 4EB 4EC 4ED 4EE 4EF
0 一丐丠丰乀乐习买亀亐亠亰什仐仠仰
1 丁丑両丱乁乑乡乱亁云亡亱仁仑仡仱
2 丂丒丢串乂乒乢乲亂互亢亲仂仒仢仲
3 七专丣丳乃乓乣乳亃亓亣亳仃仓代仳
4 丄且两临乄乔乤乴亄五交亴仄仔令仴
5 丅丕严丵久乕乥乵亅井亥亵仅仕以仵
6 丆世並丶乆乖书乶了亖亦亶仆他仦件
7 万丗丧丷乇乗乧乷亇亗产亷仇仗仧价
8 丈丘丨丸么乘乨乸予亘亨亸仈付仨仸
9 三丙丩丹义乙乩乹争亙亩亹仉仙仩仹
A 上业个为乊乚乪乺亊亚亪人今仚仪仺
B 下丛丫主之乛乫乻事些享亻介仛仫任
C 丌东丬丼乌乜乬乼二亜京亼仌仜们仼
D 不丝中丽乍九乭乽亍亝亭亽仍仝仭份
E 与丞丮举乎乞乮乾于亞亮亾从仞仮仾
F 丏丟丯丿乏也乯乿亏亟亯亿仏仟仯仿
4F00 CJK Unified Ideographs 4FFF
4F0 4F1 4F2 4F3 4F4 4F5 4F6 4F7 4F8 4F9 4FA 4FB 4FC 4FD 4FE 4FF
0 伀伐传估佀佐你佰侀侐侠侰俀俐俠俰
1 企休伡伱佁佑佡佱侁侑価侱俁俑信俱
2 伂伒伢伲佂佒佢佲侂侒侢侲係俒俢俲
3 伃伓伣伳佃体佣佳侃侓侣侳促俓俣俳
4 伄伔伤伴佄佔佤佴侄侔侤侴俄俔俤俴
5 伅伕伥伵佅何佥併侅侕侥侵俅俕俥俵
6 伆伖伦伶但佖佦佶來侖侦侶俆俖俦俶
7 伇众伧伷佇佗佧佷侇侗侧侷俇俗俧俷
8 伈优伨伸佈佘佨佸侈侘侨侸俈俘俨俸
9 伉伙伩伹佉余佩佹侉侙侩侹俉俙俩俹
A 伊会伪伺佊佚佪佺侊侚侪侺俊俚俪俺
B 伋伛伫伻佋佛佫佻例供侫侻俋俛俫俻
C 伌伜伬似佌作佬佼侌侜侬侼俌俜俬俼
D 伍伝伭伽位佝佭佽侍依侭侽俍保俭俽
E 伎伞伮伾低佞佮佾侎侞侮侾俎俞修俾
F 伏伟伯伿住佟佯使侏侟侯便俏俟俯俿
5000 CJK Unified Ideographs 50FF
500 501 502 503 504 505 506 507 508 509 50A 50B 50C 50D 50E 50F
0 倀倐倠倰偀偐偠偰傀傐傠傰僀僐僠僰
1 倁們倡倱偁偑偡偱傁傑傡傱僁僑僡僱
2 倂倒倢倲偂偒偢偲傂傒傢傲僂僒僢僲
3 倃倓倣倳偃偓偣偳傃傓傣傳僃僓僣僳
4 倄倔値倴偄偔偤側傄傔傤傴僄僔僤僴
5 倅倕倥倵偅偕健偵傅傕傥債僅僕僥僵
6 倆倖倦倶偆偖偦偶傆傖傦傶僆僖僦僶
7 倇倗倧倷假偗偧偷傇傗傧傷僇僗僧僷
8 倈倘倨倸偈偘偨偸傈傘储傸僈僘僨僸
9 倉候倩倹偉偙偩偹傉備傩傹僉僙僩價
A 倊倚倪债偊做偪偺傊傚傪傺僊僚僪僺
B 個倛倫倻偋偛偫偻傋傛傫傻僋僛僫僻
C 倌倜倬值偌停偬偼傌傜催傼僌僜僬僼
D 倍倝倭倽偍偝偭偽傍傝傭傽働僝僭僽
E 倎倞倮倾偎偞偮偾傎傞傮傾僎僞僮僾
F 倏借倯倿偏偟偯偿傏傟傯傿像僟僯僿
5100 CJK Unified Ideographs 51FF
510 511 512 513 514 515 516 517 518 519 51A 51B 51C 51D 51E 51F
0 儀儐儠儰兀児兠兰冀冐冠冰净凐几凰
1 儁儑儡儱允兑兡共冁冑冡冱凁凑凡凱
2 儂儒儢儲兂兒兢兲冂冒冢冲凂凒凢凲
3 儃儓儣儳元兓兣关冃冓冣决凃凓凣凳
4 億儔儤儴兄兔兤兴冄冔冤冴凄凔凤凴
5 儅儕儥儵充兕入兵内冕冥况凅凕凥凵
6 儆儖儦儶兆兖兦其円冖冦冶准凖処凶
7 儇儗儧儷兇兗內具冇冗冧冷凇凗凧凷
8 儈儘儨儸先兘全典冈冘冨冸凈凘凨凸
9 儉儙儩儹光兙兩兹冉写冩冹凉凙凩凹
A 儊儚優儺兊党兪兺冊冚冪冺凊凚凪出
B 儋儛儫儻克兛八养冋军冫冻凋凛凫击
C 儌儜儬儼兌兜公兼册农冬冼凌凜凬凼
D 儍儝儭儽免兝六兽再冝冭冽凍凝凭函
E 儎儞儮儾兎兞兮兾冎冞冮冾凎凞凮凾
F 儏償儯儿兏兟兯兿冏冟冯冿减凟凯凿
5200 CJK Unified Ideographs 52FF
520 521 522 523 524 525 526 527 528 529 52A 52B 52C 52D 52E 52F
0 刀刐删到剀剐剠剰劀劐加劰勀勐勠勰
1 刁刑刡刱剁剑剡剱劁劑务励勁勑勡勱
2 刂划刢刲剂剒剢割劂劒劢劲勂勒勢勲
3 刃刓刣刳剃剓剣剳劃劓劣劳勃勓勣勳
4 刄刔判刴剄剔剤剴劄劔劤労勄勔勤勴
5 刅刕別刵剅剕剥創劅劕劥劵勅動勥勵
6 分刖刦制剆剖剦剶劆劖劦劶勆勖勦勶
7 切列刧刷則剗剧剷劇劗劧劷勇勗勧勷
8 刈刘刨券剈剘剨剸劈劘动劸勈勘勨勸
9 刉则利刹剉剙剩剹劉劙助効勉務勩勹
A 刊刚刪刺削剚剪剺劊劚努劺勊勚勪勺
B 刋创别刻剋剛剫剻劋力劫劻勋勛勫勻
C 刌刜刬刼剌剜剬剼劌劜劬劼勌勜勬勼
D 刍初刭刽前剝剭剽劍劝劭劽勍勝勭勽
E 刎刞刮刾剎剞剮剾劎办劮劾勎勞勮勾
F 刏刟刯刿剏剟副剿劏功劯势勏募勯勿
5300 CJK Unified Ideographs 53FF
530 531 532 533 534 535 536 537 538 539 53A 53B 53C 53D 53E 53F
0 匀匐匠匰區卐占印厀厐厠厰叀叐叠台
1 匁匑匡匱十卑卡危厁厑厡厱叁发叡叱
2 匂匒匢匲卂卒卢卲厂厒厢厲参叒叢史
3 匃匓匣匳千卓卣即厃厓厣厳參叓口右
4 匄匔匤匴卄協卤却厄厔厤厴叄叔古叴
5 包匕匥匵卅单卥卵厅厕厥厵叅叕句叵
6 匆化匦匶卆卖卦卶历厖厦厶叆取另叶
7 匇北匧匷升南卧卷厇厗厧厷叇受叧号
8 匈匘匨匸午単卨卸厈厘厨厸又变叨司
9 匉匙匩匹卉卙卩卹厉厙厩厹叉叙叩叹
A 匊匚匪区半博卪卺厊厚厪厺及叚只叺
B 匋匛匫医卋卛卫卻压厛厫去友叛叫叻
C 匌匜匬匼卌卜卬卼厌厜厬厼双叜召叼
D 匍匝匭匽卍卝卭卽厍厝厭厽反叝叭叽
E 匎匞匮匾华卞卮卾厎厞厮厾収叞叮叾
F 匏匟匯匿协卟卯卿厏原厯县叏叟可叿
5400 CJK Unified Ideographs 54FF
540 541 542 543 544 545 546 547 548 549 54A 54B 54C 54D 54E 54F
0 吀吐吠吰呀呐呠呰咀咐咠咰哀哐哠哰
1 吁向吡吱呁呑呡呱咁咑咡咱品哑員哱
2 吂吒吢吲呂呒呢呲咂咒咢咲哂哒哢哲
3 吃吓吣吳呃呓呣味咃咓咣咳哃哓哣哳
4 各吔吤吴呄呔呤呴咄咔咤咴哄哔哤哴
5 吅吕吥吵呅呕呥呵咅咕咥咵哅哕哥哵
6 吆吖否吶呆呖呦呶咆咖咦咶哆哖哦哶
7 吇吗吧吷呇呗呧呷咇咗咧咷哇哗哧哷
8 合吘吨吸呈员周呸咈咘咨咸哈哘哨哸
9 吉吙吩吹呉呙呩呹咉咙咩咹哉哙哩哹
A 吊吚吪吺告呚呪呺咊咚咪咺哊哚哪哺
B 吋君含吻呋呛呫呻咋咛咫咻哋哛哫哻
C 同吜听吼呌呜呬呼和咜咬咼哌哜哬哼
D 名吝吭吽呍呝呭命咍咝咭咽响哝哭哽
E 后吞吮吾呎呞呮呾咎咞咮咾哎哞哮哾
F 吏吟启吿呏呟呯呿咏咟咯咿哏哟哯哿
5500 CJK Unified Ideographs 55FF
550 551 552 553 554 555 556 557 558 559 55A 55B 55C 55D 55E 55F
0 唀唐唠唰啀啐啠啰喀喐喠喰嗀嗐嗠嗰
1 唁唑唡唱啁啑啡啱喁喑喡喱嗁嗑嗡嗱
2 唂唒唢唲啂啒啢啲喂喒喢喲嗂嗒嗢嗲
3 唃唓唣唳啃啓啣啳喃喓喣喳嗃嗓嗣嗳
4 唄唔唤唴啄啔啤啴善喔喤喴嗄嗔嗤嗴
5 唅唕唥唵啅啕啥啵喅喕喥喵嗅嗕嗥嗵
6 唆唖唦唶商啖啦啶喆喖喦営嗆嗖嗦嗶
7 唇唗唧唷啇啗啧啷喇喗喧喷嗇嗗嗧嗷
8 唈唘唨唸啈啘啨啸喈喘喨喸嗈嗘嗨嗸
9 唉唙唩唹啉啙啩啹喉喙喩喹嗉嗙嗩嗹
A 唊唚唪唺啊啚啪啺喊喚喪喺嗊嗚嗪嗺
B 唋唛唫唻啋啛啫啻喋喛喫喻嗋嗛嗫嗻
C 唌唜唬唼啌啜啬啼喌喜喬喼嗌嗜嗬嗼
D 唍唝唭唽啍啝啭啽喍喝喭喽嗍嗝嗭嗽
E 唎唞售唾啎啞啮啾喎喞單喾嗎嗞嗮嗾
F 唏唟唯唿問啟啯啿喏喟喯喿嗏嗟嗯嗿
5600 CJK Unified Ideographs 56FF
560 561 562 563 564 565 566 567 568 569 56A 56B 56C 56D 56E 56F
0 嘀嘐嘠嘰噀噐噠噰嚀嚐嚠嚰囀囐因困
1 嘁嘑嘡嘱噁噑噡噱嚁嚑嚡嚱囁囑囡囱
2 嘂嘒嘢嘲噂噒噢噲嚂嚒嚢嚲囂囒团囲
3 嘃嘓嘣嘳噃噓噣噳嚃嚓嚣嚳囃囓団図
4 嘄嘔嘤嘴噄噔噤噴嚄嚔嚤嚴囄囔囤围
5 嘅嘕嘥嘵噅噕噥噵嚅嚕嚥嚵囅囕囥囵
6 嘆嘖嘦嘶噆噖噦噶嚆嚖嚦嚶囆囖囦囶
7 嘇嘗嘧嘷噇噗噧噷嚇嚗嚧嚷囇囗囧囷
8 嘈嘘嘨嘸噈噘器噸嚈嚘嚨嚸囈囘囨囸
9 嘉嘙嘩嘹噉噙噩噹嚉嚙嚩嚹囉囙囩囹
A 嘊嘚嘪嘺噊噚噪噺嚊嚚嚪嚺囊囚囪固
B 嘋嘛嘫嘻噋噛噫噻嚋嚛嚫嚻囋四囫囻
C 嘌嘜嘬嘼噌噜噬噼嚌嚜嚬嚼囌囜囬囼
D 嘍嘝嘭嘽噍噝噭噽嚍嚝嚭嚽囍囝园国
E 嘎嘞嘮嘾噎噞噮噾嚎嚞嚮嚾囎回囮图
F 嘏嘟嘯嘿噏噟噯噿嚏嚟嚯嚿囏囟囯囿
5700 CJK Unified Ideographs 57FF
570 571 572 573 574 575 576 577 578 579 57A 57B 57C 57D 57E 57F
0 圀圐圠地址坐坠坰垀垐垠垰埀埐埠埰
1 圁圑圡圱坁坑坡坱垁垑垡垱埁埑埡埱
2 圂園圢圲坂坒坢坲垂垒垢垲埂埒埢埲
3 圃圓圣圳坃坓坣坳垃垓垣垳埃埓埣埳
4 圄圔圤圴坄坔坤坴垄垔垤垴埄埔埤埴
5 圅圕圥圵坅坕坥坵垅垕垥垵埅埕埥埵
6 圆圖圦圶坆坖坦坶垆垖垦垶埆埖埦埶
7 圇圗圧圷均块坧坷垇垗垧垷埇埗埧執
8 圈團在圸坈坘坨坸垈垘垨垸埈埘埨埸
9 圉圙圩圹坉坙坩坹垉垙垩垹埉埙埩培
A 圊圚圪场坊坚坪坺垊垚垪垺埊埚埪基
B 國圛圫圻坋坛坫坻型垛垫垻埋埛埫埻
C 圌圜圬圼坌坜坬坼垌垜垬垼埌埜埬埼
D 圍圝圭圽坍坝坭坽垍垝垭垽埍埝埭埽
E 圎圞圮圾坎坞坮坾垎垞垮垾城埞埮埾
F 圏土圯圿坏坟坯坿垏垟垯垿埏域埯埿
5800 CJK Unified Ideographs 58FF
580 581 582 583 584 585 586 587 588 589 58A 58B 58C 58D 58E 58F
0 堀堐堠堰塀塐塠塰墀墐墠墰壀壐壠声
1 堁堑堡報塁塑塡塱墁墑墡墱壁壑壡壱
2 堂堒堢堲塂塒塢塲墂墒墢墲壂壒壢売
3 堃堓堣堳塃塓塣塳境墓墣墳壃壓壣壳
4 堄堔堤場塄塔塤塴墄墔墤墴壄壔壤壴
5 堅堕堥堵塅塕塥塵墅墕墥墵壅壕壥壵
6 堆堖堦堶塆塖塦塶墆墖墦墶壆壖壦壶
7 堇堗堧堷塇塗塧塷墇増墧墷壇壗壧壷
8 堈堘堨堸塈塘塨塸墈墘墨墸壈壘壨壸
9 堉堙堩堹塉塙塩塹墉墙墩墹壉壙壩壹
A 堊堚堪堺塊塚塪塺墊墚墪墺壊壚壪壺
B 堋堛堫堻塋塛填塻墋墛墫墻壋壛士壻
C 堌堜堬堼塌塜塬塼墌墜墬墼壌壜壬壼
D 堍堝堭堽塍塝塭塽墍墝墭墽壍壝壭壽
E 堎堞堮堾塎塞塮塾墎增墮墾壎壞壮壾
F 堏堟堯堿塏塟塯塿墏墟墯墿壏壟壯壿
5900 CJK Unified Ideographs 59FF
590 591 592 593 594 595 596 597 598 599 59A 59B 59C 59D 59E 59F
0 夀夐夠夰奀奐奠奰妀妐妠妰姀姐姠姰
1 夁夑夡失奁契奡奱妁妑妡妱姁姑姡姱
2 夂夒夢夲奂奒奢奲如妒妢妲姂姒姢姲
3 夃夓夣夳奃奓奣女妃妓妣妳姃姓姣姳
4 处夔夤头奄奔奤奴妄妔妤妴姄委姤姴
5 夅夕夥夵奅奕奥奵妅妕妥妵姅姕姥姵
6 夆外夦夶奆奖奦奶妆妖妦妶姆姖姦姶
7 备夗大夷奇套奧奷妇妗妧妷姇姗姧姷
8 夈夘夨夸奈奘奨奸妈妘妨妸姈姘姨姸
9 変夙天夹奉奙奩她妉妙妩妹姉姙姩姹
A 夊多太夺奊奚奪奺妊妚妪妺姊姚姪姺
B 夋夛夫夻奋奛奫奻妋妛妫妻始姛姫姻
C 夌夜夬夼奌奜奬奼妌妜妬妼姌姜姬姼
D 复夝夭夽奍奝奭好妍妝妭妽姍姝姭姽
E 夎夞央夾奎奞奮奾妎妞妮妾姎姞姮姾
F 夏够夯夿奏奟奯奿妏妟妯妿姏姟姯姿
5A00 CJK Unified Ideographs 5AFF
5A0 5A1 5A2 5A3 5A4 5A5 5A6 5A7 5A8 5A9 5AA 5AB 5AC 5AD 5AE 5AF
0 娀娐娠娰婀婐婠婰媀媐媠媰嫀嫐嫠嫰
1 威娑娡娱婁婑婡婱媁媑媡媱嫁嫑嫡嫱
2 娂娒娢娲婂婒婢婲媂媒媢媲嫂嫒嫢嫲
3 娃娓娣娳婃婓婣婳媃媓媣媳嫃嫓嫣嫳
4 娄娔娤娴婄婔婤婴媄媔媤媴嫄嫔嫤嫴
5 娅娕娥娵婅婕婥婵媅媕媥媵嫅嫕嫥嫵
6 娆娖娦娶婆婖婦婶媆媖媦媶嫆嫖嫦嫶
7 娇娗娧娷婇婗婧婷媇媗媧媷嫇嫗嫧嫷
8 娈娘娨娸婈婘婨婸媈媘媨媸嫈嫘嫨嫸
9 娉娙娩娹婉婙婩婹媉媙媩媹嫉嫙嫩嫹
A 娊娚娪娺婊婚婪婺媊媚媪媺嫊嫚嫪嫺
B 娋娛娫娻婋婛婫婻媋媛媫媻嫋嫛嫫嫻
C 娌娜娬娼婌婜婬婼媌媜媬媼嫌嫜嫬嫼
D 娍娝娭娽婍婝婭婽媍媝媭媽嫍嫝嫭嫽
E 娎娞娮娾婎婞婮婾媎媞媮媾嫎嫞嫮嫾
F 娏娟娯娿婏婟婯婿媏媟媯媿嫏嫟嫯嫿
5B00 CJK Unified Ideographs 5BFF
5B0 5B1 5B2 5B3 5B4 5B5 5B6 5B7 5B8 5B9 5BA 5BB 5BC 5BD 5BE 5BF
0 嬀嬐嬠嬰孀子孠孰宀宐宠宰寀寐寠寰
1 嬁嬑嬡嬱孁孑孡孱宁宑审宱寁寑寡寱
2 嬂嬒嬢嬲孂孒孢孲宂宒客宲寂寒寢寲
3 嬃嬓嬣嬳孃孓季孳它宓宣害寃寓寣寳
4 嬄嬔嬤嬴孄孔孤孴宄宔室宴寄寔寤寴
5 嬅嬕嬥嬵孅孕孥孵宅宕宥宵寅寕寥寵
6 嬆嬖嬦嬶孆孖学孶宆宖宦家密寖實寶
7 嬇嬗嬧嬷孇字孧孷宇宗宧宷寇寗寧寷
8 嬈嬘嬨嬸孈存孨學守官宨宸寈寘寨寸
9 嬉嬙嬩嬹孉孙孩孹安宙宩容寉寙審对
A 嬊嬚嬪嬺孊孚孪孺宊定宪宺寊寚寪寺
B 嬋嬛嬫嬻孋孛孫孻宋宛宫宻寋寛寫寻
C 嬌嬜嬬嬼孌孜孬孼完宜宬宼富寜寬导
D 嬍嬝嬭嬽孍孝孭孽宍宝宭宽寍寝寭寽
E 嬎嬞嬮嬾孎孞孮孾宎实宮宾寎寞寮対
F 嬏嬟嬯嬿孏孟孯孿宏実宯宿寏察寯寿
5C00
CJK Unified Ideographs
5CFF
5C0 5C1 5C2 5C3 5C4 5C5 5C6 5C7 5C8 5C9 5CA 5CB 5CC 5CD 5CE 5CF
尀尐尠尰局屐屠屰岀岐岠岰峀峐峠峰
0
5C00
5C01
5C30
5C40
5C50
5C60
5C70
5C80
5C90
5CA0
5CB0
5CC0
5CD0
5CE0
5CF0
5C11
5C21
5C31
5C41
5C51
5C61
5C71
5C81
5C91
5CA1
5CB1
5CC1
5CD1
5CE1
5CF1
専尒尢尲层屒屢屲岂岒岢岲峂峒峢峲
2
5C02
5C12
5C22
5C32
5C42
5C52
5C62
5C72
5C82
5C92
5CA2
5CB2
5CC2
5CD2
5CE2
5CF2
尃尓尣尳屃屓屣屳岃岓岣岳峃峓峣峳
3
5C03
5C13
5C23
5C33
5C43
5C53
5C63
5C73
5C83
5C93
5CA3
5CB3
5CC3
5CD3
5CE3
5CF3
射尔尤尴屄屔層屴岄岔岤岴峄峔峤峴
4
5C04
5C14
5C24
5C34
5C44
5C54
5C64
5C74
5C84
5C94
5CA4
5CB4
5CC4
5CD4
5CE4
5CF4
尅尕尥尵居展履屵岅岕岥岵峅峕峥峵
5
5C05
5C15
5C25
5C35
5C45
5C55
5C65
5C75
5C85
5C95
5CA5
5CB5
5CC5
5CD5
5CE5
5CF5
将尖尦尶屆屖屦屶岆岖岦岶峆峖峦島
6
5C06
5C16
5C26
5C36
5C46
5C56
5C66
5C76
5C86
5C96
5CA6
5CB6
5CC6
5CD6
5CE6
5CF6
將尗尧尷屇屗屧屷岇岗岧岷峇峗峧峷
7
5C07
5C17
5C27
5C37
5C47
5C57
5C67
5C77
5C87
5C97
5CA7
5CB7
5CC7
5CD7
5CE7
5CF7
專尘尨尸屈屘屨屸岈岘岨岸峈峘峨峸
8
5C08
5C18
5C28
5C38
5C48
5C58
5C68
5C78
5C88
5C98
5CA8
5CB8
5CC8
5CD8
5CE8
5CF8
尉尙尩尹屉屙屩屹岉岙岩岹峉峙峩峹
9
5C09
5C19
5C29
5C39
5C49
5C59
5C69
5C79
5C89
5C99
5CA9
5CB9
5CC9
5CD9
5CE9
5CF9
尊尚尪尺届屚屪屺岊岚岪岺峊峚峪峺
A
5C0A
5C1A
5C2A
5C3A
5C4A
5C5A
5C6A
5C7A
5C8A
5C9A
5CAA
5CBA
5CCA
5CDA
5CEA
5CFA
尋尛尫尻屋屛屫屻岋岛岫岻峋峛峫峻
B
5C0B
C
D
5C1B
5C2B
5C3B
5C4B
5C5B
5C6B
5C7B
5C8B
5C9B
5CAB
5CBB
5CCB
5CDB
5CEB
5CFB
尌尜尬尼屌屜屬屼岌岜岬岼峌峜峬峼 5C0C
5C1C
5C2C
5C3C
5C4C
5C5C
5C6C
5C7C
5C8C
5C9C
5CAC
5CBC
5CCC
5CDC
5CEC
5CFC
對尝尭尽屍屝屭屽岍岝岭岽峍峝峭峽 5C0D
5C1D
5C2D
5C3D
5C4D
5C5D
5C6D
5C7D
5C8D
5C9D
5CAD
5CBD
5CCD
5CDD
5CED
5CFD
導尞尮尾屎属屮屾岎岞岮岾峎峞峮峾 5C0E
F
5C20
封少尡就屁屑屡山岁岑岡岱峁峑峡峱
1
E
5C10
5C1E
5C2E
5C3E
5C4E
5C5E
5C6E
5C7E
5C8E
5C9E
5CAE
5CBE
5CCE
5CDE
5CEE
5CFE
小尟尯尿屏屟屯屿岏岟岯岿峏峟峯峿 5C0F
5C1F
5C2F
5C3F
5C4F
5C5F
5C6F
5C7F
5C8F
5C9F
5CAF
5CBF
5CCF
5CDF
5CEF
5CFF
5D00
CJK Unified Ideographs
5DFF
5D0 5D1 5D2 5D3 5D4 5D5 5D6 5D7 5D8 5D9 5DA 5DB 5DC 5DD 5DE 5DF
崀崐崠崰嵀嵐嵠嵰嶀嶐嶠嶰巀巐巠巰
0
5D00
5D01
5D30
5D40
5D50
5D60
5D70
5D80
5D90
5DA0
5DB0
5DC0
5DD0
5DE0
5DF0
5D11
5D21
5D31
5D41
5D51
5D61
5D71
5D81
5D91
5DA1
5DB1
5DC1
5DD1
5DE1
5DF1
崂崒崢崲嵂嵒嵢嵲嶂嶒嶢嶲巂巒巢已
2
5D02
5D12
5D22
5D32
5D42
5D52
5D62
5D72
5D82
5D92
5DA2
5DB2
5DC2
5DD2
5DE2
5DF2
崃崓崣崳嵃嵓嵣嵳嶃嶓嶣嶳巃巓巣巳
3
5D03
5D13
5D23
5D33
5D43
5D53
5D63
5D73
5D83
5D93
5DA3
5DB3
5DC3
5DD3
5DE3
5DF3
崄崔崤崴嵄嵔嵤嵴嶄嶔嶤嶴巄巔巤巴
4
5D04
5D14
5D24
5D34
5D44
5D54
5D64
5D74
5D84
5D94
5DA4
5DB4
5DC4
5DD4
5DE4
5DF4
崅崕崥崵嵅嵕嵥嵵嶅嶕嶥嶵巅巕工巵
5
5D05
5D15
5D25
5D35
5D45
5D55
5D65
5D75
5D85
5D95
5DA5
5DB5
5DC5
5DD5
5DE5
5DF5
崆崖崦崶嵆嵖嵦嵶嶆嶖嶦嶶巆巖左巶
6
5D06
5D16
5D26
5D36
5D46
5D56
5D66
5D76
5D86
5D96
5DA6
5DB6
5DC6
5DD6
5DE6
5DF6
崇崗崧崷嵇嵗嵧嵷嶇嶗嶧嶷巇巗巧巷
7
5D07
5D17
5D27
5D37
5D47
5D57
5D67
5D77
5D87
5D97
5DA7
5DB7
5DC7
5DD7
5DE7
5DF7
崈崘崨崸嵈嵘嵨嵸嶈嶘嶨嶸巈巘巨巸
8
5D08
5D18
5D28
5D38
5D48
5D58
5D68
5D78
5D88
5D98
5DA8
5DB8
5DC8
5DD8
5DE8
5DF8
崉崙崩崹嵉嵙嵩嵹嶉嶙嶩嶹巉巙巩巹
9
5D09
5D19
5D29
5D39
5D49
5D59
5D69
5D79
5D89
5D99
5DA9
5DB9
5DC9
5DD9
5DE9
5DF9
崊崚崪崺嵊嵚嵪嵺嶊嶚嶪嶺巊巚巪巺
A
5D0A
5D1A
5D2A
5D3A
5D4A
5D5A
5D6A
5D7A
5D8A
5D9A
5DAA
5DBA
5DCA
5DDA
5DEA
5DFA
崋崛崫崻嵋嵛嵫嵻嶋嶛嶫嶻巋巛巫巻
B
5D0B
C
D
5D1B
5D2B
5D3B
5D4B
5D5B
5D6B
5D7B
5D8B
5D9B
5DAB
5DBB
5DCB
5DDB
5DEB
5DFB
崌崜崬崼嵌嵜嵬嵼嶌嶜嶬嶼巌巜巬巼 5D0C
5D1C
5D2C
5D3C
5D4C
5D5C
5D6C
5D7C
5D8C
5D9C
5DAC
5DBC
5DCC
5DDC
5DEC
5DFC
崍崝崭崽嵍嵝嵭嵽嶍嶝嶭嶽巍川巭巽 5D0D
5D1D
5D2D
5D3D
5D4D
5D5D
5D6D
5D7D
5D8D
5D9D
5DAD
5DBD
5DCD
5DDD
5DED
5DFD
崎崞崮崾嵎嵞嵮嵾嶎嶞嶮嶾巎州差巾 5D0E
F
5D20
崁崑崡崱嵁嵑嵡嵱嶁嶑嶡嶱巁巑巡己
1
E
5D10
5D1E
5D2E
5D3E
5D4E
5D5E
5D6E
5D7E
5D8E
5D9E
5DAE
5DBE
5DCE
5DDE
5DEE
5DFE
崏崟崯崿嵏嵟嵯嵿嶏嶟嶯嶿巏巟巯巿 5D0F
5D1F
5D2F
5D3F
5D4F
5D5F
5D6F
5D7F
5D8F
5D9F
5DAF
5DBF
5DCF
5DDF
5DEF
5DFF
5E00
CJK Unified Ideographs
5EFF
5E0 5E1 5E2 5E3 5E4 5E5 5E6 5E7 5E8 5E9 5EA 5EB 5EC 5ED 5EE 5EF
帀帐帠帰幀幐幠幰庀庐庠庰廀廐廠廰
0
5E00
5E01
5E30
5E40
5E50
5E60
5E70
5E80
5E90
5EA0
5EB0
5EC0
5ED0
5EE0
5EF0
5E11
5E21
5E31
5E41
5E51
5E61
5E71
5E81
5E91
5EA1
5EB1
5EC1
5ED1
5EE1
5EF1
市帒帢帲幂幒幢干庂庒庢庲廂廒廢廲
2
5E02
5E12
5E22
5E32
5E42
5E52
5E62
5E72
5E82
5E92
5EA2
5EB2
5EC2
5ED2
5EE2
5EF2
布帓帣帳幃幓幣平広库庣庳廃廓廣廳
3
5E03
5E13
5E23
5E33
5E43
5E53
5E63
5E73
5E83
5E93
5EA3
5EB3
5EC3
5ED3
5EE3
5EF3
帄帔帤帴幄幔幤年庄应庤庴廄廔廤廴
4
5E04
5E14
5E24
5E34
5E44
5E54
5E64
5E74
5E84
5E94
5EA4
5EB4
5EC4
5ED4
5EE4
5EF4
帅帕帥帵幅幕幥幵庅底庥庵廅廕廥廵
5
5E05
5E15
5E25
5E35
5E45
5E55
5E65
5E75
5E85
5E95
5EA5
5EB5
5EC5
5ED5
5EE5
5EF5
帆帖带帶幆幖幦并庆庖度庶廆廖廦延
6
5E06
5E16
5E26
5E36
5E46
5E56
5E66
5E76
5E86
5E96
5EA6
5EB6
5EC6
5ED6
5EE6
5EF6
帇帗帧帷幇幗幧幷庇店座康廇廗廧廷
7
5E07
5E17
5E27
5E37
5E47
5E57
5E67
5E77
5E87
5E97
5EA7
5EB7
5EC7
5ED7
5EE7
5EF7
师帘帨常幈幘幨幸庈庘庨庸廈廘廨廸
8
5E08
5E18
5E28
5E38
5E48
5E58
5E68
5E78
5E88
5E98
5EA8
5EB8
5EC8
5ED8
5EE8
5EF8
帉帙帩帹幉幙幩幹庉庙庩庹廉廙廩廹
9
5E09
5E19
5E29
5E39
5E49
5E59
5E69
5E79
5E89
5E99
5EA9
5EB9
5EC9
5ED9
5EE9
5EF9
帊帚帪帺幊幚幪幺床庚庪庺廊廚廪建
A
5E0A
5E1A
5E2A
5E3A
5E4A
5E5A
5E6A
5E7A
5E8A
5E9A
5EAA
5EBA
5ECA
5EDA
5EEA
5EFA
帋帛師帻幋幛幫幻庋庛庫庻廋廛廫廻
B
5E0B
C
D
5E1B
5E2B
5E3B
5E4B
5E5B
5E6B
5E7B
5E8B
5E9B
5EAB
5EBB
5ECB
5EDB
5EEB
5EFB
希帜帬帼幌幜幬幼庌府庬庼廌廜廬廼 5E0C
5E1C
5E2C
5E3C
5E4C
5E5C
5E6C
5E7C
5E8C
5E9C
5EAC
5EBC
5ECC
5EDC
5EEC
5EFC
帍帝席帽幍幝幭幽庍庝庭庽廍廝廭廽 5E0D
5E1D
5E2D
5E3D
5E4D
5E5D
5E6D
5E7D
5E8D
5E9D
5EAD
5EBD
5ECD
5EDD
5EED
5EFD
帎帞帮帾幎幞幮幾庎庞庮庾廎廞廮廾 5E0E
F
5E20
币帑帡帱幁幑幡幱庁庑庡庱廁廑廡廱
1
E
5E10
5E1E
5E2E
5E3E
5E4E
5E5E
5E6E
5E7E
5E8E
5E9E
5EAE
5EBE
5ECE
5EDE
5EEE
5EFE
帏帟帯帿幏幟幯广序废庯庿廏廟廯廿 5E0F
5E1F
5E2F
5E3F
5E4F
5E5F
5E6F
5E7F
5E8F
5E9F
5EAF
5EBF
5ECF
5EDF
5EEF
5EFF
5F00 5F0
5F00
5F2
5F3
5F4
5F5
5F6
5F7
5F8
5F9 5FA 5FB 5FC 5FD 5FE 5FF
5F10
5F20
5F30
5F40
5F50
5F60
5F70
5F80
5F90
5FA0
5FB0
5FC0
5FD0
5FE0
5FF0
弁弑弡弱彁彑彡影征徑御徱忁忑忡忱
1
5F01
5F11
5F21
5F31
5F41
5F51
5F61
5F71
5F81
5F91
5FA1
5FB1
5FC1
5FD1
5FE1
5FF1
异弒弢弲彂归形彲徂徒徢徲忂忒忢忲
2
5F02
5F12
5F22
5F32
5F42
5F52
5F62
5F72
5F82
5F92
5FA2
5FB2
5FC2
5FD2
5FE2
5FF2
弃弓弣弳彃当彣彳徃従徣徳心忓忣忳
3
5F03
5F13
5F23
5F33
5F43
5F53
5F63
5F73
5F83
5F93
5FA3
5FB3
5FC3
5FD3
5FE3
5FF3
弄弔弤弴彄彔彤彴径徔徤徴忄忔忤忴
4
5F04
5F14
5F24
5F34
5F44
5F54
5F64
5F74
5F84
5F94
5FA4
5FB4
5FC4
5FD4
5FE4
5FF4
弅引弥張彅录彥彵待徕徥徵必忕忥念
5
5F05
5F15
5F25
5F35
5F45
5F55
5F65
5F75
5F85
5F95
5FA5
5FB5
5FC5
5FD5
5FE5
5FF5
弆弖弦弶彆彖彦彶徆徖徦徶忆忖忦忶
6
5F06
5F16
5F26
5F36
5F46
5F56
5F66
5F76
5F86
5F96
5FA6
5FB6
5FC6
5FD6
5FE6
5FF6
弇弗弧強彇彗彧彷徇得徧德忇志忧忷
7
5F07
5F17
5F27
5F37
5F47
5F57
5F67
5F77
5F87
5F97
5FA7
5FB7
5FC7
5FD7
5FE7
5FF7
弈弘弨弸彈彘彨彸很徘徨徸忈忘忨忸
8
5F08
5F18
5F28
5F38
5F48
5F58
5F68
5F78
5F88
5F98
5FA8
5FB8
5FC8
5FD8
5FE8
5FF8
弉弙弩弹彉彙彩役徉徙復徹忉忙忩忹
9
5F09
5F19
5F29
5F39
5F49
5F59
5F69
5F79
5F89
5F99
5FA9
5FB9
5FC9
5FD9
5FE9
5FF9
弊弚弪强彊彚彪彺徊徚循徺忊忚忪忺
A
5F0A
5F1A
5F2A
5F3A
5F4A
5F5A
5F6A
5F7A
5F8A
5F9A
5FAA
5FBA
5FCA
5FDA
5FEA
5FFA
弋弛弫弻彋彛彫彻律徛徫徻忋忛快忻
B
5F0B
C
D
5F1B
5F2B
5F3B
5F4B
5F5B
5F6B
5F7B
5F8B
5F9B
5FAB
5FBB
5FCB
5FDB
5FEB
5FFB
弌弜弬弼彌彜彬彼後徜徬徼忌応忬忼 5F0C
5F1C
5F2C
5F3C
5F4C
5F5C
5F6C
5F7C
5F8C
5F9C
5FAC
5FBC
5FCC
5FDC
5FEC
5FFC
弍弝弭弽彍彝彭彽徍徝徭徽忍忝忭忽 5F0D
5F1D
5F2D
5F3D
5F4D
5F5D
5F6D
5F7D
5F8D
5F9D
5FAD
5FBD
5FCD
5FDD
5FED
5FFD
弎弞弮弾彎彞彮彾徎從微徾忎忞忮忾 5F0E
F
5F1
5FFF
开弐张弰彀彐彠彰往徐徠徰忀忐忠忰
0
E
CJK Unified Ideographs
5F1E
5F2E
5F3E
5F4E
5F5E
5F6E
5F7E
5F8E
5F9E
5FAE
5FBE
5FCE
5FDE
5FEE
5FFE
式弟弯弿彏彟彯彿徏徟徯徿忏忟忯忿 5F0F
5F1F
5F2F
5F3F
5F4F
5F5F
5F6F
5F7F
5F8F
5F9F
5FAF
5FBF
5FCF
5FDF
5FEF
5FFF
6000 600
6000
602
603
604
605
606
607
608
609
60A 60B 60C 60D 60E
60F
6010
6020
6030
6040
6050
6060
6070
6080
6090
60A0
60B0
60C0
60D0
60E0
60F0
态怑怡怱恁恑恡恱悁悑悡悱惁惑惡惱
1
6001
6011
6021
6031
6041
6051
6061
6071
6081
6091
60A1
60B1
60C1
60D1
60E1
60F1
怂怒怢怲恂恒恢恲悂悒悢悲惂惒惢惲
2
6002
6012
6022
6032
6042
6052
6062
6072
6082
6092
60A2
60B2
60C2
60D2
60E2
60F2
怃怓怣怳恃恓恣恳悃悓患悳惃惓惣想
3
6003
6013
6023
6033
6043
6053
6063
6073
6083
6093
60A3
60B3
60C3
60D3
60E3
60F3
怄怔怤怴恄恔恤恴悄悔悤悴惄惔惤惴
4
6004
6014
6024
6034
6044
6054
6064
6074
6084
6094
60A4
60B4
60C4
60D4
60E4
60F4
怅怕急怵恅恕恥恵悅悕悥悵情惕惥惵
5
6005
6015
6025
6035
6045
6055
6065
6075
6085
6095
60A5
60B5
60C5
60D5
60E5
60F5
怆怖怦怶恆恖恦恶悆悖悦悶惆惖惦惶
6
6006
6016
6026
6036
6046
6056
6066
6076
6086
6096
60A6
60B6
60C6
60D6
60E6
60F6
怇怗性怷恇恗恧恷悇悗悧悷惇惗惧惷
7
6007
6017
6027
6037
6047
6057
6067
6077
6087
6097
60A7
60B7
60C7
60D7
60E7
60F7
怈怘怨怸恈恘恨恸悈悘您悸惈惘惨惸
8
6008
6018
6028
6038
6048
6058
6068
6078
6088
6098
60A8
60B8
60C8
60D8
60E8
60F8
怉怙怩怹恉恙恩恹悉悙悩悹惉惙惩惹
9
6009
6019
6029
6039
6049
6059
6069
6079
6089
6099
60A9
60B9
60C9
60D9
60E9
60F9
怊怚怪怺恊恚恪恺悊悚悪悺惊惚惪惺
A
600A
601A
602A
603A
604A
605A
606A
607A
608A
609A
60AA
60BA
60CA
60DA
60EA
60FA
怋怛怫总恋恛恫恻悋悛悫悻惋惛惫惻
B
600B
C
D
601B
602B
603B
604B
605B
606B
607B
608B
609B
60AB
60BB
60CB
60DB
60EB
60FB
怌怜怬怼恌恜恬恼悌悜悬悼惌惜惬惼 600C
601C
602C
603C
604C
605C
606C
607C
608C
609C
60AC
60BC
60CC
60DC
60EC
60FC
怍思怭怽恍恝恭恽悍悝悭悽惍惝惭惽 600D
601D
602D
603D
604D
605D
606D
607D
608D
609D
60AD
60BD
60CD
60DD
60ED
60FD
怎怞怮怾恎恞恮恾悎悞悮悾惎惞惮惾 600E
F
601
60FF
怀怐怠怰恀恐恠恰悀悐悠悰惀惐惠惰
0
E
CJK Unified Ideographs
601E
602E
603E
604E
605E
606E
607E
608E
609E
60AE
60BE
60CE
60DE
60EE
60FE
怏怟怯怿恏恟息恿悏悟悯悿惏惟惯惿 600F
601F
602F
603F
604F
605F
606F
607F
608F
609F
60AF
60BF
60CF
60DF
60EF
60FF
6100 610
6100
612
613
614
615
616
617
618
619
61A 61B 61C 61D 61E
61F
6110
6120
6130
6140
6150
6160
6170
6180
6190
61A0
61B0
61C0
61D0
61E0
61F0
愁愑愡愱慁慑慡慱憁憑憡憱懁懑懡懱
1
6101
6111
6121
6131
6141
6151
6161
6171
6181
6191
61A1
61B1
61C1
61D1
61E1
61F1
愂愒愢愲慂慒慢慲憂憒憢憲懂懒懢懲
2
6102
6112
6122
6132
6142
6152
6162
6172
6182
6192
61A2
61B2
61C2
61D2
61E2
61F2
愃愓愣愳慃慓慣慳憃憓憣憳懃懓懣懳
3
6103
6113
6123
6133
6143
6153
6163
6173
6183
6193
61A3
61B3
61C3
61D3
61E3
61F3
愄愔愤愴慄慔慤慴憄憔憤憴懄懔懤懴
4
6104
6114
6124
6134
6144
6154
6164
6174
6184
6194
61A4
61B4
61C4
61D4
61E4
61F4
愅愕愥愵慅慕慥慵憅憕憥憵懅懕懥懵
5
6105
6115
6125
6135
6145
6155
6165
6175
6185
6195
61A5
61B5
61C5
61D5
61E5
61F5
愆愖愦愶慆慖慦慶憆憖憦憶懆懖懦懶
6
6106
6116
6126
6136
6146
6156
6166
6176
6186
6196
61A6
61B6
61C6
61D6
61E6
61F6
愇愗愧愷慇慗慧慷憇憗憧憷懇懗懧懷
7
6107
6117
6127
6137
6147
6157
6167
6177
6187
6197
61A7
61B7
61C7
61D7
61E7
61F7
愈愘愨愸慈慘慨慸憈憘憨憸懈懘懨懸
8
6108
6118
6128
6138
6148
6158
6168
6178
6188
6198
61A8
61B8
61C8
61D8
61E8
61F8
愉愙愩愹慉慙慩慹憉憙憩憹應懙懩懹
9
6109
6119
6129
6139
6149
6159
6169
6179
6189
6199
61A9
61B9
61C9
61D9
61E9
61F9
愊愚愪愺慊慚慪慺憊憚憪憺懊懚懪懺
A
610A
611A
612A
613A
614A
615A
616A
617A
618A
619A
61AA
61BA
61CA
61DA
61EA
61FA
愋愛愫愻態慛慫慻憋憛憫憻懋懛懫懻
B
610B
C
D
611B
612B
613B
614B
615B
616B
617B
618B
619B
61AB
61BB
61CB
61DB
61EB
61FB
愌愜愬愼慌慜慬慼憌憜憬憼懌懜懬懼 610C
611C
612C
613C
614C
615C
616C
617C
618C
619C
61AC
61BC
61CC
61DC
61EC
61FC
愍愝愭愽慍慝慭慽憍憝憭憽懍懝懭懽 610D
611D
612D
613D
614D
615D
616D
617D
618D
619D
61AD
61BD
61CD
61DD
61ED
61FD
愎愞愮愾慎慞慮慾憎憞憮憾懎懞懮懾 610E
F
611
61FF
愀愐愠愰慀慐慠慰憀憐憠憰懀懐懠懰
0
E
CJK Unified Ideographs
611E
612E
613E
614E
615E
616E
617E
618E
619E
61AE
61BE
61CE
61DE
61EE
61FE
意感愯愿慏慟慯慿憏憟憯憿懏懟懯懿 610F
611F
612F
613F
614F
615F
616F
617F
618F
619F
61AF
61BF
61CF
61DF
61EF
61FF
6200 620
6200
622
623
624
625
626
627
628
629
62A 62B 62C 62D 62E
62F
6210
6220
6230
6240
6250
6260
6270
6280
6290
62A0
62B0
62C0
62D0
62E0
62F0
戁我戡戱扁扑扡扱抁抑抡抱拁拑拡拱
1
6201
6211
6221
6231
6241
6251
6261
6271
6281
6291
62A1
62B1
62C1
62D1
62E1
62F1
戂戒戢戲扂扒扢扲抂抒抢抲拂拒拢拲
2
6202
6212
6222
6232
6242
6252
6262
6272
6282
6292
62A2
62B2
62C2
62D2
62E2
62F2
戃戓戣戳扃打扣扳抃抓抣抳拃拓拣拳
3
6203
6213
6223
6233
6243
6253
6263
6273
6283
6293
62A3
62B3
62C3
62D3
62E3
62F3
戄戔戤戴扄扔扤扴抄抔护抴拄拔拤拴
4
6204
6214
6224
6234
6244
6254
6264
6274
6284
6294
62A4
62B4
62C4
62D4
62E4
62F4
戅戕戥戵扅払扥扵抅投报抵担拕拥拵
5
6205
6215
6225
6235
6245
6255
6265
6275
6285
6295
62A5
62B5
62C5
62D5
62E5
62F5
戆或戦戶扆扖扦扶抆抖抦抶拆拖拦拶
6
6206
6216
6226
6236
6246
6256
6266
6276
6286
6296
62A6
62B6
62C6
62D6
62E6
62F6
戇戗戧户扇扗执扷抇抗抧抷拇拗拧拷
7
6207
6217
6227
6237
6247
6257
6267
6277
6287
6297
62A7
62B7
62C7
62D7
62E7
62F7
戈战戨戸扈托扨扸抈折抨抸拈拘拨拸
8
6208
6218
6228
6238
6248
6258
6268
6278
6288
6298
62A8
62B8
62C8
62D8
62E8
62F8
戉戙戩戹扉扙扩批抉抙抩抹拉拙择拹
9
6209
6219
6229
6239
6249
6259
6269
6279
6289
6299
62A9
62B9
62C9
62D9
62E9
62F9
戊戚截戺扊扚扪扺把抚抪抺拊拚拪拺
A
620A
621A
622A
623A
624A
625A
626A
627A
628A
629A
62AA
62BA
62CA
62DA
62EA
62FA
戋戛戫戻手扛扫扻抋抛披抻拋招拫拻
B
620B
C
D
621B
622B
623B
624B
625B
626B
627B
628B
629B
62AB
62BB
62CB
62DB
62EB
62FB
戌戜戬戼扌扜扬扼抌抜抬押拌拜括拼 620C
621C
622C
623C
624C
625C
626C
627C
628C
629C
62AC
62BC
62CC
62DC
62EC
62FC
戍戝戭戽才扝扭扽抍抝抭抽拍拝拭拽 620D
621D
622D
623D
624D
625D
626D
627D
628D
629D
62AD
62BD
62CD
62DD
62ED
62FD
戎戞戮戾扎扞扮找抎択抮抾拎拞拮拾 620E
F
621
62FF
戀成戠戰所扐扠扰技抐抠抰拀拐拠拰
0
E
CJK Unified Ideographs
621E
622E
623E
624E
625E
626E
627E
628E
629E
62AE
62BE
62CE
62DE
62EE
62FE
戏戟戯房扏扟扯承抏抟抯抿拏拟拯拿 620F
621F
622F
623F
624F
625F
626F
627F
628F
629F
62AF
62BF
62CF
62DF
62EF
62FF
6300 630
6300
632
633
634
635
636
637
638
639
63A 63B 63C 63D 63E
63F
6310
6320
6330
6340
6350
6360
6370
6380
6390
63A0
63B0
63C0
63D0
63E0
63F0
持挑挡挱捁捑捡捱掁掑採掱揁揑握揱
1
6301
6311
6321
6331
6341
6351
6361
6371
6381
6391
63A1
63B1
63C1
63D1
63E1
63F1
挂挒挢挲捂捒换捲掂排探掲揂插揢揲
2
6302
6312
6322
6332
6342
6352
6362
6372
6382
6392
63A2
63B2
63C2
63D2
63E2
63F2
挃挓挣挳捃捓捣捳掃掓掣掳揃揓揣揳
3
6303
6313
6323
6333
6343
6353
6363
6373
6383
6393
63A3
63B3
63C3
63D3
63E3
63F3
挄挔挤挴捄捔捤捴掄掔掤掴揄揔揤援
4
6304
6314
6324
6334
6344
6354
6364
6374
6384
6394
63A4
63B4
63C4
63D4
63E4
63F4
挅挕挥挵捅捕捥捵掅掕接掵揅揕揥揵
5
6305
6315
6325
6335
6345
6355
6365
6375
6385
6395
63A5
63B5
63C5
63D5
63E5
63F5
挆挖挦挶捆捖捦捶掆掖掦掶揆揖揦揶
6
6306
6316
6326
6336
6346
6356
6366
6376
6386
6396
63A6
63B6
63C6
63D6
63E6
63F6
指挗挧挷捇捗捧捷掇掗控掷揇揗揧揷
7
6307
6317
6327
6337
6347
6357
6367
6377
6387
6397
63A7
63B7
63C7
63D7
63E7
63F7
挈挘挨挸捈捘捨捸授掘推掸揈揘揨揸
8
6308
6318
6328
6338
6348
6358
6368
6378
6388
6398
63A8
63B8
63C8
63D8
63E8
63F8
按挙挩挹捉捙捩捹掉掙掩掹揉揙揩揹
9
6309
6319
6329
6339
6349
6359
6369
6379
6389
6399
63A9
63B9
63C9
63D9
63E9
63F9
挊挚挪挺捊捚捪捺掊掚措掺揊揚揪揺
A
630A
631A
632A
633A
634A
635A
636A
637A
638A
639A
63AA
63BA
63CA
63DA
63EA
63FA
挋挛挫挻捋捛捫捻掋掛掫掻揋換揫揻
B
630B
C
D
631B
632B
633B
634B
635B
636B
637B
638B
639B
63AB
63BB
63CB
63DB
63EB
63FB
挌挜挬挼捌捜捬捼掌掜掬掼揌揜揬揼 630C
631C
632C
633C
634C
635C
636C
637C
638C
639C
63AC
63BC
63CC
63DC
63EC
63FC
挍挝挭挽捍捝捭捽掍掝掭掽揍揝揭揽 630D
631D
632D
633D
634D
635D
636D
637D
638D
639D
63AD
63BD
63CD
63DD
63ED
63FD
挎挞挮挾捎捞据捾掎掞掮掾揎揞揮揾 630E
F
631
63FF
挀挐挠挰捀捐捠捰掀掐掠掰揀提揠揰
0
E
CJK Unified Ideographs
631E
632E
633E
634E
635E
636E
637E
638E
639E
63AE
63BE
63CE
63DE
63EE
63FE
挏挟振挿捏损捯捿掏掟掯掿描揟揯揿 630F
631F
632F
633F
634F
635F
636F
637F
638F
639F
63AF
63BF
63CF
63DF
63EF
63FF
6400 640
6400
642
643
644
645
646
647
648
649
64A 64B 64C 64D 64E
64F
6410
6420
6430
6440
6450
6460
6470
6480
6490
64A0
64B0
64C0
64D0
64E0
64F0
搁搑搡搱摁摑摡摱撁撑撡撱擁擑擡擱
1
6401
6411
6421
6431
6441
6451
6461
6471
6481
6491
64A1
64B1
64C1
64D1
64E1
64F1
搂搒搢搲摂摒摢摲撂撒撢撲擂擒擢擲
2
6402
6412
6422
6432
6442
6452
6462
6472
6482
6492
64A2
64B2
64C2
64D2
64E2
64F2
搃搓搣搳摃摓摣摳撃撓撣撳擃擓擣擳
3
6403
6413
6423
6433
6443
6453
6463
6473
6483
6493
64A3
64B3
64C3
64D3
64E3
64F3
搄搔搤搴摄摔摤摴撄撔撤撴擄擔擤擴
4
6404
6414
6424
6434
6444
6454
6464
6474
6484
6494
64A4
64B4
64C4
64D4
64E4
64F4
搅搕搥搵摅摕摥摵撅撕撥撵擅擕擥擵
5
6405
6415
6425
6435
6445
6455
6465
6475
6485
6495
64A5
64B5
64C5
64D5
64E5
64F5
搆搖搦搶摆摖摦摶撆撖撦撶擆擖擦擶
6
6406
6416
6426
6436
6446
6456
6466
6476
6486
6496
64A6
64B6
64C6
64D6
64E6
64F6
搇搗搧搷摇摗摧摷撇撗撧撷擇擗擧擷
7
6407
6417
6427
6437
6447
6457
6467
6477
6487
6497
64A7
64B7
64C7
64D7
64E7
64F7
搈搘搨搸摈摘摨摸撈撘撨撸擈擘擨擸
8
6408
6418
6428
6438
6448
6458
6468
6478
6488
6498
64A8
64B8
64C8
64D8
64E8
64F8
搉搙搩搹摉摙摩摹撉撙撩撹擉擙擩擹
9
6409
6419
6429
6439
6449
6459
6469
6479
6489
6499
64A9
64B9
64C9
64D9
64E9
64F9
搊搚搪携摊摚摪摺撊撚撪撺擊據擪擺
A
640A
641A
642A
643A
644A
645A
646A
647A
648A
649A
64AA
64BA
64CA
64DA
64EA
64FA
搋搛搫搻摋摛摫摻撋撛撫撻擋擛擫擻
B
640B
C
D
641B
642B
643B
644B
645B
646B
647B
648B
649B
64AB
64BB
64CB
64DB
64EB
64FB
搌搜搬搼摌摜摬摼撌撜撬撼擌擜擬擼 640C
641C
642C
643C
644C
645C
646C
647C
648C
649C
64AC
64BC
64CC
64DC
64EC
64FC
損搝搭搽摍摝摭摽撍撝播撽操擝擭擽 640D
641D
642D
643D
644D
645D
646D
647D
648D
649D
64AD
64BD
64CD
64DD
64ED
64FD
搎搞搮搾摎摞摮摾撎撞撮撾擎擞擮擾 640E
F
641
64FF
搀搐搠搰摀摐摠摰撀撐撠撰擀擐擠擰
0
E
CJK Unified Ideographs
641E
642E
643E
644E
645E
646E
647E
648E
649E
64AE
64BE
64CE
64DE
64EE
64FE
搏搟搯搿摏摟摯摿撏撟撯撿擏擟擯擿 640F
641F
642F
643F
644F
645F
646F
647F
648F
649F
64AF
64BF
64CF
64DF
64EF
64FF
6500 650
6500
652
653
654
655
656
657
658
659
65A 65B 65C 65D 65E
65F
6510
6520
6530
6540
6550
6560
6570
6580
6590
65A0
65B0
65C0
65D0
65E0
65F0
攁攑攡攱敁救敡敱斁斑斡斱旁旑旡旱
1
6501
6511
6521
6531
6541
6551
6561
6571
6581
6591
65A1
65B1
65C1
65D1
65E1
65F1
攂攒攢攲敂敒敢敲斂斒斢斲旂旒既旲
2
6502
6512
6522
6532
6542
6552
6562
6572
6582
6592
65A2
65B2
65C2
65D2
65E2
65F2
攃攓攣攳敃敓散敳斃斓斣斳旃旓旣旳
3
6503
6513
6523
6533
6543
6553
6563
6573
6583
6593
65A3
65B3
65C3
65D3
65E3
65F3
攄攔攤攴敄敔敤整斄斔斤斴旄旔旤旴
4
6504
6514
6524
6534
6544
6554
6564
6574
6584
6594
65A4
65B4
65C4
65D4
65E4
65F4
攅攕攥攵故敕敥敵斅斕斥斵旅旕日旵
5
6505
6515
6525
6535
6545
6555
6565
6575
6585
6595
65A5
65B5
65C5
65D5
65E5
65F5
攆攖攦收敆敖敦敶斆斖斦斶旆旖旦时
6
6506
6516
6526
6536
6546
6556
6566
6576
6586
6596
65A6
65B6
65C6
65D6
65E6
65F6
攇攗攧攷敇敗敧敷文斗斧斷旇旗旧旷
7
6507
6517
6527
6537
6547
6557
6567
6577
6587
6597
65A7
65B7
65C7
65D7
65E7
65F7
攈攘攨攸效敘敨數斈斘斨斸旈旘旨旸
8
6508
6518
6528
6538
6548
6558
6568
6578
6588
6598
65A8
65B8
65C8
65D8
65E8
65F8
攉攙攩改敉教敩敹斉料斩方旉旙早旹
9
6509
6519
6529
6539
6549
6559
6569
6579
6589
6599
65A9
65B9
65C9
65D9
65E9
65F9
攊攚攪攺敊敚敪敺斊斚斪斺旊旚旪旺
A
650A
651A
652A
653A
654A
655A
656A
657A
658A
659A
65AA
65BA
65CA
65DA
65EA
65FA
攋攛攫攻敋敛敫敻斋斛斫斻旋旛旫旻
B
650B
C
D
651B
652B
653B
654B
655B
656B
657B
658B
659B
65AB
65BB
65CB
65DB
65EB
65FB
攌攜攬攼敌敜敬敼斌斜斬於旌旜旬旼 650C
651C
652C
653C
654C
655C
656C
657C
658C
659C
65AC
65BC
65CC
65DC
65EC
65FC
攍攝攭攽敍敝敭敽斍斝断施旍旝旭旽 650D
651D
652D
653D
654D
655D
656D
657D
658D
659D
65AD
65BD
65CD
65DD
65ED
65FD
攎攞攮放敎敞敮敾斎斞斮斾旎旞旮旾 650E
F
651
65FF
攀攐攠攰敀敐敠数斀斐斠新旀旐无旰
0
E
CJK Unified Ideographs
651E
652E
653E
654E
655E
656E
657E
658E
659E
65AE
65BE
65CE
65DE
65EE
65FE
攏攟支政敏敟敯敿斏斟斯斿族旟旯旿 650F
651F
652F
653F
654F
655F
656F
657F
658F
659F
65AF
65BF
65CF
65DF
65EF
65FF
6600 660
6600
662
663
664
665
666
667
668
669
66A 66B 66C 66D 66E
66F
6610
6620
6630
6640
6650
6660
6670
6680
6690
66A0
66B0
66C0
66D0
66E0
66F0
昁昑昡昱晁晑晡晱暁暑暡暱曁曑曡曱
1
6601
6611
6621
6631
6641
6651
6661
6671
6681
6691
66A1
66B1
66C1
66D1
66E1
66F1
昂昒昢昲時晒晢晲暂暒暢暲曂曒曢曲
2
6602
6612
6622
6632
6642
6652
6662
6672
6682
6692
66A2
66B2
66C2
66D2
66E2
66F2
昃易昣昳晃晓晣晳暃暓暣暳曃曓曣曳
3
6603
6613
6623
6633
6643
6653
6663
6673
6683
6693
66A3
66B3
66C3
66D3
66E3
66F3
昄昔昤昴晄晔晤晴暄暔暤暴曄曔曤更
4
6604
6614
6624
6634
6644
6654
6664
6674
6684
6694
66A4
66B4
66C4
66D4
66E4
66F4
昅昕春昵晅晕晥晵暅暕暥暵曅曕曥曵
5
6605
6615
6625
6635
6645
6655
6665
6675
6685
6695
66A5
66B5
66C5
66D5
66E5
66F5
昆昖昦昶晆晖晦晶暆暖暦暶曆曖曦曶
6
6606
6616
6626
6636
6646
6656
6666
6676
6686
6696
66A6
66B6
66C6
66D6
66E6
66F6
昇昗昧昷晇晗晧晷暇暗暧暷曇曗曧曷
7
6607
6617
6627
6637
6647
6657
6667
6677
6687
6697
66A7
66B7
66C7
66D7
66E7
66F7
昈昘昨昸晈晘晨晸暈暘暨暸曈曘曨書
8
6608
6618
6628
6638
6648
6658
6668
6678
6688
6698
66A8
66B8
66C8
66D8
66E8
66F8
昉昙昩昹晉晙晩晹暉暙暩暹曉曙曩曹
9
6609
6619
6629
6639
6649
6659
6669
6679
6689
6699
66A9
66B9
66C9
66D9
66E9
66F9
昊昚昪昺晊晚晪智暊暚暪暺曊曚曪曺
A
660A
661A
662A
663A
664A
665A
666A
667A
668A
669A
66AA
66BA
66CA
66DA
66EA
66FA
昋昛昫昻晋晛晫晻暋暛暫暻曋曛曫曻
B
660B
C
D
661B
662B
663B
664B
665B
666B
667B
668B
669B
66AB
66BB
66CB
66DB
66EB
66FB
昌昜昬昼晌晜晬晼暌暜暬暼曌曜曬曼 660C
661C
662C
663C
664C
665C
666C
667C
668C
669C
66AC
66BC
66CC
66DC
66EC
66FC
昍昝昭昽晍晝晭晽暍暝暭暽曍曝曭曽 660D
661D
662D
663D
664D
665D
666D
667D
668D
669D
66AD
66BD
66CD
66DD
66ED
66FD
明昞昮显晎晞普晾暎暞暮暾曎曞曮曾 660E
F
661
66FF
昀昐映昰晀晐晠晰暀暐暠暰曀曐曠曰
0
E
CJK Unified Ideographs
661E
662E
663E
664E
665E
666E
667E
668E
669E
66AE
66BE
66CE
66DE
66EE
66FE
昏星是昿晏晟景晿暏暟暯暿曏曟曯替 660F
661F
662F
663F
664F
665F
666F
667F
668F
669F
66AF
66BF
66CF
66DF
66EF
66FF
6700 670
6700
672
673
674
675
676
677
678
679
67A 67B 67C 67D 67E
67F
6710
6720
6730
6740
6750
6760
6770
6780
6790
67A0
67B0
67C0
67D0
67E0
67F0
朁朑朡朱杁村条東极枑枡枱柁柑柡柱
1
6701
6711
6721
6731
6741
6751
6761
6771
6781
6791
67A1
67B1
67C1
67D1
67E1
67F1
朂朒朢朲杂杒杢杲枂枒枢枲柂柒柢柲
2
6702
6712
6722
6732
6742
6752
6762
6772
6782
6792
67A2
67B2
67C2
67D2
67E2
67F2
會朓朣朳权杓杣杳枃枓枣枳柃染柣柳
3
6703
6713
6723
6733
6743
6753
6763
6773
6783
6793
67A3
67B3
67C3
67D3
67E3
67F3
朄朔朤朴杄杔杤杴构枔枤枴柄柔柤柴
4
6704
6714
6724
6734
6744
6754
6764
6774
6784
6794
67A4
67B4
67C4
67D4
67E4
67F4
朅朕朥朵杅杕来杵枅枕枥枵柅柕查柵
5
6705
6715
6725
6735
6745
6755
6765
6775
6785
6795
67A5
67B5
67C5
67D5
67E5
67F5
朆朖朦朶杆杖杦杶枆枖枦架柆柖柦柶
6
6706
6716
6726
6736
6746
6756
6766
6776
6786
6796
67A6
67B6
67C6
67D6
67E6
67F6
朇朗朧朷杇杗杧杷枇林枧枷柇柗柧柷
7
6707
6717
6727
6737
6747
6757
6767
6777
6787
6797
67A7
67B7
67C7
67D7
67E7
67F7
月朘木朸杈杘杨杸枈枘枨枸柈柘柨柸
8
6708
6718
6728
6738
6748
6758
6768
6778
6788
6798
67A8
67B8
67C8
67D8
67E8
67F8
有朙朩朹杉杙杩杹枉枙枩枹柉柙柩柹
9
6709
6719
6729
6739
6749
6759
6769
6779
6789
6799
67A9
67B9
67C9
67D9
67E9
67F9
朊朚未机杊杚杪杺枊枚枪枺柊柚柪柺
A
670A
671A
672A
673A
674A
675A
676A
677A
678A
679A
67AA
67BA
67CA
67DA
67EA
67FA
朋望末朻杋杛杫杻枋枛枫枻柋柛柫査
B
670B
C
D
671B
672B
673B
674B
675B
676B
677B
678B
679B
67AB
67BB
67CB
67DB
67EB
67FB
朌朜本朼杌杜杬杼枌果枬枼柌柜柬柼 670C
671C
672C
673C
674C
675C
676C
677C
678C
679C
67AC
67BC
67CC
67DC
67EC
67FC
服朝札朽杍杝杭杽枍枝枭枽柍柝柭柽 670D
671D
672D
673D
674D
675D
676D
677D
678D
679D
67AD
67BD
67CD
67DD
67ED
67FD
朎朞朮朾李杞杮松枎枞枮枾柎柞柮柾 670E
F
671
67FF
最朐朠朰杀材杠杰枀析枠枰柀某柠柰
0
E
CJK Unified Ideographs
671E
672E
673E
674E
675E
676E
677E
678E
679E
67AE
67BE
67CE
67DE
67EE
67FE
朏期术朿杏束杯板枏枟枯枿柏柟柯柿 670F
671F
672F
673F
674F
675F
676F
677F
678F
679F
67AF
67BF
67CF
67DF
67EF
67FF
6800 680
6800
682
683
684
685
686
687
688
689
68A 68B 68C 68D 68E
68F
6810
6820
6830
6840
6850
6860
6870
6880
6890
68A0
68B0
68C0
68D0
68E0
68F0
栁树校栱桁桑桡桱梁梑梡梱棁棑棡棱
1
6801
6811
6821
6831
6841
6851
6861
6871
6881
6891
68A1
68B1
68C1
68D1
68E1
68F1
栂栒栢栲桂桒桢桲梂梒梢梲棂棒棢棲
2
6802
6812
6822
6832
6842
6852
6862
6872
6882
6892
68A2
68B2
68C2
68D2
68E2
68F2
栃栓栣栳桃桓档桳梃梓梣梳棃棓棣棳
3
6803
6813
6823
6833
6843
6853
6863
6873
6883
6893
68A3
68B3
68C3
68D3
68E3
68F3
栄栔栤栴桄桔桤桴梄梔梤梴棄棔棤棴
4
6804
6814
6824
6834
6844
6854
6864
6874
6884
6894
68A4
68B4
68C4
68D4
68E4
68F4
栅栕栥栵桅桕桥桵梅梕梥梵棅棕棥棵
5
6805
6815
6825
6835
6845
6855
6865
6875
6885
6895
68A5
68B5
68C5
68D5
68E5
68F5
栆栖栦栶框桖桦桶梆梖梦梶棆棖棦棶
6
6806
6816
6826
6836
6846
6856
6866
6876
6886
6896
68A6
68B6
68C6
68D6
68E6
68F6
标栗栧样桇桗桧桷梇梗梧梷棇棗棧棷
7
6807
6817
6827
6837
6847
6857
6867
6877
6887
6897
68A7
68B7
68C7
68D7
68E7
68F7
栈栘栨核案桘桨桸梈梘梨梸棈棘棨棸
8
6808
6818
6828
6838
6848
6858
6868
6878
6888
6898
68A8
68B8
68C8
68D8
68E8
68F8
栉栙栩根桉桙桩桹梉梙梩梹棉棙棩棹
9
6809
6819
6829
6839
6849
6859
6869
6879
6889
6899
68A9
68B9
68C9
68D9
68E9
68F9
栊栚株栺桊桚桪桺梊梚梪梺棊棚棪棺
A
680A
681A
682A
683A
684A
685A
686A
687A
688A
689A
68AA
68BA
68CA
68DA
68EA
68FA
栋栛栫栻桋桛桫桻梋梛梫梻棋棛棫棻
B
680B
C
D
681B
682B
683B
684B
685B
686B
687B
688B
689B
68AB
68BB
68CB
68DB
68EB
68FB
栌栜栬格桌桜桬桼梌梜梬梼棌棜棬棼 680C
681C
682C
683C
684C
685C
686C
687C
688C
689C
68AC
68BC
68CC
68DC
68EC
68FC
栍栝栭栽桍桝桭桽梍條梭梽棍棝棭棽 680D
681D
682D
683D
684D
685D
686D
687D
688D
689D
68AD
68BD
68CD
68DD
68ED
68FD
栎栞栮栾桎桞桮桾梎梞梮梾棎棞森棾 680E
F
681
68FF
栀栐栠栰桀桐桠桰梀梐梠械检棐棠棰
0
E
CJK Unified Ideographs
681E
682E
683E
684E
685E
686E
687E
688E
689E
68AE
68BE
68CE
68DE
68EE
68FE
栏栟栯栿桏桟桯桿梏梟梯梿棏棟棯棿 680F
681F
682F
683F
684F
685F
686F
687F
688F
689F
68AF
68BF
68CF
68DF
68EF
68FF
6900 690
6900
692
693
694
695
696
697
698
699
69A 69B 69C 69D 69E
69F
6910
6920
6930
6940
6950
6960
6970
6980
6990
69A0
69B0
69C0
69D0
69E0
69F0
椁椑椡椱楁楑楡楱榁榑榡榱槁槑槡槱
1
6901
6911
6921
6931
6941
6951
6961
6971
6981
6991
69A1
69B1
69C1
69D1
69E1
69F1
椂椒椢椲楂楒楢楲概榒榢榲槂槒槢槲
2
6902
6912
6922
6932
6942
6952
6962
6972
6982
6992
69A2
69B2
69C2
69D2
69E2
69F2
椃椓椣椳楃楓楣楳榃榓榣榳槃槓槣槳
3
6903
6913
6923
6933
6943
6953
6963
6973
6983
6993
69A3
69B3
69C3
69D3
69E3
69F3
椄椔椤椴楄楔楤楴榄榔榤榴槄槔槤槴
4
6904
6914
6924
6934
6944
6954
6964
6974
6984
6994
69A4
69B4
69C4
69D4
69E4
69F4
椅椕椥椵楅楕楥極榅榕榥榵槅槕槥槵
5
6905
6915
6925
6935
6945
6955
6965
6975
6985
6995
69A5
69B5
69C5
69D5
69E5
69F5
椆椖椦椶楆楖楦楶榆榖榦榶槆槖槦槶
6
6906
6916
6926
6936
6946
6956
6966
6976
6986
6996
69A6
69B6
69C6
69D6
69E6
69F6
椇椗椧椷楇楗楧楷榇榗榧榷槇槗槧槷
7
6907
6917
6927
6937
6947
6957
6967
6977
6987
6997
69A7
69B7
69C7
69D7
69E7
69F7
椈椘椨椸楈楘楨楸榈榘榨榸槈様槨槸
8
6908
6918
6928
6938
6948
6958
6968
6978
6988
6998
69A8
69B8
69C8
69D8
69E8
69F8
椉椙椩椹楉楙楩楹榉榙榩榹槉槙槩槹
9
6909
6919
6929
6939
6949
6959
6969
6979
6989
6999
69A9
69B9
69C9
69D9
69E9
69F9
椊椚椪椺楊楚楪楺榊榚榪榺槊槚槪槺
A
690A
691A
692A
693A
694A
695A
696A
697A
698A
699A
69AA
69BA
69CA
69DA
69EA
69FA
椋椛椫椻楋楛楫楻榋榛榫榻構槛槫槻
B
690B
C
D
691B
692B
693B
694B
695B
696B
697B
698B
699B
69AB
69BB
69CB
69DB
69EB
69FB
椌検椬椼楌楜楬楼榌榜榬榼槌槜槬槼 690C
691C
692C
693C
694C
695C
696C
697C
698C
699C
69AC
69BC
69CC
69DC
69EC
69FC
植椝椭椽楍楝業楽榍榝榭榽槍槝槭槽 690D
691D
692D
693D
694D
695D
696D
697D
698D
699D
69AD
69BD
69CD
69DD
69ED
69FD
椎椞椮椾楎楞楮楾榎榞榮榾槎槞槮槾 690E
F
691
69FF
椀椐椠椰楀楐楠楰榀榐榠榰槀槐槠槰
0
E
CJK Unified Ideographs
691E
692E
693E
694E
695E
696E
697E
698E
699E
69AE
69BE
69CE
69DE
69EE
69FE
椏椟椯椿楏楟楯楿榏榟榯榿槏槟槯槿 690F
691F
692F
693F
694F
695F
696F
697F
698F
699F
69AF
69BF
69CF
69DF
69EF
69FF
6A00
CJK Unified Ideographs
6AFF
6A0 6A1 6A2 6A3 6A4 6A5 6A6 6A7 6A8 6A9 6AA 6AB 6AC 6AD 6AE 6AF
樀樐樠樰橀橐橠橰檀檐檠檰櫀櫐櫠櫰
0
6A00
6A01
6A30
6A40
6A50
6A60
6A70
6A80
6A90
6AA0
6AB0
6AC0
6AD0
6AE0
6AF0
6A11
6A21
6A31
6A41
6A51
6A61
6A71
6A81
6A91
6AA1
6AB1
6AC1
6AD1
6AE1
6AF1
樂樒樢樲橂橒橢橲檂檒檢檲櫂櫒櫢櫲
2
6A02
6A12
6A22
6A32
6A42
6A52
6A62
6A72
6A82
6A92
6AA2
6AB2
6AC2
6AD2
6AE2
6AF2
樃樓樣樳橃橓橣橳檃檓檣檳櫃櫓櫣櫳
3
6A03
6A13
6A23
6A33
6A43
6A53
6A63
6A73
6A83
6A93
6AA3
6AB3
6AC3
6AD3
6AE3
6AF3
樄樔樤樴橄橔橤橴檄檔檤檴櫄櫔櫤櫴
4
6A04
6A14
6A24
6A34
6A44
6A54
6A64
6A74
6A84
6A94
6AA4
6AB4
6AC4
6AD4
6AE4
6AF4
樅樕樥樵橅橕橥橵檅檕檥檵櫅櫕櫥櫵
5
6A05
6A15
6A25
6A35
6A45
6A55
6A65
6A75
6A85
6A95
6AA5
6AB5
6AC5
6AD5
6AE5
6AF5
樆樖樦樶橆橖橦橶檆檖檦檶櫆櫖櫦櫶
6
6A06
6A16
6A26
6A36
6A46
6A56
6A66
6A76
6A86
6A96
6AA6
6AB6
6AC6
6AD6
6AE6
6AF6
樇樗樧樷橇橗橧橷檇檗檧檷櫇櫗櫧櫷
7
6A07
6A17
6A27
6A37
6A47
6A57
6A67
6A77
6A87
6A97
6AA7
6AB7
6AC7
6AD7
6AE7
6AF7
樈樘樨樸橈橘橨橸檈檘檨檸櫈櫘櫨櫸
8
6A08
6A18
6A28
6A38
6A48
6A58
6A68
6A78
6A88
6A98
6AA8
6AB8
6AC8
6AD8
6AE8
6AF8
樉標権樹橉橙橩橹檉檙檩檹櫉櫙櫩櫹
9
6A09
6A19
6A29
6A39
6A49
6A59
6A69
6A79
6A89
6A99
6AA9
6AB9
6AC9
6AD9
6AE9
6AF9
樊樚横樺橊橚橪橺檊檚檪檺櫊櫚櫪櫺
A
6A0A
6A1A
6A2A
6A3A
6A4A
6A5A
6A6A
6A7A
6A8A
6A9A
6AAA
6ABA
6ACA
6ADA
6AEA
6AFA
樋樛樫樻橋橛橫橻檋檛檫檻櫋櫛櫫櫻
B
6A0B
C
D
6A1B
6A2B
6A3B
6A4B
6A5B
6A6B
6A7B
6A8B
6A9B
6AAB
6ABB
6ACB
6ADB
6AEB
6AFB
樌樜樬樼橌橜橬橼檌檜檬檼櫌櫜櫬櫼 6A0C
6A1C
6A2C
6A3C
6A4C
6A5C
6A6C
6A7C
6A8C
6A9C
6AAC
6ABC
6ACC
6ADC
6AEC
6AFC
樍樝樭樽橍橝橭橽檍檝檭檽櫍櫝櫭櫽 6A0D
6A1D
6A2D
6A3D
6A4D
6A5D
6A6D
6A7D
6A8D
6A9D
6AAD
6ABD
6ACD
6ADD
6AED
6AFD
樎樞樮樾橎橞橮橾檎檞檮檾櫎櫞櫮櫾 6A0E
F
6A20
樁樑模樱橁橑橡橱檁檑檡檱櫁櫑櫡櫱
1
E
6A10
6A1E
6A2E
6A3E
6A4E
6A5E
6A6E
6A7E
6A8E
6A9E
6AAE
6ABE
6ACE
6ADE
6AEE
6AFE
樏樟樯樿橏機橯橿檏檟檯檿櫏櫟櫯櫿 6A0F
6A1F
6A2F
6A3F
6A4F
6A5F
6A6F
6A7F
6A8F
6A9F
6AAF
6ABF
6ACF
6ADF
6AEF
6AFF
6B00
CJK Unified Ideographs
6BFF
6B0 6B1 6B2 6B3 6B4 6B5 6B6 6B7 6B8 6B9 6BA 6BB 6BC 6BD 6BE 6BF
欀欐欠欰歀歐歠歰殀殐殠殰毀毐毠毰
0
6B00
6B01
6B30
6B40
6B50
6B60
6B70
6B80
6B90
6BA0
6BB0
6BC0
6BD0
6BE0
6BF0
6B11
6B21
6B31
6B41
6B51
6B61
6B71
6B81
6B91
6BA1
6BB1
6BC1
6BD1
6BE1
6BF1
欂欒欢欲歂歒止歲殂殒殢殲毂毒毢毲
2
6B02
6B12
6B22
6B32
6B42
6B52
6B62
6B72
6B82
6B92
6BA2
6BB2
6BC2
6BD2
6BE2
6BF2
欃欓欣欳歃歓正歳殃殓殣殳毃毓毣毳
3
6B03
6B13
6B23
6B33
6B43
6B53
6B63
6B73
6B83
6B93
6BA3
6BB3
6BC3
6BD3
6BE3
6BF3
欄欔欤欴歄歔此歴殄殔殤殴毄比毤毴
4
6B04
6B14
6B24
6B34
6B44
6B54
6B64
6B74
6B84
6B94
6BA4
6BB4
6BC4
6BD4
6BE4
6BF4
欅欕欥欵歅歕步歵殅殕殥段毅毕毥毵
5
6B05
6B15
6B25
6B35
6B45
6B55
6B65
6B75
6B85
6B95
6BA5
6BB5
6BC5
6BD5
6BE5
6BF5
欆欖欦欶歆歖武歶殆殖殦殶毆毖毦毶
6
6B06
6B16
6B26
6B36
6B46
6B56
6B66
6B76
6B86
6B96
6BA6
6BB6
6BC6
6BD6
6BE6
6BF6
欇欗欧欷歇歗歧歷殇殗殧殷毇毗毧毷
7
6B07
6B17
6B27
6B37
6B47
6B57
6B67
6B77
6B87
6B97
6BA7
6BB7
6BC7
6BD7
6BE7
6BF7
欈欘欨欸歈歘歨歸殈殘殨殸毈毘毨毸
8
6B08
6B18
6B28
6B38
6B48
6B58
6B68
6B78
6B88
6B98
6BA8
6BB8
6BC8
6BD8
6BE8
6BF8
欉欙欩欹歉歙歩歹殉殙殩殹毉毙毩毹
9
6B09
6B19
6B29
6B39
6B49
6B59
6B69
6B79
6B89
6B99
6BA9
6BB9
6BC9
6BD9
6BE9
6BF9
權欚欪欺歊歚歪歺殊殚殪殺毊毚毪毺
A
6B0A
6B1A
6B2A
6B3A
6B4A
6B5A
6B6A
6B7A
6B8A
6B9A
6BAA
6BBA
6BCA
6BDA
6BEA
6BFA
欋欛欫欻歋歛歫死残殛殫殻毋毛毫毻
B
6B0B
C
D
6B1B
6B2B
6B3B
6B4B
6B5B
6B6B
6B7B
6B8B
6B9B
6BAB
6BBB
6BCB
6BDB
6BEB
6BFB
欌欜欬欼歌歜歬歼殌殜殬殼毌毜毬毼 6B0C
6B1C
6B2C
6B3C
6B4C
6B5C
6B6C
6B7C
6B8C
6B9C
6BAC
6BBC
6BCC
6BDC
6BEC
6BFC
欍欝欭欽歍歝歭歽殍殝殭殽母毝毭毽 6B0D
6B1D
6B2D
6B3D
6B4D
6B5D
6B6D
6B7D
6B8D
6B9D
6BAD
6BBD
6BCD
6BDD
6BED
6BFD
欎欞欮款歎歞歮歾殎殞殮殾毎毞毮毾 6B0E
F
6B20
欁欑次欱歁歑歡歱殁殑殡殱毁毑毡毱
1
E
6B10
6B1E
6B2E
6B3E
6B4E
6B5E
6B6E
6B7E
6B8E
6B9E
6BAE
6BBE
6BCE
6BDE
6BEE
6BFE
欏欟欯欿歏歟歯歿殏殟殯殿每毟毯毿 6B0F
6B1F
6B2F
6B3F
6B4F
6B5F
6B6F
6B7F
6B8F
6B9F
6BAF
6BBF
6BCF
6BDF
6BEF
6BFF
6C00
CJK Unified Ideographs
6CFF
6C0 6C1 6C2 6C3 6C4 6C5 6C6 6C7 6C8 6C9 6CA 6CB 6CC 6CD 6CE 6CF
氀氐氠氰汀汐池汰沀沐沠沰泀泐泠泰
0
6C00
6C01
6C30
6C40
6C50
6C60
6C70
6C80
6C90
6CA0
6CB0
6CC0
6CD0
6CE0
6CF0
6C11
6C21
6C31
6C41
6C51
6C61
6C71
6C81
6C91
6CA1
6CB1
6CC1
6CD1
6CE1
6CF1
氂氒氢氲求汒汢汲沂沒沢沲泂泒波泲
2
6C02
6C12
6C22
6C32
6C42
6C52
6C62
6C72
6C82
6C92
6CA2
6CB2
6CC2
6CD2
6CE2
6CF2
氃氓氣氳汃汓汣汳沃沓沣河泃泓泣泳
3
6C03
6C13
6C23
6C33
6C43
6C53
6C63
6C73
6C83
6C93
6CA3
6CB3
6CC3
6CD3
6CE3
6CF3
氄气氤水汄汔汤汴沄沔沤沴泄泔泤泴
4
6C04
6C14
6C24
6C34
6C44
6C54
6C64
6C74
6C84
6C94
6CA4
6CB4
6CC4
6CD4
6CE4
6CF4
氅氕氥氵汅汕汥汵沅沕沥沵泅法泥泵
5
6C05
6C15
6C25
6C35
6C45
6C55
6C65
6C75
6C85
6C95
6CA5
6CB5
6CC5
6CD5
6CE5
6CF5
氆氖氦氶汆汖汦汶沆沖沦沶泆泖泦泶
6
6C06
6C16
6C26
6C36
6C46
6C56
6C66
6C76
6C86
6C96
6CA6
6CB6
6CC6
6CD6
6CE6
6CF6
氇気氧氷汇汗汧汷沇沗沧沷泇泗泧泷
7
6C07
6C17
6C27
6C37
6C47
6C57
6C67
6C77
6C87
6C97
6CA7
6CB7
6CC7
6CD7
6CE7
6CF7
氈氘氨永汈汘汨汸沈沘沨沸泈泘注泸
8
6C08
6C18
6C28
6C38
6C48
6C58
6C68
6C78
6C88
6C98
6CA8
6CB8
6CC8
6CD8
6CE8
6CF8
氉氙氩氹汉汙汩汹沉沙沩油泉泙泩泹
9
6C09
6C19
6C29
6C39
6C49
6C59
6C69
6C79
6C89
6C99
6CA9
6CB9
6CC9
6CD9
6CE9
6CF9
氊氚氪氺汊汚汪決沊沚沪沺泊泚泪泺
A
6C0A
6C1A
6C2A
6C3A
6C4A
6C5A
6C6A
6C7A
6C8A
6C9A
6CAA
6CBA
6CCA
6CDA
6CEA
6CFA
氋氛氫氻汋汛汫汻沋沛沫治泋泛泫泻
B
6C0B
C
D
6C1B
6C2B
6C3B
6C4B
6C5B
6C6B
6C7B
6C8B
6C9B
6CAB
6CBB
6CCB
6CDB
6CEB
6CFB
氌氜氬氼汌汜汬汼沌沜沬沼泌泜泬泼 6C0C
6C1C
6C2C
6C3C
6C4C
6C5C
6C6C
6C7C
6C8C
6C9C
6CAC
6CBC
6CCC
6CDC
6CEC
6CFC
氍氝氭氽汍汝汭汽沍沝沭沽泍泝泭泽 6C0D
6C1D
6C2D
6C3D
6C4D
6C5D
6C6D
6C7D
6C8D
6C9D
6CAD
6CBD
6CCD
6CDD
6CED
6CFD
氎氞氮氾汎汞汮汾沎沞沮沾泎泞泮泾 6C0E
F
6C20
氁民氡氱汁汑污汱沁沑没沱況泑泡泱
1
E
6C10
6C1E
6C2E
6C3E
6C4E
6C5E
6C6E
6C7E
6C8E
6C9E
6CAE
6CBE
6CCE
6CDE
6CEE
6CFE
氏氟氯氿汏江汯汿沏沟沯沿泏泟泯泿 6C0F
6C1F
6C2F
6C3F
6C4F
6C5F
6C6F
6C7F
6C8F
6C9F
6CAF
6CBF
6CCF
6CDF
6CEF
6CFF
6D00
CJK Unified Ideographs
6DFF
6D0 6D1 6D2 6D3 6D4 6D5 6D6 6D7 6D8 6D9 6DA 6DB 6DC 6DD 6DE 6DF
洀洐洠洰浀浐浠浰涀涐涠涰淀淐淠淰
0
6D00
6D01
6D30
6D40
6D50
6D60
6D70
6D80
6D90
6DA0
6DB0
6DC0
6DD0
6DE0
6DF0
6D11
6D21
6D31
6D41
6D51
6D61
6D71
6D81
6D91
6DA1
6DB1
6DC1
6DD1
6DE1
6DF1
洂洒洢洲浂浒浢浲涂涒涢液淂淒淢淲
2
6D02
6D12
6D22
6D32
6D42
6D52
6D62
6D72
6D82
6D92
6DA2
6DB2
6DC2
6DD2
6DE2
6DF2
洃洓洣洳浃浓浣浳涃涓涣涳淃淓淣淳
3
6D03
6D13
6D23
6D33
6D43
6D53
6D63
6D73
6D83
6D93
6DA3
6DB3
6DC3
6DD3
6DE3
6DF3
洄洔洤洴浄浔浤浴涄涔涤涴淄淔淤淴
4
6D04
6D14
6D24
6D34
6D44
6D54
6D64
6D74
6D84
6D94
6DA4
6DB4
6DC4
6DD4
6DE4
6DF4
洅洕津洵浅浕浥浵涅涕涥涵淅淕淥淵
5
6D05
6D15
6D25
6D35
6D45
6D55
6D65
6D75
6D85
6D95
6DA5
6DB5
6DC5
6DD5
6DE5
6DF5
洆洖洦洶浆浖浦浶涆涖润涶淆淖淦淶
6
6D06
6D16
6D26
6D36
6D46
6D56
6D66
6D76
6D86
6D96
6DA6
6DB6
6DC6
6DD6
6DE6
6DF6
洇洗洧洷浇浗浧海涇涗涧涷淇淗淧混
7
6D07
6D17
6D27
6D37
6D47
6D57
6D67
6D77
6D87
6D97
6DA7
6DB7
6DC7
6DD7
6DE7
6DF7
洈洘洨洸浈浘浨浸消涘涨涸淈淘淨淸
8
6D08
6D18
6D28
6D38
6D48
6D58
6D68
6D78
6D88
6D98
6DA8
6DB8
6DC8
6DD8
6DE8
6DF8
洉洙洩洹浉浙浩浹涉涙涩涹淉淙淩淹
9
6D09
6D19
6D29
6D39
6D49
6D59
6D69
6D79
6D89
6D99
6DA9
6DB9
6DC9
6DD9
6DE9
6DF9
洊洚洪洺浊浚浪浺涊涚涪涺淊淚淪淺
A
6D0A
6D1A
6D2A
6D3A
6D4A
6D5A
6D6A
6D7A
6D8A
6D9A
6DAA
6DBA
6DCA
6DDA
6DEA
6DFA
洋洛洫活测浛浫浻涋涛涫涻淋淛淫添
B
6D0B
C
D
6D1B
6D2B
6D3B
6D4B
6D5B
6D6B
6D7B
6D8B
6D9B
6DAB
6DBB
6DCB
6DDB
6DEB
6DFB
洌洜洬洼浌浜浬浼涌涜涬涼淌淜淬淼 6D0C
6D1C
6D2C
6D3C
6D4C
6D5C
6D6C
6D7C
6D8C
6D9C
6DAC
6DBC
6DCC
6DDC
6DEC
6DFC
洍洝洭洽浍浝浭浽涍涝涭涽淍淝淭淽 6D0D
6D1D
6D2D
6D3D
6D4D
6D5D
6D6D
6D7D
6D8D
6D9D
6DAD
6DBD
6DCD
6DDD
6DED
6DFD
洎洞洮派济浞浮浾涎涞涮涾淎淞淮淾 6D0E
F
6D20
洁洑洡洱流浑浡浱涁涑涡涱淁淑淡深
1
E
6D10
6D1E
6D2E
6D3E
6D4E
6D5E
6D6E
6D7E
6D8E
6D9E
6DAE
6DBE
6DCE
6DDE
6DEE
6DFE
洏洟洯洿浏浟浯浿涏涟涯涿淏淟淯淿 6D0F
6D1F
6D2F
6D3F
6D4F
6D5F
6D6F
6D7F
6D8F
6D9F
6DAF
6DBF
6DCF
6DDF
6DEF
6DFF
6E00
CJK Unified Ideographs
6EFF
6E0 6E1 6E2 6E3 6E4 6E5 6E6 6E7 6E8 6E9 6EA 6EB 6EC 6ED 6EE 6EF
渀渐渠渰湀湐湠湰満源溠溰滀滐滠滰
0
6E00
6E01
6E30
6E40
6E50
6E60
6E70
6E80
6E90
6EA0
6EB0
6EC0
6ED0
6EE0
6EF0
6E11
6E21
6E31
6E41
6E51
6E61
6E71
6E81
6E91
6EA1
6EB1
6EC1
6ED1
6EE1
6EF1
渂渒渢渲湂湒湢湲溂溒溢溲滂滒滢滲
2
6E02
6E12
6E22
6E32
6E42
6E52
6E62
6E72
6E82
6E92
6EA2
6EB2
6EC2
6ED2
6EE2
6EF2
渃渓渣渳湃湓湣湳溃溓溣溳滃滓滣滳
3
6E03
6E13
6E23
6E33
6E43
6E53
6E63
6E73
6E83
6E93
6EA3
6EB3
6EC3
6ED3
6EE3
6EF3
渄渔渤渴湄湔湤湴溄溔溤溴滄滔滤滴
4
6E04
6E14
6E24
6E34
6E44
6E54
6E64
6E74
6E84
6E94
6EA4
6EB4
6EC4
6ED4
6EE4
6EF4
清渕渥渵湅湕湥湵溅溕溥溵滅滕滥滵
5
6E05
6E15
6E25
6E35
6E45
6E55
6E65
6E75
6E85
6E95
6EA5
6EB5
6EC5
6ED5
6EE5
6EF5
渆渖渦渶湆湖湦湶溆準溦溶滆滖滦滶
6
6E06
6E16
6E26
6E36
6E46
6E56
6E66
6E76
6E86
6E96
6EA6
6EB6
6EC6
6ED6
6EE6
6EF6
渇渗渧渷湇湗湧湷溇溗溧溷滇滗滧滷
7
6E07
6E17
6E27
6E37
6E47
6E57
6E67
6E77
6E87
6E97
6EA7
6EB7
6EC7
6ED7
6EE7
6EF7
済渘渨游湈湘湨湸溈溘溨溸滈滘滨滸
8
6E08
6E18
6E28
6E38
6E48
6E58
6E68
6E78
6E88
6E98
6EA8
6EB8
6EC8
6ED8
6EE8
6EF8
渉渙温渹湉湙湩湹溉溙溩溹滉滙滩滹
9
6E09
6E19
6E29
6E39
6E49
6E59
6E69
6E79
6E89
6E99
6EA9
6EB9
6EC9
6ED9
6EE9
6EF9
渊渚渪渺湊湚湪湺溊溚溪溺滊滚滪滺
A
6E0A
6E1A
6E2A
6E3A
6E4A
6E5A
6E6A
6E7A
6E8A
6E9A
6EAA
6EBA
6ECA
6EDA
6EEA
6EFA
渋減渫渻湋湛湫湻溋溛溫溻滋滛滫滻
B
6E0B
C
D
6E1B
6E2B
6E3B
6E4B
6E5B
6E6B
6E7B
6E8B
6E9B
6EAB
6EBB
6ECB
6EDB
6EEB
6EFB
渌渜測渼湌湜湬湼溌溜溬溼滌滜滬滼 6E0C
6E1C
6E2C
6E3C
6E4C
6E5C
6E6C
6E7C
6E8C
6E9C
6EAC
6EBC
6ECC
6EDC
6EEC
6EFC
渍渝渭渽湍湝湭湽溍溝溭溽滍滝滭滽 6E0D
6E1D
6E2D
6E3D
6E4D
6E5D
6E6D
6E7D
6E8D
6E9D
6EAD
6EBD
6ECD
6EDD
6EED
6EFD
渎渞渮渾湎湞湮湾溎溞溮溾滎滞滮滾 6E0E
F
6E20
渁渑渡渱湁湑湡湱溁溑溡溱滁滑满滱
1
E
6E10
6E1E
6E2E
6E3E
6E4E
6E5E
6E6E
6E7E
6E8E
6E9E
6EAE
6EBE
6ECE
6EDE
6EEE
6EFE
渏渟港渿湏湟湯湿溏溟溯溿滏滟滯滿 6E0F
348
6E1F
6E2F
6E3F
6E4F
6E5F
6E6F
6E7F
6E8F
6E9F
6EAF
6EBF
6ECF
6EDF
6EEF
6EFF
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
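In these charts, a cell's code point is the three-digit column prefix (for example 6E0) followed by the single hexadecimal row digit (0 to F). The following is a minimal Python sketch of that arithmetic; the function names `cell_to_code_point` and `cell_to_character` are hypothetical and introduced only for illustration.

```python
def cell_to_code_point(column_prefix: str, row_digit: str) -> int:
    """Combine a chart column prefix (e.g. '6E0') with a row digit ('0'-'F')."""
    if len(row_digit) != 1:
        raise ValueError("row_digit must be a single hexadecimal digit")
    # int(..., 16) raises ValueError if either part is not valid hexadecimal.
    return int(column_prefix + row_digit, 16)


def cell_to_character(column_prefix: str, row_digit: str) -> str:
    """Return the character that occupies the given chart cell."""
    return chr(cell_to_code_point(column_prefix, row_digit))


if __name__ == "__main__":
    # Column 6E0, row 5 is the cell labelled 6E05 in the chart.
    code_point = cell_to_code_point("6E0", "5")
    print(f"U+{code_point:04X}", cell_to_character("6E0", "5"))
```

The same rule applies to every 256-character chart page in this section, since each page spans sixteen columns of sixteen rows.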
6F00  CJK Unified Ideographs  6FFF
6F0 6F1 6F2 6F3 6F4 6F5 6F6 6F7 6F8 6F9 6FA 6FB 6FC 6FD 6FE 6FF
0  漀漐漠漰潀潐潠潰澀澐澠澰激濐濠濰  6F00 6F10 6F20 6F30 6F40 6F50 6F60 6F70 6F80 6F90 6FA0 6FB0 6FC0 6FD0 6FE0 6FF0
1  漁漑漡漱潁潑潡潱澁澑澡澱濁濑濡濱  6F01 6F11 6F21 6F31 6F41 6F51 6F61 6F71 6F81 6F91 6FA1 6FB1 6FC1 6FD1 6FE1 6FF1
2  漂漒漢漲潂潒潢潲澂澒澢澲濂濒濢濲  6F02 6F12 6F22 6F32 6F42 6F52 6F62 6F72 6F82 6F92 6FA2 6FB2 6FC2 6FD2 6FE2 6FF2
3  漃漓漣漳潃潓潣潳澃澓澣澳濃濓濣濳  6F03 6F13 6F23 6F33 6F43 6F53 6F63 6F73 6F83 6F93 6FA3 6FB3 6FC3 6FD3 6FE3 6FF3
4  漄演漤漴潄潔潤潴澄澔澤澴濄濔濤濴  6F04 6F14 6F24 6F34 6F44 6F54 6F64 6F74 6F84 6F94 6FA4 6FB4 6FC4 6FD4 6FE4 6FF4
5  漅漕漥漵潅潕潥潵澅澕澥澵濅濕濥濵  6F05 6F15 6F25 6F35 6F45 6F55 6F65 6F75 6F85 6F95 6FA5 6FB5 6FC5 6FD5 6FE5 6FF5
6  漆漖漦漶潆潖潦潶澆澖澦澶濆濖濦濶  6F06 6F16 6F26 6F36 6F46 6F56 6F66 6F76 6F86 6F96 6FA6 6FB6 6FC6 6FD6 6FE6 6FF6
7  漇漗漧漷潇潗潧潷澇澗澧澷濇濗濧濷  6F07 6F17 6F27 6F37 6F47 6F57 6F67 6F77 6F87 6F97 6FA7 6FB7 6FC7 6FD7 6FE7 6FF7
8  漈漘漨漸潈潘潨潸澈澘澨澸濈濘濨濸  6F08 6F18 6F28 6F38 6F48 6F58 6F68 6F78 6F88 6F98 6FA8 6FB8 6FC8 6FD8 6FE8 6FF8
9  漉漙漩漹潉潙潩潹澉澙澩澹濉濙濩濹  6F09 6F19 6F29 6F39 6F49 6F59 6F69 6F79 6F89 6F99 6FA9 6FB9 6FC9 6FD9 6FE9 6FF9
A  漊漚漪漺潊潚潪潺澊澚澪澺濊濚濪濺  6F0A 6F1A 6F2A 6F3A 6F4A 6F5A 6F6A 6F7A 6F8A 6F9A 6FAA 6FBA 6FCA 6FDA 6FEA 6FFA
B  漋漛漫漻潋潛潫潻澋澛澫澻濋濛濫濻  6F0B 6F1B 6F2B 6F3B 6F4B 6F5B 6F6B 6F7B 6F8B 6F9B 6FAB 6FBB 6FCB 6FDB 6FEB 6FFB
C  漌漜漬漼潌潜潬潼澌澜澬澼濌濜濬濼  6F0C 6F1C 6F2C 6F3C 6F4C 6F5C 6F6C 6F7C 6F8C 6F9C 6FAC 6FBC 6FCC 6FDC 6FEC 6FFC
D  漍漝漭漽潍潝潭潽澍澝澭澽濍濝濭濽  6F0D 6F1D 6F2D 6F3D 6F4D 6F5D 6F6D 6F7D 6F8D 6F9D 6FAD 6FBD 6FCD 6FDD 6FED 6FFD
E  漎漞漮漾潎潞潮潾澎澞澮澾濎濞濮濾  6F0E 6F1E 6F2E 6F3E 6F4E 6F5E 6F6E 6F7E 6F8E 6F9E 6FAE 6FBE 6FCE 6FDE 6FEE 6FFE
F  漏漟漯漿潏潟潯潿澏澟澯澿濏濟濯濿  6F0F 6F1F 6F2F 6F3F 6F4F 6F5F 6F6F 6F7F 6F8F 6F9F 6FAF 6FBF 6FCF 6FDF 6FEF 6FFF
7000  CJK Unified Ideographs  70FF
700 701 702 703 704 705 706 707 708 709 70A 70B 70C 70D 70E 70F
0  瀀瀐瀠瀰灀灐灠灰炀炐炠炰烀烐烠烰  7000 7010 7020 7030 7040 7050 7060 7070 7080 7090 70A0 70B0 70C0 70D0 70E0 70F0
1  瀁瀑瀡瀱灁灑灡灱炁炑炡炱烁烑烡烱  7001 7011 7021 7031 7041 7051 7061 7071 7081 7091 70A1 70B1 70C1 70D1 70E1 70F1
2  瀂瀒瀢瀲灂灒灢灲炂炒炢炲烂烒烢烲  7002 7012 7022 7032 7042 7052 7062 7072 7082 7092 70A2 70B2 70C2 70D2 70E2 70F2
3  瀃瀓瀣瀳灃灓灣灳炃炓炣炳烃烓烣烳  7003 7013 7023 7033 7043 7053 7063 7073 7083 7093 70A3 70B3 70C3 70D3 70E3 70F3
4  瀄瀔瀤瀴灄灔灤灴炄炔炤炴烄烔烤烴  7004 7014 7024 7034 7044 7054 7064 7074 7084 7094 70A4 70B4 70C4 70D4 70E4 70F4
5  瀅瀕瀥瀵灅灕灥灵炅炕炥炵烅烕烥烵  7005 7015 7025 7035 7045 7055 7065 7075 7085 7095 70A5 70B5 70C5 70D5 70E5 70F5
6  瀆瀖瀦瀶灆灖灦灶炆炖炦炶烆烖烦烶  7006 7016 7026 7036 7046 7056 7066 7076 7086 7096 70A6 70B6 70C6 70D6 70E6 70F6
7  瀇瀗瀧瀷灇灗灧灷炇炗炧炷烇烗烧烷  7007 7017 7027 7037 7047 7057 7067 7077 7087 7097 70A7 70B7 70C7 70D7 70E7 70F7
8  瀈瀘瀨瀸灈灘灨灸炈炘炨炸烈烘烨烸  7008 7018 7028 7038 7048 7058 7068 7078 7088 7098 70A8 70B8 70C8 70D8 70E8 70F8
9  瀉瀙瀩瀹灉灙灩灹炉炙炩点烉烙烩烹  7009 7019 7029 7039 7049 7059 7069 7079 7089 7099 70A9 70B9 70C9 70D9 70E9 70F9
A  瀊瀚瀪瀺灊灚灪灺炊炚炪為烊烚烪烺  700A 701A 702A 703A 704A 705A 706A 707A 708A 709A 70AA 70BA 70CA 70DA 70EA 70FA
B  瀋瀛瀫瀻灋灛火灻炋炛炫炻烋烛烫烻  700B 701B 702B 703B 704B 705B 706B 707B 708B 709B 70AB 70BB 70CB 70DB 70EB 70FB
C  瀌瀜瀬瀼灌灜灬灼炌炜炬炼烌烜烬烼  700C 701C 702C 703C 704C 705C 706C 707C 708C 709C 70AC 70BC 70CC 70DC 70EC 70FC
D  瀍瀝瀭瀽灍灝灭災炍炝炭炽烍烝热烽  700D 701D 702D 703D 704D 705D 706D 707D 708D 709D 70AD 70BD 70CD 70DD 70ED 70FD
E  瀎瀞瀮瀾灎灞灮灾炎炞炮炾烎烞烮烾  700E 701E 702E 703E 704E 705E 706E 707E 708E 709E 70AE 70BE 70CE 70DE 70EE 70FE
F  瀏瀟瀯瀿灏灟灯灿炏炟炯炿烏烟烯烿  700F 701F 702F 703F 704F 705F 706F 707F 708F 709F 70AF 70BF 70CF 70DF 70EF 70FF
7100  CJK Unified Ideographs  71FF
710 711 712 713 714 715 716 717 718 719 71A 71B 71C 71D 71E 71F
0  焀焐焠焰煀煐煠煰熀熐熠熰燀燐燠燰  7100 7110 7120 7130 7140 7150 7160 7170 7180 7190 71A0 71B0 71C0 71D0 71E0 71F0
1  焁焑無焱煁煑煡煱熁熑熡熱燁燑燡燱  7101 7111 7121 7131 7141 7151 7161 7171 7181 7191 71A1 71B1 71C1 71D1 71E1 71F1
2  焂焒焢焲煂煒煢煲熂熒熢熲燂燒燢燲  7102 7112 7122 7132 7142 7152 7162 7172 7182 7192 71A2 71B2 71C2 71D2 71E2 71F2
3  焃焓焣焳煃煓煣煳熃熓熣熳燃燓燣燳  7103 7113 7123 7133 7143 7153 7163 7173 7183 7193 71A3 71B3 71C3 71D3 71E3 71F3
4  焄焔焤焴煄煔煤煴熄熔熤熴燄燔燤燴  7104 7114 7124 7134 7144 7154 7164 7174 7184 7194 71A4 71B4 71C4 71D4 71E4 71F4
5  焅焕焥焵煅煕煥煵熅熕熥熵燅燕燥燵  7105 7115 7125 7135 7145 7155 7165 7175 7185 7195 71A5 71B5 71C5 71D5 71E5 71F5
6  焆焖焦然煆煖煦煶熆熖熦熶燆燖燦燶  7106 7116 7126 7136 7146 7156 7166 7176 7186 7196 71A6 71B6 71C6 71D6 71E6 71F6
7  焇焗焧焷煇煗照煷熇熗熧熷燇燗燧燷  7107 7117 7127 7137 7147 7157 7167 7177 7187 7197 71A7 71B7 71C7 71D7 71E7 71F7
8  焈焘焨焸煈煘煨煸熈熘熨熸燈燘燨燸  7108 7118 7128 7138 7148 7158 7168 7178 7188 7198 71A8 71B8 71C8 71D8 71E8 71F8
9  焉焙焩焹煉煙煩煹熉熙熩熹燉燙燩燹  7109 7119 7129 7139 7149 7159 7169 7179 7189 7199 71A9 71B9 71C9 71D9 71E9 71F9
A  焊焚焪焺煊煚煪煺熊熚熪熺燊燚燪燺  710A 711A 712A 713A 714A 715A 716A 717A 718A 719A 71AA 71BA 71CA 71DA 71EA 71FA
B  焋焛焫焻煋煛煫煻熋熛熫熻燋燛燫燻  710B 711B 712B 713B 714B 715B 716B 717B 718B 719B 71AB 71BB 71CB 71DB 71EB 71FB
C  焌焜焬焼煌煜煬煼熌熜熬熼燌燜燬燼  710C 711C 712C 713C 714C 715C 716C 717C 718C 719C 71AC 71BC 71CC 71DC 71EC 71FC
D  焍焝焭焽煍煝煭煽熍熝熭熽燍燝燭燽  710D 711D 712D 713D 714D 715D 716D 717D 718D 719D 71AD 71BD 71CD 71DD 71ED 71FD
E  焎焞焮焾煎煞煮煾熎熞熮熾燎燞燮燾  710E 711E 712E 713E 714E 715E 716E 717E 718E 719E 71AE 71BE 71CE 71DE 71EE 71FE
F  焏焟焯焿煏煟煯煿熏熟熯熿燏營燯燿  710F 711F 712F 713F 714F 715F 716F 717F 718F 719F 71AF 71BF 71CF 71DF 71EF 71FF
7200  CJK Unified Ideographs  72FF
720 721 722 723 724 725 726 727 728 729 72A 72B 72C 72D 72E 72F
0  爀爐爠爰牀牐牠牰犀犐犠犰狀狐狠狰  7200 7210 7220 7230 7240 7250 7260 7270 7280 7290 72A0 72B0 72C0 72D0 72E0 72F0
1  爁爑爡爱牁牑牡牱犁犑犡犱狁狑狡狱  7201 7211 7221 7231 7241 7251 7261 7271 7281 7291 72A1 72B1 72C1 72D1 72E1 72F1
2  爂爒爢爲牂牒牢牲犂犒犢犲狂狒狢狲  7202 7212 7222 7232 7242 7252 7262 7272 7282 7292 72A2 72B2 72C2 72D2 72E2 72F2
3  爃爓爣爳牃牓牣牳犃犓犣犳狃狓狣狳  7203 7213 7223 7233 7243 7253 7263 7273 7283 7293 72A3 72B3 72C3 72D3 72E3 72F3
4  爄爔爤爴牄牔牤牴犄犔犤犴狄狔狤狴  7204 7214 7224 7234 7244 7254 7264 7274 7284 7294 72A4 72B4 72C4 72D4 72E4 72F4
5  爅爕爥爵牅牕牥牵犅犕犥犵狅狕狥狵  7205 7215 7225 7235 7245 7255 7265 7275 7285 7295 72A5 72B5 72C5 72D5 72E5 72F5
6  爆爖爦父牆牖牦牶犆犖犦状狆狖狦狶  7206 7216 7226 7236 7246 7256 7266 7276 7286 7296 72A6 72B6 72C6 72D6 72E6 72F6
7  爇爗爧爷片牗牧牷犇犗犧犷狇狗狧狷  7207 7217 7227 7237 7247 7257 7267 7277 7287 7297 72A7 72B7 72C7 72D7 72E7 72F7
8  爈爘爨爸版牘牨牸犈犘犨犸狈狘狨狸  7208 7218 7228 7238 7248 7258 7268 7278 7288 7298 72A8 72B8 72C8 72D8 72E8 72F8
9  爉爙爩爹牉牙物特犉犙犩犹狉狙狩狹  7209 7219 7229 7239 7249 7259 7269 7279 7289 7299 72A9 72B9 72C9 72D9 72E9 72F9
A  爊爚爪爺牊牚牪牺犊犚犪犺狊狚狪狺  720A 721A 722A 723A 724A 725A 726A 727A 728A 729A 72AA 72BA 72CA 72DA 72EA 72FA
B  爋爛爫爻牋牛牫牻犋犛犫犻狋狛狫狻  720B 721B 722B 723B 724B 725B 726B 727B 728B 729B 72AB 72BB 72CB 72DB 72EB 72FB
C  爌爜爬爼牌牜牬牼犌犜犬犼狌狜独狼  720C 721C 722C 723C 724C 725C 726C 727C 728C 729C 72AC 72BC 72CC 72DC 72EC 72FC
D  爍爝爭爽牍牝牭牽犍犝犭犽狍狝狭狽  720D 721D 722D 723D 724D 725D 726D 727D 728D 729D 72AD 72BD 72CD 72DD 72ED 72FD
E  爎爞爮爾牎牞牮牾犎犞犮犾狎狞狮狾  720E 721E 722E 723E 724E 725E 726E 727E 728E 729E 72AE 72BE 72CE 72DE 72EE 72FE
F  爏爟爯爿牏牟牯牿犏犟犯犿狏狟狯狿  720F 721F 722F 723F 724F 725F 726F 727F 728F 729F 72AF 72BF 72CF 72DF 72EF 72FF
7300  CJK Unified Ideographs  73FF
730 731 732 733 734 735 736 737 738 739 73A 73B 73C 73D 73E 73F
0  猀猐猠猰獀獐獠獰玀玐玠现珀珐珠珰  7300 7310 7320 7330 7340 7350 7360 7370 7380 7390 73A0 73B0 73C0 73D0 73E0 73F0
1  猁猑猡猱獁獑獡獱玁玑玡玱珁珑珡珱  7301 7311 7321 7331 7341 7351 7361 7371 7381 7391 73A1 73B1 73C1 73D1 73E1 73F1
2  猂猒猢猲獂獒獢獲玂玒玢玲珂珒珢珲  7302 7312 7322 7332 7342 7352 7362 7372 7382 7392 73A2 73B2 73C2 73D2 73E2 73F2
3  猃猓猣猳獃獓獣獳玃玓玣玳珃珓珣珳  7303 7313 7323 7333 7343 7353 7363 7373 7383 7393 73A3 73B3 73C3 73D3 73E3 73F3
4  猄猔猤猴獄獔獤獴玄玔玤玴珄珔珤珴  7304 7314 7324 7334 7344 7354 7364 7374 7384 7394 73A4 73B4 73C4 73D4 73E4 73F4
5  猅猕猥猵獅獕獥獵玅玕玥玵珅珕珥珵  7305 7315 7325 7335 7345 7355 7365 7375 7385 7395 73A5 73B5 73C5 73D5 73E5 73F5
6  猆猖猦猶獆獖獦獶玆玖玦玶珆珖珦珶  7306 7316 7326 7336 7346 7356 7366 7376 7386 7396 73A6 73B6 73C6 73D6 73E6 73F6
7  猇猗猧猷獇獗獧獷率玗玧玷珇珗珧珷  7307 7317 7327 7337 7347 7357 7367 7377 7387 7397 73A7 73B7 73C7 73D7 73E7 73F7
8  猈猘猨猸獈獘獨獸玈玘玨玸珈珘珨珸  7308 7318 7328 7338 7348 7358 7368 7378 7388 7398 73A8 73B8 73C8 73D8 73E8 73F8
9  猉猙猩猹獉獙獩獹玉玙玩玹珉珙珩珹  7309 7319 7329 7339 7349 7359 7369 7379 7389 7399 73A9 73B9 73C9 73D9 73E9 73F9
A  猊猚猪猺獊獚獪獺玊玚玪玺珊珚珪珺  730A 731A 732A 733A 734A 735A 736A 737A 738A 739A 73AA 73BA 73CA 73DA 73EA 73FA
B  猋猛猫猻獋獛獫獻王玛玫玻珋珛珫珻  730B 731B 732B 733B 734B 735B 736B 737B 738B 739B 73AB 73BB 73CB 73DB 73EB 73FB
C  猌猜猬猼獌獜獬獼玌玜玬玼珌珜珬珼  730C 731C 732C 733C 734C 735C 736C 737C 738C 739C 73AC 73BC 73CC 73DC 73EC 73FC
D  猍猝猭猽獍獝獭獽玍玝玭玽珍珝班珽  730D 731D 732D 733D 734D 735D 736D 737D 738D 739D 73AD 73BD 73CD 73DD 73ED 73FD
E  猎猞献猾獎獞獮獾玎玞玮玾珎珞珮現  730E 731E 732E 733E 734E 735E 736E 737E 738E 739E 73AE 73BE 73CE 73DE 73EE 73FE
F  猏猟猯猿獏獟獯獿玏玟环玿珏珟珯珿  730F 731F 732F 733F 734F 735F 736F 737F 738F 739F 73AF 73BF 73CF 73DF 73EF 73FF
7400  CJK Unified Ideographs  74FF
740 741 742 743 744 745 746 747 748 749 74A 74B 74C 74D 74E 74F
0  琀琐琠琰瑀瑐瑠瑰璀璐璠環瓀瓐瓠瓰  7400 7410 7420 7430 7440 7450 7460 7470 7480 7490 74A0 74B0 74C0 74D0 74E0 74F0
1  琁琑琡琱瑁瑑瑡瑱璁璑璡璱瓁瓑瓡瓱  7401 7411 7421 7431 7441 7451 7461 7471 7481 7491 74A1 74B1 74C1 74D1 74E1 74F1
2  琂琒琢琲瑂瑒瑢瑲璂璒璢璲瓂瓒瓢瓲  7402 7412 7422 7432 7442 7452 7462 7472 7482 7492 74A2 74B2 74C2 74D2 74E2 74F2
3  球琓琣琳瑃瑓瑣瑳璃璓璣璳瓃瓓瓣瓳  7403 7413 7423 7433 7443 7453 7463 7473 7483 7493 74A3 74B3 74C3 74D3 74E3 74F3
4  琄琔琤琴瑄瑔瑤瑴璄璔璤璴瓄瓔瓤瓴  7404 7414 7424 7434 7444 7454 7464 7474 7484 7494 74A4 74B4 74C4 74D4 74E4 74F4
5  琅琕琥琵瑅瑕瑥瑵璅璕璥璵瓅瓕瓥瓵  7405 7415 7425 7435 7445 7455 7465 7475 7485 7495 74A5 74B5 74C5 74D5 74E5 74F5
6  理琖琦琶瑆瑖瑦瑶璆璖璦璶瓆瓖瓦瓶  7406 7416 7426 7436 7446 7456 7466 7476 7486 7496 74A6 74B6 74C6 74D6 74E6 74F6
7  琇琗琧琷瑇瑗瑧瑷璇璗璧璷瓇瓗瓧瓷  7407 7417 7427 7437 7447 7457 7467 7477 7487 7497 74A7 74B7 74C7 74D7 74E7 74F7
8  琈琘琨琸瑈瑘瑨瑸璈璘璨璸瓈瓘瓨瓸  7408 7418 7428 7438 7448 7458 7468 7478 7488 7498 74A8 74B8 74C8 74D8 74E8 74F8
9  琉琙琩琹瑉瑙瑩瑹璉璙璩璹瓉瓙瓩瓹  7409 7419 7429 7439 7449 7459 7469 7479 7489 7499 74A9 74B9 74C9 74D9 74E9 74F9
A  琊琚琪琺瑊瑚瑪瑺璊璚璪璺瓊瓚瓪瓺  740A 741A 742A 743A 744A 745A 746A 747A 748A 749A 74AA 74BA 74CA 74DA 74EA 74FA
B  琋琛琫琻瑋瑛瑫瑻璋璛璫璻瓋瓛瓫瓻  740B 741B 742B 743B 744B 745B 746B 747B 748B 749B 74AB 74BB 74CB 74DB 74EB 74FB
C  琌琜琬琼瑌瑜瑬瑼璌璜璬璼瓌瓜瓬瓼  740C 741C 742C 743C 744C 745C 746C 747C 748C 749C 74AC 74BC 74CC 74DC 74EC 74FC
D  琍琝琭琽瑍瑝瑭瑽璍璝璭璽瓍瓝瓭瓽  740D 741D 742D 743D 744D 745D 746D 747D 748D 749D 74AD 74BD 74CD 74DD 74ED 74FD
E  琎琞琮琾瑎瑞瑮瑾璎璞璮璾瓎瓞瓮瓾  740E 741E 742E 743E 744E 745E 746E 747E 748E 749E 74AE 74BE 74CE 74DE 74EE 74FE
F  琏琟琯琿瑏瑟瑯瑿璏璟璯璿瓏瓟瓯瓿  740F 741F 742F 743F 744F 745F 746F 747F 748F 749F 74AF 74BF 74CF 74DF 74EF 74FF
7500  CJK Unified Ideographs  75FF
750 751 752 753 754 755 756 757 758 759 75A 75B 75C 75D 75E 75F
0  甀甐甠田畀畐畠異疀疐疠疰痀痐痠痰  7500 7510 7520 7530 7540 7550 7560 7570 7580 7590 75A0 75B0 75C0 75D0 75E0 75F0
1  甁甑甡由畁畑畡畱疁疑疡疱痁痑痡痱  7501 7511 7521 7531 7541 7551 7561 7571 7581 7591 75A1 75B1 75C1 75D1 75E1 75F1
2  甂甒產甲畂畒畢畲疂疒疢疲痂痒痢痲  7502 7512 7522 7532 7542 7552 7562 7572 7582 7592 75A2 75B2 75C2 75D2 75E2 75F2
3  甃甓産申畃畓畣畳疃疓疣疳痃痓痣痳  7503 7513 7523 7533 7543 7553 7563 7573 7583 7593 75A3 75B3 75C3 75D3 75E3 75F3
4  甄甔甤甴畄畔畤畴疄疔疤疴痄痔痤痴  7504 7514 7524 7534 7544 7554 7564 7574 7584 7594 75A4 75B4 75C4 75D4 75E4 75F4
5  甅甕甥电畅畕略畵疅疕疥疵病痕痥痵  7505 7515 7525 7535 7545 7555 7565 7575 7585 7595 75A5 75B5 75C5 75D5 75E5 75F5
6  甆甖甦甶畆畖畦當疆疖疦疶痆痖痦痶  7506 7516 7526 7536 7546 7556 7566 7576 7586 7596 75A6 75B6 75C6 75D6 75E6 75F6
7  甇甗甧男畇畗畧畷疇疗疧疷症痗痧痷  7507 7517 7527 7537 7547 7557 7567 7577 7587 7597 75A7 75B7 75C7 75D7 75E7 75F7
8  甈甘用甸畈畘畨畸疈疘疨疸痈痘痨痸  7508 7518 7528 7538 7548 7558 7568 7578 7588 7598 75A8 75B8 75C8 75D8 75E8 75F8
9  甉甙甩甹畉留畩畹疉疙疩疹痉痙痩痹  7509 7519 7529 7539 7549 7559 7569 7579 7589 7599 75A9 75B9 75C9 75D9 75E9 75F9
A  甊甚甪町畊畚番畺疊疚疪疺痊痚痪痺  750A 751A 752A 753A 754A 755A 756A 757A 758A 759A 75AA 75BA 75CA 75DA 75EA 75FA
B  甋甛甫画畋畛畫畻疋疛疫疻痋痛痫痻  750B 751B 752B 753B 754B 755B 756B 757B 758B 759B 75AB 75BB 75CB 75DB 75EB 75FB
C  甌甜甬甼界畜畬畼疌疜疬疼痌痜痬痼  750C 751C 752C 753C 754C 755C 756C 757C 758C 759C 75AC 75BC 75CC 75DC 75EC 75FC
D  甍甝甭甽畍畝畭畽疍疝疭疽痍痝痭痽  750D 751D 752D 753D 754D 755D 756D 757D 758D 759D 75AD 75BD 75CD 75DD 75ED 75FD
E  甎甞甮甾畎畞畮畾疎疞疮疾痎痞痮痾  750E 751E 752E 753E 754E 755E 756E 757E 758E 759E 75AE 75BE 75CE 75DE 75EE 75FE
F  甏生甯甿畏畟畯畿疏疟疯疿痏痟痯痿  750F 751F 752F 753F 754F 755F 756F 757F 758F 759F 75AF 75BF 75CF 75DF 75EF 75FF
Yi Syllables
Range: A000–A48F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer
These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.
See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/
A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts
The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.
See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use
You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site.
See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.
A000  Yi Syllables  A0EF
[Code chart: columns A00–A0E, rows 0–F, code points A000–A0EF]
A0F0  Yi Syllables  A1DF
[Code chart: columns A0F–A1D, rows 0–F, code points A0F0–A1DF]
A1E0  Yi Syllables  A2CF
[Code chart: columns A1E–A2C, rows 0–F, code points A1E0–A2CF]
A2D0  Yi Syllables  A3AF
[Code chart: columns A2D–A3A, rows 0–F, code points A2D0–A3AF]
A3B0  Yi Syllables  A48F
[Code chart: columns A3B–A48, rows 0–F, code points A3B0–A48C; cells A48D–A48F are empty]
A000  Yi Syllables  A07F

Syllables
A000 YI SYLLABLE IT
A001 YI SYLLABLE IX
A002 YI SYLLABLE I
A003 YI SYLLABLE IP
A004 YI SYLLABLE IET
A005 YI SYLLABLE IEX
A006 YI SYLLABLE IE
A007 YI SYLLABLE IEP
A008 YI SYLLABLE AT
A009 YI SYLLABLE AX
A00A YI SYLLABLE A
A00B YI SYLLABLE AP
A00C YI SYLLABLE UOX
A00D YI SYLLABLE UO
A00E YI SYLLABLE UOP
A00F YI SYLLABLE OT
A010 YI SYLLABLE OX
A011 YI SYLLABLE O
A012 YI SYLLABLE OP
A013 YI SYLLABLE EX
A014 YI SYLLABLE E

Syllable iteration mark
A015 YI SYLLABLE WU
     = YI SYLLABLE ITERATION MARK
     • name is a misnomer

Syllables
A016 YI SYLLABLE BIT
A017 YI SYLLABLE BIX
A018 YI SYLLABLE BI
A019 YI SYLLABLE BIP
A01A YI SYLLABLE BIET
A01B YI SYLLABLE BIEX
A01C YI SYLLABLE BIE
A01D YI SYLLABLE BIEP
A01E YI SYLLABLE BAT
A01F YI SYLLABLE BAX
A020 YI SYLLABLE BA
A021 YI SYLLABLE BAP
A022 YI SYLLABLE BUOX
A023 YI SYLLABLE BUO
A024 YI SYLLABLE BUOP
A025 YI SYLLABLE BOT
A026 YI SYLLABLE BOX
A027 YI SYLLABLE BO
A028 YI SYLLABLE BOP
A029 YI SYLLABLE BEX
A02A YI SYLLABLE BE
A02B YI SYLLABLE BEP
A02C YI SYLLABLE BUT
A02D YI SYLLABLE BUX
A02E YI SYLLABLE BU
A02F YI SYLLABLE BUP
A030 YI SYLLABLE BURX
A031 YI SYLLABLE BUR
A032 YI SYLLABLE BYT
A033 YI SYLLABLE BYX
A034 YI SYLLABLE BY
A035 YI SYLLABLE BYP
A036 YI SYLLABLE BYRX
A037 YI SYLLABLE BYR
A038 YI SYLLABLE PIT
A039 YI SYLLABLE PIX
A03A YI SYLLABLE PI
A03B YI SYLLABLE PIP
A03C YI SYLLABLE PIEX
A03D YI SYLLABLE PIE
A03E YI SYLLABLE PIEP
A03F YI SYLLABLE PAT
A040 YI SYLLABLE PAX
A041 YI SYLLABLE PA
A042 YI SYLLABLE PAP
A043 YI SYLLABLE PUOX
A044 YI SYLLABLE PUO
A045 YI SYLLABLE PUOP
A046 YI SYLLABLE POT
A047 YI SYLLABLE POX
A048 YI SYLLABLE PO
A049 YI SYLLABLE POP
A04A YI SYLLABLE PUT
A04B YI SYLLABLE PUX
A04C YI SYLLABLE PU
A04D YI SYLLABLE PUP
A04E YI SYLLABLE PURX
A04F YI SYLLABLE PUR
A050 YI SYLLABLE PYT
A051 YI SYLLABLE PYX
A052 YI SYLLABLE PY
A053 YI SYLLABLE PYP
A054 YI SYLLABLE PYRX
A055 YI SYLLABLE PYR
A056 YI SYLLABLE BBIT
A057 YI SYLLABLE BBIX
A058 YI SYLLABLE BBI
A059 YI SYLLABLE BBIP
A05A YI SYLLABLE BBIET
A05B YI SYLLABLE BBIEX
A05C YI SYLLABLE BBIE
A05D YI SYLLABLE BBIEP
A05E YI SYLLABLE BBAT
A05F YI SYLLABLE BBAX
A060 YI SYLLABLE BBA
A061 YI SYLLABLE BBAP
A062 YI SYLLABLE BBUOX
A063 YI SYLLABLE BBUO
A064 YI SYLLABLE BBUOP
A065 YI SYLLABLE BBOT
A066 YI SYLLABLE BBOX
A067 YI SYLLABLE BBO
A068 YI SYLLABLE BBOP
A069 YI SYLLABLE BBEX
A06A YI SYLLABLE BBE
A06B YI SYLLABLE BBEP
A06C YI SYLLABLE BBUT
A06D YI SYLLABLE BBUX
A06E YI SYLLABLE BBU
A06F YI SYLLABLE BBUP
A070 YI SYLLABLE BBURX
A071 YI SYLLABLE BBUR
A072 YI SYLLABLE BBYT
A073 YI SYLLABLE BBYX
A074 YI SYLLABLE BBY
A075 YI SYLLABLE BBYP
A076 YI SYLLABLE NBIT
A077 YI SYLLABLE NBIX
A078 YI SYLLABLE NBI
A079 YI SYLLABLE NBIP
A07A YI SYLLABLE NBIEX
A07B YI SYLLABLE NBIE
A07C YI SYLLABLE NBIEP
A07D YI SYLLABLE NBAT
A07E YI SYLLABLE NBAX
A07F YI SYLLABLE NBA
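The character names listed above can be checked against the Unicode Character Database copy that ships with Python. The following is a minimal sketch using the standard `unicodedata` module; the helper name `describe` is hypothetical, and the names returned depend on the Unicode version bundled with the interpreter, which is assumed here to agree with Version 5.0 for these code points.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Format a code point as 'XXXX NAME (General_Category)'."""
    character = chr(code_point)
    name = unicodedata.name(character)
    category = unicodedata.category(character)
    return f"{code_point:04X} {name} ({category})"


# First and last syllable of the opening run, then the iteration mark.
for code_point in (0xA000, 0xA014, 0xA015):
    print(describe(code_point))
```

Printing the general category alongside the name makes it easy to see how the database classifies A015, whose formal name the list flags as a misnomer.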
Yi Radicals
Range: A490–A4CF
Modifier Tone Letters Range: A700–A71F This file contains an excerpt from the character code tables and list of character names for
The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.
See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.
Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.
See http://www.unicode.org/charts/fonts.html for a list.
Latin Extended-D Range: A720–A7FF
Syloti Nagri Range: A800–A82F
Phags-pa Range: A840–A87F
Phags-pa A840–A87F
[Code chart: columns A84 A85 A86 A87, rows 0–F; code points A840–A877 are listed. The chart glyphs did not survive text extraction.]
The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
Hangul Syllables Range: AC00–D7AF
Hangul Syllables AC00–ACFF
  AC0 AC1 AC2 AC3 AC4 AC5 AC6 AC7 AC8 AC9 ACA ACB ACC ACD ACE ACF
0 가감갠갰걀걐걠거검겐겠결곀곐고곰
1 각갑갡갱걁걑걡걱겁겑겡겱곁곑곡곱
2 갂값갢갲걂걒걢걲겂겒겢겲곂곒곢곲
3 갃갓갣갳걃걓걣걳것겓겣겳곃곓곣곳
4 간갔갤갴걄걔걤건겄겔겤겴계곔곤곴
5 갅강갥갵걅걕걥걵겅겕겥겵곅곕곥공
6 갆갖갦갶걆걖걦걶겆겖겦겶곆곖곦곶
7 갇갗갧갷걇걗걧걷겇겗겧겷곇곗곧곷
8 갈갘갨갸걈걘걨걸겈겘겨겸곈곘골곸
9 갉같갩갹걉걙걩걹겉겙격겹곉곙곩곹
A 갊갚갪갺걊걚걪걺겊겚겪겺곊곚곪곺
B 갋갛갫갻걋걛걫걻겋겛겫겻곋곛곫곻
C 갌개갬갼걌걜걬걼게겜견겼곌곜곬과
D 갍객갭갽걍걝걭걽겍겝겭경곍곝곭곽
E 갎갞갮갾걎걞걮걾겎겞겮겾곎곞곮곾
F 갏갟갯갿걏걟걯걿겏겟겯겿곏곟곯곿
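In these charts each column label is the code point without its last hexadecimal digit, and each row label is that last digit. The following is a minimal Python sketch of that lookup, not part of the standard; the helper names `chart_cell` and `chart_row` are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def chart_cell(code_point: int) -> Tuple[str, str]:
    """Return the (column, row) labels under which a code point is charted."""
    column = format(code_point >> 4, "X")  # drop the last hex digit, e.g. 0xAC53 -> "AC5"
    row = format(code_point & 0xF, "X")    # keep only the last hex digit, e.g. 0xAC53 -> "3"
    return column, row


def chart_row(page_start: int, row: int) -> str:
    """Rebuild one row of a 16-column chart page that starts at page_start."""
    # Moving one column to the right adds 0x10 to the code point.
    return "".join(chr(page_start + 0x10 * col + row) for col in range(16))


if __name__ == "__main__":
    print(chart_cell(0xAC53))    # column and row labels for U+AC53
    print(chart_row(0xAC00, 0))  # the 16 syllables in row 0 of the AC00 page
```

For example, `chart_cell(0xAC53)` returns `("AC5", "3")`, so U+AC53 is found in column AC5, row 3.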
Hangul Syllables AD00–ADFF
  AD0 AD1 AD2 AD3 AD4 AD5 AD6 AD7 AD8 AD9 ADA ADB ADC ADD ADE ADF
0 관괐괠괰굀교굠군궀궐궠궰귀귐균귰
1 괁광괡괱굁굑굡굱궁궑궡궱귁귑귡귱
2 괂괒괢괲굂굒굢굲궂궒궢궲귂귒귢귲
3 괃괓괣괳굃굓굣굳궃궓궣궳귃귓귣귳
4 괄괔괤괴굄굔굤굴궄궔궤궴귄귔귤귴
5 괅괕괥괵굅굕굥굵궅궕궥궵귅귕귥귵
6 괆괖괦괶굆굖굦굶궆궖궦궶귆귖귦귶
7 괇괗괧괷굇굗굧굷궇궗궧궷귇귗귧귷
8 괈괘괨괸굈굘굨굸궈궘궨궸귈귘귨그
9 괉괙괩괹굉굙굩굹궉궙궩궹귉귙귩극
A 괊괚괪괺굊굚굪굺궊궚궪궺귊귚귪귺
B 괋괛괫괻굋굛굫굻궋궛궫궻귋귛귫귻
C 괌괜괬괼굌굜구굼권궜궬궼귌규귬근
D 괍괝괭괽굍굝국굽궍궝궭궽귍귝귭귽
E 괎괞괮괾굎굞굮굾궎궞궮궾귎귞귮귾
F 괏괟괯괿굏굟굯굿궏궟궯궿귏귟귯귿
Hangul Syllables AE00–AEFF
  AE0 AE1 AE2 AE3 AE4 AE5 AE6 AE7 AE8 AE9 AEA AEB AEC AED AEE AEF
0 글긐긠기김깐깠깰꺀꺐꺠꺰껀껐껠껰
1 긁긑긡긱깁깑깡깱꺁꺑꺡꺱껁껑껡껱
2 긂긒긢긲깂깒깢깲꺂꺒꺢꺲껂껒껢껲
3 긃긓긣긳깃깓깣깳꺃꺓꺣꺳껃껓껣껳
4 긄긔긤긴깄깔깤깴꺄꺔꺤꺴껄껔껤껴
5 긅긕긥긵깅깕깥깵꺅꺕꺥꺵껅껕껥껵
6 긆긖긦긶깆깖깦깶꺆꺖꺦꺶껆껖껦껶
7 긇긗긧긷깇깗깧깷꺇꺗꺧꺷껇껗껧껷
8 금긘긨길깈깘깨깸꺈꺘꺨꺸껈께껨껸
9 급긙긩긹깉깙깩깹꺉꺙꺩꺹껉껙껩껹
A 긊긚긪긺깊깚깪깺꺊꺚꺪꺺껊껚껪껺
B 긋긛긫긻깋깛깫깻꺋꺛꺫꺻껋껛껫껻
C 긌긜긬긼까깜깬깼꺌꺜꺬꺼껌껜껬껼
D 긍긝긭긽깍깝깭깽꺍꺝꺭꺽껍껝껭껽
E 긎긞긮긾깎깞깮깾꺎꺞꺮꺾껎껞껮껾
F 긏긟긯긿깏깟깯깿꺏꺟꺯꺿껏껟껯껿
Hangul Syllables AF00–AFFF
  AF0 AF1 AF2 AF3 AF4 AF5 AF6 AF7 AF8 AF9 AFA AFB AFC AFD AFE AFF
0 꼀꼐꼠꼰꽀꽐꽠꽰꾀꾐꾠꾰꿀꿐꿠꿰
1 꼁꼑꼡꼱꽁꽑꽡꽱꾁꾑꾡꾱꿁꿑꿡꿱
2 꼂꼒꼢꼲꽂꽒꽢꽲꾂꾒꾢꾲꿂꿒꿢꿲
3 꼃꼓꼣꼳꽃꽓꽣꽳꾃꾓꾣꾳꿃꿓꿣꿳
4 꼄꼔꼤꼴꽄꽔꽤꽴꾄꾔꾤꾴꿄꿔꿤꿴
5 꼅꼕꼥꼵꽅꽕꽥꽵꾅꾕꾥꾵꿅꿕꿥꿵
6 꼆꼖꼦꼶꽆꽖꽦꽶꾆꾖꾦꾶꿆꿖꿦꿶
7 꼇꼗꼧꼷꽇꽗꽧꽷꾇꾗꾧꾷꿇꿗꿧꿷
8 꼈꼘꼨꼸꽈꽘꽨꽸꾈꾘꾨꾸꿈꿘꿨꿸
9 꼉꼙꼩꼹꽉꽙꽩꽹꾉꾙꾩꾹꿉꿙꿩꿹
A 꼊꼚꼪꼺꽊꽚꽪꽺꾊꾚꾪꾺꿊꿚꿪꿺
B 꼋꼛꼫꼻꽋꽛꽫꽻꾋꾛꾫꾻꿋꿛꿫꿻
C 꼌꼜꼬꼼꽌꽜꽬꽼꾌꾜꾬꾼꿌꿜꿬꿼
D 꼍꼝꼭꼽꽍꽝꽭꽽꾍꾝꾭꾽꿍꿝꿭꿽
E 꼎꼞꼮꼾꽎꽞꽮꽾꾎꾞꾮꾾꿎꿞꿮꿾
F 꼏꼟꼯꼿꽏꽟꽯꽿꾏꾟꾯꾿꿏꿟꿯꿿
Hangul Syllables B000–B0FF
  B00 B01 B02 B03 B04 B05 B06 B07 B08 B09 B0A B0B B0C B0D B0E B0F
0 뀀뀐뀠뀰끀끐끠끰낀낐날낰냀냐냠냰
1 뀁뀑뀡뀱끁끑끡끱낁낑낡낱냁냑냡냱
2 뀂뀒뀢뀲끂끒끢끲낂낒낢낲냂냒냢냲
3 뀃뀓뀣뀳끃끓끣끳낃낓낣낳냃냓냣냳
4 뀄뀔뀤뀴끄끔끤끴낄낔낤내냄냔냤냴
5 뀅뀕뀥뀵끅끕끥끵낅낕낥낵냅냕냥냵
6 뀆뀖뀦뀶끆끖끦끶낆낖낦낶냆냖냦냶
7 뀇뀗뀧뀷끇끗끧끷낇낗낧낷냇냗냧냷
8 뀈뀘뀨뀸끈끘끨끸낈나남낸냈냘냨냸
9 뀉뀙뀩뀹끉끙끩끹낉낙납낹냉냙냩냹
A 뀊뀚뀪뀺끊끚끪끺낊낚낪낺냊냚냪냺
B 뀋뀛뀫뀻끋끛끫끻낋낛낫낻냋냛냫냻
C 뀌뀜뀬뀼끌끜끬끼낌난났낼냌냜냬냼
D 뀍뀝뀭뀽끍끝끭끽낍낝낭낽냍냝냭냽
E 뀎뀞뀮뀾끎끞끮끾낎낞낮낾냎냞냮냾
F 뀏뀟뀯뀿끏끟끯끿낏낟낯낿냏냟냯냿
Hangul Syllables B100–B1FF
  B10 B11 B12 B13 B14 B15 B16 B17 B18 B19 B1A B1B B1C B1D B1E B1F
0 넀널넠넰녀념녠녰놀놐놠놰뇀뇐뇠뇰
1 넁넑넡넱녁녑녡녱놁놑놡놱뇁뇑뇡뇱
2 넂넒넢넲녂녒녢녲놂높놢놲뇂뇒뇢뇲
3 넃넓넣넳녃녓녣녳놃놓놣놳뇃뇓뇣뇳
4 넄넔네넴년녔녤녴놄놔놤놴뇄뇔뇤뇴
5 넅넕넥넵녅녕녥녵놅놕놥놵뇅뇕뇥뇵
6 넆넖넦넶녆녖녦녶놆놖놦놶뇆뇖뇦뇶
7 넇넗넧넷녇녗녧녷놇놗놧놷뇇뇗뇧뇷
8 너넘넨넸녈녘녨노놈놘놨놸뇈뇘뇨뇸
9 넉넙넩넹녉녙녩녹놉놙놩놹뇉뇙뇩뇹
A 넊넚넪넺녊녚녪녺놊놚놪놺뇊뇚뇪뇺
B 넋넛넫넻녋녛녫녻놋놛놫놻뇋뇛뇫뇻
C 넌넜넬넼녌녜녬논놌놜놬놼뇌뇜뇬뇼
D 넍넝넭넽녍녝녭녽농놝놭놽뇍뇝뇭뇽
E 넎넞넮넾녎녞녮녾놎놞놮놾뇎뇞뇮뇾
F 넏넟넯넿녏녟녯녿놏놟놯놿뇏뇟뇯뇿
Hangul Syllables B200–B2FF
  B20 B21 B22 B23 B24 B25 B26 B27 B28 B29 B2A B2B B2C B2D B2E B2F
0 눀눐눠눰뉀뉐뉠뉰늀느늠늰닀닐닠닰
1 눁눑눡눱뉁뉑뉡뉱늁늑늡늱닁닑닡닱
2 눂눒눢눲뉂뉒뉢뉲늂늒늢늲닂닒닢닲
3 눃눓눣눳뉃뉓뉣뉳늃늓늣늳닃닓닣닳
4 누눔눤눴뉄뉔뉤뉴늄는늤늴닄닔다담
5 눅눕눥눵뉅뉕뉥뉵늅늕능늵닅닕닥답
6 눆눖눦눶뉆뉖뉦뉶늆늖늦늶닆닖닦닶
7 눇눗눧눷뉇뉗뉧뉷늇늗늧늷닇닗닧닷
8 눈눘눨눸뉈뉘뉨뉸늈늘늨늸니님단닸
9 눉눙눩눹뉉뉙뉩뉹늉늙늩늹닉닙닩당
A 눊눚눪눺뉊뉚뉪뉺늊늚늪늺닊닚닪닺
B 눋눛눫눻뉋뉛뉫뉻늋늛늫늻닋닛닫닻
C 눌눜눬눼뉌뉜뉬뉼늌늜늬늼닌닜달닼
D 눍눝눭눽뉍뉝뉭뉽늍늝늭늽닍닝닭닽
E 눎눞눮눾뉎뉞뉮뉾늎늞늮늾닎닞닮닾
F 눏눟눯눿뉏뉟뉯뉿늏늟늯늿닏닟닯닿
Hangul Syllables B300–B3FF
  B30 B31 B32 B33 B34 B35 B36 B37 B38 B39 B3A B3B B3C B3D B3E B3F
0 대댐댠댰덀덐덠데뎀뎐뎠뎰돀돐돠돰
1 댁댑댡댱덁덑덡덱뎁뎑뎡뎱돁돑돡돱
2 댂댒댢댲덂덒덢덲뎂뎒뎢뎲돂돒돢돲
3 댃댓댣댳덃덓덣덳뎃뎓뎣뎳돃돓돣돳
4 댄댔댤댴덄더덤덴뎄뎔뎤뎴도돔돤돴
5 댅댕댥댵덅덕덥덵뎅뎕뎥뎵독돕돥돵
6 댆댖댦댶덆덖덦덶뎆뎖뎦뎶돆돖돦돶
7 댇댗댧댷덇덗덧덷뎇뎗뎧뎷돇돗돧돷
8 댈댘댨댸덈던덨델뎈뎘뎨뎸돈돘돨돸
9 댉댙댩댹덉덙덩덹뎉뎙뎩뎹돉동돩돹
A 댊댚댪댺덊덚덪덺뎊뎚뎪뎺돊돚돪돺
B 댋댛댫댻덋덛덫덻뎋뎛뎫뎻돋돛돫돻
C 댌댜댬댼덌덜덬덼뎌뎜뎬뎼돌돜돬돼
D 댍댝댭댽덍덝덭덽뎍뎝뎭뎽돍돝돭돽
E 댎댞댮댾덎덞덮덾뎎뎞뎮뎾돎돞돮돾
F 댏댟댯댿덏덟덯덿뎏뎟뎯뎿돏돟돯돿
Hangul Syllables B400–B4FF
  B40 B41 B42 B43 B44 B45 B46 B47 B48 B49 B4A B4B B4C B4D B4E B4F
0 됀됐될됰둀두둠둰뒀뒐뒠뒰듀듐든듰
1 됁됑됡됱둁둑둡둱뒁뒑뒡뒱듁듑듡등
2 됂됒됢됲둂둒둢둲뒂뒒뒢뒲듂듒듢듲
3 됃됓됣됳둃둓둣둳뒃뒓뒣뒳듃듓듣듳
4 됄됔됤됴둄둔둤둴뒄뒔뒤뒴듄듔들듴
5 됅됕됥됵둅둕둥둵뒅뒕뒥뒵듅듕듥듵
6 됆됖됦됶둆둖둦둶뒆뒖뒦뒶듆듖듦듶
7 됇됗됧됷둇둗둧둷뒇뒗뒧뒷듇듗듧듷
8 됈되됨됸둈둘둨둸뒈뒘뒨뒸듈듘듨듸
9 됉됙됩됹둉둙둩둹뒉뒙뒩뒹듉듙듩듹
A 됊됚됪됺둊둚둪둺뒊뒚뒪뒺듊듚듪듺
B 됋됛됫됻둋둛둫둻뒋뒛뒫뒻듋듛듫듻
C 됌된됬됼둌둜둬둼뒌뒜뒬뒼듌드듬듼
D 됍됝됭됽둍둝둭둽뒍뒝뒭뒽듍득듭듽
E 됎됞됮됾둎둞둮둾뒎뒞뒮뒾듎듞듮듾
F 됏됟됯됿둏둟둯둿뒏뒟뒯뒿듏듟듯듿
Hangul Syllables B500–B5FF
  B50 B51 B52 B53 B54 B55 B56 B57 B58 B59 B5A B5B B5C B5D B5E B5F
0 딀딐딠따땀땐땠땰떀떐떠떰뗀뗐뗠뗰
1 딁딑딡딱땁땑땡땱떁떑떡떱뗁뗑뗡뗱
2 딂딒딢딲땂땒땢땲떂떒떢떲뗂뗒뗢뗲
3 딃딓딣딳땃땓땣땳떃떓떣떳뗃뗓뗣뗳
4 딄디딤딴땄땔땤땴떄떔떤떴뗄뗔뗤뗴
5 딅딕딥딵땅땕땥땵떅떕떥떵뗅뗕뗥뗵
6 딆딖딦딶땆땖땦땶떆떖떦떶뗆뗖뗦뗶
7 딇딗딧딷땇땗땧땷떇떗떧떷뗇뗗뗧뗷
8 딈딘딨딸땈땘땨땸떈떘떨떸뗈뗘뗨뗸
9 딉딙딩딹땉땙땩땹떉떙떩떹뗉뗙뗩뗹
A 딊딚딪딺땊땚땪땺떊떚떪떺뗊뗚뗪뗺
B 딋딛딫딻땋땛땫땻떋떛떫떻뗋뗛뗫뗻
C 딌딜딬딼때땜땬땼떌떜떬떼뗌뗜뗬뗼
D 딍딝딭딽땍땝땭땽떍떝떭떽뗍뗝뗭뗽
E 딎딞딮딾땎땞땮땾떎떞떮떾뗎뗞뗮뗾
F 딏딟딯딿땏땟땯땿떏떟떯떿뗏뗟뗯뗿
B600  Hangul Syllables  B6FF

    B60 B61 B62 B63 B64 B65 B66 B67 B68 B69 B6A B6B B6C B6D B6E B6F
0   똀  또  똠  똰  뙀  뙐  뙠  뙰  뚀  뚐  뚠  뚰  뛀  뛐  뛠  뛰
1   똁  똑  똡  똱  뙁  뙑  뙡  뙱  뚁  뚑  뚡  뚱  뛁  뛑  뛡  뛱
2   똂  똒  똢  똲  뙂  뙒  뙢  뙲  뚂  뚒  뚢  뚲  뛂  뛒  뛢  뛲
3   똃  똓  똣  똳  뙃  뙓  뙣  뙳  뚃  뚓  뚣  뚳  뛃  뛓  뛣  뛳
4   똄  똔  똤  똴  뙄  뙔  뙤  뙴  뚄  뚔  뚤  뚴  뛄  뛔  뛤  뛴
5   똅  똕  똥  똵  뙅  뙕  뙥  뙵  뚅  뚕  뚥  뚵  뛅  뛕  뛥  뛵
6   똆  똖  똦  똶  뙆  뙖  뙦  뙶  뚆  뚖  뚦  뚶  뛆  뛖  뛦  뛶
7   똇  똗  똧  똷  뙇  뙗  뙧  뙷  뚇  뚗  뚧  뚷  뛇  뛗  뛧  뛷
8   똈  똘  똨  똸  뙈  뙘  뙨  뙸  뚈  뚘  뚨  뚸  뛈  뛘  뛨  뛸
9   똉  똙  똩  똹  뙉  뙙  뙩  뙹  뚉  뚙  뚩  뚹  뛉  뛙  뛩  뛹
A   똊  똚  똪  똺  뙊  뙚  뙪  뙺  뚊  뚚  뚪  뚺  뛊  뛚  뛪  뛺
B   똋  똛  똫  똻  뙋  뙛  뙫  뙻  뚋  뚛  뚫  뚻  뛋  뛛  뛫  뛻
C   똌  똜  똬  똼  뙌  뙜  뙬  뙼  뚌  뚜  뚬  뚼  뛌  뛜  뛬  뛼
D   똍  똝  똭  똽  뙍  뙝  뙭  뙽  뚍  뚝  뚭  뚽  뛍  뛝  뛭  뛽
E   똎  똞  똮  똾  뙎  뙞  뙮  뙾  뚎  뚞  뚮  뚾  뛎  뛞  뛮  뛾
F   똏  똟  똯  똿  뙏  뙟  뙯  뙿  뚏  뚟  뚯  뚿  뛏  뛟  뛯  뛿
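
Each chart page covers 256 consecutive code points. The column label supplies the leading hexadecimal digits and the row label the final digit, so the cell in column B6A, row 3 is U+B6A3. The following is a minimal Python sketch of that layout; the helper names `chart_row` and `chart_cell` are illustrative assumptions and are not defined by the standard.

```python
def chart_cell(block_start: int, col: int, row: int) -> str:
    """Return the four-digit hex label of the cell at (col, row).

    block_start is the first code point of the chart page (e.g. 0xB600);
    col and row both run from 0 to 15.
    """
    return f"{block_start + 16 * col + row:04X}"


def chart_row(block_start: int, row: int) -> str:
    """Return the 16 characters of one chart row, left to right."""
    # Moving one column to the right adds 16 to the code point;
    # moving one row down adds 1.
    return "".join(chr(block_start + 16 * col + row) for col in range(16))
```

Under these assumptions, `chart_row(0xB600, 0)` yields the sixteen syllables at B600, B610, and so on through B6F0, which is the order in which the glyphs appear in row 0 of the page above.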
B700  Hangul Syllables  B7FF

    B70 B71 B72 B73 B74 B75 B76 B77 B78 B79 B7A B7B B7C B7D B7E B7F
0   뜀  뜐  뜠  뜰  띀  띐  띠  띰  란  랐  랠  랰  럀  럐  럠  런
1   뜁  뜑  뜡  뜱  띁  띑  띡  띱  랁  랑  랡  랱  럁  럑  럡  럱
2   뜂  뜒  뜢  뜲  띂  띒  띢  띲  랂  랒  랢  랲  럂  럒  럢  럲
3   뜃  뜓  뜣  뜳  띃  띓  띣  띳  랃  랓  랣  랳  럃  럓  럣  럳
4   뜄  뜔  뜤  뜴  띄  띔  띤  띴  랄  랔  랤  랴  럄  럔  럤  럴
5   뜅  뜕  뜥  뜵  띅  띕  띥  띵  랅  랕  랥  략  럅  럕  럥  럵
6   뜆  뜖  뜦  뜶  띆  띖  띦  띶  랆  랖  랦  랶  럆  럖  럦  럶
7   뜇  뜗  뜧  뜷  띇  띗  띧  띷  랇  랗  랧  랷  럇  럗  럧  럷
8   뜈  뜘  뜨  뜸  띈  띘  띨  띸  랈  래  램  랸  럈  럘  럨  럸
9   뜉  뜙  뜩  뜹  띉  띙  띩  띹  랉  랙  랩  랹  량  럙  럩  럹
A   뜊  뜚  뜪  뜺  띊  띚  띪  띺  랊  랚  랪  랺  럊  럚  럪  럺
B   뜋  뜛  뜫  뜻  띋  띛  띫  띻  랋  랛  랫  랻  럋  럛  럫  럻
C   뜌  뜜  뜬  뜼  띌  띜  띬  라  람  랜  랬  랼  럌  럜  러  럼
D   뜍  뜝  뜭  뜽  띍  띝  띭  락  랍  랝  랭  랽  럍  럝  럭  럽
E   뜎  뜞  뜮  뜾  띎  띞  띮  띾  랎  랞  랮  랾  럎  럞  럮  럾
F   뜏  뜟  뜯  뜿  띏  띟  띯  띿  랏  랟  랯  랿  럏  럟  럯  럿
B800  Hangul Syllables  B8FF

    B80 B81 B82 B83 B84 B85 B86 B87 B88 B89 B8A B8B B8C B8D B8E B8F
0   렀  렐  렠  렰  례  롐  론  롰  뢀  뢐  뢠  뢰  룀  룐  룠  룰
1   렁  렑  렡  렱  롁  롑  롡  롱  뢁  뢑  뢡  뢱  룁  룑  룡  룱
2   렂  렒  렢  렲  롂  롒  롢  롲  뢂  뢒  뢢  뢲  룂  룒  룢  룲
3   렃  렓  렣  렳  롃  롓  롣  롳  뢃  뢓  뢣  뢳  룃  룓  룣  룳
4   렄  렔  려  렴  롄  롔  롤  롴  뢄  뢔  뢤  뢴  룄  룔  룤  룴
5   렅  렕  력  렵  롅  롕  롥  롵  뢅  뢕  뢥  뢵  룅  룕  룥  룵
6   렆  렖  렦  렶  롆  롖  롦  롶  뢆  뢖  뢦  뢶  룆  룖  룦  룶
7   렇  렗  렧  렷  롇  롗  롧  롷  뢇  뢗  뢧  뢷  룇  룗  룧  룷
8   레  렘  련  렸  롈  롘  롨  롸  뢈  뢘  뢨  뢸  룈  룘  루  룸
9   렉  렙  렩  령  롉  롙  롩  롹  뢉  뢙  뢩  뢹  룉  룙  룩  룹
A   렊  렚  렪  렺  롊  롚  롪  롺  뢊  뢚  뢪  뢺  룊  룚  룪  룺
B   렋  렛  렫  렻  롋  롛  롫  롻  뢋  뢛  뢫  뢻  룋  룛  룫  룻
C   렌  렜  렬  렼  롌  로  롬  롼  뢌  뢜  뢬  뢼  료  룜  룬  룼
D   렍  렝  렭  렽  롍  록  롭  롽  뢍  뢝  뢭  뢽  룍  룝  룭  룽
E   렎  렞  렮  렾  롎  롞  롮  롾  뢎  뢞  뢮  뢾  룎  룞  룮  룾
F   렏  렟  렯  렿  롏  롟  롯  롿  뢏  뢟  뢯  뢿  룏  룟  룯  룿
B900  Hangul Syllables  B9FF

    B90 B91 B92 B93 B94 B95 B96 B97 B98 B99 B9A B9B B9C B9D B9E B9F
0   뤀  뤐  뤠  뤰  륀  륐  률  륰  릀  릐  릠  린  맀  말  맠  맰
1   뤁  뤑  뤡  뤱  륁  륑  륡  륱  릁  릑  릡  릱  링  맑  맡  맱
2   뤂  뤒  뤢  뤲  륂  륒  륢  륲  릂  릒  릢  릲  맂  맒  맢  맲
3   뤃  뤓  뤣  뤳  륃  륓  륣  륳  릃  릓  릣  릳  맃  맓  맣  맳
4   뤄  뤔  뤤  뤴  륄  륔  륤  르  름  릔  릤  릴  맄  맔  매  맴
5   뤅  뤕  뤥  뤵  륅  륕  륥  륵  릅  릕  릥  릵  맅  맕  맥  맵
6   뤆  뤖  뤦  뤶  륆  륖  륦  륶  릆  릖  릦  릶  맆  맖  맦  맶
7   뤇  뤗  뤧  뤷  륇  륗  륧  륷  릇  릗  릧  릷  맇  맗  맧  맷
8   뤈  뤘  뤨  뤸  륈  류  륨  른  릈  릘  릨  릸  마  맘  맨  맸
9   뤉  뤙  뤩  뤹  륉  륙  륩  륹  릉  릙  릩  릹  막  맙  맩  맹
A   뤊  뤚  뤪  뤺  륊  륚  륪  륺  릊  릚  릪  릺  맊  맚  맪  맺
B   뤋  뤛  뤫  뤻  륋  륛  륫  륻  릋  릛  릫  릻  맋  맛  맫  맻
C   뤌  뤜  뤬  뤼  륌  륜  륬  를  릌  릜  리  림  만  맜  맬  맼
D   뤍  뤝  뤭  뤽  륍  륝  륭  륽  릍  릝  릭  립  맍  망  맭  맽
E   뤎  뤞  뤮  뤾  륎  륞  륮  륾  릎  릞  릮  릾  많  맞  맮  맾
F   뤏  뤟  뤯  뤿  륏  륟  륯  륿  릏  릟  릯  릿  맏  맟  맯  맿
BA00  Hangul Syllables  BAFF

    BA0 BA1 BA2 BA3 BA4 BA5 BA6 BA7 BA8 BA9 BAA BAB BAC BAD BAE BAF
0   먀  먐  먠  먰  멀  멐  멠  며  몀  몐  몠  몰  뫀  뫐  뫠  뫰
1   먁  먑  먡  먱  멁  멑  멡  멱  몁  몑  몡  몱  뫁  뫑  뫡  뫱
2   먂  먒  먢  먲  멂  멒  멢  멲  몂  몒  몢  몲  뫂  뫒  뫢  뫲
3   먃  먓  먣  먳  멃  멓  멣  멳  몃  몓  몣  몳  뫃  뫓  뫣  뫳
4   먄  먔  먤  먴  멄  메  멤  면  몄  몔  몤  몴  뫄  뫔  뫤  뫴
5   먅  먕  먥  먵  멅  멕  멥  멵  명  몕  몥  몵  뫅  뫕  뫥  뫵
6   먆  먖  먦  먶  멆  멖  멦  멶  몆  몖  몦  몶  뫆  뫖  뫦  뫶
7   먇  먗  먧  먷  멇  멗  멧  멷  몇  몗  몧  몷  뫇  뫗  뫧  뫷
8   먈  먘  먨  머  멈  멘  멨  멸  몈  몘  모  몸  뫈  뫘  뫨  뫸
9   먉  먙  먩  먹  멉  멙  멩  멹  몉  몙  목  몹  뫉  뫙  뫩  뫹
A   먊  먚  먪  먺  멊  멚  멪  멺  몊  몚  몪  몺  뫊  뫚  뫪  뫺
B   먋  먛  먫  먻  멋  멛  멫  멻  몋  몛  몫  못  뫋  뫛  뫫  뫻
C   먌  먜  먬  먼  멌  멜  멬  멼  몌  몜  몬  몼  뫌  뫜  뫬  뫼
D   먍  먝  먭  먽  멍  멝  멭  멽  몍  몝  몭  몽  뫍  뫝  뫭  뫽
E   먎  먞  먮  먾  멎  멞  멮  멾  몎  몞  몮  몾  뫎  뫞  뫮  뫾
F   먏  먟  먯  먿  멏  멟  멯  멿  몏  몟  몯  몿  뫏  뫟  뫯  뫿
BB00  Hangul Syllables  BBFF

    BB0 BB1 BB2 BB3 BB4 BB5 BB6 BB7 BB8 BB9 BBA BBB BBC BBD BBE BBF
0   묀  묐  묠  묰  뭀  뭐  뭠  뭰  뮀  뮐  뮠  뮰  므  믐  믠  믰
1   묁  묑  묡  묱  뭁  뭑  뭡  뭱  뮁  뮑  뮡  뮱  믁  믑  믡  믱
2   묂  묒  묢  묲  뭂  뭒  뭢  뭲  뮂  뮒  뮢  뮲  믂  믒  믢  믲
3   묃  묓  묣  묳  뭃  뭓  뭣  뭳  뮃  뮓  뮣  뮳  믃  믓  믣  믳
4   묄  묔  묤  무  뭄  뭔  뭤  뭴  뮄  뮔  뮤  뮴  믄  믔  믤  믴
5   묅  묕  묥  묵  뭅  뭕  뭥  뭵  뮅  뮕  뮥  뮵  믅  믕  믥  믵
6   묆  묖  묦  묶  뭆  뭖  뭦  뭶  뮆  뮖  뮦  뮶  믆  믖  믦  믶
7   묇  묗  묧  묷  뭇  뭗  뭧  뭷  뮇  뮗  뮧  뮷  믇  믗  믧  믷
8   묈  묘  묨  문  뭈  뭘  뭨  뭸  뮈  뮘  뮨  뮸  믈  믘  믨  미
9   묉  묙  묩  묹  뭉  뭙  뭩  뭹  뮉  뮙  뮩  뮹  믉  믙  믩  믹
A   묊  묚  묪  묺  뭊  뭚  뭪  뭺  뮊  뮚  뮪  뮺  믊  믚  믪  믺
B   묋  묛  묫  묻  뭋  뭛  뭫  뭻  뮋  뮛  뮫  뮻  믋  믛  믫  믻
C   묌  묜  묬  물  뭌  뭜  뭬  뭼  뮌  뮜  뮬  뮼  믌  믜  믬  민
D   묍  묝  묭  묽  뭍  뭝  뭭  뭽  뮍  뮝  뮭  뮽  믍  믝  믭  믽
E   묎  묞  묮  묾  뭎  뭞  뭮  뭾  뮎  뮞  뮮  뮾  믎  믞  믮  믾
F   묏  묟  묯  묿  뭏  뭟  뭯  뭿  뮏  뮟  뮯  뮿  믏  믟  믯  믿
BC00  Hangul Syllables  BCFF

    BC0 BC1 BC2 BC3 BC4 BC5 BC6 BC7 BC8 BC9 BCA BCB BCC BCD BCE BCF
0   밀  밐  밠  배  뱀  뱐  뱠  뱰  벀  벐  베  벰  변  볐  볠  볰
1   밁  밑  밡  백  뱁  뱑  뱡  뱱  벁  벑  벡  벱  볁  병  볡  볱
2   밂  밒  밢  밲  뱂  뱒  뱢  뱲  벂  벒  벢  벲  볂  볒  볢  볲
3   밃  밓  밣  밳  뱃  뱓  뱣  뱳  벃  벓  벣  벳  볃  볓  볣  볳
4   밄  바  밤  밴  뱄  뱔  뱤  뱴  버  범  벤  벴  별  볔  볤  보
5   밅  박  밥  밵  뱅  뱕  뱥  뱵  벅  법  벥  벵  볅  볕  볥  복
6   밆  밖  밦  밶  뱆  뱖  뱦  뱶  벆  벖  벦  벶  볆  볖  볦  볶
7   밇  밗  밧  밷  뱇  뱗  뱧  뱷  벇  벗  벧  벷  볇  볗  볧  볷
8   밈  반  밨  밸  뱈  뱘  뱨  뱸  번  벘  벨  벸  볈  볘  볨  본
9   밉  밙  방  밹  뱉  뱙  뱩  뱹  벉  벙  벩  벹  볉  볙  볩  볹
A   밊  밚  밪  밺  뱊  뱚  뱪  뱺  벊  벚  벪  벺  볊  볚  볪  볺
B   밋  받  밫  밻  뱋  뱛  뱫  뱻  벋  벛  벫  벻  볋  볛  볫  볻
C   밌  발  밬  밼  뱌  뱜  뱬  뱼  벌  벜  벬  벼  볌  볜  볬  볼
D   밍  밝  밭  밽  뱍  뱝  뱭  뱽  벍  벝  벭  벽  볍  볝  볭  볽
E   밎  밞  밮  밾  뱎  뱞  뱮  뱾  벎  벞  벮  벾  볎  볞  볮  볾
F   및  밟  밯  밿  뱏  뱟  뱯  뱿  벏  벟  벯  벿  볏  볟  볯  볿
BD00  Hangul Syllables  BDFF

    BD0 BD1 BD2 BD3 BD4 BD5 BD6 BD7 BD8 BD9 BDA BDB BDC BDD BDE BDF
0   봀  봐  봠  봰  뵀  뵐  뵠  뵰  부  붐  붠  붰  뷀  뷐  뷠  뷰
1   봁  봑  봡  봱  뵁  뵑  뵡  뵱  북  붑  붡  붱  뷁  뷑  뷡  뷱
2   봂  봒  봢  봲  뵂  뵒  뵢  뵲  붂  붒  붢  붲  뷂  뷒  뷢  뷲
3   봃  봓  봣  봳  뵃  뵓  뵣  뵳  붃  붓  붣  붳  뷃  뷓  뷣  뷳
4   봄  봔  봤  봴  뵄  뵔  뵤  뵴  분  붔  붤  붴  뷄  뷔  뷤  뷴
5   봅  봕  봥  봵  뵅  뵕  뵥  뵵  붅  붕  붥  붵  뷅  뷕  뷥  뷵
6   봆  봖  봦  봶  뵆  뵖  뵦  뵶  붆  붖  붦  붶  뷆  뷖  뷦  뷶
7   봇  봗  봧  봷  뵇  뵗  뵧  뵷  붇  붗  붧  붷  뷇  뷗  뷧  뷷
8   봈  봘  봨  봸  뵈  뵘  뵨  뵸  불  붘  붨  붸  뷈  뷘  뷨  뷸
9   봉  봙  봩  봹  뵉  뵙  뵩  뵹  붉  붙  붩  붹  뷉  뷙  뷩  뷹
A   봊  봚  봪  봺  뵊  뵚  뵪  뵺  붊  붚  붪  붺  뷊  뷚  뷪  뷺
B   봋  봛  봫  봻  뵋  뵛  뵫  뵻  붋  붛  붫  붻  뷋  뷛  뷫  뷻
C   봌  봜  봬  봼  뵌  뵜  뵬  뵼  붌  붜  붬  붼  뷌  뷜  뷬  뷼
D   봍  봝  봭  봽  뵍  뵝  뵭  뵽  붍  붝  붭  붽  뷍  뷝  뷭  뷽
E   봎  봞  봮  봾  뵎  뵞  뵮  뵾  붎  붞  붮  붾  뷎  뷞  뷮  뷾
F   봏  봟  봯  봿  뵏  뵟  뵯  뵿  붏  붟  붯  붿  뷏  뷟  뷯  뷿
BE00  Hangul Syllables  BEFF

    BE0 BE1 BE2 BE3 BE4 BE5 BE6 BE7 BE8 BE9 BEA BEB BEC BED BEE BEF
0   븀  븐  븠  븰  빀  빐  빠  빰  뺀  뺐  뺠  뺰  뻀  뻐  뻠  뻰
1   븁  븑  븡  븱  빁  빑  빡  빱  뺁  뺑  뺡  뺱  뻁  뻑  뻡  뻱
2   븂  븒  븢  븲  빂  빒  빢  빲  뺂  뺒  뺢  뺲  뻂  뻒  뻢  뻲
3   븃  븓  븣  븳  빃  빓  빣  빳  뺃  뺓  뺣  뺳  뻃  뻓  뻣  뻳
4   븄  블  븤  븴  비  빔  빤  빴  뺄  뺔  뺤  뺴  뻄  뻔  뻤  뻴
5   븅  븕  븥  븵  빅  빕  빥  빵  뺅  뺕  뺥  뺵  뻅  뻕  뻥  뻵
6   븆  븖  븦  븶  빆  빖  빦  빶  뺆  뺖  뺦  뺶  뻆  뻖  뻦  뻶
7   븇  븗  븧  븷  빇  빗  빧  빷  뺇  뺗  뺧  뺷  뻇  뻗  뻧  뻷
8   븈  븘  븨  븸  빈  빘  빨  빸  뺈  뺘  뺨  뺸  뻈  뻘  뻨  뻸
9   븉  븙  븩  븹  빉  빙  빩  빹  뺉  뺙  뺩  뺹  뻉  뻙  뻩  뻹
A   븊  븚  븪  븺  빊  빚  빪  빺  뺊  뺚  뺪  뺺  뻊  뻚  뻪  뻺
B   븋  븛  븫  븻  빋  빛  빫  빻  뺋  뺛  뺫  뺻  뻋  뻛  뻫  뻻
C   브  븜  븬  븼  빌  빜  빬  빼  뺌  뺜  뺬  뺼  뻌  뻜  뻬  뻼
D   븍  븝  븭  븽  빍  빝  빭  빽  뺍  뺝  뺭  뺽  뻍  뻝  뻭  뻽
E   븎  븞  븮  븾  빎  빞  빮  빾  뺎  뺞  뺮  뺾  뻎  뻞  뻮  뻾
F   븏  븟  븯  븿  빏  빟  빯  빿  뺏  뺟  뺯  뺿  뻏  뻟  뻯  뻿
BF00  Hangul Syllables  BFFF

    BF0 BF1 BF2 BF3 BF4 BF5 BF6 BF7 BF8 BF9 BFA BFB BFC BFD BFE BFF
0   뼀  뼐  뼠  뼰  뽀  뽐  뽠  뽰  뾀  뾐  뾠  뾰  뿀  뿐  뿠  뿰
1   뼁  뼑  뼡  뼱  뽁  뽑  뽡  뽱  뾁  뾑  뾡  뾱  뿁  뿑  뿡  뿱
2   뼂  뼒  뼢  뼲  뽂  뽒  뽢  뽲  뾂  뾒  뾢  뾲  뿂  뿒  뿢  뿲
3   뼃  뼓  뼣  뼳  뽃  뽓  뽣  뽳  뾃  뾓  뾣  뾳  뿃  뿓  뿣  뿳
4   뼄  뼔  뼤  뼴  뽄  뽔  뽤  뽴  뾄  뾔  뾤  뾴  뿄  뿔  뿤  뿴
5   뼅  뼕  뼥  뼵  뽅  뽕  뽥  뽵  뾅  뾕  뾥  뾵  뿅  뿕  뿥  뿵
6   뼆  뼖  뼦  뼶  뽆  뽖  뽦  뽶  뾆  뾖  뾦  뾶  뿆  뿖  뿦  뿶
7   뼇  뼗  뼧  뼷  뽇  뽗  뽧  뽷  뾇  뾗  뾧  뾷  뿇  뿗  뿧  뿷
8   뼈  뼘  뼨  뼸  뽈  뽘  뽨  뽸  뾈  뾘  뾨  뾸  뿈  뿘  뿨  뿸
9   뼉  뼙  뼩  뼹  뽉  뽙  뽩  뽹  뾉  뾙  뾩  뾹  뿉  뿙  뿩  뿹
A   뼊  뼚  뼪  뼺  뽊  뽚  뽪  뽺  뾊  뾚  뾪  뾺  뿊  뿚  뿪  뿺
B   뼋  뼛  뼫  뼻  뽋  뽛  뽫  뽻  뾋  뾛  뾫  뾻  뿋  뿛  뿫  뿻
C   뼌  뼜  뼬  뼼  뽌  뽜  뽬  뽼  뾌  뾜  뾬  뾼  뿌  뿜  뿬  뿼
D   뼍  뼝  뼭  뼽  뽍  뽝  뽭  뽽  뾍  뾝  뾭  뾽  뿍  뿝  뿭  뿽
E   뼎  뼞  뼮  뼾  뽎  뽞  뽮  뽾  뾎  뾞  뾮  뾾  뿎  뿞  뿮  뿾
F   뼏  뼟  뼯  뼿  뽏  뽟  뽯  뽿  뾏  뾟  뾯  뾿  뿏  뿟  뿯  뿿
C000  Hangul Syllables  C0FF

    C00 C01 C02 C03 C04 C05 C06 C07 C08 C09 C0A C0B C0C C0D C0E C0F
0   쀀  쀐  쀠  쀰  쁀  쁐  쁠  쁰  삀  삐  삠  산  샀  샐  샠  샰
1   쀁  쀑  쀡  쀱  쁁  쁑  쁡  쁱  삁  삑  삡  삱  상  샑  샡  샱
2   쀂  쀒  쀢  쀲  쁂  쁒  쁢  쁲  삂  삒  삢  삲  샂  샒  샢  샲
3   쀃  쀓  쀣  쀳  쁃  쁓  쁣  쁳  삃  삓  삣  삳  샃  샓  샣  샳
4   쀄  쀔  쀤  쀴  쁄  쁔  쁤  쁴  삄  삔  삤  살  샄  샔  샤  샴
5   쀅  쀕  쀥  쀵  쁅  쁕  쁥  쁵  삅  삕  삥  삵  샅  샕  샥  샵
6   쀆  쀖  쀦  쀶  쁆  쁖  쁦  쁶  삆  삖  삦  삶  샆  샖  샦  샶
7   쀇  쀗  쀧  쀷  쁇  쁗  쁧  쁷  삇  삗  삧  삷  샇  샗  샧  샷
8   쀈  쀘  쀨  쀸  쁈  쁘  쁨  쁸  삈  삘  삨  삸  새  샘  샨  샸
9   쀉  쀙  쀩  쀹  쁉  쁙  쁩  쁹  삉  삙  삩  삹  색  샙  샩  샹
A   쀊  쀚  쀪  쀺  쁊  쁚  쁪  쁺  삊  삚  삪  삺  샊  샚  샪  샺
B   쀋  쀛  쀫  쀻  쁋  쁛  쁫  쁻  삋  삛  삫  삻  샋  샛  샫  샻
C   쀌  쀜  쀬  쀼  쁌  쁜  쁬  쁼  삌  삜  사  삼  샌  샜  샬  샼
D   쀍  쀝  쀭  쀽  쁍  쁝  쁭  쁽  삍  삝  삭  삽  샍  생  샭  샽
E   쀎  쀞  쀮  쀾  쁎  쁞  쁮  쁾  삎  삞  삮  삾  샎  샞  샮  샾
F   쀏  쀟  쀯  쀿  쁏  쁟  쁯  쁿  삏  삟  삯  삿  샏  샟  샯  샿
CJK Compatibility Ideographs
Range: F900–FAFF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/

A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.
F900  CJK Compatibility Ideographs  F9FF

    F90 F91 F92 F93 F94 F95 F96 F97 F98 F99 F9A F9B F9C F9D F9E F9F
0   豈  蘿  鸞  擄  鹿  縷  怒  殺  呂  戀  裂  聆  燎  類  易  藺
1   更  螺  嵐  櫓  論  陋  率  辰  女  撚  說  鈴  療  六  李  隣
2   車  裸  濫  爐  壟  勒  異  沈  廬  漣  廉  零  蓼  戮  梨  鱗
3   賈  邏  藍  盧  弄  肋  北  拾  旅  煉  念  靈  遼  陸  泥  麟
4   滑  樂  襤  老  籠  凜  磻  若  濾  璉  捻  領  龍  倫  理  林
5   串  洛  拉  蘆  聾  凌  便  掠  礪  秊  殮  例  暈  崙  痢  淋
6   句  烙  臘  虜  牢  稜  復  略  閭  練  簾  禮  阮  淪  罹  臨
7   龜  珞  蠟  路  磊  綾  不  亮  驪  聯  獵  醴  劉  輪  裏  立
8   龜  落  廊  露  賂  菱  泌  兩  麗  輦  令  隸  杻  律  裡  笠
9   契  酪  朗  魯  雷  陵  數  凉  黎  蓮  囹  惡  柳  慄  里  粒
A   金  駱  浪  鷺  壘  讀  索  梁  力  連  寧  了  流  栗  離  狀
B   喇  亂  狼  碌  屢  拏  參  糧  曆  鍊  嶺  僚  溜  率  匿  炙
C   奈  卵  郎  祿  樓  樂  塞  良  歷  列  怜  寮  琉  隆  溺  識
D   懶  欄  來  綠  淚  諾  省  諒  轢  劣  玲  尿  留  利  吝  什
E   癩  爛  冷  菉  漏  丹  葉  量  年  咽  瑩  料  硫  吏  燐  茶
F   羅  蘭  勞  錄  累  寧  說  勵  憐  烈  羚  樂  紐  履  璘  刺
FA00  CJK Compatibility Ideographs  FAFF

    FA0 FA1 FA2
0   切  塚  蘒
1   度  﨑  﨡
2   拓  晴  諸
3   糖  﨓  﨣
4   宅  﨔  﨤
5   洞  凞  逸
6   暴  猪  都
7   輻  益  﨧
8   行  礼  﨨
9   降  神  﨩
A   見  祥  飯
B   廓  福  飼
C   兀  靖  館
D   嗀  精  鶴
E   﨎  羽
F   﨏  﨟

[Columns FA3–FAD: the chart also lists code points FA30–FA6A and FA70–FAD9; their glyphs are not recoverable from this copy.]
F900  CJK Compatibility Ideographs  F979

Pronunciation variants from KS X 1001:1998

F900 豈 CJK COMPATIBILITY IDEOGRAPH-F900 ≡ 8C48 豈
F901 更 CJK COMPATIBILITY IDEOGRAPH-F901 ≡ 66F4 更
F902 車 CJK COMPATIBILITY IDEOGRAPH-F902 ≡ 8ECA 車
F903 賈 CJK COMPATIBILITY IDEOGRAPH-F903 ≡ 8CC8 賈
F904 滑 CJK COMPATIBILITY IDEOGRAPH-F904 ≡ 6ED1 滑
F905 串 CJK COMPATIBILITY IDEOGRAPH-F905 ≡ 4E32 串
F906 句 CJK COMPATIBILITY IDEOGRAPH-F906 ≡ 53E5 句
F907 龜 CJK COMPATIBILITY IDEOGRAPH-F907 ≡ 9F9C 龜
F908 龜 CJK COMPATIBILITY IDEOGRAPH-F908 ≡ 9F9C 龜
F909 契 CJK COMPATIBILITY IDEOGRAPH-F909 ≡ 5951 契
F90A 金 CJK COMPATIBILITY IDEOGRAPH-F90A ≡ 91D1 金
F90B 喇 CJK COMPATIBILITY IDEOGRAPH-F90B ≡ 5587 喇
F90C 奈 CJK COMPATIBILITY IDEOGRAPH-F90C ≡ 5948 奈
F90D 懶 CJK COMPATIBILITY IDEOGRAPH-F90D ≡ 61F6 懶
F90E 癩 CJK COMPATIBILITY IDEOGRAPH-F90E ≡ 7669 癩
F90F 羅 CJK COMPATIBILITY IDEOGRAPH-F90F ≡ 7F85 羅
F910 蘿 CJK COMPATIBILITY IDEOGRAPH-F910 ≡ 863F 蘿
F911 螺 CJK COMPATIBILITY IDEOGRAPH-F911 ≡ 87BA 螺
F912 裸 CJK COMPATIBILITY IDEOGRAPH-F912 ≡ 88F8 裸
F913 邏 CJK COMPATIBILITY IDEOGRAPH-F913 ≡ 908F 邏
F914 樂 CJK COMPATIBILITY IDEOGRAPH-F914 ≡ 6A02 樂
F915 洛 CJK COMPATIBILITY IDEOGRAPH-F915 ≡ 6D1B 洛
F916 烙 CJK COMPATIBILITY IDEOGRAPH-F916 ≡ 70D9 烙
F917 珞 CJK COMPATIBILITY IDEOGRAPH-F917 ≡ 73DE 珞
F918 落 CJK COMPATIBILITY IDEOGRAPH-F918 ≡ 843D 落
F919 酪 CJK COMPATIBILITY IDEOGRAPH-F919 ≡ 916A 酪
F91A 駱 CJK COMPATIBILITY IDEOGRAPH-F91A ≡ 99F1 駱
F91B 亂 CJK COMPATIBILITY IDEOGRAPH-F91B ≡ 4E82 亂
F91C 卵 CJK COMPATIBILITY IDEOGRAPH-F91C ≡ 5375 卵
F91D 欄 CJK COMPATIBILITY IDEOGRAPH-F91D ≡ 6B04 欄
F91E 爛 CJK COMPATIBILITY IDEOGRAPH-F91E ≡ 721B 爛
F91F 蘭 CJK COMPATIBILITY IDEOGRAPH-F91F ≡ 862D 蘭
F920 鸞 CJK COMPATIBILITY IDEOGRAPH-F920 ≡ 9E1E 鸞
F921 嵐 CJK COMPATIBILITY IDEOGRAPH-F921 ≡ 5D50 嵐
F922 濫 CJK COMPATIBILITY IDEOGRAPH-F922 ≡ 6FEB 濫
F923 藍 CJK COMPATIBILITY IDEOGRAPH-F923 ≡ 85CD 藍
F924 襤 CJK COMPATIBILITY IDEOGRAPH-F924 ≡ 8964 襤
F925 拉 CJK COMPATIBILITY IDEOGRAPH-F925 ≡ 62C9 拉
F926 臘 CJK COMPATIBILITY IDEOGRAPH-F926 ≡ 81D8 臘
F927 蠟 CJK COMPATIBILITY IDEOGRAPH-F927 ≡ 881F 蠟
F928 廊 CJK COMPATIBILITY IDEOGRAPH-F928 ≡ 5ECA 廊
F929 朗 CJK COMPATIBILITY IDEOGRAPH-F929 ≡ 6717 朗
F92A 浪 CJK COMPATIBILITY IDEOGRAPH-F92A ≡ 6D6A 浪
F92B 狼 CJK COMPATIBILITY IDEOGRAPH-F92B ≡ 72FC 狼
F92C 郎 CJK COMPATIBILITY IDEOGRAPH-F92C ≡ 90CE 郎
F92D 來 CJK COMPATIBILITY IDEOGRAPH-F92D ≡ 4F86 來
F92E 冷 CJK COMPATIBILITY IDEOGRAPH-F92E ≡ 51B7 冷
F92F 勞 CJK COMPATIBILITY IDEOGRAPH-F92F ≡ 52DE 勞
F930 擄 CJK COMPATIBILITY IDEOGRAPH-F930 ≡ 64C4 擄
F931 櫓 CJK COMPATIBILITY IDEOGRAPH-F931 ≡ 6AD3 櫓
F932 爐 CJK COMPATIBILITY IDEOGRAPH-F932 ≡ 7210 爐
F933 盧 CJK COMPATIBILITY IDEOGRAPH-F933 ≡ 76E7 盧
F934 老 CJK COMPATIBILITY IDEOGRAPH-F934 ≡ 8001 老
F935 蘆 CJK COMPATIBILITY IDEOGRAPH-F935 ≡ 8606 蘆
F936 虜 CJK COMPATIBILITY IDEOGRAPH-F936 ≡ 865C 虜
F937 路 CJK COMPATIBILITY IDEOGRAPH-F937 ≡ 8DEF 路
F938 露 CJK COMPATIBILITY IDEOGRAPH-F938 ≡ 9732 露
F939 魯 CJK COMPATIBILITY IDEOGRAPH-F939 ≡ 9B6F 魯
F93A 鷺 CJK COMPATIBILITY IDEOGRAPH-F93A ≡ 9DFA 鷺
F93B 碌 CJK COMPATIBILITY IDEOGRAPH-F93B ≡ 788C 碌
F93C 祿 CJK COMPATIBILITY IDEOGRAPH-F93C ≡ 797F 祿
F93D 綠 CJK COMPATIBILITY IDEOGRAPH-F93D ≡ 7DA0 綠
F93E 菉 CJK COMPATIBILITY IDEOGRAPH-F93E ≡ 83C9 菉
F93F 錄 CJK COMPATIBILITY IDEOGRAPH-F93F ≡ 9304 錄
F940 鹿 CJK COMPATIBILITY IDEOGRAPH-F940 ≡ 9E7F 鹿
F941 論 CJK COMPATIBILITY IDEOGRAPH-F941 ≡ 8AD6 論
F942 壟 CJK COMPATIBILITY IDEOGRAPH-F942 ≡ 58DF 壟
F943 弄 CJK COMPATIBILITY IDEOGRAPH-F943 ≡ 5F04 弄
F944 籠 CJK COMPATIBILITY IDEOGRAPH-F944 ≡ 7C60 籠
F945 聾 CJK COMPATIBILITY IDEOGRAPH-F945 ≡ 807E 聾
F946 牢 CJK COMPATIBILITY IDEOGRAPH-F946 ≡ 7262 牢
F947 磊 CJK COMPATIBILITY IDEOGRAPH-F947 ≡ 78CA 磊
F948 賂 CJK COMPATIBILITY IDEOGRAPH-F948 ≡ 8CC2 賂
F949 雷 CJK COMPATIBILITY IDEOGRAPH-F949 ≡ 96F7 雷
F94A 壘 CJK COMPATIBILITY IDEOGRAPH-F94A ≡ 58D8 壘
F94B 屢 CJK COMPATIBILITY IDEOGRAPH-F94B ≡ 5C62 屢
F94C 樓 CJK COMPATIBILITY IDEOGRAPH-F94C ≡ 6A13 樓
F94D 淚 CJK COMPATIBILITY IDEOGRAPH-F94D ≡ 6DDA 淚
F94E 漏 CJK COMPATIBILITY IDEOGRAPH-F94E ≡ 6F0F 漏
F94F 累 CJK COMPATIBILITY IDEOGRAPH-F94F ≡ 7D2F 累
F950 縷 CJK COMPATIBILITY IDEOGRAPH-F950 ≡ 7E37 縷
F951 陋 CJK COMPATIBILITY IDEOGRAPH-F951 ≡ 964B 陋
F952 勒 CJK COMPATIBILITY IDEOGRAPH-F952 ≡ 52D2 勒
F953 肋 CJK COMPATIBILITY IDEOGRAPH-F953 ≡ 808B 肋
F954 凜 CJK COMPATIBILITY IDEOGRAPH-F954 ≡ 51DC 凜
F955 凌 CJK COMPATIBILITY IDEOGRAPH-F955 ≡ 51CC 凌
F956 稜 CJK COMPATIBILITY IDEOGRAPH-F956 ≡ 7A1C 稜
F957 綾 CJK COMPATIBILITY IDEOGRAPH-F957 ≡ 7DBE 綾
F958 菱 CJK COMPATIBILITY IDEOGRAPH-F958 ≡ 83F1 菱
F959 陵 CJK COMPATIBILITY IDEOGRAPH-F959 ≡ 9675 陵
F95A 讀 CJK COMPATIBILITY IDEOGRAPH-F95A ≡ 8B80 讀
F95B 拏 CJK COMPATIBILITY IDEOGRAPH-F95B ≡ 62CF 拏
F95C 樂 CJK COMPATIBILITY IDEOGRAPH-F95C ≡ 6A02 樂
F95D 諾 CJK COMPATIBILITY IDEOGRAPH-F95D ≡ 8AFE 諾
F95E 丹 CJK COMPATIBILITY IDEOGRAPH-F95E ≡ 4E39 丹
F95F 寧 CJK COMPATIBILITY IDEOGRAPH-F95F ≡ 5BE7 寧
F960 怒 CJK COMPATIBILITY IDEOGRAPH-F960 ≡ 6012 怒
F961 率 CJK COMPATIBILITY IDEOGRAPH-F961 ≡ 7387 率
F962 異 CJK COMPATIBILITY IDEOGRAPH-F962 ≡ 7570 異
F963 北 CJK COMPATIBILITY IDEOGRAPH-F963 ≡ 5317 北
F964 磻 CJK COMPATIBILITY IDEOGRAPH-F964 ≡ 78FB 磻
F965 便 CJK COMPATIBILITY IDEOGRAPH-F965 ≡ 4FBF 便
F966 復 CJK COMPATIBILITY IDEOGRAPH-F966 ≡ 5FA9 復
F967 不 CJK COMPATIBILITY IDEOGRAPH-F967 ≡ 4E0D 不
F968 泌 CJK COMPATIBILITY IDEOGRAPH-F968 ≡ 6CCC 泌
F969 數 CJK COMPATIBILITY IDEOGRAPH-F969 ≡ 6578 數
F96A 索 CJK COMPATIBILITY IDEOGRAPH-F96A ≡ 7D22 索
F96B 參 CJK COMPATIBILITY IDEOGRAPH-F96B ≡ 53C3 參
F96C 塞 CJK COMPATIBILITY IDEOGRAPH-F96C ≡ 585E 塞
F96D 省 CJK COMPATIBILITY IDEOGRAPH-F96D ≡ 7701 省
F96E 葉 CJK COMPATIBILITY IDEOGRAPH-F96E ≡ 8449 葉
F96F 說 CJK COMPATIBILITY IDEOGRAPH-F96F ≡ 8AAA 說
F970 殺 CJK COMPATIBILITY IDEOGRAPH-F970 ≡ 6BBA 殺
F971 辰 CJK COMPATIBILITY IDEOGRAPH-F971 ≡ 8FB0 辰
F972 沈 CJK COMPATIBILITY IDEOGRAPH-F972 ≡ 6C88 沈
F973 拾 CJK COMPATIBILITY IDEOGRAPH-F973 ≡ 62FE 拾
F974 若 CJK COMPATIBILITY IDEOGRAPH-F974 ≡ 82E5 若
F975 掠 CJK COMPATIBILITY IDEOGRAPH-F975 ≡ 63A0 掠
F976 略 CJK COMPATIBILITY IDEOGRAPH-F976 ≡ 7565 略
F977 亮 CJK COMPATIBILITY IDEOGRAPH-F977 ≡ 4EAE 亮
F978 兩 CJK COMPATIBILITY IDEOGRAPH-F978 ≡ 5169 兩
F979 凉 CJK COMPATIBILITY IDEOGRAPH-F979 ≡ 51C9 凉
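
In the entries above, the ≡ sign marks a canonical equivalence: each compatibility ideograph is canonically equivalent to the unified ideograph named after it, so normalization replaces it with that character. The following is a minimal sketch, assuming Python's standard `unicodedata` module (whose data comes from a later Unicode version than 5.0; these particular mappings are the ones listed above). The function name `unified_equivalent` is an illustrative assumption.

```python
import unicodedata


def unified_equivalent(ch: str) -> str:
    """Return the NFC-normalized form of a single character.

    For a CJK compatibility ideograph this is the unified ideograph
    shown after the ≡ sign in the names list.
    """
    return unicodedata.normalize("NFC", ch)


# U+F900 is listed as canonically equivalent to U+8C48.
assert unified_equivalent("\uF900") == "\u8C48"

# U+F907 and U+F908 are both listed as equivalent to U+9F9C,
# so the two become indistinguishable after normalization.
assert unified_equivalent("\uF907") == unified_equivalent("\uF908") == "\u9F9C"
```

Because the mapping is many-to-one in places, as with F907 and F908, normalized text cannot be converted back to the original compatibility code points without additional information.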
Alphabetic Presentation Forms
Range: FB00–FB4F
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.
See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
Disclaimer
These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.
Fonts
The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list.
Terms of Use
You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site. See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.
Copyright © 1991-2006 Unicode, Inc. All rights reserved.
[Code chart: Alphabetic Presentation Forms, FB00–FB4F]
Arabic Presentation Forms-A
Range: FB50–FDFF
[Code charts: Arabic Presentation Forms-A, FB50–FDFF]
Arabic Presentation Forms-A
Preferred characters are found in the Arabic block 0600–06FF. This block also contains 32 noncharacters in the range FDD0–FDEF.
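As a hedged illustration of what "preferred characters" means in practice, the sketch below looks up a presentation form's compatibility decomposition and applies NFKC normalization to recover the preferred character(s) from the Arabic block. It assumes Python's standard `unicodedata` module, whose data follows the Unicode version bundled with the interpreter rather than 5.0 specifically. The function name `describe` is hypothetical.

```python
import unicodedata


def describe(cp):
    """Return (name, decomposition mapping, NFKC form) for a code point.

    Hypothetical helper for illustration only.
    """
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )


# A contextual form (FB55) and a two-element ligature (FC0E).
for cp in (0xFB55, 0xFC0E):
    name, mapping, preferred = describe(cp)
    targets = " ".join("U+%04X" % ord(c) for c in preferred)
    print("U+%04X %s: %s -> %s" % (cp, name, mapping, targets))
```

Normalizing with NFKC discards the positional information (isolated, final, initial, medial), which is the intended outcome when a rendering engine is expected to select contextual shapes itself.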
Glyphs for contextual forms of letters for Persian, Urdu, Sindhi, etc.
FB50 ARABIC LETTER ALEF WASLA ISOLATED FORM
FB51 ARABIC LETTER ALEF WASLA FINAL FORM
FB52 ARABIC LETTER BEEH ISOLATED FORM
FB53 ARABIC LETTER BEEH FINAL FORM
FB54 ARABIC LETTER BEEH INITIAL FORM
FB55 ARABIC LETTER BEEH MEDIAL FORM ≈ <medial> 067B
FB56 ARABIC LETTER PEH ISOLATED FORM
FB57 ARABIC LETTER PEH FINAL FORM
FB58 ARABIC LETTER PEH INITIAL FORM
FB59 ARABIC LETTER PEH MEDIAL FORM ≈ <medial> 067E
FB5A ARABIC LETTER BEHEH ISOLATED FORM
FB5B ARABIC LETTER BEHEH FINAL FORM
FB5C ARABIC LETTER BEHEH INITIAL FORM
FB5D ARABIC LETTER BEHEH MEDIAL FORM ≈ <medial> 0680
FB5E ARABIC LETTER TTEHEH ISOLATED FORM
FB5F ARABIC LETTER TTEHEH FINAL FORM
FB60 ARABIC LETTER TTEHEH INITIAL FORM
FB61 ARABIC LETTER TTEHEH MEDIAL FORM ≈ <medial> 067A
FB62 ARABIC LETTER TEHEH ISOLATED FORM
FB63 ARABIC LETTER TEHEH FINAL FORM
FB64 ARABIC LETTER TEHEH INITIAL FORM
FB65 ARABIC LETTER TEHEH MEDIAL FORM ≈ <medial> 067F
FB66 ARABIC LETTER TTEH ISOLATED FORM
FB67 ARABIC LETTER TTEH FINAL FORM
FB68 ARABIC LETTER TTEH INITIAL FORM
FB69 ARABIC LETTER TTEH MEDIAL FORM ≈ <medial> 0679
FB6A ARABIC LETTER VEH ISOLATED FORM
FB6B ARABIC LETTER VEH FINAL FORM
FB6C ARABIC LETTER VEH INITIAL FORM
FB6D ARABIC LETTER VEH MEDIAL FORM ≈ <medial> 06A4
FB6E ARABIC LETTER PEHEH ISOLATED FORM
FB6F ARABIC LETTER PEHEH FINAL FORM
FB70 ARABIC LETTER PEHEH INITIAL FORM
FB71 ARABIC LETTER PEHEH MEDIAL FORM ≈ <medial> 06A6
FB72 ARABIC LETTER DYEH ISOLATED FORM
FB73 ARABIC LETTER DYEH FINAL FORM
FB74 ARABIC LETTER DYEH INITIAL FORM
FB75 ARABIC LETTER DYEH MEDIAL FORM ≈ <medial> 0684
FB76 ARABIC LETTER NYEH ISOLATED FORM
FB77 ARABIC LETTER NYEH FINAL FORM
FB78 ARABIC LETTER NYEH INITIAL FORM
FB79 ARABIC LETTER NYEH MEDIAL FORM ≈ <medial> 0683
FB7A ARABIC LETTER TCHEH ISOLATED FORM
FB7B ARABIC LETTER TCHEH FINAL FORM
FB7C ARABIC LETTER TCHEH INITIAL FORM
FB7D ARABIC LETTER TCHEH MEDIAL FORM ≈ <medial> 0686
FB7E ARABIC LETTER TCHEHEH ISOLATED FORM
FB7F ARABIC LETTER TCHEHEH FINAL FORM
FB80 ARABIC LETTER TCHEHEH INITIAL FORM
FB81 ARABIC LETTER TCHEHEH MEDIAL FORM ≈ <medial> 0687
FB82 ARABIC LETTER DDAHAL ISOLATED FORM
FB83 ARABIC LETTER DDAHAL FINAL FORM
FB84 ARABIC LETTER DAHAL ISOLATED FORM
FB85 ARABIC LETTER DAHAL FINAL FORM
FB86 ARABIC LETTER DUL ISOLATED FORM
FB87 ARABIC LETTER DUL FINAL FORM
FB88 ARABIC LETTER DDAL ISOLATED FORM
FB89 ARABIC LETTER DDAL FINAL FORM
FB8A ARABIC LETTER JEH ISOLATED FORM
FB8B ARABIC LETTER JEH FINAL FORM
FB8C ARABIC LETTER RREH ISOLATED FORM
FB8D ARABIC LETTER RREH FINAL FORM
FB8E ARABIC LETTER KEHEH ISOLATED FORM
FB8F ARABIC LETTER KEHEH FINAL FORM
FB90 ARABIC LETTER KEHEH INITIAL FORM
FB91 ARABIC LETTER KEHEH MEDIAL FORM ≈ <medial> 06A9
FB92 ARABIC LETTER GAF ISOLATED FORM
FB93 ARABIC LETTER GAF FINAL FORM
FB94 ARABIC LETTER GAF INITIAL FORM
FB95 ARABIC LETTER GAF MEDIAL FORM ≈ <medial> 06AF
FB96 ARABIC LETTER GUEH ISOLATED FORM
FB97 ARABIC LETTER GUEH FINAL FORM
FB98 ARABIC LETTER GUEH INITIAL FORM
FB99 ARABIC LETTER GUEH MEDIAL FORM ≈ <medial> 06B3
FB9A ARABIC LETTER NGOEH ISOLATED FORM
FB9B ARABIC LETTER NGOEH FINAL FORM
FB9C ARABIC LETTER NGOEH INITIAL FORM
FB9D ARABIC LETTER NGOEH MEDIAL FORM ≈ <medial> 06B1
FB9E ARABIC LETTER NOON GHUNNA ISOLATED FORM
FB9F ARABIC LETTER NOON GHUNNA FINAL FORM
FBA0 ARABIC LETTER RNOON ISOLATED FORM
FBA1 ARABIC LETTER RNOON FINAL FORM
FBA2 ARABIC LETTER RNOON INITIAL FORM
FBA3 ARABIC LETTER RNOON MEDIAL FORM ≈ <medial> 06BB
FBA4 ARABIC LETTER HEH WITH YEH ABOVE ISOLATED FORM
FBA5 ARABIC LETTER HEH WITH YEH ABOVE FINAL FORM
FBA6 ARABIC LETTER HEH GOAL ISOLATED FORM
FBA7 ARABIC LETTER HEH GOAL FINAL FORM
FBA8 ARABIC LETTER HEH GOAL INITIAL FORM
FBA9 ARABIC LETTER HEH GOAL MEDIAL FORM ≈ <medial> 06C1
FBAA ARABIC LETTER HEH DOACHASHMEE ISOLATED FORM
FBAB ARABIC LETTER HEH DOACHASHMEE FINAL FORM
FBAC ARABIC LETTER HEH DOACHASHMEE INITIAL FORM
FBAD ARABIC LETTER HEH DOACHASHMEE MEDIAL FORM ≈ <medial> 06BE
FBAE ARABIC LETTER YEH BARREE ISOLATED FORM
FBAF ARABIC LETTER YEH BARREE FINAL FORM
FBB0 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE ISOLATED FORM
FBB1 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
Glyphs for contextual forms of letters for Central Asian languages
FBD3 ARABIC LETTER NG ISOLATED FORM
FBD4 ARABIC LETTER NG FINAL FORM
FBD5 ARABIC LETTER NG INITIAL FORM
FBD6 ARABIC LETTER NG MEDIAL FORM ≈ <medial> 06AD
FBD7 ARABIC LETTER U ISOLATED FORM
FBD8 ARABIC LETTER U FINAL FORM
FBD9 ARABIC LETTER OE ISOLATED FORM
FBDA ARABIC LETTER OE FINAL FORM
FBDB ARABIC LETTER YU ISOLATED FORM
FBDC ARABIC LETTER YU FINAL FORM
FBDD ARABIC LETTER U WITH HAMZA ABOVE ISOLATED FORM
FBDE ARABIC LETTER VE ISOLATED FORM
FBDF ARABIC LETTER VE FINAL FORM
FBE0 ARABIC LETTER KIRGHIZ OE ISOLATED FORM
FBE1 ARABIC LETTER KIRGHIZ OE FINAL FORM
FBE2 ARABIC LETTER KIRGHIZ YU ISOLATED FORM
FBE3 ARABIC LETTER KIRGHIZ YU FINAL FORM
FBE4 ARABIC LETTER E ISOLATED FORM
FBE5 ARABIC LETTER E FINAL FORM
FBE6 ARABIC LETTER E INITIAL FORM
FBE7 ARABIC LETTER E MEDIAL FORM ≈ <medial> 06D0
FBE8 ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA INITIAL FORM
FBE9 ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA MEDIAL FORM ≈ <medial> 0649
Ligatures (two elements)
FBEA ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM
FBEB ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF FINAL FORM
FBEC ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE ISOLATED FORM
FBED ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE FINAL FORM
FBEE ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW ISOLATED FORM
FBEF ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW FINAL FORM
FBF0 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U ISOLATED FORM
FBF1 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U FINAL FORM
FBF2 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE ISOLATED FORM
FBF3 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE FINAL FORM
FBF4 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU ISOLATED FORM
FBF5 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU FINAL FORM
FBF6 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E ISOLATED FORM
FBF7 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E FINAL FORM
FBF8 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E INITIAL FORM
FBF9 ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM
FBFA ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA FINAL FORM
FBFB ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM
FBFC ARABIC LETTER FARSI YEH ISOLATED FORM
FBFD ARABIC LETTER FARSI YEH FINAL FORM
FBFE ARABIC LETTER FARSI YEH INITIAL FORM
FBFF ARABIC LETTER FARSI YEH MEDIAL FORM ≈ <medial> 06CC
FC00 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH JEEM ISOLATED FORM
FC01 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH HAH ISOLATED FORM
FC02 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH MEEM ISOLATED FORM
FC03 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM
FC04 ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YEH ISOLATED FORM
FC05 ARABIC LIGATURE BEH WITH JEEM ISOLATED FORM
FC06 ARABIC LIGATURE BEH WITH HAH ISOLATED FORM
FC07 ARABIC LIGATURE BEH WITH KHAH ISOLATED FORM
FC08 ARABIC LIGATURE BEH WITH MEEM ISOLATED FORM
FC09 ARABIC LIGATURE BEH WITH ALEF MAKSURA ISOLATED FORM
FC0A ARABIC LIGATURE BEH WITH YEH ISOLATED FORM
FC0B ARABIC LIGATURE TEH WITH JEEM ISOLATED FORM
FC0C ARABIC LIGATURE TEH WITH HAH ISOLATED FORM
FC0D ARABIC LIGATURE TEH WITH KHAH ISOLATED FORM
FC0E ARABIC LIGATURE TEH WITH MEEM ISOLATED FORM
FC0F ARABIC LIGATURE TEH WITH ALEF MAKSURA ISOLATED FORM
FC10 ARABIC LIGATURE TEH WITH YEH ISOLATED FORM
FC11 ARABIC LIGATURE THEH WITH JEEM ISOLATED FORM
FC12 ARABIC LIGATURE THEH WITH MEEM ISOLATED FORM
FC13 ARABIC LIGATURE THEH WITH ALEF MAKSURA ISOLATED FORM
FC14 ARABIC LIGATURE THEH WITH YEH ISOLATED FORM
FC15 ARABIC LIGATURE JEEM WITH HAH ISOLATED FORM
FC16 ARABIC LIGATURE JEEM WITH MEEM ISOLATED FORM
FC17 ARABIC LIGATURE HAH WITH JEEM ISOLATED FORM
FC18 ARABIC LIGATURE HAH WITH MEEM ISOLATED FORM
FC19 ARABIC LIGATURE KHAH WITH JEEM ISOLATED FORM
FC1A ARABIC LIGATURE KHAH WITH HAH ISOLATED FORM
FC1B ARABIC LIGATURE KHAH WITH MEEM ISOLATED FORM
FC1C ARABIC LIGATURE SEEN WITH JEEM ISOLATED FORM
FC1D ARABIC LIGATURE SEEN WITH HAH ISOLATED FORM
FC1E ARABIC LIGATURE SEEN WITH KHAH ISOLATED FORM
FC1F ARABIC LIGATURE SEEN WITH MEEM ISOLATED FORM
FC20 ARABIC LIGATURE SAD WITH HAH ISOLATED FORM
FC21 ARABIC LIGATURE SAD WITH MEEM ISOLATED FORM
FC22 ARABIC LIGATURE DAD WITH JEEM ISOLATED FORM
FC23 ARABIC LIGATURE DAD WITH HAH ISOLATED FORM