Unicode Standard, Version 5.0, The


Author: The Unicode Consortium


Electronic Edition

This file is part of the electronic edition of The Unicode Standard, Version 5.0, provided for online access, content searching, and accessibility. It may not be printed. Bookmarks linking to specific chapters or sections of the whole Unicode Standard are available at http://www.unicode.org/versions/Unicode5.0.0/bookmarks.html

Purchasing the Book

For convenient access to the full text of the standard as a useful reference book, we recommend purchasing the printed version. The book is available from the Unicode Consortium, the publisher, and booksellers. Purchase of the standard in book format contributes to the ongoing work of the Unicode Consortium. Details about the book publication and ordering information may be found at http://www.unicode.org/book/aboutbook.html

Joining Unicode

You or your organization may benefit by joining the Unicode Consortium: for more information, see Joining the Unicode Consortium at http://www.unicode.org/consortium/join.html

This PDF file is an excerpt from The Unicode Standard, Version 5.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this electronic edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Praise for The Unicode Standard, Version 5.0 “The world is a global village, trade crosses language barriers, and yet every one of us likes to feel comfortable within their own mother tongue. Unicode enabled us to give the local sense to every one of our users, while connecting the world of trade—which is the reason we will support Unicode in all of our products.” —Shai Agassi, Member SAP Executive Board “The W3C was founded to develop common protocols to lead the evolution of the World Wide Web. The path W3C follows to making text on the Web truly global is Unicode. Unicode is fundamental to the work of the W3C; it is a component of W3C Specifications, from the early days of HTML, to the growing XML Family of specifications and beyond.” —Sir Tim Berners-Lee, KBE Web Inventor and Director of the World Wide Web Consortium (W3C) “The IETF has made the Unicode-compatible UTF-8 format of ISO 10646 the basis for its preferred default character encoding for internationalization of Internet application protocols, so I am delighted to see the official release of Unicode 5.0.” —Brian E. Carpenter, Chair, Internet Engineering Task Force Distinguished Engineer, Internet Standards & Technology, IBM “Google’s objective is to organize the world’s information and to make it accessible. Unicode plays a central role in this effort because it is the principal means by which content in every language can be represented in a form that can be processed by software. As Unicode extends its coverage of the world’s languages, it helps Google accomplish its mission.” —Vint Cerf, Chief Internet Evangelist Google, Inc. “Unicode Standard Version 5.0 is a great milestone for the Unicode Standard, which has been critical to computing since it was first published in 1991. With extended script and character support, this new version will help us bridge the digital divide by enabling more people to access computing in the language they use every day. 
The comprehensive set of mathematics symbols simplifies support for technical documents in business software. For more than a decade, Unicode has been a foundation for many Microsoft products and technologies: Unicode Standard Version 5.0 will help us deliver important new benefits to users.” —Bill Gates, Chairman Microsoft Corporation

The Unicode Standard 5.0 – Electronic edition

Copyright © 1991–2007 Unicode, Inc.


The Unicode Standard 5.0


Acknowledgments

The production of The Unicode Standard, Version 5.0, is due to the dedication of many people over several years. We would like to acknowledge the following individuals, whose major contributions were central to the design, authorship, and review of this book.

Julie D. Allen was responsible for the editing of the book. As Senior Editor and Project Manager, she contributed to the rewriting of many of the script descriptions and managed the general project schedule for the completion of the book. Julie led the updating of the glossary and the coordination with the publisher, graphic artist, and other contributors.

Joe Becker created the original Unicode prospectus and continued as contributing editor for this version.

Richard Cook contributed to maintaining and updating the Unihan database and its documentation. He also served as a Unicode Consortium representative to the IRG.

Mark Davis was essential to the development of Version 5.0. Mark led many aspects of overall design of the Unicode Standard. He contributed significant revisions and enhancements to the statement of conformance, casing behavior, the stability of programmatic identifiers, text boundaries, bidirectional behavior, implementation guidelines, normalization, and the addition of properties to the Unicode Character Database. Mark is the author of three of the Unicode Standard Annexes, is a co-author of two others, and was a major contributor in defining Unicode security mechanisms.

Michael Everson was the driving force behind encoding many of the minority and historic scripts that were added in Version 5.0 and was a major contributor to their script descriptions. These scripts include Balinese, Coptic, Glagolitic, N’Ko, New Tai Lue, Old Persian, Phoenician, and Sumero-Akkadian Cuneiform. Michael provided many of the fonts used in this standard and extensively reviewed code charts, character names, and annotations.

Asmus Freytag made significant contributions to the general structure and property chapters, and continued his focus on symbols. He led the updates to punctuation, symbols, and special areas and format characters, and he also made contributions to European alphabetic scripts, bidirectional behavior, and line breaking properties. He designed a number of additional figures and suggested improvements to many others. Asmus drove the effort to define the Unicode character property model, was the author of two Unicode Standard Annexes, and is a co-author of one other. He was instrumental in incorporating the annexes into the book. He also created custom formatting software, negotiated font donations, and produced the code charts.


John H. Jenkins, as Unicode Consortium representative to the IRG, contributed to the maintenance and extension of the Unihan database, extended the Han radical-stroke index to the ideographic content of the standard, and prepared the radical-stroke index of the IICore Han subset. John was also responsible for maintaining the Han cross-reference tables and contributed fonts for KangXi and CJK radicals.

Mike Ksar, as Convener of JTC1/SC2/WG2 and SC2 liaison, led the effort to synchronize Version 5.0 and ISO/IEC 10646:2003 Amendments 1 and 2. He contributed to Middle Eastern scripts, Appendix C, and he thoroughly reviewed all of the newly added scripts and ensured they were well documented in both standards.

Rick McGowan coordinated the work to encode new scripts and contributed to the editing of many of the new script descriptions. He revised or drew more than 100 figures in the book and was responsible for mastering and producing the CD-ROM.

Lisa Moore, as Chair of the Unicode Technical Committee, oversaw the content of Version 5.0. She edited the Kharoshthi description, rewrote Appendix D and much of the front matter, and contributed to the general editing of the text.

Eric Muller thoroughly reviewed all chapters of the book, making many improvements in the clarity and consistency of the text. He contributed to the validation of Unihan data and provided critical PDF expertise.

Markus Scherer thoroughly reviewed the general structure and conformance chapters of Unicode Version 5.0, contributed significant updates to the implementation guidelines found in the standard, and provided a painstakingly thorough verification of properties.

Michel Suignard was a leader in the synchronization of Unicode and ISO/IEC 10646 through his role as Project Editor for 10646. He was responsible for editing ISO/IEC 10646:2003, Amendments 1 and 2, and thus provided the foundation for the seamless coordination with the publication of Unicode Versions 4.1 and 5.0. Michel added IRG sources to Unihan and was a major contributor in defining Unicode security mechanisms.

Ken Whistler was the managing editor of Version 5.0. He led the effort to redesign the book to a smaller size while including expanded script descriptions and all of the Unicode Standard Annexes. He had responsibility for all aspects of production and verified the accuracy and quality of all updates to the text. Ken meticulously updated the Unicode Character Database, adding all of the new characters and some of their properties. He also maintained the Character Names List and supplied many of the annotations. Ken led the rewriting of the parts of the general structure and conformance chapters related to combining classes and the application of combining marks, as well as the renumbering of conformance clauses and definitions.

***

Fonts were essential for the production of this book. Asmus Freytag worked to acquire and organize the font collection with support from Michael Everson, further developing the original collection of fonts for Unicode 2.0 assembled by John Jenkins. In addition to the individuals mentioned previously, and the companies and organizations named in the colophon, fonts were contributed by Patrick Andries (Tifinagh), Cora Chang (Braille), Oliver Corff (Yi), Anton Dumbadze and Irakli Garibashvili (Georgian), Andrew Glass (Kharoshthi), Yannis Haralambous (Greek, Syriac, and Thai), George Kiraz (Syriac), Svante Lagman (Runic), Raymond Mercier (Greek zero), Stephen Morey and Michael Everson (Tai Le), Paul Nelson and Sarmad Hussain (Syriac and Sindhi/Urdu numbers), David Perry and James Kass (Greek musical symbols), Peter Martin (Phonetic Additions), Hector Santos (Philippine scripts), Yayasan Bali Simbar (Balinese), Ngakham Southichack (Lao), Michael Stone (Armenian), Steve Tinney and Michael Everson (Cuneiform), Dirk VanDamme (Coptic), Al Webster (Cherokee), Andrew West (Phags-pa), and K. Yarang, J. R. Pandhak, Y. Lawoti, and Y. P. Yakwa (Limbu).

Michael Everson (Evertype) provided fonts for Canadian Syllabics, Osmanya, many historic scripts (including Kharoshthi, Linear B, Ogham, and Old Persian), symbols, and Latin, Greek, and Cyrillic characters. John M. Fiscella (Production First Software) designed fonts for symbols and many of the alphabetic scripts. Yang Song Jin of the Pyongyang Informatics Centre (DPR of Korea) provided the CJK compatibility symbols. Thomas Milo (DecoType) designed the Arabic font. SIL contributed several fonts designed by Jonathan Kew (Arabic Additions), Peter Martin (Phonetic Additions), as well as Victor Gaultney (New Tai Lue). The fonts for CJK Extensions A and B were provided by Beijing Zhong Yi (Zheng Code) Electronics Company. Extension A was designed by Technical Supervisor Zheng Long and Hua Weicang. Asmus Freytag created many individual glyphs for symbols or special characters.

Critical comments and work on fonts are due to Heidi Jenkins, Dr. Virach Sornlertlamvanich (Thai), Dr. Sarmad Hussain (Urdu), Roozbeh Pournader (Farsi), Barbara Beeton and Patrick Ion (Mathematical Symbols), and many others. Many individuals and organizations provided additional fonts used during the development of Version 5.0.

New figures enhanced the text significantly. Grenfel (1921), Austin (1973), and Allen (1931) were used as sources to draw the large figure for Greek editorial marks. Parisian Schola Cantorum and Hymns of Faith were the sources used for Arabic musical passages. The Kharoshthi map in Figure 10-5 was adapted from Glass (2000).

Steve Mehallo designed the cover for the book. He also updated existing chapter divider artwork and designed additional new artwork for Version 5.0. Kamal Mansour was instrumental in the graphic design process and continued his longstanding support in coordinating the cover design of this book. Monotype Imaging generously sponsored the cost of the cover design, the CD-ROM design, and updates to the chapter divider artwork for Version 5.0.

The development of this book would not have been possible without the support of the office staff of Unicode, Inc., and the work of Mike Kernaghan, as operational manager of the Unicode office. We thank Magda Danish, who helped with Version 5.0 in countless ways, including assistance in editing of the Unicode Standard Annexes, painstaking proofing of Pinyin data for the Unihan database, and additions and corrections for the technical references. We also thank Sarasvati, who minded the mailing lists. We especially wish to thank Microsoft for its generous support in providing office space.


The text, code charts, and data were reviewed critically by experts. The Editorial Committee appreciates the expert contributions and feedback provided for specific scripts: Barbara Beeton (mathematical symbols), Peter Constable (New Tai Lue and phonetic extensions), Roozbeh Pournader (Arabic), Lorna Priest (Cyrillic), Andrew West (Phags-pa, Mongolian and Yi), and the members of the International Forum for Information Technology in Tamil, INFITT (Tamil), in addition to Kent Karlsson and many others. Thomas Bishop checked the data for CJK ideographs and contributed stroke data for sorting the radical-stroke index.

We also wish to acknowledge the inestimable contribution by Patrick Andries for the French translation of the character chart annotations for Unicode 4.1, along with help from François Yergeau, Alain LaBonté, Jacques André, and other reviewers.

A number of individuals contributed to the better representation of Indic scripts in Version 5.0: Stefan Baums (Devanagari), Gihan Dias (Sinhala), Naga Ganesan (Malayalam), Manoj Jain (improvement of the overall Indic text), Gautam Sengupta (Bengali), Sukhjinder Sidhu (Gurmukhi), K. G. Sulochana (Malayalam), and Om Vikas (improvement of the overall Indic text). New characters were added, script descriptions were improved, many annotations were added, and a systematization of the approach to encoding was established.

The work to develop and verify the consistency of many of the character properties and algorithms was a significant contribution to Version 5.0. An important role in this effort was played by the International Components for Unicode (ICU) team, including the following individuals: Min Cui, Mark Davis, John Emmons, Doug Felt, Deborah Goldsmith, Andy Heninger, Qian Jing, Yan Xuan Liang, Alan Liu, Steven Loomis, Eric Mader, George Rhoten, Markus Scherer, Bei Shu, William Sullivan, Raghuram Viswanadha, and Vladimir Weinstein. In addition, Kent Karlsson carefully reviewed properties.

The growth of the synchronized character repertoires of the Unicode Standard and International Standard ISO/IEC 10646 reflects a worldwide effort conducted over a number of years. For Version 5.0, a number of universities and research institutes contributed many excellent proposals for the encoding of minority and historic scripts. The Script Encoding Initiative, University of California at Berkeley, led by Deborah Anderson with the assistance of Rick McGowan, secured funding and created proposals for many historic and minority scripts. Major funders of this effort include the National Endowment for the Humanities, the N’Ko Institute of America and Mamady Doumbouya, the Society of Biblical Literature, Association Manden, and UNESCO (Communication & Information Sector, Initiative B@bel). Other universities and research institutes to which the Unicode Consortium is much indebted include Thesaurus Linguae Graecae Project, University of California, Irvine, the Initiative for Cuneiform Encoding (ICE), Johns Hopkins University, and the International Association for Coptic Studies.

With Version 5.0, the Unicode Standard encodes all of the major modern scripts and a significant number of historic and minority scripts. We express deep appreciation to the following experts who shared their specialized knowledge to bring about this achievement:

• For Arabic additions: Jonathan Kew, Michael Everson, and Roozbeh Pournader.
• For Balinese: Michael Everson and Made Suatjana; thanks are also due to Ida Bagus Adi Sudewa, I Nyoman Suarka, Donny Harimurti, Tudy Harimurti, and Nyoman Sugiarta. Unicode gratefully acknowledges support from UNESCO, the National Endowment for the Humanities, and the Yayasan Bali Galang (Bright Bali Foundation), which organized the technical discussion sessions in Bali.
• For Buginese: Michael Everson.
• For CJK ideographs, symbol, and mark additions: China National Information Technology Standardization Technical Committee, Christopher Cullen, Deborah Goldsmith, John Jenkins, Eric Muller, Michel Suignard, and Andrew West.
• For Coptic: Michael Everson, Gerald Browne, Stephen Emmel, and the International Association for Coptic Studies.
• For Cyrillic additions: Lorna Priest.
• For Ethiopic additions: Daniel Yacob.
• For Georgian additions: Michael Everson, Georgian State Department of Information Technology, David Tarkhan-Mouravi (Chair), and Jost Gippert.
• For Glagolitic: Michael Everson and Ralph Cleminson.
• For Greek: Maria Pantelia, Deborah Anderson, Nick Nicholas, and Richard Peevers.
• For Hebrew additions: Peter Constable, Michael Everson, Peter Kirk, and Mark Shoulson.
• For Indic additions: Government of India, Ministry of Information Technology, Om Vikas and Manoj Jain, INFITT, Michael Kaplan, and Peter Constable.
• For Kannada and Devanagari additions: Michael Everson.
• For Kharoshthi: Andrew Glass, Stefan Baums, and Richard Salomon.
• For Latin additions: Peter Constable, Mark Davis, Michael Everson, Chris Harvey, Jonathan Kew, and Lorna Priest.
• For Mongolian: Andrew West.
• For New Tai Lue: China National Information Technology Standardization Technical Committee and Michael Everson.
• For N’Ko: Michael Everson, Mamadi Doumbouya, Mamadi Baba Diané, and Karamo Kaba Jammeh. Unicode gratefully acknowledges support from UNESCO Initiative B@bel, N’Ko Institute of America and Mamady Doumbouya, and Association Manden.
• For Old Persian: Michael Everson.

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com
The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected]
Visit us on the Web: www.awprofessional.com
Library of Congress Cataloging-in-Publication Data
The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0.
p. cm.
Includes bibliographical references and index.
ISBN 0-321-48091-0 (hardcover : alk. paper)
1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium.
QA268.U545 2007
005.7'22—dc22 2006023526
Copyright © 1991–2007 Unicode, Inc.
All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047
ISBN 0-321-48091-0
Text printed in the United States on recycled paper at Courier in Westford, Massachusetts.
First printing, October 2006

Contents

List of Figures
List of Tables
Foreword by Mark Davis
Preface
    Why Buy This Book
    Why Upgrade to Version 5.0
    Organization of This Book
    Unicode Standard Annexes
    The Unicode Character Database
    Unicode Technical Standards and Unicode Technical Reports
    On the CD-ROM
    Updates and Errata
Acknowledgments
Unicode Consortium Members
Unicode Consortium Liaison Members
Unicode Consortium Board of Directors

1 Introduction
    1.1 Coverage
        Standards Coverage
        New Characters
    1.2 Design Goals
    1.3 Text Handling
        Text Elements

2 General Structure
    2.1 Architectural Context
        Basic Text Processes
        Text Elements, Characters, and Text Processes
        Text Processes and Encoding
    2.2 Unicode Design Principles
        Universality
        Efficiency
        Characters, Not Glyphs
        Semantics
        Plain Text
        Logical Order
        Unification
        Dynamic Composition
        Stability
        Convertibility
    2.3 Compatibility Characters
        Compatibility Variants
        Compatibility Decomposable Characters
        Mapping Compatibility Characters
    2.4 Code Points and Characters
        Types of Code Points
    2.5 Encoding Forms
        UTF-32
        UTF-16
        UTF-8
        Comparison of the Advantages of UTF-32, UTF-16, and UTF-8
    2.6 Encoding Schemes
    2.7 Unicode Strings
    2.8 Unicode Allocation
        Planes
        Allocation Areas and Character Blocks
        Assignment of Code Points
    2.9 Details of Allocation
        Plane 0 (BMP)
        Plane 1
        Plane 2
        Other Planes
    2.10 Writing Direction
    2.11 Combining Characters
        Sequence of Base Characters and Diacritics
        Multiple Combining Characters
        Ligated Multiple Base Characters
        Exhibiting Nonspacing Marks in Isolation
        “Characters” and Grapheme Clusters
    2.12 Equivalent Sequences and Normalization
    2.13 Special Characters and Noncharacters
        Special Noncharacter Code Points
        Byte Order Mark (BOM)
        Layout and Format Control Characters
        The Replacement Character
        Control Codes
    2.14 Conforming to the Unicode Standard
        Characteristics of Conformant Implementations
        Unacceptable Behavior
        Acceptable Behavior
        Supported Subsets

3 Conformance
    3.1 Versions of the Unicode Standard
        Stability
        Version Numbering
        Errata and Corrigenda
        References to the Unicode Standard
        Precision in Version Citation
        References to Unicode Character Properties
        References to Unicode Algorithms
    3.2 Conformance Requirements
        Code Points Unassigned to Abstract Characters
        Interpretation
        Modification
        Character Encoding Forms
        Character Encoding Schemes
        Bidirectional Text
        Normalization Forms
        Normative References
        Unicode Algorithms
        Default Casing Algorithms
        Unicode Standard Annexes
    3.3 Semantics
        Definitions
        Character Identity and Semantics
    3.4 Characters and Encoding
    3.5 Properties
        Types of Properties
        Property Values
        Classification of Properties by Their Values
        Normative and Informative Properties
        Context Dependence
        Stability of Properties
        Simple and Derived Properties
        Property Aliases
        Private Use
    3.6 Combination
    3.7 Decomposition
        Compatibility Decomposition
        Canonical Decomposition
    3.8 Surrogates
    3.9 Unicode Encoding Forms
        UTF-32
        UTF-16
        UTF-8
        Encoding Form Conversion
    3.10 Unicode Encoding Schemes
    3.11 Canonical Ordering Behavior
        Application of Combining Marks
        Combining Classes
        Canonical Ordering
    3.12 Conjoining Jamo Behavior
        Definitions
        Hangul Syllable Boundary Determination
        Standard Korean Syllables
        Hangul Syllable Composition
        Hangul Syllable Decomposition
        Hangul Syllable Name Generation
    3.13 Default Case Algorithms
        Definitions
        Default Case Conversion
        Default Case Detection
        Default Caseless Matching

4 Character Properties
    4.1 Unicode Character Database
    4.2 Case—Normative
        Case Mapping
    4.3 Combining Classes—Normative
        Reordrant, Split, and Subjoined Combining Marks
    4.4 Directionality—Normative
    4.5 General Category—Normative
    4.6 Numeric Value—Normative
        Ideographic Numeric Values
    4.7 Bidi Mirrored—Normative
    4.8 Name—Normative
    4.9 Unicode 1.0 Names
    4.10 Letters, Alphabetic, and Ideographic
    4.11 Properties Related to Text Boundaries
    4.12 Characters with Unusual Properties

5 Implementation Guidelines
    5.1 Transcoding to Other Standards
        Issues
        Multistage Tables
    5.2 Programming Languages and Data Types
        Unicode Data Types for C
    5.3 Unknown and Missing Characters
        Reserved and Private-Use Character Codes
        Interpretable but Unrenderable Characters
        Default Property Values
        Default Ignorable Code Points
        Interacting with Downlevel Systems
    5.4 Handling Surrogate Pairs in UTF-16
    5.5 Handling Numbers
    5.6 Normalization
    5.7 Compression
    5.8 Newline Guidelines
        Definitions
        Line Separator and Paragraph Separator
        Recommendations
    5.9 Regular Expressions
    5.10 Language Information in Plain Text
        Requirements for Language Tagging
        Language Tags and Han Unification
    5.11 Editing and Selection
        Consistent Text Elements
    5.12 Strategies for Handling Nonspacing Marks
        Keyboard Input
        Truncation
    5.13 Rendering Nonspacing Marks
        Canonical Equivalence
        Positioning Methods
    5.14 Locating Text Element Boundaries
    5.15 Identifiers
    5.16 Sorting and Searching
        Culturally Expected Sorting and Searching
        Language-Insensitive Sorting
        Searching
        Sublinear Searching
    5.17 Binary Order
        UTF-8 in UTF-16 Order
        UTF-16 in UTF-8 Order
    5.18 Case Mappings
        Titlecasing
        Complications for Case Mapping
        Reversibility
        Caseless Matching
        Normalization
The Unicode Standard 5.0 – Electronic edition

Copyright © 1991–2007 Unicode, Inc.

xiv

6

7

Contents

5.19 Unicode Security
5.20 Default Ignorable Code Points

Writing Systems and Punctuation
6.1 Writing Systems
6.2 General Punctuation
    Blocks Devoted to Punctuation
    Format Control Characters
    Space Characters
    Dashes and Hyphens
    Paired Punctuation
    Language-Based Usage of Quotation Marks
    Apostrophes
    Other Punctuation
    Archaic Punctuation and Editorial Marks
    Indic Punctuation
    CJK Punctuation
    Unknown or Unavailable Ideographs
    CJK Compatibility Forms

European Alphabetic Scripts
7.1 Latin
    Letters of Basic Latin: U+0041–U+007A
    Letters of the Latin-1 Supplement: U+00C0–U+00FF
    Latin Extended-A: U+0100–U+017F
    Latin Extended-B: U+0180–U+024F
    IPA Extensions: U+0250–U+02AF
    Phonetic Extensions: U+1D00–U+1DBF
    Latin Extended Additional: U+1E00–U+1EFF
    Latin Extended-C: U+2C60–U+2C7F
    Latin Extended-D: U+A720–U+A7FF
    Latin Ligatures: U+FB00–U+FB06
7.2 Greek
    Greek: U+0370–U+03FF
    Greek Extended: U+1F00–U+1FFF
    Ancient Greek Numbers: U+10140–U+1018F
7.3 Coptic
7.4 Cyrillic
    Cyrillic: U+0400–U+04FF
    Cyrillic Supplement: U+0500–U+052F
7.5 Glagolitic
7.6 Armenian
7.7 Georgian

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Figures

Figure 1-1. Wide ASCII
Figure 1-2. Unicode Compared to the 2022 Framework
Figure 2-1. Text Elements and Characters
Figure 2-2. Characters Versus Glyphs
Figure 2-3. Unicode Character Code to Rendered Glyphs
Figure 2-4. Bidirectional Ordering
Figure 2-5. Writing Direction and Numbers
Figure 2-6. Typeface Variation for the Bone Character
Figure 2-7. Dynamic Composition
Figure 2-8. Codespace and Encoded Characters
Figure 2-9. Overlap in Legacy Mixed-Width Encodings
Figure 2-10. Boundaries and Interpretation
Figure 2-11. Unicode Encoding Forms
Figure 2-12. Unicode Encoding Schemes
Figure 2-13. Unicode Allocation
Figure 2-14. Allocation on the BMP
Figure 2-15. Allocation on Plane 1
Figure 2-16. Writing Directions
Figure 2-17. Combining Enclosing Marks for Symbols
Figure 2-18. Sequence of Base Characters and Diacritics
Figure 2-19. Properties and Combining Sequences
Figure 2-20. Reordered Indic Vowel Signs
Figure 2-21. Stacking Sequences
Figure 2-22. Ligated Multiple Base Characters
Figure 2-23. Equivalent Sequences
Figure 2-24. Canonical Ordering
Figure 2-25. Types of Decomposables
Figure 3-1. Enclosing Marks
Figure 4-1. Positions of Common Combining Marks
Figure 5-1. Two-Stage Tables
Figure 5-2. CJK Ideographic Numbers
Figure 5-3. Normalization
Figure 5-4. Consistent Character Boundaries
Figure 5-5. Dead Keys Versus Handwriting Sequence
Figure 5-6. Truncating Grapheme Clusters
Figure 5-7. Inside-Out Rule
Figure 5-8. Fallback Rendering
Figure 5-9. Bidirectional Placement


Tables

Table 2-1. The 10 Unicode Design Principles
Table 2-2. User-Perceived Characters with Multiple Code Points
Table 2-3. Types of Code Points
Table 2-4. The Seven Unicode Encoding Schemes
Table 2-5. Interaction of Combining Characters
Table 2-6. Nondefault Stacking
Table 3-1. Named Unicode Algorithms
Table 3-2. Normative Character Properties
Table 3-3. Informative Character Properties
Table 3-4. Examples of Unicode Encoding Forms
Table 3-5. UTF-16 Bit Distribution
Table 3-6. UTF-8 Bit Distribution
Table 3-7. Well-Formed UTF-8 Byte Sequences
Table 3-8. Summary of UTF-16BE, UTF-16LE, and UTF-16
Table 3-9. Summary of UTF-32BE, UTF-32LE, and UTF-32
Table 3-10. Sample Combining Classes
Table 3-11. Canonical Ordering Results
Table 3-12. Hangul Syllable No-Break Rules
Table 3-13. Korean Syllable Break Examples
Table 3-14. Context Specification for Casing
Table 3-15. Case Detection Examples
Table 4-1. Sources for Case Mapping Information
Table 4-2. Class Zero Combining Marks: Reordrant
Table 4-3. Thai and Lao Logical Order Exceptions
Table 4-4. Class Zero Combining Marks: Split
Table 4-5. Class Zero Combining Marks: Subjoined
Table 4-6. Class Zero Combining Marks: Strikethrough
Table 4-7. General Category
Table 4-8. Primary Numeric Ideographs
Table 4-9. Ideographs Used as Accounting Numbers
Table 4-10. Unusual Properties
Table 5-1. Hex Values for Acronyms
Table 5-2. NLF Platform Correlations
Table 5-3. Typing Order Differing from Canonical Order
Table 5-4. Permuting Combining Class Weights
Table 5-5. Casing and Normalization in Strings
Table 5-6. Paired Stateful Controls
Table 6-1. Typology of Scripts in the Unicode Standard


Preface

This book, The Unicode Standard, Version 5.0, together with the Unicode Character Database, is the authoritative source of information on Version 5.0 of the Unicode character encoding standard. Version 5.0 of the standard is a significant departure from prior versions. It lays out much clearer requirements for supporting Unicode and provides more explicit guidance for implementers to quickly embrace the proliferation of new growth technologies and emerging markets while at the same time meeting users’ needs for secure, robust software.

Why Buy This Book

In a major enhancement, Version 5.0 of the Unicode Standard is now available in a smaller, more convenient size while including much more textual content. Most notably, for the first time the book includes all of the Unicode Standard Annexes, which provide specifications for vital processes such as text normalization, bidirectional handling, and identifier parsing.

Version 5.0 contains the knowledge gained from many years of worldwide implementation experience and has been enhanced significantly: the text incorporates 15 years of user feedback, provides thorough answers to the many questions users of Unicode have raised, and is much more accessible, with greatly improved figures and tables and with the text revised for clarity.

• Four-fifths of the figures are new.
• Two-thirds of the definitions are new.
• One-half of the Unicode Standard Annexes are new.
• One-third of the conformance clauses are new.
• One-fourth of the tables are new.

In addition, the text of Version 5.0 reflects advances in the computer implementation of writing systems. It substantially improves the descriptions of rendering Indic scripts to meet the demands of this area of growing market importance: Unicode-based implementations are supported by the government of India, and this book explains how to build them. Version 5.0 also highlights the newly established core CJK subset of characters, IICore, which is critical for rendering and interoperability in the East Asian market.

In short, The Unicode Standard, Version 5.0, enables developers to implement quickly the latest advances for worldwide software users while opening new opportunities in high-growth markets. The changes from Versions 3.0 and 4.0 to Version 5.0 are major and important; this is the one book all Unicode implementers must have.


Why Upgrade to Version 5.0

Version 5.0 of the Unicode Standard brings significant improvements beyond Versions 3.0 and 4.0. The industry has noticed and is quickly moving to Version 5.0: Windows Vista runs on 5.0; ICU, Google, and Yahoo! all have plans to upgrade to 5.0. Internet and W3C protocols are built on Unicode and are continually adapting to the latest versions. The International Standard ISO/IEC 10646 is also synchronized with Version 5.0.

This latest version of the Unicode Standard is the basis for Unicode security mechanisms, the Unicode collation algorithm, the locale data provided by the Common Locale Data Repository, and support for Unicode in regular expressions. Improved expression of the Unicode encoding model makes it much clearer how implementers need to support the representation of Unicode text in UTF-8 and other encoding forms. Character properties have been systematized and greatly extended to help implementers in support of Unicode text processing. The standard has also established principles of stability for casefolding and identifiers, crucial for interoperability and backward compatibility for formal language use and in other contexts that depend on exact usage and matching of identifiers. Version 5.0 delivers a stable, practical character processing model in sync with today’s information technology needs.

Unicode now offers:

• Round-trip compatibility with the Chinese standards GB18030 and HKSCS
• The specification of the newly established core CJK subset of characters, IICore
• Refinements to casing and bidirectional behavior to meet industry requirements
• Improved Indic rendering guidelines
• Better guidance on the handling of combining characters, Unicode strings, variation selectors, line breaking, and segmentation

Implementers who want to keep pace with the industry and take advantage of a stable foundation for security, to align with the latest collation and locale data definitions, and, most importantly, to expand their market reach need to upgrade to Version 5.0 as soon as possible.

Detailed Change Information. See Appendix D, Changes from Previous Versions, for detailed information about the changes from previous versions of the standard, including character counts, stability guarantees, and updates to the Unicode Character Database and Unicode Standard Annexes.

Version 5.0 of the Unicode Standard corresponds to ISO/IEC 10646:2003 plus Amendments 1 and 2 to that standard and four characters to support Sindhi from Amendment 3.

Organization of This Book

This book and the Unicode Character Database define Version 5.0 of the Unicode Standard. The book gives the general principles, requirements for conformance, guidelines for implementers, character code charts and names, and the Unicode Standard Annexes.


Chapter 1

Introduction


The Unicode Standard is the universal character encoding standard for written characters and text. It defines a consistent way of encoding multilingual text that enables the exchange of text data internationally and creates the foundation for global software. As the default encoding of HTML and XML, the Unicode Standard provides a sound underpinning for the World Wide Web and new methods of business in a networked world. Required in new Internet protocols and implemented in all modern operating systems and computer languages such as Java and C#, Unicode is the basis of software that must function all around the world. With Unicode, the information technology industry has replaced proliferating character sets with data stability, global interoperability and data interchange, simplified software, and reduced development costs.

While taking the ASCII character set as its starting point, the Unicode Standard goes far beyond ASCII’s limited ability to encode only the upper- and lowercase letters A through Z. It provides the capacity to encode all characters used for the written languages of the world: more than 1 million characters can be encoded. No escape sequence or control code is required to specify any character in any language. The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols equivalently, which means they can be used in any mixture and with equal facility (see Figure 1-1).

The Unicode Standard specifies a numeric value (code point) and a name for each of its characters. In this respect, it is similar to other character encoding standards from ASCII onward. In addition to character codes and names, other information is crucial to ensure legible text: a character’s case, directionality, and alphabetic properties must be well defined. The Unicode Standard defines these and other semantic values, and it includes application data such as case mapping tables and character property tables as part of the Unicode Character Database. Character properties define a character’s identity and behavior; they ensure consistency in the processing and interchange of Unicode data. See Section 4.1, Unicode Character Database.

Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The 8-bit, byte-oriented form, UTF-8, has been designed for ease of use with existing ASCII-based systems.

The Unicode Standard, Version 5.0, is code-for-code identical with International Standard ISO/IEC 10646. Any implementation that is conformant to Unicode is therefore conformant to ISO/IEC 10646.
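The three encoding forms can be compared with a short Python sketch. It is an illustration only: the helper name `encoding_forms` is hypothetical, and the big-endian codec names are chosen here simply so that Python does not prepend a byte order mark.

```python
def encoding_forms(ch: str) -> dict:
    """Return the bytes of one character in each Unicode encoding form.

    The "-be" (big-endian) codecs are an assumption of this sketch; they
    keep Python from adding a byte order mark to the output.
    """
    return {
        "UTF-8": ch.encode("utf-8"),
        "UTF-16": ch.encode("utf-16-be"),
        "UTF-32": ch.encode("utf-32-be"),
    }


# An ASCII letter and a non-ASCII letter (U+00E9), side by side.
for ch in ("A", "\u00e9"):
    forms = encoding_forms(ch)
    print(f"U+{ord(ch):04X}", {name: data.hex() for name, data in forms.items()})

# UTF-8 leaves ASCII text byte-for-byte unchanged.
assert "ASCII text".encode("utf-8") == "ASCII text".encode("ascii")
```

The final assertion shows the property that makes UTF-8 convenient for existing ASCII-based systems: pure ASCII data is already valid UTF-8.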

The Unicode Standard 5.0 – Electronic edition

Copyright © 1991–2007 Unicode, Inc.


Figure 1-1. Wide ASCII

[Figure: the string “ASCII/8859-1 text” shown as 8-bit ASCII/8859-1 codes (for example, “A” as 0100 0001), next to Unicode text shown as 16-bit codes (for example, “A” as 0000 0000 0100 0001), in which Latin letters are mixed with Han and Arabic characters.]

The Unicode Standard contains 1,114,112 code points, most of which are available for encoding of characters. The majority of the common characters used in the major languages of the world are encoded in the first 65,536 code points, also known as the Basic Multilingual Plane (BMP). The overall capacity for more than 1 million characters is more than sufficient for all known character encoding requirements, including full coverage of all minority and historic scripts of the world.
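The figures quoted above can be checked with simple arithmetic. The sketch below assumes the usual division of the codespace into planes of 65,536 code points each; the constant and function names are illustrative, not taken from the standard.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # highest code point in the codespace
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1


def plane_of(code_point: int) -> int:
    """Return the plane number of a code point (0 is the BMP)."""
    return code_point // PLANE_SIZE


def in_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return plane_of(ord(ch)) == 0


print(TOTAL_CODE_POINTS)                # 1114112
print(TOTAL_CODE_POINTS // PLANE_SIZE)  # 17
```

The total of 1,114,112 is therefore 17 blocks of 65,536 code points, the first of which is the BMP.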

1.1 Coverage

The Unicode Standard, Version 5.0, contains 99,024 characters from the world’s scripts. These characters are more than sufficient not only for modern communication in most languages, but also for the classical forms of many languages. The standard includes the European alphabetic scripts, Middle Eastern right-to-left scripts, and scripts of Asia, as well as many others. The unified Han subset contains 70,229 ideographic characters defined by national and industry standards of China, Japan, Korea, Taiwan, Vietnam, and Singapore.


Chapter 2

General Structure


This chapter describes the fundamental principles governing the design of the Unicode Standard and presents an informal overview of its main features. The chapter starts by placing the Unicode Standard in an architectural context by discussing the nature of text representation and text processing and its bearing on character encoding decisions. Next, the Unicode Design Principles are introduced—10 basic principles that convey the essence of the standard. The Unicode Design Principles serve as a tutorial framework for understanding the Unicode Standard. The chapter then moves on to the Unicode character encoding model, introducing the concepts of character, code point, and encoding forms, and diagramming the relationships between them. This provides an explanation of the encoding forms UTF-8, UTF-16, and UTF-32 and some general guidelines regarding the circumstances under which one form would be preferable to another. The sections on Unicode allocation then describe the overall structure of the Unicode codespace, showing a summary of the code charts and the locations of blocks of characters associated with different scripts or sets of symbols. Next, the chapter discusses the issue of writing direction and introduces several special types of characters important for understanding the Unicode Standard. In particular, the use of combining characters, the byte order mark, and other special characters is explored in some detail. The section on equivalent sequences and normalization describes the issue of multiple equivalent representations of Unicode text and explains how text can be transformed to use a unique and preferred representation for each character sequence. Finally, there is an informal statement of the conformance requirements for the Unicode Standard. This informal statement, with a number of easy-to-understand examples, gives a general sense of what conformance to the Unicode Standard means. 
The rigorous, formal definition of conformance is given in the subsequent Chapter 3, Conformance.

2.1 Architectural Context

A character code standard such as the Unicode Standard enables the implementation of useful processes operating on textual data. The interesting end products are not the character codes but rather the text processes, because these directly serve the needs of a system’s users. Character codes are like nuts and bolts: minor, but essential and ubiquitous components used in many different ways in the construction of computer software systems. No single design of a character set can be optimal for all uses, so the architecture of the Unicode Standard strikes a balance among several competing requirements.

Basic Text Processes

Most computer systems provide low-level functionality for a small number of basic text processes from which more sophisticated text-processing capabilities are built. The following text processes are supported by most computer systems to some degree:

• Rendering characters visible (including ligatures, contextual forms, and so on)
• Breaking lines while rendering (including hyphenation)
• Modifying appearance, such as point size, kerning, underlining, slant, and weight (light, demi, bold, and so on)
• Determining units such as “word” and “sentence”
• Interacting with users in processes such as selecting and highlighting text
• Accepting keyboard input and editing stored text through insertion and deletion
• Comparing text in operations such as in searching or determining the sort order of two strings
• Analyzing text content in operations such as spell-checking, hyphenation, and parsing morphology (that is, determining word roots, stems, and affixes)
• Treating text as bulk data for operations such as compressing and decompressing, truncating, transmitting, and receiving

Text Elements, Characters, and Text Processes

One of the more profound challenges in designing a character encoding stems from the fact that there is no universal set of fundamental units of text. Instead, the division of text into text elements necessarily varies by language and text process. For example, in traditional German orthography, the letter combination “ck” is a text element for the process of hyphenation (where it appears as “k-k”), but not for the process of sorting. In Spanish, the combination “ll” may be a text element for the traditional process of sorting (where it is sorted between “l” and “m”), but not for the process of rendering. In English, the letters “A” and “a” are usually distinct text elements for the process of rendering, but generally not distinct for the process of searching text.

The text elements in a given language depend upon the specific text process; a text element for spell-checking may have different boundaries from a text element for sorting purposes. For example, in the phrase “the quick brown fox,” the sequence “fox” is a text element for the purpose of spell-checking.


In contrast, a character encoding standard provides a single set of fundamental units of encoding, to which it uniquely assigns numerical code points. These units, called assigned characters, are the smallest interpretable units of stored text. Text elements are then represented by a sequence of one or more characters. Figure 2-1 illustrates the relationship between several different types of text elements and the characters that are used to represent those text elements. Unicode Standard Annex #29, “Text Boundaries,” provides more details regarding the specifications of boundaries.

Figure 2-1. Text Elements and Characters

Text Elements                   Characters
Composite: Ç                    Ç, or C followed by a combining cedilla
Collation unit: ch (Slovak)     c h
Syllable: [glyphs not recoverable]
Word: cat                       c a t

The design of the character encoding must provide precisely the set of characters that allows programmers to design applications capable of implementing a variety of text processes in the desired languages. Therefore, the text elements encountered in most text processes are represented as sequences of character codes. See Unicode Standard Annex #29, “Text Boundaries,” for detailed information on how to segment character strings into common types of text elements. Certain text elements correspond to what users perceive as single characters. These are called grapheme clusters.
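The composite row of Figure 2-1 can be reproduced with Python's standard `unicodedata` module. This is a minimal sketch: the variable names and the `code_points` helper are illustrative, and the use of normalization forms NFC and NFD is an assumption made here to convert between the two representations.

```python
import unicodedata

precomposed = "\u00C7"   # the text element as a single character
decomposed = "C\u0327"   # LATIN CAPITAL LETTER C followed by COMBINING CEDILLA


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


print(code_points(precomposed))  # ['U+00C7']
print(code_points(decomposed))   # ['U+0043', 'U+0327']

# The two strings differ code point by code point, yet normalization
# maps each representation onto the other.
assert precomposed != decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed
```

Both strings represent the same text element; only the number of characters used to encode it differs.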

Text Processes and Encoding

In the case of English text using an encoding scheme such as ASCII, the relationships between the encoding and the basic text processes built on it are seemingly straightforward: characters are generally rendered visible one by one in distinct rectangles from left to right in linear order. Thus one character code inside the computer corresponds to one logical character in a process such as simple English rendering.


When designing an international and multilingual text encoding such as the Unicode Standard, the relationship between the encoding and implementation of basic text processes must be considered explicitly, for several reasons:

• Many assumptions about character rendering that hold true for the English alphabet fail for other writing systems. Characters in these other writing systems are not necessarily rendered visible one by one in rectangles from left to right. In many cases, character positioning is quite complex and does not proceed in a linear fashion. See Section 8.2, Arabic, and Section 9.1, Devanagari, for detailed examples of this situation.
• It is not always obvious that one set of text characters is an optimal encoding for a given language. For example, two approaches exist for the encoding of accented characters commonly used in French or Swedish: ISO/IEC 8859 defines letters such as “ä” and “ö” as individual characters, whereas ISO 5426 represents them by composition with diacritics instead. In the Swedish language, both are considered distinct letters of the alphabet, following the letter “z”. In French, the diaeresis on a vowel merely marks it as being pronounced in isolation. In practice, both approaches can be used to implement either language.
• No encoding can support all basic text processes equally well. As a result, some trade-offs are necessary. For example, following common practice, Unicode defines separate codes for uppercase and lowercase letters. This choice causes some text processes, such as rendering, to be carried out more easily, but other processes, such as comparison, to become more difficult. A different encoding design for English, such as case-shift control codes, would have the opposite effect. In designing a new encoding scheme for complex scripts, such trade-offs must be evaluated and decisions made explicitly, rather than unconsciously.
For these reasons, design of the Unicode Standard is not specific to the design of particular basic text-processing algorithms. Instead, it provides an encoding that can be used with a wide variety of algorithms. In particular, sorting and string comparison algorithms cannot assume that the assignment of Unicode character code numbers provides an alphabetical ordering for lexicographic string comparison. Culturally expected sorting orders require arbitrarily complex sorting algorithms. The expected sort sequence for the same characters differs across languages; thus, in general, no single acceptable lexicographic ordering exists. See Unicode Technical Standard #10, “Unicode Collation Algorithm,” for the standard default mechanism for comparing Unicode strings.

Text processes supporting many languages are often more complex than they are for English. The character encoding design of the Unicode Standard strives to minimize this additional complexity, enabling modern computer systems to interchange, render, and manipulate text in a user’s own script and language, and possibly in other languages as well.

Character Identity. Whenever Unicode makes statements about the default layout behavior of characters, it is in an attempt to ensure that users and implementers face no ambiguities as to which characters or character sequences to use for a given purpose. For bidirectional writing systems, this includes the specification of the sequence in which characters are to be encoded so as to correspond to a specific reading order when displayed. See Section 2.10, Writing Direction.

The actual layout in an implementation may differ in detail. A mathematical layout system, for example, will have many additional, domain-specific rules for layout, but a well-designed system leaves no ambiguities as to which character codes are to be used for a given aspect of the mathematical expression being encoded.

The purpose of defining Unicode default layout behavior is not to enforce a single and specific aesthetic layout for each script, but rather to encourage uniformity in encoding. In that way implementers of layout systems can rely on the fact that a user would have chosen a particular character sequence for a given purpose, and users can rely on the fact that implementers will create a layout for a particular character sequence that matches the intent of the user to within the capabilities or technical limitations of the implementation.

In other words, the ideal is that two users who are familiar with the standard and who are presented with the same text would choose the same sequence of character codes to encode the text. In actual practice there are many limitations that mean this goal cannot always be realized.
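The earlier point that code point order does not provide an alphabetical ordering can be seen in a few lines of Python. The word list and function names below are made up for illustration, and the case-folded sort is only a rough approximation chosen for this sketch; it is not an implementation of the Unicode Collation Algorithm.

```python
WORDS = ["zebra", "Apple", "apple", "Zebra", "\u00e4rger"]


def sort_by_code_point(words: list) -> list:
    """Sort strings by comparing their code points, Python's default."""
    return sorted(words)


def sort_by_casefold(words: list) -> list:
    """Sort while ignoring case; still not a culturally aware collation."""
    return sorted(words, key=str.casefold)


# Uppercase letters have lower code points than lowercase letters, so
# "Zebra" lands before "apple"; U+00E4 sorts after "z".
print(sort_by_code_point(WORDS))

# Case folding brings "Apple" and "apple" together, but the word that
# starts with U+00E4 still sorts last, which German readers would not expect.
print(sort_by_casefold(WORDS))
```

A language-appropriate result requires collation data of the kind described in Unicode Technical Standard #10, which Python's standard library does not implement directly.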

2.2 Unicode Design Principles

The design of the Unicode Standard reflects the 10 fundamental principles stated in Table 2-1. Not all of these principles can be satisfied simultaneously. The design strikes a balance between maintaining consistency for the sake of simplicity and efficiency and maintaining compatibility for interchange with existing standards.

Table 2-1. The 10 Unicode Design Principles

Principle                 Statement
Universality              The Unicode Standard provides a single, universal repertoire.
Efficiency                Unicode text is simple to parse and process.
Characters, not glyphs    The Unicode Standard encodes characters, not glyphs.
Semantics                 Characters have well-defined semantics.
Plain text                Unicode characters represent plain text.
Logical order             The default for memory representation is logical order.
Unification               The Unicode Standard unifies duplicate characters within scripts across languages.
Dynamic composition       Accented forms can be dynamically composed.
Stability                 Characters, once assigned, cannot be reassigned and key properties are immutable.
Convertibility            Accurate convertibility is guaranteed between the Unicode Standard and other widely accepted standards.


Universality

The Unicode Standard encodes a single, very large set of characters, encompassing all the characters needed for worldwide use. This single repertoire is intended to be universal in coverage, containing all the characters for textual representation in all modern writing systems, in most historic writing systems, and for symbols used in plain text. The Unicode Standard is designed to meet the needs of diverse user communities within each language, serving business, educational, liturgical and scientific users, and covering the needs of both modern and historical texts.

Despite its aim of universality, the Unicode Standard considers the following to be outside its scope: writing systems for which insufficient information is available to enable reliable encoding of characters, writing systems that have not become standardized through use, and writing systems that are nontextual in nature.

Because the universal repertoire is known and well defined in the standard, it is possible to specify a rich set of character semantics. By relying on those character semantics, implementations can provide detailed support for complex operations on text in a portable way. See “Semantics” later in this section.

Efficiency

The Unicode Standard is designed to make efficient implementation possible. There are no escape characters or shift states in the Unicode character encoding model. Each character code has the same status as any other character code; all codes are equally accessible. All Unicode encoding forms are self-synchronizing and non-overlapping. This makes randomly accessing and searching inside streams of characters efficient.

By convention, characters of a script are grouped together as far as is practical. Not only is this practice convenient for looking up characters in the code charts, but it makes implementations more compact and compression methods more efficient. The common punctuation characters are shared.

Format characters are given specific and unambiguous functions in the Unicode Standard. This design simplifies the support of subsets. To keep implementations simple and efficient, stateful controls and format characters are avoided wherever possible.
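Self-synchronization can be illustrated for UTF-8 with a small Python sketch. It relies on a property of UTF-8 that is not spelled out in this section: every continuation byte has the bit pattern 10xxxxxx, and no lead byte does. The function name `char_start` is hypothetical, and the input is assumed to be well-formed UTF-8.

```python
def char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character containing data[index].

    Assumes data is well-formed UTF-8. Continuation bytes match 10xxxxxx,
    so stepping back over them always reaches the lead byte of the character.
    """
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index


data = "a\u20acb".encode("utf-8")  # the euro sign occupies bytes 1 to 3
print(data.hex(" "))               # 61 e2 82 ac 62
print(char_start(data, 3))         # 1
```

Because a character boundary can be found from any byte offset by inspecting at most a few neighboring bytes, a program can jump into the middle of a stream and resynchronize without decoding from the beginning.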

Characters, Not Glyphs

The Unicode Standard draws a distinction between characters and glyphs. Characters are the abstract representations of the smallest components of written language that have semantic value. They represent primarily, but not exclusively, the letters, punctuation, and other signs that constitute natural language text and technical notation. The letters used in natural language text are grouped into scripts: sets of letters that are used together in writing languages. Letters in different scripts, even when they correspond either semantically or graphically, are represented in Unicode by distinct characters. This is true even in those instances where they correspond in semantics, pronunciation, or appearance.

Characters are represented by code points that reside only in a memory representation, as strings in memory, on disk, or in data transmission. The Unicode Standard deals only with character codes.

Glyphs represent the shapes that characters can have when they are rendered or displayed. In contrast to characters, glyphs appear on the screen or paper as particular representations of one or more characters. A repertoire of glyphs makes up a font. Glyph shape and methods of identifying and selecting glyphs are the responsibility of individual font vendors and of appropriate standards and are not part of the Unicode Standard.

Various relationships may exist between character and glyph: a single glyph may correspond to a single character or to a number of characters, or multiple glyphs may result from a single character. The distinction between characters and glyphs is illustrated in Figure 2-2.

Figure 2-2. Characters Versus Glyphs

Glyphs                  Unicode Characters
[glyph variants]        U+0041 LATIN CAPITAL LETTER A
[glyph variants]        U+0061 LATIN SMALL LETTER A
[glyph variants]        U+043F CYRILLIC SMALL LETTER PE
[glyph variants]        U+0647 ARABIC LETTER HEH
[glyph variants]        U+0066 LATIN SMALL LETTER F + U+0069 LATIN SMALL LETTER I

Even the letter “a” has a wide variety of glyphs that can represent it. A lowercase Cyrillic “п” also has a variety of glyphs; the second glyph for U+043F CYRILLIC SMALL LETTER PE shown in Figure 2-2 is customary for italic in Russia, while the third is customary for italic in Serbia. Arabic letters are displayed with different glyphs, depending on their position in a word; the glyphs in Figure 2-2 show independent, final, initial, and medial forms. Sequences such as “fi” may be displayed with two independent glyphs or with a ligature glyph.

What the user thinks of as a single character (which may or may not be represented by a single glyph) may be represented in the Unicode Standard as multiple code points. See Table 2-2 for additional examples.

Table 2-2. User-Perceived Characters with Multiple Code Points

Character   Code Points       Linguistic Usage
ch          0063 0068         Slovak, traditional Spanish
tʰ          0074 02B0         Native American languages
x̣           0078 0323         Native American languages
ƛ̓           019B 0313         Native American languages
ą́           00E1 0328         Native American languages
i̇́           0069 0307 0301    Lithuanian
ト゚          30C8 309A         Ainu (in kana transcription)

For certain scripts, such as Arabic and the various Indic scripts, the number of glyphs needed to display a given script may be significantly larger than the number of characters encoding the basic units of that script. The number of glyphs may also depend on the orthographic style supported by the font. For example, an Arabic font intended to support the Nastaliq style of Arabic script may possess many thousands of glyphs. However, the character encoding employs the same few dozen letters regardless of the font style used to depict the character data in context.

A font and its associated rendering process define an arbitrary mapping from Unicode characters to glyphs. Some of the glyphs in a font may be independent forms for individual characters; others may be rendering forms that do not directly correspond to any single character.

Text rendering requires that characters in memory be mapped to glyphs. The final appearance of rendered text may depend on context (neighboring characters in the memory representation), variations in typographic design of the fonts used, and formatting information (point size, superscript, subscript, and so on). The results on screen or paper can differ considerably from the prototypical shape of a letter or character, as shown in Figure 2-3.

For the Latin script, this relationship between character code sequence and glyph is relatively simple and well known; for several other scripts, it is documented in this standard. However, in all cases, fine typography requires a more elaborate set of rules than given here. The Unicode Standard documents the default relationship between character sequences and glyphic appearance for the purpose of ensuring that the same text content can be stored with the same, and therefore interchangeable, sequence of character codes.
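The code point sequences listed in Table 2-2 can be inspected with Python's `unicodedata` module. The dictionary keys and the `describe` helper below are illustrative names only; the code point values are the ones given in the table.

```python
import unicodedata

# A few rows of Table 2-2, keyed by an informal description.
EXAMPLES = {
    "Slovak ch": "\u0063\u0068",
    "Lithuanian i with dot above and acute": "\u0069\u0307\u0301",
    "Ainu kana": "\u30C8\u309A",
}


def describe(text: str) -> list:
    """Pair each code point of a string with its Unicode character name."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in text]


for label, text in EXAMPLES.items():
    # len() counts code points, not user-perceived characters.
    print(label, len(text), describe(text))
```

Each entry is one character from the user's point of view, yet `len` reports two or three, because Python strings are sequences of code points.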

Semantics

Characters have well-defined semantics. These semantics are defined by explicitly assigned character properties, rather than implied through the character name or the position of a character in the code tables (see Section 3.5, Properties). The Unicode Character Database provides machine-readable character property tables for use in implementations of parsing, sorting, and other algorithms requiring semantic knowledge about the code points.


Figure 2-3. Unicode Character Code to Rendered Glyphs

[Figure: a text character sequence (U+092A, U+0942, U+0930, U+094D, U+0924, U+093F, shown in binary) passes through the text rendering process, which draws on a font (glyph source) to produce the rendered glyphs.]

These properties are supplemented by the description of script and character behavior in this standard. See also Unicode Technical Report #23, “The Unicode Character Property Model.”

The Unicode Standard identifies more than 50 different character properties, including numeric, casing, combination, and directionality properties (see Chapter 4, Character Properties). Additional properties may be defined as needed from time to time.

Where characters are used in different ways in different languages, the relevant properties are normally defined outside the Unicode Standard. For example, Unicode Technical Standard #10, “Unicode Collation Algorithm,” defines a set of default collation weights that can be used with a standard algorithm. Tailorings for each language are provided in the Common Locale Data Repository (CLDR); see Section B.6, Other Unicode Online Resources.

The Unicode Standard, by supplying a universal repertoire associated with well-defined character semantics, does not require the code set independent model of internationalization and text handling. That model abstracts away string handling as manipulation of byte streams of unknown semantics to protect implementations from the details of hundreds of different character encodings and selectively late-binds locale-specific character properties to characters. Of course, it is always possible for code set independent implementations to retain their model and to treat Unicode characters as just another character set in that context. It is not at all unusual for Unix implementations to simply add UTF-8 as another character set, parallel to all the other character sets they support.

By contrast, the Unicode approach, because it is associated with a universal repertoire, assumes that characters and their properties are inherently and inextricably associated. If an internationalized application can be structured to work directly in terms of Unicode characters, all levels of the implementation can reliably and efficiently access character storage and be assured of the universal applicability of character property semantics.
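As an illustration of properties being bound to characters rather than to a locale, the following sketch queries a handful of them through Python's `unicodedata` module, which exposes part of the Unicode Character Database. The `properties` helper and its dictionary keys are names chosen for this example, and the values reflect whichever Unicode version the Python build ships with.

```python
import unicodedata


def properties(ch: str) -> dict:
    """Collect a few character properties for a single character."""
    return {
        "name": unicodedata.name(ch),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
        # The second argument is returned when there is no numeric value.
        "numeric_value": unicodedata.numeric(ch, None),
    }


# A Latin letter, an Arabic-Indic digit, and a combining accent.
for ch in ("A", "\u0663", "\u0301"):
    print(f"U+{ord(ch):04X}", properties(ch))
```

The same lookup works for any character regardless of the language of the surrounding text.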

Plain Text

Plain text is a pure sequence of character codes; plain Unicode-encoded text is therefore a sequence of Unicode character codes. In contrast, styled text, also known as rich text, is any text representation consisting of plain text plus added information such as a language identifier, font size, color, hypertext links, and so on. For example, the text of this book, a multifont text as formatted by a book editing system, is rich text.

The simplicity of plain text gives it a natural role as a major structural element of rich text. SGML, RTF, HTML, XML, and TeX are examples of rich text fully represented as plain text streams, interspersing plain text data with sequences of characters that represent the additional data structures. They use special conventions embedded within the plain text file, such as “<p>”, to distinguish the markup or tags from the “real” content. Many popular word processing packages rely on a buffer of plain text to represent the content and implement links to a parallel store of formatting data.

The relative functional roles of both plain text and rich text are well established:

• Plain text is the underlying content stream to which formatting can be applied.
• Rich text carries complex formatting information as well as text context.
• Plain text is public, standardized, and universally readable.
• Rich text representation may be implementation-specific or proprietary.

Although some rich text formats have been standardized or made public, the majority of rich text designs are vehicles for particular implementations and are not necessarily readable by other implementations. Given that rich text equals plain text plus added information, the extra information in rich text can always be stripped away to reveal the “pure” text underneath. This operation is often employed, for example, in word processing systems that use both their own private rich text format and plain text file format as a universal, if limited, means of exchange. Thus, by default, plain text represents the basic, interchangeable content of text.

Plain text represents character content only, not its appearance. It can be displayed in a variety of ways and requires a rendering process to make it visible with a particular appearance.

Copyright © 1991-2007, Unicode, Inc.

The Unicode Standard 5.0 – Electronic edition


If the same plain text sequence is given to disparate rendering processes, there is no expectation that rendered text in each instance should have the same appearance. Instead, the disparate rendering processes are simply required to make the text legible according to the intended reading. This legibility criterion constrains the range of possible appearances. The relationship between appearance and content of plain text may be summarized as follows: Plain text must contain enough information to permit the text to be rendered legibly, and nothing more. The Unicode Standard encodes plain text. The distinction between plain text and other forms of data in the same data stream is the function of a higher-level protocol and is not specified by the Unicode Standard itself.

Logical Order The order in which Unicode text is stored in the memory representation is called logical order. This order roughly corresponds to the order in which text is typed in via the keyboard; it also roughly corresponds to phonetic order. For decimal numbers, the logical order consistently corresponds to the most significant digit first, which is the order expected by number-parsing software. When displayed, this logical order often corresponds to a simple linear progression of characters in one direction, such as from left to right, right to left, or top to bottom. In other circumstances, text is displayed or printed in an order that differs from a single linear progression. Some of the clearest examples are situations where a right-to-left script (such as Arabic or Hebrew) is mixed with a left-to-right script (such as Latin or Greek). For example, when the text in Figure 2-4 is ordered for display, the glyph that represents the first character of the English text appears at the left. The logical start character of the Hebrew text, however, is represented by the Hebrew glyph closest to the right margin. The succeeding Hebrew glyphs are laid out to the left.

Figure 2-4. Bidirectional Ordering

In logical order, numbers are encoded with most significant digit first, but are displayed in different writing directions. As shown in Figure 2-5, these writing directions do not always correspond to the writing direction of the surrounding text. The first example shows N’Ko, a right-to-left script with digits that also render right to left. Examples 2 and 3 show Hebrew and Arabic, in which the numbers are rendered left to right, resulting in bidirectional layout. In left-to-right scripts, such as Latin and Hiragana and Katakana (for Japanese), numbers follow the predominant left-to-right direction of the script, as shown in Examples 4 and 5. When Japanese is laid out vertically, numbers are either laid out vertically or may be rotated clockwise 90 degrees to follow the layout direction of the lines, as shown in Example 6.

Figure 2-5. Writing Direction and Numbers

[Figure examples that survive extraction, given here in logical order: Hebrew “נא ראה עמוד 1123.”; English “Please see page 1123.”; Japanese “1123ページをみてください。”]

The Unicode Standard precisely defines the conversion of Unicode text from logical order to the order of readable (displayed) text so as to ensure consistent legibility. Properties of directionality inherent in characters generally determine the correct display order of text. The Unicode Bidirectional Algorithm specifies how these properties are used to resolve directional interactions when characters of right-to-left and left-to-right directionality are mixed. (See Unicode Standard Annex #9, “The Bidirectional Algorithm.”) However, when characters of different directionality are mixed, inherent directionality alone is occasionally insufficient to render plain text legibly. The Unicode Standard therefore includes characters to explicitly specify changes in direction when necessary. The Bidirectional Algorithm uses these directional layout control characters together with the inherent directional properties of characters to exert exact control over the display ordering for legible interchange. By requiring the use of this algorithm, the Unicode Standard ensures that plain text used for simple items like file names or labels can always be correctly ordered for display. Besides mixing runs of differing overall text direction, there are many other cases where the logical order does not correspond to a linear progression of characters. Combining characters (such as accents) are stored following the base character to which they apply, but are positioned relative to that base character and thus do not follow a simple linear progression in the final rendered text. For example, a Latin letter “x” carrying an accent below is stored as “x” followed by the combining accent character; the accent appears below, not to the right of the base. This position with respect to the base holds even where the overall text progression is from top to bottom, for example with the accented “x” appearing upright within a vertical Japanese line.
Characters may also combine into ligatures or conjuncts or otherwise change positions of their components radically, as shown in Figure 2-3 and Figure 2-20. There is one particular exception to the usual practice of logical order paralleling phonetic order. With the Thai and Lao scripts, users traditionally type in visual order rather than phonetic order, resulting in some vowel letters being stored ahead of consonants, even though they are pronounced after them.
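
The inherent directional properties and the base-then-mark storage order described in this section can be inspected with a short sketch. It assumes Python's standard `unicodedata` module as the lookup tool (not something the standard prescribes), and it picks U+0323 COMBINING DOT BELOW as an illustrative below-base accent; the sample characters are likewise chosen only for illustration.

```python
import unicodedata

# Illustrative samples: Latin letter, Hebrew letter, Arabic letter,
# European digit, and a combining mark (U+0323 is an assumed example).
samples = ["a", "\u05D0", "\u0627", "1", "\u0323"]

# Bidirectional class of each character, as recorded in the
# Unicode Character Database shipped with Python.
bidi_classes = {f"U+{ord(ch):04X}": unicodedata.bidirectional(ch) for ch in samples}

for code, bidi in bidi_classes.items():
    print(code, bidi)

# In logical order a combining mark is stored after its base character,
# even though it is rendered below the base rather than to its right.
accented_x = "x\u0323"
stored_order = [f"U+{ord(ch):04X}" for ch in accented_x]
mark_class = unicodedata.combining("\u0323")  # nonzero means "combining mark"
print(stored_order, mark_class)
```

The sketch only reads character properties; resolving the actual display order of mixed-direction text is the job of the Bidirectional Algorithm in Unicode Standard Annex #9, which is not implemented here.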


Unification The Unicode Standard avoids duplicate encoding of characters by unifying them within scripts across language. Common letters are given one code each, regardless of language, as are common Chinese/Japanese/Korean (CJK) ideographs. (See Section 12.1, Han.) Punctuation marks, symbols, and diacritics are handled in a similar manner as letters. If they can be clearly identified with a particular script, they are encoded once for that script and are unified across any languages that may use that script. See, for example, U+1362 ethiopic full stop, U+060F arabic sign misra, and U+0592 hebrew accent segol. However, some punctuation or diacritic marks may be shared in common across a number of scripts—the obvious example being Western-style punctuation characters, which are often recently added to the writing systems of scripts other than Latin. In such cases, characters are encoded only once and are intended for use with multiple scripts. Common symbols are also encoded only once and are not associated with any script in particular. It is quite normal for many characters to have different usages, such as comma “,” for either thousands-separator (English) or decimal-separator (French). The Unicode Standard avoids duplication of characters due to specific usage in different languages; rather, it duplicates characters only to support compatibility with base standards. Avoidance of duplicate encoding of characters is important to avoid visual ambiguity. There are a few notable instances in the standard where visual ambiguity between different characters is tolerated, however. For example, in most fonts there is little or no distinction visible between Latin “o”, Cyrillic “o”, and Greek “o” (omicron). These are not unified because they are characters from three different scripts, and many legacy character encodings distinguish between them. 
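
As a small illustration of the tolerated visual ambiguity just described, the following sketch lists the three look-alike letters with their separate code points. It assumes Python's standard `unicodedata` module for name lookup; the variable names are illustrative only.

```python
import unicodedata

# Latin, Cyrillic, and Greek letters that look alike in most fonts.
lookalikes = ["\u006F", "\u043E", "\u03BF"]

names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")

# They are three distinct encoded characters, so string comparison
# treats them as different even when they render identically.
all_distinct = len(set(lookalikes)) == 3
```

A search for the Latin letter will therefore not match the Cyrillic or Greek one, which is the intended consequence of keeping the scripts separate.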
As another example, there are three characters whose glyph is the same uppercase barred D shape, but they correspond to three distinct lowercase forms. Unifying these uppercase characters would have resulted in unnecessary complications for case mapping. The Unicode Standard does not attempt to encode features such as language, font, size, positioning, glyphs, and so forth. For example, it does not preserve language as a part of character encoding: just as French i grec, German ypsilon, and English wye are all represented by the same character code, U+0059 “Y”, so too are Chinese zi, Japanese ji, and Korean ja all represented as the same character code, U+5B57 字. In determining whether to unify variant CJK ideograph forms across standards, the Unicode Standard follows the principles described in Section 12.1, Han. Where these principles determine that two forms constitute a trivial difference, the Unicode Standard assigns a single code. Just as for the Latin and other scripts, typeface distinctions or local preferences in glyph shapes alone are not sufficient grounds for disunification of a character. Figure 2-6 illustrates the well-known example of the CJK ideograph for “bone,” which shows significant shape differences from typeface to typeface, with some forms preferred in China and some in Japan. All of these forms are considered to be the same character, encoded at U+9AA8 in the Unicode Standard.


Figure 2-6. Typeface Variation for the Bone Character

Many characters in the Unicode Standard could have been unified with existing visually similar Unicode characters or could have been omitted in favor of some other Unicode mechanism for maintaining the kinds of text distinctions for which they were intended. However, considerations of interoperability with other standards and systems often require that such compatibility characters be included in the Unicode Standard. See Section 2.3, Compatibility Characters. In particular, whenever font style, size, positioning or precise glyph shape carry a specific meaning and are used in distinction to the ordinary character (for example, in phonetic or mathematical notation), the characters are not unified.

Dynamic Composition The Unicode Standard allows for the dynamic composition of accented forms and Hangul syllables. Combining characters used to create composite forms are productive. Because the process of character composition is open-ended, new forms with modifying marks may be created from a combination of base characters followed by combining characters. For example, the diaeresis “¨” may be combined with all vowels and a number of consonants in languages using the Latin script and several other scripts, as shown in Figure 2-7.

Figure 2-7. Dynamic Composition

[Figure: A (U+0041) + combining diaeresis ◌̈ (U+0308) → Ä]

Equivalent Sequences. Some text elements can be encoded either as static precomposed forms or by dynamic composition. Common precomposed forms such as U+00DC “Ü” latin capital letter u with diaeresis are included for compatibility with current standards. For static precomposed forms, the standard provides a mapping to an equivalent dynamically composed sequence of characters. (See also Section 3.7, Decomposition.) Thus different sequences of Unicode characters are considered equivalent. A precomposed character may be represented as an equivalent composed character sequence (see Section 2.12, Equivalent Sequences and Normalization).
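
The equivalence between a precomposed form and its dynamically composed sequence can be sketched as follows. The example assumes Python's standard `unicodedata.normalize` function as a stand-in for the normalization forms the standard defines; the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00DC"  # LATIN CAPITAL LETTER U WITH DIAERESIS

# Canonical decomposition (NFD) yields the base letter followed by
# the combining diaeresis.
decomposed = unicodedata.normalize("NFD", precomposed)
decomposed_points = [f"U+{ord(ch):04X}" for ch in decomposed]

# Canonical composition (NFC) maps the sequence back to the precomposed form.
recomposed = unicodedata.normalize("NFC", decomposed)

# The raw strings differ code point by code point, but they compare
# equal once both are brought to the same normalization form.
raw_equal = precomposed == decomposed
normalized_equal = recomposed == precomposed
print(decomposed_points, raw_equal, normalized_equal)
```

Comparing text only after normalizing both sides to one form is a common way to honor this equivalence in practice.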

Stability Certain aspects of the Unicode Standard must be absolutely stable between versions, so that implementers and users can be guaranteed that text data, once encoded, retains the same meaning. Most importantly, this means that once Unicode characters are assigned, their code point assignments cannot be changed, nor can characters be removed.


Characters are retained in the standard, so that previously conforming data stay conformant in future versions of the standard. Sometimes characters are deprecated—that is, their use in new documents is discouraged. Usually, this is because the characters were found not to be needed, and their continued use would merely result in duplicate ways of encoding the same information. While implementations should continue to recognize such characters when they are encountered, spell-checkers or editors could warn users of their presence and suggest replacements. Unicode character names are also never changed, so that they can be used as identifiers that are valid across versions. See Section 4.8, Name—Normative. Similar stability guarantees exist for certain important properties. For example, the decompositions are kept stable, so that it is possible to normalize a Unicode text once and have it remain normalized in all future versions. For a list of stability policies for the Unicode Standard, see Appendix F, Unicode Encoding Stability Policies.

Convertibility Character identity is preserved for interchange with a number of different base standards, including national, international, and vendor standards. Where variant forms (or even the same form) are given separate codes within one base standard, they are also kept separate within the Unicode Standard. This choice guarantees the existence of a mapping between the Unicode Standard and base standards. Accurate convertibility is guaranteed between the Unicode Standard and other standards in wide usage as of May 1993. Characters have also been added to allow convertibility to several important East Asian character sets created after that date—for example, GB 18030. In general, a single code point in another standard will correspond to a single code point in the Unicode Standard. Sometimes, however, a single code point in another standard corresponds to a sequence of code points in the Unicode Standard, or vice versa. Conversion between Unicode text and text in other character codes must, in general, be done by explicit table-mapping processes. (See also Section 5.1, Transcoding to Other Standards.)

2.3 Compatibility Characters Compatibility Variants Conceptually, compatibility characters are those that would not have been encoded except for compatibility and round-trip convertibility with other standards. They are variants of characters that already have encodings as normal (that is, non-compatibility) characters in the Unicode Standard; as such, they are more properly referred to as compatibility variants. Examples of compatibility variants in this sense include all of the glyph variants in the Compatibility and Specials Area: halfwidth or fullwidth characters from East Asian character encoding standards, Arabic contextual form glyphs from preexisting Arabic code pages, Arabic ligatures and ligatures from other scripts, and so on. Other examples include CJK compatibility ideographs, which are generally duplicates of a unified Han ideograph, and legacy alternate format characters such as U+206C inhibit arabic form shaping. The fact that a character can be considered a compatibility variant does not mean that the character is deprecated in the standard. The use of many compatibility variants in general interchange is unproblematic. Some, however, such as Arabic contextual forms or vertical forms, can lead to problems when used in general interchange. In identifiers, compatibility variants should be avoided because of their visual similarity with regular characters. (See Unicode Technical Report #36, “Unicode Security Considerations.”) The Compatibility and Specials Area contains a large number of compatibility characters, but the Unicode Standard also contains many compatibility characters that do not appear in that area. These include examples such as U+2163 “Ⅳ” roman numeral four, U+2007 figure space, and U+00B2 “²” superscript two. There is no formal listing of all compatibility characters in the Unicode Standard.

Compatibility Decomposable Characters There is a second, narrow sense of the term “compatibility character” in the Unicode Standard, corresponding to the notion of a compatibility decomposable introduced in Section 2.2, Unicode Design Principles. This sense is strictly defined as any Unicode character whose compatibility decomposition is not identical to its canonical decomposition. (See definition D66 in Section 3.7, Decomposition.) Because a compatibility character in this narrow sense must also be a composite character, it may also be unambiguously referred to as a compatibility composite character, or compatibility composite for short. The compatibility decomposable characters are precisely defined in the Unicode Character Database. Because of their use in normalization, their compatibility decompositions are stable and cannot be changed. Compatibility decomposable characters and compatibility characters are two distinct concepts, even though the two sets of characters overlap. Not all compatibility characters have decomposition mappings. For example, the deprecated alternate format characters do not have any distinct decomposition, and CJK compatibility ideographs have canonical decomposition mappings rather than compatibility decomposition mappings. Some compatibility decomposable characters are widely used characters serving essential functions. The no-break space is one example. A large number of compatibility decomposable characters are really distinct symbols used in specialized notations, whether phonetic or mathematical. They are therefore not compatibility variants in the strict sense. Rather, their compatibility mappings express their historical derivation from styled forms of standard letters. In these and similar cases, such as fixed-width space characters, the compatibility decompositions define possible fallback representations.
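
The narrow definition above (a character whose compatibility decomposition differs from its canonical decomposition) can be tested mechanically. The sketch below assumes Python's standard `unicodedata` module; the helper name `is_compatibility_decomposable` is hypothetical and is introduced only for illustration.

```python
import unicodedata

def is_compatibility_decomposable(ch: str) -> bool:
    """True if the full compatibility decomposition (NFKD) of ch
    differs from its full canonical decomposition (NFD)."""
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)

examples = {
    "no-break space": "\u00A0",
    "superscript two": "\u00B2",
    "roman numeral four": "\u2163",
    "CJK compatibility ideograph": "\uF900",
}

# Raw decomposition mappings from the character database; a leading
# <tag> marks a compatibility mapping, no tag marks a canonical one.
mappings = {label: unicodedata.decomposition(ch) for label, ch in examples.items()}
narrow_sense = {label: is_compatibility_decomposable(ch) for label, ch in examples.items()}

for label in examples:
    print(label, repr(mappings[label]), narrow_sense[label])

# A compatibility decomposition used as a fallback representation.
fallback = unicodedata.normalize("NFKD", "\u2163")
```

The CJK compatibility ideograph comes out as not compatibility decomposable, matching the statement that such ideographs carry canonical rather than compatibility decomposition mappings.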


Mapping Compatibility Characters Identifying one character as a compatibility variant of another character usually implies that the first can be remapped to the second without the loss of any textual information other than formatting and layout. However, such remapping cannot always take place because many of the compatibility characters are included in the standard precisely to allow systems to maintain one-to-one mappings to other existing character encoding standards and code pages. In such cases, a remapping would lose information that is important to maintaining some distinction in the original encoding. By definition, a compatibility decomposable character decomposes into a compatibly equivalent character or character sequence. Even in such cases, an implementation must proceed with due caution—replacing one with the other may change not only formatting information, but also other textual distinctions on which some other process may depend. In many cases there exists a visual relationship between a compatibility composition and a standard character that is akin to a font style or directionality difference. Replacing such characters with unstyled characters could affect the meaning of the text. Replacing them with rich text would preserve the meaning for a human reader, but could cause some programs that depend on the distinction to behave unpredictably. This issue particularly affects compatibility characters used in mathematical notation. In some usage domains (for example, network identifiers), it may be acceptable to prohibit the use of compatibility variants or to remap them consistently. In fact, in such cases, further sets of characters may be restricted in a similar way to compatibility variants. For more information and an introduction to the concept of “confusable” characters, see Unicode Technical Standard #39, “Unicode Security Mechanisms.”

2.4 Code Points and Characters On a computer, abstract characters are encoded internally as numbers. To create a complete character encoding, it is necessary to define the list of all characters to be encoded and to establish systematic rules for how the numbers represent the characters. The range of integers used to code the abstract characters is called the codespace. A particular integer in this set is called a code point. When an abstract character is mapped or assigned to a particular code point in the codespace, it is then referred to as an encoded character. In the Unicode Standard, the codespace consists of the integers from 0 to 10FFFF₁₆, comprising 1,114,112 code points available for assigning the repertoire of abstract characters. There are constraints on how the codespace is organized, and particular areas of the codespace have been set aside for encoding of certain kinds of abstract characters or for other uses in the standard. For more on the allocation of the Unicode codespace, see Section 2.8, Unicode Allocation.


Figure 2-8 illustrates the relationship between abstract characters and code points, which together constitute encoded characters. Note that some abstract characters may be associated with multiple, separately encoded characters (that is, be encoded “twice”). In other instances, an abstract character may be represented by a sequence of two (or more) other encoded characters. The solid arrows connect encoded characters with the abstract characters that they represent and encode.

Figure 2-8. Codespace and Encoded Characters

[Figure: the abstract character Å and its encoded representations: U+00C5, U+212B, and the sequence U+0041 U+030A.]

When referring to code points in the Unicode Standard, the usual practice is to refer to them by their numeric value expressed in hexadecimal, with a “U+” prefix. (See Appendix A, Notational Conventions.) Encoded characters can also be referred to by their code points only. To prevent ambiguity, the official Unicode name of the character is often added; this clearly identifies the abstract character that is encoded. For example: U+0061 latin small letter a U+10330 gothic letter ahsa U+201DF cjk unified ideograph-201df Such citations refer only to the encoded character per se, associating the code point (as an integral value) with the abstract character that is encoded.
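
The citation convention above, and the multiple encodings shown in Figure 2-8, can be reproduced with a brief sketch. It assumes Python's standard `unicodedata` module; the helper name `cite` is hypothetical, and lowercasing the name merely imitates how the names appear in this text.

```python
import unicodedata

def cite(ch: str) -> str:
    """Format a character as 'U+XXXX name', using at least four hex digits."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch).lower()}"

citations = [cite(chr(cp)) for cp in (0x0061, 0x10330, 0x201DF)]
for line in citations:
    print(line)

# Figure 2-8: one abstract character with three encoded representations.
representations = ["\u00C5", "\u212B", "A\u030A"]
normalized = {unicodedata.normalize("NFC", s) for s in representations}
same_abstract_character = normalized == {"\u00C5"}
```

Normalization is used here only as a convenient way to show that the three representations are treated as equivalent; the figure itself makes no reference to it.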

Types of Code Points There are many ways to categorize code points. Table 2-3 illustrates some of the categorizations and basic terminology used in the Unicode Standard. Not all assigned code points represent abstract characters; only Graphic, Format, Control and Private-use do. Surrogates and Noncharacters are assigned code points but are not assigned to abstract characters. Reserved code points are assignable: any may be assigned in a future version of the standard. The General Category provides a finer breakdown of Graphic characters and also distinguishes between the other basic types (except between Noncharacter and Reserved). Other properties defined in the Unicode Character Database provide for different categorizations of Unicode code points.

Table 2-3. Types of Code Points

Basic Type | Brief Description | General Category | Character Status | Code Point Status
Graphic | Letter, mark, number, punctuation, symbol, and spaces | L, M, N, P, S, Zs | Assigned to abstract character | Designated (assigned) code point
Format | Invisible but affects neighboring characters; includes line/paragraph separators | Cf, Zl, Zp | Assigned to abstract character | Designated (assigned) code point
Control | Usage defined by protocols or standards outside the Unicode Standard | Cc | Assigned to abstract character | Designated (assigned) code point
Private-use | Usage defined by private agreement outside the Unicode Standard | Co | Assigned to abstract character | Designated (assigned) code point
Surrogate | Permanently reserved for UTF-16; restricted interchange | Cs | Not assigned to abstract character | Designated (assigned) code point
Noncharacter | Permanently reserved for internal usage; restricted interchange | Cn | Not assigned to abstract character | Designated (assigned) code point
Reserved | Reserved for future assignment; restricted interchange | Cn | Not assigned to abstract character | Undesignated (unassigned) code point

Control Codes. Sixty-five code points (U+0000..U+001F and U+007F..U+009F) are reserved specifically as control codes, for compatibility with the C0 and C1 control codes of the ISO/IEC 2022 framework. A few of these control codes are given specific interpretations by the Unicode Standard. (See Section 16.1, Control Codes.) Noncharacters. Sixty-six code points are not used to encode characters. Noncharacters consist of U+FDD0..U+FDEF and any code point ending in the value FFFE₁₆ or FFFF₁₆, that is, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFE, U+10FFFF. (See Section 16.7, Noncharacters.) Private Use. Three ranges of code points have been set aside for private use. Characters in these areas will never be defined by the Unicode Standard. These code points can be freely used for characters of any purpose, but successful interchange requires an agreement between sender and receiver on their interpretation. (See Section 16.5, Private-Use Characters.) Surrogates. Some 2,048 code points have been allocated as surrogate code points, which are used in the UTF-16 encoding form. (See Section 16.6, Surrogates Area.)
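
The counts quoted for the codespace and for these code point types can be checked by brute force. This is a minimal sketch assuming Python's standard `unicodedata` module for General Category lookup; the helper `is_noncharacter` is hypothetical and simply encodes the rule stated above.

```python
import unicodedata
from collections import Counter

CODESPACE = range(0x110000)  # 0 through 10FFFF (hex), inclusive

def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus any code point whose low 16 bits are FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

codespace_size = len(CODESPACE)
noncharacters = sum(is_noncharacter(cp) for cp in CODESPACE)

# Tally the General Category of every code point in a single pass.
categories = Counter(unicodedata.category(chr(cp)) for cp in CODESPACE)
controls = categories["Cc"]
surrogates = categories["Cs"]
private_use = categories["Co"]

print(codespace_size, noncharacters, controls, surrogates, private_use)
```

The General Category alone cannot separate noncharacters from reserved code points (both are Cn), which is why the noncharacter rule is written out explicitly.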


Restricted Interchange. Code points that are not assigned to abstract characters are subject to restrictions in interchange. • Surrogate code points cannot be conformantly interchanged using Unicode encoding forms. They do not correspond to Unicode scalar values and thus do not have well-formed representations in any Unicode encoding form. (See Section 3.8, Surrogates.) • Noncharacter code points are reserved for internal use, such as for sentinel values. They should never be interchanged. They do, however, have well-formed representations in Unicode encoding forms and survive conversions between encoding forms. This allows sentinel values to be preserved internally across Unicode encoding forms, even though they are not designed to be used in open interchange. • All implementations need to preserve reserved code points because they may originate in implementations that use a future version of the Unicode Standard. For example, suppose that one person is using a Unicode 5.0 system and a second person is using a Unicode 3.2 system. The first person sends the second person a document containing some code points newly assigned in Unicode 5.0; these code points were unassigned in Unicode 3.2. The second person may edit the document, not changing the reserved codes, and send it on. In that case the second person is interchanging what are, as far as the second person knows, reserved code points. Code Point Semantics. The semantics of most code points are established by this standard; the exceptions are Controls, Private-use, and Noncharacters. Control codes generally have semantics determined by other standards or protocols (such as ISO/IEC 6429), but there are a small number of control codes for which the Unicode Standard specifies particular semantics. See Table 16-1 in Section 16.1, Control Codes, for the exact list of those control codes. 
The semantics of private-use characters are outside the scope of the Unicode Standard; their use is determined by private agreement, as, for example, between vendors. Noncharacters have semantics in internal use only.
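
The difference between surrogate and noncharacter code points in interchange can be observed in one existing implementation. The sketch below relies on the behavior of Python's built-in UTF-8 and UTF-16 codecs, which is an assumption about that implementation and not a requirement quoted from the standard; `utf8_or_none` is a hypothetical helper.

```python
from typing import Optional

def utf8_or_none(cp: int) -> Optional[bytes]:
    """Encode a single code point as UTF-8, or return None if the codec refuses."""
    try:
        return chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        return None

# A surrogate code point is not a Unicode scalar value, so it has no
# well-formed UTF-8 representation and the codec rejects it.
surrogate_bytes = utf8_or_none(0xD800)

# A noncharacter does have a well-formed representation and survives
# conversion from one encoding form to another.
nonchar_utf8 = utf8_or_none(0xFFFF)
nonchar_utf16 = nonchar_utf8.decode("utf-8").encode("utf-16-be")

print(surrogate_bytes, nonchar_utf8, nonchar_utf16)
```

That a noncharacter can be converted does not make it suitable for open interchange; the sketch only shows that a sentinel value is preserved across encoding forms inside a process.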

2.5 Encoding Forms Computers handle numbers not simply as abstract mathematical objects, but as combinations of fixed-size units like bytes and 32-bit words. A character encoding model must take this fact into account when determining how to associate numbers with the characters. Actual implementations in computer systems represent integers in specific code units of particular size, usually 8-bit (= byte), 16-bit, or 32-bit. In the Unicode character encoding model, precisely defined encoding forms specify how each integer (code point) for a Unicode character is to be expressed as a sequence of one or more code units. The Unicode Standard provides three distinct encoding forms for Unicode characters, using 8-bit, 16-bit, and 32-bit units. These are named UTF-8, UTF-16, and UTF-32, respectively. The “UTF” is a carryover from earlier terminology meaning Unicode (or UCS) Transformation Format. Each of these three encoding forms is an equally legitimate mechanism for representing Unicode characters; each has advantages in different environments. All three encoding forms can be used to represent the full range of encoded characters in the Unicode Standard; they are thus fully interoperable for implementations that may choose different encoding forms for various reasons. Each of the three Unicode encoding forms can be efficiently transformed into either of the other two without any loss of data. Non-overlap. Each of the Unicode encoding forms is designed with the principle of non-overlap in mind. Figure 2-9 presents an example of an encoding where overlap is permitted. In this encoding (Windows code page 932), characters are formed from either one or two code bytes. Whether a sequence is one or two bytes in length depends on the first byte, so that the values for lead bytes (of a two-byte sequence) and single bytes are disjoint. However, single-byte values and trail-byte values can overlap. That means that when someone searches for the character “D”, for example, he or she might find it either (mistakenly) as the trail byte of a two-byte sequence or as a single, independent byte. To find out which alternative is correct, a program must look backward through text.

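Figure 2-9. Overlap in Legacy Mixed-Width Encodings

[Figure, part 1 (Trail and Single): the byte pair 84 44 encodes U+0414, while the single byte 44 encodes U+0044 “D”. Part 2 (Lead and Trail): the byte pair 84 84 encodes U+0442.]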

The situation is made more complex by the fact that lead and trail bytes can also overlap, as shown in the second part of Figure 2-9. This means that the backward scan has to repeat until it hits the start of the text or hits a sequence that could not exist as a pair as shown in Figure 2-10. This is not only inefficient, but also extremely error-prone: corruption of one byte can cause entire lines of text to be corrupted.
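
The false match described for code page 932 can be reproduced with the byte values from Figure 2-9. The sketch assumes that Python's standard `cp932` codec is available in the running interpreter; the variable names are illustrative.

```python
# U+0414 and U+0442, the two Cyrillic letters used in Figure 2-9.
text = "\u0414\u0442"

legacy = text.encode("cp932")  # expected bytes: 84 44 84 84
utf8 = text.encode("utf-8")

# The text contains no letter "D", yet a byte-level search of the
# legacy encoding finds one: 0x44 is the trail byte of U+0414.
char_present = "D" in text
false_hit_legacy = b"D" in legacy

# The same byte-level search over UTF-8 finds nothing, because single
# byte values never reappear inside a multibyte sequence.
false_hit_utf8 = b"D" in utf8

print(legacy.hex(" "), char_present, false_hit_legacy, false_hit_utf8)
```

A correct search of the legacy bytes would have to track lead and trail bytes from a known boundary, which is the backward scan the text describes.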

Figure 2-10. Boundaries and Interpretation

[Figure: the byte sequence ?? ... 84 84 84 84 84 84 44, whose final bytes may be read as U+0442, U+0414, or U+0044 “D” depending on where the preceding character boundaries fall.]

The Unicode encoding forms avoid this problem, because none of the ranges of values for the lead, trail, or single code units in any of those encoding forms overlap. Non-overlap makes all of the Unicode encoding forms well behaved for searching and comparison. When searching for a particular character, there will never be a mismatch against some code unit sequence that represents just part of another character. The fact that all Unicode encoding forms observe this principle of non-overlap distinguishes them from many legacy East Asian multibyte character encodings, for which overlap of code unit sequences may be a significant problem for implementations. Another aspect of non-overlap in the Unicode encoding forms is that all Unicode characters have determinate boundaries when expressed in any of the encoding forms. That is, the edges of code unit sequences representing a character are easily determined by local examination of code units; there is never any need to scan back indefinitely in Unicode text to correctly determine a character boundary. This property of the encoding forms has sometimes been referred to as self-synchronization. This property has another very important implication: corruption of a single code unit corrupts only a single character; none of the surrounding characters are affected. For example, when randomly accessing a string, a program can find the boundary of a character with limited backup. In UTF-16, if a pointer points to a trailing surrogate, a single backup is required. In UTF-8, if a pointer points to a byte starting with 10xxxxxx (in binary), one to three backups are required to find the beginning of the character. Conformance. The Unicode Consortium fully endorses the use of any of the three Unicode encoding forms as a conformant way of implementing the Unicode Standard. It is important not to fall into the trap of trying to distinguish “UTF-8 versus Unicode,” for example.
UTF-8, UTF-16, and UTF-32 are all equally valid and conformant ways of implementing the encoded characters of the Unicode Standard. Examples. Figure 2-11 shows the three Unicode encoding forms, including how they are related to Unicode code points.
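
The limited-backup rule stated earlier for UTF-8 (a byte of the form 10xxxxxx is never the start of a character) can be sketched as a small function. This is an illustration under the assumption of well-formed input; the function name `utf8_char_start` is hypothetical.

```python
def utf8_char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character that contains
    data[index], assuming data is well-formed UTF-8."""
    start = index
    # Trail bytes match the bit pattern 10xxxxxx; step back over them.
    while start > 0 and (data[start] & 0xC0) == 0x80:
        start -= 1
    return start

# Four characters occupying 1, 2, 3, and 4 bytes respectively:
# 41 | CE A9 | E8 AA 9E | F0 90 8E 84
data = "A\u03A9\u8A9E\U00010384".encode("utf-8")

starts = [utf8_char_start(data, i) for i in range(len(data))]
max_backup = max(i - s for i, s in enumerate(starts))
print(starts, max_backup)
```

For well-formed text the loop never steps back more than three bytes, which is the bound given in the text; a hardened implementation would also cap the loop to cope with corrupted input.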

Figure 2-11. Unicode Encoding Forms

UTF-32 00000041 000003A9 00008A9E 00010384

UTF-16 0041 03A9 8A9E D800 DF84

UTF-8 41 CE A9 E8 AA 9E F0 90 8E 84

Copyright © 1991-2007, Unicode, Inc.

The Unicode Standard 5.0 – Electronic edition


In Figure 2-11, the UTF-32 line shows that each example character can be expressed with one 32-bit code unit. Those code units have the same values as the code point for the character. For UTF-16, most characters can be expressed with one 16-bit code unit, whose value is the same as the code point for the character, but characters with high code point values require a pair of 16-bit surrogate code units instead. In UTF-8, a character may be expressed with one, two, three, or four bytes, and the relationship between those byte values and the code point value is more complex. UTF-8, UTF-16, and UTF-32 are further described in the subsections that follow. See each subsection for a general overview of how each encoding form is structured and the general benefits or drawbacks of each encoding form for particular purposes. For the detailed formal definition of the encoding forms and conformance requirements, see Section 3.9, Unicode Encoding Forms.
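The code unit values in Figure 2-11 can be reproduced with a short script. The following is a minimal sketch, not part of the standard: it assumes Python's built-in big-endian codecs (`utf-32-be`, `utf-16-be`, `utf-8`), and the helper names `code_units` and `figure_rows` are hypothetical. The four code points are those implied by the UTF-32 row of the figure.

```python
# Code points taken from the UTF-32 row of Figure 2-11.
CODE_POINTS = [0x0041, 0x03A9, 0x8A9E, 0x10384]


def code_units(cp, codec, unit_bytes):
    """Encode one code point and return its code units as hex strings."""
    data = chr(cp).encode(codec)  # the explicit -be codecs emit no byte order mark
    return [data[i:i + unit_bytes].hex().upper()
            for i in range(0, len(data), unit_bytes)]


def figure_rows():
    """Return, per encoding form, the code units of each example character."""
    return {
        "UTF-32": [code_units(cp, "utf-32-be", 4) for cp in CODE_POINTS],
        "UTF-16": [code_units(cp, "utf-16-be", 2) for cp in CODE_POINTS],
        "UTF-8": [code_units(cp, "utf-8", 1) for cp in CODE_POINTS],
    }


if __name__ == "__main__":
    for form, rows in figure_rows().items():
        print(form, " | ".join(" ".join(units) for units in rows))
```

Each inner list holds the code units of one character, so the supplementary character U+10384 yields one UTF-32 unit, two UTF-16 units, and four UTF-8 units.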

UTF-32

UTF-32 is the simplest Unicode encoding form. Each Unicode code point is represented directly by a single 32-bit code unit. Because of this, UTF-32 has a one-to-one relationship between encoded character and code unit; it is a fixed-width character encoding form. This makes UTF-32 an ideal form for APIs that pass single character values. As for all of the Unicode encoding forms, UTF-32 is restricted to representation of code points in the range 0..10FFFF₁₆, that is, the Unicode codespace. This guarantees interoperability with the UTF-16 and UTF-8 encoding forms.

Fixed Width. The value of each UTF-32 code unit corresponds exactly to the Unicode code point value. This situation differs significantly from that for UTF-16 and especially UTF-8, where the code unit values often change unrecognizably from the code point value. For example, U+10000 is represented as <00010000> in UTF-32 and as <F0 90 80 80> in UTF-8. For UTF-32, it is trivial to determine a Unicode character from its UTF-32 code unit representation. In contrast, UTF-16 and UTF-8 representations often require doing a code unit conversion before the character can be identified in the Unicode code charts.

Preferred Usage. UTF-32 may be a preferred encoding form where memory or disk storage space for characters is not a particular concern, but where fixed-width, single code unit access to characters is desired. UTF-32 is also a preferred encoding form for processing characters on most Unix platforms.

UTF-16

In the UTF-16 encoding form, code points in the range U+0000..U+FFFF are represented as a single 16-bit code unit; code points in the supplementary planes, in the range U+10000..U+10FFFF, are represented as pairs of 16-bit code units. These pairs of special code units are known as surrogate pairs. The values of the code units used for surrogate pairs are completely disjunct from the code units used for the single code unit representations, thus maintaining non-overlap for all code point representations in UTF-16. For the formal definition of surrogates, see Section 3.8, Surrogates.
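The arithmetic behind a surrogate pair is not spelled out in this overview. As an illustration only, the sketch below uses the commonly documented mapping: subtract 0x10000 to get a 20-bit value, then place its top ten bits in a high surrogate (0xD800..0xDBFF) and its bottom ten bits in a low surrogate (0xDC00..0xDFFF). The function names are hypothetical, and the formal definition in Section 3.9 remains the authority.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000              # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For U+10384, the supplementary character in Figure 2-11, `to_surrogate_pair(0x10384)` gives the pair D800 DF84 shown in the UTF-16 row.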


Optimized for BMP. UTF-16 optimizes the representation of characters in the Basic Multilingual Plane (BMP), that is, the range U+0000..U+FFFF. For that range, which contains the vast majority of common-use characters for all modern scripts of the world, each character requires only one 16-bit code unit, thus requiring just half the memory or storage of the UTF-32 encoding form. For the BMP, UTF-16 can effectively be treated as if it were a fixed-width encoding form.

Supplementary Characters and Surrogates. For supplementary characters, UTF-16 requires two 16-bit code units. The distinction between characters represented with one versus two 16-bit code units means that formally UTF-16 is a variable-width encoding form. That fact can create implementation difficulties if it is not carefully taken into account; UTF-16 is somewhat more complicated to handle than UTF-32.

Preferred Usage. UTF-16 may be a preferred encoding form in many environments that need to balance efficient access to characters with economical use of storage. It is reasonably compact, and all the common, heavily used characters fit into a single 16-bit code unit.

Origin. UTF-16 is the historical descendant of the earliest form of Unicode, which was originally designed to use a fixed-width, 16-bit encoding form exclusively. The surrogates were added to provide an encoding form for the supplementary characters at code points past U+FFFF. The design of the surrogates made them a simple and efficient extension mechanism that works well with older Unicode implementations and that avoids many of the problems of other variable-width character encodings. See Section 5.4, Handling Surrogate Pairs in UTF-16, for more information about surrogates and their processing.

Collation. For the purpose of sorting text, binary order for data represented in the UTF-16 encoding form is not the same as code point order. This means that a slightly different comparison implementation is needed for code point order. For more information, see Section 5.17, Binary Order.
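The difference between UTF-16 binary order and code point order shows up whenever a BMP character above the surrogate range is compared with a supplementary character. The sketch below is illustrative: the key functions are hypothetical names, and big-endian bytes stand in for a code-unit-by-code-unit comparison.

```python
def utf16_binary_key(text):
    """Sort key equivalent to comparing 16-bit code units one by one."""
    return text.encode("utf-16-be")


def code_point_key(text):
    """Sort key that compares whole code points."""
    return [ord(ch) for ch in text]


bmp_char = "\uFF5E"            # one code unit: FF5E
supplementary = "\U00010000"   # surrogate pair: D800 DC00

by_code_unit = sorted([bmp_char, supplementary], key=utf16_binary_key)
by_code_point = sorted([bmp_char, supplementary], key=code_point_key)
```

Because the high surrogate D800 is numerically smaller than FF5E, `by_code_unit` places U+10000 first, while `by_code_point` places U+FF5E first.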

UTF-8

To meet the requirements of byte-oriented, ASCII-based systems, a third encoding form is specified by the Unicode Standard: UTF-8. This variable-width encoding form preserves ASCII transparency by making use of 8-bit code units.

Byte-Oriented. Much existing software and practice in information technology have long depended on character data being represented as a sequence of bytes. Furthermore, many protocols not only depend on ASCII values being invariant, but also must make use of or avoid special byte values that may have associated control functions. The easiest way to adapt Unicode implementations to such a situation is to make use of an encoding form that is already defined in terms of 8-bit code units and that represents all Unicode characters while not disturbing or reusing any ASCII or C0 control code value. That is the function of UTF-8.

Variable Width. UTF-8 is a variable-width encoding form, using 8-bit code units, in which the high bits of each code unit indicate the part of the code unit sequence to which each byte belongs. A range of 8-bit code unit values is reserved for the first, or leading, element of a UTF-8 code unit sequence, and a completely disjunct range of 8-bit code unit values is reserved for the subsequent, or trailing, elements of such sequences; this convention preserves non-overlap for UTF-8. Table 3-6 on page 103 shows how the bits in a Unicode code point are distributed among the bytes in the UTF-8 encoding form. See Section 3.9, Unicode Encoding Forms, for the full, formal definition of UTF-8.

ASCII Transparency. The UTF-8 encoding form maintains transparency for all of the ASCII code points (0x00..0x7F). That means Unicode code points U+0000..U+007F are converted to single bytes 0x00..0x7F in UTF-8 and are thus indistinguishable from ASCII itself. Furthermore, the values 0x00..0x7F do not appear in any byte for the representation of any other Unicode code point, so that there can be no ambiguity. Beyond the ASCII range of Unicode, many of the non-ideographic scripts are represented by two bytes per code point in UTF-8; all non-surrogate code points between U+0800 and U+FFFF are represented by three bytes; and supplementary code points above U+FFFF require four bytes.

Preferred Usage. UTF-8 is typically the preferred encoding form for HTML and similar protocols, particularly for the Internet. The ASCII transparency helps migration. UTF-8 also has the advantage that it is already inherently byte-serialized, as for most existing 8-bit character sets; strings of UTF-8 work easily with C or other programming languages, and many existing APIs that work for typical Asian multibyte character sets adapt to UTF-8 as well with little or no change required.

Self-synchronizing. In environments where 8-bit character processing is required for one reason or another, UTF-8 has the following attractive features as compared to other multibyte encodings:

• The first byte of a UTF-8 code unit sequence indicates the number of bytes to follow in a multibyte sequence. This allows for very efficient forward parsing.
• It is efficient to find the start of a character when beginning from an arbitrary location in a byte stream of UTF-8. Programs need to examine at most four bytes (the current byte and up to three before it), and usually far fewer. It is a simple task to recognize an initial byte, because initial bytes are constrained to a fixed range of values.
• As with the other encoding forms, there is no overlap of byte values.
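The first two properties in the list above can be sketched in a few lines. This is a minimal illustration that assumes well-formed UTF-8 input; the function names are hypothetical, and the lead-byte ranges used (C2..DF, E0..EF, F0..F4) are those of well-formed UTF-8 as defined formally in Section 3.9.

```python
def sequence_length(lead):
    """Return the total number of bytes announced by a UTF-8 lead byte."""
    if lead < 0x80:
        return 1                      # ASCII: single-byte sequence
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: %#x" % lead)


def is_trail(byte):
    """True for bytes of the form 10xxxxxx."""
    return byte & 0xC0 == 0x80


def char_start(data, index):
    """Back up from index to the first byte of the character containing it.

    Assumes data is well-formed UTF-8, so at most three backups are needed.
    """
    start = index
    while start > 0 and index - start < 3 and is_trail(data[start]):
        start -= 1
    return start
```

With the ten UTF-8 bytes from Figure 2-11 (41 CE A9 E8 AA 9E F0 90 8E 84), `char_start(data, 9)` backs up three bytes to the lead byte F0 at index 6, and `sequence_length(0xF0)` reports the four-byte length without looking at any other byte.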

Comparison of the Advantages of UTF-32, UTF-16, and UTF-8

On the face of it, UTF-32 would seem to be the obvious choice of Unicode encoding forms for an internal processing code because it is a fixed-width encoding form. It can be conformantly bound to the C and C++ wchar_t, which means that such programming languages may offer built-in support and ready-made string APIs that programmers can take advantage of. However, UTF-16 has many countervailing advantages that may lead implementers to choose it instead as an internal processing code. While all three encoding forms need at most 4 bytes (or 32 bits) of data for each character, in practice UTF-32 in almost all cases for real data sets occupies twice the storage that UTF-

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 3

Conformance


This chapter defines conformance to the Unicode Standard in terms of the principles and encoding architecture it embodies. The first section defines the format for referencing the Unicode Standard and Unicode properties. The second section consists of the conformance clauses, followed by sections that define more precisely the technical terms used in those clauses. The remaining sections contain the formal algorithms that are part of conformance and referenced by the conformance clause. Additional definitions and algorithms that are part of this standard can be found in the Unicode Standard Annexes listed at the end of Section 3.2, Conformance Requirements. In this chapter, conformance clauses are identified with the letter C. Definitions are identified with the letter D. Bulleted items are explanatory comments regarding definitions or subclauses. The numbering of clauses and definitions has been changed from that of prior versions of The Unicode Standard. This change was necessitated by the addition of a substantial number of new definitions that did not fit well into the prior numbering scheme. A cross-reference table enabling the matching of a clause or definition between Version 5.0 and earlier versions of the standard is available in Section D.3, Clause and Definition Numbering Changes. For information on implementing best practices, see Chapter 5, Implementation Guidelines.

3.1 Versions of the Unicode Standard

For most character encodings, the character repertoire is fixed (and often small). Once the repertoire is decided upon, it is never changed. Addition of a new abstract character to a given repertoire creates a new repertoire, which will be treated either as an update of the existing character encoding or as a completely new character encoding.

For the Unicode Standard, by contrast, the repertoire is inherently open. Because Unicode is a universal encoding, any abstract character that could ever be encoded is a potential candidate to be encoded, regardless of whether the character is currently known.

Each new version of the Unicode Standard supersedes the previous one, but implementations (and, more significantly, data) are not updated instantly. In general, major and minor version changes include new characters, which do not create particular problems with old data. The Unicode Technical Committee will neither remove nor move characters. Characters may be deprecated, but this does not remove them from the standard or from existing data. The code point for a deprecated character will never be reassigned to a different character, but the use of a deprecated character is strongly discouraged. Generally these rules make the encoded characters of a new version backward-compatible with previous versions.

Implementations should be prepared to be forward-compatible with respect to Unicode versions. That is, they should accept text that may be expressed in future versions of this standard, recognizing that new characters may be assigned in those versions. Thus they should handle incoming unassigned code points as they do unsupported characters. (See Section 5.3, Unknown and Missing Characters.)

A version change may also involve changes to the properties of existing characters. When this situation occurs, modifications are made to the Unicode Character Database and a new update version is issued for the standard. Changes to the data files may alter program behavior that depends on them. However, such changes to properties and to data files are never made lightly. They are made only after careful deliberation by the Unicode Technical Committee has determined that there is an error, inconsistency, or other serious problem in the property assignments.

Stability

Each version of the Unicode Standard, once published, is absolutely stable and will never change. Implementations or specifications that refer to a specific version of the Unicode Standard can rely upon this stability. When implementations or specifications are upgraded to a future version of the Unicode Standard, then changes to them may be necessary. Note that even errata and corrigenda do not formally change the text of a published version; see “Errata and Corrigenda” later in this section.

Some features of the Unicode Standard are guaranteed to be stable across versions. These include the names and code positions of characters, their decompositions, and several other character properties for which stability is important to implementations. See also “Stability of Properties” in Section 3.5, Properties. The formal statement of such stability guarantees is contained in the policies on character encoding stability found on the Unicode Web site. See the subsection “Policies” in Section B.6, Other Unicode Online Resources. Appendix F, Unicode Encoding Stability Policies, presents a copy of these policies in effect at the time of this publication.

See also the discussion of backward compatibility in Unicode Standard Annex #31, “Identifier and Pattern Syntax,” and the subsection “Interacting with Downlevel Systems” in Section 5.3, Unknown and Missing Characters.

Version Numbering

Version numbers for the Unicode Standard consist of three fields, denoting the major version, the minor version, and the update version, respectively. For example, “Unicode 3.1.1” indicates major version 3 of the Unicode Standard, minor version 1 of Unicode 3, and update version 1 of minor version Unicode 3.1.
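A program that records which version of the standard it targets can model the three fields directly. The sketch below is an illustration, not an API defined by the standard: `UnicodeVersion` and `parse_version` are hypothetical names, and an omitted update field is read as zero, following the convention this section gives for citations (3.1 is equivalent to 3.1.0).

```python
import re
from typing import NamedTuple


class UnicodeVersion(NamedTuple):
    major: int
    minor: int
    update: int


def parse_version(text):
    """Parse 'major.minor' or 'major.minor.update' into a UnicodeVersion."""
    text = text.strip()
    if not re.fullmatch(r"[0-9]+(?:\.[0-9]+){1,2}", text):
        raise ValueError("not a Unicode version number: %r" % text)
    fields = [int(part) for part in text.split(".")]
    fields += [0] * (3 - len(fields))   # a missing update field means 0
    return UnicodeVersion(*fields)
```

Because the result is a tuple, versions compare field by field, so `parse_version("5.0") > parse_version("4.1.0")` holds. In Python, the string `unicodedata.unidata_version` reports the version of the Unicode Character Database that the interpreter was built with and can be passed to the same function.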


Formally, each new version of the Unicode Standard supersedes all earlier versions. However, because of the differences in the ways major, minor, and update versions are documented, minor and update versions generally do not obsolete all of the documentation of the immediately prior versions of the standard.

Additional information on the current and past versions of the Unicode Standard can be found on the Unicode Web site. See the subsection “Versions” in Section B.6, Other Unicode Online Resources. The online document contains the precise list of contributing files from the Unicode Character Database and the Unicode Standard Annexes, which are formally part of each version of the Unicode Standard.

The differences between major, minor, and update versions are as follows:

Major Version. A major version represents significant additions to the standard, including but not limited to major additions to the repertoire of encoded characters. A major version is published as a book, together with associated updates to Unicode Standard Annexes and the Unicode Character Database. A major version consolidates all errata and corrigenda to data. The publication of the book for a major version supersedes any prior documentation for major, minor, and update versions.

Minor Version. A minor version also represents significant additions to the standard. It may include small or large additions to the repertoire of encoded characters or other significant normative changes. A minor version is published only online and is not published as a book. Prior to Unicode 4.1, a minor version was published as a Unicode Standard Annex (or as a Unicode Technical Report for the very earliest minor versions). Starting with Unicode 4.1, minor versions are published as stable version pages online. A minor version is also associated with an update to the Unicode Character Database and updates to the UAXes. A minor version incorporates selected errata as appropriate. The documentation for a minor version does not stand alone, but rather amends the documentation of the prior version.

Update Version. An update version represents relatively small changes to the standard, focusing on updates to the data files of the Unicode Character Database. An update version never involves any additions to character repertoire. It is published only online. Starting with Unicode 3.0.1, update versions are published as stable version pages online. Prior to that version, update versions were simply documented with the list of relevant data file changes to the Unicode Character Database. An update version incorporates selected errata, primarily for the data files. The documentation for an update version does not stand alone, but rather amends the prior version.

Errata and Corrigenda

From time to time it may be necessary to publish errata or corrigenda to the Unicode Standard. Such errata and corrigenda will be published on the Unicode Web site. See Section B.6, Other Unicode Online Resources, for information on how to report errors in the standard.


Errata. Errata correct errors in the text or other informative material, such as the representative glyphs in the code charts. See the subsection “Updates and Errata” in Section B.6, Other Unicode Online Resources. Whenever a new major version of the standard is published, all errata up to that point are incorporated into the text.

Corrigenda. Occasionally errors may be important enough that a corrigendum is issued prior to the next version of the Unicode Standard. Such a corrigendum does not change the contents of the previous version. Instead, it provides a mechanism for an implementation, protocol, or other standard to cite the previous version of the Unicode Standard with the corrigendum applied. If a citation does not specifically mention the corrigendum, the corrigendum does not apply. For more information on citing corrigenda, see “Versions” in Section B.6, Other Unicode Online Resources.

References to the Unicode Standard

The documents associated with the major, minor, and update versions are called the major reference, minor reference, and update reference, respectively. For example, consider Unicode Version 3.1.1. The major reference for that version is The Unicode Standard, Version 3.0 (ISBN 0-201-61633-5). The minor reference is Unicode Standard Annex #27, “The Unicode Standard, Version 3.1.” The update reference is Unicode Version 3.1.1. The exact list of contributory files, Unicode Standard Annexes, and Unicode Character Database files can be found at Enumerated Version 3.1.1.

The reference for this version, Version 5.0.0, of the Unicode Standard, is

The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA: Addison-Wesley, 2007. ISBN 0-321-48091-0)

References to an update or minor version include a reference to both the major version and the documents modifying it. For the standard citation format for other versions of the Unicode Standard, see “Versions” in Section B.6, Other Unicode Online Resources.

Precision in Version Citation

Because Unicode has an open repertoire with relatively frequent updates, it is important not to over-specify the version number. Wherever the precise behavior of all Unicode characters needs to be cited, the full three-field version number should be used, as in the first example below. However, trailing zeros are often omitted, as in the second example. In such a case, writing 3.1 is in all respects equivalent to writing 3.1.0.

1. The Unicode Standard, Version 3.1.1
2. The Unicode Standard, Version 3.1
3. The Unicode Standard, Version 3.0 or later
4. The Unicode Standard


Where some basic level of content is all that is important, phrasing such as in the third example can be used. Where the important information is simply the overall architecture and semantics of the Unicode Standard, the version can be omitted entirely, as in example 4.

References to Unicode Character Properties

Properties and property values have defined names and abbreviations, such as

Property: General_Category (gc)
Property Value: Uppercase_Letter (Lu)

To reference a given property and property value, these aliases are used, as in this example:

The property value Uppercase_Letter from the General_Category property, as specified in Version 5.0.0 of the Unicode Standard.

Then cite that version of the standard, using the standard citation format that is provided for each version of the Unicode Standard.

When referencing multi-word properties or property values, it is permissible to omit the underscores in these aliases or to replace them by spaces.

When referencing a Unicode character property, it is customary to prepend the word “Unicode” to the name of the property, unless it is clear from context that the Unicode Standard is the source of the specification.

References to Unicode Algorithms

A reference to a Unicode algorithm must specify the name of the algorithm or its abbreviation, followed by the version of the Unicode Standard, as in this example:

The Unicode Bidirectional Algorithm, as specified in Version 4.1.0 of the Unicode Standard. See Unicode Standard Annex #9, “The Bidirectional Algorithm,” (http://www.unicode.org/reports/tr9/tr9-15.html)

Where algorithms allow tailoring, the reference must state whether any such tailorings were applied or are applicable. For algorithms contained in a Unicode Standard Annex, the document itself and its location on the Unicode Web site may be cited as the location of the specification.

When referencing a Unicode algorithm it is customary to prepend the word “Unicode” to the name of the algorithm, unless it is clear from the context that the Unicode Standard is the source of the specification.

Omitting a version number when referencing a Unicode algorithm may be appropriate when such a reference is meant as a generic reference to the overall algorithm. Such a generic reference may also be employed in the sense of latest available version of the algorithm. However, for specific and detailed conformance claims for Unicode algorithms, generic references are generally not sufficient, and a full version number must accompany the reference.

3.2 Conformance Requirements

This section presents the clauses specifying the formal conformance requirements for processes implementing Version 5.0 of the Unicode Standard. A few of these clauses have been revised from Version 4.0 of the Unicode Standard. The revisions do not change the fundamental substance of the conformance requirements previously set forth, but rather are reformulated to clarify their applicability to Unicode algorithms and tailoring. The definitions that these clauses (particularly conformance clause C4) depend on have been extended to cover additional aspects of properties and algorithms.

In addition to the specifications printed in this book, the Unicode Standard, Version 5.0, includes a number of Unicode Standard Annexes (UAXes) and the Unicode Character Database. Both are available only electronically, either on the CD-ROM or on the Unicode Web site. At the end of this section there is a list of those annexes that are considered an integral part of the Unicode Standard, Version 5.0.0, and therefore covered by these conformance requirements.

The Unicode Character Database contains an extensive specification of normative and informative character properties completing the formal definition of the Unicode Standard. See Chapter 4, Character Properties, for more information.

Not all conformance requirements are relevant to all implementations at all times because implementations may not support the particular characters or operations for which a given conformance requirement may be relevant. See Section 2.14, Conforming to the Unicode Standard, for more information.

In this section, conformance clauses are identified with the letter C.

The numbering of clauses has been changed from that of prior versions of The Unicode Standard. A cross-reference table enabling the matching of a clause between Version 5.0 and earlier versions of the standard is available in Section D.3, Clause and Definition Numbering Changes.

Code Points Unassigned to Abstract Characters

C1 A process shall not interpret a high-surrogate code point or a low-surrogate code point as an abstract character.

• The high-surrogate and low-surrogate code points are designated for surrogate code units in the UTF-16 character encoding form. They are unassigned to any abstract character.

C2 A process shall not interpret a noncharacter code point as an abstract character.

• The noncharacter code points may be used internally, such as for sentinel values or delimiters, but should not be exchanged publicly.

C3 A process shall not interpret an unassigned code point as an abstract character.

• This clause does not preclude the assignment of certain generic semantics to unassigned code points (for example, rendering with a glyph to indicate the position within a character block) that allow for graceful behavior in the presence of code points that are outside a supported subset.
• Unassigned code points may have default property values. (See D26.)
• Code points whose use has not yet been designated may be assigned to abstract characters in future versions of the standard. Because of this fact, due care in the handling of generic semantics for such code points is likely to provide better robustness for implementations that may encounter data based on future versions of the standard.
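Clauses C1 through C3 each name a class of code point that must not be interpreted as an abstract character. The sketch below shows one way a process might sort code points into those classes. It is illustrative only: the function names are hypothetical, the noncharacter ranges (U+FDD0..U+FDEF and the last two code points of each of the 17 planes) are stated here from the standard's definition of noncharacters rather than from this section, and the "unassigned" answer depends on which version of the Unicode Character Database the Python `unicodedata` module was built with.

```python
import unicodedata


def is_surrogate(cp):
    """High- and low-surrogate code points (clause C1)."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp):
    """The 66 noncharacter code points (clause C2)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp):
    """Return a coarse class name for a code point in the Unicode codespace."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace: %#x" % cp)
    if is_surrogate(cp):
        return "surrogate"
    if is_noncharacter(cp):
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        return "unassigned"       # clause C3, as of this build's UCD version
    if category == "Co":
        return "private_use"
    return "assigned"
```

A process built on an older database will report as "unassigned" some code points that a later version assigns, which is the situation the last bullet under C3 asks implementations to handle with care.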

Interpretation

C4 A process shall interpret a coded character sequence according to the character semantics established by this standard, if that process does interpret that coded character sequence.

• This restriction does not preclude internal transformations that are never visible external to the process.

C5 A process shall not assume that it is required to interpret any particular coded character sequence.

• Processes that interpret only a subset of Unicode characters are allowed; there is no blanket requirement to interpret all Unicode characters.
• Any means for specifying a subset of characters that a process can interpret is outside the scope of this standard.
• The semantics of a private-use code point is outside the scope of this standard.
• Although these clauses are not intended to preclude enumerations or specifications of the characters that a process or system is able to interpret, they do separate supported subset enumerations from the question of conformance. In actuality, any system may occasionally receive an unfamiliar character code that it is unable to interpret.

C6 A process shall not assume that the interpretations of two canonical-equivalent character sequences are distinct.

• The implications of this conformance clause are twofold. First, a process is never required to give different interpretations to two different, but canonical-equivalent character sequences. Second, no process can assume that another process will make a distinction between two different, but canonical-equivalent character sequences.
• Ideally, an implementation would always interpret two canonical-equivalent character sequences identically. There are practical circumstances under which implementations may reasonably distinguish them.
• Even processes that normally do not distinguish between canonical-equivalent character sequences can have reasonable exception behavior. Some examples of this behavior include graceful fallback processing by processes unable to support correct positioning of nonspacing marks; “Show Hidden Text” modes that reveal memory representation structure; and the choice of ignoring collating behavior of combining sequences that are not part of the repertoire of a specified language (see Section 5.12, Strategies for Handling Nonspacing Marks).
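As a rough illustration of clause C6, the following Python sketch treats two strings as equivalent when their canonical decompositions match. It relies on the standard library's unicodedata module; the helper name canonically_equivalent is an assumption made for this example, not terminology from the standard.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# The code point sequences differ, so a naive comparison distinguishes them.
naive_equal = precomposed == decomposed
# A comparison that respects canonical equivalence does not.
canonical_equal = canonically_equivalent(precomposed, decomposed)
```

A process that compares strings code point by code point is not forbidden by C6, but it cannot assume that other processes will preserve the distinction it sees.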

Modification

C7 When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points.

• Replacement of a character sequence by a compatibility-equivalent sequence does modify the interpretation of the text.
• Replacement or deletion of a character sequence that the process cannot or does not interpret does modify the interpretation of the text.
• Changing the bit or byte ordering of a character sequence when transforming it between different machine architectures does not modify the interpretation of the text.
• Changing a valid coded character sequence from one Unicode character encoding form to another does not modify the interpretation of the text.
• Changing the byte serialization of a code unit sequence from one Unicode character encoding scheme to another does not modify the interpretation of the text.
• If a noncharacter that does not have a specific internal use is unexpectedly encountered in processing, an implementation may signal an error or delete or ignore the noncharacter. If these options are not taken, the noncharacter should be treated as an unassigned code point. For example, an API that returned a character property value for a noncharacter would return the same value as the default value for an unassigned code point.
• All processes and higher-level protocols are required to abide by conformance clause C7 at a minimum. However, higher-level protocols may define additional equivalences that do not constitute modifications under that protocol. For example, a higher-level protocol may allow a sequence of spaces to be replaced by a single space.
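The two changes that C7 permits can be sketched in Python as follows. The function names are hypothetical, and the noncharacter test is written directly from the ranges given in definition D14 (U+nFFFE, U+nFFFF, and U+FDD0..U+FDEF); this is a minimal sketch, not a complete implementation of the clause.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF on every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def clean_preserving_interpretation(text: str) -> str:
    """Apply only changes C7 allows: delete noncharacters, then take the
    canonical-equivalent NFC form."""
    kept = "".join(ch for ch in text if not is_noncharacter(ord(ch)))
    return unicodedata.normalize("NFC", kept)

# e + combining acute, a noncharacter, and U+FB01 LATIN SMALL LIGATURE FI
sample = "e\u0301\uffff\ufb01"

cleaned = clean_preserving_interpretation(sample)

# NFKC replaces the ligature by a compatibility-equivalent sequence, which
# C7 counts as a modification of the interpretation of the text.
compat = unicodedata.normalize("NFKC", sample)
```

Here cleaned still contains the ligature U+FB01, whereas compat contains the two letters "fi" in its place.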

Character Encoding Forms

C8 When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall interpret that code unit sequence according to the corresponding code point sequence.

• The specification of the code unit sequences for UTF-8 is given in D92.
• The specification of the code unit sequences for UTF-16 is given in D91.
• The specification of the code unit sequences for UTF-32 is given in D90.

C9 When a process generates a code unit sequence which purports to be in a Unicode character encoding form, it shall not emit ill-formed code unit sequences.

• The definition of each Unicode character encoding form specifies the ill-formed code unit sequences in the character encoding form. For example, the definition of UTF-8 (D92) specifies that code unit sequences such as <C0 AF> are ill-formed.

C10 When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall treat ill-formed code unit sequences as an error condition and shall not interpret such sequences as characters.

• For example, in UTF-8 every code unit of the form 110xxxxx₂ must be followed by a code unit of the form 10xxxxxx₂. A sequence such as 110xxxxx₂ 0xxxxxxx₂ is ill-formed and must never be generated. When faced with this ill-formed code unit sequence while transforming or interpreting text, a conformant process must treat the first code unit 110xxxxx₂ as an illegally terminated code unit sequence, for example by signaling an error, filtering the code unit out, or representing the code unit with a marker such as U+FFFD REPLACEMENT CHARACTER.
• Conformant processes cannot interpret ill-formed code unit sequences. However, the conformance clauses do not prevent processes from operating on code unit sequences that do not purport to be in a Unicode character encoding form. For example, for performance reasons a low-level string operation may simply operate directly on code units, without interpreting them as characters. See, especially, the discussion under definition D89.
• Utility programs are not prevented from operating on “mangled” text. For example, a UTF-8 file could have had CRLF sequences introduced at every 80 bytes by a bad mailer program. This could result in some UTF-8 byte sequences being interrupted by CRLFs, producing illegal byte sequences. This mangled text is no longer UTF-8. It is permissible for a conformant program to repair such text, recognizing that the mangled text was originally well-formed UTF-8 byte sequences. However, such repair of mangled data is a special case, and it must not be used in circumstances where it would cause security problems.
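The behavior described under C9 and C10 can be observed with Python's built-in UTF-8 codec, used here only as one example of a decoder; the variable names are chosen for this sketch. The byte pair C3 41 has the shape 110xxxxx₂ 0xxxxxxx₂ discussed above.

```python
ill_formed = b"\xc3\x41"   # lead byte 110xxxxx followed by 0xxxxxxx ('A')

# Option 1: signal an error (the codec's default "strict" mode).
try:
    ill_formed.decode("utf-8")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# Option 2: represent the illegally terminated code unit with U+FFFD and
# continue with the following, well-formed code unit.
replaced = ill_formed.decode("utf-8", errors="replace")

# Option 3: filter the offending code unit out.
filtered = ill_formed.decode("utf-8", errors="ignore")

# C9 from the generating side: a lone surrogate code point cannot be
# emitted as UTF-8, so the encoder refuses rather than produce ill-formed output.
try:
    "\ud800".encode("utf-8")
    encode_raised = False
except UnicodeEncodeError:
    encode_raised = True
```

Filtering code units out silently (option 3) is the choice most likely to interact badly with security checks, so it deserves the same caution the text applies to repairing mangled data.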

Character Encoding Schemes

C11 When a process interprets a byte sequence which purports to be in a Unicode character encoding scheme, it shall interpret that byte sequence according to the byte order and specifications for the use of the byte order mark established by this standard for that character encoding scheme.

• Machine architectures differ in ordering in terms of whether the most significant byte or the least significant byte comes first. These sequences are known as “big-endian” and “little-endian” orders, respectively.
• For example, when using UTF-16LE, pairs of bytes are interpreted as UTF-16 code units using the little-endian byte order convention, and any initial <FF FE> sequence is interpreted as U+FEFF ZERO WIDTH NO-BREAK SPACE (part of the text), rather than as a byte order mark (not part of the text). (See D97.)
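The UTF-16LE example can be reproduced with Python's codecs, where "utf-16-le" names the UTF-16LE encoding scheme and "utf-16" names the scheme that honors a byte order mark. This is a small demonstration under those assumptions, not a statement about how any particular process must be built.

```python
data = b"\xff\xfeA\x00"   # bytes FF FE, then 'A' as a little-endian 16-bit unit

# UTF-16LE: the byte order is fixed, so the initial FF FE is part of the
# text and decodes to U+FEFF ZERO WIDTH NO-BREAK SPACE.
as_utf16le = data.decode("utf-16-le")

# UTF-16: the initial FF FE is a byte order mark that selects little-endian
# order and is not part of the text.
as_utf16 = data.decode("utf-16")
```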

Bidirectional Text

C12 A process that displays text containing supported right-to-left characters or embedding codes shall display all visible representations of characters (excluding format characters) in the same order as if the Bidirectional Algorithm had been applied to the text, unless tailored by a higher-level protocol as permitted by the specification.

• The Bidirectional Algorithm is specified in Unicode Standard Annex #9, “The Bidirectional Algorithm.”

Normalization Forms

C13 A process that produces Unicode text that purports to be in a Normalization Form shall do so in accordance with the specifications in Unicode Standard Annex #15, “Unicode Normalization Forms.”

C14 A process that tests Unicode text to determine whether it is in a Normalization Form shall do so in accordance with the specifications in Unicode Standard Annex #15, “Unicode Normalization Forms.”

C15 A process that purports to transform text into a Normalization Form must be able to produce the results of the conformance test specified in Unicode Standard Annex #15, “Unicode Normalization Forms.”

• This means that when a process uses the input specified in the conformance test, its output must match the expected output of the test.
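A check in the spirit of C15 might look like the sketch below. It assumes the five-column row layout (source; NFC; NFD; NFKC; NFKD, as space-separated hexadecimal code points) and the invariants used by the NormalizationTest.txt data file; the two sample rows are written for illustration and are not copied from that file, and the function names are hypothetical.

```python
import unicodedata

def parse_row(row):
    """Turn 'c1;c2;c3;c4;c5;' (hex code points) into five strings."""
    fields = row.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in field.split()) for field in fields]

def row_passes(row):
    """Check one test row against the normalizer under test."""
    c1, c2, c3, c4, c5 = parse_row(row)
    norm = unicodedata.normalize
    return (
        all(norm("NFC", x) == c2 for x in (c1, c2, c3))
        and all(norm("NFC", x) == c4 for x in (c4, c5))
        and all(norm("NFD", x) == c3 for x in (c1, c2, c3))
        and all(norm("NFD", x) == c5 for x in (c4, c5))
        and all(norm("NFKC", x) == c4 for x in (c1, c2, c3, c4, c5))
        and all(norm("NFKD", x) == c5 for x in (c1, c2, c3, c4, c5))
    )

# Illustrative rows: a canonical composite and a compatibility ligature.
SAMPLE_ROWS = [
    "1E0A;1E0A;0044 0307;1E0A;0044 0307;",
    "FB01;FB01;FB01;0066 0069;0066 0069;",
]

all_pass = all(row_passes(row) for row in SAMPLE_ROWS)
```

A real conformance run would read every row of the published test file for the Unicode version the process claims to support.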


Normative References

C16 Normative references to the Unicode Standard itself, to property aliases, to property value aliases, or to Unicode algorithms shall follow the formats specified in Section 3.1, Versions of the Unicode Standard.

C17 Higher-level protocols shall not make normative references to provisional properties.

• Higher-level protocols may make normative references to informative properties.

Unicode Algorithms

C18 If a process purports to implement a Unicode algorithm, it shall conform to the specification of that algorithm in the standard, including any tailoring by a higher-level protocol as permitted by the specification.

• The term Unicode algorithm is defined at D17.
• An implementation claiming conformance to a Unicode algorithm need only guarantee that it produces the same results as those specified in the logical description of the process; it is not required to follow the actual described procedure in detail. This allows room for alternative strategies and optimizations in implementation.

C19 The specification of an algorithm may prohibit or limit tailoring by a higher-level protocol. If a process that purports to implement a Unicode algorithm applies a tailoring, that fact must be disclosed.

• For example, the algorithms for normalization and canonical ordering are not tailorable. The Bidirectional Algorithm allows some tailoring by higher-level protocols. The Unicode Default Case algorithms may be tailored without limitation.

Default Casing Algorithms

C20 An implementation that purports to support Default Case Conversion, Default Case Detection, or Default Caseless Matching shall do so in accordance with the definitions and specifications in Section 3.13, Default Case Algorithms.

• A conformant implementation may perform casing operations that are different from the default algorithms, perhaps tailored to a particular orthography, so long as the fact that a tailoring is applied is disclosed.
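The difference between default behavior and a disclosed tailoring can be sketched as follows. Python's str.casefold and str.lower are used as stand-ins for default case folding and default lowercasing; turkish_lower is a hypothetical, deliberately simplified tailoring written for this example.

```python
def default_caseless_match(a: str, b: str) -> bool:
    """Compare two strings after case folding (untailored)."""
    return a.casefold() == b.casefold()

def turkish_lower(text: str) -> str:
    """TAILORED lowercasing for Turkish orthography (simplified sketch).

    Differs from the default mapping: I maps to dotless i (U+0131) and
    capital I with dot above (U+0130) maps to plain i.
    """
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()

default_result = "I".lower()            # default mapping: "i"
tailored_result = turkish_lower("I")    # tailored mapping: U+0131

sharp_s_match = default_caseless_match("Stra\u00dfe", "STRASSE")
```

Stating the tailoring in the function's name and documentation is one way of meeting the disclosure condition in C20.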

Unicode Standard Annexes

The following standard annexes are approved and considered part of Version 5.0 of the Unicode Standard. These annexes may contain either normative or informative material, or both. Any reference to Version 5.0 of the standard automatically includes these standard annexes.

• UAX #9: The Bidirectional Algorithm, Version 5.0.0
• UAX #11: East Asian Width, Version 5.0.0
• UAX #14: Line Breaking Properties, Version 5.0.0
• UAX #15: Unicode Normalization Forms, Version 5.0.0
• UAX #24: Script Names, Version 5.0.0
• UAX #29: Text Boundaries, Version 5.0.0
• UAX #31: Identifier and Pattern Syntax, Version 5.0.0
• UAX #34: Unicode Named Character Sequences, Version 5.0.0

Conformance to the Unicode Standard requires conformance to the specifications contained in these annexes, as detailed in the conformance clauses listed earlier in this section.

3.3 Semantics

Definitions

This and the following sections more precisely define the terms that are used in the conformance clauses. The numbering of definitions has been changed from that of prior versions of The Unicode Standard. A cross-reference table enabling the matching of a definition between Version 5.0 and earlier versions of the standard is available in Section D.3, Clause and Definition Numbering Changes.

Character Identity and Semantics

D1 Normative behavior: The normative behaviors of the Unicode Standard consist of the following list or any other behaviors specified in the conformance clauses:

• Character combination
• Canonical decomposition
• Compatibility decomposition
• Canonical ordering behavior
• Bidirectional behavior, as specified in the Unicode Bidirectional Algorithm (see Unicode Standard Annex #9, “The Bidirectional Algorithm”)
• Conjoining jamo behavior, as specified in Section 3.12, Conjoining Jamo Behavior
• Variation selection, as specified in Section 16.4, Variation Selectors
• Normalization, as specified in Unicode Standard Annex #15, “Unicode Normalization Forms”
• Default casing, as specified in Section 3.13, Default Case Algorithms

D2 Character identity: The identity of a character is established by its character name and representative glyph in Chapter 17, Code Charts.

• A character may have a broader range of use than the most literal interpretation of its name might indicate; the coded representation, name, and representative glyph need to be assessed in context when establishing the identity of a character. For example, U+002E FULL STOP can represent a sentence period, an abbreviation period, a decimal number separator in English, a thousands number separator in German, and so on. The character name itself is unique, but may be misleading. See “Character Names” in Section 17.1, Character Names List.
• Consistency with the representative glyph does not require that the images be identical or even graphically similar; rather, it means that both images are generally recognized to be representations of the same character. Representing the character U+0061 LATIN SMALL LETTER A by the glyph “X” would violate its character identity.

D3 Character semantics: The semantics of a character are determined by its identity, normative properties, and behavior.

• Some normative behavior is default behavior; this behavior can be overridden by higher-level protocols. However, in the absence of such protocols, the behavior must be observed so as to follow the character semantics.
• The character combination properties and the canonical ordering behavior cannot be overridden by higher-level protocols. The purpose of this constraint is to guarantee that the order of combining marks in text and the results of normalization are predictable.
D4 Character name: A unique string used to identify each abstract character encoded in the standard.

• The character names in the Unicode Standard match those of the English edition of ISO/IEC 10646.
• Character names are immutable and cannot be overridden; they are stable identifiers. For more information, see Section 4.8, Name—Normative.
• The name of a Unicode character is also formally a character property in the Unicode Character Database. Its long property alias is “Name” and its short property alias is “na”. Its value is the unique string label associated with the encoded character.

D5 Character name alias: An additional unique string identifier, other than the character name, associated with an encoded character in the standard.

• Character name aliases are assigned when there is a serious clerical defect with a character name, such that the character name itself may be misleading regarding the identity of the character. A character name alias constitutes an alternate identifier for the character.
• Character name aliases are unique within the common namespace shared by character names, character name aliases, and named character sequences.
• Character name aliases are a formal, normative part of the standard and should be distinguished from the informative, editorial aliases provided in the code charts. See Section 17.1, Character Names List, for the notational conventions used to distinguish the two.

D6 Namespace: A set of names together with name matching rules, so that all names are distinct under the matching rules.

• Within a given namespace all names must be unique, although the same name may be used with a different meaning in a different namespace.
• Character names, character name aliases, and named character sequences share a single namespace in the Unicode Standard.
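Character names and name aliases can be explored with Python's unicodedata module, as in the sketch below. It assumes a Python version (3.3 or later) whose unicodedata.lookup resolves formal name aliases, and it uses U+01A2 as the example because its character name is widely cited as misleading; neither assumption comes from the text above.

```python
import unicodedata

# The Name property of U+002E, and the reverse lookup by name.
full_stop_name = unicodedata.name(".")
round_trip = unicodedata.lookup(full_stop_name)

# U+01A2 keeps its immutable character name ...
oi_name = unicodedata.name("\u01a2")

# ... while a character name alias in the same namespace identifies the
# same encoded character.
alias_target = unicodedata.lookup("LATIN CAPITAL LETTER GHA")
```

Because names and aliases share one namespace, a lookup by either string resolves to at most one encoded character.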

3.4 Characters and Encoding

D7 Abstract character: A unit of information used for the organization, control, or representation of textual data.

• When representing data, the nature of that data is generally symbolic as opposed to some other kind of data (for example, aural or visual). Examples of such symbolic data include letters, ideographs, digits, punctuation, technical symbols, and dingbats.
• An abstract character has no concrete form and should not be confused with a glyph.
• An abstract character does not necessarily correspond to what a user thinks of as a “character” and should not be confused with a grapheme.
• The abstract characters encoded by the Unicode Standard are known as Unicode abstract characters.
• Abstract characters not directly encoded by the Unicode Standard can often be represented by the use of combining character sequences.


D8 Abstract character sequence: An ordered sequence of one or more abstract characters.

D9 Unicode codespace: A range of integers from 0 to 10FFFF₁₆.

• This particular range is defined for the codespace in the Unicode Standard. Other character encoding standards may use other codespaces.

D10 Code point: Any value in the Unicode codespace.

• A code point is also known as a code position.
• See D77 for the definition of code unit.

D11 Encoded character: An association (or mapping) between an abstract character and a code point.

• An encoded character is also referred to as a coded character.
• While an encoded character is formally defined in terms of the mapping between an abstract character and a code point, informally it can be thought of as an abstract character taken together with its assigned code point.
• Occasionally, for compatibility with other standards, a single abstract character may correspond to more than one code point. For example, “Å” corresponds both to U+00C5 Å LATIN CAPITAL LETTER A WITH RING ABOVE and to U+212B Å ANGSTROM SIGN.
• A single abstract character may also be represented by a sequence of code points. For example, latin capital letter g with acute may be represented by the sequence <U+0047 LATIN CAPITAL LETTER G, U+0301 COMBINING ACUTE ACCENT>, rather than being mapped to a single code point.

D12 Coded character sequence: An ordered sequence of one or more code points.

• A coded character sequence is also known as a coded character representation.
• Normally a coded character sequence consists of a sequence of encoded characters, but it may also include noncharacters or reserved code points.
• Internally, a process may choose to make use of noncharacter code points in its coded character sequences. However, such noncharacter code points may not be interpreted as abstract characters (see conformance clause C2), and their removal by a conformant process does not constitute modification of interpretation of the coded character sequence (see conformance clause C7).
• Reserved code points are included in coded character sequences, so that the conformance requirements regarding interpretation and modification are properly defined when a Unicode-conformant implementation encounters coded character sequences produced under a future version of the standard.

Unless specified otherwise for clarity, in the text of the Unicode Standard the term character alone designates an encoded character. Similarly, the term character sequence alone designates a coded character sequence.
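The distinction between code points (D10) and code units (see D77) can be made concrete with a short Python sketch. The helper names are hypothetical, and Python's built-in codecs stand in for the three Unicode encoding forms.

```python
MAX_CODE_POINT = 0x10FFFF   # upper bound of the Unicode codespace (D9)

def in_codespace(value: int) -> bool:
    """True if value is a code point, that is, lies in 0..10FFFF hex."""
    return 0 <= value <= MAX_CODE_POINT

def code_unit_counts(cp: int) -> dict:
    """Number of code units needed for one code point in each encoding form.

    Not meaningful for surrogate code points, which the codecs refuse to encode.
    """
    ch = chr(cp)
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }

# One abstract character represented by a sequence of two code points.
g_with_acute = "G\u0301"
g_code_points = [f"U+{ord(ch):04X}" for ch in g_with_acute]
```

For instance, code_unit_counts(0x10000) reports four UTF-8 code units, two UTF-16 code units, and one UTF-32 code unit for the single code point U+10000.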


D13 Deprecated character: A coded character whose use is strongly discouraged. Such characters are retained in the standard, but should not be used.

• Deprecated characters are retained in the standard so that previously conforming data stay conformant in future versions of the standard. Deprecated characters should not be confused with obsolete characters, which are historical. Obsolete characters do not occur in modern text, but they are not deprecated; their use is not discouraged.

D14 Noncharacter: A code point that is permanently reserved for internal use and that should never be interchanged. Noncharacters consist of the values U+nFFFE and U+nFFFF (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.

• For more information, see Section 16.7, Noncharacters.
• These code points are permanently reserved as noncharacters.

D15 Reserved code point: Any code point of the Unicode Standard that is reserved for future assignment. Also known as an unassigned code point.

• Surrogate code points and noncharacters are considered assigned code points, but not assigned characters.
• For a summary classification of reserved and other types of code points, see Table 2-3.

In general, a conforming process may indicate the presence of a code point whose use has not been designated (for example, by showing a missing glyph in rendering or by signaling an appropriate error in a streaming protocol), even though it is forbidden by the standard from interpreting that code point as an abstract character.

D16 Higher-level protocol: Any agreement on the interpretation of Unicode characters that extends beyond the scope of this standard.

• Such an agreement need not be formally announced in data; it may be implicit in the context.
• The specification of some Unicode algorithms may limit the scope of what a conformant higher-level protocol may do.

D17 Unicode algorithm: The logical description of a process used to achieve a specified result involving Unicode characters.

• This definition, as used in the Unicode Standard and other publications of the Unicode Consortium, is intentionally broad so as to allow precise logical description of required results, without constraining implementations to follow the precise steps of that logical description.

D18 Named Unicode algorithm: A Unicode algorithm that is specified in the Unicode Standard or in other standards published by the Unicode Consortium and that is given an explicit name for ease of reference.


• Named Unicode algorithms are cited in titlecase in the Unicode Standard.
• When referenced outside the context of the Unicode Standard, it is customary to prepend the word “Unicode” to the name of the algorithm.

Table 3-1 lists the named Unicode algorithms and indicates the locations of their specifications. Details regarding conformance to these algorithms and any restrictions they place on the scope of allowable tailoring by higher-level protocols can be found in the specifications. In some cases, a named Unicode algorithm is provided for information only.

Table 3-1. Named Unicode Algorithms

Name                                              Description
Canonical Ordering                                Section 3.11
Hangul Syllable Boundary Determination            Section 3.12
Hangul Syllable Composition                       Section 3.12
Hangul Syllable Decomposition                     Section 3.12
Hangul Syllable Name Generation                   Section 3.12
Default Case Conversion                           Section 3.13
Default Case Detection                            Section 3.13
Default Caseless Matching                         Section 3.13 and Section 5.18
Bidirectional Algorithm                           UAX #9
Line Breaking Algorithm                           UAX #14
Normalization Algorithm                           UAX #15
Grapheme Cluster Boundary Determination           UAX #29
Word Boundary Determination                       UAX #29
Sentence Boundary Determination                   UAX #29
Default Identifier Determination                  UAX #31
Alternative Identifier Determination              UAX #31
Pattern Syntax Determination                      UAX #31
Identifier Normalization                          UAX #31
Identifier Case Folding                           UAX #31
Standard Compression Scheme for Unicode (SCSU)    UTS #6
Collation Algorithm (UCA)                         UTS #10

3.5 Properties

The Unicode Standard specifies many different types of character properties. This section provides the basic definitions related to character properties.

The actual values of Unicode character properties are specified in the Unicode Character Database. See Section 4.1, Unicode Character Database, for an overview of those data files. Chapter 4, Character Properties, contains more detailed descriptions of some particular, important character properties. Additional properties that are specific to particular characters (such as the definition and use of the right-to-left override character or zero width space) are discussed in the relevant sections of this standard.

The interpretation of some properties (such as the case of a character) is independent of context, whereas the interpretation of other properties (such as directionality) is applicable to a character sequence as a whole, rather than to the individual characters that compose the sequence.

Types of Properties

D19 Property: A named attribute of an entity in the Unicode Standard, associated with a defined set of values.

D20 Code point property: A property of code points.

• Code point properties refer to attributes of code points per se, based on architectural considerations of this standard, irrespective of any particular encoded character.
• Thus the Surrogate property and the Noncharacter property are code point properties.

D21 Abstract character property: A property of abstract characters.

• Abstract character properties refer to attributes of abstract characters per se, based on their independent existence as elements of writing systems or other notational systems, irrespective of their encoding in the Unicode Standard.
• Thus the Alphabetic property, the Punctuation property, the Hex_Digit property, the Numeric_Value property, and so on are properties of abstract characters and are associated with those characters whether encoded in the Unicode Standard or in any other character encoding, or even prior to their being encoded in any character encoding standard.

D22 Encoded character property: A property of encoded characters in the Unicode Standard.

• For each encoded character property there is a mapping from every code point to some value in the set of values associated with that property.

Encoded character properties are defined this way to facilitate the implementation of character property APIs based on the Unicode Character Database. Typically, an API will take a property and a code point as input, and will return a value for that property as output, interpreting it as the “character property” for the “character” encoded at that code point. However, to be useful, such APIs must return meaningful values for unassigned code points, as well as for encoded characters.

In some instances an encoded character property in the Unicode Standard is exactly equivalent to a code point property. For example, the Pattern_Syntax property simply defines a range of code points that are reserved for pattern syntax. (See Unicode Standard Annex #31, “Identifier and Pattern Syntax.”)

In other instances, an encoded character property directly reflects an abstract character property, but extends the domain of the property to include all code points, including unassigned code points. For Boolean properties, such as the Hex_Digit property, typically an encoded character property will be true for the encoded characters with that abstract character property and will be false for all other code points, including unassigned code points, noncharacters, private-use characters, and encoded characters for which the abstract character property is inapplicable or irrelevant.

However, in many instances, an encoded character property is semantically complex and may telescope together values associated with a number of abstract character properties and/or code point properties. The General_Category property is an example: it contains values associated with several abstract character properties (such as Letter, Punctuation, and Symbol) as well as code point properties (such as \p{gc=Cs} for the Surrogate code point property).

In the text of this standard the terms “Unicode character property,” “character property,” and “property” without qualifier generally refer to an encoded character property, unless otherwise indicated.

A list of the encoded character properties formally considered to be a part of the Unicode Standard can be found in PropertyAliases.txt in the Unicode Character Database. See also “Property Aliases” later in this section.
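A property API of the kind described here, one that accepts any code point and always returns a value, can be sketched with Python's unicodedata.category, which reports the General_Category short value. The wrapper name general_category is an assumption for this example, and the values returned depend on the Unicode version bundled with the Python runtime.

```python
import unicodedata

def general_category(cp: int) -> str:
    """Return the General_Category value for any code point, assigned or not."""
    return unicodedata.category(chr(cp))

letter = general_category(0x0041)        # an encoded character: Lu
surrogate = general_category(0xD800)     # reflects a code point property: Cs
private_use = general_category(0xE000)   # private-use code point: Co
noncharacter = general_category(0xFFFF)  # not an assigned character: Cn
```

The single property thus telescopes abstract character properties (Letter) together with code point properties (Surrogate), as the text notes for General_Category.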

Property Values

D23 Property value: One of the set of values associated with an encoded character property.

• For example, the East_Asian_Width [EAW] property has the possible values “Narrow”, “Neutral”, “Wide”, “Ambiguous”, and “Unassigned”.

A list of the values associated with encoded character properties in the Unicode Standard can be found in PropertyValueAliases.txt in the Unicode Character Database. See also “Property Aliases” later in this section.

D24 Explicit property value: A value for an encoded character property that is explicitly associated with a code point in one of the data files of the Unicode Character Database.

D25 Implicit property value: A value for an encoded character property that is given by a generic rule or by an “otherwise” clause in one of the data files of the Unicode Character Database.

• Implicit property values are used to avoid having to explicitly list values for more than 1 million code points (most of them unassigned) for every property.


D26 Default property value: The value (or in some cases small set of values) of a property associated with unassigned code points or with encoded characters for which the property is irrelevant.

• For example, for most Boolean properties, “false” is the default property value. In such cases, the default property value used for unassigned code points may be the same value that is used for many assigned characters as well.
• Some properties, particularly enumerated properties, specify a particular, unique value as their default value. For example, “XX” is the default property value for the Line_Break property.
• A default property value is typically defined implicitly, to avoid having to repeat long lists of unassigned code points.
• In the case of some properties with arbitrary string values, the default property value is an implied null value. For example, the fact that there is no Unicode character name for unassigned code points is equivalent to saying that the default property value for the Name property for an unassigned code point is a null string.
• In some instances, an encoded character property may have multiple default values. For example, the Bidi_Class property defines a range of unassigned code points as having the “R” value, another range of unassigned code points as having the “AL” value, and the otherwise case as having the “L” value.
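The relationship between explicit values (D24), an “otherwise” rule (D25), and a default value (D26) can be sketched as a small range table with a fallback. The table below is a hand-written toy excerpt for illustration only and does not reproduce the real Line_Break data; only the default “XX” is taken from the text above, and all names are hypothetical.

```python
from bisect import bisect_right

# Toy excerpt: (first code point, last code point, Line_Break value).
# Sorted by first code point; NOT the actual Unicode Character Database data.
EXPLICIT_RANGES = [
    (0x0020, 0x0020, "SP"),   # space
    (0x0030, 0x0039, "NU"),   # ASCII digits
    (0x0041, 0x005A, "AL"),   # ASCII capital letters
]
DEFAULT_VALUE = "XX"          # default Line_Break value named in D26

_STARTS = [first for first, _, _ in EXPLICIT_RANGES]

def line_break(cp: int) -> str:
    """Return the explicit value if cp is listed, else the default value."""
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, value = EXPLICIT_RANGES[i]
        if cp <= last:
            return value          # explicit property value
    return DEFAULT_VALUE          # implicit value from the "otherwise" rule
```

Storing only the explicit ranges keeps the table small while still defining a value for every code point, which is the motivation D25 gives for implicit values.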

Classification of Properties by Their Values

D27 Enumerated property: A property with a small set of named values.

• As characters are added to the Unicode Standard, the set of values may need to be extended in the future, but enumerated properties have a relatively fixed set of possible values.

D28 Closed enumeration: An enumerated property for which the set of values is closed and will not be extended for future versions of the Unicode Standard.

• Currently, the General Category is the only closed enumeration, except for the Boolean properties.

D29 Boolean property: A closed enumerated property whose set of values is limited to “true” and “false”.

• The presence or absence of the property is the essential information.

D30 Numeric property: A numeric property is a property whose value is a number that can take on any integer or real value.

• An example is the Numeric_Value property. There is no implied limit to the number of possible distinct values for the property, except the limitations on representing integers or real numbers in computers.
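As an example of a numeric property, Python's unicodedata.numeric returns the Numeric_Value of a character as a floating-point number and accepts an optional fallback for characters that have no numeric value. The variable names below are chosen for this sketch.

```python
import unicodedata

half = unicodedata.numeric("\u00bd")          # U+00BD VULGAR FRACTION ONE HALF
roman_eight = unicodedata.numeric("\u2167")   # U+2167 ROMAN NUMERAL EIGHT
arabic_three = unicodedata.numeric("\u0663")  # U+0663 ARABIC-INDIC DIGIT THREE

# A letter has no Numeric_Value; the second argument is returned instead
# of raising ValueError.
no_value = unicodedata.numeric("A", None)
```

Returning a float is a choice of this particular API; it illustrates the remark that the only limit on values is how numbers are represented in computers.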


D31 String-valued property: A property whose value is a string.

• The Canonical_Decomposition property is a string-valued property.

D32 Catalog property: A property that is an enumerated property, typically unrelated to an algorithm, that may be extended in each successive version of the Unicode Standard.

• Examples are the Age and Block properties. Additional values for both may be added each time a new version of the Unicode Standard adds new characters or blocks.
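String-valued decomposition data can be inspected with Python's unicodedata.decomposition, which returns the decomposition mapping as a string of hexadecimal code points. Note, as an assumption of this sketch, that the function reports compatibility mappings as well, marked with a tag in angle brackets; an untagged result corresponds to a canonical decomposition.

```python
import unicodedata

# Canonical decomposition of U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.
canonical = unicodedata.decomposition("\u00c5")

# Compatibility decomposition of U+00BD VULGAR FRACTION ONE HALF (tagged).
compatibility = unicodedata.decomposition("\u00bd")

# A character with no decomposition yields the empty string.
undecomposed = unicodedata.decomposition("A")

def has_canonical_decomposition(ch: str) -> bool:
    """True if ch has a decomposition mapping without a compatibility tag."""
    mapping = unicodedata.decomposition(ch)
    return bool(mapping) and not mapping.startswith("<")
```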

Normative and Informative Properties Unicode character properties are divided into those that are normative and those that are informative. D33 Normative property: A Unicode character property used in the specification of the standard. Specification that a character property is normative means that implementations which claim conformance to a particular version of the Unicode Standard and which make use of that particular property must follow the specifications of the standard for that property for the implementation to be conformant. For example, the directionality property (bidirectional character type) is required for conformance whenever rendering text that requires bidirectional layout, such as Arabic or Hebrew. Whenever a normative process depends on a property in a specified way, that property is designated as normative. The fact that a given Unicode character property is normative does not mean that the values of the property will never change for particular characters. Corrections and extensions to the standard in the future may require minor changes to normative values, even though the Unicode Technical Committee strives to minimize such changes. See also “Stability of Properties” later in this section. Some of the normative Unicode algorithms depend critically on particular property values for their behavior. Normalization, for example, defines an aspect of textual interoperability that many applications rely on to be absolutely stable. As a result, some of the normative properties disallow any kind of overriding by higher-level protocols. Thus the decomposition of Unicode characters is both normative and not overridable; no higher-level protocol may override these values, because to do so would result in non-interoperable results for the normalization of Unicode text. Other normative properties, such as case mapping, are overridable by higher-level protocols, because their intent is to provide a common basis for behavior. 
Nevertheless, they may require tailoring for particular local cultural conventions or particular implementations.

Some important normative character properties of the Unicode Standard are listed in Table 3-2, with an indication of which sections in the standard provide a general description of the properties and their use. Other normative properties are documented in the Unicode Character Database. In all cases, the Unicode Character Database provides the definitive list of character properties and the exact list of property value assignments for each version of the standard. A list of additional special character properties can be found in Section 4.12, Characters with Unusual Properties.

Table 3-2. Normative Character Properties

Property                          Description
Bidi_Class (directionality)       UAX #9 and Section 4.4
Bidi_Mirrored                     Section 4.7 and UAX #9
Block                             Chapter 17
Canonical_Combining_Class         Section 3.11, Section 4.3, and UAX #15
Case-related properties           Section 3.13, Section 4.2, and Chapter 17
Composition_Exclusion             UAX #15
Decomposition_Mapping             Chapter 3, Chapter 17, and UAX #15
Default_Ignorable_Code_Point      Section 5.20
Deprecated                        Section 3.1
General_Category                  Section 4.5
Hangul_Syllable_Type              Section 3.12 and UAX #29
Jamo_Short_Name                   Section 3.12
Joining_Type and Joining_Group    Section 8.2
Name                              Chapter 17
Noncharacter_Code_Point           Section 16.7
Numeric_Value                     Section 4.6
White_Space                       UCD.html

D34 Overridable property: A normative property whose values may be overridden by conformant higher-level protocols. • For example, the Canonical_Decomposition property is not overridable. The Uppercase property can be overridden. D35 Informative property: A Unicode character property whose values are provided for information only. A conformant implementation of the Unicode Standard is free to use or change informative property values as it may require, while remaining conformant to the standard. An implementer always has the option of establishing a protocol to convey the fact that informative properties are being used in distinct ways. Informative properties capture expert implementation experience. When an informative property is explicitly specified in the Unicode Character Database, its use is strongly recommended for implementations to encourage comparable behavior between implementations. Note that it is possible for an informative property in one version of the Unicode Standard to become a normative property in a subsequent version of the standard if its use starts to acquire conformance implications in some part of the standard.
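The statement in D34 that case-related properties can be overridden is commonly illustrated with Turkish, where lowercase "i" uppercases to U+0130 (capital I with dot above). The sketch below is only an illustration of a higher-level tailoring layered over a default mapping. The function names and the one-entry table are hypothetical, and Python's `str.upper` stands in for the untailored default.

```python
# Hypothetical tailoring table: Turkish uppercases "i" to U+0130.
_TURKISH_UPPER = {"i": "\u0130"}

def upper_default(text: str) -> str:
    """Untailored uppercase mapping, as provided by the runtime."""
    return text.upper()

def upper_turkish(text: str) -> str:
    """Uppercase mapping with the Turkish override applied first."""
    return "".join(_TURKISH_UPPER.get(ch, ch.upper()) for ch in text)

print(upper_default("istanbul"))
print(upper_turkish("istanbul"))
```

Keeping the override in a separate table leaves the default mapping untouched, so text that is not known to be Turkish still gets the common behavior.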

Table 3-3 provides a partial list of the more important informative character properties. For a complete listing, see the Unicode Character Database.

Table 3-3. Informative Character Properties

Property                     Description
Dash                         Section 6.2 and Table 6-3
East_Asian_Width             Section 12.4 and UAX #11
Letter-related properties    Section 4.10
Line_Break                   Section 16.1, Section 16.2, and UAX #14
Mathematical                 Section 15.4
Script                       UAX #24
Space                        Section 6.2 and Table 6-2
Unicode_1_Name               Section 4.9

D36 Provisional property: A Unicode character property whose values are unapproved and tentative, and which may be incomplete or otherwise not in a usable state. • Provisional properties may be removed from future versions of the standard, without prior notice. Some of the information provided about characters in the Unicode Character Database constitutes provisional data. This data may capture partial or preliminary information. It may contain errors or omissions, or otherwise not be ready for systematic use; however, it is included in the data files for distribution partly to encourage review and improvement of the information. For example, a number of the tags in the Unihan.txt file provide provisional property values of various sorts about Han characters. The data files of the Unicode Character Database may also contain various annotations and comments about characters, and those annotations and comments should be considered provisional. Implementations should not attempt to parse annotations and comments out of the data files and treat them as informative character properties per se.

Context Dependence D37 Context-dependent property: A property that applies to a code point in the context of a longer code point sequence. • For example, the lowercase mapping of a Greek sigma depends on the context of the surrounding characters. D38 Context-independent property: A property that is not context dependent; it applies to a code point in isolation.
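The Greek sigma example in D37 can be observed directly. This sketch assumes CPython's built-in `str.lower`, which applies the final-sigma rule. Other runtimes may behave differently, and the wrapper name `lower_in_context` is hypothetical.

```python
def lower_in_context(text: str) -> str:
    """Lowercase text with the runtime's context-aware mapping."""
    return text.lower()

SIGMA = "\u03a3"                     # GREEK CAPITAL LETTER SIGMA
WORD = "\u039f\u0394\u039f\u03a3"    # capital omicron, delta, omicron, sigma

for sample in (SIGMA, WORD, SIGMA + "\u0391" + SIGMA):
    # Show code points so the two lowercase sigmas are easy to tell apart.
    print([f"U+{ord(c):04X}" for c in lower_in_context(sample)])
```

In isolation the capital sigma maps to U+03C3, while at the end of a word it maps to U+03C2, so the result cannot be computed one code point at a time.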

Stability of Properties D39 Stable transformation: A transformation T on a property P is stable with respect to an algorithm A if the result of the algorithm on the transformed property A(T(P)) is the same as the original result A(P) for all code points. D40 Stable property: A property is stable with respect to a particular algorithm or process as long as possible changes in the assignment of property values are restricted in such a manner that the result of the algorithm on the property continues to be the same as the original result for all previously assigned code points. • For example, while the absolute values of the canonical combining classes are not guaranteed to be the same between versions of the Unicode Standard, their relative values will be maintained. As a result, the Canonical Combining Class, while not immutable, is a stable property with respect to the Normalization Forms as defined in Unicode Standard Annex #15, “Unicode Normalization Forms.” • As new characters are assigned to previously unassigned code points, the replacement of any default values for these code points with actual property values must maintain stability. D41 Fixed property: A property whose values (other than a default value), once associated with a specific code point, are fixed and will not be changed, except to correct obvious or clerical errors. • For a fixed property, any default values can be replaced without restriction by actual property values as new characters are assigned to previously unassigned code points. Examples of fixed properties include Age and Hangul_Syllable_Type. • Designating a property as fixed does not imply stability or immutability (see “Stability” in Section 3.1, Versions of the Unicode Standard). While the age of a character, for example, is established by the version of the Unicode Standard to which it was added, errors in the published listing of the property value could be corrected. 
For some other properties, explicit stability guarantees prohibit the correction even of such errors. D42 Immutable property: A fixed property that is also subject to a stability guarantee preventing any change in the published listing of property values other than assignment of new values to formerly unassigned code points. • An immutable property is trivially stable with respect to all algorithms. • An example of an immutable property is the Unicode character name itself. Because character names are values of an immutable property, misspellings and incorrect names will never be corrected clerically. Any errata will be noted in a comment in the character names list and, where needed, an informative character name alias will be provided.

• When an encoded character property representing a code point property is immutable, none of its values can ever change. This follows from the fact that the code points themselves do not change, and the status of the property is unaffected by whether a particular abstract character is encoded at a code point later. An example of such a property is the Pattern_Syntax property; all values of that property are unchangeable for all code points, forever. • In the more typical case of an immutable property, the values for existing encoded characters cannot change, but when a new character is encoded, the formerly unassigned code point changes from having a default value for the property to having one of its nondefault values. Once that nondefault value is published, it can no longer be changed. D43 Stabilized property: A property that is neither extended to new characters nor maintained in any other manner, but that is retained in the Unicode Character Database. • A stabilized property is also a fixed property. D44 Deprecated property: A property whose use by implementations is discouraged. • One of the reasons a property may be deprecated is because a different combination of properties better expresses the intended semantics. • Where sufficiently widespread legacy support exists for the deprecated property, not all implementations may be able to discontinue the use of the deprecated property. In such a case, a deprecated property may be extended to new characters so as to maintain it in a usable and consistent state. Informative or normative properties in the standard will not be removed even when they are supplanted by other properties or are no longer useful. However, they may be stabilized and/or deprecated. For a list of stability policies related to character properties, see Appendix F, Unicode Encoding Stability Policies.
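Two of the stability notions above can be made concrete. The sketch assumes Python's `unicodedata` module and shows that canonical ordering depends only on the relative order of canonical combining classes (D40). It also shows a published character name with a known misspelling that remains in place because names are immutable (D42). The helper name `canonical_order` is hypothetical.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

def canonical_order(text: str) -> str:
    """Decompose and reorder combining marks by canonical combining class."""
    return unicodedata.normalize("NFD", text)

typed = "q" + DOT_ABOVE + DOT_BELOW
ordered = canonical_order(typed)

# Only the relative order of the two classes decides the result.
print([unicodedata.combining(ch) for ch in typed])
print([unicodedata.combining(ch) for ch in ordered])

# The name of U+FE18 ends in "BRAKCET"; the spelling is kept as published.
print(unicodedata.name("\ufe18"))
```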

Simple and Derived Properties

D45 Simple property: A Unicode character property whose values are specified directly in the Unicode Character Database (or elsewhere in the standard) and whose values cannot be derived from other simple properties.

D46 Derived property: A Unicode character property whose values are algorithmically derived from some combination of simple properties.

The Unicode Character Database lists a number of derived properties explicitly. Even though these values can be derived, they are provided as lists because the derivation may not be trivial and because explicit lists are easier to understand, reference, and implement. Good examples of derived properties include the ID_Start and ID_Continue properties, which can be used to specify a formal identifier syntax for Unicode characters. The details of how derived properties are computed can be found in the documentation for the Unicode Character Database.
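As a rough illustration of a derived property, the sketch below approximates ID_Start and ID_Continue from the General Category alone. It is an assumption-laden simplification: the published derivation also involves contributory properties such as Other_ID_Start, which are ignored here, so the explicit lists in the Unicode Character Database remain the reference. All function names are hypothetical.

```python
import unicodedata

# Approximation only: the real derivation also uses contributory properties.
_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
_CONTINUE_CATEGORIES = _START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

def is_id_start_approx(ch: str) -> bool:
    """Approximate ID_Start from the General Category."""
    return unicodedata.category(ch) in _START_CATEGORIES

def is_id_continue_approx(ch: str) -> bool:
    """Approximate ID_Continue from the General Category."""
    return unicodedata.category(ch) in _CONTINUE_CATEGORIES

def is_identifier_approx(text: str) -> bool:
    """Check text against the pattern: one start character, then continue characters."""
    return (
        bool(text)
        and is_id_start_approx(text[0])
        and all(is_id_continue_approx(ch) for ch in text[1:])
    )

for candidate in ("caf\u00e9", "x_1", "9lives", "a-b"):
    print(repr(candidate), is_identifier_approx(candidate))
```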

Property Aliases To enable normative references to Unicode character properties, formal aliases for properties and for property values are defined as part of the Unicode Character Database. D47 Property alias: A unique identifier for a particular Unicode character property. • The identifiers used for property aliases contain only ASCII alphanumeric characters or the underscore character. • Short and long forms for each property alias are defined. The short forms are typically just two or three characters long to facilitate their use as attributes for tags in markup languages. For example, “General_Category” is the long form and “gc” is the short form of the property alias for the General Category property. • Property aliases are defined in the file PropertyAliases.txt in the Unicode Character Database. • Property aliases of normative properties are themselves normative. D48 Property value alias: A unique identifier for a particular enumerated value for a particular Unicode character property. • The identifiers used for property value aliases contain only ASCII alphanumeric characters or the underscore character, or have the special value “n/a”. • Short and long forms for property value aliases are defined. For example, “Currency_Symbol” is the long form and “Sc” is the short form of the property value alias for the currency symbol value of the General Category property. • Property value aliases are defined in the file PropertyValueAliases.txt in the Unicode Character Database. • Property value aliases are unique identifiers only in the context of the particular property with which they are associated. The same identifier string might be associated with an entirely different value for a different property. The combination of a property alias and a property value alias is, however, guaranteed to be unique. • Property value aliases referring to values of normative properties are themselves normative. 
The property aliases and property value aliases can be used, for example, in XML formats of property data, for regular-expression property tests, and in other programmatic textual descriptions of Unicode property data. Thus “gc=Lu” is a formal way of specifying that the General Category of a character (using the property alias “gc”) has the value of being an uppercase letter (using the property value alias “Lu”).
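An expression such as “gc=Lu” can be evaluated with a small lookup from aliases to values. The sketch below hard-codes a tiny, hand-copied subset of aliases for the General Category only. A real implementation would load PropertyAliases.txt and PropertyValueAliases.txt instead. The table and function names are hypothetical, and Python's `unicodedata` module supplies the property values.

```python
import unicodedata

# Hand-copied subset for illustration; long and short forms map to the short form.
PROPERTY_ALIASES = {"gc": "gc", "General_Category": "gc"}
GC_VALUE_ALIASES = {
    "Lu": "Lu", "Uppercase_Letter": "Lu",
    "Ll": "Ll", "Lowercase_Letter": "Ll",
    "Sc": "Sc", "Currency_Symbol": "Sc",
}

def matches(expression: str, ch: str) -> bool:
    """Evaluate a 'property=value' test against one character."""
    prop, _, value = expression.partition("=")
    if PROPERTY_ALIASES.get(prop) != "gc":
        raise ValueError(f"unsupported property alias: {prop!r}")
    if value not in GC_VALUE_ALIASES:
        raise ValueError(f"unsupported property value alias: {value!r}")
    return unicodedata.category(ch) == GC_VALUE_ALIASES[value]

print(matches("gc=Lu", "A"))
print(matches("General_Category=Currency_Symbol", "$"))
```

The value table is keyed per property because a value alias is unique only in the context of its property, as noted under D48.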

Private Use

D49 Private-use code point: Code points in the ranges U+E000..U+F8FF, U+F0000..U+FFFFD, and U+100000..U+10FFFD.
• Private-use code points are considered to be assigned characters, but the abstract characters associated with them have no interpretation specified by this standard. They can be given any interpretation by conformant processes.
• Private-use code points may be given default property values, but these default values are overridable by higher-level protocols that give those private-use code points a specific interpretation.
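The three ranges in D49 translate directly into a range test. This is a minimal sketch in which the constant and function names are hypothetical. The last two lines cross-check against the General Category value Co as reported by Python's `unicodedata` module.

```python
import unicodedata

# The three private-use ranges listed in D49 (inclusive bounds).
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def is_private_use(cp: int) -> bool:
    """Return True if the code point lies in a private-use range."""
    return any(low <= cp <= high for low, high in PRIVATE_USE_RANGES)

for cp in (0xE000, 0xF8FF, 0xF900, 0xFFFFE, 0x10FFFD):
    print(f"U+{cp:04X}", is_private_use(cp))

# Cross-check one range boundary against the General Category.
print(unicodedata.category(chr(0xE000)))
```

Note that the last two code points of planes 15 and 16 are excluded: they are noncharacters, not private-use code points.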

3.6 Combination D50 Graphic character: A character with the General Category of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs). • Graphic characters specifically exclude the line and paragraph separators (Zl, Zp), as well as the characters with the General Category of Other (Cn, Cs, Cc, Cf). • The interpretation of private-use characters (Co) as graphic characters or not is determined by the implementation. • For more information, see Chapter 2, General Structure, especially Section 2.4, Code Points and Characters, and Table 2-3. D51 Base character: Any graphic character except for those with the General Category of Combining Mark (M). • Most Unicode characters are base characters. In terms of General Category values, a base character is any code point that has one of the following categories: Letter (L), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs). • Base characters do not include control characters or format controls. • Base characters are independent graphic characters, but this does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures. • The interpretation of private-use characters (Co) as base characters or not is determined by the implementation. However, the default interpretation of private-use characters should be as base characters, in the absence of other information. D52 Combining character: A character with the General Category of Combining Mark (M).

• Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and Enclosing Mark (Me). • All characters with non-zero canonical combining class are combining characters, but the reverse is not the case: there are combining characters with a zero canonical combining class. • The interpretation of private-use characters (Co) as combining characters or not is determined by the implementation. • These characters are not normally used in isolation unless they are being described. They include such characters as accents, diacritics, Hebrew points, Arabic vowel signs, and Indic matras. • The graphic positioning of a combining character depends on the last preceding base character, unless they are separated by a character that is neither a combining character nor either zero width joiner or zero width nonjoiner. The combining character is said to apply to that base character. • There may be no such base character, such as when a combining character is at the start of text or follows a control or format character—for example, a carriage return, tab, or right-left mark. In such cases, the combining characters are called isolated combining characters. • With isolated combining characters or when a process is unable to perform graphical combination, a process may present a combining character without graphical combination; that is, it may present it as if it were a base character. • The representative images of combining characters are depicted with a dotted circle in the code charts. When presented in graphical combination with a preceding base character, that base character is intended to appear in the position occupied by the dotted circle. D53 Nonspacing mark: A combining character with the General Category of Nonspacing Mark (Mn) or Enclosing Mark (Me). • The position of a nonspacing mark in presentation depends on its base character. 
It generally does not consume space along the visual baseline in and of itself. • Such characters may be large enough to affect the placement of their base character relative to preceding and succeeding base characters. For example, a circumflex applied to an “i” may affect spacing (“î”), as might the character U+20DD combining enclosing circle. D54 Enclosing mark: A nonspacing mark with the General Category of Enclosing Mark (Me). • Enclosing marks are a subclass of nonspacing marks that surround a base character, rather than merely being placed over, under, or through it.
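Definitions D50 through D54 are all stated in terms of the General Category, so they can be sketched as a single classification function. The sketch assumes Python's `unicodedata` module. The function name and the `private_use_is_base` flag are hypothetical, with the flag reflecting the note that the interpretation of private-use characters is left to the implementation.

```python
import unicodedata

def classify(ch: str, private_use_is_base: bool = True) -> dict:
    """Classify one character per D50 through D54 using its General Category."""
    gc = unicodedata.category(ch)
    combining = gc[0] == "M"                 # D52: Mn, Mc, or Me
    if gc == "Co":
        # Private use: implementation-defined; base by default.
        graphic = private_use_is_base
    else:
        # D50: L, M, N, P, S, or Zs; excludes Zl, Zp, Cn, Cs, Cc, Cf.
        graphic = gc[0] in "LMNPS" or gc == "Zs"
    return {
        "gc": gc,
        "graphic": graphic,
        "base": graphic and not combining,   # D51
        "combining": combining,
        "nonspacing": gc in ("Mn", "Me"),    # D53
        "enclosing": gc == "Me",             # D54
    }

for ch in ("a", " ", "\u0301", "\u20dd", "\u093f", "\n", "\u2028"):
    print(f"U+{ord(ch):04X}", classify(ch))
```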

D55 Spacing mark: A combining character that is not a nonspacing mark.
• Examples include U+093F devanagari vowel sign i. In general, the behavior of spacing marks does not differ greatly from that of base characters.
• Spacing marks such as U+0BCA tamil vowel sign o may appear on both sides of a base character, but are not enclosing marks.

D56 Combining character sequence: A maximal character sequence consisting of either a base character followed by a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner; or a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner.
• When identifying a combining character sequence in Unicode text, the definition of the combining character sequence is applied maximally. For example, in the sequence <c, dot-below, caron, acute, a>, the entire sequence <c, dot-below, caron, acute> is identified as the combining character sequence, rather than the alternative of identifying <c, dot-below> as a combining character sequence followed by a separate (defective) combining character sequence <caron, acute>.

D57 Defective combining character sequence: A combining character sequence that does not start with a base character.
• Defective combining character sequences occur when a sequence of combining characters appears at the start of a string or follows a control or format character. Such sequences are defective from the point of view of handling of combining marks, but are not ill-formed. (See D84.)

D58 Grapheme base: A character with the property Grapheme_Base, or any standard Korean syllable block.
• Characters with the property Grapheme_Base include all base characters plus most spacing marks.
• The concept of a grapheme base is introduced to simplify discussion of the graphical application of nonspacing marks to other elements of text. A grapheme base may consist of a spacing (combining) mark, which distinguishes it from a base character per se. A grapheme base may also itself consist of a sequence of characters, in the case of the standard Korean syllable block.
• For the definition of standard Korean syllable block, see D117 in Section 3.12, Conjoining Jamo Behavior.

D59 Grapheme extender: A character with the property Grapheme_Extend.
• Grapheme extender characters consist of all nonspacing marks, zero width joiner, zero width non-joiner, and a small number of spacing marks.

• A grapheme extender can be conceived of primarily as the kind of nonspacing graphical mark that is applied above or below another spacing character.
• zero width joiner and zero width non-joiner are formally defined to be grapheme extenders so that their presence does not break up a sequence of other grapheme extenders.
• The small number of spacing marks that have the property Grapheme_Extend are all the second parts of a two-part combining mark.

D60 Grapheme cluster: A maximal character sequence consisting of a grapheme base followed by zero or more grapheme extenders or, alternatively, the sequence <CR, LF>.
• The grapheme cluster represents a horizontally segmentable unit of text, consisting of some grapheme base (which may consist of a Korean syllable) together with any number of nonspacing marks applied to it.
• A grapheme cluster is similar, but not identical, to a combining character sequence. A combining character sequence starts with a base character and extends across any subsequent sequence of combining marks, nonspacing or spacing. A combining character sequence is most directly relevant to processing issues related to normalization, comparison, and searching.
• A grapheme cluster starts with a grapheme base and extends across any subsequent sequence of nonspacing marks. A grapheme cluster is most directly relevant to text rendering and such processes as cursor placement and text selection in editing.
• In most processing using character properties, a grapheme cluster behaves as if it were a single character with the same properties as the grapheme base. For example, <x, macron> behaves in line breaking or bidirectional layout as if it were the character x.
• For many processes, a grapheme cluster behaves as if it were a single character with the same properties as its base character. Effectively, nonspacing marks apply graphically to the base character but do not change the properties of the base character.

D61 Extended grapheme cluster: The text between grapheme cluster boundaries as specified by Unicode Standard Annex #29, “Text Boundaries.”
• Extended grapheme clusters are either a grapheme cluster, a single character such as a control character, or the sequence <CR, LF>. They do not have linguistic significance, but are used to break up a string of text into units for processing.
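The maximal matching required for combining character sequences (D56) and the notion of a defective sequence (D57), both defined earlier in this section, can be sketched as a simple scanner. The sketch assumes Python's `unicodedata` module for the General Category, and the function names are hypothetical. It does not attempt grapheme clusters, which need the Grapheme_Base and Grapheme_Extend properties that this module does not expose.

```python
import unicodedata

ZWJ, ZWNJ = "\u200d", "\u200c"

def _is_base(ch: str) -> bool:
    """Base character: graphic and not a combining mark (D51)."""
    gc = unicodedata.category(ch)
    return gc[0] in "LNPS" or gc == "Zs"

def _continues(ch: str) -> bool:
    """Combining character, zero width joiner, or zero width non-joiner."""
    return unicodedata.category(ch)[0] == "M" or ch in (ZWJ, ZWNJ)

def combining_sequences(text: str):
    """Yield (sequence, is_defective) for each maximal combining character sequence."""
    i, n = 0, len(text)
    while i < n:
        # A sequence starts at a base character or, if defective, at a mark.
        has_base = _is_base(text[i])
        j = i + 1 if has_base else i
        while j < n and _continues(text[j]):
            j += 1                       # extend maximally
        marks = j - (i + 1 if has_base else i)
        if marks > 0:
            yield text[i:j], not has_base
            i = j
        else:
            i += 1                       # lone base, control, or format character

sample = "c\u0323\u030c\u0301a"          # c, dot below, caron, acute, a
print([(seq.encode("unicode_escape"), bad) for seq, bad in combining_sequences(sample)])
```

A base character with no following marks is skipped, since D56 as quoted requires at least one combining character, zero width joiner, or zero width non-joiner after the base.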

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 4

Character Properties


Disclaimer The content of all character property tables has been verified as far as possible by the Unicode Consortium. However, in case of conflict, the most authoritative version of the information for Version 5.0.0 is that supplied in the Unicode Character Database on the Unicode Web site. The contents of all the tables in this chapter may be superseded or augmented by information in future versions of the Unicode Standard. The Unicode Standard associates a rich set of semantics with characters and, in some instances, with code points. The support of character semantics is required for conformance; see Section 3.2, Conformance Requirements. Where character semantics can be expressed formally, they are provided as machine-readable lists of character properties in the Unicode Character Database (UCD). This chapter gives an overview of character properties, their status and attributes, followed by an overview of the UCD and more detailed notes on some important character properties. For a further discussion of character properties, see Unicode Technical Report #23, “Unicode Character Property Model.” Status and Attributes. Character properties may be normative or informative. Normative properties are those required for conformance. The following sections discuss important properties identified by their status. Many Unicode character properties can be overridden by implementations as needed. Section 3.2, Conformance Requirements, specifies when such overrides must be documented. A few properties, such as Noncharacter_Code_Point, may not be overridden. See Section 3.5, Properties, for the formal discussion of the status and attributes of properties. Consistency of Properties. The Unicode Standard is the product of many compromises. It has to strike a balance between uniformity of treatment for similar characters and compatibility with existing practice for characters inherited from legacy encodings. 
Because of this balancing act, one can expect a certain number of anomalies in character properties. For example, some pairs of characters might have been treated as canonical equivalents but are left unequivalent for compatibility with legacy differences. This situation pertains to U+00B5 µ micro sign and U+03BC μ greek small letter mu, as well as to certain Korean jamo.
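The micro sign and Greek mu case can be checked with the normalization forms. This sketch assumes Python's `unicodedata` module, and the two helper names are hypothetical. It compares the characters under canonical decomposition (NFD) and compatibility decomposition (NFKD).

```python
import unicodedata

MICRO = "\u00b5"  # MICRO SIGN
MU = "\u03bc"     # GREEK SMALL LETTER MU

def canonically_equivalent(a: str, b: str) -> bool:
    """True if the strings are identical after canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a: str, b: str) -> bool:
    """True if the strings are identical after compatibility decomposition."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

print(canonically_equivalent(MICRO, MU))
print(compatibility_equivalent(MICRO, MU))
print(unicodedata.decomposition(MICRO))
```

The pair is related only by a compatibility decomposition, so a comparison that relies on canonical equivalence alone treats the two as distinct.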

In addition, some characters might have had properties differing in some ways from those assigned in this standard, but those properties are left as is for compatibility with existing practice. This situation can be seen with the halfwidth voicing marks for Japanese (U+FF9E halfwidth katakana voiced sound mark and U+FF9F halfwidth katakana semi-voiced sound mark), which might have been better analyzed as spacing combining marks, and with the conjoining Hangul jamo, which might have been better analyzed as an initial base character followed by formally combining medial and final characters. In the interest of efficiency and uniformity in algorithms, implementations may take advantage of such reanalyses of character properties, as long as this does not conflict with the conformance requirements with respect to normative properties. See Section 3.5, Properties; Section 3.2, Conformance Requirements; and Section 3.3, Semantics, for more information.

4.1 Unicode Character Database

The Unicode Character Database (UCD) consists of a set of files that define the Unicode character properties and internal mappings. For each property, the files determine the assignment of property values to each code point. The UCD also supplies recommended property aliases and property value aliases for textual parsing and display in environments such as regular expressions.

The properties include the following:

• Name
• General Category (basic partition into letters, numbers, symbols, punctuation, and so on)
• Other important general characteristics (whitespace, dash, ideographic, alphabetic, noncharacter, deprecated, and so on)
• Display-related properties (bidirectional class, shaping, mirroring, width, and so on)
• Casing (upper, lower, title, folding, both simple and full)
• Numeric values and types
• Script and Block
• Normalization properties (decompositions, decomposition type, canonical combining class, composition exclusions, and so on)
• Age (version of the standard in which the code point was first designated)
• Boundaries (grapheme cluster, word, line, and sentence)
• Standardized variants
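Several of the properties listed above can be queried from a library built on the UCD. The sketch below assumes Python's standard `unicodedata` module, which exposes only a subset of the properties and tracks a later UCD version than 5.0.0. The `describe` helper and its dictionary keys are hypothetical names chosen for illustration.

```python
import unicodedata


def describe(ch):
    """Collect a few UCD property values for a single character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
        "numeric_value": unicodedata.numeric(ch, None),
        "decomposition": unicodedata.decomposition(ch),
        "canonical_combining_class": unicodedata.combining(ch),
    }


for sample in ("A", "\u00BD", "(", "\u0301"):
    print(f"U+{ord(sample):04X}", describe(sample))
```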


See the Unicode Character Database for more details on the character properties, their distribution across files, and the file formats.

Unihan Database. In addition, a large number of properties specific to CJK ideographs are defined in the Unicode Character Database. These properties include source information, radical and stroke counts, phonetic values, meanings, and mappings to many East Asian standards. These properties are documented in the file Unihan.txt, also known as the Unihan Database. For a complete description of the properties in the Unihan Database, see the documentation file Unihan.html in the Unicode Character Database. (See also “Online Unihan Database” in Section B.6, Other Unicode Online Resources.) Many properties apply to both ideographs and other characters. These are not specified in the Unihan Database.

Stability. While the Unicode Consortium strives to minimize changes to character property data, occasionally character properties must be updated. When this situation occurs, a new version of the Unicode Character Database is created, containing updated data files. Data file changes are associated with specific, numbered versions of the standard; character properties are never silently corrected between official versions. Each version of the Unicode Character Database, once published, is absolutely stable and will never change. Implementations or specifications that refer to a specific version of the UCD can rely upon this stability.

Detailed policies on character encoding stability as they relate to properties are found in Appendix F, Unicode Encoding Stability Policies. See the subsection “Policies” in Section B.6, Other Unicode Online Resources. See also the discussion of versioning and stability in Section 3.1, Versions of the Unicode Standard.

Aliases. Character properties and their values are given formal aliases to make it easier to refer to them consistently in specifications and in implementations, such as regular expressions, which may use them. These aliases are listed exhaustively in the Unicode Character Database, in the data files PropertyAliases.txt and PropertyValueAliases.txt.

Many of the aliases have both a long form and a short form. For example, the General Category has a long alias “General_Category” and a short alias “gc”. The long alias is more comprehensible and is usually used in the text of the standard when referring to a particular character property. The short alias is more appropriate for use in regular expressions and other algorithmic contexts.

In comparing aliases programmatically, loose matching is appropriate. That entails ignoring case differences and any whitespace, underscore, and hyphen characters. For example, “GeneralCategory”, “general_category”, and “GENERAL-CATEGORY” would all be considered equivalent property aliases. See UCD.html in the Unicode Character Database for further discussion of property and property value matching.

For each character property whose values are not purely numeric, the Unicode Character Database provides a list of value aliases. For example, one of the values of the Line_Break property is given the long alias “Open_Punctuation” and the short alias “OP”.
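The loose matching rule for aliases described above can be sketched as follows. This is a minimal illustration covering only the rule stated in the text (ignore case, whitespace, underscores, and hyphens). The function names `loose_key` and `same_alias` are hypothetical, and UCD.html should be consulted for the complete matching rules.

```python
import re

# Characters ignored by loose matching: whitespace, underscore, hyphen.
_IGNORED = re.compile(r"[\s_\-]+")


def loose_key(alias):
    """Reduce an alias to a comparison key: drop ignored characters, fold case."""
    return _IGNORED.sub("", alias).casefold()


def same_alias(a, b):
    """True if two alias spellings are equivalent under loose matching."""
    return loose_key(a) == loose_key(b)


print(same_alias("GeneralCategory", "general_category"))
print(same_alias("GENERAL-CATEGORY", "General Category"))
print(same_alias("gc", "sc"))
```

Storing aliases under their `loose_key` in a dictionary gives a lookup that accepts every equivalent spelling without a separate comparison step.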


Property aliases and property value aliases can be combined in regular expressions that pick out a particular value of a particular property. For example, “\p{lb=OP}” means the Open_Punctuation value of the Line_Break property, and “\p{gc=Lu}” means the Uppercase_Letter value of the General_Category property.

Property aliases define a namespace. No two character properties have the same alias. For each property, the set of corresponding property value aliases constitutes its own namespace. No constraint prevents property value aliases for different properties from having the same property value alias. Thus “B” is the short alias for the Paragraph_Separator value of the Bidi_Class property; “B” is also the short alias for the Below value of the Canonical_Combining_Class property. However, because of the namespace restrictions, any combination of a property alias plus an appropriate property value alias is guaranteed to constitute a unique string, as in “\p{bc=B}” versus “\p{ccc=B}”.

For a recommended use of property and property value aliases, see Unicode Technical Standard #18, “Unicode Regular Expressions.” Aliases are also used for normatively referencing properties, as described in Section 3.1, Versions of the Unicode Standard.

CD-ROM and Online Availability. A copy of the 5.0.0 version of the UCD is provided on the CD-ROM. All versions of the UCD are available online on the Unicode Web site. See the subsections “Online Unicode Character Database” and “Online Unihan Database” in Section B.6, Other Unicode Online Resources.
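The per-property namespaces for value aliases discussed in this section can be illustrated with a small property test. The sketch below assumes Python's standard `unicodedata` module. Python's built-in `re` module does not support the “\p{...}” syntax, so a hypothetical helper `has_property_value` stands in for it, and it handles only three properties and a handful of Canonical_Combining_Class aliases.

```python
import unicodedata

# Short value aliases for a few Canonical_Combining_Class values.
CCC_SHORT_ALIASES = {"NR": 0, "ATB": 202, "ATAR": 216, "B": 220, "A": 230}


def has_property_value(ch, prop, value):
    """Rough stand-in for \\p{prop=value}, limited to gc, bc, and ccc."""
    if prop == "gc":
        return unicodedata.category(ch) == value
    if prop == "bc":
        return unicodedata.bidirectional(ch) == value
    if prop == "ccc":
        return unicodedata.combining(ch) == CCC_SHORT_ALIASES[value]
    raise ValueError(f"unsupported property alias: {prop}")


# The same value alias "B" means different things under bc and ccc.
print(has_property_value("\u2029", "bc", "B"))   # PARAGRAPH SEPARATOR
print(has_property_value("\u0323", "ccc", "B"))  # COMBINING DOT BELOW
print(has_property_value("\u0323", "bc", "B"))
```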

4.2 Case—Normative

Case is a normative property of characters in certain alphabets whereby characters are considered to be variants of a single letter. These variants, which may differ markedly in shape and size, are called the uppercase letter (also known as capital or majuscule) and the lowercase letter (also known as small or minuscule). The uppercase letter is generally larger than the lowercase letter.

Because of the inclusion of certain composite characters for compatibility, such as U+01F1 latin capital letter dz, a third case, called titlecase, is used where the first character of a word must be capitalized. An example of such a character is U+01F2 latin capital letter d with small letter z. The three case forms are UPPERCASE, Titlecase, and lowercase.

For those scripts that have case (Latin, Greek, Coptic, Cyrillic, Glagolitic, Armenian, Deseret, and archaic Georgian), uppercase characters typically contain the word capital in their names. Lowercase characters typically contain the word small. However, this is not a reliable guide. The word small in the names of characters from scripts other than those just listed has nothing to do with case. There are other exceptions as well, such as small capital letters that are not formally uppercase. Some Greek characters with capital in their names are actually titlecase. (Note that while the archaic Georgian script contained upper- and lowercase pairs, they are not used in modern Georgian. See Section 7.7, Georgian.)

The authoritative source for case of Unicode characters is the specification of lowercase, uppercase, and titlecase properties in the Unicode Character Database.
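The three case forms, and the unreliability of character names as a guide, can be checked against property data. The sketch below assumes Python's standard `unicodedata` module and uses the General_Category values Lu, Lt, and Ll as an approximation. The Lowercase and Uppercase properties themselves cover additional characters, and `case_form` is a hypothetical helper.

```python
import unicodedata


def case_form(ch):
    """Classify a character by its General_Category letter case value."""
    forms = {"Lu": "uppercase", "Lt": "titlecase", "Ll": "lowercase"}
    return forms.get(unicodedata.category(ch), "other")


samples = [
    "\u01F1",  # LATIN CAPITAL LETTER DZ
    "\u01F2",  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z
    "\u01F3",  # LATIN SMALL LETTER DZ
    "\u1F88",  # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
    "\u1D00",  # LATIN LETTER SMALL CAPITAL A
]
for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", case_form(ch))
```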


Case Mapping

The default case mapping tables defined in the Unicode Standard are normative, but may be overridden to match user or implementation requirements. The Unicode Character Database contains five files with case mapping information, as shown in Table 4-1. Full case mappings for Unicode characters are obtained by using the basic mappings from UnicodeData.txt and extending or overriding them where necessary with the mappings from SpecialCasing.txt. Full case mappings may depend on the context surrounding the character in the original string.

Some characters have a “best” single-character mapping in UnicodeData.txt as well as a full mapping in SpecialCasing.txt. Any character that does not have a mapping in these files is considered to map to itself. For more information on case mappings, see Section 5.18, Case Mappings.

Table 4-1. Sources for Case Mapping Information

File Name                   Description
UnicodeData.txt             Contains the case mappings that map to a single character. These do not increase the length of strings, nor do they contain context-dependent mappings.
SpecialCasing.txt           Contains additional case mappings that map to more than one character, such as “ß” to “SS”. Also contains context-dependent mappings, with flags to distinguish them from the normal mappings, as well as some locale-dependent mappings.
CaseFolding.txt             Contains data for performing locale-independent case folding, as described in “Caseless Matching,” in Section 5.18, Case Mappings.
DerivedCoreProperties.txt   Contains definitions of the properties Lowercase and Uppercase.
PropList.txt                Contains the definition of the property Soft_Dotted.

The single-character mappings in UnicodeData.txt are insufficient for languages such as German. Therefore, only legacy implementations that cannot handle case mappings that increase string lengths should use UnicodeData.txt case mappings alone. A set of charts that show the latest case mappings is also available on the Unicode Web site. See “Charts” in Section B.6, Other Unicode Online Resources.
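The difference between single-character and full case mappings shows up as a change in string length. The following sketch assumes Python 3, whose `str.upper` and `str.casefold` apply full, locale-independent mappings. The helper names are hypothetical, and context-dependent or locale-dependent mappings are not covered.

```python
def upper_changes_length(text):
    """True if full uppercasing produces a string of a different length."""
    return len(text.upper()) != len(text)


def caseless_equal(a, b):
    """Locale-independent caseless comparison based on case folding."""
    return a.casefold() == b.casefold()


word = "Stra\u00DFe"  # contains U+00DF LATIN SMALL LETTER SHARP S
print(word.upper())
print(upper_changes_length(word))
print(caseless_equal(word, "STRASSE"))
```

Code that preallocates an output buffer of the same length as its input is the typical place where this distinction matters.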

4.3 Combining Classes—Normative

Each combining character has a normative canonical combining class. This class is used with the Canonical Ordering Algorithm to determine which combining characters interact typographically and to determine how the canonical ordering of sequences of combining characters takes place. Class zero combining characters act like base letters for the purpose of determining canonical order. Combining characters with non-zero classes participate in reordering for the purpose of determining the canonical order of sequences of characters. (See Section 3.11, Canonical Ordering Behavior, for a description of the algorithm.)

The list of combining characters and their canonical combining class appears in the Unicode Character Database. Most combining characters are nonspacing.

The canonical order of character sequences does not imply any kind of linguistic correctness or linguistic preference for ordering of combining marks in sequences. For more information on rendering combining marks, see Section 5.13, Rendering Nonspacing Marks.

Class zero combining marks are never reordered by the Canonical Ordering Algorithm. Except for class zero, the exact numerical values of the combining classes are of no importance in canonical equivalence, although the relative magnitude of the classes is significant. For example, it is crucial that the combining class of the cedilla be lower than the combining class of the dot below, although their exact values of 202 and 220 are not important for implementations.

Certain classes tend to correspond with particular rendering positions relative to the base character, as shown in Figure 4-1.

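Figure 4-1. Positions of Common Combining Marks

[Figure not reproduced: it shows combining classes 230, 216, 202, and 220 at their typical positions relative to a base character.]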
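The role of combining classes in canonical ordering can be sketched as a stable sort over each run of marks with non-zero class. This is a simplified illustration that assumes Python's standard `unicodedata` module. It performs no decomposition, so it is not a full normalization routine, and `canonical_order` is a hypothetical name. See Section 3.11 for the actual algorithm.

```python
import unicodedata


def canonical_order(text):
    """Reorder each run of non-zero-class marks by ascending combining class."""
    result = []
    run = []  # pending marks whose combining class is non-zero
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A class zero character ends the run; sorted() is stable, so
            # marks with equal classes keep their relative order.
            result.extend(sorted(run, key=unicodedata.combining))
            run.clear()
            result.append(ch)
        else:
            run.append(ch)
    result.extend(sorted(run, key=unicodedata.combining))
    return "".join(result)


# Dot below (class 220) typed before cedilla (class 202) is reordered.
typed = "a\u0323\u0327"
print([f"{ord(c):04X}" for c in canonical_order(typed)])
```

Only the relative magnitude of 202 and 220 affects the result, which matches the point made above about exact class values.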

Reordrant, Split, and Subjoined Combining Marks

In some scripts, the rendering of combining marks is notably complex. This is true in particular of the Brahmi-derived scripts of South and Southeast Asia, whose vowels are often encoded as class zero combining marks in the Unicode Standard, known as matras for the Indic scripts.

In the case of simple combining marks, as for the accent marks of the Latin script, the normative Unicode combining class of that combining mark typically corresponds to its positional placement with regard to a base letter, as described earlier. However, in the case of the combining marks representing vowels (and sometimes consonants) in the Brahmi-derived scripts, all of the combining marks are given the normative combining class of zero, regardless of their positional placement within an aksara. The placement and rendering of a class zero combining mark cannot be derived from its combining class alone, but rather depends on having more information about the particulars of the script involved. In some instances, the position may migrate in different historical periods for a script or may even differ depending on font style.


Such matters are not treated as normative character properties in the Unicode Standard, because they are more properly considered properties of the glyphs and fonts used for rendering. However, to assist implementers, earlier versions of the Unicode Standard did subcategorize some class zero combining marks, pointing out significant types that need to be handled consistently. That earlier subcategorization is extended and refined in this section.

Reordrant Class Zero Combining Marks. In many instances in Indic scripts, a vowel is represented in logical order after the consonant of a syllable, but is displayed before (to the left of) the consonant when rendered. Such combining marks are termed reordrant to reflect their visual reordering to the left of a consonant (or, in some instances, a consonant cluster). Special handling is required for selection and editing of these marks. In particular, the possibility that the combining mark may be reordered left past a cluster, and not simply past the immediate preceding character in the backing store, requires attention to the details for each script involved.

The visual reordering of these reordrant class zero combining marks has nothing to do with the reordering of combining character sequences in the Canonical Ordering Algorithm. All of these marks are class zero and thus are never reordered by the Canonical Ordering Algorithm or during normalization. The reordering is purely a presentational issue for glyphs during rendering of text.

Table 4-2 lists reordrant class zero combining marks in the Unicode Standard.

Table 4-2. Class Zero Combining Marks—Reordrant

Script       Code Points
Devanagari   093F
Bengali      09BF, 09C7, 09C8
Gurmukhi     0A3F
Gujarati     0ABF
Oriya        0B47
Tamil        0BC6, 0BC7, 0BC8
Malayalam    0D46, 0D47, 0D48
Sinhala      0DD9, 0DDA, 0DDB
Myanmar      1031
Khmer        17C1, 17C2, 17C3
Balinese     1B3E, 1B3F
Buginese     1A19, 1A1B
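That reordrant marks stay in logical order in the backing store can be confirmed with normalization. The sketch below assumes Python's standard `unicodedata` module and uses the Devanagari entry from Table 4-2. The helper `is_class_zero_mark` is a hypothetical name.

```python
import unicodedata

KA = "\u0915"            # DEVANAGARI LETTER KA
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I, rendered to the left of KA


def is_class_zero_mark(ch):
    """True for a combining mark (General_Category M*) with combining class 0."""
    return unicodedata.category(ch).startswith("M") and unicodedata.combining(ch) == 0


syllable = KA + VOWEL_SIGN_I  # logical order: consonant first, then vowel sign
print(is_class_zero_mark(VOWEL_SIGN_I))
print(unicodedata.normalize("NFD", syllable) == syllable)
```

Any left-of-consonant placement therefore has to come from the rendering layer, not from normalization.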

In addition, there are historically related vowel characters in the Thai and Lao scripts that, for legacy reasons, are not treated as combining marks. Instead, for Thai and Lao, these vowels are represented in the backing store in visual order and require no reordering for rendering. The trade-off is that they have to be rearranged logically for searching and sorting. Because of that processing requirement, these characters are given a formal character property assignment, the Logical_Order_Exception property, as listed in Table 4-3. See PropList.txt in the Unicode Character Database.


Table 4-3. Thai and Lao Logical Order Exceptions

Script   Code Points
Thai     0E40..0E44
Lao      0EC0..0EC4
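The logical rearrangement needed for searching and sorting Thai and Lao text can be sketched as a swap of each listed vowel with the character that follows it. This is a deliberately simplified illustration using only the ranges in Table 4-3. The function names are hypothetical, the swap does not check that the following character is a consonant, and production collation handles these characters through its own data.

```python
def is_logical_order_exception(ch):
    """True for the Thai and Lao vowels listed in Table 4-3."""
    return "\u0E40" <= ch <= "\u0E44" or "\u0EC0" <= ch <= "\u0EC4"


def to_logical_order(text):
    """Swap each visually ordered leading vowel with the following character."""
    chars = list(text)
    i = 0
    while i + 1 < len(chars):
        if is_logical_order_exception(chars[i]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


stored = "\u0E40\u0E01"  # THAI CHARACTER SARA E, then THAI CHARACTER KO KAI
print([f"{ord(c):04X}" for c in to_logical_order(stored)])
```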

Split Class Zero Combining Marks. In addition to the reordrant class zero combining marks, there are a number of class zero combining marks whose representative glyph typically consists of two parts, which are split into different positions with respect to the consonant (or consonant cluster) in an aksara. Sometimes these glyphic pieces are rendered both to the left and the right of a consonant. Sometimes one piece is rendered above or below the consonant and the other piece is rendered to the left or the right. Particularly in the instances where some piece of the glyph is rendered to the left of the consonant, these split class zero combining marks pose implementation problems similar to those of the reordrant marks.

Table 4-4 lists split class zero combining marks in the Unicode Standard, subgrouped by positional patterns.

Table 4-4. Class Zero Combining Marks—Split

Glyph Positions          Script      Code Points
Left and right           Bengali     09CB, 09CC
                         Oriya       0B4B
                         Tamil       0BCA, 0BCB, 0BCC
                         Malayalam   0D4A, 0D4B, 0D4C
                         Sinhala     0DDC, 0DDE
                         Khmer       17C0, 17C4, 17C5
                         Balinese    1B40, 1B41
Left and top             Oriya       0B48
                         Sinhala     0DDA
                         Khmer       17BE
Left, top, and right     Oriya       0B4C
                         Sinhala     0DDD
                         Khmer       17BF
Top and right            Oriya       0B57
                         Kannada     0CC0, 0CC7, 0CC8, 0CCA, 0CCB
                         Limbu       1925, 1926
                         Balinese    1B43
Top and bottom           Telugu      0C48
                         Tibetan     0F73, 0F76, 0F77, 0F78, 0F79, 0F81
                         Balinese    1B3C
Top, bottom, and right   Balinese    1B3D
Bottom and right         Balinese    1B3B

One should pay very careful attention to all split class zero combining marks in implementations. Not only do they pose issues for rendering and editing, but they also often have canonical equivalences defined involving the separate pieces, when those pieces are also encoded as characters. As a consequence, the split combining marks may constitute exceptional cases under normalization. Some of the Tibetan split combining marks are discouraged from use.

The split vowels also pose difficult problems for understanding the standard, as the phonological status of the vowel phonemes, the encoding status of the characters (including any canonical equivalences), and the graphical status of the glyphs are easily confused, both for native users of the script and for engineers working on implementations of the standard.

Subjoined Class Zero Combining Marks. Brahmi-derived scripts that are not represented in the Unicode Standard with a virama may have class zero combining marks to represent subjoined forms of consonants. These correspond graphologically to what would be represented by a sequence of virama + consonant in other related scripts. The subjoined consonants do not pose particular rendering problems, at least not in comparison to other combining marks, but they should be noted as constituting an exception to the normal pattern in Brahmi-derived scripts of consonants being represented with base letters. This exception needs to be taken into account when doing linguistic processing or searching and sorting.

Table 4-5 lists subjoined class zero combining marks in the Unicode Standard.

Table 4-5. Class Zero Combining Marks—Subjoined

Script    Code Points
Tibetan   0F90..0F97, 0F99..0FBC
Limbu     1929, 192A, 192B

These Limbu consonants, while logically considered subjoined combining marks, are rendered mostly at the lower right of a base letter, rather than directly beneath it.

Strikethrough Class Zero Combining Marks. The Kharoshthi script is unique in having some class zero combining marks for vowels that are struck through a consonant, rather than being placed in a position around the consonant. These are also called out in Table 4-6 specifically as a warning that they may involve particular problems for implementations.

Table 4-6. Class Zero Combining Marks—Strikethrough

Script       Code Points
Kharoshthi   10A01, 10A06
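The canonical equivalences that this section notes for split class zero combining marks can be inspected through canonical decomposition. The sketch below assumes Python's standard `unicodedata` module, and `canonical_pieces` is a hypothetical helper. Not every split mark has such a decomposition, as the Khmer example shows.

```python
import unicodedata


def canonical_pieces(ch):
    """Return the code points of the canonical decomposition (NFD) as hex strings."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


samples = {
    "Bengali 09CB": "\u09CB",
    "Sinhala 0DDD": "\u0DDD",
    "Tibetan 0F73": "\u0F73",
    "Khmer 17C0": "\u17C0",
}
for label, ch in samples.items():
    print(label, "->", canonical_pieces(ch))
```

Comparing strings only after normalizing both sides avoids treating a split mark and its encoded pieces as different text.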

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 5

Implementation Guidelines

It is possible to implement a substantial subset of the Unicode Standard as “wide ASCII” with little change to existing programming practice. However, the Unicode Standard also provides for languages and writing systems that have more complex behavior than English does. Whether one is implementing a new operating system from the ground up or enhancing existing programming environments or applications, it is necessary to examine many aspects of current programming practice and conventions to deal with this more complex behavior.

This chapter covers a series of short, self-contained topics that are useful for implementers. The information and examples presented here are meant to help implementers understand and apply the design and features of the Unicode Standard. That is, they are meant to promote good practice in implementations conforming to the Unicode Standard. These recommended guidelines are not normative and are not binding on the implementer, but are intended to represent best practice.

When implementing the Unicode Standard, it is important to look not only at the letter of the conformance rules, but also at their spirit. Many of the following guidelines have been created specifically to assist people who run into issues with conformant implementations, while reflecting the requirements of actual usage.

5.1 Transcoding to Other Standards

The Unicode Standard exists in a world of other text and character encoding standards: some private, some national, some international. A major strength of the Unicode Standard is the number of other important standards that it incorporates. In many cases, the Unicode Standard included duplicate characters to guarantee round-trip transcoding to established and widely used standards.

Issues

Conversion of characters between standards is not always a straightforward proposition. Many characters have mixed semantics in one standard and may correspond to more than one character in another. Sometimes standards give duplicate encodings for the same character; at other times the interpretation of a whole set of characters may depend on the application. Finally, there are subtle differences in what a standard may consider a character.


For these reasons, mapping tables are usually required to map between the Unicode Standard and another standard. Mapping tables need to be used consistently for text data exchange to avoid modification and loss of text data. For details, see Unicode Technical Standard #22, “Character Mapping Markup Language (CharMapML).” By contrast, conversions between different Unicode encoding forms are fast, lossless permutations.

The Unicode Standard can be used as a pivot to transcode among n different standards. This process, which is sometimes called triangulation, reduces the number of mapping tables that an implementation needs from O(n²) to O(n).
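Triangulation through Unicode can be sketched with any library that ships one decoder and one encoder per legacy character set. The example below assumes Python's built-in codecs for ISO 8859-1 and code page 437. The function names are hypothetical, and the table counts are a simple illustration of the growth rates, counting one table per direction.

```python
def transcode(data, source, target):
    """Convert bytes between two legacy encodings, using Unicode as the pivot."""
    text = data.decode(source)   # legacy encoding -> Unicode
    return text.encode(target)   # Unicode -> legacy encoding


def tables_needed(n):
    """Directional mapping tables for n standards: direct pairs versus pivot."""
    direct = n * (n - 1)  # one table for every ordered pair of standards
    pivot = 2 * n         # one table to Unicode and one from Unicode per standard
    return direct, pivot


# U+00E9 is byte E9 in ISO 8859-1 and byte 82 in code page 437.
print(transcode(b"\xe9", "latin-1", "cp437").hex())
print(tables_needed(10))
```

A character missing from the target set raises `UnicodeEncodeError` here, and a real converter would need an explicit policy for such unmappable characters.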

Multistage Tables

Tables require space. Even small character sets often map to characters from several different blocks in the Unicode Standard and thus may contain up to 64K entries (for the BMP) or 1,088K entries (for the entire codespace) in at least one direction. Several techniques exist to reduce the memory space requirements for mapping tables. These techniques apply not only to transcoding tables, but also to many other tables needed to implement the Unicode Standard, including character property data, case mapping, collation tables, and glyph selection tables.

Flat Tables. If disk space is not an issue, virtual memory architectures yield acceptable working set sizes even for flat tables because the frequency of usage among characters differs widely. Even small character sets contain many infrequently used characters. In addition, data intended to be mapped into a given character set generally does not contain characters from all blocks of the Unicode Standard (usually, only a few blocks at a time need to be transcoded to a given character set). This situation leaves certain sections of the mapping tables unused, and therefore paged to disk. The effect is most pronounced for large tables mapping from the Unicode Standard to other character sets, which have large sections simply containing mappings to the default character, or the “unmappable character” entry.

Ranges. It may be tempting to “optimize” these tables for space by providing elaborate provisions for nested ranges or similar devices. This practice leads to unnecessary performance costs on modern, highly pipelined processor architectures because of branch penalties. A faster solution is to use an optimized two-stage table, which can be coded without any test or branch instructions. Hash tables can also be used for space optimization, although they are not as fast as multistage tables.

Two-Stage Tables. Two-stage tables are a commonly employed mechanism to reduce table size (see Figure 5-1). They use an array of pointers and a default value. If a pointer is NULL, the value returned by a lookup operation in the table is the default value. Otherwise, the pointer references a block of values used for the second stage of the lookup. For BMP characters, it is quite efficient to organize such two-stage tables in terms of high byte and low byte values. The first stage is an array of 256 pointers, and each of the secondary blocks contains 256 values indexed by the low byte in the code point. For supplementary characters, it is often advisable to structure the pointers and second-stage arrays somewhat differently, so as to take best advantage of the very sparse distribution of supplementary characters in the remaining codespace.

Figure 5-1. Two-Stage Tables
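A two-stage table for BMP code points, split into high byte and low byte as described in this section, can be sketched as follows. `None` plays the role of the NULL pointer. The class and method names are hypothetical, and supplementary code points are out of scope for this sketch.

```python
class TwoStageTable:
    """Two-stage lookup for BMP code points: 256 first-stage slots, 256-entry blocks."""

    def __init__(self, default):
        self.default = default
        self.stage1 = [None] * 256  # None stands in for a NULL pointer

    def set(self, code_point, value):
        high, low = code_point >> 8, code_point & 0xFF
        if self.stage1[high] is None:
            # Allocate a second-stage block only when a value is first stored.
            self.stage1[high] = [self.default] * 256
        self.stage1[high][low] = value

    def get(self, code_point):
        block = self.stage1[code_point >> 8]
        if block is None:  # this is the test that the optimized variant avoids
            return self.default
        return block[code_point & 0xFF]


table = TwoStageTable(default=0x3F)  # map everything else to "?"
table.set(0x00E9, 0x82)
print(hex(table.get(0x00E9)), hex(table.get(0x4E00)))
```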

Optimized Two-Stage Table. Wherever any blocks are identical, the pointers just point to the same block. For transcoding tables, this case occurs generally for a block containing only mappings to the default or “unmappable” character. Instead of using NULL pointers and a default value, one “shared” block of default entries is created. This block is pointed to by all first-stage table entries for which no character value can be mapped. By avoiding tests and branches, this strategy provides access time that approaches the simple array access, but at a great savings in storage.

Multistage Table Tuning. Given a table of arbitrary size and content, it is a relatively simple matter to write a small utility that can calculate the optimal number of stages and their width for a multistage table. Tuning the number of stages and the width of their arrays of index pointers can result in various trade-offs of table size versus average access time.
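The shared-block variant can be sketched by building the table from a sparse mapping and storing each distinct block only once, so that lookup is two plain index operations. The function names and the use of block indexes in place of pointers are assumptions of this sketch, which covers BMP code points only.

```python
def build_two_stage(mapping, default, limit=0x10000):
    """Build (stage1, blocks); identical 256-entry blocks are stored only once."""
    stage1 = []  # one block index per high byte
    blocks = []  # distinct second-stage blocks
    seen = {}    # block contents -> index in blocks
    for high in range(limit >> 8):
        block = tuple(mapping.get((high << 8) | low, default) for low in range(256))
        index = seen.get(block)
        if index is None:
            index = len(blocks)
            seen[block] = index
            blocks.append(block)
        stage1.append(index)
    return stage1, blocks


def lookup(stage1, blocks, code_point):
    """Branch-free lookup: unmapped ranges resolve through the shared default block."""
    return blocks[stage1[code_point >> 8]][code_point & 0xFF]


stage1, blocks = build_two_stage({0x0041: 1, 0x03B1: 2}, default=0)
print(len(blocks), lookup(stage1, blocks, 0x03B1), lookup(stage1, blocks, 0x4E00))
```

In this example 254 of the 256 first-stage entries refer to the same all-default block, which is where the storage saving comes from.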

5.2 Programming Languages and Data Types

Programming languages provide for the representation and handling of characters and strings via data types, data constants (literals), and methods. Explicit support for Unicode helps with the development of multilingual applications. In some programming languages, strings are expressed as sequences (arrays) of primitive types, exactly corresponding to sequences of code units of one of the Unicode encoding forms. In other languages, strings are objects, but indexing into strings follows the semantics of addressing code units of a particular encoding form.

Data types for “characters” generally hold just a single Unicode code point value for low-level processing and lookup of character property values. When a primitive data type is used for single-code point values, a signed integer type can be useful; negative values can hold “sentinel” values like end-of-string or end-of-file, which can be easily distinguished from Unicode code point values. However, in most APIs, string types should be used to accommodate user-perceived characters, which may require sequences of code points.
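The combination of code unit indexing and a negative sentinel can be sketched with a reader over UTF-16 code units. The names `END_OF_STRING` and `code_point_at` are hypothetical. Unpaired surrogates are returned unchanged in this sketch, which is a simplification.

```python
END_OF_STRING = -1  # negative sentinel, never a valid code point


def code_point_at(units, index):
    """Return (code_point, next_index) for a sequence of UTF-16 code units."""
    if index >= len(units):
        return END_OF_STRING, index
    unit = units[index]
    if 0xD800 <= unit <= 0xDBFF and index + 1 < len(units):
        trail = units[index + 1]
        if 0xDC00 <= trail <= 0xDFFF:
            # Combine a lead and trail surrogate into a supplementary code point.
            return 0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00), index + 2
    return unit, index + 1


units = [0x0061, 0xD835, 0xDC00]  # "a" followed by U+1D400 as a surrogate pair
position = 0
while True:
    code_point, position = code_point_at(units, position)
    if code_point == END_OF_STRING:
        break
    print(f"U+{code_point:04X}")
```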

Unicode Data Types for C

ISO/IEC Technical Report 19769, Extensions for the programming language C to support new character types, defines data types for the three Unicode encoding forms (UTF-8, UTF-16, and UTF-32), syntax for Unicode string and character literals, and methods for the conversion between the Unicode encoding forms. No other methods are specified.

Unicode strings are encoded as arrays of primitive types as usual. For UTF-8, UTF-16, and UTF-32, the basic types are char, char16_t, and char32_t, respectively. The ISO Technical Report assumes that char is at least 8 bits wide for use with UTF-8. While char and wchar_t may be signed or unsigned types, the new char16_t and char32_t types are defined to be unsigned integer types.

Unlike the specification in the wchar_t programming model, the Unicode data types do not require that a single string base unit alone (especially char or char16_t) must be able to store any one character (code point).

UTF-16 string and character literals are written with a lowercase u as a prefix, similar to the L prefix for wchar_t literals. UTF-32 literals are written with an uppercase U as a prefix. Characters outside the basic character set are available for use in string literals through the \uhhhh and \Uhhhhhhhh escape sequences.

These types and semantics are available in a compiler if the <uchar.h> header is present and defines the __STDC_UTF_16__ (for char16_t) and __STDC_UTF_32__ (for char32_t) macros.

Because Technical Report 19769 was not available when UTF-16 was first introduced, many implementations have been supporting a 16-bit wchar_t to contain UTF-16 code units. Such usage is not conformant to the C standard, because supplementary characters require use of pairs of wchar_t units in this case.

ANSI/ISO C wchar_t. With the wchar_t wide character type, ANSI/ISO C provides for inclusion of fixed-width, wide characters. ANSI/ISO C leaves the semantics of the wide character set to the specific implementation but requires that the characters from the portable C execution set correspond to their wide character equivalents by zero extension. The Unicode characters in the ASCII range U+0020 to U+007E satisfy these conditions. Thus, if an implementation uses ASCII to code the portable C execution set, the use of the Unicode character set for the wchar_t type, in either UTF-16 or UTF-32 form, fulfills the requirement.

The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.

However, programmers who want a UTF-16 implementation can use a macro or typedef (for example, UNICHAR) that can be compiled as unsigned short or wchar_t depending on the target compiler and platform. Other programmers who want a UTF-32 implementation can use a macro or typedef that might be compiled as unsigned int or wchar_t, depending on the target compiler and platform. This choice enables correct compilation on different platforms and compilers. Where a 16-bit implementation of wchar_t is guaranteed, such macros or typedefs may be predefined (for example, TCHAR on the Win32 API).

On systems where the native character type or wchar_t is implemented as a 32-bit quantity, an implementation may use the UTF-32 form to represent Unicode characters.

A limitation of the ANSI/ISO C model is its assumption that characters can always be processed in isolation. Implementations that choose to go beyond the ANSI/ISO C model may find it useful to mix widths within their APIs. For example, an implementation may have a 32-bit wchar_t and process strings in any of the UTF-8, UTF-16, or UTF-32 forms. Another implementation may have a 16-bit wchar_t and process strings as UTF-8 or UTF-16, but have additional APIs that process individual characters as UTF-32 or deal with pairs of UTF-16 code units.

5.3 Unknown and Missing Characters

This section briefly discusses how users or implementers might deal with characters that are not supported or that, although supported, are unavailable for legible rendering.

Reserved and Private-Use Character Codes

There are two classes of code points that even a “complete” implementation of the Unicode Standard cannot necessarily interpret correctly:

• Code points that are reserved
• Code points in the Private Use Area for which no private agreement exists

An implementation should not attempt to interpret such code points. However, in practice, applications must deal with unassigned code points or private-use characters. This may occur, for example, when the application is handling text that originated on a system implementing a later release of the Unicode Standard, with additional assigned characters.

Options for rendering such unknown code points include printing the code point as four to six hexadecimal digits, printing a black or white box, using appropriate glyphs that distinguish reserved code points from private-use code points, or simply displaying nothing. An implementation should not blindly delete such characters, nor should it unintentionally transform them into something else.

Interpretable but Unrenderable Characters

An implementation may receive a code point that is assigned to a character in the Unicode character encoding, but be unable to render it because it lacks a font for the code point or is otherwise incapable of rendering it appropriately. In this case, an implementation might be able to provide limited feedback to the user’s queries, such as being able to sort the data properly, show its script, or otherwise display the code point in a default manner. An implementation can distinguish between unrenderable (but assigned) code points and unassigned code points by printing the former with distinctive glyphs that give some general indication of their type.

Default Property Values

To work properly in implementations, unassigned code points must be given default property values as if they were characters, because various algorithms require property values to be assigned to every code point before they can function at all. These default values are not uniform across all unassigned code points, because certain ranges of code points need different values to maximize compatibility with expected future assignments. For information on the default values for each property, see its description in the Unicode Character Database. Except where indicated, the default values are not normative; conformant implementations can use other values.

Default Ignorable Code Points

Normally, code points outside the repertoire of supported characters would be displayed with a fallback glyph, such as a black box. However, format and control characters must not have visible glyphs (although they may have an effect on other characters in display). These characters are also ignored except with respect to specific, defined processes; for example, zero width non-joiner is ignored by default in collation. To allow a greater degree of compatibility across versions of the standard, the ranges U+2060..U+206F, U+FFF0..U+FFFB, and U+E0000..U+E0FFF are reserved for format and control characters (General Category = Cf). Unassigned code points in these ranges should be ignored in processing and display. For more information, see Section 5.20, Default Ignorable Code Points.
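The display rule described above can be sketched as follows. This is a simplified illustration, not the full definition of default ignorable code points given in Section 5.20; the function name should_hide is hypothetical, and the result for unassigned code points depends on the version of the Unicode Character Database bundled with the Python runtime.

```python
import unicodedata

# Ranges reserved for format and control characters, as listed in the text above.
RESERVED_FORMAT_RANGES = ((0x2060, 0x206F), (0xFFF0, 0xFFFB), (0xE0000, 0xE0FFF))

def should_hide(cp):
    """Return True if the code point should get no visible fallback glyph."""
    category = unicodedata.category(chr(cp))
    if category in ("Cf", "Cc"):
        # Format and control characters never have visible glyphs.
        return True
    if category == "Cn":
        # Unassigned: hide only inside the ranges reserved for format characters.
        return any(lo <= cp <= hi for lo, hi in RESERVED_FORMAT_RANGES)
    return False
```

An unassigned code point outside the reserved ranges returns False here, which means that the caller would display it with a fallback glyph.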

Interacting with Downlevel Systems

Versions of the Unicode Standard after Unicode 2.0 are strict supersets of Unicode 2.0 and all intervening versions. The Derived Age property tracks the version of the standard at which a particular character was added to the standard. This information can be particularly helpful in some interactions with downlevel systems. If the protocol used for communication between the systems provides for an announcement of the Unicode version on each one, an uplevel system can predict which recently added characters will appear as unassigned characters to the downlevel system.
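A minimal sketch of this prediction follows. It assumes that Derived Age data is available as a mapping from code point to version; the three-entry table SAMPLE_AGE and the function name unassigned_for are hypothetical stand-ins for data that a real implementation would load from the Unicode Character Database.

```python
# Hypothetical excerpt of Derived Age data: code point -> version of first assignment.
SAMPLE_AGE = {
    0x0041: (1, 1),   # LATIN CAPITAL LETTER A
    0x20AC: (2, 1),   # EURO SIGN
    0x10000: (4, 0),  # LINEAR B SYLLABLE B008 A
}

def unassigned_for(text, downlevel_version, age=SAMPLE_AGE):
    """Return the characters that a system at downlevel_version sees as unassigned."""
    result = []
    for ch in text:
        added_in = age.get(ord(ch))  # None: not assigned in the sample data
        if added_in is None or added_in > downlevel_version:
            result.append(ch)
    return result
```

Versions are compared as tuples, so (2, 1) is later than (2, 0) and earlier than (3, 0).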

5.4 Handling Surrogate Pairs in UTF-16

The method used by UTF-16 to address the 1,048,576 supplementary code points that cannot be represented by a single 16-bit value is called surrogate pairs. A surrogate pair consists of a high-surrogate code unit (leading surrogate) followed by a low-surrogate code unit (trailing surrogate), as described in the specifications in Section 3.8, Surrogates, and the UTF-16 portion of Section 3.9, Unicode Encoding Forms.

In well-formed UTF-16, a trailing surrogate can be preceded only by a leading surrogate and not by another trailing surrogate, a non-surrogate, or the start of text. A leading surrogate can be followed only by a trailing surrogate and not by another leading surrogate, a non-surrogate, or the end of text. Maintaining the well-formedness of a UTF-16 code sequence or accessing characters within a UTF-16 code sequence therefore puts additional requirements on some text processes. Surrogate pairs are designed to minimize this impact.

Leading surrogates and trailing surrogates are assigned to disjoint ranges of code units. In UTF-16, non-surrogate code points can never be represented with code unit values in those ranges. Because the ranges are disjoint, each code unit in well-formed UTF-16 must meet one of only three possible conditions:

• A single non-surrogate code unit, representing a code point between 0 and D7FF₁₆ or between E000₁₆ and FFFF₁₆
• A leading surrogate, representing the first part of a surrogate pair
• A trailing surrogate, representing the second part of a surrogate pair

By accessing at most two code units, a process using the UTF-16 encoding form can therefore interpret any Unicode character. Determining character boundaries requires at most scanning one preceding or one following code unit without regard to any other context.
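The three conditions and the two-unit limit can be sketched as follows, treating UTF-16 text as a list of integer code units. The function names are hypothetical, and the sketch raises an error on ill-formed input rather than attempting recovery.

```python
def is_leading(unit):
    return 0xD800 <= unit <= 0xDBFF

def is_trailing(unit):
    return 0xDC00 <= unit <= 0xDFFF

def decode_at(units, i):
    """Return (code_point, next_index) for the character starting at units[i]."""
    unit = units[i]
    if is_leading(unit):
        if i + 1 < len(units) and is_trailing(units[i + 1]):
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return cp, i + 2
        raise ValueError("leading surrogate without trailing surrogate")
    if is_trailing(unit):
        raise ValueError("trailing surrogate without leading surrogate")
    return unit, i + 1

def char_start(units, i):
    """Return the index of the first code unit of the character containing units[i]."""
    if is_trailing(units[i]) and i > 0 and is_leading(units[i - 1]):
        return i - 1
    return i
```

Neither function ever looks more than one code unit away from the index it is given, which is the locality property described above.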
As long as an implementation does not remove either of a pair of surrogate code units or incorrectly insert another character between them, the integrity of the data is maintained. Moreover, even if the data becomes corrupted, the corruption remains localized, unlike with some other multibyte encodings such as Shift-JIS or EUC. Corrupting a single UTF-16 code unit affects only a single character. Because of non-overlap (see Section 2.5, Encoding Forms), this kind of error does not propagate throughout the rest of the text.

UTF-16 enjoys a beneficial frequency distribution in that, for the majority of all text data, surrogate pairs will be very rare; non-surrogate code points, by contrast, will be very common. Not only does this help to limit the performance penalty incurred when handling a variable-width encoding, but it also allows many processes either to take no specific action for surrogates or to handle surrogate pairs with existing mechanisms that are already needed to handle character sequences.

Implementations should fully support surrogate pairs in processing UTF-16 text. Without surrogate support, an implementation would not interpret any supplementary characters or guarantee the integrity of surrogate pairs. This might apply, for example, to an older implementation, conformant to Unicode Version 1.1 or earlier, before UTF-16 was defined. Support for supplementary characters is important because a significant number of them are relevant for modern use, despite their low frequency.

The individual components of implementations may have different levels of support for surrogates, as long as those components are assembled and communicate correctly. Low-level string processing, where a Unicode string is not interpreted but is handled simply as an array of code units, may ignore surrogate pairs. With such strings, for example, a truncation operation with an arbitrary offset might break a surrogate pair. (For further discussion, see Section 2.7, Unicode Strings.) For performance in string operations, such behavior is reasonable at a low level, but it requires higher-level processes to ensure that offsets are on character boundaries so as to guarantee the integrity of surrogate pairs.

Strategies for Surrogate Pair Support. Many implementations that handle advanced features of the Unicode Standard can easily be modified to support surrogate pairs in UTF-16. For example:

• Text collation can be handled by treating those surrogate pairs as “grouped characters,” such as is done for “ij” in Dutch or “ch” in Slovak.
• Text entry can be handled by having a keyboard generate two Unicode code points with a single keypress, much as an ENTER key can generate CRLF or an Arabic keyboard can have a “lam-alef” key that generates a sequence of two characters, lam and alef.
• Truncation can be handled with the same mechanism as used to keep combining marks with base characters. For more information, see Unicode Standard Annex #29, “Text Boundaries.”

Users are prevented from damaging the text if a text editor keeps insertion points (also known as carets) on character boundaries.

Implementations using UTF-8 and Unicode 8-bit strings necessitate similar considerations. The main difference from handling UTF-16 is that in UTF-8 the only characters represented with single code units (single bytes) are the ASCII characters, U+0000..U+007F. Characters represented with multibyte sequences are very common in UTF-8, unlike surrogate pairs in UTF-16, which are rather uncommon. This difference in frequency may result in different strategies for handling the multibyte sequences in UTF-8.
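As an illustration of boundary-aware truncation, the sketch below moves a requested offset back to a code point boundary in UTF-16 (a list of code units) and in UTF-8 (a bytes object). The function names are hypothetical, and well-formed input is assumed. The sketch protects only code point boundaries; keeping combining marks with their base characters would additionally require the rules of Unicode Standard Annex #29.

```python
def safe_truncate_utf16(units, limit):
    """Keep at most `limit` code units without splitting a surrogate pair."""
    if limit >= len(units):
        return list(units)
    # Back up if the cut would fall between a leading and a trailing surrogate.
    if (limit > 0
            and 0xD800 <= units[limit - 1] <= 0xDBFF
            and 0xDC00 <= units[limit] <= 0xDFFF):
        limit -= 1
    return list(units[:limit])

def safe_truncate_utf8(data, limit):
    """Keep at most `limit` bytes without splitting a multibyte sequence."""
    if limit >= len(data):
        return data
    # Continuation bytes have the form 10xxxxxx; back up to a sequence start.
    while limit > 0 and (data[limit] & 0xC0) == 0x80:
        limit -= 1
    return data[:limit]
```

The UTF-16 case needs at most one step back, whereas the UTF-8 case may need up to three, which reflects the difference between the two encoding forms noted above.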

5.5 Handling Numbers

There are many sets of characters that represent decimal digits in different scripts. Systems that interpret those characters numerically should provide the correct numerical values. For example, a sequence consisting of the digit two followed by the digit zero of a given script, when numerically interpreted, has the value twenty.

When converting binary numerical values to a visual form, digits can be chosen from different scripts. For example, the value twenty can be represented with the digits two and zero of any of several scripts, such as the ASCII digits or the Arabic-Indic digits. It is recommended that systems allow users to choose the format of the resulting digits by replacing the appropriate occurrence of U+0030 digit zero with U+0660 arabic-indic digit zero, and so on. (See Chapter 4, Character Properties, for the information needed to implement formatting and scanning numerical values.)

Fullwidth variants of the ASCII digits are simply compatibility variants of regular digits and should be treated as regular Western digits.

The Roman numerals, Greek acrophonic numerals, and East Asian ideographic numerals are decimal numeral writing systems, but they are not formally decimal radix digit systems. That is, it is not possible to do a one-to-one transcoding to forms such as 123456.789. Such systems are appropriate only for positive integer writing.

Sumero-Akkadian numerals were used for sexagesimal systems. There was no symbol for zero, but by Babylonian times, a place value system was in use. Thus the exact value of a digit depended on its position in a number. There was also ambiguity in numerical representation, because a symbol such as U+12079 cuneiform sign dish could represent either 1 or 1 × 60 or 1 × (60 × 60), depending on the context. A numerical expression might also be interpreted as a sexagesimal fraction. So the sequence <1, 10, 5> might be evaluated as 1 × 60 + 10 + 5 = 75 or 1 × 60 × 60 + 10 + 5 = 3615 or 1 + (10 + 5)/60 = 1.25. Many other complications arise in Cuneiform numeral systems, and they clearly require special processing distinct from that used for modern decimal radix systems.

It is also possible to write numbers in two ways with CJK ideographic digits. For example, Figure 5-2 shows how the number 1,234 can be written.

Figure 5-2. CJK Ideographic Numbers

一千二百三十四 or 一二三四

Supporting these ideographic digits for numerical parsing means that implementations must be smart about distinguishing between these two cases. Digits often occur in situations where they need to be parsed, but are not part of numbers. One such example is alphanumeric identifiers (see Unicode Standard Annex #31, “Identifier and Pattern Syntax”). Only in higher-level protocols, such as when implementing a full mathematical formula parser, do considerations such as superscripting and subscripting of digits become crucial for numerical interpretation.
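The scanning and formatting of decimal digits from different scripts can be sketched with the decimal digit values exposed by Python's unicodedata module. The function names parse_decimal and format_decimal are hypothetical. The sketch handles non-negative integers only and assumes that the chosen zero character begins a contiguous run of ten decimal digits, as U+0030 and U+0660 do.

```python
import unicodedata

def parse_decimal(text):
    """Interpret a string of decimal digits from any script as an integer."""
    value = 0
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            raise ValueError("not a decimal digit: U+%04X" % ord(ch))
        value = value * 10 + digit
    return value

def format_decimal(value, zero="0"):
    """Format a non-negative integer with the digit set that starts at `zero`."""
    base = ord(zero)
    return "".join(chr(base + int(d)) for d in str(value))
```

Fullwidth digits are accepted because they carry decimal digit values, whereas a Roman numeral such as U+216B is rejected because it has a numeric value but is not a decimal radix digit.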

5.6 Normalization

Alternative Spellings. The Unicode Standard contains explicit codes for the most frequently used accented characters. These characters can also be composed; in the case of accented letters, characters can be composed from a base character and nonspacing mark(s). The Unicode Standard provides decompositions for characters that can be composed using a base character plus one or more nonspacing marks. Implementations that are “liberal” in what they accept but “conservative” in what they issue will have the fewest compatibility problems.

The decomposition mappings are specific to a particular version of the Unicode Standard. Further decomposition mappings may be added to the standard for new characters encoded in the future; however, no existing decomposition mapping for a currently encoded character will ever be removed, nor will a decomposition mapping be added for a currently encoded character. This follows from the stability guarantees for normalization. See Appendix F, Unicode Encoding Stability Policies, for more information.

Normalization. Systems may normalize Unicode-encoded text to one particular sequence, such as normalizing composite character sequences into precomposed characters, or vice versa (see Figure 5-3).

Figure 5-3. Normalization

Unnormalized:  a ◌¨ ◌· ë ◌˜ ò
Precomposed:   ä· ë˜ ò
Decomposed:    a ◌¨ ◌· e ◌¨ ◌˜ o ◌`

Compared to the number of possible combinations, only a relatively small number of precomposed base character plus nonspacing marks have independent Unicode character values. Most existed in dominant standards. Systems that cannot handle nonspacing marks can normalize to precomposed characters; this option can accommodate most modern Latin-based languages. Such systems can use fallback rendering techniques to at least visually indicate combinations that they cannot handle (see the “Fallback Rendering” subsection of Section 5.13, Rendering Nonspacing Marks).

In systems that can handle nonspacing marks, it may be useful to normalize so as to eliminate precomposed characters. This approach allows such systems to have a homogeneous representation of composed characters and maintain a consistent treatment of such characters. However, in most cases, it does not require too much extra work to support mixed forms, which is the simpler route. The standard forms for normalization are defined in Unicode Standard Annex #15, “Unicode Normalization Forms.” For further information, see Chapter 3, Conformance; “Equivalent Sequences” in Section 2.2, Unicode Design Principles; and Section 2.11, Combining Characters.
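The two directions of normalization can be tried with Python's unicodedata module, which implements the normalization forms of Unicode Standard Annex #15. The helper names below are hypothetical; NFC is used as the precomposed form and NFD as the decomposed form.

```python
import unicodedata

def to_precomposed(text):
    """Compose base characters and nonspacing marks where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)

def to_decomposed(text):
    """Replace precomposed characters with base character plus nonspacing marks."""
    return unicodedata.normalize("NFD", text)

def canonically_equal(a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return to_decomposed(a) == to_decomposed(b)
```

A system that accepts mixed forms can call canonically_equal for comparison and leave the stored text untouched, whereas a system that wants a homogeneous representation would apply one of the two conversions on input.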

5.7 Compression

Using the Unicode character encoding may increase the amount of storage or memory space dedicated to the text portion of files. Compressing Unicode-encoded files or strings can therefore be an attractive option if the text portion is a large part of the volume of data compared to binary and numeric data, and if the processing overhead of the compression and decompression is acceptable.

Compression always constitutes a higher-level protocol and makes interchange dependent on knowledge of the compression method employed. For a detailed discussion of compression and a standard compression scheme for Unicode, see Unicode Technical Standard #6, “A Standard Compression Scheme for Unicode.”

Encoding forms defined in Section 2.5, Encoding Forms, have different storage characteristics. For example, as long as text contains only characters from the Basic Latin (ASCII) block, it occupies the same amount of space whether it is encoded with the UTF-8 or ASCII codes. Conversely, text consisting of CJK ideographs encoded with UTF-8 will require more space than equivalent text encoded with UTF-16.

For processing rather than storage, the Unicode encoding form is usually selected for easy interoperability with existing APIs. Where there is a choice, the trade-off between decoding complexity (high for UTF-8, low for UTF-16, trivial for UTF-32) and memory and cache bandwidth (high for UTF-32, low for UTF-8 or UTF-16) should be considered.
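The storage characteristics of the encoding forms can be checked with a few lines of code. The function name encoded_sizes is hypothetical; sizes are reported in bytes and exclude any byte order mark.

```python
def encoded_sizes(text):
    """Return the size in bytes of `text` in each Unicode encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }
```

For the ASCII string "hello" this gives 5, 10, and 20 bytes, and for two CJK ideographs such as U+6F22 U+5B57 it gives 6, 4, and 8 bytes, which matches the comparison in the text.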

5.8 Newline Guidelines

Newlines are represented on different platforms by carriage return (CR), line feed (LF), CRLF, or next line (NEL). Not only are newlines represented by different characters on different platforms, but they also have ambiguous behavior even on the same platform. These characters are often transcoded directly into the corresponding Unicode code points when a character set is transcoded; this means that even programs handling pure Unicode have to deal with the problems. Especially with the advent of the Web, where text on a single machine can arise from many sources, this causes a significant problem.

Newline characters are used to explicitly indicate line boundaries. For more information, see Unicode Standard Annex #14, “Line Breaking Properties.” Newlines are also handled specially in the context of regular expressions. For information, see Unicode Technical Standard #18, “Unicode Regular Expression Guidelines.” For the use of these characters in markup languages, see Unicode Technical Report #20, “Unicode in XML and Other Markup Languages.”

Definitions

Table 5-1 provides hexadecimal values for the acronyms used in these guidelines.

Table 5-1. Hex Values for Acronyms

Acronym  Name                           Unicode      ASCII    EBCDIC (default)  EBCDIC (z/OS Unix)
CR       carriage return                000D         0D       0D                0D
LF       line feed                      000A         0A       25                15
CRLF     carriage return and line feed  <000D 000A>  <0D 0A>  <0D 25>           <0D 15>
NEL      next line                      0085         85       15                25
VT       vertical tab                   000B         0B       0B                0B
FF       form feed                      000C         0C       0C                0C
LS       line separator                 2028         n/a      n/a               n/a
PS       paragraph separator            2029         n/a      n/a               n/a

The acronyms shown in Table 5-1 correspond to characters or sequences of characters. The name column shows the usual names used to refer to the characters in question, whereas the other columns show the Unicode, ASCII, and EBCDIC encoded values for the characters.

Encoding. Except for LS and PS, the newline characters discussed here are encoded as control codes. Many control codes were originally designed for device control but, together with TAB, the newline characters are commonly used as part of plain text. For more information on how Unicode encodes control codes, see Section 16.1, Control Codes.

Notation. This discussion of newline guidelines uses lowercase when referring to functions having to do with line determination, but uses the acronyms when referring to the actual characters involved. Keys on keyboards are indicated in all caps. For example:

    The line separator may be expressed by LS in Unicode text or CR on some platforms. It may be entered into text with the SHIFT-RETURN key.

EBCDIC. Table 5-1 shows the two mappings of LF and NEL used by EBCDIC systems. The first EBCDIC column shows the default control code mapping of these characters, which is used in most EBCDIC environments. The second column shows the z/OS Unix System Services (Open Edition) mapping of LF and NEL. That mapping arises from the use of the LF character for the newline function in C programs and in Unix environments, while text files on z/OS traditionally use NEL for the newline function.

NEL (next line) is not actually defined in 7-bit ASCII. It is defined in the ISO control function standard, ISO 6429, as a C1 control function. However, the 0x85 mapping shown in the ASCII column in Table 5-1 is the usual way that this C1 control function is mapped in ASCII-based character encodings.

Newline Function. The acronym NLF (newline function) stands for the generic control function for indication of a new line break. It may be represented by different characters, depending on the platform, as shown in Table 5-2.

Table 5-2. NLF Platform Correlations

Platform               NLF Value
MacOS 9.x and earlier  CR
MacOS X                LF
Unix                   LF
Windows                CRLF
EBCDIC-based OS        NEL

Line Separator and Paragraph Separator

A paragraph separator, independent of how it is encoded, is used to indicate a separation between paragraphs. A line separator indicates where a line break alone should occur, typically within a paragraph. For example:

    This is a paragraph with a line separator at this point,
    causing the word “causing” to appear on a different line, but not causing the typical paragraph indentation, sentence breaking, line spacing, or change in flush (right, center, or left paragraphs).

For comparison, line separators basically correspond to HTML <BR>, and paragraph separators to older usage of HTML <P> (modern HTML delimits paragraphs by enclosing them in <P>...</P>). In word processors, paragraph separators are usually entered using a keyboard RETURN or ENTER; line separators are usually entered using a modified RETURN or ENTER, such as SHIFT-ENTER.

A record separator is used to separate records. For example, when exchanging tabular data, a common format is to tab-separate the cells and to use a CRLF at the end of a line of cells. This function is not precisely the same as line separation, but the same characters are often used.

Traditionally, NLF started out as a line separator (and sometimes record separator). It is still used as a line separator in simple text editors such as program editors. As platforms and programs started to handle word processing with automatic line-wrap, these characters were reinterpreted to stand for paragraph separators. For example, even such simple programs as the Windows Notepad program and the Mac SimpleText program interpret their platform’s NLF as a paragraph separator, not a line separator.

Once NLF was reinterpreted to stand for a paragraph separator, in some cases another control character was pressed into service as a line separator. For example, vertical tabulation VT is used in Microsoft Word. However, the choice of character for line separator is even less standardized than the choice of character for NLF. Many Internet protocols and a lot of existing text treat NLF as a line separator, so an implementer cannot simply treat NLF as a paragraph separator in all circumstances.

Recommendations

The Unicode Standard defines two unambiguous separator characters: U+2029 paragraph separator (PS) and U+2028 line separator (LS). In Unicode text, the PS and LS characters should be used wherever the desired function is unambiguous. Otherwise, the following recommendations specify how to cope with an NLF when converting from other character sets to Unicode, when interpreting characters in text, and when converting from Unicode to other character sets.

Note that even if an implementer knows which characters represent NLF on a particular platform, CR, LF, CRLF, and NEL should be treated the same on input and in interpretation. Only on output is it necessary to distinguish between them.

Converting from Other Character Code Sets

R1 If the exact usage of any NLF is known, convert it to LS or PS.

R1a If the exact usage of any NLF is unknown, remap it to the platform NLF.

Recommendation R1a does not really help in interpreting Unicode text unless the implementer is the only source of that text, because another implementer may have left in LF, CR, CRLF, or NEL.

Interpreting Characters in Text

R2 Always interpret PS as paragraph separator and LS as line separator.

R2a In word processing, interpret any NLF the same as PS.

R2b In simple text editors, interpret any NLF the same as LS.

In line breaking, both PS and LS terminate a line; therefore, the Unicode Line Breaking Algorithm in Unicode Standard Annex #14, “Line Breaking Properties,” is defined such that any NLF causes a line break.

R2c In parsing, choose the safest interpretation.

For example, in recommendation R2c an implementer dealing with sentence break heuristics would reason in the following way that it is safer to interpret any NLF as LS:

• Suppose an NLF were interpreted as LS, when it was meant to be PS. Because most paragraphs are terminated with punctuation anyway, this would cause misidentification of sentence boundaries in only a few cases.
• Suppose an NLF were interpreted as PS, when it was meant to be LS. In this case, line breaks would cause sentence breaks, which would result in significant problems with the sentence break heuristics.

Converting to Other Character Code Sets

R3 If the intended target is known, map NLF, LS, and PS depending on the target conventions.

For example, when mapping to Microsoft Word’s internal conventions for documents, LS would be mapped to VT, and PS and any NLF would be mapped to CRLF.

R3a If the intended target is unknown, map NLF, LS, and PS to the platform newline convention (CR, LF, CRLF, or NEL).

In Java, for example, this is done by mapping to a string nlf, defined as follows:

    String nlf = System.getProperty("line.separator");
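Recommendations R3 and R3a can be sketched as a single conversion routine. The function name convert_separators and its parameters are hypothetical. When no target conventions are supplied, the sketch falls back to the platform newline reported by os.linesep, which plays the same role as the Java property above.

```python
import os
import re

# CRLF is listed first so that it is matched as one separator, not as CR then LF.
_SEPARATORS = re.compile("\r\n|[\r\n\x85\u2028\u2029]")

def convert_separators(text, nlf=os.linesep, ls=None, ps=None):
    """Map every NLF, LS, and PS in `text` to the target conventions."""
    ls = nlf if ls is None else ls
    ps = nlf if ps is None else ps

    def replace(match):
        separator = match.group(0)
        if separator == "\u2028":
            return ls
        if separator == "\u2029":
            return ps
        return nlf  # CR, LF, CRLF, and NEL are all treated as NLF

    return _SEPARATORS.sub(replace, text)
```

The Microsoft Word conventions mentioned under R3 would correspond to the call convert_separators(text, nlf="\r\n", ls="\x0b", ps="\r\n").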

Input and Output

R4 A readline function should stop at NLF, LS, FF, or PS. In the typical implementation, it does not include the NLF, LS, PS, or FF that caused it to stop.

Because the separator is lost, the use of such a readline function is limited to text processing, where there is no difference among the types of separators.

R4a A writeline (or newline) function should convert NLF, LS, and PS according to the recommendations R3 and R3a.

In C, gets is defined to terminate at a newline and replaces the newline with '\0', while fgets is defined to terminate at a newline and includes the newline in the array into which it copies the data. C implementations interpret '\n' either as LF or as the underlying platform newline NLF, depending on where it occurs. EBCDIC C compilers substitute the relevant codes, based on the EBCDIC execution set.

Page Separator

FF is commonly used as a page separator, and it should be interpreted that way in text. When displaying on the screen, it causes the text after the separator to be forced to the next page. It is interpreted in the same way as the LS for line breaking, in parsing, or in input segmentation such as readline. FF does not interrupt a paragraph, as paragraphs can and do span page boundaries.
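A readline function in the sense of recommendation R4 might look like the following sketch, which operates on text already held in memory. The name read_lines is hypothetical. As in the typical implementation described above, the separator that ends each line is not returned.

```python
import re

# Stop at any NLF (CR, LF, CRLF, NEL), at LS, at FF, or at PS.
_STOP = re.compile("\r\n|[\r\n\x85\x0c\u2028\u2029]")

def read_lines(text):
    """Yield the lines of `text`, without the separators that ended them."""
    position = 0
    for match in _STOP.finditer(text):
        yield text[position:match.start()]
        position = match.end()
    if position < len(text):
        yield text[position:]  # final line without a terminating separator
```

Because the separator is discarded, a caller cannot tell a line separator from a paragraph or page separator, which is the limitation noted under R4.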

5.9 Regular Expressions

Byte-oriented regular expression engines require extensions to handle Unicode successfully. The following issues are involved in such extensions:

• Unicode is a large character set; regular expression engines that are adapted to handle only small character sets may not scale well.
• Unicode encompasses a wide variety of languages that can have very different characteristics than English or other Western European text.

For detailed information on the requirements of Unicode regular expressions, see Unicode Technical Standard #18, “Unicode Regular Expression Guidelines.”
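The difference between byte-oriented and code point-oriented matching can be observed with Python's re module, which supports both byte patterns and string patterns. The helper names below are hypothetical, and the example is an illustration of the problem rather than of the requirements in Unicode Technical Standard #18.

```python
import re

def count_dot_matches_in_bytes(text):
    """Count matches of '.' when the pattern is applied to UTF-8 bytes."""
    return len(re.findall(b".", text.encode("utf-8"), re.DOTALL))

def count_dot_matches_in_code_points(text):
    """Count matches of '.' when the pattern is applied to code points."""
    return len(re.findall(".", text, re.DOTALL))

def is_word(text, byte_oriented=False):
    """Test whether `text` consists entirely of word characters."""
    if byte_oriented:
        return re.fullmatch(rb"\w+", text.encode("utf-8")) is not None
    return re.fullmatch(r"\w+", text) is not None
```

Applied to UTF-8 bytes, the pattern "." matches each byte of a multibyte sequence separately, and "\w" recognizes only ASCII word characters, so a word such as Örebro fails the byte-oriented test but passes the code point-oriented one.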

5.10 Language Information in Plain Text Requirements for Language Tagging The requirement for language information embedded in plain text data is often overstated. Many commonplace operations such as collation seldom require this extra information. In collation, for example, foreign language text is generally collated as if it were not in a foreign language. (See Unicode Technical Standard #10, “Unicode Collation Algorithm,” for more information.) For example, an index in an English book would not sort the Slovak word “chlieb” after “czar,” where it would be collated in Slovak, nor would an English atlas put the Swedish city of Örebro after Zanzibar, where it would appear in Swedish. Text to speech is also an area where the case for embedded language information is overstated. Although language information may be useful in performing text-to-speech operations, modern software for doing acceptable text-to-speech must be so sophisticated in performing grammatical analysis of text that the extra work in determining the language is not significant in practice. Language information can be useful in certain operations, such as spell-checking or hyphenating a mixed-language document. It is also useful in choosing the default font for a run of unstyled text; for example, the ellipsis character may have a very different appearance in Japanese fonts than in European fonts. Modern font and layout technologies produce different results based on language information. For example, the angle of the acute accent may be different for French and Polish.

Language Tags and Han Unification A common misunderstanding about Unicode Han unification is the belief that Han characters cannot be rendered properly without language information. This idea might lead an implementer to conclude that language information must always be added to plain text using the tags. However, this implication is incorrect. The goal and methods of Han unification were to ensure that the text remained legible. Although font, size, width, and other format specifications need to be added to produce precisely the same appearance on the source and target machines, plain text remains legible in the absence of these specifications. There should never be any confusion in Unicode, because the distinctions between the unified characters are all within the range of stylistic variations that exist in each country. No unification in Unicode should make it impossible for a reader to identify a character if it appears in a different font. Where precise font information is important, it is best conveyed in a rich text format.
Typical Scenarios. The following e-mail scenarios illustrate that the need for language information with Han characters is often overstated:
• Scenario 1. A Japanese user sends out untagged Japanese text. Readers are Japanese (with Japanese fonts). Readers see no differences from what they expect.
• Scenario 2. A Japanese user sends out an untagged mixture of Japanese and Chinese text. Readers are Japanese (with Japanese fonts) and Chinese (with Chinese fonts). Readers see the mixed text with only one font, but the text is still legible. Readers recognize the difference between the languages by the content.
• Scenario 3. A Japanese user sends out a mixture of Japanese and Chinese text. Text is marked with font, size, width, and so on, because the exact format is important. Readers have the fonts and other display support. Readers see the mixed text with different fonts for different languages. They recognize the difference between the languages by the content, and see the text with glyphs that are more typical for the particular language.
It is common even in printed matter to render passages of foreign language text in native-language fonts, just for familiarity. For example, Chinese text in a Japanese document is commonly rendered in a Japanese font.

5.11 Editing and Selection Consistent Text Elements As far as a user is concerned, the underlying representation of text is not a material concern, but it is important that an editing interface present a uniform implementation of what the user thinks of as characters. (See “‘Characters’ and Grapheme Clusters” in Section 2.11, Combining Characters.) The user expects them to behave as units in terms of mouse selection, arrow key movement, backspacing, and so on. For example, when such behavior is implemented, and an accented letter is represented by a sequence of base character plus a nonspacing combining mark, using the right arrow key would logically skip from the start of the base character to the end of the last nonspacing character.
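The right-arrow behavior described above can be sketched with the character break iterator in the Java class library. This is an illustrative sketch only: the class name CaretSketch and the method nextCaret are hypothetical, and exactly which sequences the iterator treats as one unit depends on the Java runtime in use.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical sketch of arrow-key movement over user-perceived characters.
final class CaretSketch {
    // Returns the caret offset (in UTF-16 code units) that follows the
    // user-perceived character starting at 'offset'.
    static int nextCaret(String text, int offset) {
        if (offset >= text.length()) {
            return text.length();              // already at the end of the text
        }
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        int next = it.following(offset);       // first boundary after 'offset'
        return next == BreakIterator.DONE ? text.length() : next;
    }
}
```

For the string "e" + U+0301 + "x", a caret at offset 0 moves to offset 2, skipping the base letter and its nonspacing mark as a unit, and then to offset 3.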


In some cases, editing a user-perceived “character” or visual cluster element by element may be the preferred way. For example, a system might have the backspace key delete by using the underlying code point, while the delete key could delete an entire cluster. Moreover, because of the way keyboards and input method editors are implemented, there often may not be a one-to-one relationship between what the user thinks of as a character and the key or key sequence used to input it. Three types of boundaries are generally useful in editing and selecting within words: cluster boundaries, stacked boundaries and atomic character boundaries. Cluster Boundaries. Arbitrarily defined cluster boundaries may occur in scripts such as Devanagari, for which selection may be defined as applying to syllables or parts of syllables. In such cases, combining character sequences such as ka + vowel sign a or conjunct clusters such as ka + halant + ta are selected as a single unit. (See Figure 5-4.)

Figure 5-4. Consistent Character Boundaries

[Figure: sample text, including the word “Rôle”, shown at three selection granularities: Cluster, Stack, and Atomic.]

Stacked Boundaries. Stacked boundaries are generally somewhat finer than cluster boundaries. Free-standing elements (such as vowel sign a in Devanagari) can be independently selected, but any elements that “stack” (including vertical ligatures such as Arabic lam + meem in Figure 5-4) can be selected only as a single unit. Stacked boundaries treat default grapheme clusters as single entities, much like composite characters. (See Unicode Standard Annex #29, “Text Boundaries,” for the definition of default grapheme clusters and for a discussion of how grapheme clusters can be tailored to meet the needs of defining arbitrary cluster boundaries.) Atomic Character Boundaries. The use of atomic character boundaries is closest to selection of individual Unicode characters. However, most modern systems indicate selection with some sort of rectangular highlighting. This approach places restrictions on the consistency of editing because some sequences of characters do not linearly progress from the start of the line. When characters stack, two mechanisms are used to visually indicate partial selection: linear and nonlinear boundaries. Linear Boundaries. Use of linear boundaries treats the entire width of the resultant glyph as belonging to the first character of the sequence, and the remaining characters in the backing-store representation as having no width and being visually afterward.


This option is the simplest mechanism. The advantage of this system is that it requires very little additional implementation work. The disadvantage is that it is never easy to select narrow characters, let alone a zero-width character. Mechanically, it requires the user to select just to the right of the nonspacing mark and drag just to the left. It also does not allow the selection of individual nonspacing marks if more than one is present. Nonlinear Boundaries. Use of nonlinear boundaries divides any stacked element into parts. For example, picking a point halfway across a lam + meem ligature can represent the division between the characters. One can either allow highlighting with multiple rectangles or use another method such as coloring the individual characters. With more work, a precomposed character can behave in deletion as if it were a composed character sequence with atomic character boundaries. This procedure involves deriving the character’s decomposition on the fly to get the components to be used in simulation. For example, deletion occurs by decomposing, removing the last character, then recomposing (if more than one character remains). However, this technique does not work in general editing and selection. In most editing systems, the code point is the smallest addressable item, so the selection and assignment of properties (such as font, color, letterspacing, and so on) cannot be done on any finer basis than the code point. Thus the accent on an “e” could not be colored differently than the base in a precomposed character, although it could be colored differently if the text were stored internally in a decomposed form. Just as there is no single notion of text element, so there is no single notion of editing character boundaries. At different times, users may want different degrees of granularity in the editing process. Two methods suggest themselves. First, the user may set a global preference for the character boundaries. 
Second, the user may have alternative command mechanisms, such as Shift-Delete, which give more (or less) fine control than the default mode.
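The deletion technique described above (decompose, remove the last character, then recompose) might be sketched in Java as follows. The class and method names are hypothetical, and the sketch assumes that the caller passes only the user-perceived character before the caret, because normalizing a whole document would also change unrelated text.

```java
import java.text.Normalizer;

// Hypothetical sketch of backspace over a precomposed character.
final class BackspaceSketch {
    // Removes the last code point of the canonical decomposition of
    // 'element' and recomposes whatever remains.
    static String deleteLastCodePoint(String element) {
        String decomposed = Normalizer.normalize(element, Normalizer.Form.NFD);
        if (decomposed.isEmpty()) {
            return decomposed;
        }
        // Step back one code point (not one char) so that a surrogate pair
        // is removed as a whole.
        int cut = decomposed.offsetByCodePoints(decomposed.length(), -1);
        return Normalizer.normalize(decomposed.substring(0, cut), Normalizer.Form.NFC);
    }
}
```

Applied to U+1EC7 (e with circumflex and dot below), one call yields U+1EB9 (e with dot below), and a second call yields a plain "e".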

5.12 Strategies for Handling Nonspacing Marks By following these guidelines, a programmer should be able to implement systems and routines that provide for the effective and efficient use of nonspacing marks in a wide variety of applications and systems. The programmer also has the choice between minimal techniques that apply to the vast majority of existing systems and more sophisticated techniques that apply to more demanding situations, such as higher-end desktop publishing. In this section and the following section, the terms nonspacing mark and combining character are used interchangeably. The terms diacritic, accent, stress mark, Hebrew point, Arabic vowel, and others are sometimes used instead of nonspacing mark. (They refer to particular types of nonspacing marks.) Properly speaking, a nonspacing mark is any combining character that does not add space along the writing direction. For a formal definition of nonspacing mark, see Section 3.6, Combination.


A relatively small number of implementation features are needed to support nonspacing marks. Different levels of implementation are also possible. A minimal system yields good results and is relatively simple to implement. Most of the features required by such a system are simply modifications of existing software. As nonspacing marks are required for a number of writing systems, such as Arabic, Hebrew, and those of South Asia, many vendors already have systems capable of dealing with these characters and can use their experience to produce general-purpose software for handling these characters in the Unicode Standard. Rendering. Composite character sequences can be rendered effectively by means of a fairly simple mechanism. In simple character rendering, a nonspacing combining mark has a zero advance width, and a composite character sequence will have the same width as the base character. Wherever a sequence of base character plus one or more nonspacing marks occurs, the glyphs for the nonspacing marks can be positioned relative to the base. The ligature mechanisms in the fonts can also substitute a glyph representing the combined form. In some cases the width of the base should change because of an applied accent, such as with “î”. The ligature or contextual form mechanisms in the font can be used to change the width of the base in cases where this is required. Other Processes. Correct multilingual comparison routines must already be able to compare a sequence of characters as one character, or one character as if it were a sequence. Such routines can also handle combining character sequences when supplied with the appropriate data. When searching strings, remember to check for additional nonspacing marks in the target string that may affect the interpretation of the last matching character. Line breaking algorithms generally use state machines for determining word breaks. 
Such algorithms can be easily adapted to prevent separation of nonspacing marks from base characters. (See also the discussion in Section 5.6, Normalization. For details in particular contexts, see Unicode Technical Standard #10, “Unicode Collation Algorithm”; Unicode Standard Annex #14, “Line Breaking Properties”; and Unicode Standard Annex #29, “Text Boundaries.”)
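The advice about searching can be illustrated with a short Java sketch that rejects a match when the character after it is a combining mark that would change the last matched character. The names are hypothetical, and the sketch assumes a plain code-unit comparison with no normalization or collation, which a production search would normally add.

```java
// Hypothetical sketch of a search that respects trailing combining marks.
final class MarkAwareSearch {
    // True for nonspacing and enclosing marks (general categories Mn and Me).
    static boolean isNonspacingMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK || type == Character.ENCLOSING_MARK;
    }

    // Returns the index of the first occurrence of 'pattern' that is not
    // immediately followed by a nonspacing mark, or -1 if there is none.
    static int indexOfWholeMatch(String text, String pattern) {
        int from = 0;
        while (true) {
            int start = text.indexOf(pattern, from);
            if (start < 0) {
                return -1;
            }
            int end = start + pattern.length();
            if (end >= text.length() || !isNonspacingMark(text.codePointAt(end))) {
                return start;
            }
            from = start + 1;   // the mark modifies the last matched character; keep looking
        }
    }
}
```

Searching for "cafe" in "cafe" + U+0301 + " cafe" therefore skips the first candidate, whose final letter carries an acute accent, and reports the second one.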

Keyboard Input A common implementation for the input of combining character sequences is the use of dead keys. These keys match the mechanics used by typewriters to generate such sequences through overtyping the base character after the nonspacing mark. In computer implementations, keyboards enter a special state when a dead key is pressed for the accent and emit a precomposed character only when one of a limited number of “legal” base characters is entered. It is straightforward to adapt such a system to emit combining character sequences or precomposed characters as needed. Typists, especially in the Latin script, are trained on systems that work using dead keys. However, many scripts in the Unicode Standard (including the Latin script) may be implemented according to the handwriting sequence, in which users type the base character first, followed by the accents or other nonspacing marks (see Figure 5-5).

Figure 5-5. Dead Keys Versus Handwriting Sequence

Dead Key: Zrich → (¨ key) Zrich → (u key) Zürich
Handwriting: Zrich → (u key) Zurich → (¨ key) Zürich

In the case of handwriting sequence, each keystroke produces a distinct, natural change on the screen; there are no hidden states. To add an accent to any existing character, the user positions the insertion point (caret) after the character and types the accent.
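A dead-key input layer that can emit either combining character sequences or precomposed characters might look like the following Java sketch. The class DeadKeySketch and its methods are hypothetical, and the sketch assumes that the dead key is identified by the combining mark it stands for and that any base character is accepted after it.

```java
import java.text.Normalizer;

// Hypothetical sketch of a dead-key state machine.
final class DeadKeySketch {
    private int pendingMark = -1;                        // hidden state: the accent awaiting a base
    private final StringBuilder buffer = new StringBuilder();

    // A dead key produces no visible change; it only records the mark.
    void deadKey(int combiningMark) {
        pendingMark = combiningMark;
    }

    // An ordinary key appends the base, followed by any pending mark.
    void key(int base) {
        buffer.appendCodePoint(base);
        if (pendingMark >= 0) {
            buffer.appendCodePoint(pendingMark);
            pendingMark = -1;
        }
    }

    // Emits precomposed characters (NFC) or combining sequences (NFD).
    String text(boolean precomposed) {
        return Normalizer.normalize(buffer,
                precomposed ? Normalizer.Form.NFC : Normalizer.Form.NFD);
    }
}
```

In handwriting sequence the pendingMark field disappears: the mark key simply appends the combining character after the base that is already in the buffer, so no hidden state is needed.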

Truncation There are two types of truncation: truncation by character count and truncation by displayed width. Truncation by character count can entail loss (be lossy) or be lossless. Truncation by character count is used where, due to storage restrictions, a limited number of characters can be entered into a field; it is also used where text is broken into buffers for transmission and other purposes. The latter case can be lossless if buffers are recombined seamlessly before processing or if lookahead is performed for possible combining character sequences straddling buffers. When fitting data into a field of limited storage length, some information will be lost. The preferred position for truncating text in that situation is on a grapheme cluster boundary. As Figure 5-6 shows, such truncation can mean truncating at an earlier point than the last character that would have fit within the physical storage limitation. (See Unicode Standard Annex #29, “Text Boundaries.”) Truncation by displayed width is used for visual display in a narrow field. In this case, truncation occurs on the basis of the width of the resulting string rather than on the basis of a character count. In simple systems, it is easiest to truncate by width, starting from the end and working backward by subtracting character widths as one goes. Because a trailing nonspacing mark does not contribute to the measurement of the string, the result will not separate nonspacing marks from their base characters. If the textual environment is more sophisticated, the widths of characters may depend on their context, due to effects such as kerning, ligatures, or contextual formation. For such

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 6

Writing Systems and Punctuation


This chapter begins the portion of the Unicode Standard devoted to the detailed description of each script or other related group of Unicode characters. Each of the subsequent chapters presents a historically or geographically related group of scripts. This chapter presents a general introduction to writing systems, explains how they can be used to classify scripts, and then presents a detailed discussion of punctuation characters that are shared across scripts. Scripts and Blocks. The codespace of the Unicode Standard is divided into subparts called blocks. Character blocks generally contain characters from a single script, and in many cases, a script is fully represented in its character block; however, some scripts are encoded using several blocks, which are not always adjacent. Discussions of scripts and other groups of characters are structured by character blocks. Corresponding subsection headers identify each block and its associated range of Unicode code points. The code charts in Chapter 17, Code Charts, are also organized by character blocks. Scripts and Writing Systems. There are many different kinds of writing systems in the world. Their variety poses some significant issues for character encoding in the Unicode Standard as well as for implementers of the standard. Those who first approach the Unicode Standard without a background in writing systems may find the huge list of scripts bewilderingly complex. Therefore, before considering the script descriptions in detail, this chapter first presents a brief introduction to the types of writing systems. That introduction explains basic terminology about scripts and character types that will be used again and again when discussing particular scripts. Punctuation. The rest of this chapter deals with a special case: punctuation marks, which tend to be scattered about in different blocks and which may be used in common by many scripts. 
Punctuation characters occur in several widely separated places in the character blocks, including Basic Latin, Latin-1 Supplement, General Punctuation, and CJK Symbols and Punctuation. There are also occasional punctuation characters in character blocks for specific scripts. Most punctuation characters are intended for common usage with any script, although some of them are script-specific. Some scripts use both common and script-specific punctuation characters, usually as the result of recent adoption of standard Western punctuation marks. While punctuation characters vary in details of appearance and function between different languages and scripts, their overall purpose is shared: They serve to separate or otherwise organize units of text, such as sentences and phrases, thereby helping to clarify the meaning of the text. Certain punctuation characters also occur in mathematical and scientific formulae.

6.1 Writing Systems This section presents a brief introduction to writing systems. It describes the different kinds of writing systems and relates them to the encoded scripts found in the Unicode Standard. This framework may help to make the variety of scripts, modern and historic, a little less daunting. The terminology used here follows that developed by Peter T. Daniels, a leading expert on writing systems of the world. The term writing system has two mutually exclusive meanings in this standard. As used in this section, “writing system” refers to a way that families of scripts may be classified by how they represent the sounds or words of human language. For example, the writing system of the Latin script is alphabetic. In other places in the standard, “writing system” refers to the way a particular language is written. For example, the modern Japanese writing system uses four scripts: Han ideographs, Hiragana, Katakana and Latin (Romaji). Alphabets. A writing system that consists of letters for the writing of both consonants and vowels is called an alphabet. The term “alphabet” is derived from the first two letters of the Greek script: alpha, beta. Consonants and vowels have equal status as letters in such a system. The Latin alphabet is the most widespread and well-known example of an alphabet, having been adapted for use in writing thousands of languages. The correspondence between letters and sounds may be either more or less exact. Many alphabets do not exhibit a one-to-one correspondence between distinct sounds and letters or groups of letters used to represent them; often this is an indication of original spellings that were not changed as the language changed. Not only are many sounds represented by letter combinations, such as “th” in English, but the language may have evolved since the writing conventions were settled. 
Examples range from cases such as Italian or Finnish, where the match between letter and sound is rather close, to English, which has notoriously complex and arbitrary spelling. Phonetic alphabets, in contrast, are used specifically for the precise transcription of the sounds of languages. The best known of these alphabets is the International Phonetic Alphabet, an adaptation and extension of the Latin alphabet by the addition of new letters and marks for specific sounds and modifications of sounds. Unlike normal alphabets, phonetic alphabets are intended to have letters that exactly represent sounds. Phonetic alphabets are not used as general-purpose writing systems per se, but it is not uncommon for a formerly unwritten language to have an alphabet developed for it based on a phonetic alphabet.
Abjads. A writing system in which only consonants are indicated is an abjad. The main letters are all consonants (or long vowels), with other vowels either left out entirely or optionally indicated with the use of secondary marks on the consonants. The Phoenician script is a prototypical abjad; a better-known example is the Arabic writing system. The term “abjad” is derived from the first four letters of the traditional order of the Arabic script: alef, beh, jeem, dal. Abjads are often, although not exclusively, associated with Semitic languages, which have word structures particularly well suited to the use of consonantal writing. Some abjads allow consonant letters to mark long vowels, as the use of waw and yeh in Arabic for /u:/ or /i:/. Hebrew and Arabic are typically written without any vowel marking at all. The vowels, when they do occur in writing, are referred to as points or harakat, and are indicated by the use of diacritic dots and other marks placed above and below the consonantal letters.
Syllabaries. In a syllabary, each symbol of the system typically represents both a consonant and a vowel, or in some instances more than one consonant and a vowel. One of the best-known examples of a syllabary is Hiragana, used for Japanese, in which the units of the system represent the syllables ka, ki, ku, ke, ko, sa, si, su, se, so, and so on. In general parlance, the elements of a syllabary are not called letters, but rather syllables. This can lead to some confusion, however, because letters of alphabets and units of other writing systems are also used, singly or in combinations, to write syllables of languages. So in a broad sense, the term “letter” can be used to refer to the syllables of a syllabary. In syllabaries such as Cherokee, Hiragana, Katakana, and Yi, each symbol has a unique shape, with no particular shape relation to any of the consonant(s) or vowels of the syllables. In other cases, however, the syllabic symbols of a syllabary are not atomic; they can be built up out of parts that have a consistent relationship to the phonological parts of the syllable. The best example of this is the Hangul writing system for Korean. 
Each Hangul syllable is made up of a part for the initial consonant (or consonant cluster), a part for the vowel (or diphthong), and an optional part for the final consonant (or consonant cluster). The relationship between the sounds and the graphic parts to represent them is systematic enough for Korean that the graphic parts collectively are known as jamos and constitute a kind of alphabet on their own. The jamos of the Hangul writing system have another characteristic: their shapes are not completely arbitrary, but were devised with intentionally iconic shapes relating them to articulatory features of the sounds they represent in Korean. The Hangul writing system has thus also been classified as a featural syllabary. Abugidas. Abugidas represent a kind of blend of syllabic and alphabetic characteristics in a writing system. The Ethiopic script is an abugida. The term “abugida” is derived from the first four letters of the letters of the Ethiopic script in the Semitic order: alf, bet, gaml, dant. The order of vowels (-ä -u -i -a) is that of the traditional vowel order in the first four columns of the Ethiopic syllable chart. Historically, abugidas spread across South Asia and were adapted by many languages, often of phonologically very different types. This process has also resulted in many extensions, innovations, and/or simplifications of the original patterns. The best-known example of an abugida is the Devanagari script, used in modern times to write Hindi and many other Indian languages, and used classically to


write Sanskrit. See Section 9.1, Devanagari, for a detailed description of how Devanagari works and is rendered. In an abugida, each consonant letter carries an inherent vowel, usually /a/. There are also vowel letters, often distinguished between a set of independent vowel letters, which occur on their own, and dependent vowel letters, or matras, which are subordinate to consonant letters. When a dependent vowel letter follows a consonant letter, the vowel overrides the inherent vowel of the consonant. This is shown schematically in Figure 6-1.

Figure 6-1. Overriding Inherent Vowels

ka + i → ki
ka + e → ke
ka + u → ku
ka + o → ko
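In terms of encoded characters, the override shown in Figure 6-1 amounts to placing the dependent vowel sign after the consonant letter in the backing store. The Java sketch below illustrates this for Devanagari. The class, constant, and method names are hypothetical; the code points are those of DEVANAGARI LETTER KA, DEVANAGARI VOWEL SIGN I, and DEVANAGARI SIGN VIRAMA.

```java
// Hypothetical sketch of how an abugida syllable is stored.
final class AbugidaSketch {
    static final int KA = 0x0915;            // consonant letter with inherent /a/
    static final int VOWEL_SIGN_I = 0x093F;  // dependent vowel sign (matra)
    static final int VIRAMA = 0x094D;        // removes the inherent vowel

    // Builds consonant + dependent sign in storage order.
    static String syllable(int consonant, int dependentSign) {
        return new StringBuilder()
                .appendCodePoint(consonant)
                .appendCodePoint(dependentSign)
                .toString();
    }

    // Dependent signs are combining marks (general category Mn or Mc),
    // whereas consonant letters are ordinary letters.
    static boolean isDependentSign(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }
}
```

The stored order is consonant first, even though a font displays the vowel sign i to the left of ka; placing the glyph is a rendering matter and does not affect the encoded sequence.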

Abugidas also typically contain a special element usually referred to as a halant, virama, or killer, which, when applied to a consonant letter with its inherent vowel, has the effect of removing the inherent vowel, resulting in a bare consonant sound. Because of legacy practice, three distinct approaches have been taken in the Unicode Standard for the encoding of abugidas: the Devanagari model, the Tibetan model, and the Thai model. The Devanagari model, used for most abugidas, encodes an explicit virama character and represents text in its logical order. The Thai model departs from the Devanagari model in that it represents text in its visual display order, based on the typewriter legacy, rather than in logical order. The Tibetan model avoids an explicit virama, instead encoding a sequence of subjoined consonants to represent consonants occurring in clusters in a syllable. The Ethiopic script is traditionally analyzed as an abugida, because the base character for each consonantal series is understood as having an inherent vowel. However, Ethiopic lacks some of the typical features of Brahmi-derived scripts, such as halants and matras. Historically, it was derived from early Semitic scripts and in its earliest form was an abjad. In its traditional presentation and its encoding in the Unicode Standard, it is now treated more like a syllabary. Logosyllabaries. The final major category of writing system is known as the logosyllabary. In a logosyllabary, the units of the writing system are used primarily to write words and/or morphemes of words, with some subsidiary usage to represent syllabic sounds per se. The best example of a logosyllabary is the Han script, used for writing Chinese and borrowed by a number of other East Asian languages for use as part of their writing systems. The term for a unit of the Han script is hànzì 漢字 in Chinese, kanji 漢字 in Japanese, and hanja 漢字 in Korean. 
In many instances this unit also constitutes a word, but more typically, two or more units together are used to write a word. This unit has variously been referred to as an ideograph (“idea writing”), a logograph (“word writing”), or a sinogram, as well as other terms. No single English term is completely satisfactory or uncontroversial. In this standard, CJK ideograph is used because it is a widely understood term.

There are a number of other historical examples of logosyllabaries, such as Tangut, many of which may eventually be encoded in the Unicode Standard. They vary in the degree to which they combine logographic writing principles, where the symbols stand for morphemes or entire words, and syllabic writing principles, where the symbols come to represent syllables per se, divorced from their meaning as morphemes or words. In some notable instances, as for Sumero-Akkadian cuneiform, a logosyllabary may evolve through time into a syllabary or alphabet by shedding its use of logographs. In other instances, as for the Han script, the use of logographic characters is very well entrenched and persistent. However, even for the Han script a small number of characters are used purely to represent syllabic sounds, so as to be able to represent such things as foreign personal names and place names.

The classification of a writing system is often somewhat blurred by complications in the exact ways in which it matches up written elements to the phonemes or syllables of a language. For example, although Hiragana is classified as a syllabary, it does not always have an exact match between syllables and written elements. Syllables with long vowels are not written with a single element, but rather with a sequence of elements. Thus the syllable with a long vowel kū is written with two separate Hiragana symbols, {ku}+{u}. Because of these kinds of complications, one must always be careful not to assume too much about the structure of a writing system from its nominal classification.

Typology of Scripts in the Unicode Standard. Table 6-1 lists all of the scripts currently encoded in the Unicode Standard, showing the writing system type for each. The list is an approximate guide, rather than a definitive classification, because of the mix of features seen in many scripts. The writing systems for some languages may be quite complex, mixing more than one type of script together in a composite system. Japanese is the best example; it mixes a logosyllabary (Han), two syllabaries (Hiragana and Katakana), and one alphabet (Latin, for romaji).
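The mix of scripts in Japanese text can be made visible with a rough tally. The sketch below is a heuristic, not a definitive classifier: Python's standard library does not expose the Script property, so the hypothetical `rough_script` helper guesses from character-name prefixes, and the sample sentence is made up for illustration.

```python
import unicodedata
from collections import Counter

# Heuristic mapping from character-name prefixes to script labels.
# This is an assumption for illustration; it is not the Script property.
NAME_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "LATIN": "Latin",
}


def rough_script(ch: str) -> str:
    """Guess a script label from the character name, else 'Other'."""
    name = unicodedata.name(ch, "")
    for prefix, script in NAME_PREFIXES.items():
        if name.startswith(prefix):
            return script
    return "Other"


def script_mix(text: str) -> Counter:
    """Count characters per guessed script."""
    return Counter(rough_script(ch) for ch in text)


sample = "私はカメラを買った romaji"
print(script_mix(sample))
```

A production system would use the Script property from the Unicode Character Database instead of name prefixes.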

Table 6-1. Typology of Scripts in the Unicode Standard

Alphabets: Latin, Greek, Cyrillic, Armenian, Thaana, Georgian, Ogham, Runic, Mongolian, Glagolitic, Coptic, Tifinagh, Old Italic, Gothic, Ugaritic, Old Persian, Deseret, Shavian, Osmanya, N’Ko
Abjads: Hebrew, Arabic, Syriac, Phoenician
Abugidas: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Tagalog, Hanunóo, Buhid, Tagbanwa, Khmer, Limbu, Tai Le, New Tai Lue, Buginese, Syloti Nagri, Kharoshthi, Balinese, Phags-pa
Logosyllabaries: Han, Sumero-Akkadian
Simple Syllabaries: Cherokee, Hiragana, Katakana, Bopomofo, Yi, Linear B, Cypriot, Ethiopic, Canadian Aboriginal Syllabics
Featural Syllabaries: Hangul

Notational Systems. In addition to scripts for written natural languages, there are notational systems for other kinds of information. Some of these more closely resemble text than others. The Unicode Standard encodes symbols for use with mathematical notation, Western and Byzantine musical notation, and Braille, as well as symbols for use in divination, such as the Yijing hexagrams. Notational systems can be classified by how closely they resemble text. Even notational systems that do not fully resemble text may have symbols used in text. In the case of musical notation, for example, while the full notation is two-dimensional, many of the encoded symbols are frequently referenced in texts about music and musical notation.

6.2 General Punctuation

Punctuation characters (for example, U+002C comma and U+2022 bullet) are encoded only once, rather than being encoded again and again for particular scripts; such general-purpose punctuation may be used for any script or mixture of scripts. In contrast, punctuation principally used with a specific script is found in the block corresponding to that script, such as U+058A armenian hyphen, U+061B “؛” arabic semicolon, or the punctuation used with CJK ideographs in the CJK Symbols and Punctuation block. Script-specific punctuation characters may be unique in function, have different directionality, or be distinct in appearance or usage from their generic counterparts.

Punctuation intended for use with several related scripts is often encoded with the principal script for the group. For example, U+1735 philippine single punctuation is encoded in a single location in the Hanunóo block, but it is intended for use with all four of the Philippine scripts.

Use and Interpretation. The use and interpretation of punctuation characters can be heavily context dependent. For example, U+002E full stop can be used as sentence-ending punctuation, an abbreviation indicator, a decimal point, and so on. Many Unicode algorithms, such as the Bidirectional Algorithm and Line Breaking Algorithm, both of which treat numeric punctuation differently from text punctuation, resolve the status of any ambiguous punctuation mark depending on whether it is part of a number context.

Legacy character encoding standards commonly include generic characters for punctuation instead of the more precisely specified characters used in printing. Examples include the single and double quotes, period, dash, and space. The Unicode Standard includes these generic characters, but also encodes the unambiguous characters independently: various forms of quotation marks, em dash, en dash, minus, hyphen, em space, en space, hair space, zero width space, and so on.

Rendering. Punctuation characters vary in appearance with the font style, just like the surrounding text characters. In some cases, where used in the context of a particular script, a specific glyph style is preferred. For example, U+002E full stop should appear square when used with Armenian, but is typically circular when used with Latin. For mixed Latin/Armenian text, two fonts (or one font allowing for context-dependent glyph variation) may need to be used to render the character faithfully.


Writing Direction. Punctuation characters shared across scripts have no inherent directionality. In a bidirectional context, their display direction is resolved according to the rules in Unicode Standard Annex #9, “The Bidirectional Algorithm.” Certain script-specific punctuation marks have an inherent directionality that matches the writing direction of the script. For an example, see “Dandas” later in this section. The image of certain paired punctuation marks, specifically those that are brackets, is mirrored when the character is part of a right-to-left directional run (see Section 4.7, Bidi Mirrored—Normative). Mirroring ensures that the opening and closing semantics of the character remains independent of the writing direction. The same is generally not true for other punctuation marks even when their image is not bilaterally symmetric, such as slash or the curly quotes. See also “Paired Punctuation” later in this section. In vertical writing, many punctuation characters have special vertical glyphs. Normally, fonts contain both the horizontal and vertical glyphs, and the selection of the appropriate glyph is based on the text orientation in effect at rendering time. However, see “CJK Compatibility Forms: Vertical Forms” later in this section. Figure 6-2 shows a set of three common shapes used for ideographic comma and ideographic full stop. The first shape in each row is that used for horizontal text, the last shape is that for vertical text. The centered form may be used with both horizontal and vertical text. See also Figure 6-4 for an example of vertical and horizontal forms for quotation marks.

Figure 6-2. Forms of CJK Punctuation

Horizontal   Centered   Vertical
、           、         、
。           。         。
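The mirroring behavior described under Writing Direction above can be inspected through the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module; the `bidi_summary` helper and the choice of sample characters are illustrative assumptions, and the values reported depend on the database version bundled with the interpreter.

```python
import unicodedata


def bidi_summary(ch: str) -> dict:
    """Collect the bidirectional properties exposed by the standard library."""
    return {
        "name": unicodedata.name(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
    }


# Brackets, a slash, a curly quote, and a script-specific danda.
samples = ["(", "]", "/", "\u201C", "\u0964"]
for ch in samples:
    print(f"U+{ord(ch):04X}", bidi_summary(ch))
```

The brackets report the mirrored property, while the slash and the curly quote do not, even though their glyphs are not bilaterally symmetric. The Devanagari danda reports a strong left-to-right class rather than a neutral one.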

Layout Controls. A number of characters in the blocks described in this section are not graphic punctuation characters, but rather affect the operation of layout algorithms. For a description of those characters, see Section 16.2, Layout Controls.

Encoding Characters with Multiple Semantic Values. Some of the punctuation characters in the ASCII range (U+0020..U+007F) have multiple uses, either through ambiguity in the original standards or through accumulated reinterpretations of a limited code set. For example, 27₁₆ is defined in ANSI X3.4 as apostrophe (closing single quotation mark; acute accent), and 2D₁₆ is defined as hyphen-minus. In general, the Unicode Standard provides the same interpretation for the equivalent code points, without adding to or subtracting from their semantics. The Unicode Standard supplies unambiguous codes elsewhere for the most useful particular interpretations of these ASCII values; the corresponding unambiguous characters are cross-referenced in the character names list for this block. For more information, see “Apostrophes,” “Space Characters,” and “Dashes and Hyphens” later in this section.
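As a small illustration of the two ASCII values named above, the sketch below prints their Unicode names next to a few unambiguous counterparts. The `ALTERNATIVES` table is an illustrative selection made here, not the cross-reference list from the character names list, and `label` is a hypothetical helper.

```python
import unicodedata

# Illustrative selection only; the standard's names list carries the
# authoritative cross-references.
ALTERNATIVES = {
    "\u0027": ["\u2019", "\u00B4"],  # apostrophe
    "\u002D": ["\u2010", "\u2212"],  # hyphen-minus
}


def label(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ambiguous, precise in ALTERNATIVES.items():
    print(label(ambiguous))
    for ch in precise:
        print("   ", label(ch))

# The hexadecimal values quoted from ANSI X3.4 are the same code points.
assert int("27", 16) == ord("'")
assert int("2D", 16) == ord("-")
```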


Blocks Devoted to Punctuation

For compatibility with widely used legacy character sets, the Basic Latin (ASCII) block (U+0000..U+007F) and the Latin-1 Supplement block (U+0080..U+00FF) contain several of the most common punctuation signs. They are isolated from the larger body of Unicode punctuation, signs, and symbols only because their relative code locations within ASCII and Latin-1 are so widely used in standards and software.

The Unicode Standard has a number of blocks devoted specifically to encoding collections of punctuation characters.

The General Punctuation block (U+2000..U+206F) contains the most common punctuation characters widely used in Latin typography, as well as a few specialized punctuation marks and a large number of format control characters. All of these punctuation characters are intended for generic use, and in principle they could be used with any script.

The Supplemental Punctuation block (U+2E00..U+2E7F) is devoted to less commonly encountered punctuation marks, including those used in specialized notational systems or occurring primarily in ancient manuscript traditions.

The CJK Symbols and Punctuation block (U+3000..U+303F) has the most commonly occurring punctuation specific to East Asian typography, that is, typography involving the rendering of text with CJK ideographs.

The Vertical Forms block (U+FE10..U+FE1F), the CJK Compatibility Forms block (U+FE30..U+FE4F), the Small Form Variants block (U+FE50..U+FE6F), and the Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF) contain many compatibility characters for punctuation marks, encoded for compatibility with a number of East Asian character encoding standards. Their primary use is for round-trip mapping with those legacy standards. For vertical text, the regular punctuation characters are used instead, with alternate glyphs for vertical layout supplied by the font.

The punctuation characters in these various blocks are discussed below in terms of their general types.
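The block ranges listed above lend themselves to a simple lookup table. The sketch below covers only the blocks named in this subsection; `PUNCTUATION_BLOCKS` and `block_of` are hypothetical names, and a complete implementation would read the full block list from the Unicode Character Database instead.

```python
from typing import Optional

# Ranges copied from the text above; this is not a complete block list.
PUNCTUATION_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin (ASCII)"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2E00, 0x2E7F, "Supplemental Punctuation"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0xFE10, 0xFE1F, "Vertical Forms"),
    (0xFE30, 0xFE4F, "CJK Compatibility Forms"),
    (0xFE50, 0xFE6F, "Small Form Variants"),
    (0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for ch, or None if it is outside the table."""
    cp = ord(ch)
    for first, last, name in PUNCTUATION_BLOCKS:
        if first <= cp <= last:
            return name
    return None


for ch in [",", "\u2014", "\u3001", "\uFF0C"]:
    print(f"U+{ord(ch):04X}", block_of(ch))
```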

Format Control Characters

Format control characters are special characters that have no visible glyph of their own, but that affect the display of characters to which they are adjacent, or that have other specialized functions such as serving as invisible anchor points in text. All format control characters have General_Category=Cf. A significant number of format control characters are encoded in the General Punctuation block, but their descriptions are found in other sections.

Cursive joining controls, as well as U+200B zero width space, U+2028 line separator, U+2029 paragraph separator, and U+2060 word joiner, are described in Section 16.2, Layout Controls. Bidirectional ordering controls are also discussed in Section 16.2, Layout Controls, but their detailed use is specified in Unicode Standard Annex #9, “The Bidirectional Algorithm.”
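Because format control characters are invisible, it can help to locate them programmatically. The following sketch finds characters with General_Category=Cf using Python's `unicodedata` module; the `format_controls` helper and the sample string are assumptions made for illustration.

```python
import unicodedata


def format_controls(text: str) -> list:
    """Return (index, code point label, name) for each gc=Cf character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


sample = "zero\u200Bwidth \u2060joiner"
for hit in format_controls(sample):
    print(hit)

# Note: in the database, U+2028 and U+2029 carry their own categories
# (Zl and Zp), so this check does not report them.
print(unicodedata.category("\u2028"), unicodedata.category("\u2029"))
```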


Invisible operators are explained in Section 15.5, Invisible Mathematical Operators. Deprecated format characters related to obsolete models of Arabic text processing are described in Section 16.3, Deprecated Format Characters. The reserved code points U+2064..U+2069 and U+FFF0..U+FFF8, as well as any reserved code points in the range U+E0000..U+E0FFF, are reserved for the possible future encoding of other format control characters. Because of this, they are treated as default ignorable code points. For more information, see Section 5.20, Default Ignorable Code Points.

Space Characters

The most commonly used space character is U+0020 space. Also often used is its nonbreaking counterpart, U+00A0 no-break space. These two characters have the same width, but behave differently for line breaking. For more information, see Unicode Standard Annex #14, “Line Breaking.” U+00A0 no-break space behaves like a numeric separator for the purposes of bidirectional layout. (See Unicode Standard Annex #9, “The Bidirectional Algorithm,” for a detailed discussion of the Unicode Bidirectional Algorithm.) In ideographic text, U+3000 ideographic space is commonly used because its width matches that of the ideographs.

The main difference among other space characters is their width. U+2000..U+2006 are standard quad widths used in typography. U+2007 figure space has a fixed width, known as tabular width, which is the same width as digits used in tables. U+2008 punctuation space is a space defined to be the same width as a period. U+2009 thin space and U+200A hair space are successively smaller-width spaces used for narrow word gaps and for justification of type. The fixed-width space characters (U+2000..U+200A) are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. However, where they are used (for example, in typesetting mathematical formulae), their width is generally font-specified, and they typically do not expand during justification. The exception is U+2009 thin space, which sometimes gets adjusted.

In addition to the various fixed-width space characters, there are a few script-specific space characters in the Unicode Standard. U+1680 ogham space mark is unusual in that it is generally rendered with a visible horizontal line, rather than being blank.
Space characters with special behavior in word or line breaking are described in “Line and Word Breaking” in Section 16.2, Layout Controls, and Unicode Standard Annex #14, “Line Breaking.” U+00A0 no-break space has an additional, important function in the Unicode Standard. It may serve as the base character for displaying a nonspacing combining mark in apparent isolation. Versions of the standard prior to Version 4.1 indicated that U+0020 space could also be used for this function, but space is no longer recommended, because of potential interactions with the handling of space in XML and other markup languages. See Section 2.11, Combining Characters, for further discussion.


Space characters are found in several character blocks in the Unicode Standard. The list of space characters appears in Table 6-2.

Table 6-2. Unicode Space Characters

Code     Name
U+0020   space
U+00A0   no-break space
U+1680   ogham space mark
U+180E   mongolian vowel separator
U+2000   en quad
U+2001   em quad
U+2002   en space
U+2003   em space
U+2004   three-per-em space
U+2005   four-per-em space
U+2006   six-per-em space
U+2007   figure space
U+2008   punctuation space
U+2009   thin space
U+200A   hair space
U+202F   narrow no-break space
U+205F   medium mathematical space
U+3000   ideographic space

The space characters in the Unicode Standard can be identified by their General Category, [gc=Zs], in the Unicode Character Database. One exceptional “space” character is U+200B zero width space. This character, although called a “space” in its name, does not actually have any width or visible glyph in display. It functions primarily to indicate word boundaries in writing systems that do not actually use orthographic spaces to separate words in text. It is given the General Category [gc=Cf] and is treated as a format control character, rather than as a space character, in implementations. Further discussion of U+200B zero width space, as well as other zero-width characters with special properties, can be found in Section 16.2, Layout Controls.
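The General Category test described above can be sketched with Python's `unicodedata` module. The `space_characters` helper is a hypothetical name. The result reflects the Unicode Character Database version bundled with the interpreter, not Version 5.0, so the list may differ from Table 6-2; later versions of the standard reclassified U+180E, for example.

```python
import sys
import unicodedata


def space_characters() -> list:
    """Return every character whose General Category is Zs."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


print("Database version:", unicodedata.unidata_version)
for ch in space_characters():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# U+200B is named a "space" but is classified as a format control.
print(unicodedata.category("\u200B"))
```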

Dashes and Hyphens

Because of its prevalence in legacy encodings, U+002D hyphen-minus is the most common of the dash characters used to represent a hyphen. It has ambiguous semantic value and is rendered with an average width. U+2010 hyphen represents the hyphen as found in words such as “left-to-right.” It is rendered with a narrow width. When typesetting text, U+2010 hyphen is preferred over U+002D hyphen-minus. U+2011 non-breaking hyphen has the same semantic value as U+2010 hyphen, but should not be broken across lines.

U+2012 figure dash has the same (ambiguous) semantic as the U+002D hyphen-minus, but has the same width as digits (if they are monospaced). U+2013 en dash is used to indicate a range of values, such as 1973–1984, although in some languages hyphen is used for that purpose. The en dash should be distinguished from the U+2212 minus sign, which is an arithmetic operator. Although it is not preferred in mathematical typesetting, typographers sometimes use U+2013 en dash to represent the minus sign, particularly a unary minus. When interpreting formulas, U+002D hyphen-minus, U+2012 figure dash, and U+2212 minus sign should each be taken as indicating a minus sign, as in “x = a - b”, unless a higher-level protocol precisely defines which of these characters serves that function.

U+2014 em dash is used to make a break—like this—in the flow of a sentence. (Some typographers prefer to use U+2013 en dash set off with spaces – like this – to make the same kind of break.) Like many other conventions for punctuation characters, such usage may depend on language. This kind of dash is commonly represented with a typewriter as a double hyphen. In older mathematical typography, U+2014 em dash may also be used to indicate a binary minus sign. U+2015 horizontal bar is used to introduce quoted text in some typographic styles.

Dashes and hyphen characters may also be found in other character blocks in the Unicode Standard. A list of dash and hyphen characters appears in Table 6-3. For a description of the line breaking behavior of dashes and hyphens, see Unicode Standard Annex #14, “Line Breaking Properties.”
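The guidance on interpreting formulas can be sketched as a small preprocessing step. The example below maps the three characters named above to the ASCII hyphen-minus so that an ordinary numeric parser accepts them. The `to_ascii_minus` helper is a hypothetical name, and leaving the en dash untouched is a design choice made here because it more often marks a range.

```python
# Characters the text says to read as a minus sign in formulas.
MINUS_LIKE = {"\u002D", "\u2012", "\u2212"}


def to_ascii_minus(formula: str) -> str:
    """Replace minus-like characters with U+002D for parsing."""
    return "".join("-" if ch in MINUS_LIKE else ch for ch in formula)


print(to_ascii_minus("x = a \u2212 b"))
print(float(to_ascii_minus("\u22123.5")))

# An en dash marking a range is left alone.
print(to_ascii_minus("1973\u20131984"))
```

A higher-level protocol that defines its own minus character would replace this table with its own rule.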

Table 6-3. Unicode Dash Characters

Code     Name
U+002D   hyphen-minus
U+007E   tilde (when used as swung dash)
U+058A   armenian hyphen
U+05BE   hebrew punctuation maqaf
U+1806   mongolian todo soft hyphen
U+2010   hyphen
U+2011   non-breaking hyphen
U+2012   figure dash
U+2013   en dash
U+2014   em dash
U+2015   horizontal bar (= quotation dash)
U+2053   swung dash
U+207B   superscript minus
U+208B   subscript minus
U+2212   minus sign
U+2E17   double oblique hyphen
U+301C   wave dash
U+3030   wavy dash
U+30A0   katakana-hiragana double hyphen
U+FE31   presentation form for vertical em dash
U+FE32   presentation form for vertical en dash
U+FE58   small em dash
U+FE63   small hyphen-minus
U+FF0D   fullwidth hyphen-minus

Soft Hyphen. Despite its name, U+00AD soft hyphen is not a hyphen, but rather an invisible format character used to indicate optional intraword breaks. As described in

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 7

European Alphabetic Scripts

Modern European alphabetic scripts are derived from or influenced by the Greek script, which itself was an adaptation of the Phoenician alphabet. A Greek innovation was writing the letters from left to right, which is the writing direction for all the scripts derived from or inspired by Greek.

The European alphabetic scripts and additional characters described in this chapter are:

Latin     Cyrillic     Georgian
Greek     Glagolitic   Modifier letters
Coptic    Armenian     Combining marks

The European scripts are all written from left to right. Many have separate lowercase and uppercase forms of the alphabet. Spaces are used to separate words. Accents and diacritical marks are used to indicate phonetic features and to extend the use of base scripts to additional languages. Some of these modification marks have evolved into small free-standing signs that can be treated as characters in their own right.

The Latin script is used to write or transliterate texts in a wide variety of languages. The International Phonetic Alphabet (IPA) is an extension of the Latin alphabet, enabling it to represent the phonetics of all languages. Other Latin phonetic extensions are used for the Uralic Phonetic Alphabet.

The Latin alphabet is derived from the alphabet used by the Etruscans, who had adopted a Western variant of the classical Greek alphabet (Section 14.2, Old Italic). Originally it contained only 24 capital letters. The modern Latin alphabet as it is found in the Basic Latin block owes its appearance to innovations of scribes during the Middle Ages and practices of the early Renaissance printers.

The Cyrillic script was developed in the ninth century and is also based on Greek. Like Latin, Cyrillic is used to write or transliterate texts in many languages. The Georgian and Armenian scripts were devised in the fifth century and are influenced by Greek. Modern Georgian does not have separate uppercase and lowercase forms.

The Coptic script was the last stage in the development of Egyptian writing. It represented the adaptation of the Greek alphabet to writing Egyptian, with the retention of forms from Demotic for sounds not adequately represented by Greek letters. Although primarily used in Egypt from the fourth to the tenth century, it is described in this chapter because of its close relationship to the Greek script.

Glagolitic is an early Slavic script related in some ways to both the Greek and the Cyrillic scripts. It was widely used in the Balkans but gradually died out, surviving the longest in Croatia. Like Coptic, however, it still has some modern use in liturgical contexts.

This chapter also describes modifier letters and combining marks used with the Latin script and other scripts. The block descriptions for other archaic European alphabetic scripts, such as Gothic, Ogham, Old Italic, and Runic, can be found in Chapter 14, Archaic Scripts.

7.1 Latin

The Latin script was derived from the Greek script. Today it is used to write a wide variety of languages all over the world. In the process of adapting it to other languages, numerous extensions have been devised. The most common is the addition of diacritical marks. Furthermore, the creation of digraphs, inverse or reverse forms, and outright new characters have all been used to extend the Latin script.

The Latin script is written in linear sequence from left to right. Spaces are used to separate words and provide the primary line breaking opportunities. Hyphens are used where lines are broken in the middle of a word. (For more information, see Unicode Standard Annex #14, “Line Breaking Properties.”) Latin letters come in uppercase and lowercase pairs.

Languages. Some indication of language or other usage is given for many characters within the names lists accompanying the character charts.

Diacritical Marks. Speakers of different languages treat the addition of a diacritical mark to a base letter differently. In some languages, the combination is treated as a letter in the alphabet for the language. In others, such as English, the same words can often be spelled with and without the diacritical mark without implying any difference. Most languages that use the Latin script treat letters with diacritical marks as variations of the base letter, but do not accord the combination the full status of an independent letter in the alphabet. Widely used accented character combinations are provided as single characters to accommodate interoperation with pervasive practice in legacy encodings. Combining diacritical marks can express these and all other accented letters as combining character sequences.

In the Unicode Standard, all diacritical marks are encoded in sequence after the base characters to which they apply. For more details, see the subsection “Combining Diacritical Marks” in Section 7.9, Combining Marks, and also Section 2.11, Combining Characters.
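The relationship between precomposed accented letters and combining character sequences can be seen with the normalization functions in Python's `unicodedata` module. The following is a minimal sketch; the sample letters are arbitrary choices made for illustration.

```python
import unicodedata

precomposed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE
sequence = "e\u0301"    # base letter, then COMBINING ACUTE ACCENT

# The two spellings differ as code point sequences.
print(precomposed == sequence)

# Normalization converts between them; the mark follows its base letter.
print(unicodedata.normalize("NFD", precomposed) == sequence)
print(unicodedata.normalize("NFC", sequence) == precomposed)

# A combination with no precomposed character stays a sequence under NFC.
q_acute = "q\u0301"
print(len(unicodedata.normalize("NFC", q_acute)))
```

Comparing text after normalizing both sides to the same form avoids treating the two spellings as different strings.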
Alternative Glyphs. Some characters have alternative representations, although they have a common semantic. In such cases, a preferred glyph is chosen to represent the character in the code charts, even though it may not be the form used under all circumstances. Some Latin examples to illustrate this point are provided in Figure 7-1 and discussed in the text that follows.

Figure 7-1. Alternative Glyphs in Latin

Common typographical variations of basic Latin letters include the open- and closed-loop forms of the lowercase letters “a” and “g”, as shown in the first example in Figure 7-1. In ordinary Latin text, such distinctions are merely glyphic alternates for the same characters; however, phonetic transcription systems, such as IPA and Pinyin, often make systematic distinctions between these forms.

Variations in Diacritical Marks. The shape and placement of diacritical marks can be subject to considerable variation that might surprise a reader unfamiliar with such distinctions. For example, when Czech is typeset, U+010F latin small letter d with caron and U+0165 latin small letter t with caron are often rendered by glyphs with an apostrophe instead of with a caron, commonly known as a háček. See the second example in Figure 7-1. In Slovak, this use also applies to U+013E latin small letter l with caron and U+013D latin capital letter l with caron. The use of an apostrophe can avoid some line crashes over the ascenders of those letters and so result in better typography. In typewritten or handwritten documents, or in didactic and pedagogical material, glyphs with háčeks are preferred.

A similar situation can be seen with the Latvian letter U+0123 latin small letter g with cedilla, as shown in example 3 in Figure 7-1. In good Latvian typography, this character is always shown with a rotated comma over the g, rather than a cedilla below the g, because of the typographical design and layout issues resulting from trying to place a cedilla below the descender loop of the g. Poor Latvian fonts may substitute an acute accent for the rotated comma, and handwritten or other printed forms may actually show the cedilla below the g. The uppercase form of the letter is always shown with a cedilla, as the rounded bottom of the G poses no problems for attachment of the cedilla.
Other Latvian letters with a cedilla below (U+0137 latin small letter k with cedilla, U+0146 latin small letter n with cedilla, and U+0157 latin small letter r with cedilla) always prefer a glyph with a floating comma below, as there is no proper attachment point for a cedilla at the bottom of the base form.

The Unicode Standard 5.0 – Electronic edition

Copyright © 1991–2007 Unicode, Inc.

228

European Alphabetic Scripts

In Turkish and Romanian, a cedilla and a comma below sometimes replace one another depending on the font style, as shown in example 4 in Figure 7-1. The form with the cedilla is preferred in Turkish, and the form with the comma below is preferred in Romanian. The characters with explicit commas below are provided to permit the distinction from characters with a cedilla. Legacy encodings for these characters contain only a single form of each of these characters. ISO/IEC 8859-2 maps these to the form with the cedilla, while ISO/IEC 8859-16 maps them to the form with the comma below. Migrating Romanian 8-bit data to Unicode should be done with care.

In general, characters with cedillas or ogoneks below are subject to variable typographical usage, depending on the availability and quality of fonts used, the technology, and the geographic area. Various hooks, commas, and squiggles may be substituted for the nominal forms of these diacritics below, and even the directions of the hooks may be reversed. Implementers should become familiar with particular typographical traditions before assuming that characters are missing or are wrongly represented in the code charts in the Unicode Standard.

Exceptional Case Pairs. The characters U+0130 latin capital letter i with dot above and U+0131 latin small letter dotless i (used primarily in Turkish) are assumed to take ASCII “i” and “I”, respectively, as their case alternates. This mapping makes the corresponding reverse mapping language-specific; mapping in both directions requires special attention from the implementer (see Section 5.18, Case Mappings).

Diacritics on i and j. A dotted (normal) i or j followed by a nonspacing mark above loses the dot in rendering. Thus, in the word naïve, the ï could be spelled with i + diaeresis. A dotted-i is not equivalent to a Turkish dotless-i + overdot, nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ¨ ≠ ı + ¨). The same pattern is used for j. Dotless-j is used in the Landsmålsalfabet, where it does not have a case pair. To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2).
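The two points above about i can be checked with a short Python sketch. It assumes the language-neutral case mappings built into Python's `str` methods and the `unicodedata` module, whose data comes from a later Unicode version than 5.0. The helper names `code_points` and `default_round_trip` are hypothetical and exist only for this illustration.

```python
import unicodedata

DOTLESS_I = "\u0131"        # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"    # LATIN CAPITAL LETTER I WITH DOT ABOVE
DIAERESIS = "\u0308"        # COMBINING DIAERESIS


def code_points(text: str) -> list:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def default_round_trip(text: str) -> str:
    """Uppercase, then lowercase, using the language-neutral default mappings."""
    return text.upper().lower()


# Dotted i + diaeresis composes to the single precomposed letter U+00EF.
dotted = unicodedata.normalize("NFC", "i" + DIAERESIS)
# Dotless i + diaeresis has no precomposed form, so NFC leaves two code points.
dotless = unicodedata.normalize("NFC", DOTLESS_I + DIAERESIS)

print(code_points(dotted))
print(code_points(dotless))
# Default mappings send dotless i to ASCII "I", which lowercases to dotted "i".
print(code_points(default_round_trip(DOTLESS_I)))
```

Because the default round trip does not return the dotless i, software that handles Turkish text needs language-specific case mappings in addition to these defaults.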

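Figure 7-2. Diacritics on i and j

[Figure: a dotted i or j followed by a nonspacing mark above renders without its dot (for example, i + diaeresis renders as ï); i + overdot + acute retains the dot under the accent.]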

All characters that use their dot in this manner have the Soft_Dotted property in Unicode.

Vietnamese. In the modern Vietnamese alphabet, there are 12 vowel letters and 5 tone marks (see Figure 7-3). Normalization Form C represents the combination of vowel letter and tone mark as a single unit (for example, U+1EA8 Ẩ latin capital letter a with circumflex and hook above). Normalization Form D decomposes this combination into the combining sequence, such as <U+0041, U+0302, U+0309>. Some widely used implementations prefer storing the vowel letter and the tone mark separately.

Figure 7-3. Vietnamese Letters and Tone Marks
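The relationship between the two normalization forms for Vietnamese can be sketched in Python. The example assumes the standard `unicodedata` module; the helper names are hypothetical.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Normalization Form C: vowel and tone mark as one precomposed unit."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Normalization Form D: base letter followed by combining marks."""
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> list:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


precomposed = "\u1EA8"  # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE
decomposed = to_nfd(precomposed)

print(code_points(precomposed))
print(code_points(decomposed))
```

Comparing strings only after converting both to the same normalization form avoids mismatches between data stored in precomposed form and data stored as combining sequences.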

The Vietnamese vowels and other letters are found in the Basic Latin, Latin-1 Supplement, and Latin Extended-A blocks. Additional precomposed vowels and tone marks are found in the Latin Extended Additional block. The characters U+0300 combining grave accent, U+0309 combining hook above, U+0303 combining tilde, U+0301 combining acute accent, and U+0323 combining dot below should be used in representing the Vietnamese tone marks. The characters U+0340 combining grave tone mark and U+0341 combining acute tone mark are deprecated and should not be used. Standards. Unicode follows ISO/IEC 8859-1 in the layout of Latin letters up to U+00FF. ISO/IEC 8859-1, in turn, is based on older standards—among others, ASCII (ANSI X3.4), which is identical to ISO/IEC 646:1991-IRV. Like ASCII, ISO/IEC 8859-1 contains Latin letters, punctuation signs, and mathematical symbols. These additional characters are widely used with scripts other than Latin. The descriptions of these characters are found in Chapter 6, Writing Systems and Punctuation, and Chapter 15, Symbols. The Latin Extended-A block includes characters contained in ISO/IEC 8859— Part 2. Latin alphabet No. 2, Part 3. Latin alphabet No. 3, Part 4. Latin alphabet No. 4, and Part 9. Latin alphabet No. 5. Many of the other graphic characters contained in these standards, such as punctuation, signs, symbols, and diacritical marks, are already encoded in the Latin-1 Supplement block. Other characters from these parts of ISO/IEC 8859 are encoded in other blocks, primarily in the Spacing Modifier Letters block (U+02B0..U+02FF) and in the character blocks starting at and following the General Punctuation block. The Latin Extended-A block also covers additional characters from ISO/IEC 6937. 
The Latin Extended-B block covers, among others, characters in ISO 6438 Documentation — African coded character set for bibliographic information interchange, Pinyin Latin transcription characters from the People’s Republic of China national standard GB 2312 and from the Japanese national standard JIS X 0212, and Sami characters from ISO/IEC 8859 Part 10. Latin alphabet No. 6. The characters in the IPA block are taken from the 1989 revision of the International Phonetic Alphabet, published by the International Phonetic Association. Extensions from later IPA sources have also been added.

Related Characters. For other Latin-derived characters, see Letterlike Symbols (U+2100..U+214F), Currency Symbols (U+20A0..U+20CF), Number Forms (U+2150..U+218F), Enclosed Alphanumerics (U+2460..U+24FF), CJK Compatibility (U+3300..U+33FF), Fullwidth Forms (U+FF21..U+FF5A), and Mathematical Alphanumeric Symbols (U+1D400..U+1D7FF).

Letters of Basic Latin: U+0041–U+007A Only a small fraction of the languages written with the Latin script can be written entirely with the basic set of 26 uppercase and 26 lowercase Latin letters contained in this block. The 26 basic letter pairs form the core of the alphabets used by all the other languages that use the Latin script. A stream of text using one of these alphabets would therefore intermix characters from the Basic Latin block and other Latin blocks. Occasionally a few of the basic letter pairs are not used to write a language. For example, Italian does not use “j” or “w”.

Letters of the Latin-1 Supplement: U+00C0–U+00FF The Latin-1 supplement extends the basic 26 letter pairs of ASCII by providing additional letters for the major languages of Europe listed in the next paragraph. Languages. The languages supported by the Latin-1 supplement include Catalan, Danish, Dutch, Faroese, Finnish, Flemish, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish. Ordinals. U+00AA feminine ordinal indicator and U+00BA masculine ordinal indicator can be depicted with an underscore, but many modern fonts show them as superscripted Latin letters with no underscore. In sorting and searching, these characters should be treated as weakly equivalent to their Latin character equivalents.
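One rough way to approximate the weak equivalence of the ordinal indicators to Latin letters is compatibility decomposition, sketched here in Python. This is only an illustration under that assumption: real sorting would normally use a collation algorithm, and NFKD also folds many other compatibility characters. The name `search_key` is hypothetical.

```python
import unicodedata

FEMININE_ORDINAL = "\u00AA"   # FEMININE ORDINAL INDICATOR
MASCULINE_ORDINAL = "\u00BA"  # MASCULINE ORDINAL INDICATOR


def search_key(text: str) -> str:
    """Fold compatibility characters (including ordinals) for loose matching."""
    return unicodedata.normalize("NFKD", text)


# Both ordinal indicators carry a <super> compatibility decomposition.
print(unicodedata.decomposition(FEMININE_ORDINAL))
print(unicodedata.decomposition(MASCULINE_ORDINAL))
print(search_key("1" + FEMININE_ORDINAL), search_key("2" + MASCULINE_ORDINAL))
```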

Latin Extended-A: U+0100–U+017F The Latin Extended-A block contains a collection of letters that, when added to the letters contained in the Basic Latin and Latin-1 Supplement blocks, allow for the representation of most European languages that employ the Latin script. Many other languages can also be written with the characters in this block. Most of these characters are equivalent to precomposed combinations of base character forms and combining diacritical marks. These combinations may also be represented by means of composed character sequences. See Section 2.11, Combining Characters, and Section 7.9, Combining Marks. Compatibility Digraphs. The Latin Extended-A block contains five compatibility digraphs, encoded for compatibility with ISO/IEC 6937:1984. Two of these characters, U+0140 latin small letter l with middle dot and its uppercase version, were originally encoded in ISO/IEC 6937 for support of Catalan. In current conventions, the representation of this digraphic sequence in Catalan simply uses a sequence of an ordinary “l” and U+00B7 middle dot. Another pair of characters, U+0133 latin small ligature ij and its uppercase version, was provided to support the digraph “ij” in Dutch, often termed a “ligature” in discussions


of Dutch orthography. When adding intercharacter spacing for line justification, the “ij” is kept as a unit, and the space between the i and j does not increase. In titlecasing, both the i and the j are uppercased, as in the word “IJsselmeer.” Using a single code point might simplify software support for such features; however, because a vast amount of Dutch data is encoded without this digraph character, under most circumstances one will encounter an <i, j> sequence.

Finally, U+0149 latin small letter n preceded by apostrophe was encoded for use in Afrikaans. However, in nearly all cases it is better represented simply by a sequence of an apostrophe followed by “n”.

Languages. Most languages supported by this block also require the concurrent use of characters contained in the Basic Latin and Latin-1 Supplement blocks. When combined with these two blocks, the Latin Extended-A block supports Afrikaans, Basque, Breton, Croatian, Czech, Esperanto, Estonian, French, Frisian, Greenlandic, Hungarian, Latin, Latvian, Lithuanian, Maltese, Polish, Provençal, Rhaeto-Romanic, Romanian, Romany, Sámi, Slovak, Slovenian, Sorbian, Turkish, Welsh, and many others.
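The behavior of the five compatibility digraphs in this block can be inspected with a short Python sketch. It assumes the `unicodedata` module and Python's default case mappings; `expand_digraph` is a hypothetical helper name.

```python
import unicodedata

# The five compatibility digraphs of the Latin Extended-A block.
COMPAT_DIGRAPHS = ["\u0132", "\u0133", "\u013F", "\u0140", "\u0149"]


def expand_digraph(ch: str) -> str:
    """Replace a compatibility digraph by its two-character equivalent."""
    return unicodedata.normalize("NFKC", ch)


for ch in COMPAT_DIGRAPHS:
    expanded = expand_digraph(ch)
    print(unicodedata.name(ch), "->", [f"U+{ord(c):04X}" for c in expanded])

# With the single code point, default casing uppercases the whole digraph.
with_digraph = "\u0133sselmeer".capitalize()
# With an <i, j> sequence, default casing uppercases only the i.
with_sequence = "ijsselmeer".capitalize()
print(with_digraph, with_sequence)
```

The last two lines show why the single code point can simplify titlecasing, and why software that receives the far more common <i, j> sequence needs Dutch-specific logic to produce “IJsselmeer”.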

Latin Extended-B: U+0180–U+024F The Latin Extended-B block contains letterforms used to extend Latin scripts to represent additional languages. It also contains phonetic symbols not included in the International Phonetic Alphabet (see the IPA Extensions block, U+0250..U+02AF). Arrangement. The characters are arranged in a nominal alphabetical order, followed by a small collection of Latinate forms. Uppercase and lowercase pairs are placed together where possible, but in many instances the other case form is encoded at some distant location and so is cross-referenced. Variations on the same base letter are arranged in the following order: turned, inverted, hook attachment, stroke extension or modification, different style, small cap, modified basic form, ligature, and Greek derived. Croatian Digraphs Matching Serbian Cyrillic Letters. Serbo-Croatian is a single language with paired alphabets: a Latin script (Croatian) and a Cyrillic script (Serbian). A set of compatibility digraph codes is provided for one-to-one transliteration. There are two potential uppercase forms for each digraph, depending on whether only the initial letter is to be capitalized (titlecase) or both (all uppercase). The Unicode Standard offers both forms so that software can convert one form to the other without changing font sets. The appropriate cross references are given for the lowercase letters. Pinyin Diacritic–Vowel Combinations. The Chinese standard GB 2312, the Japanese standard JIS X 0212, and some other standards include codes for Pinyin, which is used for Latin transcription of Mandarin Chinese. Most of the letters used in Pinyin romanization are already covered in the preceding Latin blocks. The group of 16 characters provided here completes the Pinyin character set specified in GB 2312 and JIS X 0212. Case Pairs. A number of characters in this block are uppercase forms of characters whose lowercase forms are part of some other grouping. 
Many of these characters came from the International Phonetic Alphabet; they acquired uppercase forms when they were adopted


into Latin script-based writing systems. Occasionally, however, alternative uppercase forms arose in this process. In some instances, research has shown that alternative uppercase forms are merely variants of the same character. If so, such variants are assigned a single Unicode code point, as is the case of U+01B7 latin capital letter ezh. But when research has shown that two uppercase forms are actually used in different ways, then they are given different codes; such is the case for U+018E latin capital letter reversed e and U+018F latin capital letter schwa. In this instance, the shared lowercase form is copied to enable unique case-pair mappings: U+01DD latin small letter turned e is a copy of U+0259 latin small letter schwa. For historical reasons, the names of some case pairs differ. For example, U+018E latin capital letter reversed e is the uppercase of U+01DD latin small letter turned e—not of U+0258 latin small letter reversed e. For default case mappings of Unicode characters, see Section 4.2, Case—Normative. Caseless Letters. A number of letters used with the Latin script are caseless—for example, the caseless glottal stop at U+0294 and U+01BB latin letter two with stroke, and the various letters denoting click sounds. Caseless letters retain their shape when uppercased. When titlecasing words, they may also act transparently; that is, if they occur in the leading position, the next following cased letter may be uppercased instead. Over the last several centuries, the trend in typographical development for the Latin script has tended to favor the eventual introduction of case pairs. See the following discussion of the glottal stop. The Unicode Standard may encode additional uppercase characters in such instances. However, for reasons of stability, the Standard will never add a new lowercase form for an existing uppercase character. See also “Caseless Matching” in Section 5.18, Case Mappings. Glottal Stop. 
There are two patterns of usage for the glottal stop in the Unicode Standard. U+0294 ʔ latin letter glottal stop is a caseless letter used in IPA. It is also widely seen in language orthographies based on IPA or Americanist phonetic usage, in those instances where no casing is apparent for glottal stop. Such orthographies may avoid casing for glottal stop to the extent that when titlecasing strings, a word with an initial glottal stop may have its second letter uppercased instead of the first letter.

In a small number of orthographies for languages of northwestern Canada, and in particular, for Chipewyan, Dogrib, and Slavey, case pairs have been introduced for glottal stop. For these orthographies, the cased glottal stop characters should be used: U+0241 Ɂ latin capital letter glottal stop and U+0242 ɂ latin small letter glottal stop.

The glyphs for the glottal stop are somewhat variable and overlap to a certain extent. The glyph shown in the code charts for U+0294 ʔ latin letter glottal stop is a cap-height form as specified in IPA, but the same character is often shown with a glyph that resembles the top half of a question mark and that may or may not be cap height. U+0241 Ɂ latin capital letter glottal stop, while shown with a larger glyph in the code charts, often appears identical to U+0294. U+0242 ɂ latin small letter glottal stop is a small form of U+0241.


Various small, raised hook- or comma-shaped characters are often substituted for a glottal stop: for instance, U+02BC ʼ modifier letter apostrophe, U+02BB ʻ modifier letter turned comma, U+02C0 ˀ modifier letter glottal stop, or U+02BE ʾ modifier letter right half ring. U+02BB, in particular, is used in Hawaiian orthography as the ʻokina.
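The casing behavior described for the Latin Extended-B block (digraphs with a distinct titlecase form, the reversed e and schwa case pairs, and the caseless and cased glottal stops) can be observed with Python's default case mappings. This is a sketch only; `case_forms` is a hypothetical helper, and the data reflects the Unicode version shipped with the Python interpreter rather than Version 5.0.

```python
import unicodedata

DZ_UPPER, DZ_TITLE, DZ_LOWER = "\u01C4", "\u01C5", "\u01C6"  # DŽ, Dž, dž
REVERSED_E, SCHWA_UPPER = "\u018E", "\u018F"   # capital reversed e, capital schwa
TURNED_E, SCHWA_LOWER = "\u01DD", "\u0259"     # small turned e, small schwa
GLOTTAL_CASELESS = "\u0294"                    # caseless glottal stop (IPA)
GLOTTAL_UPPER, GLOTTAL_LOWER = "\u0241", "\u0242"


def case_forms(ch: str) -> tuple:
    """Return the (lowercase, titlecase, uppercase) forms of one character."""
    return ch.lower(), ch.title(), ch.upper()


# The digraph has three distinct forms; U+01C5 has General_Category Lt.
print(case_forms(DZ_LOWER), unicodedata.category(DZ_TITLE))
# Two different capitals map to two visually similar lowercase letters.
print(case_forms(TURNED_E), case_forms(SCHWA_LOWER))
# The caseless glottal stop is unchanged; the cased pair maps to each other.
print(case_forms(GLOTTAL_CASELESS), case_forms(GLOTTAL_LOWER))
```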

IPA Extensions: U+0250–U+02AF The IPA Extensions block contains primarily the unique symbols of the International Phonetic Alphabet, which is a standard system for indicating specific speech sounds. The IPA was first introduced in 1886 and has undergone occasional revisions of content and usage since that time. The Unicode Standard covers all single symbols and all diacritics in the last published IPA revision (1989) as well as a few symbols in former IPA usage that are no longer currently sanctioned. A few symbols have been added to this block that are part of the transcriptional practices of Sinologists, Americanists, and other linguists. Some of these practices have usages independent of the IPA and may use characters from other Latin blocks rather than IPA forms. Note also that a few nonstandard or obsolete phonetic symbols are encoded in the Latin Extended-B block. An essential feature of IPA is the use of combining diacritical marks. IPA diacritical mark characters are coded in the Combining Diacritical Marks block, U+0300.. U+036F. In IPA, diacritical marks can be freely applied to base form letters to indicate the fine degrees of phonetic differentiation required for precise recording of different languages. Standards. The International Phonetic Association standard considers IPA to be a separate alphabet, so it includes the entire Latin lowercase alphabet a–z, a number of extended Latin letters such as U+0153 œ latin small ligature oe, and a few Greek letters and other symbols as separate and distinct characters. In contrast, the Unicode Standard does not duplicate either the Latin lowercase letters a–z or other Latin or Greek letters in encoding IPA. Unlike other character standards referenced by the Unicode Standard, IPA constitutes an extended alphabet and phonetic transcriptional standard, rather than a character encoding standard. Unifications. 
The IPA characters are unified as much as possible with other letters, albeit not with nonletter symbols such as U+222B ∫ integral. The IPA characters have also been adopted into the Latin-based alphabets of many written languages, such as some used in Africa. It is futile to attempt to distinguish a transcription from an actual alphabet in such cases. Therefore, many IPA characters are found outside the IPA Extensions block. IPA characters that are not found in the IPA Extensions block are listed as cross references at the beginning of the character names list for this block.

IPA Alternates. In a few cases IPA practice has, over time, produced alternate forms, such as U+0269 latin small letter iota “ɩ” versus U+026A latin letter small capital i “ɪ”. The Unicode Standard provides separate encodings for the two forms because they are used in a meaningfully distinct fashion.


Case Pairs. IPA does not sanction case distinctions; in effect, its phonetic symbols are all lowercase. When IPA symbols are adopted into a particular alphabet and used by a given written language (as has occurred, for example, in Africa), they acquire uppercase forms. Because these uppercase forms are not themselves IPA symbols, they are generally encoded in the Latin Extended-B block (or other Latin extension blocks) and are cross-referenced with the IPA names list. Typographic Variants. IPA includes typographic variants of certain Latin and Greek letters that would ordinarily be considered variations of font style rather than of character identity, such as small capital letterforms. Examples include a typographic variant of the Greek letter phi φ and the borrowed letter Greek iota ι, which has a unique Latin uppercase form. These forms are encoded as separate characters in the Unicode Standard because they have distinct semantics in plain text. Affricate Digraph Ligatures. IPA officially sanctions six digraph ligatures used in transcription of coronal affricates. These are encoded at U+02A3 .. U+02A8. The IPA digraph ligatures are explicitly defined in IPA and have possible semantic values that make them not simply rendering forms. For example, while U+02A6 latin small letter ts digraph is a transcription for the sounds that could also be transcribed in IPA as “ts” , the choice of the digraph ligature may be the result of a deliberate distinction made by the transcriber regarding the systematic phonetic status of the affricate. The choice of whether to ligate cannot be left to rendering software based on the font available. This ligature also differs in typographical design from the “ts” ligature found in some oldstyle fonts. Arrangement. The IPA Extensions block is arranged in approximate alphabetical order according to the Latin letter that is graphically most similar to each symbol. This order has nothing to do with a phonetic arrangement of the IPA letters.
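That the affricate digraph ligatures are characters in their own right, rather than rendering forms, is reflected in their lack of decomposition mappings. The following Python sketch, which assumes the `unicodedata` module and a hypothetical helper `has_decomposition`, lists the six characters and confirms that normalization does not turn them into two-letter sequences.

```python
import unicodedata

# The six IPA affricate digraph ligatures, U+02A3..U+02A8.
AFFRICATE_DIGRAPHS = [chr(cp) for cp in range(0x02A3, 0x02A9)]


def has_decomposition(ch: str) -> bool:
    """True if the character has a canonical or compatibility decomposition."""
    return unicodedata.decomposition(ch) != ""


for ch in AFFRICATE_DIGRAPHS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), has_decomposition(ch))

# Even compatibility normalization keeps the ligature distinct from "ts".
ts_digraph = "\u02A6"
print(unicodedata.normalize("NFKC", ts_digraph) == "ts")
```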

Phonetic Extensions: U+1D00–U+1DBF Most of the characters in the first of the two adjacent blocks comprising the phonetic extensions are used in the Uralic Phonetic Alphabet (UPA; also called Finno-Ugric Transcription, FUT), a highly specialized system that has been used by Uralicists globally for more than 100 years. Originally, it was chiefly used in Finland, Hungary, Estonia, Germany, Norway, Sweden, and Russia, but it is now known and used worldwide, including in North America and Japan. Uralic linguistic description, which treats the phonetics, phonology, and etymology of Uralic languages, is also used by other branches of linguistics, such as Indo-European, Turkic, and Altaic studies, as well as by other sciences, such as archaeology. A very large body of descriptive texts, grammars, dictionaries, and chrestomathies exists, and continues to be produced, using this system. The UPA makes use of approximately 258 characters, some of which are encoded in the Phonetic Extensions block; others are encoded in the other Latin blocks and in the Greek and Cyrillic blocks. The UPA takes full advantage of combining characters. It is not uncommon to find a base letter with three diacritics above and two below.


Typographic Features of the UPA. Small capitalization in the UPA means voicelessness of a normally voiced sound. Small capitalization is also used to indicate certain consonants that are either voiceless or half-voiced. Superscripting indicates very short schwa vowels or transition vowels, or in general very short sounds. Subscripting indicates co-articulation caused by the preceding or following sound. Rotation (turned letters) indicates reduction; sideways (that is, 90 degrees counterclockwise) rotation is used where turning (180 degrees) might result in an ambiguous representation. UPA phonetic material is generally represented with italic glyphs, so as to separate it from the surrounding text.

Other Phonetic Extensions. The remaining characters in the phonetics extension range U+1D6C..U+1DBF are derived from a wide variety of sources, including many technical orthographies developed by SIL linguists, as well as older historic sources. All attested phonetic characters showing struckthrough tildes, struckthrough bars, and retroflex or palatal hooks attached to the basic letter have been separately encoded here. Although separate combining marks exist in the Unicode Standard for overstruck diacritics and attached retroflex or palatal hooks, earlier encoded IPA letters such as U+0268 latin small letter i with stroke and U+026D latin small letter l with retroflex hook have never been given decomposition mappings in the standard. For consistency, all newly encoded characters are handled analogously to the existing, more common characters of this type and are not given decomposition mappings. Because these characters do not have decompositions, they require special handling in some circumstances. See the discussion of single-script confusables in Unicode Technical Standard #39, “Unicode Security Mechanisms.”

The Phonetic Extensions Supplement block also contains 37 superscript modifier letters. These complement the much more commonly used superscript modifier letters found in the Spacing Modifier Letters block.

U+1D77 latin small letter turned g and U+1D78 modifier letter cyrillic en are used in Caucasian linguistics. U+1D79 latin small letter insular g is used in older Irish phonetic notation. It is to be distinguished from a Gaelic style glyph for U+0067 latin small letter g.

Digraph for th. U+1D7A latin small letter th with strikethrough is a digraphic notation commonly found in some English-language dictionaries, representing the voiceless (inter)dental fricative, as in thin. While this character is clearly a digraph, the obligatory strikethrough across two letters distinguishes it from a “th” digraph per se, and there is no mechanism involving combining marks that can easily be used to represent it. A common alternative glyphic form for U+1D7A uses a horizontal bar to strike through the two letters, instead of a diagonal stroke.
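The absence of decomposition mappings for these letters, and the count of superscript modifier letters in the Phonetic Extensions Supplement, can be checked against the Unicode Character Database. The Python sketch below assumes the `unicodedata` module, which tracks a later Unicode version than 5.0; the helper name `decomposes` is hypothetical.

```python
import unicodedata


def decomposes(ch: str) -> bool:
    """True if the character has any decomposition mapping."""
    return unicodedata.decomposition(ch) != ""


I_WITH_STROKE = "\u0268"          # LATIN SMALL LETTER I WITH STROKE
L_WITH_RETROFLEX_HOOK = "\u026D"  # LATIN SMALL LETTER L WITH RETROFLEX HOOK
TH_WITH_STRIKETHROUGH = "\u1D7A"  # LATIN SMALL LETTER TH WITH STRIKETHROUGH

for ch in (I_WITH_STROKE, L_WITH_RETROFLEX_HOOK, TH_WITH_STRIKETHROUGH):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), decomposes(ch))

# Modifier letters (General_Category Lm) in the supplement, U+1D80..U+1DBF.
supplement_modifiers = [
    chr(cp) for cp in range(0x1D80, 0x1DC0)
    if unicodedata.category(chr(cp)) == "Lm"
]
print(len(supplement_modifiers))
```

Because normalization cannot relate such letters to base-plus-mark sequences, any matching between the two has to be handled by the application, for example through confusable-detection data.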

Latin Extended Additional: U+1E00–U+1EFF The characters in this block constitute a number of precomposed combinations of Latin letters with one or more general diacritical marks. With the exception of U+1E9A latin small letter a with right half ring, each of the characters contained in this block is a canonical decomposable character and may alternatively be represented with a base letter followed by one or more general diacritical mark characters found in the Combining Diacritical Marks block.

Vietnamese Vowel Plus Tone Mark Combinations. A portion of this block (U+1EA0..U+1EF9) comprises vowel letters of the modern Vietnamese alphabet (quốc ngữ) combined with a diacritic mark that denotes the phonemic tone that applies to the syllable.

Latin Extended-C: U+2C60–U+2C7F This small block of additional Latin characters contains orthographic Latin additions for minority languages, a few historic Latin letters, and further extensions for phonetic notations. Uighur. The Latin orthography for the Uighur language was influenced by widespread conventions for extension of the Cyrillic script for representing Central Asian languages. In particular, a number of Latin characters were extended with a Cyrillic-style descender diacritic to create new letters for use with Uighur. Claudian Letters. The Roman emperor Claudius invented three additional letters for use with the Latin script. Those letters saw limited usage during his reign, but were abandoned soon afterward. The half h letter is encoded in this block. The other two letters are encoded in other blocks: U+2132 turned capital f and U+2183 roman numeral reversed one hundred (unified with the Claudian letter reversed c). Claudian letters in inscriptions are uppercase only, but may be transcribed by scholars in lowercase.

Latin Extended-D: U+A720–U+A7FF This block is intended for further encoding of historic letters for the Latin script and other rare phonetic and orthographic extensions to the script. For Unicode 5.0, it contains only two modifier tone letters for use with UPA.

Latin Ligatures: U+FB00–U+FB06 This range in the Alphabetic Presentation Forms block (U+FB00..U+FB4F) contains several common Latin ligatures, which occur in legacy encodings. Whether to use a Latin ligature is a matter of typographical style as well as a result of the orthographical rules of the language. Some languages prohibit ligatures across word boundaries. In these cases, it is preferable for the implementations to use unligated characters in the backing store and provide out-of-band information to the display layer where ligatures may be placed. Some format controls in the Unicode Standard can affect the formation of ligatures. See “Controlling Ligatures” in Section 16.2, Layout Controls.
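Keeping unligated characters in the backing store can be sketched as a small normalization step applied on input. The Python example below is one possible approach under stated assumptions: it uses compatibility normalization from the `unicodedata` module, restricted to the range U+FB00..U+FB06 so that other compatibility characters are left untouched. The function name `unligate` is hypothetical.

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block.
LATIN_LIGATURES = range(0xFB00, 0xFB07)


def unligate(text: str) -> str:
    """Replace Latin ligature characters by their constituent letters."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if ord(ch) in LATIN_LIGATURES else ch
        for ch in text
    )


sample = "\uFB01nd the o\uFB03ce"   # uses the fi and ffi ligature characters
print(unligate(sample))
```

Ligature formation is then left to the display layer, which can apply the font's ligatures where the language and style permit.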


7.2 Greek

Greek: U+0370–U+03FF The Greek script is used for writing the Greek language. The Greek script had a strong influence on the development of the Latin, Cyrillic, and Coptic scripts. The Greek script is written in linear sequence from left to right with the frequent use of nonspacing marks. There are two styles of such use: monotonic, which uses a single mark called tonos, and polytonic, which uses multiple marks. Greek letters come in uppercase and lowercase pairs. Spaces are used to separate words and provide the primary line breaking opportunities. Archaic Greek texts do not use spaces.

Standards. The Unicode encoding of Greek is based on ISO/IEC 8859-7, which is equivalent to the Greek national standard ELOT 928, designed for monotonic Greek. The Unicode Standard encodes Greek characters in the same relative positions as in ISO/IEC 8859-7. A number of variant and archaic characters are taken from the bibliographic standard ISO 5428.

Polytonic Greek. Polytonic Greek, used for ancient Greek (classical and Byzantine) and occasionally for modern Greek, may be encoded using either combining character sequences or precomposed base plus diacritic combinations. For the latter, see the following subsection, “Greek Extended: U+1F00–U+1FFF.”

Nonspacing Marks. Several nonspacing marks commonly used with the Greek script are found in the Combining Diacritical Marks range (see Table 7-1).

Table 7-1. Nonspacing Marks Used with Greek

Code    Name                             Alternative Names
U+0300  combining grave accent           varia
U+0301  combining acute accent           tonos, oxia
U+0304  combining macron
U+0306  combining breve
U+0308  combining diaeresis              dialytika
U+0313  combining comma above            psili, smooth breathing mark
U+0314  combining reversed comma above   dasia, rough breathing mark
U+0342  combining greek perispomeni      circumflex, tilde, inverted breve
U+0343  combining greek koronis          comma above
U+0345  combining greek ypogegrammeni    iota subscript

Because the characters in the Combining Diacritical Marks block are encoded by shape, not by meaning, they are appropriate for use in Greek where applicable. The character U+0344 combining greek dialytika tonos should not be used. The combination of dialytika plus tonos is instead represented by the sequence <U+0308 combining diaeresis, U+0301 combining acute accent>.
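Normalization already enforces the preferred representation of dialytika plus tonos, as the following Python sketch suggests. It assumes the `unicodedata` module; the helper names are hypothetical.

```python
import unicodedata

DIALYTIKA_TONOS = "\u0344"   # COMBINING GREEK DIALYTIKA TONOS (not to be used)
DIAERESIS = "\u0308"         # COMBINING DIAERESIS (dialytika)
ACUTE = "\u0301"             # COMBINING ACUTE ACCENT (tonos)


def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# U+0344 canonically decomposes to the two-mark sequence and never recomposes.
marks = nfd(DIALYTIKA_TONOS)
print([f"U+{ord(ch):04X}" for ch in marks])

# On a base letter, either spelling normalizes to the same precomposed letter.
iota_a = nfc("\u03B9" + DIALYTIKA_TONOS)
iota_b = nfc("\u03B9" + DIAERESIS + ACUTE)
print(iota_a == iota_b, f"U+{ord(iota_a):04X}")

# Monotonic alpha with tonos decomposes to alpha + combining acute accent.
alpha_tonos = nfd("\u03AC")
print([f"U+{ord(ch):04X}" for ch in alpha_tonos])
```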


Multiple nonspacing marks applied to the same baseform character are encoded in inside-out sequence. See the general rules for applying nonspacing marks in Section 2.11, Combining Characters.

The basic Greek accent written in modern Greek is called tonos. It is represented by an acute accent (U+0301). The shape that the acute accent takes over Greek letters is generally steeper than that shown over Latin letters in Western European typographic traditions, and in earlier editions of this standard was mistakenly shown as a vertical line over the vowel. Polytonic Greek has several contrastive accents, and the accent, or tonos, written with an acute accent is referred to as oxia, in contrast to the varia, which is written with a grave accent.

U+0342 combining greek perispomeni may appear as a circumflex, an inverted breve, a tilde, or occasionally a macron. Because of this variation in form, the perispomeni was encoded distinctly from U+0303 combining tilde.

U+0313 combining comma above and U+0343 combining greek koronis both take the form of a raised comma over a baseform letter. U+0343 combining greek koronis was included for compatibility reasons; U+0313 combining comma above is the preferred form for general use. Greek uses guillemets for quotation marks; for Ancient Greek, the quotations tend to follow local publishing practice. Because of the possibility of confusion between smooth breathing marks and curly single quotation marks, the latter are best avoided where possible. When either breathing mark is followed by an acute or grave accent, the pair is rendered side-by-side rather than vertically stacked.

Accents are typically written above their base letter in an all-lowercase or all-uppercase word; they may also be omitted from an all-uppercase word. However, in a titlecase word, accents applied to the first letter are commonly written to the left of that letter.
This is a matter of presentation only—the internal representation is still the base letter followed by the combining marks. It is not the stand-alone version of the accents, which occur before the base letter in the text stream. Iota. The nonspacing mark ypogegrammeni (also known as iota subscript in English) can be applied to the vowels alpha, eta, and omega to represent historic diphthongs. This mark appears as a small iota below the vowel. When applied to a single uppercase vowel, the iota does not appear as a subscript, but is instead normally rendered as a regular lowercase iota to the right of the uppercase vowel. This form of the iota is called prosgegrammeni (also known as iota adscript in English). In completely uppercased words, the iota subscript should be replaced by a capital iota following the vowel. Precomposed characters that contain iota subscript or iota adscript also have special mappings. (See Section 5.18, Case Mappings.) Archaic representations of Greek words, which did not have lowercase or accents, use the Greek capital letter iota following the vowel for these diphthongs. Such archaic representations require special case mapping, which may not be automatically derivable. Variant Letterforms. U+03A5 greek capital letter upsilon has two common forms: one looks essentially like the Latin capital Y, and the other has two symmetric upper branches that curl like rams’ horns, “Y”. The Y-form glyph has been chosen consistently for use in the code charts, both for monotonic and polytonic Greek. For mathematical usage,


the rams’ horn form of the glyph is required to distinguish it from the Latin Y. A third form is also encoded as U+03D2 greek upsilon with hook symbol (see Figure 7-4). The precomposed characters U+03D3 greek upsilon with acute and hook symbol and U+03D4 greek upsilon with diaeresis and hook symbol should not normally be needed, except where necessary for backward compatibility for legacy character sets.
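Several of the encoding points made above (koronis and oxia as compatibility-retained characters, the precomposed upsilon-with-hook characters, and the uppercasing of iota subscript) can be checked with a short Python sketch. It assumes only the standard `unicodedata` module and reports whatever Unicode data ships with the interpreter; the helper name `show` is hypothetical and used only for display.

```python
import unicodedata


def show(label, text):
    """Print a label followed by the code points of text, e.g. 'U+0313'."""
    points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label}: {points}")
    return points


# U+0343 COMBINING GREEK KORONIS is canonically equivalent to U+0313.
koronis_nfc = unicodedata.normalize("NFC", "\u0343")
show("koronis after NFC", koronis_nfc)

# Alpha with oxia (U+1F71) normalizes to alpha with tonos (U+03AC).
oxia_nfc = unicodedata.normalize("NFC", "\u1f71")
show("alpha with oxia after NFC", oxia_nfc)

# U+03D3 decomposes to the hook symbol U+03D2 plus the acute accent U+0301.
hook_nfd = unicodedata.normalize("NFD", "\u03d3")
show("upsilon with acute and hook after NFD", hook_nfd)

# Full uppercasing turns alpha with ypogegrammeni (U+1FB3) into
# capital alpha followed by capital iota.
iota_upper = "\u1fb3".upper()
show("alpha with iota subscript, uppercased", iota_upper)
```

The sketch only inspects character data; how accents and the iota adscript are actually drawn remains a matter for the font and rendering engine.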

Figure 7-4. Variations in Greek Capital Letter Upsilon

Variant forms of several other Greek letters are encoded as separate characters in this block. Often (but not always), they represent different forms taken on by the character when it appears in the final position of a word. Examples include U+03C2 greek small letter final sigma used in a final position and U+03D0 greek beta symbol, which is the form that U+03B2 greek small letter beta would take on in a medial or final position.

Of these variant letterforms, only final sigma should be used in encoding standard Greek text to indicate a final sigma. It is also encoded in ISO/IEC 8859-7 and ISO 5428 for this purpose. Because use of the final sigma is a matter of spelling convention, software should not automatically substitute a final form for a nominal form at the end of a word. However, when performing lowercasing, the final form needs to be generated based on the context. See Section 3.13, Default Case Algorithms.

In contrast, U+03D0 greek beta symbol, U+03D1 greek theta symbol, U+03D2 greek upsilon with hook symbol, U+03D5 greek phi symbol, U+03F0 greek kappa symbol, U+03F1 greek rho symbol, U+03F4 greek capital theta symbol, U+03F5 greek lunate epsilon symbol, and U+03F6 greek reversed lunate epsilon symbol should be used only in mathematical formulas, never in Greek text. If positional or other shape differences are desired for these characters, they should be implemented by a font or rendering engine.

Representative Glyphs for Greek Phi. Starting with The Unicode Standard, Version 3.0, and the concurrent second edition of ISO/IEC 10646-1, the representative glyphs for U+03C6 φ greek small letter phi and U+03D5 ϕ greek phi symbol were swapped compared to earlier versions.
In ordinary Greek text, the character U+03C6 is used exclusively, although this character has considerable glyphic variation, sometimes represented with a glyph more like the representative glyph shown for U+03C6 φ (the “loopy” form) and less often with a glyph more like the representative glyph shown for U+03D5 ϕ (the “straight” form).

For mathematical and technical use, the straight form of the small phi is an important symbol and needs to be consistently distinguishable from the loopy form. The straight-form phi glyph is used as the representative glyph for the symbol phi at U+03D5 to satisfy this distinction.

The representative glyphs were reversed in versions of the Unicode Standard prior to Unicode 3.0. This resulted in the problem that the character explicitly identified as the mathematical symbol did not have the straight form of the character that is the preferred glyph for that use. Furthermore, it made it unnecessarily difficult for general-purpose fonts supporting ordinary Greek text to add support for Greek letters used as mathematical symbols. The difficulty arose because many of those fonts already used the loopy-form glyph for U+03C6, as preferred for Greek body text; to support the phi symbol as well, they would have had to disrupt glyph choices already optimized for Greek text.

When mapping symbol sets or SGML entities to the Unicode Standard, it is important to make sure that codes or entities that require the straight form of the phi symbol be mapped to U+03D5 and not to U+03C6. Mapping to the latter should be reserved for codes or entities that represent the small phi as used in ordinary Greek text. Fonts used primarily for Greek text may use either glyph form for U+03C6, but fonts that also intend to support technical use of the Greek letters should use the loopy form to ensure appropriate contrast with the straight form used for U+03D5.

Greek Letters as Symbols. The use of Greek letters for mathematical variables and operators is well established. Characters from the Greek block may be used for these symbols. For compatibility purposes, a few Greek letters are separately encoded as symbols in other character blocks. Examples include U+00B5 µ micro sign in the Latin-1 Supplement character block and U+2126 Ω ohm sign in the Letterlike Symbols character block. The ohm sign is canonically equivalent to the capital omega, and normalization would remove any distinction. Its use is therefore discouraged in favor of capital omega. The same equivalence does not exist between micro sign and mu, and use of either character as a micro sign is common. For Greek text, only the mu should be used.

Symbols Versus Numbers. The characters stigma, koppa, and sampi are used only as numerals, whereas archaic koppa and digamma are used only as letters.

Compatibility Punctuation. Two specific modern Greek punctuation marks are encoded in the Greek and Coptic block: U+037E “;” greek question mark and U+0387 “·” greek ano teleia. The Greek question mark (or erotimatiko) has the shape of a semicolon, but functions as a question mark in the Greek script. The ano teleia has the shape of a middle dot, but functions as a semicolon in the Greek script. These two compatibility punctuation characters have canonical equivalences to U+003B semicolon and U+00B7 middle dot, respectively; as a result, normalized Greek text will lose any distinctions between the Greek compatibility punctuation characters and the common punctuation marks. Furthermore, ISO/IEC 8859-7 and most vendor code pages for Greek simply make use of semicolon and middle dot for the punctuation in question. Therefore, use of U+037E and U+0387 is not necessary for interoperating with legacy Greek data, and their use is not generally encouraged for representation of Greek punctuation.

Historic Letters. Historic Greek letters have been retained from ISO 5428.

Coptic-Unique Letters. In the Unicode Standard prior to Version 4.1, the Coptic script was regarded primarily as a stylistic variant of the Greek alphabet. The letters unique to Coptic
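The remarks above on final sigma, the symbol variants, and the compatibility punctuation can be explored with a short Python sketch. It assumes a Python 3 interpreter whose `str.lower()` applies the context-sensitive final sigma rule and whose `unicodedata` tables carry the equivalences described; the variable names are illustrative only.

```python
import unicodedata

# Lowercasing generates final sigma (U+03C2) from context at the end of a word.
# The word is capital omicron, delta, omicron, sigma.
word_lower = "\u039f\u0394\u039f\u03a3".lower()
lone_sigma_lower = "\u03a3".lower()  # no preceding letter, so no final form

# Canonical normalization removes the compatibility punctuation and ohm sign.
question_mark_nfc = unicodedata.normalize("NFC", "\u037e")  # Greek question mark
ano_teleia_nfc = unicodedata.normalize("NFC", "\u0387")     # Greek ano teleia
ohm_nfc = unicodedata.normalize("NFC", "\u2126")            # ohm sign

# The micro sign is only compatibility equivalent to mu:
# NFC keeps it, NFKC folds it to U+03BC.
micro_nfc = unicodedata.normalize("NFC", "\u00b5")
micro_nfkc = unicodedata.normalize("NFKC", "\u00b5")

# The phi symbol (U+03D5) likewise survives NFC but folds to U+03C6 under NFKC.
phi_symbol_nfc = unicodedata.normalize("NFC", "\u03d5")
phi_symbol_nfkc = unicodedata.normalize("NFKC", "\u03d5")

results = {
    "lowercased word": word_lower,
    "lowercased lone sigma": lone_sigma_lower,
    "question mark (NFC)": question_mark_nfc,
    "ano teleia (NFC)": ano_teleia_nfc,
    "ohm sign (NFC)": ohm_nfc,
    "micro sign (NFC)": micro_nfc,
    "micro sign (NFKC)": micro_nfkc,
    "phi symbol (NFC)": phi_symbol_nfc,
    "phi symbol (NFKC)": phi_symbol_nfkc,
}
for label, value in results.items():
    print(label, [f"U+{ord(ch):04X}" for ch in value])
```

One consequence worth noting: a process that applies compatibility normalization to mathematical text would erase the distinction between the phi symbol and the letter phi that the standard takes care to preserve.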


Chapter 8

Middle Eastern Scripts


The scripts in this chapter have a common origin in the ancient Phoenician alphabet. They include Hebrew, Arabic, Syriac, and Thaana.

The Hebrew script is used in Israel and for languages of the Diaspora. The Arabic script is used to write many languages throughout the Middle East, North Africa, and certain parts of Asia. The Syriac script is used to write a number of Middle Eastern languages. These three also function as major liturgical scripts, used worldwide by various religious groups. The Thaana script is used to write Dhivehi, the language of the Republic of Maldives, an island nation in the middle of the Indian Ocean. The Middle Eastern scripts are mostly abjads, with small character sets. Words are demarcated by spaces. Except for Thaana, these scripts include a number of distinctive punctuation marks. In addition, the Arabic script includes traditional forms for digits, called “Arabic-Indic digits” in the Unicode Standard. Text in these scripts is written from right to left. Implementations of these scripts must conform to the Unicode Bidirectional Algorithm (see Unicode Standard Annex #9, “The Bidirectional Algorithm”). For more information about writing direction, see Section 2.10, Writing Direction. There are also special security considerations that apply to bidirectional scripts, especially with regard to their use in identifiers. For more information about these issues, see Unicode Technical Report #36, “Unicode Security Considerations.” Arabic and Syriac are cursive scripts even when typeset, unlike Hebrew and Thaana, where letters are unconnected. Most letters in Arabic and Syriac assume different forms depending on their position in a word. Shaping rules for the rendering of text are specified in Section 8.2, Arabic, and Section 8.3, Syriac. Shaping rules are not required for Hebrew because only five letters have position-dependent final forms, and these forms are separately encoded. Historically, Middle Eastern scripts did not write short vowels. Nowadays, short vowels are represented by marks positioned above or below a consonantal letter. 
Vowels and other marks of pronunciation (“vocalization”) are encoded as combining characters, so support for vocalized text necessitates use of composed character sequences. Yiddish, Syriac, and


Thaana are normally written with vocalization; Hebrew and Arabic are usually written unvocalized.
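As a rough illustration of the properties mentioned above, the following Python sketch looks up the bidirectional class of one sample letter from each script and tests whether a string carries vocalization marks. The sample characters and the function name `is_vocalized` are choices made for this example, and treating any nonspacing mark as vocalization is a simplifying assumption, not a rule from the standard.

```python
import unicodedata

# One sample character per script, plus an Arabic-Indic digit.
samples = {
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "Syriac alaph": "\u0710",
    "Thaana haa": "\u0780",
    "Arabic-Indic digit zero": "\u0660",
}

# Bidi_Class values drive the Unicode Bidirectional Algorithm.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}
for name, bidi_class in bidi_classes.items():
    print(f"{name}: {bidi_class}")


def is_vocalized(text):
    """Heuristic: True if text contains any nonspacing mark (category Mn)."""
    return any(unicodedata.category(ch) == "Mn" for ch in text)


# The same Hebrew word with and without points (shin, lamed, vav, final mem).
pointed = "\u05e9\u05b8\u05dc\u05d5\u05b9\u05dd"
unpointed = "\u05e9\u05dc\u05d5\u05dd"
print(is_vocalized(pointed), is_vocalized(unpointed))
```

Right-to-left letters report class `R` or `AL`, and the Arabic-Indic digit reports `AN`; the actual reordering for display is the job of a Bidirectional Algorithm implementation, which this sketch does not attempt.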

8.1 Hebrew

Hebrew: U+0590–U+05FF

The Hebrew script is used for writing the Hebrew language as well as Yiddish, Judezmo (Ladino), and a number of other languages. Vowels and various other marks are written as points, which are applied to consonantal base letters; these marks are usually omitted in Hebrew, except for liturgical texts and other special applications. Five Hebrew letters assume a different graphic form when they occur last in a word.

Directionality. The Hebrew script is written from right to left. Conformant implementations of Hebrew script must use the Unicode Bidirectional Algorithm (see Unicode Standard Annex #9, “The Bidirectional Algorithm”).

Cursive. The Unicode Standard uses the term cursive to refer to writing where the letters of a word are connected. A handwritten form of Hebrew is known as cursive, but its rounded letters are generally unconnected, so the Unicode definition does not apply. Fonts based on cursive Hebrew exist. They are used not only to show examples of Hebrew handwriting, but also for display purposes.

Standards. ISO/IEC 8859-8: Part 8. Latin/Hebrew Alphabet. The Unicode Standard encodes the Hebrew alphabetic characters in the same relative positions as in ISO/IEC 8859-8; however, there are no points or Hebrew punctuation characters in that ISO standard.

Vowels and Other Marks of Pronunciation. These combining marks, generically called points in the context of Hebrew, indicate vowels or other modifications of consonantal letters. General rules for applying combining marks are given in Section 2.11, Combining Characters, and Section 3.11, Canonical Ordering Behavior. Additional Hebrew-specific behavior is described below.

Hebrew points can be separated into four classes: dagesh, shin dot and sin dot, vowels, and other marks of pronunciation.

Dagesh, U+05BC hebrew point dagesh or mapiq, has the form of a dot that appears inside the letter that it affects.
It is not a vowel but rather a diacritic that affects the pronunciation of a consonant. The same base consonant can also have a vowel and/or other diacritics. Dagesh is the only element that goes inside a letter. The dotted Hebrew consonant shin is explicitly encoded as the sequence U+05E9 hebrew letter shin followed by U+05C1 hebrew point shin dot. The shin dot is positioned on the upper-right side of the undotted base letter. Similarly, the dotted consonant sin is explicitly encoded as the sequence U+05E9 hebrew letter shin followed by U+05C2 hebrew point sin dot. The sin dot is positioned on the upper-left side of the base letter.
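The sequences described above can be built and inspected directly. The following Python sketch, which assumes only the standard `unicodedata` module, constructs dotted shin and dotted sin, reads the canonical combining classes of the points involved, and shows how canonical ordering treats a dagesh typed after the shin dot. The constant names are introduced here for readability.

```python
import unicodedata

SHIN = "\u05e9"      # HEBREW LETTER SHIN (undotted base)
DAGESH = "\u05bc"    # HEBREW POINT DAGESH OR MAPIQ
SHIN_DOT = "\u05c1"  # HEBREW POINT SHIN DOT
SIN_DOT = "\u05c2"   # HEBREW POINT SIN DOT

# Dotted shin and dotted sin share the same base letter.
dotted_shin = SHIN + SHIN_DOT
dotted_sin = SHIN + SIN_DOT

# Each point has its own fixed-position canonical combining class.
classes = {
    "dagesh": unicodedata.combining(DAGESH),
    "shin dot": unicodedata.combining(SHIN_DOT),
    "sin dot": unicodedata.combining(SIN_DOT),
}
print(classes)

# Canonical ordering sorts marks by combining class, so a dagesh typed
# after the shin dot is moved in front of it by normalization.
typed = SHIN + SHIN_DOT + DAGESH
normalized = unicodedata.normalize("NFD", typed)
print([f"U+{ord(ch):04X}" for ch in normalized])

# The precomposed presentation form U+FB2A decomposes to the sequence,
# and NFC does not recompose it.
from_presentation = unicodedata.normalize("NFC", "\ufb2a")
print(from_presentation == dotted_shin)
```

Because normalization fixes the relative order of these points, comparisons of pointed Hebrew text are more reliable when both sides are normalized first.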


The two dots are mutually exclusive. The base letter shin can also have a dagesh, a vowel, and other diacritics. The two dots are not used with any other base character. Vowels all appear below the base character that they affect, except for holam, U+05B9 hebrew point holam, which appears above left. The following points represent vowels: U+05B0..U+05B9, U+05BB. The remaining three points are marks of pronunciation: U+05BD hebrew point meteg, U+05BF hebrew point rafe, and U+FB1E hebrew point judeo-spanish varika. Meteg, also known as siluq, goes below the base character; rafe and varika go above it. The varika, used in Judezmo, is a glyphic variant of rafe. Shin and Sin. Separate characters for the dotted letters shin and sin are not included in this block. When it is necessary to distinguish between the two forms, they should be encoded as U+05E9 hebrew letter shin followed by the appropriate dot, either U+05C1 hebrew point shin dot or U+05C2 hebrew point sin dot. (See preceding discussion.) This practice is consistent with Israeli standard encoding. Final (Contextual Variant) Letterforms. Variant forms of five Hebrew letters are encoded as separate characters in this block, as in Hebrew standards including ISO/IEC 8859-8. These variant forms are generally used in place of the nominal letterforms at the end of words. Certain words, however, are spelled with nominal rather than final forms, particularly names and foreign borrowings in Hebrew and some words in Yiddish. Because final form usage is a matter of spelling convention, software should not automatically substitute final forms for nominal forms at the end of words. The positional variants should be coded directly and rendered one-to-one via their own glyphs—that is, without contextual analysis. Yiddish Digraphs. The digraphs are considered to be independent characters in Yiddish. 
The Unicode Standard has included them as separate characters so as to distinguish certain letter combinations in Yiddish text—for example, to distinguish the digraph double vav from an occurrence of a consonantal vav followed by a vocalic vav. The use of digraphs is consistent with standard Yiddish orthography. Other letters of the Yiddish alphabet, such as pasekh alef, can be composed from other characters, although alphabetic presentation forms are also encoded. Punctuation. Most punctuation marks used with the Hebrew script are not given independent codes (that is, they are unified with Latin punctuation) except for the few cases where the mark has a unique form in Hebrew—namely, U+05BE hebrew punctuation maqaf, U+05C0 hebrew punctuation paseq (also known as legarmeh), U+05C3 hebrew punctuation sof pasuq, U+05F3 hebrew punctuation geresh, and U+05F4 hebrew punctuation gershayim. For paired punctuation such as parentheses, the glyphs chosen to represent U+0028 left parenthesis and U+0029 right parenthesis will depend on the direction of the rendered text. See Section 4.7, Bidi Mirrored—Normative, for more information. For additional punctuation to be used with the Hebrew script, see Section 6.2, General Punctuation.
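A small Python sketch can make the points above about Yiddish digraphs, composable letters, and paired punctuation more concrete. It assumes only the standard `unicodedata` module; the constant names are illustrative.

```python
import unicodedata

VAV = "\u05d5"         # HEBREW LETTER VAV
DOUBLE_VAV = "\u05f0"  # HEBREW LIGATURE YIDDISH DOUBLE VAV

# The digraph has no decomposition, so every normalization form keeps it
# distinct from a sequence of two vavs.
forms = ("NFC", "NFD", "NFKC", "NFKD")
digraph_stable = all(
    unicodedata.normalize(form, DOUBLE_VAV) == DOUBLE_VAV for form in forms
)
print(digraph_stable, DOUBLE_VAV == VAV + VAV)

# Pasekh alef can be composed from alef plus patah; the presentation
# form U+FB2E normalizes to that same sequence.
pasekh_alef = "\u05d0\u05b7"
from_presentation = unicodedata.normalize("NFC", "\ufb2e")
print(from_presentation == pasekh_alef)

# Parentheses carry the Bidi_Mirrored property; maqaf (U+05BE) does not.
mirrored = {ch: bool(unicodedata.mirrored(ch)) for ch in "()\u05be"}
print(mirrored)
```

The mirrored property only signals that a renderer should choose the glyph by resolved direction; the stored character remains U+0028 or U+0029 in either case.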


Cantillation Marks. Cantillation marks are used in publishing liturgical texts, including the Bible. There are various historical schools of cantillation marking; the set of marks included in the Unicode Standard follows the Israeli standard SI 1311.2.

Positioning. Marks may combine with vowels and other points, and complex typographic rules dictate how to position these combinations. The vertical placement (meaning above, below, or inside) of points and marks is very well defined. The horizontal placement (meaning left, right, or center) of points is also very well defined. The horizontal placement of marks, by contrast, is not well defined, and convention allows for the different placement of marks relative to their base character.

When points and marks are located below the same base letter, the point always comes first (on the right) and the mark after it (on the left), except for the marks yetiv, U+059A hebrew accent yetiv, and dehi, U+05AD hebrew accent dehi. These two marks come first (on the right) and are followed (on the left) by the point.

These rules are followed when points and marks are located above the same base letter:

• If the point is holam, all cantillation marks precede it (on the right) except pashta, U+0599 hebrew accent pashta.
• Pashta always follows (goes to the left of) points.
• Holam on a sin consonant (shin base + sin dot) follows (goes to the left of) the sin dot. However, the two combining marks are sometimes rendered as a single assimilated dot.
• Shin dot and sin dot are generally represented closer vertically to the base letter than other points and marks that go above it.

Meteg. Meteg, U+05BD hebrew point meteg, frequently co-occurs with vowel points below the consonant. Typically, meteg is placed to the left of the vowel, although in some manuscripts and printed texts it is positioned to the right of the vowel.
The difference in positioning is not known to have any semantic significance; nevertheless, some authors wish to retain the positioning found in source documents. The alternate vowel-meteg ordering can be represented in terms of alternate ordering of characters in encoded representation. However, because of the fixed-position canonical combining classes to which meteg and vowel points are assigned, differences in ordering of such characters are not preserved under normalization. The combining grapheme joiner can be used within a vowel-meteg sequence to preserve an ordering distinction under normalization. For more information, see the description of U+034F combining grapheme joiner in Section 16.2, Layout Controls. For example, to display meteg to the left of (after, for a right-to-left script) the vowel point sheva, U+05B0 hebrew point sheva, the sequence of meteg following sheva can be used: <sheva, meteg>


Because these marks are canonically ordered, this sequence is preserved under normalization. Then, to display meteg to the right of the sheva, the sequence with meteg preceding sheva with an intervening CGJ can be used: <meteg, CGJ, sheva> A further complication arises for combinations of meteg with hataf vowels: U+05B1 hebrew point hataf segol, U+05B2 hebrew point hataf patah, and U+05B3 hebrew point hataf qamats. These vowel points have two side-by-side components. Meteg can be placed to the left or the right of a hataf vowel, but it also is often placed between the two components of the hataf vowel. A three-way positioning distinction is needed for such cases. The combining grapheme joiner can be used to preserve an ordering that places meteg to the right of a hataf vowel, as described for combinations of meteg with non-hataf vowels, such as sheva. Placement of meteg between the components of a hataf vowel can be conceptualized as a ligature of the hataf vowel and a nominally positioned meteg. With this in mind, the ligation-control functionality of U+200D zero width joiner and U+200C zero width nonjoiner can be used as a mechanism to control the visual distinction between a nominally positioned meteg to the left of a hataf vowel versus the medially positioned meteg within the hataf vowel. That is, zero width joiner can be used to request explicitly a medially positioned meteg, and zero width non-joiner can be used to request explicitly a left-positioned meteg. Just as different font implementations may or may not display an “fi” ligature by default, different font implementations may or may not display meteg in a medial position when combined with hataf vowels by default. As a result, authors who want to ensure left-position versus medial-position display of meteg with hataf vowels across all font implementations may use joiner characters to distinguish these cases. 
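The effect of the combining grapheme joiner on normalization, as described above, can be demonstrated with a brief Python sketch. It assumes the standard `unicodedata` module; the base letter bet is an arbitrary choice for the demonstration, and placing the joiner between the hataf vowel and meteg is this example's reading of the ligature analogy rather than a quotation of the standard.

```python
import unicodedata

BET = "\u05d1"          # arbitrary base letter for the demonstration
SHEVA = "\u05b0"        # HEBREW POINT SHEVA
HATAF_PATAH = "\u05b2"  # HEBREW POINT HATAF PATAH
METEG = "\u05bd"        # HEBREW POINT METEG
CGJ = "\u034f"          # COMBINING GRAPHEME JOINER
ZWJ = "\u200d"          # ZERO WIDTH JOINER
ZWNJ = "\u200c"         # ZERO WIDTH NON-JOINER


def nfc(text):
    """Apply Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# Without CGJ, canonical ordering moves sheva ahead of meteg, so the
# typed order <meteg, sheva> is lost.
reordered = nfc(BET + METEG + SHEVA)

# With CGJ between them, the typed order survives normalization.
preserved = nfc(BET + METEG + CGJ + SHEVA)

# Candidate sequences for the three placements of meteg with hataf patah.
left = BET + HATAF_PATAH + ZWNJ + METEG
medial = BET + HATAF_PATAH + ZWJ + METEG
right = BET + METEG + CGJ + HATAF_PATAH

for label, text in [("reordered", reordered), ("preserved", preserved),
                    ("left", left), ("medial", medial), ("right", right)]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

All three hataf patah sequences are unchanged by normalization, which is what makes them usable for interchange; whether a given font honors the joiners when positioning meteg is a separate question.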
Thus the following encoded representations can be used for different positioning of meteg with a hataf vowel, such as hataf patah:

left-positioned meteg: <hataf patah, ZWNJ, meteg>
medially positioned meteg: <hataf patah, ZWJ, meteg>
right-positioned meteg: <meteg, CGJ, hataf patah>

In no case is use of ZWNJ, ZWJ, or CGJ required for representation of meteg. These recommendations are simply provided for interoperability in those instances where authors wish to preserve specific positional information regarding the layout of a meteg in text.

Atnah Hafukh and Qamats Qatan. In some older versions of Biblical text, a distinction is made between the accents U+05A2 hebrew accent atnah hafukh and U+05AA hebrew accent yerah ben yomo. Many editions from the last few centuries do not retain this distinction, using only yerah ben yomo, but some users in recent decades have begun to reintroduce this distinction. Similarly, a number of publishers of Biblical or other religious texts have introduced a typographic distinction for the vowel point qamats corresponding to two different readings. The original letterform used for one reading is referred to as


qamats or qamats gadol; the new letterform for the other reading is qamats qatan. Not all users of Biblical Hebrew use atnah hafukh and qamats qatan. If the distinction between accents atnah hafukh and yerah ben yomo is not made, then only U+05AA hebrew accent yerah ben yomo is used. If the distinction between vowels qamats gadol and qamats qatan is not made, then only U+05B8 hebrew point qamats is used. Implementations that support Hebrew accents and vowel points may not necessarily support the special-usage characters U+05A2 hebrew accent atnah hafukh and U+05C7 hebrew point qamats qatan.

Holam Male and Holam Haser. The vowel point holam represents the vowel phoneme /o/. The consonant letter vav represents the consonant phoneme /w/, but in some words is used to represent a vowel, /o/. When the point holam is used on vav, the combination usually represents the vowel /o/, but in a very small number of cases represents the consonant-vowel combination /wo/. A typographic distinction is made between these two in many versions of Biblical text. In most cases, in which vav + holam together represents the vowel /o/, the point holam is centered above the vav and referred to as holam male. In the less frequent cases, in which the vav represents the consonant /w/, some versions show the point holam positioned above left. This is referred to as holam haser. The character U+05BA hebrew point holam haser for vav is intended for use as holam haser only in those cases where a distinction is needed. When the distinction is made, the character U+05B9 hebrew point holam is used to represent the point holam male on vav. U+05BA hebrew point holam haser for vav is intended for use only on vav; results of combining this character with other base characters are not defined. Not all users distinguish between the two forms of holam, and not all implementations can be assumed to support U+05BA hebrew point holam haser for vav.

Puncta Extraordinaria. In the Hebrew Bible, dots are written in various places above or below the base letters that are distinct from the vowel points and accents. These dots are referred to by scholars as puncta extraordinaria, and there are two kinds. The upper punctum, the more common of the two, has been encoded since Unicode 2.0 as U+05C4 hebrew mark upper dot. The lower punctum is used in only one verse of the Bible, Psalm 27:13, and is encoded as U+05C5 hebrew mark lower dot. The puncta generally differ in appearance from dots that occur above letters used to represent numbers; the number dots should be represented using U+0307 combining dot above and U+0308 combining diaeresis.

Nun Hafukha. The nun hafukha is a special symbol that appears to have been used for scribal annotations, although its exact functions are uncertain. It is used a total of nine times in the Hebrew Bible, although not all versions include it, and there are variations in the exact locations in which it is used. There is also variation in the glyph used: it often has the appearance of a rotated or reversed nun and is very often called inverted nun; it may also appear similar to a half tet or have some other form.

Currency Symbol. The new sheqel sign (U+20AA) is encoded in the Currency Symbols block.
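Because U+05BA is intended for use only on vav, a text-checking tool might want to flag other uses. The following Python sketch shows one possible check; the function name `misplaced_holam_haser` and the rule for finding the base letter (the most recent character that is not a mark) are assumptions made for this example, not behavior defined by the standard.

```python
import unicodedata

VAV = "\u05d5"                  # HEBREW LETTER VAV
HOLAM = "\u05b9"                # HEBREW POINT HOLAM
HOLAM_HASER_FOR_VAV = "\u05ba"  # HEBREW POINT HOLAM HASER FOR VAV


def misplaced_holam_haser(text):
    """Return the indexes where U+05BA is applied to a base other than vav."""
    problems = []
    base = None  # most recent character whose category is not a mark
    for index, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):
            if ch == HOLAM_HASER_FOR_VAV and base != VAV:
                problems.append(index)
        else:
            base = ch
    return problems


# Holam haser on vav is the intended use, even after another point.
ok_simple = misplaced_holam_haser(VAV + HOLAM_HASER_FOR_VAV)
ok_with_dagesh = misplaced_holam_haser(VAV + "\u05bc" + HOLAM_HASER_FOR_VAV)

# On mem (U+05DE) the result is not defined, so the check reports index 1.
flagged = misplaced_holam_haser("\u05de" + HOLAM_HASER_FOR_VAV)

print(ok_simple, ok_with_dagesh, flagged)
```

A check of this kind is advisory only: texts that do not distinguish the two forms of holam will simply use U+05B9 throughout and never trigger it.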


Alphabetic Presentation Forms: U+FB1D–U+FB4F

The Hebrew characters in this block are chiefly of two types: variants of letters and marks encoded in the main Hebrew block, and precomposed combinations of a Hebrew letter or digraph with one or more vowels or pronunciation marks. This block contains all of the vocalized letters of the Yiddish alphabet. The alef lamed ligature and a Hebrew variant of the plus sign are included as well. The Hebrew plus sign variant, U+FB29 hebrew letter alternative plus sign, is used more often in handwriting than in print, but it does occur in school textbooks. It is used by those who wish to avoid cross symbols, which can have religious and historical connotations.

U+FB20 hebrew letter alternative ayin is an alternative form of ayin that may replace the basic form U+05E2 hebrew letter ayin when there is a diacritical mark below it. The basic form of ayin is often designed with a descender, which can interfere with a mark below the letter. U+FB20 is encoded for compatibility with implementations that substitute the alternative form in the character data, as opposed to using a substitute glyph at rendering time.

Use of Wide Letters. Wide letterforms are used in handwriting and in print to achieve even margins. The wide-form letters in the Unicode Standard are those that are most commonly “stretched” in justification. If Hebrew text is to be rendered with even margins, justification should be left to the text-formatting software.

These alphabetic presentation forms are included for compatibility purposes. For the preferred encoding, see the Hebrew presentation forms, U+FB1D..U+FB4F. For letterlike symbols, see U+2135..U+2138.
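The compatibility status of these presentation forms is visible in their decomposition mappings. The following Python sketch, assuming only the standard `unicodedata` module, folds a few of them with compatibility normalization (NFKC) and confirms that canonical normalization (NFC) leaves them untouched. The dictionary keys are informal labels chosen for this example.

```python
import unicodedata

presentation_forms = {
    "alternative ayin": "\ufb20",
    "wide alef": "\ufb21",
    "alternative plus sign": "\ufb29",
    "alef lamed ligature": "\ufb4f",
}

# NFKC replaces each form with its nominal character or sequence.
folded = {
    label: unicodedata.normalize("NFKC", ch)
    for label, ch in presentation_forms.items()
}

# NFC keeps them, because these mappings are compatibility mappings only.
kept_by_nfc = all(
    unicodedata.normalize("NFC", ch) == ch for ch in presentation_forms.values()
)

for label, value in folded.items():
    print(label, [f"U+{ord(ch):04X}" for ch in value])
print(kept_by_nfc)
```

Folding of this kind is suitable for searching and comparison; it discards the distinction a document author may have intended, so it should not be applied silently to stored text.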

8.2 Arabic

Arabic: U+0600–U+06FF

The Arabic script is used for writing the Arabic language and has been extended to represent a number of other languages, such as Persian, Urdu, Pashto, Sindhi, and Kurdish, as well as many African languages. Urdu is often written with the ornate Nastaliq script variety. Some languages, such as Indonesian/Malay, Turkish, and Ingush, formerly used the Arabic script but now employ the Latin or Cyrillic scripts.

The Arabic script is cursive, even in its printed form (see Figure 8-1). As a result, the same letter may be written in different forms depending on how it joins with its neighbors. Vowels and various other marks may be written as combining marks called harakat, which are applied to consonantal base letters. In normal writing, however, these harakat are omitted.

Directionality. The Arabic script is written from right to left. Conformant implementations of Arabic script must use the Unicode Bidirectional Algorithm to reorder the memory representation for display (see Unicode Standard Annex #9, “The Bidirectional Algorithm”).

Figure 8-1. Directionality and Cursive Connection

(Figure 8-1 panels: memory representation, after reordering, after joining.)

Standards. ISO/IEC 8859-6 (Part 6, Latin/Arabic Alphabet). The Unicode Standard encodes the basic Arabic characters in the same relative positions as in ISO/IEC 8859-6. ISO/IEC 8859-6, in turn, is based on ECMA-114, which was based on ASMO 449.

Encoding Principles. The basic set of Arabic letters is well defined. Each letter receives only one Unicode character value in the basic Arabic block, no matter how many different contextual appearances it may exhibit in text. Each Arabic letter in the Unicode Standard may be said to represent the inherent semantic identity of the letter. A word is spelled as a sequence of these letters. The representative glyph shown in the Unicode character chart for an Arabic letter is usually the form of the letter when standing by itself. It is simply used to distinguish and identify the character in the code charts and does not restrict the glyphs used to represent it.

Punctuation. Most punctuation marks used with the Arabic script are not given independent codes (that is, they are unified with Latin punctuation), except for the few cases where the mark has a significantly different appearance in Arabic: namely, U+060C arabic comma, U+061B arabic semicolon, U+061E arabic triple dot punctuation mark, U+061F arabic question mark, and U+066A arabic percent sign. For paired punctuation such as parentheses, the glyphs chosen to represent U+0028 left parenthesis and U+0029 right parenthesis will depend on the direction of the rendered text.

The Non-joiner and the Joiner. The Unicode Standard provides two user-selectable formatting codes: U+200C zero width non-joiner and U+200D zero width joiner (see Figure 8-2, Figure 8-3, and Figure 8-4). The use of a non-joiner between two letters prevents those letters from forming a cursive connection with each other when rendered. Examples include the Persian plural suffix, some Persian proper names, and Ottoman Turkish vowels.
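As an illustration of the non-joiner in the Persian plural suffix, the sketch below builds the word for "books" with U+200C between the noun and the suffix. The helper `persian_plural` is hypothetical and covers only this simple case; the use of Python's `unicodedata` module is an assumption of the example.

```python
import unicodedata

ZWNJ = "\u200C"  # zero width non-joiner
ZWJ = "\u200D"   # zero width joiner


def persian_plural(noun: str) -> str:
    """Append the plural suffix HEH + ALEF, separated by a non-joiner.

    Hypothetical helper: real Persian orthography has more cases than this.
    """
    return noun + ZWNJ + "\u0647\u0627"


# KEHEH, TEH, ALEF, BEH: the noun "book". Without the non-joiner, the final
# BEH (a dual-joining letter) would connect to the following HEH.
word = persian_plural("\u06A9\u062A\u0627\u0628")
print([f"U+{ord(c):04X}" for c in word])

# Both formatting codes are format control characters (General Category Cf).
print(unicodedata.category(ZWNJ), unicodedata.category(ZWJ))
```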
The use of a joiner adjacent to a suitable letter permits that letter to form a cursive connection without a visible neighbor. This provides a simple way to encode some special cases, such as exhibiting a connecting form in isolation. For further discussion of joiners and non-joiners, see Section 16.2, Layout Controls. Harakat (Vowel) Nonspacing Marks. Harakat are marks that indicate vowels or other modifications of consonant letters. The code charts depict a character in the harakat range in relation to a dashed circle, indicating that this character is intended to be applied via some process to the character that precedes it in the text stream (that is, the base character).
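Because each of the harakat follows its base character in the text stream, a process can remove them to obtain the unvocalized spelling used in normal writing. The following is a minimal sketch; the range U+064B..U+0652 (fathatan through sukun) is this example's own choice of which marks count as harakat, and `strip_harakat` is a hypothetical name.

```python
import unicodedata


def strip_harakat(text: str) -> str:
    """Drop nonspacing marks in U+064B..U+0652, keeping the base letters."""
    return "".join(
        ch
        for ch in text
        if not ("\u064B" <= ch <= "\u0652" and unicodedata.category(ch) == "Mn")
    )


# KAF + FATHA, TEH + FATHA, BEH + FATHA: each mark follows its base letter.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
print([f"U+{ord(c):04X}" for c in strip_harakat(vocalized)])
```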

Figure 8-2. Using a Joiner

Figure 8-3. Using a Non-joiner

Figure 8-4. Combinations of Joiners and Non-joiners

(Figures 8-2 through 8-4 each show three panels: memory representation, after reordering, and after joining.)

General rules for applying nonspacing marks are given in Section 7.9, Combining Marks. The few marks that are placed after (to the left of) the base character are treated as ordinary spacing characters in the Unicode Standard. The Unicode Standard does not specify a sequence order in case of multiple harakat applied to the same Arabic base character, as there is no possible ambiguity of interpretation. For more information about the canonical ordering of nonspacing marks, see Section 2.11, Combining Characters, and Section 3.11, Canonical Ordering Behavior.

The placement and rendering of vowel and other marks in Arabic strongly depend on the typographical environment or even the typographical style. For example, in Chapter 17, Code Charts, the default position of U+0651 arabic shadda is with the glyph placed above the base character, whereas for U+064D arabic kasratan the glyph is placed below the base character, as shown in the first example in Figure 8-5. However, computer fonts often follow an approach that originated in metal typesetting and combine the kasratan with shadda in a ligature placed above the text, as shown in the second example in Figure 8-5.

Figure 8-5. Placement of Harakat

Arabic-Indic Digits. The names for the forms of decimal digits vary widely across different languages. The decimal numbering system originated in India (Devanagari ०१२३ …) and was subsequently adopted in the Arabic world with a different appearance (Arabic ٠١٢٣ …). The Europeans adopted decimal numbers from the Arabic world, although once again the forms of the digits changed greatly (European 0123 …). The European forms were later adopted widely around the world and are used even in many Arabic-speaking countries in North Africa. In each case, the interpretation of decimal numbers remained the same. However, the forms of the digits changed to such a degree that they are no longer recognizably the same characters. Because of the origin of these characters, the European decimal numbers are widely known as “Arabic numerals” or “Hindi-Arabic numerals,” whereas the decimal numbers in use in the Arabic world are widely known there as “Hindi numbers.”

The Unicode Standard includes Indic digits (including forms used with different Indic scripts), Arabic digits (with forms used in most of the Arabic world), and European digits (now used internationally). Because of this decision, the traditional names could not be retained without confusion. In addition, there are two main variants of the Arabic digits: those used in Iran, Pakistan, and Afghanistan (here called Eastern Arabic-Indic) and those used in other parts of the Arabic world. In summary, the Unicode Standard uses the names shown in Table 8-1.

Table 8-1. Arabic Digit Names

Name                  Code Points      Forms
European              U+0030..U+0039   0123456789
Arabic-Indic          U+0660..U+0669   ٠١٢٣٤٥٦٧٨٩
Eastern Arabic-Indic  U+06F0..U+06F9   ۰۱۲۳۴۵۶۷۸۹
Indic (Devanagari)    U+0966..U+096F   ०१२३४५६७८९

There is substantial variation in usage of glyphs for the Eastern Arabic-Indic digits, especially for the digits four, five, six, and seven. Table 8-2 illustrates this variation with some example glyphs for digits in languages of Iran, Pakistan, and India. While some usage of the Persian glyph for U+06F7 extended arabic-indic digit seven can be documented for Sindhi, the form shown in Table 8-2 is predominant. The Unicode Standard provides a single, complete sequence of digits for Persian, Sindhi, and Urdu to account for the differences in appearance and directional treatment when rendering them. (For a complete discussion of directional formatting of numbers in the Unicode Standard, see Unicode Standard Annex #9, “The Bidirectional Algorithm.”)
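The digit series of Table 8-1 share the same numeric interpretation while differing in directional treatment. The sketch below, which assumes Python's standard `unicodedata` module, converts any of the series to European digits and prints each series' bidirectional class. The names `DIGIT_SERIES` and `to_european` are hypothetical.

```python
import unicodedata

# First code point of each digit series listed in Table 8-1.
DIGIT_SERIES = {
    "European": 0x0030,
    "Arabic-Indic": 0x0660,
    "Eastern Arabic-Indic": 0x06F0,
    "Indic (Devanagari)": 0x0966,
}


def to_european(text: str) -> str:
    """Replace every decimal digit (General Category Nd) with its ASCII form."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


for name, start in DIGIT_SERIES.items():
    digits = "".join(chr(start + i) for i in range(10))
    # The bidirectional class is what differs between the two Arabic series.
    print(name, to_european(digits), unicodedata.bidirectional(chr(start)))
```

Keeping the original digits in the backing store and converting only for numeric parsing preserves the author's choice of digit series.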

Table 8-2. Glyph Variation in Eastern Arabic-Indic Digits

Code Point  Digit
U+06F4      4
U+06F5      5
U+06F6      6
U+06F7      7

(The Persian, Sindhi, and Urdu columns of this table show example glyphs for each digit; the glyph images are not reproduced here.)

Extended Arabic Letters. Arabic script is used to write major languages, such as Persian and Urdu, but it has also been used to transcribe some lesser-used languages, such as Baluchi and Lahnda, which have little tradition of printed typography. As a result, the Unicode Standard encodes multiple forms of some Extended Arabic letters because the character forms and usages are not well documented for a number of languages. For additional extended Arabic letters, see the Arabic Supplement block, U+0750..U+077F.

Koranic Annotation Signs. These characters are used in the Koran to mark pronunciation and other annotation. The enclosing mark U+06DE is used to enclose a digit. When rendered, the digit appears in a smaller size.

Additional Vowel Marks. When the Arabic script is adopted as the writing system for a language other than Arabic, it is often necessary to represent vowel sounds or distinctions not made in Arabic. In some cases, conventions such as the addition of small dots above and/or below the standard Arabic fatha, damma, and kasra signs have been used. Classical Arabic has only three canonical vowels (/a/, /i/, /u/), whereas languages such as Urdu and Persian include other contrasting vowels such as /o/ and /e/. For this reason, it is imperative that speakers of these languages be able to show the difference between /e/ and /i/ (U+0656 arabic subscript alef), and between /o/ and /u/ (U+0657 arabic inverted damma). At the same time, the use of these two diacritics in Arabic is redundant, merely emphasizing that the underlying vowel is long.

Honorifics. Marks known as honorifics represent phrases expressing the status of a person and are in widespread use in the Arabic-script world. Most have a specifically religious meaning. In effect, these marks are combining characters at the word level, rather than being associated with a single base character. Depending on the letter shapes present in the name and the calligraphic style in use, the honorific mark may be applied to a letter somewhere in the middle of the name. The normalization algorithm does not move such word-level combining characters to the end of the word.

Date Separator. U+060D arabic date separator is used in Pakistan and India between the numeric date and the month name when writing out a date. This sign is distinct from U+002F solidus, which is used, for example, as a separator in currency amounts.

Full Stop. U+061E arabic triple dot punctuation mark is encoded for traditional orthographic practice using the Arabic script to write African languages such as Hausa, Wolof, Fulani, and Mandinka. These languages use arabic triple dot punctuation mark as a full stop.

Currency Symbols. U+060B afghani sign is a currency symbol used in Afghanistan. The symbol is derived from an abbreviation of the name of the currency, which has become a symbol in its own right. U+FDFC rial sign is a currency symbol used in Iran. Unlike the afghani sign, U+FDFC rial sign is considered a compatibility character, encoded for compatibility with Iranian standards. Ordinarily in Persian “rial” is simply spelled out as the sequence of letters, <0631, 06CC, 0627, 0644>.

End of Ayah. U+06DD arabic end of ayah graphically encloses a sequence of zero or more digits (of General Category Nd) that follow it in the data stream. The enclosure terminates with any non-digit. For behavior of a similar prefixed formatting control, see the discussion of U+070F syriac abbreviation mark in Section 8.3, Syriac.

Other Signs Spanning Numbers. Several other special signs are written in association with numbers in the Arabic script. U+0600 arabic number sign signals the beginning of a number; it is written below the digits of the number. U+0601 arabic sign sanah indicates a year (that is, as part of a date). This sign is rendered below the digits of the number it precedes. Its appearance is a vestigial form of the Arabic word for year, /sanatu/ (seen noon teh-marbuta), but it is now a sign in its own right and is widely used to mark a numeric year even in non-Arabic languages where the Arabic word would not be known. The use of the year sign is illustrated in Figure 8-6.

Figure 8-6. Arabic Year Sign

U+0602 arabic footnote marker is another of these signs; it is used in the Arabic script in conjunction with the footnote number itself. It also precedes the digits in logical order and is written to extend underneath them. Finally, U+0603 arabic sign safha functions as a page sign, preceding and extending under a sequence of digits for a page number.

Like U+06DD arabic end of ayah, all of these signs can span multiple-digit numbers, rather than just a single digit. They are not formally considered combining marks in the sense used by the Unicode Standard, although they clearly interact graphically with the sequence of digits that follows them. They precede the sequence of digits that they span, rather than following a base character, as would be the case for a combining mark. Their General Category value is Cf (format control character). Unlike most other format control characters, however, they should be rendered with a visible glyph, even in circumstances where no suitable digit or sequence of digits follows them in logical order.

Poetic Verse Sign. U+060E arabic poetic verse sign is a special symbol often used to mark the beginning of a poetic verse. Although it is similar to U+0602 arabic footnote marker in appearance, the poetic sign is simply a symbol. In contrast, the footnote marker is a format control character that has complex rendering in conjunction with following digits. U+060F arabic sign misra is another symbol used in poetry.
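The way U+06DD and U+0600..U+0603 span the digits that follow them can be sketched as a simple scan: starting after the sign, collect characters of General Category Nd and stop at the first non-digit. The function name `spanned_digits` and the set `SPANNING_SIGNS` are hypothetical, and the sketch says nothing about how a renderer draws the enclosure.

```python
import unicodedata

# Prefixed signs named in the text: U+0600..U+0603 and U+06DD.
SPANNING_SIGNS = {"\u0600", "\u0601", "\u0602", "\u0603", "\u06DD"}


def spanned_digits(text: str, index: int) -> str:
    """Return the run of Nd digits spanned by the sign at text[index]."""
    if text[index] not in SPANNING_SIGNS:
        raise ValueError("text[index] is not a digit-spanning sign")
    end = index + 1
    while end < len(text) and unicodedata.category(text[end]) == "Nd":
        end += 1  # the span terminates at the first non-digit
    return text[index + 1:end]


# END OF AYAH, Arabic-Indic digits 1 2 3, a space, then digit 4.
sample = "\u06DD\u0661\u0662\u0663 \u0664"
print(len(spanned_digits(sample, 0)))
```

A sign followed by no digits yields an empty span, which matches the requirement that the sign still be rendered with a visible glyph on its own.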

Arabic Cursive Joining

Minimum Rendering Requirements. A rendering or display process must convert between the logical order in which characters are placed in the backing store and the visual (or physical) order required by the display device. See Unicode Standard Annex #9, “The Bidirectional Algorithm,” for a description of the conversion between logical and visual orders.

The cursive nature of the Arabic script imposes special requirements on display or rendering processes that are not typically found in Latin script-based systems. At a minimum, a display process must select an appropriate glyph to depict each Arabic letter according to its immediate joining context; furthermore, it must substitute certain ligature glyphs for sequences of Arabic characters. The remainder of this section specifies a minimum set of rules that provide legible Arabic joining and ligature substitution behavior.

Joining Classes. Each Arabic letter must be depicted by one of a number of possible contextual glyph forms. The appropriate form is determined on the basis of its joining class and the joining class of adjacent characters. Each Arabic character falls into one of the classes shown in Table 8-3. (See ArabicShaping.txt in the Unicode Character Database for a complete list.) In this table, right and left refer to visual order. The characters of the right-joining class are exemplified in more detail in Table 8-8, and those of the dual-joining class are shown in Table 8-7. When characters do not join or cause joining (such as dammatan), they are classified as transparent.

Table 8-3. Primary Arabic Joining Classes

Joining Class (Symbol): Members

Right-joining (R): ALEF, DAL, THAL, REH, ZAIN …

Left-joining (L): None

Dual-joining (D): BEH, TEH, THEH, JEEM …

Join-causing (C): ZERO WIDTH JOINER (200D) and TATWEEL (0640). These characters are distinguished from the dual-joining characters in that they do not change shape themselves.

Non-joining (U): ZERO WIDTH NON-JOINER (200C) and all spacing characters, except those explicitly mentioned as being one of the other joining classes, are non-joining. These include HAMZA (0621), HIGH HAMZA (0674), spaces, digits, punctuation, non-Arabic letters, and so on. Also, U+0600 arabic number sign..U+0603 arabic sign safha and U+06DD arabic end of ayah.

Transparent (T): All nonspacing marks (General Category Mn or Me) and most format control characters (General Category Cf) are transparent to cursive joining. These include FATHATAN (064B) and other Arabic harakat, HAMZA BELOW (0655), SUPERSCRIPT ALEF (0670), combining Koranic annotation signs, and nonspacing marks from other scripts. Also U+070F syriac abbreviation mark.
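The classification in Table 8-3 can be sketched as a lookup with a General Category fallback. The table `EXPLICIT_CLASS` below is a hand-picked subset containing only the characters named in Table 8-3; a real implementation would load ArabicShaping.txt instead. Python's `unicodedata` module and the name `joining_class` are assumptions of this example. Letters missing from the subset fall through to non-joining, which is wrong for real text and is accepted here only to keep the sketch short.

```python
import unicodedata

# Subset of Table 8-3; ArabicShaping.txt holds the complete list.
EXPLICIT_CLASS = {
    # Right-joining: ALEF, DAL, THAL, REH, ZAIN
    "\u0627": "R", "\u062F": "R", "\u0630": "R", "\u0631": "R", "\u0632": "R",
    # Dual-joining: BEH, TEH, THEH, JEEM
    "\u0628": "D", "\u062A": "D", "\u062B": "D", "\u062C": "D",
    # Join-causing: ZERO WIDTH JOINER, TATWEEL
    "\u200D": "C", "\u0640": "C",
    # Non-joining exceptions that would otherwise look transparent (Cf)
    "\u200C": "U", "\u0600": "U", "\u0601": "U", "\u0602": "U",
    "\u0603": "U", "\u06DD": "U",
    # Non-joining letters: HAMZA, HIGH HAMZA
    "\u0621": "U", "\u0674": "U",
}


def joining_class(ch: str) -> str:
    """Return R, D, C, U, or T for a single character."""
    if ch in EXPLICIT_CLASS:
        return EXPLICIT_CLASS[ch]
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return "T"  # nonspacing marks and most format controls are transparent
    return "U"      # everything else is treated as non-joining in this sketch


for ch in "\u0627\u0628\u064B\u200C\u0640 ":
    print(f"U+{ord(ch):04X}", joining_class(ch))
```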

Table 8-4 defines derived superclasses of the primary Arabic joining classes; those superclasses are used in the cursive joining rules. In this table, right and left refer to visual order.

Table 8-4. Derived Arabic Joining Classes

Joining Class        Members
Right join-causing   Superset of dual-joining, left-joining, and join-causing
Left join-causing    Superset of dual-joining, right-joining, and join-causing

Joining Rules. The following rules describe the joining behavior of Arabic letters in terms of their display (visual) order. In other words, the positions of letterforms in the included examples are presented as they would appear on the screen after the Bidirectional Algorithm has reordered the characters of a line of text. An implementation may choose to restate the following rules according to logical order so as to apply them before the Bidirectional Algorithm’s reordering phase. In this case, the words right and left as used in this section would become preceding and following. In the following rules, if X refers to a character, then various glyph types representing that character are referred to as shown in Table 8-5.

Table 8-5. Arabic Glyph Types

Glyph Type  Description
Xn          Nominal glyph form as it appears in the code charts
Xr          Right-joining glyph form (both right-joining and dual-joining characters may employ this form)
Xl          Left-joining glyph form (both left-joining and dual-joining characters may employ this form)
Xm          Dual-joining (medial) glyph form that joins on both left and right (only dual-joining characters employ this form)
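The glyph types of Table 8-5 can be related to the joining classes with a short sketch. It restates the joining behavior in logical order, as the text permits, so that right and left become preceding and following. This is not the standard's own rule set, only an illustration built from the definitions in Tables 8-3 through 8-5. The class table is a small hand-picked subset, all names are hypothetical, and ligature substitution is not covered.

```python
import unicodedata

# Small illustrative subset of Table 8-3.
CLASSES = {
    "\u0627": "R", "\u062F": "R", "\u0631": "R",  # ALEF, DAL, REH
    "\u0628": "D", "\u062A": "D", "\u062C": "D",  # BEH, TEH, JEEM
    "\u200D": "C", "\u0640": "C",                 # ZERO WIDTH JOINER, TATWEEL
    "\u200C": "U",                                # ZERO WIDTH NON-JOINER
}


def joining_class(ch):
    if ch in CLASSES:
        return CLASSES[ch]
    return "T" if unicodedata.category(ch) in ("Mn", "Me", "Cf") else "U"


def neighbor_class(text, i, step):
    """Class of the nearest non-transparent character before or after i."""
    j = i + step
    while 0 <= j < len(text):
        cls = joining_class(text[j])
        if cls != "T":  # transparent characters do not affect joining
            return cls
        j += step
    return "U"


def glyph_types(text):
    """Return 'Xn', 'Xr', 'Xl', or 'Xm' for each character, in logical order."""
    result = []
    for i, ch in enumerate(text):
        cls = joining_class(ch)
        # The preceding (visually right) neighbor must be right join-causing.
        joins_prev = cls in ("R", "D") and neighbor_class(text, i, -1) in ("D", "L", "C")
        # The following (visually left) neighbor must be left join-causing.
        joins_next = cls in ("L", "D") and neighbor_class(text, i, +1) in ("D", "R", "C")
        if joins_prev and joins_next:
            result.append("Xm")
        elif joins_prev:
            result.append("Xr")
        elif joins_next:
            result.append("Xl")
        else:
            result.append("Xn")
    return result


print(glyph_types("\u0628\u0627\u0628"))  # BEH, ALEF, BEH
```

Transparent and join-causing characters are reported as Xn in this sketch because they do not change shape themselves.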

R1 Transparent characters do not affect the joining behavior of base (spacing) characters. For example:

Chapter 9

South Asian Scripts-I

The following South Asian scripts are described in this chapter:

Devanagari   Gujarati   Telugu
Bengali      Oriya      Kannada
Gurmukhi     Tamil      Malayalam

The scripts of South Asia share so many common features that a side-by-side comparison of a few will often reveal structural similarities even in the modern letterforms. With minor historical exceptions, they are written from left to right. They are all abugidas in which most symbols stand for a consonant plus an inherent vowel (usually the sound /a/). Word-initial vowels in many of these scripts have distinct symbols, and word-internal vowels are usually written by juxtaposing a vowel sign in the vicinity of the affected consonant. Absence of the inherent vowel, when that occurs, is frequently marked with a special sign. In the Unicode Standard, this sign is denoted by the Sanskrit word virāma. In some languages, another designation is preferred. In Hindi, for example, the word hal refers to the character itself, and halant refers to the consonant that has its inherent vowel suppressed; in Tamil, the word puḷḷi is used. The virama sign nominally serves to suppress the inherent vowel of the consonant to which it is applied; it is a combining character, with its shape varying from script to script.

Most of the scripts of South Asia, from north of the Himalayas to Sri Lanka in the south, from Pakistan in the west to the easternmost islands of Indonesia, are derived from the ancient Brahmi script. The oldest lengthy inscriptions of India, the edicts of Ashoka from the third century BCE, were written in two scripts, Kharoshthi and Brahmi. These are both ultimately of Semitic origin, probably deriving from Aramaic, which was an important administrative language of the Middle East at that time. Kharoshthi, written from right to left, was supplanted by Brahmi and its derivatives. The descendants of Brahmi spread with myriad changes throughout the subcontinent and outlying islands. There are said to be some 200 different scripts deriving from it.

By the eleventh century, the modern script known as Devanagari was in ascendancy in India proper as the major script of Sanskrit literature. The North Indian branch of scripts was, like Brahmi itself, chiefly used to write Indo-European languages such as Pali and Sanskrit, and eventually the Hindi, Bengali, and Gujarati languages, though it was also the source for scripts for non-Indo-European languages such as Tibetan, Mongolian, and Lepcha.

The South Indian scripts are also derived from Brahmi and, therefore, share many structural characteristics. These scripts were first used to write Pali and Sanskrit but were later adapted for use in writing non-Indo-European languages—namely, the languages of the Dravidian family of southern India and Sri Lanka. Because of their use for Dravidian languages, the South Indian scripts developed many characteristics that distinguish them from the North Indian scripts. South Indian scripts were also exported to southeast Asia and were the source of scripts such as Lanna and Myanmar, as well as the insular scripts of the Philippines and Indonesia. The shapes of letters in the South Indian scripts took on a quite distinct look from the shapes of letters in the North Indian scripts. Some scholars suggest that this occurred because writing materials such as palm leaves encouraged changes in the way letters were written. The major official scripts of India proper, including Devanagari, are documented in this chapter. They are all encoded according to a common plan, so that comparable characters are in the same order and relative location. This structural arrangement, which facilitates transliteration to some degree, is based on the Indian national standard (ISCII) encoding for these scripts and makes use of a virama. While the arrangement of the encoding for the scripts of India is based on ISCII, this does not imply that the rendering behavior of South Indian scripts in particular is the same as that of Devanagari or other North Indian scripts. Implementations should ensure that adequate attention is given to the actual behavior of those scripts; they should not assume that they work just as Devanagari does. Each block description in this chapter describes the most important aspects of rendering for a particular script as well as unique behaviors it may have. 
Many of the character names in this group of scripts represent the same sounds, and common naming conventions are used for the scripts of India.
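The common encoding plan and naming conventions can be observed by moving a character to the same relative position in another script's block. In the sketch below, the Devanagari block start (U+0900) comes from this chapter; the other block starts are taken from the Unicode code charts and should be read as assumptions of the example, as should the helper name `analogous`. The target position is not guaranteed to be assigned in every script.

```python
import unicodedata

# Starting code point of each script block (assumed from the code charts).
BLOCK_START = {
    "DEVANAGARI": 0x0900, "BENGALI": 0x0980, "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80, "ORIYA": 0x0B00, "TAMIL": 0x0B80,
    "TELUGU": 0x0C00, "KANNADA": 0x0C80, "MALAYALAM": 0x0D00,
}


def analogous(ch: str, target: str) -> str:
    """Map a Devanagari character to the same relative position in another block.

    The result may be an unassigned code point; callers should check.
    """
    offset = ord(ch) - BLOCK_START["DEVANAGARI"]
    if not 0 <= offset < 0x80:
        raise ValueError("character is not in the Devanagari block")
    return chr(BLOCK_START[target] + offset)


# U+0915 devanagari letter ka and its counterparts in the other scripts.
for script in BLOCK_START:
    counterpart = analogous("\u0915", script)
    print(f"U+{ord(counterpart):04X}", unicodedata.name(counterpart, "<unassigned>"))
```

Such a positional mapping is at best a rough aid to transliteration, because the scripts do not all assign a character at every position.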

9.1 Devanagari

Devanagari: U+0900–U+097F

The Devanagari script is used for writing classical Sanskrit and its modern historical derivative, Hindi. Extensions to the Sanskrit repertoire are used to write other related languages of India (such as Marathi) and of Nepal (Nepali). In addition, the Devanagari script is used to write the following languages: Awadhi, Bagheli, Bhatneri, Bhili, Bihari, Braj Bhasha, Chhattisgarhi, Garhwali, Gondi (Betul, Chhindwara, and Mandla dialects), Harauti, Ho, Jaipuri, Kachchhi, Kanauji, Konkani, Kului, Kumaoni, Kurku, Kurukh, Marwari, Mundari, Newari, Palpa, and Santali.

All other Indic scripts, as well as the Sinhala script of Sri Lanka, the Tibetan script, and the Southeast Asian scripts, are historically connected with the Devanagari script as descendants of the ancient Brahmi script. The entire family of scripts shares a large number of structural features.

The principles of the Indic scripts are covered in some detail in this introduction to the Devanagari script. The remaining introductions to the Indic scripts are abbreviated but highlight any differences from Devanagari where appropriate.

Standards. The Devanagari block of the Unicode Standard is based on ISCII-1988 (Indian Script Code for Information Interchange). The ISCII standard of 1988 differs from and is an update of earlier ISCII standards issued in 1983 and 1986. The Unicode Standard encodes Devanagari characters in the same relative positions as those coded in positions A0–F4 (hexadecimal) in the ISCII-1988 standard. The same character code layout is followed for eight other Indic scripts in the Unicode Standard: Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam. This parallel code layout emphasizes the structural similarities of the Brahmi scripts and follows the stated intention of the Indian coding standards to enable one-to-one mappings between analogous coding positions in different scripts in the family. Sinhala, Tibetan, Thai, Lao, Khmer, Myanmar, and other scripts depart to a greater extent from the Devanagari structural pattern, so the Unicode Standard does not attempt to provide any direct mappings for these scripts to the Devanagari order.

In November 1991, at the time The Unicode Standard, Version 1.0, was published, the Bureau of Indian Standards published a new version of ISCII in Indian Standard (IS) 13194:1991. This new version partially modified the layout and repertoire of the ISCII-1988 standard. Because of these events, the Unicode Standard does not precisely follow the layout of the current version of ISCII. Nevertheless, the Unicode Standard remains a superset of the ISCII-1991 repertoire except for a number of new Vedic extension characters defined in IS 13194:1991 Annex G, Extended Character Set for Vedic.
Modern, non-Vedic texts encoded with ISCII-1991 may be automatically converted to Unicode code points and back to their original encoding without loss of information.

Encoding Principles. The writing systems that employ Devanagari and other Indic scripts constitute abugidas, a cross between syllabic writing systems and alphabetic writing systems. The effective unit of these writing systems is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of (((C)C)C)V. The orthographic syllable need not correspond exactly with a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation.

The orthographic syllable is built up of alphabetic pieces, the actual letters of the Devanagari script. These pieces consist of three distinct character types: consonant letters, independent vowels, and dependent vowel signs. In a text sequence, these characters are stored in logical (phonetic) order.
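The three character types and their logical storage order can be made concrete with a small classifier. The code point ranges below (U+0905..U+0914, U+0915..U+0939, and U+093E..U+094C) are taken from the Devanagari code chart and are assumptions of this sketch; they ignore additional letters and signs encoded elsewhere in the block. The names `piece_type` and `describe` are hypothetical.

```python
import unicodedata


def piece_type(ch: str) -> str:
    """Classify a Devanagari character by (assumed) code point range."""
    cp = ord(ch)
    if 0x0905 <= cp <= 0x0914:
        return "independent vowel"
    if 0x0915 <= cp <= 0x0939:
        return "consonant letter"
    if 0x093E <= cp <= 0x094C:
        return "dependent vowel sign"
    return "other"


def describe(text: str):
    """List (code point, name, type) for each character in logical order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "?"), piece_type(ch))
        for ch in text
    ]


# KA followed by VOWEL SIGN I: the consonant is stored first, and the
# dependent vowel sign follows it in the text sequence.
for row in describe("\u0915\u093F"):
    print(row)
```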

Principles of the Devanagari Script

Rendering Devanagari Characters. Devanagari characters, like characters from many other scripts, can combine or change shape depending on their context. A character’s appearance is affected by its ordering with respect to other characters, the font used to render the character, and the application or system environment. These variables can cause the appearance of Devanagari characters to differ from their nominal glyphs (used in the code charts). Additionally, a few Devanagari characters cause a change in the order of the displayed characters. This reordering is not commonly seen in non-Indic scripts and occurs independently of any bidirectional character reordering that might be required.

Consonant Letters. Each consonant letter represents a single consonantal sound but also has the peculiarity of having an inherent vowel, generally the short vowel /a/ in Devanagari and the other Indic scripts. Thus U+0915 devanagari letter ka represents not just /k/ but also /ka/. In the presence of a dependent vowel, however, the inherent vowel associated with a consonant letter is overridden by the dependent vowel.

Consonant letters may also be rendered as half-forms, which are presentation forms used to depict the initial consonant in consonant clusters. These half-forms do not have an inherent vowel. Their rendered forms in Devanagari often resemble the full consonant but are missing the vertical stem, which marks a syllabic core. (The stem glyph is graphically and historically related to the sign denoting the inherent /a/ vowel.)

Some Devanagari consonant letters have alternative presentation forms whose choice depends on neighboring consonants. This variability is especially notable for U+0930 devanagari letter ra, which has numerous different forms, both as the initial element and as the final element of a consonant cluster. Only the nominal forms, rather than the contextual alternatives, are depicted in the code chart.
The traditional Sanskrit/Devanagari alphabetic encoding order for consonants follows articulatory phonetic principles, starting with velar consonants and moving forward to bilabial consonants, followed by liquids and then fricatives. ISCII and the Unicode Standard both observe this traditional order. Independent Vowel Letters. The independent vowels in Devanagari are letters that stand on their own. The writing system treats independent vowels as orthographic CV syllables in which the consonant is null. The independent vowel letters are used to write syllables that start with a vowel. Dependent Vowel Signs (Matras). The dependent vowels serve as the common manner of writing noninherent vowels and are generally referred to as vowel signs, or as matras in Sanskrit. The dependent vowels do not stand alone; rather, they are visibly depicted in combination with a base letterform. A single consonant or a consonant cluster may have a dependent vowel applied to it to indicate the vowel quality of the syllable, when it is different from the inherent vowel. Explicit appearance of a dependent vowel in a syllable overrides the inherent vowel of a single consonant letter. The greatest variation among different Indic scripts is found in the way that the dependent vowels are applied to base letterforms. Devanagari has a collection of nonspacing dependent vowel signs that may appear above or below a consonant letter, as well as spacing dependent vowel signs that may occur to the right or to the left of a consonant letter or


consonant cluster. Other Indic scripts generally have one or more of these forms, but what is a nonspacing mark in one script may be a spacing mark in another. Also, some of the Indic scripts have single dependent vowels that are indicated by two or more glyph components—and those glyph components may surround a consonant letter both to the left and to the right or may occur both above and below it. The Devanagari script has only one character denoting a left-side dependent vowel sign: U+093F devanagari vowel sign i. Other Indic scripts either have no such vowel signs (Telugu and Kannada) or include as many as three of these signs (Bengali, Tamil, and Malayalam). Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 9-1 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

Table 9-1. Devanagari Vowel Letters

To Represent    Use     Do Not Use
ऄ               0904    <0905, 0946>
आ               0906    <0905, 093E>
ऊ               090A    <0909, 0941>
ऍ               090D    <090F, 0945>
ऎ               090E    <090F, 0946>
ऐ               0910    <090F, 0947>
ऑ               0911    <0905, 0949>
ऒ               0912    <0905, 094A>
ओ               0913    <0905, 094B>
औ               0914    <0905, 094C>
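As an illustration only, the "Use" and "Do Not Use" columns of Table 9-1 can be turned into a small checker. The following is a minimal Python sketch; the names `DO_NOT_USE`, `find_analyzed_vowels`, and `use_atomic_vowels` are hypothetical helpers, not part of the standard, and the mapping is copied directly from the table.

```python
# Hypothetical helper built from Table 9-1: each discouraged two-code-point
# sequence maps to the single code point that should be used instead.
DO_NOT_USE = {
    "\u0905\u0946": "\u0904",
    "\u0905\u093E": "\u0906",
    "\u0909\u0941": "\u090A",
    "\u090F\u0945": "\u090D",
    "\u090F\u0946": "\u090E",
    "\u090F\u0947": "\u0910",
    "\u0905\u0949": "\u0911",
    "\u0905\u094A": "\u0912",
    "\u0905\u094B": "\u0913",
    "\u0905\u094C": "\u0914",
}


def find_analyzed_vowels(text):
    """Return (index, sequence, preferred) for each discouraged sequence."""
    hits = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]
        if pair in DO_NOT_USE:
            hits.append((i, pair, DO_NOT_USE[pair]))
    return hits


def use_atomic_vowels(text):
    """Replace every discouraged sequence with its atomic vowel letter."""
    for sequence, atomic in DO_NOT_USE.items():
        text = text.replace(sequence, atomic)
    return text
```

A caller might run `find_analyzed_vowels` to report problems and `use_atomic_vowels` to repair them. Whether silent repair is appropriate depends on the application; the table itself only states which representation should be used.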

Virama (Halant). Devanagari employs a sign known in Sanskrit as the virama or vowel omission sign. In Hindi, it is called hal or halant, and that term is used in referring to the virama or to a consonant with its vowel suppressed by the virama. The terms are used interchangeably in this section. The virama sign, U+094D devanagari sign virama, nominally serves to cancel (or kill) the inherent vowel of the consonant to which it is applied. When a consonant has lost its inherent vowel by the application of virama, it is known as a dead consonant; in contrast, a live consonant is one that retains its inherent vowel or is written with an explicit dependent vowel sign. In the Unicode Standard, a dead consonant is defined as a sequence consisting


of a consonant letter followed by a virama. The default rendering for a dead consonant is to position the virama as a combining mark bound to the consonant letterform. For example, if Cn denotes the nominal form of consonant C, and Cd denotes the dead consonant form, then a dead consonant is encoded as shown in Figure 9-1.

Figure 9-1. Dead Consonants in Devanagari

TAn + VIRAMAn → TAd

Consonant Conjuncts. The Indic scripts are noted for a large number of consonant conjunct forms that serve as orthographic abbreviations (ligatures) of two or more adjacent letterforms. This abbreviation takes place only in the context of a consonant cluster. An orthographic consonant cluster is defined as a sequence of characters that represents one or more dead consonants (denoted Cd) followed by a normal, live consonant letter (denoted Cl). Under normal circumstances, a consonant cluster is depicted with a conjunct glyph if such a glyph is available in the current font. In the absence of a conjunct glyph, the one or more dead consonants that form part of the cluster are depicted using half-form glyphs. In the absence of half-form glyphs, the dead consonants are depicted using the nominal consonant forms combined with visible virama signs (see Figure 9-2).

Figure 9-2. Conjunct Formations in Devanagari

(1) GAd + DHAl → GAh + DHAn
(2) KAd + KAl → K.KAn
(3) KAd + SSAl → K.SSAn
(4) RAd + KAl → KAl + RAsup

A number of types of conjunct formations appear in these examples: (1) a half-form of GA in its combination with the full form of DHA; (2) a vertical conjunct K.KA; and (3) a fully ligated conjunct K.SSA, in which the components are no longer distinct. In example (4) in Figure 9-2, the dead consonant RAd is depicted with the nonspacing combining mark RAsup (repha). A well-designed Indic script font may contain hundreds of conjunct glyphs, but they are not encoded as Unicode characters because they are the result of ligation of distinct letters.
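The definition of an orthographic consonant cluster given above (one or more dead consonants followed by a live consonant) can be sketched as a regular expression. This is a minimal Python illustration under stated assumptions: the consonant range U+0915..U+0939 is a simplification, an optional nukta is allowed after each consonant, and the name `consonant_clusters` is hypothetical. Sequences containing ZWJ or ZWNJ are deliberately not matched.

```python
import re

# Assumption: U+0915..U+0939 stands in for "consonant letter"; an optional
# nukta (U+093C) may follow each consonant.
CONSONANT = "[\u0915-\u0939]\u093C?"
VIRAMA = "\u094D"

# One or more dead consonants (consonant + virama) followed by a live
# consonant, that is, a consonant that is not itself followed by a virama.
CLUSTER = re.compile(f"(?:{CONSONANT}{VIRAMA})+{CONSONANT}(?!{VIRAMA})")


def consonant_clusters(text):
    """Return every substring that forms an orthographic consonant cluster."""
    return [match.group(0) for match in CLUSTER.finditer(text)]
```

A renderer would then look up each returned cluster in the font to decide between a conjunct glyph, half-forms, or visible viramas.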


Indic script rendering software must be able to map appropriate combinations of characters in context to the appropriate conjunct glyphs in fonts. Explicit Virama (Halant). Normally a virama character serves to create dead consonants that are, in turn, combined with subsequent consonants to form conjuncts. This behavior usually results in a virama sign not being depicted visually. Occasionally, this default behavior is not desired when a dead consonant should be excluded from conjunct formation, in which case the virama sign is visibly rendered. To accomplish this goal, the Unicode Standard adopts the convention of placing the character U+200C zero width non-joiner immediately after the encoded dead consonant that is to be excluded from conjunct formation. In this case, the virama sign is always depicted as appropriate for the consonant to which it is attached. For example, in Figure 9-3, the use of zero width non-joiner prevents the default formation of the conjunct form (K.SSAn).

Figure 9-3. Preventing Conjunct Forms in Devanagari

KAd + ZWNJ + SSAl → KAd + SSAn

Explicit Half-Consonants. When a dead consonant participates in forming a conjunct, the dead consonant form is often absorbed into the conjunct form, such that it is no longer distinctly visible. In other contexts, the dead consonant may remain visible as a half-consonant form. In general, a half-consonant form is distinguished from the nominal consonant form by the loss of its inherent vowel stem, a vertical stem appearing to the right side of the consonant form. In other cases, the vertical stem remains but some part of its right-side geometry is missing. In certain cases, it is desirable to prevent a dead consonant from assuming full conjunct formation yet still not appear with an explicit virama. In these cases, the half-form of the consonant is used. To explicitly encode a half-consonant form, the Unicode Standard adopts the convention of placing the character U+200D zero width joiner immediately after the encoded dead consonant. The zero width joiner denotes a nonvisible letter that presents linking or cursive joining behavior on either side (that is, to the previous or following letter). Therefore, in the present context, the zero width joiner may be considered to present a context to which a preceding dead consonant may join so as to create the half-form of the consonant. For example, if Ch denotes the half-form glyph of consonant C, then a half-consonant form is represented as shown in Figure 9-4. In the absence of the zero width joiner, the sequence in Figure 9-4 would normally produce the full conjunct form (K.SSAn).

Figure 9-4. Half-Consonants in Devanagari

KAd + ZWJ + SSAl → KAh + SSAn

This encoding of half-consonant forms also applies in the absence of a base letterform. That is, this technique may be used to encode independent half-forms, as shown in Figure 9-5.

Figure 9-5. Independent Half-Forms in Devanagari

GAd + ZWJ → GAh

Other Indic scripts have similar half-forms for the initial consonants of a conjunct. Some, such as Oriya, also have similar half-forms for the final consonants; those are represented as shown in Figure 9-6.

Figure 9-6. Half-Consonants in Oriya

KAn + ZWJ + VIRAMA + TAl → KAl + TAh

In the absence of the zero width joiner, the sequence in Figure 9-6 would normally produce the full conjunct form (K.TAn).

Consonant Forms. In summary, each consonant may be encoded such that it denotes a live consonant, a dead consonant that may be absorbed into a conjunct, the half-form of a dead consonant, or a dead consonant with an overt halant that does not get absorbed into a conjunct (see Figure 9-7).

Figure 9-7. Consonant Forms in Devanagari and Oriya

As the rendering of conjuncts and half-forms depends on the availability of glyphs in the font, the following fallback strategy should be employed:

• If the coded character sequence would normally render with a full conjunct, but such a conjunct is not available, the fallback rendering is to use half-forms. If those are not available, the fallback rendering should use an explicit (visible) virama.

• If the coded character sequence would normally render with a half-form (it contains a ZWJ), but half-forms are not available, the fallback rendering should use an explicit (visible) virama.
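The four consonant encodings and the fallback strategy above can be sketched in code. The following Python is illustrative only: `consonant_forms` and `fallback_rendering` are hypothetical names, the consonant range U+0915..U+0939 is an assumption, and the Oriya pattern of Figure 9-6 (ZWJ before the virama) is not handled. The fallback function also assumes that a dead consonant is followed by another consonant.

```python
VIRAMA, NUKTA = "\u094D", "\u093C"
ZWNJ, ZWJ = "\u200C", "\u200D"


def is_consonant(ch):
    # Assumption: U+0915..U+0939 is treated as the set of consonant letters.
    return "\u0915" <= ch <= "\u0939"


def consonant_forms(text):
    """Label each consonant as live, dead, half-form, or explicit-virama."""
    forms = []
    for i, ch in enumerate(text):
        if not is_consonant(ch):
            continue
        j = i + 1
        if text[j:j + 1] == NUKTA:  # an optional nukta sits before the virama
            j += 1
        if text[j:j + 1] != VIRAMA:
            forms.append((ch, "live"))
        elif text[j + 1:j + 2] == ZWNJ:
            forms.append((ch, "explicit-virama"))
        elif text[j + 1:j + 2] == ZWJ:
            forms.append((ch, "half-form"))
        else:
            forms.append((ch, "dead"))
    return forms


def fallback_rendering(form, has_conjunct, has_half_form):
    """Pick a rendering for one consonant given what the font provides."""
    if form == "live":
        return "nominal"
    if form == "dead" and has_conjunct:
        return "conjunct"
    if form in ("dead", "half-form") and has_half_form:
        return "half-form"
    return "visible-virama"
```

For example, a dead consonant in a font with neither a conjunct nor a half-form falls through to `"visible-virama"`, which matches the last step of each bullet above.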

Rendering Devanagari

Rules for Rendering. This section provides more formal and detailed rules for minimal rendering of Devanagari as part of a plain text sequence. It describes the mapping between Unicode characters and the glyphs in a Devanagari font. It also describes the combining and ordering of those glyphs. These rules provide minimal requirements for legibly rendering interchanged Devanagari text. As with any script, a more complex procedure can add rendering characteristics, depending on the font and application. In a font that is capable of rendering Devanagari, the number of glyphs is greater than the number of Devanagari characters.

Notation. In the next set of rules, the following notation applies:

Cn        Nominal glyph form of consonant C as it appears in the code charts.

Cl        A live consonant, depicted identically to Cn.

Cd        Glyph depicting the dead consonant form of consonant C.

Ch        Glyph depicting the half-consonant form of consonant C.

Ln        Nominal glyph form of a conjunct ligature consisting of two or more component consonants. A conjunct ligature composed of two consonants X and Y is also denoted X.Yn.

RAsup     A nonspacing combining mark glyph form of U+0930 devanagari letter ra positioned above or attached to the upper part of a base glyph form. This form is also known as repha.

RAsub     A nonspacing combining mark glyph form of U+0930 devanagari letter ra positioned below or attached to the lower part of a base glyph form.

Vvs       Glyph depicting the dependent vowel sign form of a vowel V.

VIRAMAn   The nominal glyph form of the nonspacing combining mark depicting U+094D devanagari sign virama.

A virama character is not always depicted. When it is depicted, it adopts this nonspacing mark form.

Dead Consonant Rule. The following rule logically precedes the application of any other rule to form a dead consonant. Once formed, a dead consonant may be subject to other rules described next.

R1 When a consonant Cn precedes a VIRAMAn, it is considered to be a dead consonant Cd. A consonant Cn that does not precede VIRAMAn is considered to be a live consonant Cl.

TAn + VIRAMAn → TAd

Consonant RA Rules. The character U+0930 devanagari letter ra takes one of a number of visual forms depending on its context in a consonant cluster. By default, this letter is depicted with its nominal glyph form (as shown in the code charts). In some contexts, it is depicted using one of two nonspacing glyph forms that combine with a base letterform. R2 If the dead consonant RAd precedes a consonant, then it is replaced by the superscript nonspacing mark RAsup , which is positioned so that it applies to the logically subsequent element in the memory representation.

RAd + KAl → KAl + RAsup

RAd¹ + RAd² → RAd² + RAsup¹

R3 If the superscript mark RAsup is to be applied to a dead consonant and that dead consonant is combined with another consonant to form a conjunct ligature, then the mark is positioned so that it applies to the conjunct ligature form as a whole.

RAd + JAd + NYAl → J.NYAn + RAsup

R4 If the superscript mark RAsup is to be applied to a dead consonant that is subsequently replaced by its half-consonant form, then the mark is positioned so that it applies to the form that serves as the base of the consonant cluster.

RAd + GAd + GHAl → GAh + GHAl + RAsup

R5 In conformance with the ISCII standard, the half-consonant form RRAh is represented as eyelash-RA. This form of RA is commonly used in writing Marathi and Newari.

RRAn + VIRAMAn → RRAh

R5a For compatibility with The Unicode Standard, Version 2.0, if the dead consonant RAd precedes zero width joiner, then the half-consonant form RAh , depicted as eyelash-RA, is used instead of RAsup .

RAd + ZWJ → RAh

R6 Except for the dead consonant RAd , when a dead consonant Cd precedes the live consonant RAl , then Cd is replaced with its nominal form Cn , and RA is replaced by the subscript nonspacing mark RAsub , which is positioned so that it applies to Cn .

TTHAd + RAl → TTHAn + RAsub

R7 For certain consonants, the mark RAsub may graphically combine with the consonant to form a conjunct ligature form. These combinations, such as the one shown here, are further addressed by the ligature rules described shortly.

PHAd + RAl → PHAn + RAsub

R8 If a dead consonant (other than RAd ) precedes RAd , then the substitution of RA for RAsub is performed as described above; however, the VIRAMA that formed RAd remains so as to form a dead consonant conjunct form.

TAd + RAd → TAn + RAsub + VIRAMAn → T.RAd

A dead consonant conjunct form that contains an absorbed RAd may subsequently combine to form a multipart conjunct form.

T.RAd + YAl → T.R.YAn

Modifier Mark Rules. In addition to vowel signs, three other types of combining marks may be applied to a component of an orthographic syllable or to the syllable as a whole: nukta, bindus, and svaras. R9 The nukta sign, which modifies a consonant form, is placed immediately after the consonant in the memory representation and is attached to that consonant in rendering. If the consonant represents a dead consonant, then NUKTA should precede VIRAMA in the memory representation.

KAn + NUKTAn + VIRAMAn → QAd
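Rule R9 fixes the storage order of nukta and virama. A minimal Python sketch of a check and repair follows; the function names are hypothetical and the repair is a plain string replacement, not a general normalizer.

```python
import unicodedata

NUKTA, VIRAMA = "\u093C", "\u094D"


def nukta_misplaced(text):
    """True if a nukta is stored after a virama, contrary to rule R9."""
    return VIRAMA + NUKTA in text


def fix_nukta_order(text):
    """Swap each virama + nukta pair so that the nukta comes first."""
    return text.replace(VIRAMA + NUKTA, NUKTA + VIRAMA)


def combining_classes():
    """Canonical combining classes of nukta and virama, for comparison."""
    return unicodedata.combining(NUKTA), unicodedata.combining(VIRAMA)
```

In Python's `unicodedata`, the nukta has a lower canonical combining class than the virama, so canonical reordering during normalization also places the nukta first. The explicit replacement is shown because it changes nothing else in the text.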

R10 Other modifying marks, in particular bindus and svaras, apply to the orthographic syllable as a whole and should follow (in the memory representation) all other characters that constitute the syllable. The bindus should follow any vowel signs, and the svaras should come last. The relative placement of these marks is horizontal rather than vertical; the horizontal rendering order may vary according to typographic concerns.

KAn + AAvs + CANDRABINDUn
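The storage order required by rule R10 (vowel signs, then bindus, then svaras) can be checked with a simple ranking. This Python sketch rests on assumptions that are not stated in the rule itself: vowel signs are taken to be U+093E..U+094C, bindus are U+0901 and U+0902, and the marks U+0951..U+0954 are treated as the svaras. The names `mark_rank` and `marks_in_r10_order` are hypothetical.

```python
def mark_rank(ch):
    """Rank a mark by the position rule R10 assigns it; 0 means unranked."""
    cp = ord(ch)
    if 0x093E <= cp <= 0x094C:   # dependent vowel signs (assumed range)
        return 1
    if cp in (0x0901, 0x0902):   # bindus: candrabindu and anusvara
        return 2
    if 0x0951 <= cp <= 0x0954:   # svaras (assumption: this whole range)
        return 3
    return 0


def marks_in_r10_order(syllable):
    """True if vowel signs precede bindus and bindus precede svaras."""
    ranks = [rank for rank in map(mark_rank, syllable) if rank]
    return ranks == sorted(ranks)
```

The check works on one orthographic syllable at a time; splitting running text into syllables is outside the scope of this sketch.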

Ligature Rules. Subsequent to the application of the rules just described, a set of rules governing ligature formation apply. The precise application of these rules depends on the availability of glyphs in the current font being used to display the text. R11 If a dead consonant immediately precedes another dead consonant or a live consonant, then the first dead consonant may join the subsequent element to form a two-part conjunct ligature form.

JAd + NYAl → J.NYAn

TTAd + TTHAl → TT.TTHAn

R12 A conjunct ligature form can itself behave as a dead consonant and enter into further, more complex ligatures.

SAd + TAd + RAn → SAd + T.RAn → S.T.RAn

A conjunct ligature form can also produce a half-form.

K.SSAd + YAl → K.SSh + YAn

R13 If a nominal consonant or conjunct ligature form precedes RAsub as a result of the application of rule R6, then the consonant or ligature form may join with RAsub to form a multipart conjunct ligature (see rule R6 for more information).

KAn + RAsub → K.RAn

PHAn + RAsub → PH.RAn

R14 In some cases, other combining marks will combine with a base consonant, either attaching at a nonstandard location or changing shape. In minimal rendering, there are only two cases: RAl with Uvs or UUvs .

RAl + Uvs → RUn

RAl + UUvs → RUUn

Memory Representation and Rendering Order. The storage of plain text in Devanagari and all other Indic scripts generally follows phonetic order; that is, a CV syllable with a dependent vowel is always encoded as a consonant letter C followed by a vowel sign V in the memory representation. This order is employed by the ISCII standard and corresponds to both the phonetic order and the keying order of textual data (see Figure 9-8).

Figure 9-8. Rendering Order in Devanagari

Character Order: KAn + Ivs
Glyph Order: Ivs + KAn

Because Devanagari and other Indic scripts have some dependent vowels that must be depicted to the left side of their consonant letter, the software that renders the Indic scripts must be able to reorder elements in mapping from the logical (character) store to the presentational (glyph) rendering. For example, if Cn denotes the nominal form of consonant C, and Vvs denotes a left-side dependent vowel sign form of vowel V, then a reordering of glyphs with respect to encoded characters occurs as just shown. R15 When the dependent vowel Ivs is used to override the inherent vowel of a syllable, it is always written to the extreme left of the orthographic syllable. If the orthographic syllable contains a consonant cluster, then this vowel is always depicted to the left of that cluster.

TAd + RAl + Ivs → T.RAn + Ivs → Ivs + T.RAd
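The reordering in rule R15 can be sketched at the code point level. The Python below is a simplified illustration, not a renderer: it works on code points rather than glyphs, handles a single orthographic syllable, and the name `visual_order` is hypothetical. It also approximates the blocking case of rule R16 by treating virama followed by ZWNJ as an explicit virama; the other cause named in R16, a conjunct missing from the font, is not modeled.

```python
VIRAMA, ZWNJ, I_SIGN = "\u094D", "\u200C", "\u093F"


def visual_order(syllable):
    """Return the code points of one syllable in left-to-right display order."""
    if I_SIGN not in syllable:
        return list(syllable)
    pos = syllable.index(I_SIGN)
    before, after = syllable[:pos], syllable[pos + 1:]
    # An explicit virama (here: VIRAMA + ZWNJ) blocks the move, so the vowel
    # sign is placed just after the rightmost such virama.
    cut = before.rfind(VIRAMA + ZWNJ)
    start = cut + 2 if cut != -1 else 0
    return list(before[:start]) + [I_SIGN] + list(before[start:]) + list(after)
```

With no explicit virama, the vowel sign moves to the extreme left of the syllable; with one, it stops to the right of it.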

R16 The presence of an explicit virama (either caused by a ZWNJ or by the absence of a conjunct in the font) blocks this reordering, and the dependent vowel Ivs is rendered after the rightmost such explicit virama.

TAd + ZWNJ + RAl + Ivs → TAd + Ivs + RAl

Sample Half-Forms. Table 9-2 shows examples of half-consonant forms that are commonly used with the Devanagari script. These forms are glyphs, not characters. They may be encoded explicitly using zero width joiner as shown. In normal conjunct formation, they may be used spontaneously to depict a dead consonant in combination with subsequent consonant forms.

Table 9-2. Sample Devanagari Half-Forms (each row: consonant + virama + zero width joiner → half-form glyph)

Sample Ligatures. Table 9-3 shows examples of conjunct ligature forms that are commonly used with the Devanagari script. These forms are glyphs, not characters. Not every writing system that employs this script uses all of these forms; in particular, many of these forms are used only in writing Sanskrit texts. Furthermore, individual fonts may provide fewer or more ligature forms than are depicted here.


Table 9-3. Sample Devanagari Ligatures (each row: consonants joined by virama → conjunct ligature glyph)

Sample Half-Ligature Forms. In addition to half-form glyphs of individual consonants, half-forms are used to depict conjunct ligature forms. A sample of such forms is shown in Table 9-4. These forms are glyphs, not characters. They may be encoded explicitly using zero width joiner as shown. In normal conjunct formation, they may be used spontaneously to depict a conjunct ligature in combination with subsequent consonant forms.


Table 9-4. Sample Devanagari Half-Ligature Forms (each row: consonant + virama + consonant + virama + zero width joiner → half-ligature glyph)

Language-Specific Allographs. In Marathi and some South Indian orthographies, variant glyphs are preferred for U+0932 devanagari letter la and U+0936 devanagari letter sha, as shown in Figure 9-9. Marathi also makes use of the “eyelash” form of the letter RA, as discussed in rule R5.

Figure 9-9. Marathi Allographs (normal and Marathi glyph variants of LA, U+0932, and SHA, U+0936)

Combining Marks. Devanagari and other Indic scripts have a number of combining marks that could be considered diacritic. One class of these marks, known as bindus, is represented by U+0901 devanagari sign candrabindu and U+0902 devanagari sign anusvara. These marks indicate nasalization or final nasal closure of a syllable. U+093C devanagari sign nukta is a true diacritic. It is used to extend the basic set of consonant letters by modifying them (with a subscript dot in Devanagari) to create new letters. U+0951..U+0954 are a set of combining marks used in transcription of Sanskrit texts.

Digits. Each Indic script has a distinct set of digits appropriate to that script. These digits may or may not be used in ordinary text in that script. European digits have displaced the Indic script forms in modern usage in many of the scripts. Some Indic scripts, notably Tamil, lack a distinct digit for zero.

Punctuation and Symbols. U+0964 । devanagari danda is similar to a full stop. U+0965 ॥ devanagari double danda marks the end of a verse in traditional texts. The term danda is from Sanskrit; in Hindi the punctuation mark is generally referred to as a viram instead. Although the danda and double danda are encoded in the Devanagari block, the intent is that they be used as common punctuation for all the major scripts of India covered by this chapter. Danda and double danda punctuation marks are not separately encoded for Bengali, Gujarati, and so on. However, analogous punctuation marks for other Brahmi-derived scripts are separately encoded, particularly for scripts used primarily outside of India.

Many modern languages written in the Devanagari script intersperse punctuation derived from the Latin script. Thus U+002C comma and U+002E full stop are freely used in writing Hindi, and the danda is usually restricted to more traditional texts. However, the danda may be preserved when such traditional texts are transliterated into the Latin script.

U+0970 ॰ devanagari abbreviation sign appears after letters or combinations of letters and marks the sequence as an abbreviation.

Encoding Structure. The Unicode Standard organizes the nine principal Indic scripts in blocks of 128 encoding points each. The first six columns in each script are isomorphic with the ISCII-1988 encoding, except that the last 11 positions (U+0955..U+095F in Devanagari, for example), which are unassigned or undefined in ISCII-1988, are used in the Unicode encoding. The seventh column in each of these scripts, along with the last 11 positions in the sixth column, represent additional character assignments in the Unicode Standard that are matched across all nine scripts. For example, positions U+xx66..U+xx6F and U+xxE6..U+xxEF code the Indic script digits for each script. The eighth column for each script is reserved for script-specific additions that do not correspond from one Indic script to the next.

Other Languages. The characters U+097B devanagari letter gga, U+097C devanagari letter jja, U+097E devanagari letter ddda, and U+097F devanagari letter bba are used to write Sindhi implosive consonants. Previous versions of the Unicode Standard recommended representing those characters as a combination of the usual consonants with nukta and anudatta, but those combinations are no longer recommended.
Konkani makes use of additional sounds that can be represented with combinations such as U+091A devanagari letter ca plus U+093C devanagari sign nukta and U+091F devanagari letter tta plus U+0949 devanagari vowel sign candra o.
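The parallel block layout described under Encoding Structure means that a character's offset within its 128-position block carries across scripts. The following Python sketch illustrates this for Devanagari and Bengali; `transpose` and `block_digit_value` are hypothetical helpers. The result of `transpose` is only meaningful where both scripts actually assign the position, which this sketch does not verify.

```python
DEVANAGARI_BASE, BENGALI_BASE = 0x0900, 0x0980


def transpose(ch, src_base, dst_base):
    """Map a character to the same offset in another 128-position block."""
    offset = ord(ch) - src_base
    if not 0 <= offset < 128:
        return ch  # outside the source block: leave unchanged
    return chr(dst_base + offset)


def block_digit_value(ch):
    """Digit value from block position (U+xx66..U+xx6F, U+xxE6..U+xxEF)."""
    position = ord(ch) & 0x7F
    if not 0x66 <= position <= 0x6F:
        raise ValueError("not at a digit position")
    return position - 0x66
```

For instance, `transpose("\u094C", DEVANAGARI_BASE, BENGALI_BASE)` gives U+09CC, the Bengali vowel sign au. `block_digit_value` assumes its argument already lies in one of the Indic script blocks; it does not check the block itself.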

9.2 Bengali

Bengali: U+0980–U+09FF

The Bengali script is a North Indian script closely related to Devanagari. It is used to write the Bengali language primarily in the West Bengal state and in the nation of Bangladesh. It is also used to write Assamese in Assam and a number of other minority languages, such as Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Rian, and Santali, in northeastern India.

Virama (Hasant). The Bengali script uses the Unicode virama model to form conjunct consonants. In Bengali, the virama is known as hasant.


Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 9-5 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

Table 9-5. Bengali Vowel Letters

To Represent    Use     Do Not Use
আ               0986    <0985, 09BE>

Two-Part Vowel Signs. The Bengali script, along with a number of other Indic scripts, makes use of two-part vowel signs. In these vowels one-half of the vowel is placed on each side of a consonant letter or cluster, for example, U+09CB bengali vowel sign o and U+09CC bengali vowel sign au. The vowel signs are coded in each case in the position in the charts isomorphic with the corresponding vowel in Devanagari. Hence U+09CC bengali vowel sign au is isomorphic with U+094C devanagari vowel sign au. To provide compatibility with existing implementations of the scripts that use two-part vowel signs, the Unicode Standard explicitly encodes the right half of these vowel signs. For example, U+09D7 bengali au length mark represents the right-half glyph component of U+09CC bengali vowel sign au.

Special Characters. U+09F2..U+09F9 are a series of Bengali additions for writing currency and fractions.

Rendering Behavior. Like other Brahmic scripts in the Unicode Standard, Bengali uses the hasant to form conjunct characters. For example, U+0995 ক bengali letter ka + U+09CD ◌্ bengali sign virama + U+09B7 ষ bengali letter ssa yields the conjunct ক্ষ KSSA, which is pronounced khya in Assamese. For general principles regarding the rendering of the Bengali script, see the rules for rendering in Section 9.1, Devanagari.

Consonant-Vowel Ligatures. Some Bengali consonant plus vowel combinations have two distinct visual presentations. The first visual presentation is a traditional ligated form, in which the vowel combines with the consonant in a novel way. In the second presentation, the vowel is joined to the consonant but retains its nominal form, and the combination is not considered a ligature. These consonant-vowel combinations are illustrated in Table 9-6.

The ligature forms of these consonant-vowel combinations are traditional. They are used in handwriting and some printing. The “non-ligated” forms are more common; they are used in newspapers and are associated with modern typefaces. However, the traditional ligatures are preferred in some contexts.

No semantic distinctions are made in Bengali text on the basis of the two different presentations of these consonant-vowel combinations. However, some users consider it important that implementations support both forms and that the distinction be representable in plain text. This may be accomplished by using U+200D zero width joiner and U+200C zero width non-joiner to influence ligature glyph selection. (See “Cursive Connection and Ligatures” in Section 16.2, Layout Controls.)


Table 9-6. Bengali Consonant-Vowel Combinations

        Code Points
gu      <0997, 09C1>
ru      <09B0, 09C1>
rū      <09B0, 09C2>
śu      <09B6, 09C1>
hu      <09B9, 09C1>
hṛ      <09B9, 09C3>

(Ligated and Non-ligated glyph columns not reproduced.)

A given font implementation can choose whether to treat the ligature forms of the consonant-vowel combinations as the defaults for rendering. If the non-ligated form is the default, then ZWJ can be inserted to request a ligature, as shown in Figure 9-10.

Figure 9-10. Requesting Bengali Consonant-Vowel Ligature

<0997, 09C1> → non-ligated form
<0997, 200D, 09C1> → ligated form

If the ligated form is the default for a given font implementation, then ZWNJ can be inserted to block a ligature, as shown in Figure 9-11.

Figure 9-11. Blocking Bengali Consonant-Vowel Ligature

<0997, 09C1> → ligated form
<0997, 200C, 09C1> → non-ligated form
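The sequences in Figures 9-10 and 9-11 are easy to construct programmatically. The Python below is a small sketch with hypothetical helper names; it only builds the code point sequences and says nothing about whether a given font honors them.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"


def request_ligature(consonant, vowel_sign):
    """Insert ZWJ to request the ligated form, as in Figure 9-10."""
    return consonant + ZWJ + vowel_sign


def block_ligature(consonant, vowel_sign):
    """Insert ZWNJ to request the non-ligated form, as in Figure 9-11."""
    return consonant + ZWNJ + vowel_sign


def code_points(text):
    """List the code points of a string as four-digit hexadecimal strings."""
    return [f"{ord(ch):04X}" for ch in text]
```

For the combination gu, `code_points(request_ligature("\u0997", "\u09C1"))` lists 0997, 200D, 09C1, the same sequence as in Figure 9-10.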

Khanda Ta. In Bengali, a dead consonant ta makes use of a special form, U+09CE bengali letter khanda ta. This form is used in all contexts except where it is immediately followed by one of the consonants: ta, tha, na, ba, ma, ya, or ra.

Chapter 10

South Asian Scripts-II


This chapter documents scripts of South Asia aside from the major official scripts of India, which are documented in Chapter 9, South Asian Scripts-I. The following South Asian scripts are described in this chapter:

Sinhala
Tibetan
Phags-pa
Limbu
Syloti Nagri
Kharoshthi

Sinhala has a virama-based model, but is not structurally mapped to ISCII. Tibetan stands apart, using a subjoined consonant model for conjoined consonants, reflecting its somewhat different structure and usage. Phags-pa is a historical script related to Tibetan that was created as the national script of the Mongol empire. Even though Phags-pa was used mostly in Eastern and Central Asia for writing text in the Mongolian and Chinese languages, it is discussed in this chapter because of its close historical connection to the Tibetan script. The Limbu script makes use of an explicit encoding of syllable-final consonants. Syloti Nagri is used to write the modern Sylheti language of northeast Bangladesh.

The oldest lengthy inscriptions of India, the edicts of Ashoka from the third century BCE, were written in two scripts, Kharoshthi and Brahmi. These are both ultimately of Semitic origin, probably deriving from Aramaic, which was an important administrative language of the Middle East at that time. Kharoshthi, which was written from right to left, was supplanted by Brahmi and its derivatives.

10.1 Sinhala

Sinhala: U+0D80–U+0DFF

The Sinhala script, also known as Sinhalese, is used to write the Sinhala language, the majority language of Sri Lanka. It is also used to write the Pali and Sanskrit languages. The script is a descendant of Brahmi and resembles the scripts of South India in form and structure.

Sinhala differs from other languages of the region in that it has a series of prenasalized stops that are distinguished from the combination of a nasal followed by a stop. In other words, both forms occur and are written differently: for example, අඬ aňḍa [aⁿɖa] “sound” versus අණ්ඩ aṇḍa [aɳɖa] “egg.” In addition, Sinhala has separate distinct signs for both a short and a long low front vowel sounding similar to the initial vowel of the English word “apple,” usually represented in IPA as U+00E6 æ LATIN SMALL LETTER AE (ash). The independent forms of these vowels are encoded at U+0D87 and U+0D88; the corresponding dependent forms are U+0DD0 and U+0DD1.

Because of these extra letters, the encoding for Sinhala does not precisely follow the pattern established for the other Indic scripts (for example, Devanagari). It does use the same general structure, making use of phonetic order, matra reordering, and use of the virama (U+0DCA SINHALA SIGN AL-LAKUNA) to indicate conjunct consonant clusters. Sinhala does not use half-forms in the Devanagari manner, but does use many ligatures.

Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 10-1 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

Table 10-1. Sinhala Vowel Letters

To Represent   Use    Do Not Use
ආ              0D86   <0D85, 0DCF>
ඇ              0D87   <0D85, 0DD0>
ඈ              0D88   <0D85, 0DD1>
ඌ              0D8C   <0D8B, 0DDF>
ඎ              0D8E   <0D8D, 0DD8>
ඐ              0D90   <0D8F, 0DDF>
ඒ              0D92   <0D91, 0DCA>
ඓ              0D93   <0D91, 0DD9>
ඖ              0D96   <0D94, 0DDF>
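One way an application might apply Table 10-1 is to rewrite the “Do Not Use” sequences to their atomic code points before storing text. The sketch below is a minimal illustration, not a prescribed algorithm; the mapping is copied from the table, and the function name is hypothetical.

```python
# "Do Not Use" sequence -> atomic vowel letter, transcribed from Table 10-1.
ATOMIC_VOWELS = {
    "\u0d85\u0dcf": "\u0d86",
    "\u0d85\u0dd0": "\u0d87",
    "\u0d85\u0dd1": "\u0d88",
    "\u0d8b\u0ddf": "\u0d8c",
    "\u0d8d\u0dd8": "\u0d8e",
    "\u0d8f\u0ddf": "\u0d90",
    "\u0d91\u0dca": "\u0d92",
    "\u0d91\u0dd9": "\u0d93",
    "\u0d94\u0ddf": "\u0d96",
}


def use_atomic_vowels(text: str) -> str:
    """Hypothetical helper: replace analyzed sequences by atomic letters."""
    for sequence, atomic in ATOMIC_VOWELS.items():
        text = text.replace(sequence, atomic)
    return text


fixed = use_atomic_vowels("\u0d85\u0dcf")
print(f"{ord(fixed):04X}")  # 0D86
```

None of the atomic letters begins another “Do Not Use” sequence, so a single pass over the mapping is enough in this sketch.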

Other Letters for Tamil. The Sinhala script may also be used to write Tamil. In this case, some additional combinations may be required. Some letters, such as U+0DBB SINHALA LETTER RAYANNA and U+0DB1 SINHALA LETTER DANTAJA NAYANNA, may be modified by adding the equivalent of a nukta. There is, however, no nukta presently encoded in the Sinhala block.

Historical Symbols. Neither U+0DF4 ෴ SINHALA PUNCTUATION KUNDDALIYA nor the Sinhala numerals are in general use today, having been replaced by Western-style punctuation and Western digits. The kunddaliya was formerly used as a full stop or period. It is included for scholarly use. The Sinhala numerals are not presently encoded.

10.2 Tibetan

Tibetan: U+0F00–U+0FFF

The Tibetan script is used for writing Tibetan in several countries and regions throughout the Himalayas. Aside from Tibet itself, the script is used in Ladakh, Nepal, and northern areas of India bordering Tibet where large Tibetan-speaking populations now reside. The Tibetan script is also used in Bhutan to write Dzongkha, the official language of that country. In addition, Tibetan is used as the language of philosophy and liturgy by Buddhist traditions spread from Tibet into the Mongolian cultural area that encompasses Mongolia, Buriatia, Kalmykia, and Tuva.

The Tibetan scripting and grammatical systems were originally defined together in the sixth century by royal decree when the Tibetan King Songtsen Gampo sent 16 men to India to study Indian languages. One of those men, Thumi Sambhota, is credited with creating the Tibetan writing system upon his return, having studied various Indic scripts and grammars. The king’s primary purpose was to bring Buddhism from India to Tibet. The new script system was therefore designed with compatibility extensions for Indic (principally Sanskrit) transliteration so that Buddhist texts could be represented properly. Because of this origin, over the last 1,500 years the Tibetan script has been widely used to represent Indic words, a number of which have been adopted into the Tibetan language retaining their original spelling.

A note on Latin transliteration: Tibetan spelling is traditional and does not generally reflect modern pronunciation. Throughout this section, Tibetan words are represented in italics when transcribed as spoken, followed at first occurrence by a parenthetical transliteration; in these transliterations, the presence of the tsek (tsheg) character is expressed with a hyphen.

Thumi Sambhota’s original grammar treatise defined two script styles. The first, called uchen (dbu-can, “with head”), is a formal “inscriptional capitals” style said to be based on an old form of Devanagari. It is the script used in Tibetan xylograph books and the one used in the coding tables. The second style, called u-mey (dbu-med, or “headless”), is more cursive and said to be based on the Wartu script. Numerous styles of u-mey have evolved since then, including both formal calligraphic styles used in manuscripts and running handwriting styles. All Tibetan scripts follow the same lettering rules, though there is a slight difference in the way that certain compound stacks are formed in uchen and u-mey.

General Principles of the Tibetan Script. Tibetan grammar divides letters into consonants and vowels. There are 30 consonants, and each consonant is represented by a discrete written character. There are five vowel sounds, only four of which are represented by written marks. The four vowels that are explicitly represented in writing are each represented with a single mark that is applied above or below a consonant to indicate the application of that vowel to that consonant. The absence of one of the four marks implies that the first vowel sound (like a short “ah” in English) is present and is not modified to one of the four other possibilities. Three of the four marks are written above the consonants; one is written below.

Each word in Tibetan has a base or root consonant. The base consonant can be written singly or it can have other consonants added above or below it to make a vertically “stacked” letter. Tibetan grammar contains a very complete set of rules regarding letter gender, and these rules dictate which letters can be written in adjacent positions. The rules therefore dictate which combinations of consonants can be joined to make stacks. Any combination not allowed by the gender rules does not occur in native Tibetan words. However, when transcribing other languages (for example, Sanskrit, Chinese) into Tibetan, these rules do not operate. In certain instances other than transliteration, any consonant may be combined with any other subjoined consonant. Implementations should therefore be prepared to accept and display any combinations.

For example, the syllable spyir “general,” pronounced [tɕíː], is a typical example of a Tibetan syllable that includes a stack comprising a head letter, two subscript letters, and a vowel sign. Figure 10-1 shows the characters in the order in which they appear in the backing store.

Figure 10-1. Tibetan Syllable Structure

U+0F66 TIBETAN LETTER SA
U+0FA4 TIBETAN SUBJOINED LETTER PA
U+0FB1 TIBETAN SUBJOINED LETTER YA
U+0F72 TIBETAN VOWEL SIGN I
U+0F62 TIBETAN LETTER RA
U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG
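The backing-store order in Figure 10-1 can be checked with a few lines of Python. This is a small sketch that uses the standard `unicodedata` module to list each character of the syllable; the variable and function names are chosen for this example only.

```python
import unicodedata

# The syllable "spyir" followed by a tsek, in backing-store order.
SPYIR = "\u0f66\u0fa4\u0fb1\u0f72\u0f62\u0f0b"


def describe(text: str) -> list:
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(SPYIR):
    print(code_point, name)
```

The stack itself is the first three characters (head letter plus two subjoined letters); the vowel sign follows the whole stack, and the final consonant and the tsek come after it.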

The model adopted to encode the Tibetan lettering set described above contains the following groups of items: Tibetan consonants, vowels, numerals, punctuation, ornamental signs and marks, and Tibetan-transliterated Sanskrit consonants and vowels. Each of these will be described in this section.

Both in this description and in Tibetan, the terms “subjoined” (-btags) and “head” (-mgo) are used in different senses. In the structural sense, they indicate specific slots defined in native Tibetan orthography. In spatial terms, they refer to the position in the stack; anything in the topmost position is “head,” anything not in the topmost position is “subjoined.” Unless explicitly qualified, the terms “subjoined” and “head” are used here in their spatial sense. For example, in a conjunct like “rka,” the letter in the root slot is “KA.” Because it is not the topmost letter of the stack, however, it is expressed with a subjoined character code, while “RA”, which is structurally in the head slot, is expressed with a nominal character code. In a conjunct “kra,” in which the root slot is also occupied with “KA”, the “KA” is encoded with a nominal character code because it is in the topmost position in the stack.

The Tibetan script has its own system of formatting, and details of that system relevant to the characters encoded in this standard are explained herein. However, an increasing number of publications in Tibetan do not strictly adhere to this original formatting system. This change is due to the partial move from publishing on long, horizontal, loose-leaf folios, to publishing in vertically oriented, bound books. The Tibetan script also has a punctuation set designed to meet needs quite different from the punctuation that has evolved for Western scripts. With the appearance of Tibetan newspapers, magazines, school textbooks, and Western-style reference books in the last 20 or 30 years, Tibetans have begun using things like columns, indented blocks of text, Western-style headings, and footnotes. Some Western punctuation marks, including brackets, parentheses, and quotation marks, are becoming commonplace in these kinds of publication. With the introduction of more sophisticated electronic publishing systems, there is also a renaissance in the publication of voluminous religious and philosophical works in the traditional horizontal, loose-leaf format, many set in digital typefaces closely conforming to the proportions of traditional hand-lettered text.

Consonants. The system described here has been devised to encode the Tibetan system of writing consonants in both single and stacked forms. All of the consonants are encoded a first time from U+0F40 through U+0F69. These comprise the basic Tibetan consonants and, in addition, six compound consonants used to represent the Indic consonants gha, jha, ḍha, dha, bha, and kṣa.
These codes are used to represent occurrences of either a stand-alone consonant or a consonant in the head position of a vertical stack. Glyphs generated from these codes will always sit in the normal position starting at and dropping down from the design baseline. All of the consonants are then encoded a second time. These second encodings from U+0F90 through U+0FB9 represent consonants in subjoined stack position.

To represent a single consonant in a text stream, one of the first “nominal” set of codes is placed. To represent a stack of consonants in the text stream, a “nominal” consonant code is followed directly by one or more of the subjoined consonant codes. The stack so formed continues for as long as subjoined consonant codes are contiguously placed.

This encoding method was chosen over an alternative method that would have involved a virama-based encoding, such as Devanagari. There were two main reasons for this choice. First, the virama is not normally used in the Tibetan writing system to create letter combinations. There is a virama in the Tibetan script, but only because of the need to represent Devanagari; called “srog-med”, it is encoded at U+0F84 TIBETAN MARK HALANTA. The virama is never used in writing Tibetan words and can be (but almost never is) used as a substitute for stacking in writing Sanskrit mantras in the Tibetan script. Second, there is a prevalence of stacking in native Tibetan, and the model chosen specifically results in decreased data storage requirements. Furthermore, in languages other than Tibetan, there are many cases where stacks occur that do not appear in Tibetan-language texts; it is thus imperative to have a model that allows for any consonant to be stacked with any subjoined consonant(s). Thus a model for stack building was chosen that follows the Tibetan approach to creating letter combinations, but is not limited to a specific set of the possible combinations.

Vowels. Each of the four basic Tibetan vowel marks is coded as a separate entity. These code points are U+0F72, U+0F74, U+0F7A, and U+0F7C. For compatibility, a set of several compound vowels for Sanskrit transcription is also provided in the other code points between U+0F71 and U+0F7D. Most Tibetan users do not view these compound vowels as single characters, and their use is limited to Sanskrit words. It is acceptable for users to enter these compounds as a series of simpler elements and have software render them appropriately. Canonical equivalences are specified for all of these code points except U+0F77 and U+0F79. All vowel signs are nonspacing marks above or below a stack of consonants, sometimes on both sides.

A stand-alone consonant or a stack of consonants can have a vowel sign applied to it. In accordance with the rules of Tibetan writing, a code for a vowel sign applied to a consonant should always be placed after the bare consonant or the stack of consonants formed by the method just described.

All of the symbols and punctuation marks have straightforward encodings. Further information about many of them appears later in this section.

Coding Order. In general, the correct coding order for a stream of text will be the same as the order in which Tibetans spell and in which the characters of the text would be written by hand. For example, the correct coding order for the most complex Tibetan stack would be

head position consonant
first subjoined consonant
... (intermediate subjoined consonants, if any)
last subjoined consonant
subjoined vowel a-chung (U+0F71)
standard or compound vowel sign, or virama

Where used, the character U+0F39 TIBETAN MARK TSA -PHRU occurs immediately after the consonant it modifies.

Allographical Considerations. When consonants are combined to form a stack, one of them retains the status of being the principal consonant in the stack. The principal consonant always retains its stand-alone form. However, consonants placed in the “head” and “subjoined” positions to the main consonant sometimes retain their stand-alone forms and sometimes are given new, special forms. Because of this fact, certain consonants are given a further, special encoding treatment: namely, “wa” (U+0F5D), “ya” (U+0F61), and “ra” (U+0F62).
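The nominal and subjoined consonant ranges and the coding order described above can be combined into a small stack builder. The sketch below is illustrative only. It assumes that the two ranges quoted in the text (U+0F40 through U+0F69 and U+0F90 through U+0FB9) line up one-to-one, so that a subjoined code is the nominal code plus 0x50; the function names are hypothetical, and the fixed-form letters and U+0F39 are not handled.

```python
NOMINAL_FIRST, NOMINAL_LAST = 0x0F40, 0x0F69
SUBJOINED_OFFSET = 0x0F90 - 0x0F40  # assumed fixed offset between the two sets
A_CHUNG = "\u0f71"                  # subjoined vowel a-chung


def is_nominal(ch: str) -> bool:
    return NOMINAL_FIRST <= ord(ch) <= NOMINAL_LAST


def to_subjoined(consonant: str) -> str:
    """Map a nominal consonant to its subjoined counterpart."""
    if not is_nominal(consonant):
        raise ValueError(f"not a nominal consonant: U+{ord(consonant):04X}")
    return chr(ord(consonant) + SUBJOINED_OFFSET)


def encode_stack(consonants: str, a_chung: bool = False, vowel: str = "") -> str:
    """Hypothetical helper: encode a stack given top to bottom as nominal letters.

    Order produced: head consonant, subjoined consonants,
    optional a-chung, then the vowel sign (or virama) if any.
    """
    if not consonants or not is_nominal(consonants[0]):
        raise ValueError("a stack needs a nominal head consonant")
    stack = consonants[0] + "".join(to_subjoined(c) for c in consonants[1:])
    if a_chung:
        stack += A_CHUNG
    return stack + vowel


KA, RA = "\u0f40", "\u0f62"
print([f"{ord(c):04X}" for c in encode_stack(RA + KA)])  # ['0F62', '0F90']  "rka"
print([f"{ord(c):04X}" for c in encode_stack(KA + RA)])  # ['0F40', '0FB2']  "kra"
```

In “rka” the root letter KA receives a subjoined code because it is not topmost, while in “kra” it receives a nominal code, matching the spatial sense of “head” and “subjoined” used in this section.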

Head Position “ra”. When the consonant “ra” is written in the “head” position (ra-mgo, pronounced ra-go) at the top of a stack in the normal Tibetan-defined lettering set, the shape of the consonant can change. This is called ra-go (ra-mgo). It can either be a full-form shape or the full-form shape but with the bottom stroke removed (looking like a short-stemmed letter “T”). This requirement of “ra” in the head position where the glyph representing it can change shape is correctly coded by using the stand-alone “ra” consonant (U+0F62) followed by the appropriate subjoined consonant(s). For example, in the normal Tibetan ra-mgo combinations, the “ra” in the head position is mostly written as the half-ra but in the case of “ra + subjoined nya” must be written as the full-form “ra”. Thus the normal Tibetan ra-mgo combinations are correctly encoded with the normal “ra” consonant (U+0F62) because it can change shape as required. It is the responsibility of the font developer to provide the correct glyphs for representing the characters where the “ra” in the head position will change shape, as in “ra + subjoined nya”.

Full-Form “ra” in Head Position. Some instances of “ra” in the head position require that the consonant be represented as a full-form “ra” that never changes. This is not standard usage for the Tibetan language itself, but rather occurs in transliteration and transcription. Only in these cases should the character U+0F6A TIBETAN LETTER FIXED-FORM RA be used instead of U+0F62 TIBETAN LETTER RA. This “ra” will always be represented as a full-form “ra consonant” and will never change shape to the form where the lower stroke has been cut off. For example, the letter combination “ra + ya”, when appearing in transliterated Sanskrit works, is correctly written with a full-form “ra” followed by either a modified subjoined “ya” form or a full-form subjoined “ya” form.
Note that the fixed-form “ra” should be used only in combinations where “ra” would normally transform into a short form but the user specifically wants to prevent that change. For example, the combination “ra + subjoined nya” never requires the use of fixed-form “ra”, because “ra” normally retains its full glyph form over “nya”. It is the responsibility of the font developer to provide the appropriate glyphs to represent the encodings.

Subjoined Position “wa”, “ya”, and “ra”. All three of these consonants can be written in subjoined position to the main consonant according to normal Tibetan grammar. In this position, all of them change to a new shape. The “wa” consonant when written in subjoined position is not a full “wa” letter any longer but is literally the bottom-right corner of the “wa” letter cut off and appended below it. For that reason, it is called a wazur (wa-zur, or “corner of a wa”) or, less frequently but just as validly, wa-ta (wa-btags) to indicate that it is a subjoined “wa”. The consonants “ya” and “ra” when in the subjoined position are called ya-ta (ya-btags) and ra-ta (ra-btags), respectively. To encode these subjoined consonants that follow the rules of normal Tibetan grammar, the shape-changed, subjoined forms U+0FAD TIBETAN SUBJOINED LETTER WA, U+0FB1 TIBETAN SUBJOINED LETTER YA, and U+0FB2 TIBETAN SUBJOINED LETTER RA should be used.

All three of these subjoined consonants also have full-form non-shape-changing counterparts for the needs of transliterated and transcribed text. For this purpose, the full subjoined consonants that do not change shape (encoded at U+0FBA, U+0FBB, and U+0FBC, respectively) are used where necessary. The combinations of “ra + ya” are a good example because they include instances of “ra” taking a short (ya-btags) form and “ra” taking a full-form subjoined “ya”.

U+0FB0 TIBETAN SUBJOINED LETTER -A (a-chung) should be used only in the very rare cases where a full-sized subjoined a-chung letter is required. The small vowel lengthening a-chung encoded as U+0F71 TIBETAN VOWEL SIGN AA is far more frequently used in Tibetan text, and it is therefore recommended that implementations treat this character (rather than U+0FB0) as the normal subjoined a-chung.

Halanta (Srog-Med). Because two sets of consonants are encoded for Tibetan, with the second set providing explicit ligature formation, there is no need for a “dead character” in Tibetan. When a halanta (srog-med) is used in Tibetan, its purpose is to suppress the inherent vowel “a”. If anything, the halanta should prevent any vowel or consonant from forming a ligature with the consonant preceding the halanta. In Tibetan text, this character should be displayed beneath the base character as a combining glyph and not used as a (purposeless) dead character.

Line Breaking Considerations. Tibetan text separates units called natively tsek-bar (“tsheg-bar”), an inexact translation of which is “syllable.” Tsek-bar is literally the unit of text between tseks and is generally a consonant cluster with all of its prefixes, suffixes, and vowel signs. It is not a “syllable” in the English sense.

Tibetan script has only two break characters. The primary break character is the standard interword tsek (tsheg), which is encoded at U+0F0B. The second break character is the space. Space or tsek characters in a stream of Tibetan text are not always break characters and so need proper contextual handling.

The primary delimiter character in Tibetan text is the tsek (U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG). In general, automatic line breaking processes may break after any occurrence of this tsek, except where it follows a U+0F44 TIBETAN LETTER NGA (with or without a vowel sign) and precedes a shay (U+0F0D), or where Tibetan grammatical rules do not permit a break. (Normally, tsek is not written before shay except after “nga”. This type of tsek-after-nga is called “nga-phye-tsheg” and may be expressed by U+0F0B or by the special character U+0F0C, a nonbreaking form of tsek.) The Unicode names for these two types of tsek are misnomers, retained for compatibility. The standard tsek U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG is always required to be a potentially breaking character, whereas the “nga-phye-tsheg” is always required to be a nonbreaking tsek. U+0F0C TIBETAN MARK DELIMITER TSHEG BSTAR is specifically not a “delimiter” and is not for general use.

There are no other break characters in Tibetan text. Unlike English, Tibetan has no system for hyphenating or otherwise breaking a word within the group of letters making up the word. Tibetan text formatting does not allow text to be broken within a word.

Whitespace appears in Tibetan text, although it should be represented by U+00A0 NO-BREAK SPACE instead of U+0020 SPACE. Tibetan text breaks lines after tsek instead of at whitespace.
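The tsek rule above can be sketched as a function that lists the positions where a line may be broken. This is a simplified illustration, not a complete line breaker: the function names are hypothetical, the set of vowel signs is an assumption chosen for the example, and the grammatical restrictions mentioned in the text are not modeled. U+0F0C and spaces never produce a break here.

```python
TSEK = "\u0f0b"             # U+0F0B, potentially breaking tsek
NONBREAKING_TSEK = "\u0f0c" # U+0F0C, never a break opportunity
SHAY = "\u0f0d"             # U+0F0D
NGA = "\u0f44"              # U+0F44 TIBETAN LETTER NGA

# Assumption for this sketch: vowel signs that may sit between nga and tsek.
VOWEL_SIGNS = {chr(cp) for cp in range(0x0F71, 0x0F7E)} | {"\u0f80", "\u0f81"}


def follows_nga(text: str, tsek_index: int) -> bool:
    """True if the tsek comes right after nga, with or without vowel signs."""
    j = tsek_index - 1
    while j >= 0 and text[j] in VOWEL_SIGNS:
        j -= 1
    return j >= 0 and text[j] == NGA


def break_opportunities(text: str) -> list:
    """Return indexes at which a new line may start (just after a tsek)."""
    breaks = []
    for i, ch in enumerate(text):
        if ch != TSEK or i + 1 == len(text):
            continue
        if text[i + 1] == SHAY and follows_nga(text, i):
            continue  # tsek between nga and shay must not break
        breaks.append(i + 1)
    return breaks


print(break_opportunities("\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42"))  # [4]
print(break_opportunities("\u0f44\u0f0b\u0f0d"))                          # []
```

A formatter built on this idea would still need the contextual handling described in the text, since not every tsek or space is a break character.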

Complete Tibetan text formatting is best handled by a formatter in the application and not just by the code stream. If the interword and nonbreaking tseks are properly employed as breaking and nonbreaking characters, respectively, and if all spaces are nonbreaking spaces, then any application will still wrap lines correctly on that basis, even though the breaks might be sometimes inelegant.

Tibetan Punctuation. The punctuation apparatus of Tibetan is relatively limited. The principal punctuation characters are the tsek; the shay (transliterated “shad”), which is a vertical stroke used to mark the end of a section of text; the space used sparingly as a space; and two of several variant forms of the shay that are used in specialized situations requiring a shay. There are also several other marks and signs but they are sparingly used.

The shay at U+0F0D marks the end of a piece of text called “tshig-grub”. The mode of marking bears no commonality with English phrases or sentences and should not be described as a delimiter of phrases. In Tibetan grammatical terms, a shay is used to mark the end of an expression (“brjod-pa”) and a complete expression. Two shays are used at the end of whole topics (“don-tshan”). Because some writers use the double shay with a different spacing than would be obtained by coding two adjacent occurrences of U+0F0D, the double shay has been coded at U+0F0E with the intent that it would have a larger spacing between component shays than if two shays were simply written together. However, most writers do not use an unusual spacing between the double shay, so the application should allow the user to write two U+0F0D codes one after the other. Additionally, font designers will have to decide whether to implement these shays with a larger than normal gap.

The U+0F11 rin-chen-pung-shay (rin-chen-spungs-shad) is a variant shay used in a specific “new-line” situation.
Its use was not defined in the original grammars but Tibetan tradition gives it a highly defined use. The drul-shay (“sbrul-shad”) is likewise not defined by the original grammars but has a highly defined use; it is used for separating sections of meaning that are equivalent to topics (“don-tshan”) and subtopics. A drul-shay is usually surrounded on both sides by the equivalent of about three spaces (though no rule is specified). Hard spaces will be needed for these instances because the drul-shay should not appear at the beginning of a new line and the whole structure of spacing-plus-shay should not be broken up, if possible.

Tibetan texts use a yig-go (“head mark,” yig-mgo) to indicate the beginning of the front of a folio, there being no other certain way, in the loose-leaf style of traditional Tibetan books, to tell which is the front of a page. The head mark can and does vary from text to text; there are many different ways to write it. The common type of head mark has been provided for with U+0F04 TIBETAN MARK INITIAL YIG MGO MDUN MA and its extension U+0F05 TIBETAN MARK CLOSING YIG MGO SGAB MA. An initial mark yig-mgo can be written alone or combined with as many as three closing marks following it. When the initial mark is written in combination with one or more closing marks, the individual parts of the whole must stay in proper registration with each other to appear authentic. Therefore, it is strongly recommended that font developers create precomposed ligature glyphs to represent the various combinations of these two characters.

The less common head marks mainly appear in Nyingmapa and Bonpo literature. Three of these head marks have been provided for with U+0F01, U+0F02, and U+0F03; however, many others have not been encoded. Font developers will have to deal with the fact that many types of head marks in use in this literature have not been encoded, cannot be represented by a replacement that has been encoded, and will be required by some users.

Two characters, U+0F3C TIBETAN MARK ANG KHANG GYON and U+0F3D TIBETAN MARK ANG KHANG GYAS, are paired punctuation; they are typically used together to form a roof over one or more digits or words. In this case, kerning or special ligatures may be required for proper rendering. The right ang khang may also be used much as a single closing parenthesis is used in forming lists; again, special kerning may be required for proper rendering. The marks U+0F3E TIBETAN SIGN YAR TSHES and U+0F3F TIBETAN SIGN MAR TSHES are paired signs used to combine with digits; special glyphs or compositional metrics are required for their use.

A set of frequently occurring astrological and religious signs specific to Tibetan is encoded between U+0FBE and U+0FCF.

U+0F34, which means “et cetera” or “and so on,” is used after the first few tsek-bar of a recurring phrase. U+0FBE (often three times) indicates a refrain.

U+0F36 and U+0FBF are used to indicate where text should be inserted within other text or as references to footnotes or marginal notes.

Other Characters. The Wheel of Dharma, which occurs sometimes in Tibetan texts, is encoded in the Miscellaneous Symbols block at U+2638.

Left-facing and right-facing swastika symbols are likewise used. They are found among the Chinese ideographs at U+534D (“yung-drung-chi-khor”) and U+5350 (“yung-drung-nang-khor”).

The marks U+0F35 TIBETAN MARK NGAS BZUNG NYI ZLA and U+0F37 TIBETAN MARK NGAS BZUNG SGOR RTAGS conceptually attach to a tsek-bar rather than to an individual character and function more like attributes than characters; for example, they serve as underlining to mark or emphasize text. In Tibetan interspersed commentaries, they may be used to tag the tsek-bar belonging to the root text that is being commented on.
The same thing is often accomplished by setting the tsek-bar belonging to the root text in large type and the commentary in small type. Correct placement of these glyphs may be problematic. If they are treated as normal combining marks, they can be entered into the text following the vowel signs in a stack; if used, their presence will need to be accounted for by searching algorithms, among other things. Tibetan Half-Numbers. The half-number forms (U+0F2A..U+0F33) are peculiar to Tibetan, though other scripts (for example, Bengali) have similar fractional concepts. The value of each half-number is 0.5 less than the number within which it appears. These forms are used only in some traditional contexts and appear as the last digit of a multidigit number. For example, the sequence of digits “U+0F24 U+0F2C” represents the number 42.5 or forty-two and one-half. Tibetan Transliteration and Transcription of Other Languages. Tibetan traditions are in place for transliterating other languages. Most commonly, Sanskrit has been the language

Copyright © 1991-2007, Unicode, Inc.

The Unicode Standard 5.0 – Electronic edition

10.2

Tibetan

351

being transliterated, although Chinese has become more common in modern times. Additionally, Mongolian has a transliterated form. There are even some conventions for transliterating English. One feature of Tibetan script/grammar is that it allows for totally accurate transliteration of Sanskrit. The basic Tibetan letterforms and punctuation marks contain most of what is needed, although a few extra things are required. With these additions, Sanskrit can be transliterated perfectly into Tibetan, and the Tibetan transliteration can be rendered backward perfectly into Sanskrit with no ambiguities or difficulties. The six Sanskrit retroflex letters are interleaved among the other consonants. The compound Sanskrit consonants are not included in normal Tibetan. They could be made using the method described earlier for Tibetan stacked consonants, generally by subjoining “ha”. However, to maintain consistency in transliterated texts and for ease in transmission and searching, it is recommended that implementations of Sanskrit in the Tibetan script use the precomposed forms of aspirated letters (and U+0F69, “ka + reversed sha”) whenever possible, rather than implementing these consonants as completely decomposed stacks. Implementations must ensure that decomposed stacks and precomposed forms are interpreted equivalently (see Section 3.7, Decomposition). The compound consonants are explicitly coded as follows: U+0F93 tibetan subjoined letter gha, U+0F9D tibetan subjoined letter ddha, U+0FA2 tibetan subjoined letter dha, U+0FA7 tibetan subjoined letter bha, U+0FAC tibetan subjoined letter dzha, and U+0FB9 tibetan subjoined letter kssa. The vowel signs of Sanskrit not included in Tibetan are encoded with other vowel signs between U+0F70 and U+0F7D. U+0F7F tibetan sign rnam bcad (nam chay) is the visarga, and U+0F7E tibetan sign rjes su nga ro (ngaro) is the anusvara. See Section 9.1, Devanagari, for more information on these two characters. 
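Two points from the Tibetan discussion above can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and it relies on the character data shipped with Python's standard unicodedata module rather than on anything defined in this section. The first function evaluates a digit string whose last digit is a half-number form; the second compares a precomposed aspirated letter with its decomposed stack after canonical decomposition.

```python
import unicodedata


def tibetan_number_value(digits: str) -> float:
    """Evaluate Tibetan digits whose last digit may be a half-number form.

    Hypothetical helper: each character contributes its Unicode numeric
    value, so a half-number form contributes "digit minus 0.5".
    """
    value = 0.0
    for ch in digits:
        value = value * 10 + unicodedata.numeric(ch)
    return value


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if both strings have the same canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+0F24 (digit four) followed by U+0F2C (half three), as in the example above.
print(tibetan_number_value("\u0f24\u0f2c"))

# U+0F43 (precomposed gha) versus the stack U+0F42 ga + U+0FB7 subjoined ha.
print(canonically_equivalent("\u0f43", "\u0f42\u0fb7"))
```

Comparing decomposed forms is one simple way to treat precomposed letters and decomposed stacks alike in searching; a production system would more likely normalize text once on input.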
The characters encoded in the range U+0F88..U+0F8B are used in transliterated text and are most commonly found in Kalachakra literature. When the Tibetan script is used to transliterate Sanskrit, consonants are sometimes stacked in ways that are not allowed in native Tibetan stacks. Even complex forms of this stacking behavior are catered for properly by the method described earlier for coding Tibetan stacks.

Other Signs. U+0F09 tibetan mark bskur yig mgo is a list enumerator used at the beginning of administrative letters in Bhutan, as is the petition honorific U+0F0A tibetan mark bka- shog yig mgo. U+0F3A tibetan mark gug rtags gyon and U+0F3B tibetan mark gug rtags gyas are paired punctuation marks (brackets).

The sign U+0F39 tibetan mark tsa -phru (tsa-’phru, which is a lenition mark) is the ornamental flaglike mark that is an integral part of the three consonants U+0F59 tibetan letter tsa, U+0F5A tibetan letter tsha, and U+0F5B tibetan letter dza. Although those consonants are not decomposable, this mark has been abstracted and may by itself be applied to “pha” and other consonants to make new letters for use in transliteration and transcription of other languages. For example, in modern literary Tibetan, it is one of the
ways used to transcribe the Chinese “fa” and “va” sounds not represented by the normal Tibetan consonants. Tsa-’phru is also used to represent tsa, tsha, and dza in abbreviations. Traditional Text Formatting and Line Justification. Native Tibetan texts (“pecha”) are written and printed using a justification system that is, strictly speaking, right-ragged but with an attempt to right-justify. Each page has a margin. That margin is usually demarcated with visible border lines required of a pecha. In modern times, when Tibetan text is produced in Western-style books, the margin lines may be dropped and an invisible margin used. When writing the text within the margins, an attempt is made to have the lines of text justified up to the right margin. To do so, writers keep an eye on the overall line length as they fill lines with text and try manually to justify to the right margin. Even then, a gap at the right margin often cannot be filled. If the gap is short, it will be left as is and the line will be said to be justified enough, even though by machine-justification standards the line is not truly flush on the right. If the gap is large, the intervening space will be filled with as many tseks as are required to justify the line. Again, the justification is not done perfectly in the way that English text might be perfectly right-justified; as long as the last tsek is more or less at the right margin, that will do. The net result is that of a right-justified, blocklike look to the text, but the actual lines are always a little right-ragged. Justifying tseks are nearly always used to pad the end of a line when the preceding character is a tsek—in other words, when the end of a line arrives in the middle of tshig-grub (see the previous definition under “Tibetan Punctuation”). However, it is unusual for a line that ends at the end of a tshig-grub to have justifying tseks added to the shay at the end of the tshig-grub. 
That is, a sequence like that shown in the first line of Figure 10-2 is not usually padded as in the second line of Figure 10-2, though it is allowable. In this case, instead of justifying the line with tseks, the space between shays is enlarged and/or the whitespace following the final shay is usually left as is. Padding is never applied following an actual space character. For example, given the existence of a space after a shay, a line such as the third line of Figure 10-2 may not be written with the padding as shown because the final shay should have a space after it, and padding is never applied after spaces. The same rule applies where the final consonant of a tshig-grub that ends a line is a “ka” or “ga”. In that case, the ending shay is dropped but a space is still required after the consonant and that space must not be padded. For example, the appearance shown in the fourth line of Figure 10-2 is not acceptable.
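The padding conventions described above can be approximated in code. This is a minimal sketch under loud assumptions: line width is measured by counting characters (a real layout engine would measure glyph advances), the function name and the min_gap threshold are hypothetical, and U+0F0B is taken to be the tsek used for padding. Padding after a shay is described as allowable but unusual; this sketch simply never does it.

```python
TSEK = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG (assumed padding character)


def pad_with_tseks(line: str, target_len: int, min_gap: int = 2) -> str:
    """Pad a line with justifying tseks, following the conventions above.

    Hypothetical helper; lengths are counted in characters, which is only
    a stand-in for real measured line width.
    """
    gap = target_len - len(line)
    if gap < min_gap:
        # A short gap is left as is: the line is "justified enough".
        return line
    if not line.endswith(TSEK):
        # Lines ending in a shay or a space are not padded in this sketch.
        return line
    return line + TSEK * gap


ka = "\u0f40"
padded = pad_with_tseks(ka + TSEK, 6)
print(len(padded))
```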

Figure 10-2. Justifying Tibetan Tseks

[Figure: four numbered lines of Tibetan text (1 to 4) illustrating the tsek-padding cases described above.]

Tibetan text has two rules regarding the formatting of text at the beginning of a new line. There are severe constraints on which characters can start a new line, and the first rule is traditionally stated as follows: A shay of any description may never start a new line. Nothing except actual words of text can start a new line, with the only exception being a go-yig (yig-mgo) at the head of a front page or a da-tshe (zla-tshe, meaning “crescent moon”; for example, U+0F05) or one of its variations, which is effectively an “in-line” go-yig (yig-mgo), on any other line. One of two or three ornamental shays is also commonly used in short pieces of prose in place of the more formal da-tshe. This also means that a space may not start a new line in the flow of text. If there is a major break in a text, a new line might be indented.

A syllable (tsheg-bar) that comes at the end of a tshig-grub and that starts a new line must have the shay that would normally follow it replaced by a rin-chen-spungs-shad (U+0F11). The reason for this second rule is that the presence of the rin-chen-spungs-shad makes the end of tshig-grub more visible and hence makes the text easier to read.

In verse, the second shay following the first rin-chen-spungs-shad is sometimes replaced with a rin-chen-spungs-shad, though the practice is formally incorrect. It is a writer’s trick done to make a particular scribing of a text more elegant. Although a moderately popular device, it does break the rule. Not only is rin-chen-spungs-shad used as the replacement for the shay but a whole class of “ornamental shays” are used for the same purpose. All are scribal variants on a rin-chen-spungs-shad, which is correctly written with three dots above it.

Tibetan Shorthand Abbreviations (bskungs-yig) and Limitations of the Encoding. A consonant functioning as the word base (ming-gzhi) is allowed to take only one vowel sign according to Tibetan grammar. The Tibetan shorthand writing technique called bskungs-yig does allow one or more words to be contracted into a single, very unusual combination of consonants and vowels.
This construction frequently entails the application of more than one vowel sign to a single consonant or stack, and the composition of the stacks themselves can break the rules of normal Tibetan grammar. For this reason, vowel signs sometimes interact typographically, which accounts for their particular combining classes (see Section 4.3, Combining Classes—Normative). The Unicode Standard accounts for plain text compounds of Tibetan that contain at most one base consonant, any number of subjoined consonants, followed by any number of vowel signs. This coverage constitutes the vast majority of Tibetan text. Rarely, stacks are seen that contain more than one such consonant-vowel combination in a vertical arrangement. These stacks are highly unusual and are considered beyond the scope of plain text rendering. They may be handled by higher-level mechanisms.
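The plain text compound described above (one base consonant, any number of subjoined consonants, then any number of vowel signs) maps naturally onto a regular expression. The sketch below is illustrative: the function name is hypothetical, and the three character ranges are assumptions taken from the layout of the Tibetan block, not definitions given in this section. It requires exactly one base consonant for simplicity.

```python
import re

# Assumed character ranges (from the Tibetan block layout, not from this text).
BASE = "[\u0f40-\u0f47\u0f49-\u0f6a]"       # base consonant letters
SUBJOINED = "[\u0f90-\u0f97\u0f99-\u0fbc]"  # subjoined consonant letters
VOWEL = "[\u0f71-\u0f7d\u0f80\u0f81]"       # dependent vowel signs

# One base consonant, any number of subjoined consonants, any number of vowels.
STACK = re.compile(f"{BASE}{SUBJOINED}*{VOWEL}*")


def is_plain_text_stack(text: str) -> bool:
    """Return True if the whole string is a single stack of the shape above."""
    return STACK.fullmatch(text) is not None


# sa + subjoined ga + subjoined ra + vowel sign u
print(is_plain_text_stack("\u0f66\u0f92\u0fb2\u0f74"))
# Two base consonants in a row do not form one stack.
print(is_plain_text_stack("\u0f40\u0f41"))
```

Because the pattern allows several vowel signs, it also accepts the multi-vowel combinations that arise in shorthand; stacks with more than one consonant-vowel combination fall outside it, as the text says.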

10.3 Phags-pa

Phags-pa: U+A840–U+A87F

The Phags-pa script is an historic script with some limited modern use. It bears some similarity to Tibetan and has no case distinctions. It is written vertically in columns running

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 11

Southeast Asian Scripts

The following scripts are discussed in this chapter:

Thai
Lao
Myanmar
Khmer
Tai Le
New Tai Lue
Philippine scripts
Buginese
Balinese

The scripts of Southeast Asia are written from left to right; many use no interword spacing but use spaces or marks between phrases. They are mostly abugidas, but with various idiosyncrasies that distinguish them from the scripts of South Asia. The four Philippine scripts included here operate on similar principles; each uses nonspacing vowel signs. In addition, the Tagalog script has a virama. The term “Tai” refers to a family of languages spoken in Southeast Asia, including Thai, Lao, and Shan. This term is also part of the name of a number of scripts encoded in the Unicode Standard. The Tai Le script is used to write the language of the same name, which is spoken in south central Yunnan (China). The New Tai Lue script, also known as Xishuang Banna Dai, is unrelated to the Tai Le script, but is also used in south Yunnan. Buginese and Balinese are scripts of Indonesia, and both are ultimately related to scripts of South Asia. Buginese is used in Sulawesi; Balinese is used on the island of Bali.

11.1 Thai

Thai: U+0E00–U+0E7F

The Thai script is used to write Thai and other Southeast Asian languages, such as Kuy, Lanna Tai, and Pali. It is a member of the Indic family of scripts descended from Brahmi. Thai modifies the original Brahmi letter shapes and extends the number of letters to accommodate features of the Thai language, including tone marks derived from superscript digits. At the same time, the Thai script lacks the conjunct consonant mechanism and independent vowel letters found in most other Brahmi-derived scripts. As in all scripts of this family, the predominant writing direction is from left to right.

Standards. Thai layout in the Unicode Standard is based on the Thai Industrial Standard 620-2529, and its updated version 620-2533.

Encoding Principles. In common with most Brahmi-derived scripts, each Thai consonant letter represents a syllable possessing an inherent vowel sound. For Thai, that inherent vowel is /o/ in the medial position and /a/ in the final position.

The consonants are divided into classes that historically represented distinct sounds, but in modern Thai indicate tonal differences. The inherent vowel and tone of a syllable are then modified by addition of vowel signs and tone marks attached to the base consonant letter. Some of the vowel signs and all of the tone marks are rendered in the script as diacritics attached above or below the base consonant. These combining signs and marks are encoded after the modified consonant in the memory representation.

Most of the Thai vowel signs are rendered by full letter-sized inline glyphs placed either before (that is, to the left of), after (to the right of), or around (on both sides of) the glyph for the base consonant letter. In the Thai encoding, the letter-sized glyphs that are placed before (left of) the base consonant letter, in full or partial representation of a vowel sign, are, in fact, encoded as separate characters that are typed and stored before the base consonant character. This encoding for left-side Thai vowel sign glyphs (and similarly in Lao) differs from the conventions for all other Indic scripts, which uniformly encode all vowels after the base consonant. The difference is necessitated by the encoding practice commonly employed with Thai character data as represented by the Thai Industrial Standard.

The glyph positions for Thai syllables are summarized in Table 11-1.

Table 11-1. Glyph Positions in Thai Syllables

Syllable   Glyphs   Code Point Sequence
ka         กะ       0E01 0E30
ka:        กา       0E01 0E32
ki         กิ       0E01 0E34
ki:        กี       0E01 0E35
ku         กุ       0E01 0E38
ku:        กู       0E01 0E39
ku’        กึ       0E01 0E36
ku’:       กื       0E01 0E37
ke         เกะ      0E40 0E01 0E30
ke:        เก       0E40 0E01
kae        แกะ      0E41 0E01 0E30
kae:       แก       0E41 0E01
ko         โกะ      0E42 0E01 0E30
ko:        โก       0E42 0E01
ko’        เกาะ     0E40 0E01 0E32 0E30
ko’:       กอ       0E01 0E2D
koe        เกอะ     0E40 0E01 0E2D 0E30
koe:       เกอ      0E40 0E01 0E2D
kia        เกีย     0E40 0E01 0E35 0E22
ku’a       เกือ     0E40 0E01 0E37 0E2D
kua        กัว      0E01 0E31 0E27
kaw        เกา      0E40 0E01 0E32
koe:y      เกย      0E40 0E01 0E22
kay        ไก       0E44 0E01
kay        ใก       0E43 0E01
kam        กำ       0E01 0E33
kri        กฤ       0E01 0E24

Rendering of Thai Combining Marks. The combining classes assigned to tone marks (107) and to other combining characters displayed above (0) do not fully account for their typographic interaction. For the purpose of rendering, the Thai combining marks above (U+0E31, U+0E34..U+0E37, U+0E47..U+0E4E) should be displayed outward from the base character they modify, in the order in which they appear in the text. In particular, a sequence containing <mai ek, nikhahit> should be displayed with the nikhahit above the mai ek, and a sequence containing <nikhahit, mai ek> should be displayed with the mai ek above the nikhahit.

This does not preclude input processors from helping the user by pointing out or correcting typing mistakes, perhaps taking into account the language. For example, because the string <mai ek, nikhahit> is not useful for the Thai language and is likely a typing mistake, an input processor could reject it or correct it to <nikhahit, mai ek>.

When the character U+0E33 thai character sara am follows one or more tone marks (U+0E48..U+0E4B), the nikhahit that is part of the sara am should be displayed below those tone marks. In particular, a sequence containing <mai ek, sara am> should be displayed with the mai ek above the nikhahit.
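The interaction between combining classes and typing order can be explored with Python's standard unicodedata module. The sketch below is illustrative only: the correction function is a hypothetical example of the kind of input-side help mentioned above, not a prescribed behavior, and it handles just the one sequence discussed in the text.

```python
import unicodedata

KO_KAI = "\u0e01"    # THAI CHARACTER KO KAI
MAI_EK = "\u0e48"    # THAI CHARACTER MAI EK (tone mark)
NIKHAHIT = "\u0e4d"  # THAI CHARACTER NIKHAHIT
SARA_AA = "\u0e32"   # THAI CHARACTER SARA AA
SARA_AM = "\u0e33"   # THAI CHARACTER SARA AM


def correct_mai_ek_nikhahit(text: str) -> str:
    """Hypothetical input correction: <mai ek, nikhahit> to <nikhahit, mai ek>."""
    return text.replace(MAI_EK + NIKHAHIT, NIKHAHIT + MAI_EK)


# Canonical combining classes of the tone mark and of nikhahit.
print(unicodedata.combining(MAI_EK), unicodedata.combining(NIKHAHIT))

# Normalization does not swap the two marks, so a correction must be explicit.
typed = KO_KAI + MAI_EK + NIKHAHIT
print(unicodedata.normalize("NFC", typed) == typed)
print(correct_mai_ek_nikhahit(typed) == KO_KAI + NIKHAHIT + MAI_EK)

# Sara am has a compatibility decomposition into nikhahit + sara aa.
print(unicodedata.normalize("NFKD", SARA_AM) == NIKHAHIT + SARA_AA)
```

Whether to reject or silently correct such input is a design decision for the input method; the text above allows either.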

Thai Punctuation. Thai uses a variety of punctuation marks particular to this script. U+0E4F thai character fongman is the Thai bullet, which is used to mark items in lists or appears at the beginning of a verse, sentence, paragraph, or other textual segment. U+0E46 thai character maiyamok is used to mark repetition of preceding letters. U+0E2F thai character paiyannoi is used to indicate elision or abbreviation of letters; it is itself viewed as a kind of letter, however, and is used with considerable frequency because of its appearance in such words as the Thai name for Bangkok. Paiyannoi is also used in combination (U+0E2F U+0E25 U+0E2F) to create a construct called paiyanyai, which means “et cetera, and so forth.” The Thai paiyanyai is comparable to its analogue in the Khmer script: U+17D8 khmer sign beyyal. U+0E5A thai character angkhankhu is used to mark the end of a long segment of text. It can be combined with a following U+0E30 thai character sara a to mark a larger segment of text; typically this usage can be seen at the end of a verse in poetry. U+0E5B thai character khomut marks the end of a chapter or document, where it always follows the angkhankhu + sara a combination. The Thai angkhankhu and its combination with sara a to mark breaks in text have analogues in many other Brahmi-derived scripts. For example, they are closely related to U+17D4 khmer sign khan and U+17D5 khmer sign bariyoosan, which are themselves ultimately related to the danda and double danda of Devanagari. Thai words are not separated by spaces. Instead, text is laid out with spaces introduced at text segments where Western typography would typically make use of commas or periods. However, Latin-based punctuation such as comma, period, and colon are also used in text, particularly in conjunction with Latin letters or in formatting numbers, addresses, and so forth. 
If word boundary indications are desired—for example, for the use of automatic line layout algorithms—the character U+200B zero width space should be used to place invisible marks for such breaks. The zero width space can grow to have a visible width when justified. See Figure 16-2. Thai Transcription of Pali and Sanskrit. The Thai script is frequently used to write Pali and Sanskrit. When so used, consonant clusters are represented by the explicit use of U+0E3A thai character phinthu (virama) to mark the removal of the inherent vowel. There is no conjoining behavior, unlike in other Indic scripts. U+0E4D thai character nikhahit is the Pali nigghahita and Sanskrit anusvara. U+0E30 thai character sara a is the Sanskrit visarga. U+0E24 thai character ru and U+0E26 thai character lu are vocalic /r/ and /l/, with U+0E45 thai character lakkhangyao used to indicate their lengthening.
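As a small illustration of marking word boundaries with U+200B, the Python sketch below joins already-segmented words with zero width spaces and recovers them again. It assumes the word list is supplied by the caller (finding Thai word boundaries needs a dictionary or similar resource, which is out of scope here), and all function names are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def mark_word_boundaries(words: list[str]) -> str:
    """Join pre-segmented words with invisible break opportunities."""
    return ZWSP.join(words)


def visible_text(text: str) -> str:
    """Strip the invisible marks, for example before comparing strings."""
    return text.replace(ZWSP, "")


def split_at_boundaries(text: str) -> list[str]:
    """Recover the segments between the marked break opportunities."""
    return text.split(ZWSP)


marked = mark_word_boundaries(["ภาษา", "ไทย"])
print(len(marked), len(visible_text(marked)))
print(split_at_boundaries(marked))
```

Search and comparison code usually needs something like visible_text, because the marks are part of the stored text even though they are invisible.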

11.2 Lao

Lao: U+0E80–U+0EFF

The Lao language and script are closely related to Thai. The Unicode Standard encodes the characters of the Lao script in the same relative order as the Thai characters.

Encoding Principles. Lao contains fewer letters than Thai because by 1960 it was simplified to be fairly phonemic, whereas Thai maintains many etymological spellings that are homonyms. Unlike in Thai, Lao consonant letters are conceived of as simply representing the consonant sound, rather than a syllable with an inherent vowel. The vowel [a] is always represented explicitly with U+0EB0 lao vowel sign a. Punctuation. Regular word spacing is not used in Lao; spaces separate phrases or sentences instead. Glyph Placement. The glyph placements for Lao syllables are summarized in Table 11-2.

Table 11-2. Glyph Positions in Lao Syllables

Syllable   Glyphs        Code Point Sequence
ka         ກະ            0E81 0EB0
ka:        ກາ            0E81 0EB2
ki         ກິ            0E81 0EB4
ki:        ກີ            0E81 0EB5
ku         ກຸ            0E81 0EB8
ku:        ກູ            0E81 0EB9
ku’        ກຶ            0E81 0EB6
ku’:       ກື            0E81 0EB7
ke         ເກະ           0EC0 0E81 0EB0
ke:        ເກ            0EC0 0E81
kae        ແກະ           0EC1 0E81 0EB0
kae:       ແກ            0EC1 0E81
ko         ໂກະ           0EC2 0E81 0EB0
ko:        ໂກ            0EC2 0E81
ko’        ເກາະ          0EC0 0E81 0EB2 0EB0
ko’:       ກໍ            0E81 0ECD
koe        ເກິ           0EC0 0E81 0EB4
koe:       ເກີ           0EC0 0E81 0EB5
kia        ເກັຽ, ເກຢ     0EC0 0E81 0EB1 0EBD, 0EC0 0E81 0EA2
ku’a       ເກືອ          0EC0 0E81 0EB7 0EAD
kua        ກົວ           0E81 0EBB 0EA7
kaw        ເກົາ          0EC0 0E81 0EBB 0EB2
koe:y      ເກີຽ, ເກີຢ    0EC0 0E81 0EB5 0EBD, 0EC0 0E81 0EB5 0EA2
kay        ໄກ            0EC4 0E81
kay        ໃກ            0EC3 0E81
kam        ກຳ            0E81 0EB3

Additional Letters. A few additional letters in Lao have no match in Thai:

U+0EBB lao vowel sign mai kon
U+0EBC lao semivowel sign lo
U+0EBD lao semivowel sign nyo

The preceding two semivowel signs are the last remnants of the system of subscript medials, which in Myanmar retains additional distinctions. Myanmar and Khmer include a full set of subscript consonant forms used for conjuncts. Thai no longer uses any of these forms; Lao has just the two.

Rendering of Lao Combining Marks. The combining classes assigned to tone marks (122) and to other combining characters displayed above (0) do not fully account for their typographic interaction. For the purpose of rendering, the Lao combining marks above (U+0EB1, U+0EB4..U+0EB7, U+0EBB, U+0EC8..U+0ECD) should be displayed outward from the base character they modify, in the order in which they appear in the text. In particular, a sequence containing <mai ek, niggahita> should be displayed with the niggahita above the mai ek, and a sequence containing <niggahita, mai ek> should be displayed with the mai ek above the niggahita.

This does not preclude input processors from helping the user by pointing out or correcting typing mistakes, perhaps taking into account the language. For example, because the string <mai ek, niggahita> is not useful for the Lao language and is likely a typing mistake, an input processor could reject it or correct it to <niggahita, mai ek>.

When the character U+0EB3 lao vowel sign am follows one or more tone marks (U+0EC8..U+0ECB), the niggahita that is part of the sara am should be displayed below those tone marks. In particular, a sequence containing <mai ek, vowel sign am> should be displayed with the mai ek above the niggahita.

Lao Aspirated Nasals. The Unicode character encoding includes two ligatures for Lao: U+0EDC lao ho no and U+0EDD lao ho mo. They correspond to sequences of [h] plus [n] or [h] plus [m] without ligating. Their function in Lao is to provide versions of the [n] and [m] consonants with a different inherent tonal implication.
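The correspondence between the two Lao ligatures and their unligated sequences is recorded in the character data as a compatibility decomposition, which can be inspected from Python. This is a hedged sketch: the helper names are hypothetical, and the output depends on the Unicode data bundled with the Python build in use.

```python
import unicodedata

HO_NO = "\u0edc"      # LAO HO NO
HO_MO = "\u0edd"      # LAO HO MO
MAI_EK = "\u0ec8"     # LAO TONE MAI EK
NIGGAHITA = "\u0ecd"  # LAO NIGGAHITA


def unligated(ch: str) -> str:
    """Return the compatibility decomposition of a character (hypothetical helper)."""
    return unicodedata.normalize("NFKD", ch)


def code_points(text: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


print(code_points(unligated(HO_NO)))
print(code_points(unligated(HO_MO)))

# Combining classes of the Lao tone mark and of niggahita.
print(unicodedata.combining(MAI_EK), unicodedata.combining(NIGGAHITA))
```

Folding the ligatures with NFKD is useful for loose matching, but it discards the distinction the ligatures were encoded to make, so it should not be applied to stored text without care.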

11.3 Myanmar

Myanmar: U+1000–U+109F

The Myanmar script is used to write Burmese, the majority language of Myanmar (formerly called Burma). Variations and extensions of the script are used to write other languages of the region, such as Shan and Mon, as well as Pali and Sanskrit. The Myanmar script was formerly known as the Burmese script, but the term “Myanmar” is now preferred.

The Myanmar writing system derives from a Brahmi-related script borrowed from South India in about the eighth century to write the Mon language. The first inscription in the Myanmar script dates from the eleventh century and uses an alphabet almost identical to that of the Mon inscriptions. Aside from rounding of the originally square characters, this script has remained largely unchanged to the present. It is said that the rounder forms were developed to permit writing on palm leaves without tearing the writing surface of the leaf.

Because of its Brahmi origins, the Myanmar script shares the structural features of its Indic relatives: consonant symbols include an inherent “a” vowel; various signs are attached to a consonant to indicate a different vowel; ligatures and conjuncts are used to indicate consonant clusters; and the overall writing direction is from left to right. Thus, despite great differences in appearance and detail, the Myanmar script follows the same basic principles as, for example, Devanagari.

Standards. There is not yet an official national standard for the encoding of Myanmar/Burmese. The current encoding was prepared with the consultation of experts from the Myanmar Information Technology Standardization Committee (MITSC) in Yangon (Rangoon). The MITSC, formed by the government in 1997, consists of experts from the Myanmar Computer Scientists’ Association, Myanmar Language Commission, and Myanmar Historical Commission.

Encoding Principles. As with Indic scripts, the Myanmar encoding represents only the basic underlying characters; multiple glyphs and rendering transformations are required to assemble the final visual form for each syllable. Even some single characters, such as U+102C myanmar vowel sign aa, may assume variant forms depending on the other characters with which they combine. Conversely, characters and combinations that may appear visually identical in some fonts, such as U+101D myanmar letter wa and U+1040 myanmar digit zero, are distinguished by their underlying encoding.

Composite Characters. As is the case in many other scripts, some Myanmar letters or signs may be analyzed as composites of two or more other characters and are not encoded separately. The following are examples of Myanmar letters represented by combining character sequences:

myanmar vowel sign o: U+1000 ka + U+1031 vowel sign e + U+102C vowel sign aa

myanmar vowel sign au: U+1000 ka + U+1031 vowel sign e + U+102C vowel sign aa + U+1039 virama + U+200C → kau

myanmar vowel sign ui: U+1000 ka + U+102F vowel sign u + U+102D vowel sign i → kui

Encoding Subranges. The basic consonants, independent vowels, and dependent vowel signs required for writing the Myanmar language are encoded at the beginning of the Myanmar range. Extensions of each of these categories for use in writing other languages, such as Pali and Sanskrit, are appended at the end of the range. In between these two sets lie the script-specific signs, punctuation, and digits.

Conjunct and Medial Consonants. As in other Indic-derived scripts, conjunction of two consonant letters is indicated by the insertion of a virama, U+1039 myanmar sign virama, between them. It causes ligation or other rendered combination of the consonants, although the virama itself is not rendered visibly.

The conjunct form of U+1004 myanmar letter nga is rendered as a superscript sign called kinzi. Kinzi is encoded in logical order as a conjunct consonant before the syllable to which it applies; this is similar to the treatment of the Devanagari ra. (See Section 9.1, Devanagari, rule R2.) For example, kinzi applied to U+1000 myanmar letter ka would be written via the following sequence:

U+1004 nga + U+1039 virama + U+1000 ka

The Myanmar script traditionally distinguishes a set of subscript “medial” consonants: forms of ya, ra, wa, and ha that are considered to be modifiers of the syllable’s vowel. Graphically, these medial consonants are sometimes written as subscripts, but sometimes, as in the case of ra, they surround the base consonant instead. In the Myanmar encoding, the medial consonants are treated as conjuncts; that is, they are coded using the virama. For example, the word krwe [kjwei] (“to drop off”) would be written via the following sequence:

U+1000 ka + U+1039 virama + U+101B ra + U+1039 virama + U+101D wa + U+1031 vowel sign e → krwe

Explicit Virama. The virama U+1039 myanmar sign virama also participates in some common constructions where it appears as a visible sign, commonly termed killer. In this usage, where it appears as a visible diacritic, U+1039 is followed by a U+200C zero width non-joiner, as with Devanagari (see Figure 9-3).
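The sequences described above can be assembled mechanically. The following Python sketch is only an illustration of the Version 5.0 model as this section describes it (later versions of the standard revised the Myanmar encoding, so these sequences should not be assumed to apply there). The helper names `kinzi`, `with_medials`, and `visible_killer` are hypothetical and chosen for this example.

```python
# Code points as given in the text above (Unicode 5.0 Myanmar model).
KA, NGA, RA, WA = "\u1000", "\u1004", "\u101B", "\u101D"
VOWEL_SIGN_E = "\u1031"
VIRAMA = "\u1039"
ZWNJ = "\u200C"  # zero width non-joiner


def kinzi(base: str) -> str:
    """Kinzi is stored before the consonant it applies to: nga + virama + base."""
    return NGA + VIRAMA + base


def with_medials(base: str, *medials: str) -> str:
    """Medial consonants are coded as conjuncts: each one is virama + consonant."""
    return base + "".join(VIRAMA + m for m in medials)


def visible_killer() -> str:
    """A visibly rendered virama is U+1039 followed by U+200C."""
    return VIRAMA + ZWNJ


# The word krwe from the text: ka + virama + ra + virama + wa + vowel sign e.
krwe = with_medials(KA, RA, WA) + VOWEL_SIGN_E
print(" ".join(f"U+{ord(c):04X}" for c in krwe))
```

Note that vowel sign e is appended last in storage even though its glyph is displayed to the left of the consonant.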

Ordering of Syllable Components. Dependent vowels and other signs are encoded after the consonant to which they apply, except for kinzi, which precedes the consonant. Characters occur in the relative order shown in Table 11-3.

Table 11-3. Myanmar Syllabic Structure

Name                    Encoding
kinzi                   U+1004 + U+1039
consonant               [U+1000..U+1021]
subscript consonant     U+1039 + consonant
medial ya               U+1039 + U+101A
medial ra               U+1039 + U+101B
medial wa               U+1039 + U+101D
medial ha               U+1039 + U+101F
vowel sign e            U+1031
vowel sign u, uu        [U+102F, U+1030]
vowel sign i, ii, ai    [U+102D, U+102E, U+1032]
vowel sign aa           U+102C
anusvara                U+1036
atha (killer)           U+1039 + U+200C
dot below               U+1037
visarga                 U+1038

U+1031 myanmar vowel sign e is encoded after its consonant (as in the earlier example), although in visual presentation its glyph appears before (to the left of) the consonant form.

Spacing. Myanmar does not use any whitespace between words. If word boundary indications are desired (for example, for the use of automatic line layout algorithms), the character U+200B zero width space should be used to place invisible marks for such breaks. The zero width space can grow to have a visible width when justified.
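As a small illustration of the spacing convention, the following Python sketch joins already segmented words with U+200B and recovers them again. It assumes the word segmentation has been done elsewhere (finding word boundaries in Myanmar text is a separate problem that this sketch does not address); the function names are hypothetical.

```python
ZWSP = "\u200B"  # zero width space


def mark_word_breaks(words):
    """Join pre-segmented words with invisible break opportunities."""
    return ZWSP.join(words)


def split_words(text):
    """Recover the words from text marked with zero width spaces."""
    return text.split(ZWSP)


# Two sample syllables built from sequences shown in this section.
words = ["\u1000\u1031", "\u1000\u102F\u102D"]
marked = mark_word_breaks(words)
print(len(marked), split_words(marked) == words)
```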

11.4 Khmer

Khmer: U+1780–U+17FF

Khmer, also known as Cambodian, is the official language of the Kingdom of Cambodia. Mutually intelligible dialects are also spoken in northeastern Thailand and in the Mekong Delta region of Vietnam. Although Khmer is not an Indo-European language, it has borrowed much vocabulary from Sanskrit and Pali, and religious texts in those languages have been both transliterated and translated into Khmer. The Khmer script is also used to render a number of regional minority languages, such as Tampuan, Krung, and Cham.

The Khmer script, called aksaa khmae (“Khmer letters”), is also the official script of Cambodia. It is descended from the Brahmi script of South India, as are Thai, Lao, Myanmar, Old Mon, and others. The exact sources have not been determined, but there is a great similarity between the earliest inscriptions in the region and the Pallawa script of the Coromandel coast of India. Khmer has been a unique and independent script for more than 1,400 years.

Modern Khmer has two basic styles of script: the aksaa crieng (“slanted script”) and the aksaa muul (“round script”). There is no fundamental structural difference between the two. The slanted script (in its “standing” variant) is chosen as representative in Chapter 17, Code Charts.

Principles of the Khmer Script

Structurally, the Khmer script has many features in common with other Brahmi-derived scripts, such as Devanagari and Myanmar. Consonant characters bear an inherent vowel sound, with additional signs placed before, above, below, and/or after the consonants to indicate a vowel other than the inherent one. The overall writing direction is left to right. In comparison with the Devanagari script, explained in detail in Section 9.1, Devanagari, the Khmer script has developed several distinctive features during its evolution.

Glottal Consonant. The Khmer script has a consonant character for a glottal stop (qa) that bears an inherent vowel sound and can have an optional vowel sign. While Khmer also has independent vowel characters like Devanagari, as shown in Table 11-4, in principle many of its sounds can be represented by using qa and a vowel sign. This does not mean these representations are always interchangeable in real words. Some words are written with one variant to the exclusion of others.

Subscript Consonants. Subscript consonant signs differ from independent consonant characters and are called coeng (literally, “foot, leg”) after their subscript position. While a consonant character can constitute an orthographic syllable by itself, a subscript consonant sign cannot. Note that U+17A1 khmer letter la does not have a corresponding subscript consonant sign in standard Khmer, but does have a subscript in the Khmer script used in Thailand.

Table 11-4. Independent Khmer Vowel Characters

Name    Independent Vowel
i       U+17A5
ii      U+17A6
u       U+17A7
uk      U+17A8
uu      U+17A9
uuv     U+17AA
ry      U+17AB
ryy     U+17AC
ly      U+17AD
lyy     U+17AE
e       U+17AF
ai      U+17B0
oo      U+17B1, U+17B2
au      U+17B3

Subscript consonant signs are used to represent any consonant following the first consonant in an orthographic syllable. They also have an inherent vowel sound, which may be suppressed if the syllable bears a vowel sign or another subscript consonant. The subscript consonant signs are often used to represent a consonant cluster. Two consecutive consonant characters cannot represent a consonant cluster because the inherent vowel sound in between is retained. To suppress the vowel, a subscript consonant sign (or rarely a subscript independent vowel) replaces the second consonant character. Theoretically, any consonant cluster composed of any number of consonant sounds without inherent vowel sounds in between can be represented systematically by a consonant character and as many subscript consonant signs as necessary. Examples of subscript consonant signs for a consonant cluster follow:

lo + coeng + ngo [lŋɔː] “sesame” (compare lo + ngo [lɔːŋ] “to haunt”)

lo + ka + coeng + sa + coeng + mo + ii [lɛəksmei] “beauty, luck”

ka + aa + ha + coeng + vo + e [kaːfeː] “coffee”

The subscript consonant signs in the Khmer script can be used to denote a final consonant, although this practice is uncommon.

Examples of subscript consonant signs for a closing consonant follow:

to + aa + nikahit + coeng + ngo [tɛəŋ] “both” (= to + aa + nikahit + ngo) (≠ *[tŋɔəm])

ha + oe + coeng + yo [haəi] “already” (= ha + oe + yo) (≠ *[hyaə])

While these subscript consonant signs are usually attached to a consonant character, they can also be attached to an independent vowel character. Although this practice is relatively rare, it is used in one very common word, meaning “to give.” Examples of subscript consonant signs attached to an independent vowel character follow:

qoo-1 + coeng + yo [ʔaoi] “to give” (= qoo-1 + yo, and also qoo-2 + coeng + yo)

qoo-1 + coeng + mo [ʔaom] “exclamation of solemn affirmation” (= qoo-1 + mo)

Subscript Independent Vowel Signs. Some independent vowel characters also have corresponding subscript independent vowel signs, although these are rarely used today. Examples of subscript independent vowel signs follow:

pha + coeng + qe + mo [pʰʔaem] “sweet” (= pha + coeng + qa + ae + mo)

ha + coeng + ry + to + samyok sannya + yo [harotey] “heart” (royal) (= ha + ry + to + samyok sannya + yo)

Consonant Registers. The Khmer language has a richer set of vowels than the languages for which the ancestral script was used, although it has a smaller set of consonant sounds. The Khmer script takes advantage of this situation by assigning different characters to represent the same consonant using different inherent vowels. Khmer consonant characters and signs are organized into two series or registers, whose inherent vowels are nominally -a in the first register and -o in the second register, as shown in Table 11-5. The register of a consonant character is generally reflected on the last letter of its transliterated name. Some consonant characters and signs have a counterpart whose consonant sound is the same but whose register is different, as ka and ko in the first row of the table.

For the other consonant characters and signs, two “shifter” signs are available. U+17C9 khmer sign muusikatoan converts a consonant character and sign from the second to the first register, while U+17CA khmer sign triisap converts a consonant from the first register to the second (rows 2–4). To represent pa, however, muusikatoan is attached not to po but to ba, in an exceptional use (row 5). The phonetic value of a dependent vowel sign may also change depending on the context of the consonant(s) to which it is attached (row 6).

Encoding Principles. Like other related scripts, the Khmer encoding represents only the basic underlying characters; multiple glyphs and rendering transformations are required to assemble the final visual form for each orthographic syllable. Individual characters, such as U+1789 khmer letter nyo, may assume variant forms depending on the other characters with which they combine.

Table 11-5. Two Registers of Khmer Consonants

Row   First Register                                 Second Register
1     ka [kɑː] “neck”                                ko [kɔː] “mute”
2
3
4     ba + ka [bɑːk] “to return”
5     ba + muusikatoan + mo [pɑːm] “blockhouse”      po + mo [pɔːm] “to put into the mouth”
6     ka + u + ro [koː] “to stir”                    ko + u + ro [kuː] “to sketch”
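The observation that a consonant's register is generally reflected in the last letter of its name suggests a simple heuristic. The sketch below applies it to the formal character names exposed by Python's standard `unicodedata` module; it is an approximation only, since the text says "generally" and the formal names do not always follow preferred Khmer practice. The function name `register_from_name` is hypothetical.

```python
import unicodedata

MUUSIKATOAN = "\u17C9"  # shifts a consonant from the second to the first register
TRIISAP = "\u17CA"      # shifts a consonant from the first to the second register


def register_from_name(ch):
    """Guess the register (1 or 2) from the last letter of the character name.

    Returns None when the character is not a Khmer consonant letter or the
    name ends in neither A nor O. This is a heuristic, not a property lookup.
    """
    name = unicodedata.name(ch, "")
    if not name.startswith("KHMER LETTER "):
        return None
    return {"A": 1, "O": 2}.get(name[-1])


print(register_from_name("\u1780"))  # KHMER LETTER KA
print(register_from_name("\u1782"))  # KHMER LETTER KO

# Row 5 of the table: pa is written ba + muusikatoan, here followed by mo.
blockhouse = "\u1794" + MUUSIKATOAN + "\u1798"
```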

Subscript Consonant Signs. In the way that many Cambodians analyze Khmer today, subscript consonant signs are considered to be different entities from consonant characters. The Unicode Standard does not assign independent code points for the subscript consonant signs. Instead, each of these signs is represented by the sequence of two characters: a special control character (U+17D2 khmer sign coeng) and a corresponding consonant character. This is analogous to the virama model employed for representing conjuncts in other related scripts. Subscripted independent vowels are encoded in the same manner.

Because the coeng sign character does not exist as a letter or sign in the Khmer script, the Unicode model departs from the ordinary way that Khmer is conceived of and taught to native Khmer speakers. Consequently, the encoding may not be intuitive to a native user of the Khmer writing system, although it is able to represent Khmer correctly.

U+17D2 khmer sign coeng is not actually a coeng but a coeng generator, because coeng in Khmer refers to the subscript consonant sign. The glyph for U+17D2 khmer sign coeng shown in the code charts is arbitrary and is not actually rendered directly; the dotted box around the glyph indicates that special rendering is required. To aid Khmer script users, a listing of typical Khmer subscript consonant letters has been provided in Table 11-6 together with their descriptive names following preferred Khmer practice.

While the Unicode encoding represents both the subscripts and the combined vowel letters with a pair of code points, they should be treated as a unit for most processing purposes. In other words, the sequence functions as if it had been encoded as a single character. A number of independent vowels also have subscript forms, as shown in Table 11-8.
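A minimal Python sketch of the two-character model follows. It builds a subscript sign as coeng plus a letter and splits a string so that each such pair stays together as one processing unit. The function names are hypothetical, and real text processing would need to handle more than this (vowel signs, shifters, and so on are left as single-character units here).

```python
COENG = "\u17D2"  # khmer sign coeng, the "coeng generator"


def subscript(letter):
    """Represent a subscript sign as the pair coeng + base letter."""
    return COENG + letter


def units(text):
    """Split text into units, keeping each coeng + letter pair together."""
    result = []
    i = 0
    while i < len(text):
        if text[i] == COENG and i + 1 < len(text):
            result.append(text[i:i + 2])
            i += 2
        else:
            result.append(text[i])
            i += 1
    return result


LO, NGO = "\u179B", "\u1784"
sesame = LO + subscript(NGO)  # lo + coeng + ngo
print([" ".join(f"U+{ord(c):04X}" for c in u) for u in units(sesame)])
```

The word is three code points long but yields two units: the base consonant and the subscript sign.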

Table 11-6. Khmer Subscript Consonant Signs

Code         Name
17D2 1780    khmer consonant sign coeng ka
17D2 1781    khmer consonant sign coeng kha
17D2 1782    khmer consonant sign coeng ko
17D2 1783    khmer consonant sign coeng kho
17D2 1784    khmer consonant sign coeng ngo
17D2 1785    khmer consonant sign coeng ca
17D2 1786    khmer consonant sign coeng cha
17D2 1787    khmer consonant sign coeng co
17D2 1788    khmer consonant sign coeng cho
17D2 1789    khmer consonant sign coeng nyo
17D2 178A    khmer consonant sign coeng da
17D2 178B    khmer consonant sign coeng ttha
17D2 178C    khmer consonant sign coeng do
17D2 178D    khmer consonant sign coeng ttho
17D2 178E    khmer consonant sign coeng na
17D2 178F    khmer consonant sign coeng ta
17D2 1790    khmer consonant sign coeng tha
17D2 1791    khmer consonant sign coeng to
17D2 1792    khmer consonant sign coeng tho
17D2 1793    khmer consonant sign coeng no
17D2 1794    khmer consonant sign coeng ba
17D2 1795    khmer consonant sign coeng pha
17D2 1796    khmer consonant sign coeng po
17D2 1797    khmer consonant sign coeng pho
17D2 1798    khmer consonant sign coeng mo
17D2 1799    khmer consonant sign coeng yo
17D2 179A    khmer consonant sign coeng ro
17D2 179B    khmer consonant sign coeng lo
17D2 179C    khmer consonant sign coeng vo
17D2 179D    khmer consonant sign coeng sha
17D2 179E    khmer consonant sign coeng ssa
17D2 179F    khmer consonant sign coeng sa
17D2 17A0    khmer consonant sign coeng ha
17D2 17A1    khmer consonant sign coeng la
17D2 17A2    khmer vowel sign coeng qa

As noted earlier, the sequence U+17D2 U+17A1 represents a subscript form of la that is not used in Cambodia, although it is employed in Thailand.

Dependent Vowel Signs. Most of the Khmer dependent vowel signs are represented with a single character that is applied after the base consonant character and optional subscript consonant signs. Three of these Khmer vowel signs are not encoded as single characters in the Unicode Standard. The vowel sign am is encoded as a nasalization sign, U+17C6 khmer sign nikahit. Two vowel signs, om and aam, have not been assigned independent code points. They are represented by the sequence of a vowel (U+17BB khmer vowel sign u and U+17B6 khmer vowel sign aa, respectively) and U+17C6 khmer sign nikahit.

The nikahit is superficially similar to anusvara, the nasalization sign in the Devanagari script, although in Khmer it is usually regarded as a vowel sign am. Anusvara not only represents a special nasal sound, but also can be used in place of one of the five nasal consonants homorganic to the subsequent consonant (velar, palatal, retroflex, dental, or labial, respectively). Anusvara can be used concurrently with any vowel sign in the same orthographic syllable.

Nikahit, in contrast, functions differently. Its final sound is [m], irrespective of the type of the subsequent consonant. It is not used concurrently with the vowels ii, e, ua, oe, oo, and so on, although it is used with the vowel signs aa and u. In these cases the combination is sometimes regarded as a unit: aam and om, respectively. The sound that aam represents is [ɔəm], not [aːm]. The sequences used for these combinations are shown in Table 11-7.

Table 11-7. Khmer Composite Dependent Vowel Signs with Nikahit

Code         Name
17BB 17C6    khmer vowel sign om
17B6 17C6    khmer vowel sign aam

Examples of dependent vowel signs ending with [m] follow:

da + nikahit [dɑm] “to pound” (compare da + mo [dɑːm] “nectar”)

po + aa + nikahit [pɔəm] “to carry in the beak” (compare po + aa + mo [piəm] “mouth of a river”)
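The composite vowel signs can be recognized by looking at the character just before a final nikahit. The Python sketch below classifies the ending of a syllable as am, om, or aam using the sequences from Table 11-7. The function name `final_vowel_sign` is hypothetical, and the sketch assumes nikahit is the last character of the syllable being examined.

```python
NIKAHIT = "\u17C6"
VOWEL_SIGN_U = "\u17BB"
VOWEL_SIGN_AA = "\u17B6"

# Two-character sequences that are sometimes regarded as single vowel signs.
COMPOSITES = {
    VOWEL_SIGN_U + NIKAHIT: "om",
    VOWEL_SIGN_AA + NIKAHIT: "aam",
}


def final_vowel_sign(syllable):
    """Return 'om', 'aam', or 'am' for a syllable ending in nikahit, else None."""
    if not syllable.endswith(NIKAHIT):
        return None
    return COMPOSITES.get(syllable[-2:], "am")


DA, PO, MO = "\u178A", "\u1796", "\u1798"
print(final_vowel_sign(DA + NIKAHIT))                  # da + nikahit
print(final_vowel_sign(PO + VOWEL_SIGN_AA + NIKAHIT))  # po + aa + nikahit
print(final_vowel_sign(DA + MO))                       # da + mo, no nikahit
```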


Chapter 12

East Asian Scripts

This chapter presents the following scripts:

Han

Hiragana

Hangul

Bopomofo

Katakana

Yi

The characters that are now called East Asian ideographs, and known as Han ideographs in the Unicode Standard, were developed in China in the second millennium BCE. The basic system of writing Chinese using ideographs has not changed since that time, although the set of ideographs used, their specific shapes, and the technologies involved have developed over the centuries. The encoding of Chinese ideographs in the Unicode Standard is described in Section 12.1, Han.

As civilizations developed surrounding China, they frequently adapted China’s ideographs for writing their own languages. Japan, Korea, and Vietnam all borrowed and modified Chinese ideographs for their own languages. Chinese is an isolating language, monosyllabic and noninflecting, and ideographic writing suits it well. As Han ideographs were adopted for unrelated languages, however, extensive modifications were required.

Chinese ideographs were originally used to write Japanese, for which they are, in fact, ill suited. As an adaptation, the Japanese developed two syllabaries, hiragana and katakana, whose shapes are simplified or stylized versions of certain ideographs. (See Section 12.4, Hiragana and Katakana.) Chinese ideographs are called kanji in Japanese and are still used, in combination with hiragana and katakana, in modern Japanese.

In Korea, Chinese ideographs were originally used to write Korean, for which they are also ill suited. The Koreans developed an alphabetic system, Hangul, discussed in Section 12.6, Hangul. The shapes of Hangul syllables or the letter-like jamos from which they are composed are not directly influenced by Chinese ideographs. However, the individual jamos are grouped into syllabic blocks that resemble ideographs both visually and in the relationship they have to the spoken language (one syllable per block). Chinese ideographs are called hanja in Korean and are still used together with Hangul in South Korea for modern Korean.

The Unicode Standard includes a complete set of Korean Hangul syllables as well as the individual jamos, which can also be used to write Korean. Section 3.12, Conjoining Jamo Behavior, describes how to use the conjoining jamos and how to convert between the two methods for representing Korean.
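The two representations of Korean can be observed with Unicode normalization. The Python sketch below uses the standard library's `unicodedata.normalize`: canonical decomposition (NFD) turns a precomposed Hangul syllable into conjoining jamos, and canonical composition (NFC) turns them back. This is offered as one convenient way to see the conversion, not as a restatement of the procedure in Section 3.12.

```python
import unicodedata

syllable = "\uD55C"  # a single precomposed Hangul syllable

# Decompose into conjoining jamos (leading consonant, vowel, trailing consonant).
jamos = unicodedata.normalize("NFD", syllable)
print([f"U+{ord(c):04X}" for c in jamos])

# Recompose the jamo sequence into the precomposed syllable.
recomposed = unicodedata.normalize("NFC", jamos)
print(recomposed == syllable)
```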

In Vietnam, a set of native ideographs was created for Vietnamese based on the same principles used to create new ideographs for Chinese. These Vietnamese ideographs were used through the beginning of the twentieth century and are occasionally used in more recent signage and other limited contexts.

Yi was originally written using a set of ideographs invented in imitation of the Chinese. Modern Yi as encoded in the Unicode Standard is a syllabary derived from these ideographs and is discussed in Section 12.7, Yi.

Bopomofo, discussed in Section 12.3, Bopomofo, is another recently invented syllabic system, used to represent Chinese phonetics.

In all these East Asian scripts, the characters (Chinese ideographs, Japanese kana, Korean Hangul syllables, and Yi syllables) are written within uniformly sized rectangles, usually squares. Traditionally, the basic writing direction followed the conventions of Chinese handwriting, in top-down vertical lines arranged from right to left across the page. Under the influence of Western printing technologies, a horizontal, left-to-right directionality has become common, and proportional fonts are seeing increased use, particularly in Japan. Horizontal, right-to-left text is also found on occasion, usually for shorter texts such as inscriptions or store signs.

Diacritical marks are rarely used, although phonetic annotations are not uncommon. Older editions of the Chinese classics sometimes use the ideographic tone marks (U+302A..U+302D) to indicate unusual pronunciations of characters.

Many older character sets include characters intended to simplify the implementation of East Asian scripts, such as variant punctuation forms for text written vertically, halfwidth forms (which occupy only half a rectangle), and fullwidth forms (which allow Latin letters to occupy a full rectangle). These characters are included in the Unicode Standard for compatibility with older standards.

Appendix E, Han Unification History, describes how the diverse typographic traditions of mainland China, Taiwan, Japan, Korea, and Vietnam have been reconciled to provide a common set of ideographs in the Unicode Standard for all these languages and regions.
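The halfwidth and fullwidth compatibility forms mentioned above can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only: it reports the East Asian width property of two sample characters and shows that compatibility normalization (NFKC) maps them to their ordinary counterparts. The choice of sample characters is an assumption made for the example.

```python
import unicodedata

fullwidth_a = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A
halfwidth_ka = "\uFF76"  # HALFWIDTH KATAKANA LETTER KA

# East Asian width property: "F" is fullwidth, "H" is halfwidth.
print(unicodedata.east_asian_width(fullwidth_a))
print(unicodedata.east_asian_width(halfwidth_ka))

# Compatibility normalization folds both forms to the ordinary characters.
folded_a = unicodedata.normalize("NFKC", fullwidth_a)
folded_ka = unicodedata.normalize("NFKC", halfwidth_ka)
print(f"U+{ord(folded_a):04X}", f"U+{ord(folded_ka):04X}")
```

Because this folding discards the width distinction, it is unsuitable where round-trip conversion to an older character set is required.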

12.1 Han

CJK Unified Ideographs

The Unicode Standard contains a set of unified Han ideographic characters used in the written Chinese, Japanese, and Korean languages.1 The term Han, derived from the Chinese Han Dynasty, refers generally to Chinese traditional culture. The Han ideographic characters make up a coherent script, which was traditionally written vertically, with the vertical lines ordered from right to left. In modern usage, especially in technical works and in computer-rendered text, the Han script is written horizontally from left to right and is freely mixed with Latin or other scripts. When used in writing Japanese or Korean, the Han characters are interspersed with other scripts unique to those languages (Hiragana and Katakana for Japanese; Hangul syllables for Korean).

1. Although the term “CJK” (Chinese, Japanese, and Korean) is used throughout this text to describe the languages that currently use Han ideographic characters, it should be noted that earlier Vietnamese writing systems were based on Han ideographs. Consequently, the term “CJKV” would be more accurate in a historical sense. Han ideographs are still used for historical, religious, and pedagogical purposes in Vietnam.

The term “Han ideographic characters” is used within the Unicode Standard as a common term traditionally used in Western texts, although “sinogram” is preferred by professional linguists. Taken literally, the word “ideograph” applies only to some of the ancient original character forms, which indeed arose as ideographic depictions. The vast majority of Han characters were developed later via composition, borrowing, and other non-ideographic principles, but the term “Han ideographs” remains in English usage as a conventional cover term for the script as a whole.

The Han ideographic characters constitute a very large set, numbering in the tens of thousands. They have a long history of use in East Asia. Enormous compendia of Han ideographic characters exist because of a continuous, millennia-long scholarly tradition of collecting all Han character citations, including variant, mistaken, and nonce forms, into annotated character dictionaries.

Because of the large size of the Han ideographic character repertoire, and because of the particular problems that the characters pose for standardizing their encoding, this character block description is more extended than that for other scripts and is divided into subsections. The first two subsections, “CJK Standards” and “Blocks Containing Han Ideographs,” describe the character set standards used as sources and the way in which the Unicode Standard divides Han ideographs into blocks. These subsections are followed by an extended discussion of the characteristics of Han characters, with particular attention being paid to the problem of unification of encoding for characters used for different languages. There is a formal statement of the principles behind the Unified Han character encoding adopted in the Unicode Standard and the order of its arrangement. For a detailed account of the background and history of development of the Unified Han character encoding, see Appendix E, Han Unification History.

CJK Standards

The Unicode Standard draws its unified Han character repertoire of 70,229 characters from a number of character set standards. These standards are grouped into seven initial sources, as indicated in Table 12-1. The primary work of unifying and ordering the characters from these sources was done by the Ideographic Rapporteur Group (IRG), a subgroup of ISO/IEC JTC1/SC2/WG2.

The G, T, J, K, KP, and V sources represent the characters submitted to the IRG by its member bodies. The G source consists of submissions from mainland China, the Hong Kong SAR, and Singapore. The other five sources are the submissions from Taiwan, Japan, South and North Korea, and Vietnam, respectively. The U source represents character set standards that were not submitted to the IRG by any member body but that were used by the Unicode Consortium. For each of the IRG sources, the table contains an abbreviated source name in the second column and a descriptive source name in the third column. The abbreviated names are used in various data files published by the Unicode Consortium and ISO/IEC to identify the specific IRG sources.

Table 12-1. Initial Sources for Unified Han

G source:
  G0    GB 2312-80
  G1    GB 12345-90 with 58 Hong Kong and 92 Korean “Idu” characters
  G3    GB 7589-87 unsimplified forms
  G5    GB 7590-87 unsimplified forms
  G7    General Purpose Hanzi List for Modern Chinese Language, and General List of Simplified Hanzi
  GS    Singapore Characters
  G8    GB 8565-88
  GE    GB 16500-95
T source:
  T1    CNS 11643-1992 1st plane
  T2    CNS 11643-1992 2nd plane
  T3    CNS 11643-1992 3rd plane with some additional characters
  T4    CNS 11643-1992 4th plane
  T5    CNS 11643-1992 5th plane
  T6    CNS 11643-1992 6th plane
  T7    CNS 11643-1992 7th plane
  TF    CNS 11643-1992 15th plane
J source:
  J0    JIS X 0208-1990
  J1    JIS X 0212-1990
  JA    Unified Japanese IT Vendors Contemporary Ideographs, 1993
K source:
  K0    KS C 5601-1987 (unique ideographs)
  K1    KS C 5657-1991
  K2    PKS C 5700-1 1994
  K3    PKS C 5700-2 1994
KP source:
  KP0   KPS 9566-97
  KP1   KPS 10721-2000
V source:
  V0    TCVN 5773:1993
  V1    TCVN 6056:1995
U source:
        KS C 5601-1987 (duplicate ideographs)
        ANSI Z39.64-1989 (EACC)
        Big-5 (Taiwan)
        CCCII, level 1
        GB 12052-89 (Korean)
        JEF (Fujitsu)
        PRC Telegraph Code
        Taiwan Telegraph Code (CCDC)
        Xerox Chinese
        Han Character Shapes Permitted for Personal Names (Japan)
        IBM Selected Japanese and Korean Ideographs

In some cases, the entire ideographic repertoire of the original character set standards was not included in the corresponding source. Three reasons explain this decision:

1. Where the repertoires of two of the character set standards within a single source have considerable overlap, the characters in the overlap might be included only once in the source. This approach is used, for example, with GB 2312-80 and GB 12345-90, which have many ideographs in common. Characters in GB 12345-90 that are duplicates of characters in GB 2312-80 are not included in the G source.

2. Where a character set standard is based on unification rules that differ substantially from those used by the IRG, many variant characters found in the character set standard will not be included in the source. This situation is the case with CNS 11643-1992, EACC, and CCCII. It is the only case where full roundtrip compatibility with the Han ideograph repertoire of the relevant character set standards is not guaranteed.

3. KS C 5601-1987 contains numerous duplicate ideographs included because they have multiple pronunciations in Korean. These multiply encoded ideographs are not included in the K source but are included in the U source. They are encoded in the CJK Compatibility Ideographs block to provide full roundtrip compatibility with KS C 5601-1987 (now known as KS X 1001:1998).
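The practical consequence of the third point can be seen with normalization. In the Python sketch below, a character from the CJK Compatibility Ideographs block is replaced by its unified counterpart under canonical normalization (here NFC, via the standard `unicodedata` module). The sample character U+F900 is chosen for illustration; a process that needs round-trip fidelity with a legacy standard would presumably have to avoid normalizing such text, which is an inference rather than a statement from this section.

```python
import unicodedata

compat = "\uF900"  # a character in the CJK Compatibility Ideographs block

# Canonical normalization replaces the compatibility ideograph
# with the corresponding unified ideograph.
unified = unicodedata.normalize("NFC", compat)

print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")
print(unicodedata.name(compat))
```

After normalization the distinction that the duplicate encoding preserved is no longer present in the text.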

Blocks Containing Han Ideographs

Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2.

Table 12-2. Blocks Containing Han Ideographs

Block                                     Range          Comment
CJK Unified Ideographs                    4E00-9FFF      Common
CJK Unified Ideographs Extension A        3400-4DBF      Rare
CJK Unified Ideographs Extension B        20000-2A6DF    Rare, historic
CJK Compatibility Ideographs              F900-FAFF      Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement   2F800-2FA1F    Unifiable variants
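As a rough illustration of how the five blocks might be consulted in code, the following Python sketch maps a code point to its block name. The names HAN_BLOCKS and han_block are hypothetical. The ranges are the Unicode 5.0 block ranges for the five blocks named in Table 12-2 (Extension A ends at U+4DBF; U+4DC0..U+4DFF belongs to Yijing Hexagram Symbols), and later versions of the standard add further blocks, so the table is best kept as replaceable data.

```python
from typing import Optional

# Block ranges for the five blocks in Table 12-2, as of Unicode 5.0.
# Assumption: kept as data so that it can be regenerated for later versions.
HAN_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]


def han_block(cp: int) -> Optional[str]:
    """Return the name of the Han block containing cp, or None."""
    for first, last, name in HAN_BLOCKS:
        if first <= cp <= last:
            return name
    return None
```

Note that a block range also covers reserved code points, so a non-None result does not by itself mean that the code point is an assigned character.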

Characters in the three unified ideographs blocks are defined by the IRG, based on Han unification principles explained later in this section. The two compatibility ideographs blocks contain various duplicate or unifiable variant characters encoded for round-trip compatibility with various legacy standards. The initial repertoire of the CJK Unified Ideographs block contains characters submitted to the IRG prior to 1992, consisting of commonly used characters. That initial repertoire was


derived entirely from the G, T, J, and K sources. It has subsequently been extended with small sets of unified ideographs or ideographic components needed for interoperability with the HKSCS standard (U+9FA6..U+9FB3) and with the GB 18030 standard (U+9FB4..U+9FBB). Characters in the CJK Unified Ideographs Extension A block are rare and are not unifiable with characters in the CJK Unified Ideographs block. They were submitted to the IRG during 1992–1998 and are derived entirely from the G, T, J, K, and V sources. The CJK Unified Ideographs Extension B block contains rare and historic characters that are also not unifiable with characters in the CJK Unified Ideographs block. They were submitted to the IRG during 1998–2002 and are derived from a long list of additional sources, including major dictionaries, as documented in Table 12-8. The only principled difference in the unification work done by the IRG on the three unified ideograph blocks is that the Source Separation Rule (rule R1) was applied only to the original CJK Unified Ideographs block and not to the two extension blocks. The Source Separation Rule states that ideographs that are distinctly encoded in a source must not be unified. (For further discussion, see “Principles of Han Unification” later in this section.) The three unified ideograph blocks are not closed repertoires. Each contains a small range of reserved code points at the end of the block. Additional unified ideographs may eventually be encoded in those ranges—as has already occurred in the CJK Unified Ideographs block itself. There is no guarantee that any such Han ideographic additions would be of the same types or from the same sources as preexisting characters in the block, and implementations should be careful not to make hard-coded assumptions regarding the range of assignments within the Han ideographic blocks in general. Unifiable Han characters unique to the U source are found in the CJK Compatibility Ideographs block. 
There are 12 of these characters: U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28, and U+FA29. The remaining characters in the CJK Compatibility Ideographs block and the CJK Compatibility Ideographs Supplement block are either duplicates or unifiable variants of a character in one of the blocks of unified ideographs.

IICore. IICore (International Ideograph Core) is an important set of Han ideographs, incorporating characters from all the defined blocks. This set of nearly 10,000 characters has been developed by the IRG and represents the set of characters in everyday use throughout East Asia. By covering the characters in IICore, developers guarantee that they can handle all the needs of almost all of their customers. This coverage is of particular use on devices such as cell phones or PDAs, which have relatively stringent resource limitations. Characters in IICore are explicitly tagged as such in the Unihan Database (see “Unihan Database” in Section 4.1, Unicode Character Database).
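The 12 unified ideographs in the CJK Compatibility Ideographs block are an easy case to get wrong when code relies on block ranges alone. A minimal Python sketch of the distinction follows. The names UNIFIED_IN_COMPAT_BLOCK and is_compatibility_variant are hypothetical, and the function tests ranges only, so it does not check whether a code point is actually assigned.

```python
# The 12 unified ideographs encoded in the CJK Compatibility Ideographs
# block, as listed in the text.
UNIFIED_IN_COMPAT_BLOCK = frozenset({
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
})


def is_compatibility_variant(cp: int) -> bool:
    """True if cp lies in a compatibility ideographs block and is not
    one of the 12 unified ideographs encoded there."""
    in_compat_block = 0xF900 <= cp <= 0xFAFF or 0x2F800 <= cp <= 0x2FA1F
    return in_compat_block and cp not in UNIFIED_IN_COMPAT_BLOCK
```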


General Characteristics of Han Ideographs

The authoritative Japanese dictionary Koujien defines Han characters to be characters that originated among the Chinese to write the Chinese language. They are now used in China, Japan, and Korea. They are logographic (each character represents a word, not just a sound) characters that developed from pictographic and ideographic principles. They are also used phonetically. In Japan they are generally called kanji (Han, that is, Chinese, characters) including the “national characters” (kokuji) such as touge (mountain pass), which have been created using the same principles. They are also called mana (true names, as opposed to kana, false or borrowed names).2

For many centuries, written Chinese was the accepted written standard throughout East Asia. The influence of the Chinese language and its written form on the modern East Asian languages is similar to the influence of Latin on the vocabulary and written forms of languages in the West. This influence is immediately visible in the mixture of Han characters and native phonetic scripts (kana in Japan, hangul in Korea) as now used in the orthographies of Japan and Korea (see Table 12-3).

Table 12-3. Common Han Characters

Han Character   Chinese   Japanese      Korean   English Translation
天              tiān      ten, ame      chen     heaven, sky
地              dì        chi, tsuchi   ci       earth, ground
人              rén       jin, hito     in       man, person
山              shān      san, yama     san      mountain
水              shuǐ      sui, mizu     swu      water
上              shàng     jou, ue       sang     above
下              xià       ka, shita     ha       below

The evolution of character shapes and semantic drift over the centuries has resulted in changes to the original forms and meanings. For example, the Chinese character 湯 tāng (Japanese tou or yu, Korean thang), which originally meant “hot water,” has come to mean “soup” in Chinese. “Hot water” remains the primary meaning in Japanese and Korean, whereas “soup” appears in more recent borrowings from Chinese, such as “soup noodles” (Japanese tanmen; Korean thangmyen). Still, the identical appearance and similarities in meaning are dramatic and more than justify the concept of a unified Han script that transcends language.

2. Lee Collins’ translation from the Japanese, Koujien, Izuru, Shinmura, ed. (Tokyo: Iwanami Shoten, 1983).

The “nationality” of the Han characters became an issue only when each country began to create coded character sets (for example, China’s GB 2312-80, Japan’s JIS X 0208-1978, and Korea’s KS C 5601-87) based on purely local needs. This problem appears to have arisen more from the priority placed on local requirements and the lack of coordination with other countries than from conscious design. Nevertheless, the identity of the Han characters is fundamentally independent of language, as shown by dictionary definitions, vocabulary lists, and encoding standards.

Terminology. Several standard romanizations of the term used to refer to East Asian ideographic characters are commonly used. They include hànzì (Chinese), kanzi (Japanese), kanji (colloquial Japanese), hanja (Korean), and chữ hán (Vietnamese). The standard English translations for these terms are interchangeable: Han character, Han ideographic character, East Asian ideographic character, or CJK ideographic character. For the purpose of clarity, the Unicode Standard uses some subset of the English terms when referring to these characters. The term Kanzi is used in reference to a specific Japanese government publication. The unrelated term KangXi (which is a Chinese reign name, rather than another romanization of “Han character”) is used only when referring to the primary dictionary used for determining Han character arrangement in the Unicode Standard. (See Table 12-7.)

Distinguishing Han Character Usage Between Languages. There is some concern that unifying the Han characters may lead to confusion because they are sometimes used differently by the various East Asian languages.
Computationally, Han character unification presents no more difficulty than employing a single Latin character set that is used to write languages as different as English and French. Programmers do not expect the characters “c”, “h”, “a”, and “t” alone to tell us whether chat is a French word for cat or an English word meaning “informal talk.” Likewise, we depend on context to identify the American hood (of a car) with the British bonnet. Few computer users are confused by the fact that ASCII can also be used to represent such words as the Welsh word ynghyd, which are strange looking to English eyes. Although it would be convenient to identify words by language for programs such as spell-checkers, it is neither practical nor productive to encode a separate Latin character set for every language that uses it. Similarly, the Han characters are often combined to “spell” words whose meaning may not be evident from the constituent characters. For example, the two characters “to cut” and “hand” mean “postage stamp” in Japanese, but the compound may appear to be nonsense to a speaker of Chinese or Korean (see Figure 12-1).

Figure 12-1. Han Spelling [切 (to cut) + 手 (hand) = 切手: 1. Japanese “stamp”; 2. Chinese “cut hand”]


Even within one language, a computer requires context to distinguish the meanings of words represented by coded characters. The word chuugoku in Japanese, for example, may refer to China or to a district in central west Honshuu (see Figure 12-2).

Figure 12-2. Semantic Context for Han Characters [中 (middle) + 国 (country) = 中国: 1. China; 2. Chuugoku district of Honshuu]

Coding these two characters as four so as to capture this distinction would probably cause more confusion and still not provide a general solution. The Unicode Standard leaves the issues of language tagging and word recognition up to a higher level of software and does not attempt to encode the language of the Han characters.

Simplified and Traditional Chinese. There are currently two main varieties of written Chinese: “simplified Chinese” (jiǎntǐzì), used in most parts of the People’s Republic of China (PRC) and Singapore, and “traditional Chinese” (fántǐzì), used predominantly in the Hong Kong and Macao SARs, Taiwan, and overseas Chinese communities. The process of interconverting between the two is a complex one. This complexity arises largely because a single simplified form may correspond to multiple traditional forms, such as U+53F0 台, which is a traditional character in its own right and the simplified form for U+6AAF 檯, U+81FA 臺, and U+98B1 颱. Moreover, vocabulary differences have arisen between Mandarin as spoken in Taiwan and Mandarin as spoken in the PRC, the most notable of which is the usual name of the language itself: guóyǔ (the National Language) in Taiwan and pǔtōnghuà (the Common Speech) in the PRC. Merely converting the character content of a text from simplified Chinese to the appropriate traditional counterpart is insufficient to change a simplified Chinese document to traditional Chinese, or vice versa. (The vast majority of Chinese characters are the same in both simplified and traditional Chinese.)

There are two PRC national standards, GB 2312-80 and GB 12345-90, which are intended to represent simplified and traditional Chinese, respectively. The character repertoires of the two are the same, but the simplified forms occur in GB 2312-80 and the traditional ones in GB 12345-90. These are both part of the IRG G source, with traditional forms and simplified forms separated where they differ.
As a result, the Unicode Standard contains a number of distinct simplifications for characters, such as U+8AAC 說 and U+8BF4 说. While there are lists of official simplifications published by the PRC, most of these are obtained by applying a few general principles to specific areas. In particular, there is a set of radicals (such as U+2F94 ⾔ KANGXI RADICAL SPEECH, U+2F99 ⾙ KANGXI RADICAL SHELL, U+2FA8 ⾨ KANGXI RADICAL GATE, and U+2FC3 ⿃ KANGXI RADICAL BIRD) for which simplifications exist (U+2EC8 ⻈ CJK RADICAL C-SIMPLIFIED SPEECH, U+2EC9 ⻉ CJK RADICAL C-SIMPLIFIED SHELL, U+2ED4 ⻔ CJK RADICAL C-SIMPLIFIED GATE, and U+2EE6 ⻦ CJK RADICAL C-SIMPLIFIED BIRD). The basic technique for simplifying a character containing one of these radicals is to substitute the simplified radical, as in the previous example.
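The one-to-many relationship described above can be made concrete with a small Python sketch. It uses only the two examples given in the text (U+53F0 and U+8BF4); the names TRADITIONAL_CANDIDATES, traditional_candidates, and is_ambiguous are hypothetical, and a usable converter would need a complete mapping table plus word-level context, which this sketch does not attempt.

```python
from typing import List

# Simplified code point -> possible traditional code points.
# Only the two examples from the text are included (an illustrative
# assumption, not a complete table).
TRADITIONAL_CANDIDATES = {
    0x53F0: [0x53F0, 0x6AAF, 0x81FA, 0x98B1],
    0x8BF4: [0x8AAC],
}


def traditional_candidates(ch: str) -> List[str]:
    """Return every traditional form that ch may correspond to."""
    return [chr(cp) for cp in TRADITIONAL_CANDIDATES.get(ord(ch), [ord(ch)])]


def is_ambiguous(text: str) -> bool:
    """True if any character has more than one traditional candidate,
    so that character-by-character conversion cannot be decided."""
    return any(len(traditional_candidates(ch)) > 1 for ch in text)
```

A character absent from the table is returned unchanged, which reflects the observation that most characters are the same in both varieties.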


The Unicode Standard does not explicitly encode all simplified forms for traditional Chinese characters. Where the simplified and traditional forms exist as different encoded characters, each should be used as appropriate. The Unicode Standard does not specify how to represent a new simplified form (or, more rarely, a new traditional form) that can be derived algorithmically from an encoded traditional form (simplified form).

Dialects of Chinese. Chinese is not a single language, but a complex of spoken forms that share a single written form. Although these spoken forms are referred to as dialects, they are actually mutually unintelligible and distinct languages. Virtually all modern written Chinese is Mandarin, the dominant language in both the PRC and Taiwan. Speakers of other Chinese languages learn to read and write Mandarin, although they pronounce it using the rules of their own language. (This would be like having Spanish children read and write only French, but pronouncing it as if it were Spanish.) The major non-Mandarin Chinese languages are Cantonese (spoken in the Hong Kong and Macao SARs, in many overseas Chinese communities, and in much of Guangdong province), Wu, Min, Hakka, Gan, and Xiang.

Prior to the twentieth century, the standard form of written Chinese was literary Chinese, a form derived from the classical Chinese written, but probably not spoken, by Confucius in the sixth century BCE. The ideographic repertoire of the Unicode Standard is sufficient for all but the most specialized texts of modern Chinese, literary Chinese, and classical Chinese. Preclassical Chinese, written using seal forms or oracle bone forms, has not been systematically incorporated into the Unicode Standard.

Of Chinese languages, Cantonese is occasionally found in printed materials; the others are almost never seen in printed form.
There is less standardization for the ideographic repertoires of these languages, and no fully systematic effort has been undertaken to catalog the nonstandard ideographs they use. Because of efforts on the part of the government of the Hong Kong SAR, however, the current ideographic repertoire of the Unicode Standard should be adequate for many—but not all—written Cantonese texts. Sorting Han Ideographs. The Unicode Standard does not define a method by which ideographic characters are sorted; the requirements for sorting differ by locale and application. Possible collating sequences include phonetic, radical-stroke (KangXi, Xinhua Zidian, and so on), four-corner, and total stroke count. Raw character codes alone are seldom sufficient to achieve a usable ordering in any of these schemes; ancillary data are usually required. (See Table 12-7.) Character Glyphs. In form, Han characters are monospaced. Every character takes the same vertical and horizontal space, regardless of how simple or complex its particular form is. This practice follows from the long history of printing and typographical practice in China, which traditionally placed each character in a square cell. When written vertically, there are also a number of named cursive styles for Han characters, but the cursive forms of the characters tend to be quite idiosyncratic and are not implemented in general-purpose Han character fonts for computers.


There may be a wide variation in the glyphs used in different countries and for different applications. The most commonly used typefaces in one country may not be used in others. The types of glyphs used to depict characters in the Han ideographic repertoire of the Unicode Standard have been constrained by available fonts. Users are advised to consult authoritative sources for the appropriate glyphs for individual markets and applications. It is assumed that most Unicode implementations will provide users with the ability to select the font (or mixture of fonts) that is most appropriate for a given locale.

Principles of Han Unification

Three-Dimensional Conceptual Model. To develop the explicit rules for unification, a conceptual framework was developed to model the nature of Han ideographic characters. This model expresses written elements in terms of three primary attributes: semantic (meaning, function), abstract shape (general form), and actual shape (instantiated, typeface form). These attributes are graphically represented in three dimensions according to the X, Y, and Z axes (see Figure 12-3).

Figure 12-3. Three-Dimensional Conceptual Model [axes: X (semantic), Y (abstract shape), Z (typeface)]

The semantic attribute (represented along the X axis) distinguishes characters by meaning and usage. Distinctions are made between entirely unrelated characters, such as the ideographs for “marsh” and “machine” (機), as well as extensions or borrowings beyond the original semantic cluster, such as 机₁ (a phonetic borrowing used as a simplified form of 機) and 机₂ (table, the original meaning).


The abstract shape attribute (the Y axis) distinguishes the variant forms of a single character with a single semantic attribute (that is, a character with a single position on the X axis). The actual shape (typeface) attribute (the Z axis) is for differences of type design (the actual shape used in imaging) of each variant form. Only characters that have the same abstract shape (that is, occupy a single point on the X and Y axes) are potential candidates for unification. Z-axis typeface and stylistic differences are generally ignored.

Unification Rules

The following rules were applied during the process of merging Han characters from the different source character sets.

R1 Source Separation Rule. If two ideographs are distinct in a primary source standard, then they are not unified.

• This rule is sometimes called the round-trip rule because its goal is to facilitate a round-trip conversion of character data between an IRG source standard and the Unicode Standard without loss of information.

• This rule was applied only for the work on the original CJK Unified Ideographs block [also known as the Unified Repertoire and Ordering (URO)]. The IRG dropped this rule in 1992 and will not use it in future work.

Figure 12-4 illustrates six variants of the CJK ideograph meaning “sword.”

Figure 12-4. CJK Source Separation [six variants of the ideograph meaning “sword”]

Each of the six variants in Figure 12-4 is separately encoded in one of the primary source standards, in this case J0 (JIS X 0208-1990), as shown in Table 12-4.

Table 12-4. Source Encoding for Sword Variants

Unicode   JIS
U+5263    J0-3775
U+528D    J0-5178
U+5271    J0-517B
U+5294    J0-5179
U+5292    J0-517A
U+91FC    J0-6E5F
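The round-trip goal behind the Source Separation Rule can be checked mechanically. The Python sketch below encodes Table 12-4 as a dictionary and tests that no two J0 codes share a Unicode target. As a cross-check it also pushes each J0 code through Python's built-in euc_jp codec, on the assumption that the codec's JIS X 0208 mapping agrees with the table. The helper names are hypothetical.

```python
# Table 12-4: J0 (JIS X 0208) code -> Unicode code point.
SWORD_J0_TO_UNICODE = {
    0x3775: 0x5263,
    0x5178: 0x528D,
    0x517B: 0x5271,
    0x5179: 0x5294,
    0x517A: 0x5292,
    0x6E5F: 0x91FC,
}


def is_round_trip_safe(mapping: dict) -> bool:
    """A mapping converts back without loss only if it is one-to-one."""
    return len(set(mapping.values())) == len(mapping)


def j0_to_euc_jp(code: int) -> bytes:
    """EUC-JP represents a JIS X 0208 code by setting the high bit of
    each of its two bytes."""
    return bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80])


def round_trip(code: int) -> int:
    """J0 code -> Unicode string -> J0 code, via the euc_jp codec."""
    raw = j0_to_euc_jp(code).decode("euc_jp").encode("euc_jp")
    return ((raw[0] & 0x7F) << 8) | (raw[1] & 0x7F)
```

Had the fourth and fifth variants been unified, two J0 codes would share one Unicode target and is_round_trip_safe would return False for the merged mapping.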


Because the six sword characters are historically related, they are not subject to disunification by the Noncognate Rule (R2) and thus would ordinarily have been considered for possible abstract shape-based unification by R3. Under that rule, the fourth and fifth variants would probably have been unified for encoding. However, the Source Separation Rule required that all six variants be separately encoded, precluding them from any consideration of shape-based unification. Further variants of the “sword” ideograph, U+5251 and U+528E, are also separately encoded because of application of the Source Separation Rule, in that case applied to one or more Chinese primary source standards rather than to the J0 Japanese primary source standard.

R2 Noncognate Rule. In general, if two ideographs are unrelated in historical derivation (noncognate characters), then they are not unified. For example, the ideographs in Figure 12-5, although visually quite similar, are nevertheless not unified because they are historically unrelated and have distinct meanings.

Figure 12-5. Not Cognates, Not Unified [土 (earth) ≠ 士 (warrior, scholar)]

R3 By means of a two-level classification (described next), the abstract shape of each ideograph is determined. Any two ideographs that possess the same abstract shape are then unified provided that their unification is not disallowed by either the Source Separation Rule or the Noncognate Rule.
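The interaction of rules R1 through R3 can be summarized as a small decision function. The Python sketch below is an illustration only: the Ideograph record, its cognate_group and abstract_shape labels, and the function should_unify are all hypothetical, and the real determinations were made by the IRG through editorial review, not by a program. The shared shape label for the two sword variants is an assumption based on the statement that they would probably have been unified.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Ideograph:
    code_point: int
    sources: Dict[str, str]  # primary source standard -> code in that standard
    cognate_group: str       # hypothetical label for historical derivation
    abstract_shape: str      # hypothetical label from the two-level classification


def should_unify(a: Ideograph, b: Ideograph, source_separation: bool = True) -> bool:
    # R1: ideographs that are distinct in a shared primary source stay distinct.
    if source_separation:
        for source, code in a.sources.items():
            if source in b.sources and b.sources[source] != code:
                return False
    # R2: noncognate ideographs are not unified.
    if a.cognate_group != b.cognate_group:
        return False
    # R3: unify only when the abstract shapes are the same.
    return a.abstract_shape == b.abstract_shape


# Fourth and fifth sword variants from Table 12-4.
sword_4 = Ideograph(0x5294, {"J0": "5179"}, "sword", "shape-A")
sword_5 = Ideograph(0x5292, {"J0": "517A"}, "sword", "shape-A")
# The noncognate pair from Figure 12-5.
earth = Ideograph(0x571F, {}, "earth", "shape-B")
scholar = Ideograph(0x58EB, {}, "scholar", "shape-C")
```

Calling should_unify(sword_4, sword_5) returns False because of R1, whereas passing source_separation=False, as for the extension blocks, returns True.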

Abstract Shape

Two-Level Classification. Using the three-dimensional model, characters are analyzed in a two-level classification. The two-level classification distinguishes characters by abstract shape (Y axis) and actual shape of a particular typeface (Z axis). Variant forms are identified based on the difference of abstract shapes. To determine differences in abstract shape and actual shape, the structure and features of each component of an ideograph are analyzed as follows.

Ideographic Component Structure. The component structure of each ideograph is examined. A component is a geometrical combination of primitive elements. Various ideographs can be configured with these components used in conjunction with other components. Some components can be combined to make a component more complicated in its structure. Therefore, an ideograph can be defined as a component tree with the entire ideograph as the root node and with the bottom nodes consisting of primitive elements (see Figure 12-6 and Figure 12-7).

Ideograph Features. The following features of each ideograph to be compared are examined:


• Number of components
• Relative positions of components in each complete ideograph
• Structure of a corresponding component
• Treatment in a source character set
• Radical contained in a component

Figure 12-6. Ideographic Component Structure

Figure 12-7. The Most Superior Node of an Ideographic Component

Uniqueness or Unification. If one or more of these features are different between the ideographs compared, the ideographs are considered to have different abstract shapes and, therefore, are considered unique characters and are not unified. If all of these features are identical between the ideographs, the ideographs are considered to have the same abstract shape and are unified.

Spatial Positioning. Ideographs may exist as a unit or may be a component of more complex ideographs. A source standard may describe a requirement for a component with a specific spatial positioning that would be otherwise unified on the principle of having the same abstract shape as an existing full ideograph. Examples of spatial positioning for ideographic components are left half, top half, and so on.

Examples. Table 12-5 gives examples of some typical differences in abstract character shape, resulting in decisions not to unify characters. Also included in the table are all three instances of disunification based on distinctions in spatial positioning. Differences in the actual shapes of ideographs that have been unified are illustrated in Table 12-6.
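The component tree and the first three features in the list above (number of components, relative positions, and structure of corresponding components) lend themselves to a recursive comparison. The Python sketch below is one possible reading, not the IRG's procedure: the Component class, the position labels, and same_abstract_shape are hypothetical, and treatment in a source character set and radical assignment are left out. The example pair, U+5CF0 and U+5CEF, is chosen here for illustration; both combine "mountain" with the same phonetic component, side by side in one and stacked in the other.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str                 # hypothetical label for a component or primitive element
    position: str = "whole"   # e.g. "left", "right", "top", "bottom"
    children: List["Component"] = field(default_factory=list)


def same_abstract_shape(a: Component, b: Component) -> bool:
    """Compare two component trees feature by feature."""
    if not a.children and not b.children:
        return a.name == b.name                  # same primitive element
    if len(a.children) != len(b.children):
        return False                             # different number of components
    return all(
        x.position == y.position                 # same relative positions
        and same_abstract_shape(x, y)            # same component structure
        for x, y in zip(a.children, b.children)
    )


peak_side = Component("U+5CF0", children=[
    Component("mountain", "left"), Component("feng", "right")])
peak_top = Component("U+5CEF", children=[
    Component("mountain", "top"), Component("feng", "bottom")])
```

Here same_abstract_shape(peak_side, peak_top) is False because the same two components occupy different relative positions.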

Han Ideograph Arrangement

The arrangement of the Unicode Han characters is based on the positions of characters as they are listed in four major dictionaries. The KangXi Zidian was chosen as primary


Table 12-5. Ideographs Not Unified [example character pairs omitted]

Reason
Different number of components
Same number of components placed in different relative positions
Same number and same relative position of components, corresponding components structured differently
Characters treated differently in a source character set
Characters with different radical in a component
Same abstract shape, different actual shape
Same abstract shape, different position (U+9FBB versus U+470C)
Same abstract shape, different position (U+9FB9 versus U+20509)
Same abstract shape, different position (U+9FBA versus U+2099D)

Table 12-6. Ideographs Unified [example character pairs omitted]

Reason
Different writing sequence
Differences in overshoot at the stroke termination
Differences in contact of strokes
Differences in protrusion at the folded corner of strokes
Differences in bent strokes
Differences in stroke termination
Differences in accent at the stroke initiation
Difference in rooftop modification
Difference in rotated strokes/dots (a)

a. These ideographs (having the same abstract shape) would have been unified except for the Source Separation Rule.

because it contains most of the source characters and because the dictionary itself and the principles of character ordering it employs are commonly used throughout East Asia. The Han ideograph arrangement follows the index (page and position) of the dictionaries listed in Table 12-7 with their priorities. When a character is found in the KangXi Zidian, it follows the KangXi Zidian order. When it is not found in the KangXi Zidian and it is found in Dai Kan-Wa Jiten, it is given a position extrapolated from the KangXi position of the preceding character in Dai Kan-Wa Jiten.


Table 12-7. Han Ideograph Arrangement

Priority   Dictionary         City      Publisher                           Version
1          KangXi Zidian      Beijing   Zhonghua Bookstore, 1989            Seventh edition
2          Dai Kan-Wa Jiten   Tokyo     Taishuukan Shoten, 1986             Revised edition
3          Hanyu Da Zidian    Chengdu   Sichuan Cishu Publishing, 1986      First edition
4          Dae Jaweon         Seoul     Samseong Publishing Co. Ltd, 1988   First edition

When it is not found in either KangXi or Dai Kan-Wa, then the Hanyu Da Zidian and Dae Jaweon dictionaries are consulted in a similar manner. Ideographs with simplified KangXi radicals are placed in a group following the traditional KangXi radical from which the simplified radical is derived. For example, characters with the simplified radical ⻈ corresponding to KangXi radical ⾔ follow the last nonsimplified character having ⾔ as a radical. The arrangement for these simplified characters is that of the Hanyu Da Zidian. The few characters that are not found in any of the four dictionaries are placed following characters with the same KangXi radical and stroke count.

Radical-Stroke Order. The radical-stroke order that results is a culturally neutral order. It does not exactly match the order found in common dictionaries. Information for sorting all CJK ideographs by the radical-stroke method is found in the Unihan Database (see “Unihan Database” in Section 4.1, Unicode Character Database). It should be used if characters from the various blocks containing ideographs (see Table 12-2) are to be properly interleaved. Note, however, that there is no standard way of ordering characters with the same radical-stroke count; for most purposes, Unicode code point order would be as acceptable as any other way.

A radical-stroke index to the IICore subset of the CJK unified ideographs is provided in Chapter 18, Han Radical-Stroke Index, to help locate the most useful and common Han characters in the standard. A full radical-stroke index of all CJK unified ideographs, together with a complete chart listing, can be found on the Unicode Web site. Details regarding the form of the online charts for the CJK unified ideographs are discussed in Section 17.2, CJK Unified Ideographs.
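A radical-stroke sort of the kind described above might be sketched in Python as follows. The sketch assumes radical-stroke values in the "radical.strokes" form used by the Unihan Database's kRSUnicode field, with an apostrophe after the radical number marking a simplified radical. The small SAMPLE_RS table is an illustrative stand-in for data loaded from the database, and rs_key and radical_stroke_sort are hypothetical names. Ties are broken by code point, which the text describes as being as acceptable as any other order.

```python
# Sample radical-stroke values for the characters of Table 12-3
# (assumption: transcribed by hand in kRSUnicode form for illustration).
SAMPLE_RS = {
    0x5929: "37.1",  # heaven
    0x5730: "32.3",  # earth
    0x4EBA: "9.0",   # person
    0x5C71: "46.0",  # mountain
    0x6C34: "85.0",  # water
    0x4E0A: "1.2",   # above
    0x4E0B: "1.2",   # below
}


def rs_key(cp: int, rs_data: dict = SAMPLE_RS) -> tuple:
    """Sort key: radical, simplified-radical group after the traditional
    group, residual stroke count, then code point as the tie-breaker."""
    radical, strokes = rs_data[cp].split(".")
    simplified = radical.endswith("'")
    return (int(radical.rstrip("'")), simplified, int(strokes), cp)


def radical_stroke_sort(text: str) -> str:
    return "".join(sorted(text, key=lambda ch: rs_key(ord(ch))))
```

The two characters with the value 1.2 tie on radical and stroke count, so their relative order falls back to code point order.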

Mappings for Han Ideographs

The mappings defined by the IRG between the ideographs in the Unicode Standard and the IRG sources are specified in the Unihan Database. These mappings are considered to be normative parts of ISO/IEC 10646 and of the Unicode Standard; that is, the characters are defined to be the targets for conversion of these characters in these character set standards. These mappings have been derived from editions of the source standards provided directly to the IRG by its member bodies, and they may not match mappings derived from the published editions of these standards. For this reason, developers may choose to use alternative mappings more directly correlated with published editions.

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 13

Additional Modern Scripts

This chapter contains a collection of additional scripts in modern use that do not fit well into the script categories featured in other chapters:

Ethiopic

Tifinagh

Canadian Aboriginal Syllabics

Mongolian

N’Ko

Deseret

Osmanya

Cherokee

Shavian

Ethiopic, Mongolian, and Tifinagh are scripts with long histories. Although their roots can be traced back to the original Semitic and North African writing systems, they would not be classified as Middle Eastern scripts today. The remaining scripts in this chapter have been developed relatively recently. Some of them show roots in Latin and other letterforms, including shorthand. They are all original creative contributions intended specifically to serve the linguistic communities that use them.

13.1 Ethiopic

Ethiopic: U+1200–U+137F

The Ethiopic syllabary originally evolved for writing the Semitic language Ge’ez. Indeed, the English noun “Ethiopic” simply means “the Ge’ez language.” Ge’ez itself is now limited to liturgical usage, but its script has been adopted for modern use in writing several languages of central east Africa, including Amharic, Tigre, and Oromo.

Basic and Extended Ethiopic. The Ethiopic characters encoded here are the basic set that has become established in common usage for writing major languages. As with other productive scripts, the basic Ethiopic forms are sometimes modified to produce an extended range of characters for writing additional languages.

Encoding Principles. The syllables of the Ethiopic script are traditionally presented as a two-dimensional matrix of consonant-vowel combinations. The encoding follows this structure; in particular, the codespace range U+1200..U+1357 is interpreted as a matrix of 43 consonants crossed with 8 vowels, making 344 conceptual syllables. Most of these consonant-vowel syllables are represented by characters in the script, but some of them happen to be unused, accounting for the blank cells in the matrix.
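The matrix interpretation of U+1200..U+1357 can be illustrated with a little arithmetic. The following Python sketch is not part of the standard; the function names are hypothetical, and rows and columns are numbered from zero purely for convenience.

```python
# Sketch: treat U+1200..U+1357 as a 43 x 8 consonant-by-vowel matrix.
# Function names and zero-based indexing are illustrative assumptions.
FIRST = 0x1200
LAST = 0x1357
CONSONANTS = 43
VOWELS = 8


def matrix_cell(code_point):
    """Return (consonant_row, vowel_column) for a code point in the matrix."""
    if not FIRST <= code_point <= LAST:
        raise ValueError(f"U+{code_point:04X} is outside the syllable matrix")
    return divmod(code_point - FIRST, VOWELS)


def cell_code_point(row, column):
    """Return the code point that corresponds to a matrix cell."""
    if not (0 <= row < CONSONANTS and 0 <= column < VOWELS):
        raise ValueError("cell is outside the 43 x 8 matrix")
    return FIRST + row * VOWELS + column


if __name__ == "__main__":
    # The range holds exactly 43 * 8 = 344 conceptual syllables.
    assert LAST - FIRST + 1 == CONSONANTS * VOWELS == 344
    for cp in (0x1200, 0x1357):
        row, column = matrix_cell(cp)
        print(f"U+{cp:04X} -> row {row}, column {column}")
```

A cell computed this way is only a conceptual position; whether a character is actually assigned there must still be checked against the code charts, because some cells are blank.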

The Unicode Standard 5.0 – Electronic edition

Copyright © 1991–2007 Unicode, Inc.


Variant Glyph Forms. A given Ethiopic syllable may be represented by different glyph forms, analogous to the glyph variants of Latin lowercase “a” or “g”, which do not coexist in the same font. Thus the particular glyph shown in the code chart for each position in the matrix is merely one representation of that conceptual syllable, and the glyph itself is not the object that is encoded.

Labialized Subseries. A few Ethiopic consonants have labialized (“W”) forms that are traditionally allotted their own consonant series in the syllable matrix, although only a subset of the possible vowel forms are realized. Each of these derivative series is encoded immediately after the corresponding main consonant series. Because the standard vowel series includes both “AA” and “WAA”, two different cells of the syllable matrix might represent the “consonant + W + AA” syllable. For example:

U+1257 = QH + WAA: potential but unused version of qhwaa
U+125B = QHW + AA: ethiopic syllable qhwaa

In these cases, where the two conceptual syllables are equivalent, the entry in the labialized subseries is encoded and not the “consonant + WAA” entry in the main syllable series. The six specific cases are enumerated in Table 13-1. In three of these cases, the -WAA position in the syllable matrix has been reanalyzed and used for encoding a syllable in -OA for extended Ethiopic.

Table 13-1. Labialized Forms in Ethiopic -WAA

-WAA Form   Encoded as   Not Used   Contrast
QWAA        U+124B       1247       U+1247 QOA
QHWAA       U+125B       1257
XWAA        U+128B       1287       U+1287 XOA
KWAA        U+12B3       12AF       U+12AF KOA
KXWAA       U+12C3       12BF
GWAA        U+1313       130F

Also, within the labialized subseries, the sixth vowel (“-E”) forms are sometimes considered to be second vowel (“-U”) forms. For example:

U+1249 = QW + U: unused version of qwe
U+124D = QW + E: ethiopic syllable qwe

In these cases, where the two syllables are nearly equivalent, the “-E” entry is encoded and not the “-U” entry. The six specific cases are enumerated in Table 13-2.

Keyboard Input. Because the Ethiopic script includes more than 300 characters, the units of keyboard input must constitute some smaller set of entities, typically 43+8 codes interpreted as the coordinates of the syllable matrix. Because these keyboard input codes are expected to be transient entities that are resolved into syllabic characters before they enter stored text, keyboard input codes are not specified in this standard.


Table 13-2. Labialized Forms in Ethiopic -WE

-WE Form   Encoded as   Not Used
QWE        U+124D       1249
QHWE       U+125D       1259
XWE        U+128D       1289
KWE        U+12B5       12B1
KXWE       U+12C5       12C1
GWE        U+1315       1311
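The values in Tables 13-1 and 13-2 show a regularity that is not stated in the text: in every row the encoded position lies four code points after the unused one. The Python sketch below only checks this observation against the transcribed table values; the dictionary and function names are hypothetical.

```python
# Sketch: consistency check on Tables 13-1 and 13-2.
# Each entry maps a form name to (encoded, not_used) as listed in the tables.
WAA_FORMS = {
    "QWAA": (0x124B, 0x1247),
    "QHWAA": (0x125B, 0x1257),
    "XWAA": (0x128B, 0x1287),
    "KWAA": (0x12B3, 0x12AF),
    "KXWAA": (0x12C3, 0x12BF),
    "GWAA": (0x1313, 0x130F),
}
WE_FORMS = {
    "QWE": (0x124D, 0x1249),
    "QHWE": (0x125D, 0x1259),
    "XWE": (0x128D, 0x1289),
    "KWE": (0x12B5, 0x12B1),
    "KXWE": (0x12C5, 0x12C1),
    "GWE": (0x1315, 0x1311),
}


def offsets(table):
    """Return the distance from the unused cell to the encoded cell per form."""
    return {name: encoded - unused for name, (encoded, unused) in table.items()}


if __name__ == "__main__":
    for label, table in (("-WAA", WAA_FORMS), ("-WE", WE_FORMS)):
        print(label, sorted(set(offsets(table).values())))
```

The check should not be read as a mapping rule. Three of the unused -WAA positions have been reassigned to -OA syllables, so a code point found there must not be rewritten to the labialized form.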

Syllable Names. The Ethiopic script often has multiple syllables corresponding to the same Latin letter, making it difficult to assign unique Latin names. Therefore the names list makes use of certain devices (such as doubling a Latin letter in the name) merely to create uniqueness; this device has no relation to the phonetics of these syllables in any particular language.

Encoding Order and Sorting. The order of the consonants in the encoding is based on the traditional alphabetical order. It may differ from the sort order used for one or another language, if only because in many languages various pairs or triplets of syllables are treated as equivalent in the first sorting pass. For example, an Amharic dictionary may start out with a section headed by three H-like syllables:

U+1200 ethiopic syllable ha
U+1210 ethiopic syllable hha
U+1280 ethiopic syllable xa

Thus the encoding order cannot and does not implement a collation procedure for any particular language using this script.

Word Separators. The traditional word separator is U+1361 ethiopic wordspace ( : ). In modern usage, a plain white wordspace (U+0020 space) is becoming common.

Section Mark. One or more section marks are typically used on a separate line to mark the separation of sections. Commonly, an odd number is used and they are separated by spaces.

Diacritical Marks. The Ethiopic script generally makes no use of diacritical marks, but they are sometimes employed for scholarly or didactic purposes. In particular, U+135F ethiopic combining gemination mark and U+030E combining double vertical line above are sometimes used to indicate emphasis or gemination (consonant doubling).

Numbers. Ethiopic digit glyphs are derived from the Greek alphabet, possibly borrowed from Coptic letterforms. In modern use, European digits are often used. The Ethiopic number system does not use a zero, nor is it based on digital-positional notation.
A number is denoted as a sequence of powers of 100, each preceded by a coefficient (2 through 99). In each term of the series, the power 100^n is indicated by n HUNDRED characters (merged to a digraph when n = 2). The coefficient is indicated by a tens digit and a ones digit, either of which is absent if its value is zero.


For example, the number 2345 is represented by

2345 = (20 + 3)*100^1 + (40 + 5)*100^0
     = 20 3 100 40 5
     = TWENTY THREE HUNDRED FORTY FIVE
     = 1373 136B 137B 1375 136D

A language using the Ethiopic script may have a word for “thousand,” such as Amharic “SHI” (U+123A), and a quantity such as 2,345 may also be written as it is spoken in that language, which in the case of Amharic happens to parallel English:

2,345 = TWO thousand THREE HUNDRED FORTY FIVE
      = 136A 123A 136B 137B 1375 136D
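The first notation above can be sketched in Python for values from 1 to 9999, that is, for the 100^1 and 100^0 terms only. This is a minimal illustration and not a complete conversion routine: the function names are hypothetical, higher powers of 100 are left out, and omitting a coefficient of 1 before HUNDRED is an assumption drawn from the stated coefficient range of 2 through 99.

```python
# Sketch: Ethiopic numerals for 1..9999 (terms 100^1 and 100^0 only).
ONES_BASE = 0x1368  # U+1369 is ONE, so ones digit d maps to ONES_BASE + d
TENS_BASE = 0x1371  # U+1372 is TEN, so tens digit t maps to TENS_BASE + t
HUNDRED = "\u137B"


def coefficient(value):
    """Spell 1..99 as a tens digit and a ones digit, skipping zero digits."""
    tens, ones = divmod(value, 10)
    out = ""
    if tens:
        out += chr(TENS_BASE + tens)
    if ones:
        out += chr(ONES_BASE + ones)
    return out


def to_ethiopic(number):
    """Convert 1..9999 to a string of Ethiopic number characters."""
    if not 1 <= number <= 9999:
        raise ValueError("this sketch only covers 1 through 9999")
    hundreds, units = divmod(number, 100)
    out = ""
    if hundreds:
        if hundreds > 1:  # assumption: a coefficient of 1 is not written
            out += coefficient(hundreds)
        out += HUNDRED
    if units:
        out += coefficient(units)
    return out


def code_points(text):
    """List the code points of a string as four-digit hexadecimal values."""
    return [f"{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    print(code_points(to_ethiopic(2345)))
```

For 2345 the sketch yields the sequence 1373 136B 137B 1375 136D, matching the worked example. The spoken-style form with Amharic “SHI” is language-specific and is not attempted here.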

Ethiopic Extensions: U+1380–U+139F, U+2D80–U+2DDF

The Ethiopic script is used for a large number of languages and dialects in Ethiopia and in some instances has been extended significantly beyond the set of characters used for major languages such as Amharic and Tigre. There are two blocks of extensions to the Ethiopic script: Ethiopic Supplement U+1380..U+139F and Ethiopic Extended U+2D80..U+2DDF. Those extensions cover such languages as Me’en, Blin, and Sebatbeit, which use many additional characters. Several other characters for Ethiopic script extensions can be found in the main Ethiopic script block in the range U+1200..U+137F.

The Ethiopic Supplement block also contains a set of tonal marks. They are used in multiline scored layout. Like other musical (an)notational systems of this type, these tonal marks require a higher-level protocol to enable proper rendering.

13.2 Mongolian

Mongolian: U+1800–U+18AF

The Mongolians are key representatives of a cultural-linguistic group known as Altaic, after the Altai mountains of central Asia. In the past, these peoples have dominated the vast expanses of Asia and beyond, from the Baltic to the Sea of Japan. Echoes of Altaic languages remain from Finland, Hungary, and Turkey, across central Asia, to Korea and Japan. Today the Mongolians are represented politically in Mongolia proper (formally the Mongolian People’s Republic, also known as Outer Mongolia) and Inner Mongolia (formally the Inner Mongolia Autonomous Region, China), with Mongolian populations also living in other areas of China.

The Mongolian block unifies Mongolian and the three derivative scripts Todo, Manchu, and Sibe. Each of the three derivative scripts shares some common letters with Mongolian, and these letters are encoded only once. Each derivative script also has a number of modified letter forms or new letters, which are encoded separately. Mongolian, Todo, and Manchu also have a number of special “Ali Gali” letters that are used for transcribing Tibetan and Sanskrit in Buddhist texts.

History. The Mongolian script was derived from the Uighur script around the beginning of the thirteenth century, during the reign of Genghis Khan. The Uighur script, which was in use from about the eighth to the fifteenth centuries, was derived from Sogdian Aramaic, a Semitic script written horizontally from right to left. Probably under the influence of the Chinese script, the Uighur script became rotated 90 degrees counterclockwise so that the lines of text read vertically in columns running from left to right. The Mongolian script inherited this directionality from the Uighur script.

The Mongolian script has remained in continuous use for writing Mongolian within the Inner Mongolia Autonomous Region of the People’s Republic of China and elsewhere in China. However, in the Mongolian People’s Republic (Outer Mongolia), the traditional script was replaced by a Cyrillic orthography in the early 1940s. The traditional script has been revived to an extent since the early 1990s, so that now both the Cyrillic and the Mongolian scripts are used.

The spelling used with the traditional Mongolian script represents the literary language of the seventeenth and early eighteenth centuries, whereas the Cyrillic script is used to represent the modern, colloquial pronunciation of words. As a consequence, there is no one-to-one relationship between the traditional Mongolian orthography and Cyrillic orthography. Approximate correspondence mappings are indicated in the code charts, but are not necessarily unique in either direction. All of the Cyrillic characters needed to write Mongolian are included in the Cyrillic block of the Unicode Standard.
In addition to the traditional Mongolian script of Mongolia, several historical modifications and adaptations of the Mongolian script have emerged elsewhere. These adaptations are often referred to as scripts in their own right, although for the purposes of character encoding in the Unicode Standard they are treated as styles of the Mongolian script and share encoding of their basic letters.

The Todo script is a modified and improved version of the Mongolian script, devised in 1648 by Zaya Pandita for use by the Kalmyk Mongolians, who had migrated to Russia in the sixteenth century, and who now inhabit the Republic of Kalmykia in the Russian Federation. The name Todo means “clear” in Mongolian; it refers to the fact that the new script eliminates the ambiguities inherent in the original Mongolian script. The orthography of the Todo script also reflects the Oirat-Kalmyk dialects of Mongolian rather than literary Mongolian. In Kalmykia, the Todo script was replaced by a succession of Cyrillic and Latin orthographies from the mid-1920s and is no longer in active use. Until very recently the Todo script was still used by speakers of the Oirat and Kalmyk dialects within Xinjiang and Qinghai in China.

The Manchu script is an adaptation of the Mongolian script used to write Manchu, a Tungusic language that is not closely related to Mongolian. The Mongolian script was first adapted for writing Manchu in 1599 under the orders of the Manchu leader Nurhachi, but few examples of this early form of the Manchu script survive. In 1632, the Manchu scholar Dahai reformed the script by adding circles and dots to certain letters in an effort to distinguish their different sounds and by devising new letters to represent the sounds of the Chinese language. When the Manchu people conquered China to rule as the Qing dynasty (1644–1911), Manchu became the language of state. The ensuing systematic program of translation from Chinese created a large and important corpus of books written in Manchu. Over time the Manchu people became completely sinified, and as a spoken language Manchu is now almost extinct.

The Sibe (also spelled Sibo, Xibe, or Xibo) people are closely related to the Manchus, and their language is often classified as a dialect of Manchu. The Sibe people are widely dispersed across northwest and northeast China due to deliberate programs of ethnic dispersal during the Qing dynasty. The majority have become assimilated into the local population and no longer speak the Sibe language. However, there is a substantial Sibe population in the Sibe Autonomous County in the Ili River valley in Western Xinjiang, the descendants of border guards posted to Xinjiang in 1764, who still speak and write the Sibe language. The Sibe script is based on the Manchu script, with a few modified letters.

Directionality. The Mongolian script is written vertically from top to bottom in columns running from left to right. In modern contexts, words or phrases may be embedded in horizontal scripts. In such a case, the Mongolian text will be rotated 90 degrees counterclockwise so that it reads from left to right. When rendering Mongolian text in a system that does not support vertical layout, the text should be laid out in horizontal lines running left to right, with the glyphs rotated 90 degrees counterclockwise with respect to their orientation in the code charts. If such text is viewed sideways, the usual Mongolian column order appears reversed, but this orientation can be workable for short stretches of text.
There are no bidirectional effects in such a layout because all text is horizontal left to right.

Encoding Principles. The encoding model for Mongolian is somewhat different from that for any other script within Unicode, and in many respects it is the most complicated. For this reason, only the essential features of Mongolian shaping behavior are presented here; the precise details are to be presented in a separate technical report.

The Semitic alphabet from which the Mongolian script was ultimately derived is fundamentally inadequate for representing the sounds of the Mongolian language. As a result, many of the Mongolian letters are used to represent two different sounds, and the correct pronunciation of a letter may be known only from the context. In this respect, Mongolian orthography is similar to English spelling, in which the pronunciation of a letter such as c may be known only from the context.

Unlike in the Latin script, in which c /k/ and c /s/ are treated as the same letter and encoded as a single character, in the Mongolian script different phonetic values of the same glyph may be encoded as distinct characters. Modern Mongolian grammars consider the phonetic value of a letter to be its distinguishing feature, rather than its glyph shape. For example, the four Mongolian vowels o, u, ö, and ü are considered four distinct letters and are encoded as four characters (U+1823, U+1824, U+1825, and U+1826, respectively), even though o is written identically to u in all positional forms, ö is written identically to ü in all positional forms, and o and u are normally distinguished from ö and ü only in the first syllable of a word. Likewise, the letters t (U+1832) and d (U+1833) are often indistinguishable. For example, pairs of Mongolian words such as urtu “long” and ordu “palace, camp, horde” or ende “here” and ada “devil” are written identically, but are represented using different sequences of Unicode characters, as shown in Figure 13-1. There are many such examples in Mongolian, but not in Todo, Manchu, or Sibe, which have largely eliminated ambiguous letters.

Figure 13-1. Mongolian Glyph Convergence

urtu: 1824 1837 1832 1824
ordu: 1823 1837 1833 1824
ende: 1821 1828 1833 1821
ada: 1820 1833 1820
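The practical consequence of Figure 13-1 is that words which look alike on the page still compare as different strings. The short Python sketch below illustrates this with the four sequences from the figure; the helper names are hypothetical.

```python
# Sketch: words written identically are still distinct character sequences.
def text(*code_points):
    """Build a string from a series of code points."""
    return "".join(chr(cp) for cp in code_points)


URTU = text(0x1824, 0x1837, 0x1832, 0x1824)  # "long"
ORDU = text(0x1823, 0x1837, 0x1833, 0x1824)  # "palace, camp, horde"
ENDE = text(0x1821, 0x1828, 0x1833, 0x1821)  # "here"
ADA = text(0x1820, 0x1833, 0x1820)           # "devil"


def differing_positions(first, second):
    """Return the indexes at which two equally long strings differ."""
    if len(first) != len(second):
        raise ValueError("strings have different lengths")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


if __name__ == "__main__":
    print(URTU == ORDU)                     # the encoded words are not equal
    print(differing_positions(URTU, ORDU))  # positions of o/u and t/d
    print(len(ENDE), len(ADA))              # even the lengths can differ
```

Searching, sorting, and spell checking therefore operate on the intended letters and not on the rendered shapes, which is the point of encoding by phonetic value.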

Cursive Joining. The Mongolian script is cursive, and the letters constituting a word are normally joined together. In most cases the letters join together naturally along a vertical stem, but in the case of certain “bowed” consonants (for example, U+182A mongolian letter ba and the feminine form of U+182C mongolian letter qa), which lack a trailing vertical stem, they may form ligatures with a following vowel. This is illustrated in Figure 13-2, where the letter ba combines with the letter u to form a ligature in the Mongolian word abu “father.”

Figure 13-2. Mongolian Consonant Ligation

abu 1820 182A 1824


Many letters also have distinct glyph forms depending on their position within a word. These positional forms are classified as initial, medial, final, or isolate. The medial form is often the same as the initial form, but the final form is always distinct from the initial or medial form. Figure 13-3 shows the Mongolian letters U+1823 o and U+1821 e, rendered with distinct positional forms initially and finally in the Mongolian words odo “now” and ene “this.”

Figure 13-3. Mongolian Positional Forms

odo: 1823 1833 1823
ene: 1821 1828 1821

U+200C zero width non-joiner (ZWNJ) and U+200D zero width joiner (ZWJ) may be used to select a particular positional form of a letter in isolation or to override the expected positional form within a word. Basically, they evoke the same contextual selection effects in neighboring letters as do non-joining or joining regular letters, but are themselves invisible (see Chapter 16, Special Areas and Format Characters). For example, the various positional forms of U+1820 mongolian letter a may be selected by means of the following character sequences:

<1820> selects the isolate form.
<1820 200D> selects the initial form.
<200D 1820> selects the final form.
<200D 1820 200D> selects the medial form.

Some letters have additional variant forms that do not depend on their position within a word, but instead reflect differences between modern versus traditional orthographic practice or lexical considerations (for example, special forms used for writing foreign words). On occasion, other contextual rules may condition a variant form selection. For example, a certain variant of a letter may be required when it occurs in the first syllable of a word or when it occurs immediately after a particular letter.

The various positional and variant glyph forms of a letter are considered presentation forms and are not encoded separately. It is the responsibility of the rendering system to select the correct glyph form for a letter according to its context.

Free Variation Selectors. When a glyph form that cannot be predicted algorithmically is required (for example, when writing a foreign word), the user needs to append an appropriate variation selector to the letter to indicate to the rendering system which glyph form is required. The following free variation selectors are provided for use specifically with the Mongolian block:

U+180B mongolian free variation selector one (FVS1)
U+180C mongolian free variation selector two (FVS2)
U+180D mongolian free variation selector three (FVS3)

These format characters normally have no visual appearance. When required, a free variation selector immediately follows the base character it modifies. This combination of base character and variation selector is known as a standardized variant. The table of standardized variants, StandardizedVariants.txt, in the Unicode Character Database exhaustively lists all currently defined standardized variants. All combinations not listed in the table are unspecified and are reserved for future standardization; no conformant process may interpret them as standardized variants. Therefore, any free variation selector not immediately preceded by one of its defined base characters will be ignored.

Figure 13-4 gives an example of how a free variation selector may be used to select a particular glyph variant. In modern orthography, the initial letter ga in the Mongolian word gal “fire” is written with two dots; in traditional orthography, the letter ga is written without any dots. By default, the dotted form of the letter ga is selected, but this behavior may be overridden by means of FVS1, so that ga plus FVS1 selects the undotted form of the letter ga.

Figure 13-4. Mongolian Free Variation Selector

gal (dotted ga, the default): 182D 1820 182F
gal (undotted ga, selected with FVS1): 182D 180B 1820 182F

It is important to appreciate that even though a particular standardized variant may be defined for a letter, the user needs to apply the appropriate free variation selector only if the correct glyph form cannot be predicted automatically by the rendering system. In most cases, in running text, there will be few occasions when a free variation selector is required to disambiguate the glyph form.

Older documentation, external to the Unicode Standard, listed the action of the free variation selectors by using ZWJ to explicitly indicate the shaping environment affected by the variation selector. The relative order of the ZWJ and the free variation selector in these documents was different from the one required by Section 16.4, Variation Selectors. Older implementations of Mongolian free variation selectors may therefore interpret a sequence such as a base character followed first by ZWJ and then by FVS1 as if it were a base character followed first by FVS1 and then by ZWJ.
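The ZWJ sequences for positional forms and the placement of a free variation selector can both be expressed as simple string construction. The Python sketch below is illustrative only: the function names are hypothetical, and it does not consult StandardizedVariants.txt, so it cannot tell whether a given base and selector pair is actually a defined standardized variant.

```python
# Sketch: building Mongolian sequences with ZWJ and free variation selectors.
ZWJ = "\u200D"
FVS1, FVS2, FVS3 = "\u180B", "\u180C", "\u180D"


def positional_form(letter, form):
    """Wrap a single letter in ZWJ to request one positional form."""
    sequences = {
        "isolate": letter,
        "initial": letter + ZWJ,
        "final": ZWJ + letter,
        "medial": ZWJ + letter + ZWJ,
    }
    return sequences[form]


def with_variant(base, selector):
    """Place a free variation selector immediately after its base character."""
    if selector not in (FVS1, FVS2, FVS3):
        raise ValueError("not a Mongolian free variation selector")
    return base + selector


GA, A, LA = "\u182D", "\u1820", "\u182F"
GAL_DEFAULT = GA + A + LA                        # dotted ga by default
GAL_UNDOTTED = with_variant(GA, FVS1) + A + LA   # ga plus FVS1, then a, la

if __name__ == "__main__":
    for form in ("isolate", "initial", "final", "medial"):
        print(form, [f"{ord(ch):04X}" for ch in positional_form(A, form)])
    print([f"{ord(ch):04X}" for ch in GAL_UNDOTTED])
```

Keeping the selector directly after its base character, with any ZWJ following it, matches the order required in the text above and avoids the older reversed ordering.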


Representative Glyphs. The representative glyph in the code charts is generally the isolate form for the vowels and the initial form for the consonants. Letters that share the same glyph forms are distinguished by using different positional forms for the representative glyph. For example, the representative glyph for U+1823 mongolian letter o is the isolate form, whereas the representative glyph for U+1824 mongolian letter u is the initial form. However, this distinction is only nominal, as the glyphs for the two characters are identical for the same positional form. Likewise, the representative glyphs for U+1863 mongolian letter sibe ka and U+1874 mongolian letter manchu ka both take the final form, as their initial forms are identical to the representative glyph for U+182C mongolian letter qa (the initial form). Vowel Harmony. Mongolian has a system of vowel harmony, whereby the vowels in a word are either all “masculine” and “neuter” vowels (that is, back vowels plus /i/) or all “feminine” and “neuter” vowels (that is, front vowels plus /i/). Words that are written with masculine/neuter vowels are considered to be masculine, and words that are written with feminine/neuter vowels are considered to be feminine. Words with only neuter vowels behave as feminine words (for example, take feminine suffixes). Manchu and Sibe have a similar system of vowel harmony, although it is not so strict. Some words in these two scripts may include both masculine and feminine vowels, and separated suffixes with masculine or feminine vowels may be applied to a stem irrespective of its gender. Vowel harmony is an important element of the encoding model, as the gender of a word determines the glyph form of the velar series of consonant letters for Mongolian, Todo, Sibe, and Manchu. In each script, the velar letters have both masculine and feminine forms. For Mongolian and Todo, the masculine and feminine forms of these letters have different pronunciations. 
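The harmony classes described above can be sketched as a small classifier. The grouping of code points below is an assumption made for illustration: a, o, and u are taken as the masculine (back) vowels, e, ö, and ü as the feminine (front) vowels, and U+1822 i as the neuter vowel. Only the basic Mongolian vowels are covered, and the function name is hypothetical. The two test words use the sequences for jarlig and chirig given in Figure 13-5.

```python
# Sketch: classify the gender of a Mongolian word from its vowels.
# The vowel grouping is an illustrative assumption (basic vowels only).
MASCULINE = {"\u1820", "\u1823", "\u1824"}  # a, o, u
FEMININE = {"\u1821", "\u1825", "\u1826"}   # e, oe, ue
NEUTER = {"\u1822"}                         # i


def word_gender(word):
    """Return 'masculine', 'feminine', 'mixed', or None if no vowel is found."""
    has_masculine = any(ch in MASCULINE for ch in word)
    has_feminine = any(ch in FEMININE for ch in word)
    has_neuter = any(ch in NEUTER for ch in word)
    if has_masculine and has_feminine:
        return "mixed"  # possible in Manchu and Sibe, where harmony is looser
    if has_masculine:
        return "masculine"
    if has_feminine or has_neuter:
        return "feminine"  # words with only neuter vowels behave as feminine
    return None


JARLIG = "\u1835\u1820\u1837\u182F\u1822\u182D"
CHIRIG = "\u1834\u1822\u1837\u1822\u182D"

if __name__ == "__main__":
    print(word_gender(JARLIG))
    print(word_gender(CHIRIG))
```

A rendering system would use a result of this kind to pick the masculine or feminine glyph form of a velar consonant; cases that come out as mixed or undetermined are the ones where a free variation selector may be needed.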
When one of the velar consonants precedes a vowel, it takes the masculine form before masculine vowels, and the feminine form before feminine or neuter vowels. In the latter case, a ligature of the consonant and vowel is required. When one of these consonants precedes another consonant or is the final letter in a word, it may take either a masculine or feminine glyph form, depending on its context. The rendering system should automatically select the correct gender form for these letters based on the gender of the word (in Mongolian and Todo) or the gender of the preceding vowel (in Manchu and Sibe). This is illustrated by Figure 13-5, where U+182D mongolian letter ga takes a masculine glyph form when it occurs finally in the masculine word jarlig “order,” but takes a feminine glyph form when it occurs finally in the feminine word chirig “soldier.” In this example, the gender form of the final letter ga depends on whether the first vowel in the word is a back (masculine) vowel or a front (feminine or neuter) vowel. Where the gender is ambiguous or a form not derivable from the context is required, the user needs to specify which form is required by means of the appropriate free variation selector.

Figure 13-5. Mongolian Gender Forms

jarlig: 1835 1820 1837 182F 1822 182D
chirig: 1834 1822 1837 1822 182D

Narrow No-Break Space. In Mongolian, Todo, Manchu, and Sibe, certain grammatical suffixes are separated from the stem of a word or from other suffixes by a narrow gap. There are many such suffixes in Mongolian, usually occurring in masculine and feminine pairs (for example, the dative suffixes -dur and -dür), and a stem may take multiple suffixes. In contrast, there are only six separated suffixes for Manchu and Sibe, and stems do not take more than one suffix at a time.

As any suffixes are considered to be an integral part of the word as a whole, a line break opportunity does not occur before a suffix, and the whitespace is represented using U+202F narrow no-break space (NNBSP). For a Mongolian font it is recommended that the width of NNBSP should be one-third the width of an ordinary space (U+0020 space).

NNBSP affects the form of the preceding and following letters. The final letter of the stem or suffix preceding the NNBSP takes the final positional form, whereas the first letter of the suffix following NNBSP may take the normal initial form, a variant initial form, a medial form, or a final form, depending on the particular suffix.

Mongolian Vowel Separator. In Mongolian, the letters a (U+1820) and e (U+1821) in a word-final position may take a “forward tail” form or a “backward tail” form depending on the preceding consonant that they are attached to. In some words, a final letter a or e is separated from the preceding consonant by a narrow gap, in which case the vowel always takes the “forward tail” form. U+180E mongolian vowel separator (MVS) is used to represent the whitespace that separates a final letter a or e from the rest of the word. MVS is very similar in function to NNBSP, as it divides a word with a narrow non-breaking whitespace. Whereas NNBSP marks off a grammatical suffix, however, the a or e following MVS is not a suffix but an integral part of the word stem. Whether a final letter a or e is joined or separated is purely lexical and is not a question of varying orthography. For example, the word qana <182C, 1820, 1828, 1820> without a gap before the final letter a means “the outer casing of a vein,” whereas the word qana <182C, 1820, 1828, 180E, 1820> with a gap before the final letter a means “the wall of a tent,” as shown in Figure 13-6.

Figure 13-6. Mongolian Vowel Separator

Qana with Connected Final: 182C 1820 1828 1820
Qana with Separated Final: 182C 1820 1828 180E 1820
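Because the presence of MVS is lexical, text processing should treat the two spellings of qana as different words. The Python sketch below shows one way to detect a separated final vowel; the function name is hypothetical and the check covers only the pattern described in this section (MVS followed by a word-final a or e).

```python
# Sketch: detecting a final a or e that is set off by MVS.
MVS = "\u180E"
FINAL_VOWELS = ("\u1820", "\u1821")  # a and e

QANA_CONNECTED = "\u182C\u1820\u1828\u1820"       # "the outer casing of a vein"
QANA_SEPARATED = "\u182C\u1820\u1828\u180E\u1820"  # "the wall of a tent"


def has_separated_final(word):
    """Return True if the word ends in MVS followed by a or e."""
    return len(word) >= 2 and word[-2] == MVS and word[-1] in FINAL_VOWELS


if __name__ == "__main__":
    print(has_separated_final(QANA_CONNECTED))
    print(has_separated_final(QANA_SEPARATED))
    # Removing MVS would merge two different dictionary words.
    print(QANA_SEPARATED.replace(MVS, "") == QANA_CONNECTED)
```

The last line illustrates why MVS should not be stripped during comparison or indexing: doing so makes the two words indistinguishable.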


Chapter 14

Archaic Scripts


The following historic scripts are encoded in Version 5.0 of the Unicode Standard:

Ogham
Old Italic
Runic
Gothic
Linear B
Cypriot
Phoenician
Ugaritic
Old Persian
Sumero-Akkadian

Unicode encodes a number of historic scripts. Although they are no longer used to write living languages, documents and inscriptions using these scripts exist, both for extinct languages and for precursors of modern languages. The primary user communities for these scripts are scholars interested in studying the scripts and the languages written in them. Some of the historical scripts are related to each other and to modern alphabets.

The Ogham script is indigenous to Ireland. While its originators may have been aware of the Latin or Greek scripts, it seems clear that the sound values of Ogham letters were suited to the phonology of a form of Primitive Irish.

Old Italic was derived from Greek and was used to write Etruscan and other languages in Italy. It was borrowed by the Romans and is the immediate ancestor of the Latin script now used worldwide. Old Italic had other descendants, too: The Alpine alphabets seem to have been influential in devising the Runic script, which has a distinct angular appearance owing to its use in carving inscriptions in stone and wood. Gothic, like Cyrillic, was developed on the basis of Greek at a much later date than Old Italic.

The two historic scripts of northwestern Europe, Runic and Ogham, have a distinct appearance owing to their primary use in carving inscriptions in stone and wood. They are conventionally rendered from left to right in scholarly literature, but on the original stone carvings often proceeded in an arch tracing the outline of the stone.

Both Linear B and Cypriot are syllabaries that were used to write Greek. Linear B is the older of the two scripts, and there are some similarities between a few of the characters that may not be accidental. Cypriot may descend from Cypro-Minoan, which in turn may descend from Linear B.

The Phoenician alphabet was used in various forms around the Mediterranean. It is ancestral to Latin, Greek, Hebrew, and many other scripts both modern and historical.


Three ancient cuneiform scripts are described in this chapter: Ugaritic, Old Persian, and Sumero-Akkadian. The largest and oldest of these is Sumero-Akkadian. The other two scripts are not derived directly from the Sumero-Akkadian tradition but had common writing technology, consisting of wedges indented into clay tablets with reed styluses. Ugaritic texts are about as old as the earliest extant Biblical texts. Old Persian texts are newer, dating from the fifth century BCE.

14.1 Ogham

Ogham: U+1680–U+169F

Ogham is an alphabetic script devised to write a very early form of Irish. Monumental Ogham inscriptions are found in Ireland, Wales, Scotland, England, and on the Isle of Man. Many of the Scottish inscriptions are undeciphered and may be in Pictish. It is probable that Ogham (Old Irish “Ogam”) was widely written in wood in early times. The main flowering of “classical” Ogham, rendered in monumental stone, was in the fifth and sixth centuries CE. Such inscriptions were mainly employed as territorial markers and memorials; the more ancient examples are standing stones.

The script was originally written along the edges of stone where two faces meet; when written on paper, the central “stemlines” of the script can be said to represent the edge of the stone. Inscriptions written on stemlines cut into the face of the stone, instead of along its edge, are known as “scholastic” and are of a later date (post-seventh century). Notes were also commonly written in Ogham in manuscripts as recently as the sixteenth century.

Structure. The Ogham alphabet consists of 26 distinct characters (feda), the first 20 of which are considered to be primary and the last 6 (forfeda) supplementary. The four primary series are called aicmí (plural of aicme, meaning “family”). Each aicme was named after its first character (Aicme Beithe, Aicme Uatha, meaning “the B Family,” “the H Family,” and so forth). The character names used in this standard reflect the spelling of the names in modern Irish Gaelic, except that the acute accent is stripped from Úr, Éabhadh, Ór, and Ifín, and the mutation of nGéadal is not reflected.

Rendering. Ogham text is read beginning from the bottom left side of a stone, continuing upward, across the top, and down the right side (in the case of long inscriptions). Monumental Ogham was incised chiefly in a bottom-to-top direction, though there are examples of left-to-right bilingual inscriptions in Irish and Latin.
Manuscript Ogham accommodated the horizontal left-to-right direction of the Latin script, and the vowels were written as vertical strokes as opposed to the incised notches of the inscriptions. Ogham should therefore be rendered on computers from left to right or from bottom to top (never starting from top to bottom).

Forfeda (Supplementary Characters). In printed and in manuscript Ogham, the fonts are conventionally designed with a central stemline, but this convention is not necessary. In implementations without the stemline, the character U+1680 ogham space mark should be given its conventional width and simply left blank like U+0020 space.

U+169B ogham feather mark and U+169C ogham reversed feather mark are used at the beginning and the end of Ogham text, particularly in manuscript Ogham. In some cases, only the Ogham feather mark is used, which can indicate the direction of the text. The word latheirt ᚛ᚂᚐᚈᚆᚓᚔᚏᚈ᚜ shows the use of the feather marks. This word was written in the margin of a ninth-century Latin grammar and means “massive hangover,” which may be the scribe’s apology for any errors in his text.
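As a minimal sketch rather than anything prescribed by the standard, the marginal word latheirt can be assembled from Ogham code points in Python and bracketed with the two feather marks. The partial letter table and the helper name `ogham_word` are assumptions made for this illustration; only the code points themselves come from the Ogham block.

```python
import unicodedata

# Partial table: only the letters needed to spell "latheirt".
# This is an illustration, not a complete transliteration scheme.
OGHAM = {
    "l": "\u1682",  # OGHAM LETTER LUIS
    "a": "\u1690",  # OGHAM LETTER AILM
    "t": "\u1688",  # OGHAM LETTER TINNE
    "h": "\u1686",  # OGHAM LETTER UATH
    "e": "\u1693",  # OGHAM LETTER EADHADH
    "i": "\u1694",  # OGHAM LETTER IODHADH
    "r": "\u168F",  # OGHAM LETTER RUIS
}
FEATHER = "\u169B"           # OGHAM FEATHER MARK, opens the text
REVERSED_FEATHER = "\u169C"  # OGHAM REVERSED FEATHER MARK, closes the text


def ogham_word(latin: str) -> str:
    """Hypothetical helper: map Latin letters to Ogham and add feather marks."""
    return FEATHER + "".join(OGHAM[c] for c in latin) + REVERSED_FEATHER


word = ogham_word("latheirt")
names = [unicodedata.name(ch) for ch in word]
```

The resulting string is stored in logical (left-to-right) order; whether it is displayed horizontally or from bottom to top is a rendering decision, as described above.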

14.2 Old Italic

Old Italic: U+10300–U+1032F

The Old Italic script unifies a number of related historical alphabets located on the Italian peninsula. Some of these were used for non-Indo-European languages (Etruscan and probably North Picene), and some for various Indo-European languages belonging to the Italic branch (Faliscan and members of the Sabellian group, including Oscan, Umbrian, and South Picene). The ultimate source for the alphabets in ancient Italy is Euboean Greek used at Ischia and Cumae in the bay of Naples in the eighth century BCE. Unfortunately, no Greek abecedaries from southern Italy have survived. Faliscan, Oscan, Umbrian, North Picene, and South Picene all derive from an Etruscan form of the alphabet.

There are some 10,000 inscriptions in Etruscan. By the time of the earliest Etruscan inscriptions, circa 700 BCE, local distinctions are already found in the use of the alphabet. Three major stylistic divisions are identified: the Northern, Southern, and Caere/Veii. Use of Etruscan can be divided into two stages, owing largely to the phonological changes that occurred: the “archaic Etruscan alphabet,” used from the seventh to the fifth centuries BCE, and the “neo-Etruscan alphabet,” used from the fourth to the first centuries BCE. Glyphs for eight of the letters differ between the two periods; additionally, neo-Etruscan abandoned the letters ka, ku, and eks.

The unification of these alphabets into a single Old Italic script requires language-specific fonts because the glyphs most commonly used may differ somewhat depending on the language being represented. Most of the languages have added characters to the common repertoire: Etruscan and Faliscan add letter ef; Oscan adds letter ef, letter ii, and letter uu; Umbrian adds letter ef, letter ers, and letter che; North Picene adds letter uu; and Adriatic adds letter ii and letter uu.
The Latin script itself derives from a south Etruscan model, probably from Caere or Veii, around the mid-seventh century BCE or a bit earlier. However, Latin and Faliscan of the seventh and sixth centuries BCE differ significantly in form (glyph shapes, directionality) and in the repertoire of letters used, and these differences warrant a distinctive character block. Fonts for early Latin should use the uppercase code positions U+0041..U+005A. The unified Alpine script, which includes the Venetic, Rhaetic, Lepontic, and Gallic alphabets, has not yet been proposed for addition to the Unicode Standard but is considered to differ enough from both Old Italic and Latin to warrant independent encoding. The Alpine script is thought to be the source for Runic, which is encoded at U+16A0..U+16FF. (See Section 14.3, Runic.)

Character names assigned to the Old Italic block are unattested but have been reconstructed according to the analysis made by Sampson (1985). While the Greek character names (alpha, beta, gamma, and so on) were borrowed directly from the Phoenician names (modified to Greek phonology), the Etruscans are thought to have abandoned the Greek names in favor of a phonetically based nomenclature, where stops were pronounced with a following -e sound, and liquids and sibilants (which can be pronounced more or less on their own) were pronounced with a leading e- sound (so [k], [d] became [ke:], [de:], while [l:], [m:] became [el], [em]). It is these names, according to Sampson, which were borrowed by the Romans when they took their script from the Etruscans.

Directionality. Most early Etruscan texts have right-to-left directionality. From the third century BCE, left-to-right texts appear, showing the influence of Latin. Oscan, Umbrian, and Faliscan also generally have right-to-left directionality. Boustrophedon appears rarely, and not especially early (for instance, the Forum inscription dates to 550–500 BCE). Despite this, for reasons of implementation simplicity, many scholars prefer left-to-right presentation of texts, as this is also their practice when transcribing the texts into Latin script. Accordingly, the Old Italic script has a default directionality of strong left-to-right in this standard. If the default directionality of the script is overridden to produce a right-to-left presentation, the glyphs in Old Italic fonts should also be mirrored from the representative glyphs shown in the code charts.
This kind of behavior is not uncommon in archaic scripts; for example, archaic Greek letters may be mirrored when written from right to left in boustrophedon.

Punctuation. The earliest inscriptions are written with no space between words in what is called scriptio continua. There are numerous Etruscan inscriptions with dots separating word forms, attested as early as the second quarter of the seventh century BCE. This punctuation is sometimes, but only rarely, used to separate syllables rather than words. From the sixth century BCE, words were often separated by one, two, or three dots spaced vertically above each other.

Numerals. Etruscan numerals are not well attested in the available materials, but are employed in the same fashion as Roman numerals. Several additional numerals are attested, but as their use is at present uncertain, they are not yet encoded in the Unicode Standard.

Glyphs. The default glyphs in the code charts are based on the most common shapes found for each letter. Most of these are similar to the Marsiliana abecedary (mid-seventh century BCE). Note that the phonetic values for U+10317 old italic letter eks [ks] and U+10319 old italic letter khe [kh] show the influence of western, Euboean Greek; eastern Greek has U+03A7 greek capital letter chi [x] and U+03A8 greek capital letter psi [ps] instead.
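The default left-to-right directionality of Old Italic described in this section can be inspected with Python's standard `unicodedata` module, as in the sketch below. The override helper `force_rtl` is a hypothetical name; wrapping text in U+202E RIGHT-TO-LEFT OVERRIDE and U+202C POP DIRECTIONAL FORMATTING is one mechanism from the Bidirectional Algorithm and is assumed here as the way an application might request right-to-left presentation. Glyph mirroring in that case is left to the font, since the letters do not carry the Bidi Mirrored property.

```python
import unicodedata

OLD_ITALIC_A = "\U00010300"  # OLD ITALIC LETTER A
RLO = "\u202E"               # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"               # POP DIRECTIONAL FORMATTING

# "L" is the strong left-to-right bidirectional class.
bidi_class = unicodedata.bidirectional(OLD_ITALIC_A)

# The Bidi Mirrored property is not set for the letters, so any
# mirroring under a right-to-left override must come from the font.
is_mirrored = unicodedata.mirrored(OLD_ITALIC_A)


def force_rtl(text: str) -> str:
    """Hypothetical helper: request right-to-left presentation of text."""
    return RLO + text + PDF
```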


The geographic distribution of the Old Italic script is shown in Figure 14-1. In the figure, the approximate distribution of the ancient languages that used Old Italic alphabets is shown in white. Areas for the ancient languages that used other scripts are shown in gray, and the labels for those languages are shown in oblique type. In particular, note that the ancient Greek colonies of the southern Italian and Sicilian coasts used the Greek script proper. Also, languages such as Ligurian, Venetic, and so on, of the far north of Italy made use of alphabets of the Alpine script. Rome, of course, is shown in gray, because Latin was written with the Latin alphabet, now encoded in the Latin script.

Figure 14-1. Distribution of Old Italic

[Map of ancient Italy. Labels: Rhaetic, Venetic, Lepontic, Gallic, Etruscan, N. Picene, Umbrian, S. Picene, Central Sabellian languages, Oscan, Ligurian, Faliscan, Latin (Rome), Messapic, Volscian, Elimian, Sicanian, Greek, Siculan.]

14.3 Runic

Runic: U+16A0–U+16F0

The Runic script was historically used to write the languages of the early and medieval societies in the German, Scandinavian, and Anglo-Saxon areas. Use of the Runic script in various forms covers a period from the first century to the nineteenth century. Some 6,000 Runic inscriptions are known. They form an indispensable source of information about the development of the Germanic languages.

Historical Script. The Runic script is an historical script, whose most important use today is in scholarly and popular works about the old Runic inscriptions and their interpretation. The Runic script illustrates many technical problems that are typical for this kind of script. Unlike many other scripts in the Unicode Standard, which predominantly serve the needs of the modern user community (with occasional extensions for historic forms), the encoding of the Runic script attempts to suit the needs of texts from different periods of time and from distinct societies that had little contact with one another.


Direction. Like other early writing systems, runes could be written either from left to right or from right to left, or moving first in one direction and then the other (boustrophedon), or following the outlines of the inscribed object. At times, characters appear in mirror image, or upside down, or both. In modern scholarly literature, Runic is written from left to right. Therefore, the letters of the Runic script have a default directionality of strong left-to-right in this standard.

The Runic Alphabet. Present-day knowledge about runes is incomplete. The set of graphemically distinct units shows greater variation in its graphical shapes than most modern scripts. The Runic alphabet changed several times during its history, both in the number and the shapes of the letters contained in it. The shapes of most runes can be related to some Latin capital letter, but not necessarily to a letter representing the same sound. The most conspicuous difference between the Latin and the Runic alphabets is the order of the letters.

The Runic alphabet is known as the futhark from the name of its first six letters. The original old futhark contained 24 runes:

ᚠ ᚢ ᚦ ᚨ ᚱ ᚲ ᚷ ᚹ
ᚺ ᚾ ᛁ ᛃ ᛇ ᛈ ᛉ ᛊ
ᛏ ᛒ ᛖ ᛗ ᛚ ᛜ ᛞ ᛟ

They are usually transliterated in this way:

f u þ a r k g w
h n i j ï p z s
t b e m l ŋ d o
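The transliteration convention for the old futhark can be sketched as a simple lookup table. The code points below are the old futhark runes from the Runic block in their traditional order; the names `ELDER_FUTHARK`, `RUNE_TO_LATIN`, and `transliterate` are hypothetical and chosen for this illustration only. Real inscriptions need specialist judgment that a one-to-one table cannot supply.

```python
import unicodedata

# The 24 runes of the old futhark, in traditional order.
ELDER_FUTHARK = [
    0x16A0, 0x16A2, 0x16A6, 0x16A8, 0x16B1, 0x16B2, 0x16B7, 0x16B9,
    0x16BA, 0x16BE, 0x16C1, 0x16C3, 0x16C7, 0x16C8, 0x16C9, 0x16CA,
    0x16CF, 0x16D2, 0x16D6, 0x16D7, 0x16DA, 0x16DC, 0x16DE, 0x16DF,
]

# Conventional Latin transliterations, in the same order.
TRANSLIT = "f u þ a r k g w h n i j ï p z s t b e m l ŋ d o".split()

RUNE_TO_LATIN = {chr(cp): latin for cp, latin in zip(ELDER_FUTHARK, TRANSLIT)}


def transliterate(text: str) -> str:
    """Replace each old futhark rune with its transliteration.

    Characters outside the table are passed through unchanged.
    """
    return "".join(RUNE_TO_LATIN.get(ch, ch) for ch in text)


first_six = "".join(chr(cp) for cp in ELDER_FUTHARK[:6])
```

Transliterating the first six runes yields the word that gives the alphabet its name.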

In England and Friesland, seven more runes were added from the fifth to the ninth century. In the Scandinavian countries, the futhark changed in a different way; in the eighth century, the simplified younger futhark appeared. It consists of only 16 runes, some of which are used in two different forms. The long-branch form is shown here:

ᚠ ᚢ ᚦ ᚬ ᚱ ᚴ
ᚼ ᚾ ᛁ ᛅ ᛋ
ᛏ ᛒ ᛘ ᛚ ᛦ

f u þ o r k
h n i a s
t b m l ʀ

The use of runes continued in Scandinavia during the Middle Ages. During that time, the futhark was influenced by the Latin alphabet and new runes were invented so that there was full correspondence with the Latin letters.

Representative Glyphs. The known inscriptions can include considerable variations of shape for a given rune, sometimes to the point where the nonspecialist will mistake the shape for a different rune. There is no dominant main form for some runes, particularly for many runes added in the Anglo-Friesian and medieval Nordic systems. When transcribing a Runic inscription into its Unicode-encoded form, one cannot rely on the idealized representative glyph shape in the character charts alone. One must take into account to which of the four Runic systems an inscription belongs and be knowledgeable about the permitted form variations within each system. The representative glyphs were chosen to provide an image that distinguishes each rune visually from all other runes in the same system. For actual use, it might be advisable to use a separate font for each Runic system. Of particular note is the fact that the glyph for U+16C4 ᛄ runic letter ger is actually a rare form, as the more common form is already used for U+16E1 ᛡ runic letter ior.


Unifications. When a rune in an earlier writing system evolved into several different runes in a later system, the unification of the earlier rune with one of the later runes was based on similarity in graphic form rather than similarity in sound value. In cases where a substantial change in the typical graphical form has occurred, though the historical continuity is undisputed, unification has not been attempted. When runes from different writing systems have the same graphic form but different origins and denote different sounds, they have been coded as separate characters.

Long-Branch and Short-Twig. Two sharply different graphic forms, the long-branch and the short-twig form, were used for 9 of the 16 Viking Age Nordic runes. Although only one form is used in a given inscription, there are runologically important exceptions. In some cases, the two forms were used to convey different meanings in later use in the medieval system. Therefore the two forms have been separated in the Unicode Standard.

Staveless Runes. Staveless runes are a third form of the Viking Age Nordic runes, a kind of Runic shorthand. The number of known inscriptions is small and the graphic forms of many of the runes show great variability between inscriptions. For this reason, staveless runes have been unified with the corresponding Viking Age Nordic runes. The corresponding Viking Age Nordic runes must be used to encode these characters, specifically the short-twig characters, where both short-twig and long-branch characters exist.

Punctuation Marks. The wide variety of Runic punctuation marks has been reduced to three distinct characters based on simple aspects of their graphical form, as very little is known about any difference in intended meaning between marks that look different. Any other punctuation marks have been unified with shared punctuation marks elsewhere in the Unicode Standard.

Golden Numbers. Runes were used as symbols for Sunday letters and golden numbers on calendar staves used in Scandinavia during the Middle Ages. To complete the number series 1–19, three more calendar runes were added. They are included after the punctuation marks.

Encoding. A total of 81 characters of the Runic script are included in the Unicode Standard. Of these, 75 are Runic letters, 3 are punctuation marks, and 3 are Runic symbols. The order of the Runic characters follows the traditional futhark order, with variants and derived runes being inserted directly after the corresponding ancestor. Runic character names are based as much as possible on the sometimes several traditional names for each rune, often with the Latin transliteration at the end of the name.
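The breakdown of the 81 Runic characters can be checked against the General Category values exposed by Python's `unicodedata` module. This is only a sketch: it assumes that letters are reported as `Lo`, the punctuation marks as `Po`, and the three calendar symbols as `Nl`, and it restricts the scan to U+16A0..U+16F0 because later versions of the standard than the one described here assign further code points in the block.

```python
import unicodedata
from collections import Counter

# Scan the range covered by this section: U+16A0 through U+16F0 inclusive.
RUNIC_RANGE = range(0x16A0, 0x16F0 + 1)

# Tally General Category values for every code point in the range.
counts = Counter(unicodedata.category(chr(cp)) for cp in RUNIC_RANGE)

letters = counts["Lo"]      # Runic letters
punctuation = counts["Po"]  # Runic punctuation marks
symbols = counts["Nl"]      # calendar runes used for golden numbers
total = sum(counts.values())
```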

14.4 Gothic

Gothic: U+10330–U+1034F

The Gothic script was devised in the fourth century by the Gothic bishop, Wulfila (311–383 CE), to provide his people with a written language and a means of reading his translation of


Chapter 15

Symbols


The universe of symbols is rich and open-ended. The collection of encoded symbols in the Unicode Standard encompasses the following:

Currency symbols
Letterlike symbols
Mathematical alphabets
Number forms
Mathematical symbols
Invisible mathematical operators
Technical symbols
Geometrical symbols
Miscellaneous symbols and dingbats
Enclosed and square symbols
Braille patterns
Western and Byzantine musical symbols
Ancient Greek musical notation

There are other notational systems not covered by the Unicode Standard. Some symbols mark the transition between pictorial items and text elements; because they do not have a well-defined place in plain text, they are not encoded here.

Combining marks may be used with symbols, particularly the set encoded at U+20D0..U+20FF (see Section 7.9, Combining Marks).

Letterlike and currency symbols, as well as number forms including superscripts and subscripts, are typically subject to the same font and style changes as the surrounding text. Where square and enclosed symbols occur in East Asian contexts, they generally follow the prevailing type styles.

Other symbols have an appearance that is independent of type style, or a more limited or altogether different range of type style variation than the regular text surrounding them. For example, mathematical alphanumeric symbols are typically used for mathematical variables; those letterlike symbols that are part of this set carry semantic information in their type style. This fact restricts, but does not completely eliminate, possible style variations. However, symbols such as mathematical operators can be used with any script or independent of any script.

Special invisible operator characters can be used to explicitly encode some mathematical operations, such as multiplication, which are normally implied by juxtaposition. This aids in automatic interpretation of mathematical notation.


In a bidirectional context (see Unicode Standard Annex #9, “The Bidirectional Algorithm”), symbol characters have no inherent directionality but resolve according to the Unicode Bidirectional Algorithm. Where the image of a symbol is not bilaterally symmetric, the mirror image is used when the character is part of the right-to-left text stream (see Section 4.7, Bidi Mirrored—Normative).

Dingbats and optical character recognition characters are different from all other characters in the standard, in that they are encoded based on their precise appearance.

Braille patterns are a special case, because they can be used to write text. They are included as symbols, as the Unicode Standard encodes only their shapes; the association of letters to patterns is left to other standards. When a character stream is intended primarily to convey text information, it should be coded using one of the scripts. Only when it is intended to convey a particular binding of text to Braille pattern sequence should it be coded using the Braille patterns.

Musical notation, particularly Western musical notation, is different from ordinary text in the way it is laid out, especially the representation of pitch and duration in Western musical notation. However, ordinary text commonly refers to the basic graphical elements that are used in musical notation, and it is primarily those symbols that are encoded in the Unicode Standard. Additional sets of symbols are encoded to support historical systems of musical notation.

Many symbols encoded in the Unicode Standard are intended to support legacy implementations and obsolescent practices, such as terminal emulation or other character mode user interfaces. Examples include box drawing components and control pictures.

Many of the symbols encoded in Unicode can be used as operators or given some other syntactical function in a formal language syntax. For more information, see Unicode Standard Annex #31, “Identifier and Pattern Syntax.”
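The properties mentioned in this introduction can be looked up with Python's `unicodedata` module. The sketch below uses a hypothetical helper, `describe`, to report a character's name, bidirectional class, and Bidi Mirrored flag; the three sample characters are chosen for illustration and are not singled out by the standard.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Hypothetical helper: (name, bidirectional class, Bidi Mirrored flag)."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )


# An asymmetric mathematical symbol: neutral direction, mirrored in RTL text.
element_of = describe("\u2208")

# A symmetric operator: no mirroring is needed.
plus = describe("+")

# An invisible operator that makes implied multiplication explicit.
invisible_times = describe("\u2062")
invisible_times_category = unicodedata.category("\u2062")
```

A parser that sees U+2062 between two variables can treat the juxtaposition as a product instead of guessing from context.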

15.1 Currency Symbols

Currency symbols are intended to encode the customary symbolic signs used to indicate certain currencies in general text. These signs vary in shape and are often used for more than one currency. Not all currencies are represented by a special currency symbol; some use multiple-letter strings instead, such as “Sfr” for Swiss franc. Moreover, the abbreviations for currencies can vary by language. The Common Locale Data Repository (CLDR) provides further information; see Section B.6, Other Unicode Online Resources. Therefore, implementations that are concerned with the exact identity of a currency should not depend on an encoded currency sign character. Instead, they should follow standards such as the ISO 4217 three-letter currency codes, which are specific to currencies; for example, USD for U.S. dollar, CAD for Canadian dollar.

Unification. The Unicode Standard does not duplicate encodings where more than one currency is expressed with the same symbol. Many currency symbols are overstruck letters. There are therefore many minor variants, such as the U+0024 dollar sign $, with one or two vertical bars, or other graphical variation, as shown in Figure 15-1.

Figure 15-1. Alternative Glyphs for Dollar Sign

[Figure: alternative glyphs for the dollar sign.]

Claims that glyph variants of a certain currency symbol are used consistently to indicate a particular currency could not be substantiated upon further research. Therefore, the Unicode Standard considers these variants to be typographical and provides a single encoding for them. See ISO/IEC 10367, Annex B (informative), for an example of multiple renderings for U+00A3 pound sign.

Fonts. Currency symbols are commonly designed to display at the same width as a digit (most often a European digit, U+0030..U+0039) to assist in alignment of monetary values in tabular displays. Like letters, they tend to follow the stylistic design features of particular fonts because they are used often and need to harmonize with body text. In particular, even though there may be more or less normative designs for the currency sign per se, as for the euro sign, type designers freely adapt such designs to make them fit the logic of the rest of their fonts. This partly explains why currency signs show more glyph variation than other types of symbols.

Currency Symbols: U+20A0–U+20CF

This block contains currency symbols that are not encoded in other blocks. Common currency symbols encoded in other blocks are listed in Table 15-1.

Table 15-1. Currency Symbols Encoded in Other Blocks

Currency                      Unicode Code Point
Dollar, milreis, escudo, peso U+0024  dollar sign
Cent                          U+00A2  cent sign
Pound and lira                U+00A3  pound sign
General currency              U+00A4  currency sign
Yen or yuan                   U+00A5  yen sign
Dutch florin                  U+0192  latin small letter f with hook
Afghani                       U+060B  afghani sign
Rupee                         U+09F2  bengali rupee mark
Rupee                         U+09F3  bengali rupee sign
Rupee                         U+0AF1  gujarati rupee sign
Rupee                         U+0BF9  tamil rupee sign
Baht                          U+0E3F  thai currency symbol baht
Riel                          U+17DB  khmer currency symbol riel
German mark (historic)        U+2133  script capital m
Yuan, yen, won, HKD           U+5143  cjk unified ideograph-5143
Yen                           U+5186  cjk unified ideograph-5186
Yuan                          U+5706  cjk unified ideograph-5706
Yuan, yen, won, HKD, NTD      U+5713  cjk unified ideograph-5713
Rial                          U+FDFC  rial sign

Lira Sign. A separate currency sign U+20A4 lira sign is encoded for compatibility with the HP Roman-8 character set, which is still widely implemented in printers. In general, U+00A3 pound sign should be used for both the various currencies known as pound (or punt) and the various currencies known as lira (for example, the former currency of Italy and the lira still in use in Turkey). Widespread implementation practice in Italian and Turkish systems has long made use of U+00A3 as the currency sign for the lira. As in the case of the dollar sign, the glyphic distinction between single- and double-bar versions of the sign is not indicative of a systematic difference in the currency.

Yen and Yuan. Like the dollar sign and the pound sign, U+00A5 yen sign has been used as the currency sign for more than one currency. While there may be some preferences to use a double-bar glyph for the yen currency of Japan (JPY) and a single-bar glyph for the yuan (renminbi) currency of China (CNY), this distinction is not systematic in all font designs, and there is considerable overlap in usage. As listed in Table 15-1, there are also a number of CJK ideographs to represent the words yen (or en) and yuan, as well as the Korean word won, and these also tend to overlap in use as currency symbols. In the Unicode Standard, U+00A5 yen sign is intended to be the character for the currency sign for both the yen and the yuan, with details of glyphic presentation left to font choice and local preferences.

Euro Sign. The single currency for member countries of the European Economic and Monetary Union is the euro (EUR). The euro character is encoded in the Unicode Standard as U+20AC euro sign.

For additional forms of currency symbols, see Fullwidth Forms (U+FFE0..U+FFE6).
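The advice to identify currencies by ISO 4217 code rather than by sign can be illustrated with a small sketch. The mapping below lists only the currencies this section names for two shared signs, and `SIGN_TO_CODES`, `is_ambiguous`, and `format_amount` are hypothetical names introduced for the example. The category lookup also shows that not every character in Table 15-1 is classified as a currency symbol (`Sc`), so a category test alone would miss letter-based signs.

```python
import unicodedata

# Currencies that this section associates with each shared sign.
# Illustrative only; it is not a complete registry.
SIGN_TO_CODES = {
    "\u0024": ["USD", "CAD"],  # DOLLAR SIGN
    "\u00A5": ["JPY", "CNY"],  # YEN SIGN
}


def is_ambiguous(sign: str) -> bool:
    """True if the sign alone cannot identify a single currency."""
    return len(SIGN_TO_CODES.get(sign, [])) > 1


def format_amount(amount: float, iso_code: str) -> str:
    """Label an amount with its ISO 4217 code instead of a sign."""
    return f"{iso_code} {amount:.2f}"


# General Category of a few characters used as currency signs.
categories = {
    cp: unicodedata.category(chr(cp))
    for cp in (0x0024, 0x00A5, 0x20AC, 0x0192, 0x2133)
}
```

Storing the ISO code alongside the amount keeps the identity of the currency independent of whichever glyph a font happens to draw.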

15.2 Letterlike Symbols

Letterlike Symbols: U+2100–U+214F

Letterlike symbols are symbols derived in some way from ordinary letters of an alphabetic script. This block includes symbols based on Latin, Greek, and Hebrew letters. Stylistic variations of single letters are used for semantics in mathematical notation. See “Mathematical Alphanumeric Symbols” in this section for the use of letterlike symbols in mathematical formulas. Some letterforms have given rise to specialized symbols, such as U+211E prescription take.

Numero Sign. U+2116 numero sign is provided both for Cyrillic use and for compatibility with Asian standards; the glyph customarily differs between the two. Figure 15-2 illustrates a number of alternative glyphs for this sign. Instead of using a special symbol, French practice is to use an “N” or an “n”, according to context, followed by a superscript small letter “o” (Nᵒ or nᵒ; plural Nᵒˢ or nᵒˢ). Legacy data encoded in ISO/IEC 8859-1 (Latin-1) or


The Unicode Standard 5.0 – Electronic edition


other 8-bit character sets may also have represented the numero sign by a sequence of “N” followed by the degree sign (U+00B0 degree sign). Implementations interworking with legacy data should be aware of such alternative representations for the numero sign when converting data.

Figure 15-2. Alternative Glyphs for Numero Sign

Unit Symbols. Several letterlike symbols are used to indicate units. In most cases, however, such as for SI units (Système International), the use of regular letters or other symbols is preferred. U+2113 script small l is commonly used as a non-SI symbol for the liter. Official SI usage prefers the regular lowercase letter l.

Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 ohm sign, U+212A kelvin sign, and U+212B angstrom sign. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents.

In normal use, it is better to represent degrees Celsius “°C” with a sequence of U+00B0 degree sign + U+0043 latin capital letter c, rather than U+2103 degree celsius. For searching, treat these two sequences as identical. Similarly, the sequence U+00B0 degree sign + U+0046 latin capital letter f is preferred over U+2109 degree fahrenheit, and those two sequences should be treated as identical for searching.

Compatibility. Some symbols are composites of several letters. Many of these composite symbols are encoded for compatibility with Asian and other legacy encodings. (See also “CJK Compatibility Ideographs” in Section 12.1, Han.) The use of these composite symbols is discouraged where their presence is not required by compatibility. For example, in normal use, the symbols U+2121 TEL telephone sign and U+213B FAX facsimile sign are simply spelled out.

In the context of East Asian typography, many letterlike symbols, and in particular composites, form part of a collection of compatibility symbols, the larger part of which is located in the CJK Compatibility block (see Section 15.9, Enclosed and Square). When used in this way, these symbols are rendered as “wide” characters occupying a full cell. They remain upright in vertical layout, contrary to the rotated rendering of their regular letter equivalents. See Unicode Standard Annex #11, “East Asian Width,” for more information.

Where the letterlike symbols have alphabetic equivalents, they collate in alphabetic sequence; otherwise, they should be treated as neutral symbols. The letterlike symbols may have different directional properties than normal letters. For example, the four transfinite cardinal symbols (U+2135..U+2138) are used in ordinary mathematical text and do not share the strong right-to-left directionality of the Hebrew letters from which they are derived.
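The normalization behavior described for the unit symbols can be checked with a short Python sketch using the standard `unicodedata` module. The helper `search_key` is a hypothetical name, and choosing NFKC as the folding for search is an assumption; it is one simple way to make the two degree sequences compare equal.

```python
import unicodedata

# The three letterlike unit signs with canonical equivalents.
CANONICAL_UNIT_SIGNS = {
    "\u2126": "\u03A9",  # OHM SIGN      -> GREEK CAPITAL LETTER OMEGA
    "\u212A": "K",       # KELVIN SIGN   -> LATIN CAPITAL LETTER K
    "\u212B": "\u00C5",  # ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE
}


def search_key(text: str) -> str:
    """Fold compatibility characters so that U+2103 and U+00B0 + 'C'
    (and U+2109 and U+00B0 + 'F') yield the same key (assumed policy)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for sign, letter in CANONICAL_UNIT_SIGNS.items():
        normalized = unicodedata.normalize("NFC", sign)
        print(f"U+{ord(sign):04X} -> U+{ord(normalized):04X}",
              normalized == letter)
    print(search_key("25\u2103") == search_key("25\u00B0C"))
```

Canonical normalization (NFC) alone is enough for the ohm, kelvin, and angstrom signs; the degree Celsius and Fahrenheit signs have only compatibility decompositions, which is why the search key uses a compatibility form.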


Styles. The letterlike symbols include some of the few instances in which the Unicode Standard encodes stylistic variants of letters as distinct characters. For example, there are instances of blackletter (Fraktur), double-struck, italic, and script styles for certain Latin letters used as mathematical symbols. The choice of these stylistic variants for encoding reflects their common use as distinct symbols. They form part of the larger set of mathematical alphanumeric symbols. For the complete set and more information on its use, see “Mathematical Alphanumeric Symbols” in this section. These symbols should not be used in ordinary, nonscientific texts.

Despite its name, U+2118 script capital p is neither script nor capital; it is uniquely the Weierstrass elliptic function symbol derived from a calligraphic lowercase p. U+2113 script small l is derived from a special italic form of the lowercase letter l and, when it occurs in mathematical notation, is known as the symbol ell. Use U+1D4C1 mathematical script small l as the lowercase script l for mathematical notation.

Standards. The Unicode Standard encodes letterlike symbols from many different national standards and corporate collections.

Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF

The Mathematical Alphanumeric Symbols block contains a large extension of letterlike symbols used in mathematical notation, typically for variables. The characters in this block are intended for use only in mathematical or technical notation; they are not intended for use in nontechnical text. When used with markup languages, for example with Mathematical Markup Language (MathML), the characters are expected to be used directly, instead of indirectly via entity references or by composing them from base letters and style markup.

Words Used as Variables. In some specialties, whole words are used as variables, not just single letters. For these cases, style markup is preferred because in ordinary mathematical notation the juxtaposition of variables generally implies multiplication, not word formation as in ordinary text. Markup not only provides the necessary scoping in these cases, but also allows the use of a more extended alphabet.

Mathematical Alphabets

Basic Set of Alphanumeric Characters. Mathematical notation uses a basic set of mathematical alphanumeric characters, which consists of the following:

• The set of basic Latin digits (0–9) (U+0030..U+0039)
• The set of basic uppercase and lowercase Latin letters (a–z, A–Z)
• The uppercase Greek letters Α–Ω (U+0391..U+03A9), plus the nabla ∇ (U+2207) and the variant of theta ϴ given by U+03F4
• The lowercase Greek letters α–ω (U+03B1..U+03C9), plus the partial differential sign ∂ (U+2202), and the six glyph variants ϵ, ϑ, ϰ, ϕ, ϱ, and ϖ, given by U+03F5, U+03D1, U+03F0, U+03D5, U+03F1, and U+03D6, respectively


Only unaccented forms of the letters are used for mathematical notation, because general accents such as the acute accent would interfere with common mathematical diacritics. Examples of common mathematical diacritics that can interfere with general accents are the circumflex, macron, or the single or double dot above, the latter two of which are used in physics to denote derivatives with respect to the time variable. Mathematical symbols with diacritics are always represented by combining character sequences.

For some characters in the basic set of Greek characters, two variants of the same character are included. This is because they can appear in the same mathematical document with different meanings, even though they would have the same meaning in Greek text. (See “Variant Letterforms” in Section 7.2, Greek.)

Additional Characters. In addition to this basic set, mathematical notation uses the uppercase and lowercase digamma, in regular (U+03DC and U+03DD) and bold (U+1D7CA and U+1D7CB), and the four Hebrew-derived characters (U+2135..U+2138). Occasional uses of other alphabetic and numeric characters are known. Examples include U+0428 cyrillic capital letter sha, U+306E hiragana letter no, and Eastern Arabic-Indic digits (U+06F0..U+06F9). However, these characters are used only in their basic forms, rather than in multiple mathematical styles.

Dotless Characters. In the Unicode Standard, the characters “i” and “j”, including their variations in the mathematical alphabets, have the Soft_Dotted property. Any conformant renderer will remove the dot when the character is followed by a nonspacing combining mark above. Therefore, using an individual mathematical italic i or j with math accents would result in the intended display. However, in mathematical equations an entire subexpression can be placed underneath a math accent, for example, when a “wide hat” is placed on top of i+j, as shown in Figure 15-3.

Figure 15-3. Wide Mathematical Accents

$\widehat{i+j} = \hat{i} + \hat{j}$

In such a situation, a renderer can no longer rely simply on the presence of an adjacent combining character to substitute for the un-dotted glyph, and whether the dots should be removed in such a situation is no longer predictable. Authors differ in whether they expect the dotted or dotless forms in that case.

In some documents mathematical italic dotless i or j is used explicitly without any combining marks, or even in contrast to the dotted versions. Therefore, the Unicode Standard provides the explicitly dotless characters U+1D6A4 mathematical italic small dotless i and U+1D6A5 mathematical italic small dotless j. These two characters map to the ISOAMSO entities imath and jmath or the TeX macros \imath and \jmath. These entities are, by default, always italic. The appearance of these two characters in the code charts is similar to the shapes of the entities documented in the ISO 9573-13 entity sets and used by TeX. The mathematical dotless characters do not have case mappings.


Semantic Distinctions. Mathematical notation requires a number of Latin and Greek alphabets that initially appear to be mere font variations of one another. The letter H can appear as plain or upright (H), bold (𝐇), italic (𝐻), as well as script, Fraktur, and other styles. However, in any given document, these characters have distinct, and usually unrelated, mathematical semantics. For example, a normal H represents a different variable from a bold H, and so on. If these attributes are dropped in plain text, the distinctions are lost and the meaning of the text is altered. Without the distinctions, the well-known Hamiltonian formula turns into the integral equation in the variable H as shown in Figure 15-4.

Figure 15-4. Style Variants and Semantic Distinctions in Mathematics

Hamiltonian formula: $\mathcal{H} = \int d\tau\,(\epsilon E^2 + \mu H^2)$

Integral equation: $H = \int d\tau\,(\epsilon E^2 + \mu H^2)$

Mathematicians will object that a properly formatted integral equation requires all the letters in this example (except for the “d”) to be in italics. However, because the distinction between ℋ and H has been lost, they would recognize it as a fallback representation of an integral equation, and not as a fallback representation of the Hamiltonian. By encoding a separate set of alphabets, it is possible to preserve such distinctions in plain text.

Mathematical Alphabets. The alphanumeric symbols encountered in mathematics and encoded in the Unicode Standard are given in Table 15-2.

Table 15-2. Mathematical Alphanumeric Symbols

Math Style                   Characters from Basic Set   Location
plain (upright, serifed)     Latin, Greek, and digits    BMP
bold                         Latin, Greek, and digits    Plane 1
italic                       Latin and Greek             Plane 1
bold italic                  Latin and Greek             Plane 1
script (calligraphic)        Latin                       Plane 1
bold script (calligraphic)   Latin                       Plane 1
Fraktur                      Latin                       Plane 1
bold Fraktur                 Latin                       Plane 1
double-struck                Latin and digits            Plane 1
sans-serif                   Latin and digits            Plane 1
sans-serif bold              Latin, Greek, and digits    Plane 1
sans-serif italic            Latin                       Plane 1
sans-serif bold italic       Latin and Greek             Plane 1
monospace                    Latin and digits            Plane 1


The plain letters have been unified with the existing characters in the Basic Latin and Greek blocks. There are 24 double-struck, italic, Fraktur, and script characters that already exist in the Letterlike Symbols block (U+2100..U+214F). These are explicitly unified with the characters in this block, and corresponding holes have been left in the mathematical alphabets.

The alphabets in this block encode only semantic distinction, but not which specific font will be used to supply the actual plain, script, Fraktur, double-struck, sans-serif, or monospace glyphs. The script and double-struck styles in particular can show considerable variation across fonts. Characters from the Mathematical Alphanumeric Symbols block are not to be used for nonmathematical styled text.

Compatibility Decompositions. All mathematical alphanumeric symbols have compatibility decompositions to the base Latin and Greek letters. This does not imply that the use of these characters is discouraged for mathematical use. Folding away such distinctions by applying the compatibility mappings is usually not desirable, however, as it loses the semantic distinctions for which these characters were encoded. See Unicode Standard Annex #15, “Unicode Normalization Forms.”
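Both points above, the holes left in the mathematical alphabets and the loss of distinctions under compatibility folding, can be illustrated with a small Python sketch. The function names are hypothetical, and the lookup strategy (falling back to the Letterlike Symbols character by name when the Plane 1 position is unassigned) is just one possible approach, shown here for the script capitals only.

```python
import unicodedata

SCRIPT_CAPITAL_A = 0x1D49C  # MATHEMATICAL SCRIPT CAPITAL A


def script_capital(letter: str) -> str:
    """Return the script-style capital for an ASCII capital letter.

    Positions left as holes in Plane 1 are unassigned; for those, the
    unified character in the Letterlike Symbols block is returned.
    """
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single ASCII capital letter")
    candidate = chr(SCRIPT_CAPITAL_A + ord(letter) - ord("A"))
    try:
        unicodedata.name(candidate)  # raises ValueError for a hole
        return candidate
    except ValueError:
        # e.g. H -> U+210B SCRIPT CAPITAL H in Letterlike Symbols
        return unicodedata.lookup(f"SCRIPT CAPITAL {letter}")


def fold_math_styles(text: str) -> str:
    """Apply compatibility decomposition; this discards style semantics."""
    return unicodedata.normalize("NFKD", text)


if __name__ == "__main__":
    script_h = script_capital("H")
    bold_h = "\U0001D407"  # MATHEMATICAL BOLD CAPITAL H
    print(f"U+{ord(script_h):04X}", unicodedata.name(script_h))
    # After folding, the two distinct variables become the same letter.
    print(fold_math_styles(script_h) == fold_math_styles(bold_h))
```

The final comparison shows why folding is usually undesirable for mathematical text: two variables that were distinct before folding can no longer be told apart.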

Fonts Used for Mathematical Alphabets

Mathematicians place strict requirements on the specific fonts used to represent mathematical variables. Readers of a mathematical text need to be able to distinguish single-letter variables from each other, even when they do not appear in close proximity. They must be able to recognize the letter itself, whether it is part of the text or is a mathematical variable, and lastly which mathematical alphabet it is from.

Fraktur. The blackletter style is often referred to as Fraktur or Gothic in various sources. Technically, Fraktur and Gothic typefaces are distinct designs from blackletter, but any of several font styles similar in appearance to the forms shown in the charts can be used. Note that in East Asian typography, the term Gothic is commonly used to indicate a sans-serif type style.

Math Italics. Mathematical variables are most commonly set in a form of italics, but not all italic fonts can be used successfully. For example, a math italic font should avoid a “tail” on the lowercase italic letter z because it clashes with subscripts. In common text fonts, the italic letter v and Greek letter nu are not very distinct. A rounded italic letter v is therefore preferred in a mathematical font. There are other characters that sometimes have similar shapes and require special attention to avoid ambiguity. Examples are shown in Figure 15-5.

Hard-to-Distinguish Letters. Not all sans-serif fonts allow an easy distinction between lowercase l and uppercase I, and not all monospaced (monowidth) fonts allow a distinction between the letter l and the digit one. Such fonts are not usable for mathematics. In Fraktur, the letters ℑ (I) and 𝔍 (J), in particular, must be made distinguishable. Overburdened blackletter forms are inappropriate for mathematical notation. Similarly, the digit zero must be distinct from the uppercase letter O for all mathematical alphanumeric sets.
Some characters are so similar that even mathematical fonts do not attempt to provide distinct glyphs for them. Their use is normally avoided in mathematical notation unless no confusion is possible in a given context, for example, uppercase A and uppercase Alpha.

Figure 15-5. Easily Confused Shapes for Mathematical Glyphs
[Figure: pairs of similar glyphs: italic a and alpha; italic v (pointed) and nu; italic v (rounded) and upsilon; script X and chi; plain Y and Upsilon]

Font Support for Combining Diacritics. Mathematical equations require that characters be combined with diacritics (dots, tilde, circumflex, or arrows above are common), as well as followed or preceded by superscripted or subscripted letters or numbers. This requirement leads to designs for italic styles that are less inclined and script styles that have smaller overhangs and less slant than equivalent styles commonly used for text such as wedding invitations.

Type Style for Script Characters. In some instances, a deliberate unification with a nonmathematical symbol has been undertaken; for example, U+2133 is unified with the pre-1949 symbol for the German currency unit Mark. This unification restricts the range of glyphs that can be used for this character in the charts. Therefore the font used for the representative glyphs in the code charts is based on a simplified “English Script” style, as per recommendation by the American Mathematical Society. For consistency, other script characters in the Letterlike Symbols block are now shown in the same type style.

Double-Struck Characters. The double-struck glyphs shown in earlier editions of the standard attempted to match the design used for all the other Latin characters in the standard, which is based on Times. The current set of fonts was prepared in consultation with the American Mathematical Society and leading mathematical publishers; it shows much simpler forms that are derived from the forms written on a blackboard. However, both serifed and non-serifed forms can be used in mathematical texts, and inline fonts are found in works published by certain publishers.

15.3 Number Forms

Number Forms: U+2150–U+218F

Many number form characters are composite or duplicate forms encoded solely for compatibility with existing standards. The use of these composite symbols is discouraged where their presence is not required by compatibility.


Fractions. The Number Forms block contains a series of vulgar fraction characters, encoded for compatibility with legacy character encoding standards. These characters are intended to represent both of the common forms of vulgar fractions: forms with a right-slanted division slash, as shown in the code charts, and forms with a horizontal division line, which are considered to be alternative glyphs for the same fractions, as shown in Figure 15-6. A few other vulgar fraction characters are located in the Latin-1 block in the range U+00BC..U+00BE.

Figure 15-6. Alternate Forms of Vulgar Fractions

The vulgar fraction characters are given compatibility decompositions using U+2044 “/” fraction slash. Use of the fraction slash is the more generic way to represent fractions in text; it can be used to construct fractional number forms that are not included in the collections of vulgar fraction characters. For more information on the fraction slash, see “Other Punctuation” in Section 6.2, General Punctuation.

Roman Numerals. For most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters. However, the uppercase and lowercase variants of the Roman numerals through 12, plus L, C, D, and M, have been encoded for compatibility with East Asian standards. Unlike sequences of Latin letters, these symbols remain upright in vertical layout. Additionally, in certain locales, compact date formats use Roman numerals for the month, but may expect the use of a single character.

In identifiers, the use of Roman numeral symbols, particularly those based on a single letter of the Latin alphabet, can lead to spoofing. For more information, see Unicode Technical Report #36, “Unicode Security Considerations.”

U+2180 roman numeral one thousand c d and U+216F roman numeral one thousand can be considered to be glyphic variants of the same Roman numeral, but are distinguished because they are not generally interchangeable and because U+2180 cannot be considered to be a compatibility equivalent to the Latin letter M. U+2181 roman numeral five thousand and U+2182 roman numeral ten thousand are distinct characters used in Roman numerals; they do not have compatibility decompositions in the Unicode Standard. U+2183 roman numeral reversed one hundred is a form used in combinations with C and/or I to form large numbers, some of which vary with single character number forms such as D, M, U+2181, or others. U+2183 is also used for the Claudian letter antisigma.
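The decomposition behavior described for fractions and Roman numerals can be observed with Python's standard `unicodedata` module. This is a brief sketch; the helper names `make_fraction` and `expand_number_forms` are hypothetical, and NFKD is used simply as a convenient way to apply the compatibility mappings.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def make_fraction(numerator: int, denominator: int) -> str:
    """Build a fraction generically with U+2044 FRACTION SLASH, which
    also covers fractions that have no precomposed character."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


def expand_number_forms(text: str) -> str:
    """Replace compatibility number forms by their decompositions."""
    return unicodedata.normalize("NFKD", text)


if __name__ == "__main__":
    one_third = "\u2153"     # VULGAR FRACTION ONE THIRD
    roman_twelve = "\u216B"  # ROMAN NUMERAL TWELVE
    roman_cd = "\u2180"      # ROMAN NUMERAL ONE THOUSAND C D

    print(unicodedata.decomposition(one_third))
    print(expand_number_forms(one_third) == make_fraction(1, 3))
    print(expand_number_forms(roman_twelve))
    # U+2180 has a numeric value but no compatibility decomposition.
    print(repr(unicodedata.decomposition(roman_cd)),
          unicodedata.numeric(roman_cd))
    print(make_fraction(7, 16))
```

Because U+2180 through U+2182 have no decompositions, code that needs their values has to consult the numeric property rather than rely on normalization.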

CJK Number Forms

Chinese Counting-Rod Numerals. Counting-rod numerals were used in pre-modern East Asian mathematical texts in conjunction with counting rods used to represent and manipulate numbers. The counting rods were a set of small sticks, several centimeters long, that were arranged in patterns on a gridded counting board. Counting rods and the counting board provided a flexible system for mathematicians to manipulate numbers, allowing for considerable sophistication in mathematics.

The specifics of the patterns used to represent various numbers using counting rods varied, but there are two main constants: Two sets of numbers were used for alternate columns; one set was used for the ones, hundreds, and ten-thousands columns in the grid, while the other set was used for the tens and thousands. The shapes used for the counting-rod numerals in the Unicode Standard follow conventions from the Song dynasty in China, when traditional Chinese mathematics had reached its peak. Fragmentary material from many early Han dynasty texts shows different orientation conventions for the numerals, with horizontal and vertical marks swapped for the digits and tens places.

Zero was indicated by a blank square on the counting board and was either avoided in written texts or was represented with U+3007 ideographic number zero. (Historically, U+3007 ideographic number zero originated as a dot; as time passed, it increased in size until it became the same size as an ideograph. The actual size of U+3007 ideographic number zero in mathematical texts varies, but this variation should be considered a font difference.) Written texts could also take advantage of the alternating shapes for the numerals to avoid having to explicitly represent zero. Thus 6,708 can be distinguished from 678, because the former would be 𝍮𝍦𝍧, whereas the latter would be 𝍥𝍯𝍧.

Negative numbers were originally indicated on the counting board by using rods of a different color. In written texts, a diagonal slash from lower right to upper left is overlaid upon the rightmost digit. On occasion, the slash might not be actually overlaid. U+20E5 combining reverse solidus overlay should be used for this negative sign.

The predominant use of counting-rod numerals in texts was as part of diagrams of counting boards. They are, however, occasionally used in other contexts, and they may even occur within the body of modern texts.

Suzhou-Style Numerals. The Suzhou-style numerals (Mandarin su¹zhou¹ma³zi) are CJK ideographic number forms encoded in the CJK Symbols and Punctuation block in the ranges U+3021..U+3029 and U+3038..U+303A.

The Suzhou-style numerals are modified forms of CJK ideographic numerals that are used by shopkeepers in China to mark prices. They are also known as “commercial forms,” “shop units,” or “grass numbers.” They are encoded for compatibility with the CNS 11643-1992 and Big Five standards. The forms for ten, twenty, and thirty, encoded at U+3038..U+303A, are also encoded as CJK unified ideographs: U+5341, U+5344, and U+5345, respectively. (For twenty, see also U+5EFE and U+5EFF.)

These commercial forms of Chinese numerals should be distinguished from the use of other CJK unified ideographs as accounting numbers to deter fraud. See Table 4-9 in Section 4.6, Numeric Value—Normative, for a list of ideographs used as accounting numbers.

Why are the Suzhou numbers called Hangzhou numerals in the Unicode names? No one has been able to trace this back. Hangzhou is a district in China that is near the Suzhou district, but the name “Hangzhou” does not occur in other sources that discuss these number forms.
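The alternating-column convention for counting-rod numerals can be sketched in Python as follows. This is an illustration only: the function name `counting_rods` and its `explicit_zero` flag are hypothetical, and the sketch assumes the unit digits (U+1D360..U+1D368) serve the ones, hundreds, and ten-thousands columns while the tens digits (U+1D369..U+1D371) serve the tens and thousands columns, as described above. Omitting zeros entirely can still be ambiguous when two adjacent columns are empty, so the flag allows U+3007 to be written instead.

```python
UNIT_ONE = 0x1D360   # COUNTING ROD UNIT DIGIT ONE
TENS_ONE = 0x1D369   # COUNTING ROD TENS DIGIT ONE
IDEOGRAPHIC_ZERO = "\u3007"


def counting_rods(number: int, explicit_zero: bool = False) -> str:
    """Render a positive integer with counting-rod numerals.

    Even columns (ones, hundreds, ...) use the unit set and odd columns
    (tens, thousands, ...) use the tens set. Zeros are skipped unless
    explicit_zero is true, in which case U+3007 is written.
    """
    if number <= 0:
        raise ValueError("only positive integers are handled in this sketch")
    digits = str(number)
    out = []
    for index, char in enumerate(digits):
        column = len(digits) - 1 - index  # 0 = ones column
        value = int(char)
        if value == 0:
            if explicit_zero:
                out.append(IDEOGRAPHIC_ZERO)
            continue
        base = UNIT_ONE if column % 2 == 0 else TENS_ONE
        out.append(chr(base + value - 1))
    return "".join(out)


if __name__ == "__main__":
    for n in (6708, 678):
        rods = counting_rods(n)
        print(n, rods, [f"U+{ord(c):04X}" for c in rods])
```

For 6,708 the two adjacent unit-set digits signal the skipped tens column, which is what distinguishes it from 678.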


Superscripts and Subscripts: U+2070–U+209F

In general, the Unicode Standard does not attempt to describe the positioning of a character above or below the baseline in typographical layout. Therefore, the preferred means to encode superscripted letters or digits, such as “1ˢᵗ” or “DC00₁₆”, is by style or markup in rich text. However, in some instances superscript or subscript letters are used as part of the plain text content of specialized phonetic alphabets, such as the Uralic Phonetic Alphabet. These superscript and subscript letters are mostly from the Latin or Greek scripts. These characters are encoded in other character blocks, along with other modifier letters or phonetic letters. In addition, superscript digits are used to indicate tone in transliteration of many languages. The use of superscript two and superscript three is common legacy practice when referring to units of area and volume in general texts.

A certain number of additional superscript and subscript characters are needed for roundtrip conversions to other standards and legacy code pages. Most such characters are encoded in this block and are considered compatibility characters.

Parsing of Superscript and Subscript Digits. In the Unicode Character Database, superscript and subscript digits have not been given the General_Category property value Decimal_Number (gc=Nd), so as to prevent expressions like 2³ from being interpreted like 23 by simplistic parsers. This should not be construed as preventing more sophisticated numeric parsers, such as general mathematical expression parsers, from correctly identifying these compatibility superscript and subscript characters as digits and interpreting them appropriately.

Standards. Many of the characters in the Superscripts and Subscripts block are from character sets registered in the ISO International Register of Coded Character Sets to be Used With Escape Sequences, under the registration standard ISO/IEC 2375, for use with ISO/IEC 2022. Two MARC 21 character sets used by libraries include the digits, plus signs, minus signs, and parentheses.

Superscripts and Subscripts in Other Blocks. The superscript digits one, two, and three are coded in the Latin-1 Supplement block to provide code point compatibility with ISO/IEC 8859-1. For a discussion of U+00AA feminine ordinal indicator and U+00BA masculine ordinal indicator, see “Letters of the Latin-1 Supplement” in Section 7.1, Latin. U+2120 service mark and U+2122 trade mark sign are commonly used symbols that are encoded in the Letterlike Symbols block (U+2100..U+214F); they consist of sequences of two superscripted letters each.

For phonetic usage, there are a small number of superscript letters located in the Spacing Modifier Letters block (U+02B0..U+02FF) and a large number of superscript and subscript letters in the Phonetic Extensions block (U+1D00..U+1D7F) and in the Phonetic Extensions Supplement block (U+1D80..U+1DBF). The superscripted letters do not contain the word “superscript” in their character names, but are simply called modifier letters. Finally, a small set of superscripted CJK ideographs, used for the Japanese system of syntactic markup of Classical Chinese text for reading, is located in the Kanbun block (U+3190..U+319F).
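The parsing distinction drawn above between simplistic and more sophisticated numeric parsers can be made concrete with a Python sketch. Both function names are hypothetical, and treating a trailing run of superscript digits as an exponent is an assumed interpretation chosen for the example, not a rule from the standard.

```python
import unicodedata


def parse_plain_decimal(text: str) -> int:
    """Simplistic parser: accept only characters with gc=Nd."""
    if not text or any(unicodedata.category(c) != "Nd" for c in text):
        raise ValueError(f"not a plain decimal number: {text!r}")
    return int(text)


def parse_with_exponent(text: str) -> int:
    """More elaborate parser: a run of gc=Nd digits optionally followed
    by superscript digits, which are read as an exponent (assumption)."""
    base, exponent = [], []
    for char in text:
        is_superscript = unicodedata.decomposition(char).startswith("<super>")
        if unicodedata.category(char) == "Nd" and not exponent:
            base.append(char)
        elif is_superscript and unicodedata.digit(char, None) is not None:
            exponent.append(str(unicodedata.digit(char)))
        else:
            raise ValueError(f"unexpected character {char!r} in {text!r}")
    value = int("".join(base))  # raises ValueError if the base is empty
    return value ** int("".join(exponent)) if exponent else value


if __name__ == "__main__":
    two_cubed = "2\u00B3"  # digit two followed by SUPERSCRIPT THREE
    print(unicodedata.category("\u00B3"))      # not Nd
    print(parse_with_exponent(two_cubed))
    try:
        parse_plain_decimal(two_cubed)
    except ValueError as err:
        print("rejected:", err)
```

The simple parser rejects the superscript rather than silently reading 23, while the second parser uses the digit value and the `<super>` decomposition tag to interpret it deliberately.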


15.4 Mathematical Symbols

The Unicode Standard provides a large set of standard mathematical characters to support publications of scientific, technical, and mathematical texts on and off the Web. In addition to the mathematical symbols and arrows contained in the blocks described in this section, mathematical operators are found in the Basic Latin (ASCII) and Latin-1 Supplement blocks. A few of the symbols from the Miscellaneous Technical, Miscellaneous Symbols, and Dingbats blocks, as well as characters from General Punctuation, are also used in mathematical notation. For Latin and Greek letters in special font styles that are used as mathematical variables, such as U+210B ℋ script capital h, as well as the Hebrew letter alef used as the first transfinite cardinal symbol encoded by U+2135 ℵ alef symbol, see “Letterlike Symbols” and “Mathematical Alphanumeric Symbols” in Section 15.2, Letterlike Symbols.

The repertoire of mathematical symbols in Unicode enables the display of virtually all standard mathematical symbols. Nevertheless, no collection of mathematical symbols can ever be considered complete; mathematicians and other scientists are continually inventing new mathematical symbols. More symbols will be added as they become widely accepted in the scientific communities.

Semantics. The same mathematical symbol may have different meanings in different subdisciplines or different contexts. The Unicode Standard encodes only a single character for a single symbolic form. For example, the “+” symbol normally denotes addition in a mathematical context, but it might refer to concatenation in a computer science context dealing with strings, indicate incrementation, or have any number of other functions in given contexts. It is up to the application to distinguish such meanings according to the appropriate context. Where information is available about the usage (or usages) of particular symbols, it has been indicated in the character annotations in Chapter 17, Code Charts.

Mathematical Property. The mathematical (math) property is an informative property of characters that are used as operators in mathematical formulas. The mathematical property may be useful in identifying characters commonly used in mathematical text and formulas. However, a number of these characters have multiple usages and may occur with nonmathematical semantics. For example, U+002D hyphen-minus may also be used as a hyphen, and not as a mathematical minus sign. Other characters, including some alphabetic, numeric, punctuation, spaces, arrows, and geometric shapes, are used in mathematical expressions as well, but are even more dependent on the context for their identification. A list of characters with the mathematical property is provided in the Unicode Character Database.

For a classification of mathematical characters by typographical behavior and mapping to ISO 9573-13 entity sets, see Unicode Technical Report #25, “Unicode Support for Mathematics.”


Mathematical Operators: U+2200–U+22FF The Mathematical Operators block includes character encodings for operators, relations, geometric symbols, and a few other symbols with special usages confined largely to mathematical contexts.

Standards. Many national standards’ mathematical operators are covered by the characters encoded in this block. These standards include such special collections as ANSI Y10.20, ISO 6862, ISO 8879, and portions of the collection of the American Mathematical Society, as well as the original repertoire of TeX.

Encoding Principles. Mathematical operators often have more than one meaning. Therefore the encoding of this block is intentionally rather shape-based, with numerous instances in which several semantic values can be attributed to the same Unicode code point. For example, U+2218 ∘ ring operator may be the equivalent of white small circle or composite function or apl jot. The Unicode Standard does not attempt to distinguish all possible semantic values that may be applied to mathematical operators or relation symbols.

The Unicode Standard does include many characters that appear to be quite similar to one another, but that may well convey different meanings in a given context. Conversely, mathematical operators, and especially relation symbols, may appear in various standards, handbooks, and fonts with a large number of purely graphical variants. Where variants were recognizable as such from the sources, they were not encoded separately. For relation symbols, the choice of a vertical or forward-slanting stroke typically seems to be an aesthetic one, but both slants might appear in a given context. However, a back-slanted stroke almost always has a distinct meaning compared to the forward-slanted stroke. See Section 16.4, Variation Selectors, for more information on some particular variants.

Unifications. Mathematical operators such as implies ⇒ and if and only if ↔ have been unified with the corresponding arrows (U+21D2 rightwards double arrow and U+2194 left right arrow, respectively) in the Arrows block.

The operator U+2208 element of is occasionally rendered with a taller shape than shown in the code charts. Mathematical handbooks and standards consulted treat these characters as variants of the same glyph. U+220A small element of is a distinctively small version of the element of that originates in mathematical pi fonts.

The operators U+226B much greater-than and U+226A much less-than are sometimes rendered in a nested shape. The nested shapes are encoded separately as U+2AA2 double nested greater-than and U+2AA1 double nested less-than.

A large class of unifications applies to variants of relation symbols involving negation. Variants involving vertical or slanted negation slashes and negation slashes of different lengths are not separately encoded. For example, U+2288 neither a subset of nor equal to is the archetype for several different glyph variants noted in various collections.

In two instances in this block, essentially stylistic variants are separately encoded: U+2265 greater-than or equal to is distinguished from U+2267 greater-than over equal to; the same distinction applies to U+2264 less-than or equal to and U+2266 less-than over equal to. Further instances of the encoding of such stylistic variants can be found in the supplemental blocks of mathematical operators. The primary reason for such duplication is for compatibility with existing standards.

Greek-Derived Symbols. Several mathematical operators derived from Greek characters have been given separate encodings because they are used differently from the corresponding letters. These operators may occasionally occur in context with Greek-letter variables. They include U+2206 ∆ increment, U+220F ∏ n-ary product, and U+2211 ∑ n-ary summation. The latter two are large operators that take limits.

Other duplicated Greek characters are those for U+00B5 µ micro sign in the Latin-1 Supplement block, U+2126 Ω ohm sign in Letterlike Symbols, and several characters among the APL functional symbols in the Miscellaneous Technical block. Most other Greek characters with special mathematical semantics are found in the Greek block because duplicates were not required for compatibility. Additional sets of mathematical-style Greek alphabets are found in the Mathematical Alphanumeric Symbols block.

N-ary Operators. N-ary operators are distinguished from binary operators by their larger size and by the fact that in mathematical layout, they take limit expressions.

Invisible Operators. In mathematics, some operators or punctuation are often implied but not displayed. For a set of invisible operators that can be used to mark these implied operators in the text, see Section 15.5, Invisible Mathematical Operators.

Minus Sign. U+2212 “−” minus sign is a mathematical operator, to be distinguished from the ASCII-derived U+002D “-” hyphen-minus, which may look the same as a minus sign or be shorter in length. (For a complete list of dashes in the Unicode Standard, see Table 6-3.) U+22EE..U+22F1 are a set of ellipses used in matrix notation. U+2052 “⁒” commercial minus sign is a specialized form of the minus sign.
Its use is described in Section 6.2, General Punctuation. Delimiters. Many mathematical delimiters are unified with punctuation characters. See Section 6.2, General Punctuation, for more information. Some of the set of ornamental Brackets in the range U+2768..U+2775 are also used as mathematical delimiters. See Section 15.8, Miscellaneous Symbols and Dingbats. See also Section 15.6, Technical Symbols, for specialized characters used for large vertical or horizontal delimiters. Bidirectional Layout. In a bidirectional context, with the exception of arrows, the glyphs for mathematical operators and delimiters are adjusted as described in Unicode Standard Annex #9, “The Bidirectional Algorithm.” See Section 4.7, Bidi Mirrored—Normative, and “Semantics of Paired Punctuation” in Section 6.2, General Punctuation. Other Elements of Mathematical Notation. In addition to the symbols in these blocks, mathematical and scientific notation makes frequent use of arrows, punctuation characters, letterlike symbols, geometrical shapes, and miscellaneous and technical symbols. For an extensive discussion of mathematical alphanumeric symbols, see Section 15.2, Letterlike Symbols. For additional information on all the mathematical operators and other symbols, see Unicode Technical Report #25, “Unicode Support for Mathematics.”


Supplements to Mathematical Symbols and Arrows The Unicode Standard defines a number of additional blocks to supplement the repertoire of mathematical operators and arrows. These additions are intended to extend the Unicode repertoire sufficiently to cover the needs of such applications as MathML, modern mathematical formula editing and presentation software, and symbolic algebra systems. Standards. MathML, an XML application, is intended to support the full legacy collection of the ISO mathematical entity sets. Accordingly, the repertoire of mathematical symbols for the Unicode Standard has been supplemented by the full list of mathematical entity sets in ISO TR 9573-13, Public entity sets for mathematics and science. An additional repertoire was provided from the amalgamated collection of the STIX Project (Scientific and Technical Information Exchange). That collection includes, but is not limited to, symbols gleaned from mathematical publications by experts of the American Mathematical Society and symbol sets provided by Elsevier Publishing and by the American Physical Society.

Supplemental Mathematical Operators: U+2A00–U+2AFF The Supplemental Mathematical Operators block contains many additional symbols to supplement the collection of mathematical operators. In addition, the Miscellaneous Symbols and Arrows block (U+2B00..U+2BFF) has been set aside to encode additional mathematical symbols, arrows, and geometric shapes.

Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF The Miscellaneous Mathematical Symbols-A block contains symbols that are used mostly as operators or delimiters in mathematical notation. Mathematical Brackets. The mathematical white square brackets, angle brackets, and double angle brackets encoded at U+27E6..U+27EB are intended for ordinary mathematical use of these particular bracket types. They are unambiguously narrow, for use in mathematical and scientific notation, and should be distinguished from the corresponding wide forms of white square brackets, angle brackets, and double angle brackets used in CJK typography. (See the discussion of the CJK Symbols and Punctuation block in Section 6.2, General Punctuation.) Note especially that the “bra” and “ket” angle brackets (U+2329 left-pointing angle bracket and U+232A right-pointing angle bracket, respectively) are now deprecated for use with mathematics because of their canonical equivalence to CJK angle brackets, which is likely to result in unintended spacing problems if used in mathematical formulae.
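The canonical equivalence mentioned above can be observed with a normalization call. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

# Deprecated "bra" and "ket" angle brackets
DEPRECATED_BRA, DEPRECATED_KET = "\u2329", "\u232A"
# Mathematical angle brackets from Miscellaneous Mathematical Symbols-A
MATH_LEFT_ANGLE, MATH_RIGHT_ANGLE = "\u27E8", "\u27E9"

def nfc(text):
    """Normalize text to Normalization Form C."""
    return unicodedata.normalize("NFC", text)

bra_ket = DEPRECATED_BRA + "x" + DEPRECATED_KET
math_form = MATH_LEFT_ANGLE + "x" + MATH_RIGHT_ANGLE

# Normalization replaces U+2329/U+232A with the CJK angle brackets
# U+3008/U+3009, while the mathematical brackets are left unchanged.
normalized_bra_ket = nfc(bra_ket)
normalized_math = nfc(math_form)
```

Because any normalizing process may silently turn the deprecated brackets into their wide CJK counterparts, text that uses U+27E8 and U+27E9 is not exposed to this substitution.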

Miscellaneous Mathematical Symbols-B: U+2980–U+29FF The Miscellaneous Mathematical Symbols-B block contains miscellaneous symbols used for mathematical notation, including fences and other delimiters. Some of the symbols in this block may also be used as operators in some contexts.


Wiggly Fence. U+29D8 left wiggly fence has a superficial similarity to U+FE34 presentation form for vertical wavy low line. The latter is a wiggly sidebar character, intended for legacy support as a style of underlining character in a vertical text layout context; it has a compatibility mapping to U+005F low line. This represents a very different usage from the standard use of fence characters in mathematical notation.

Arrows: U+2190–U+21FF Arrows are used for a variety of purposes: to imply directional relation, to show logical derivation or implication, and to represent the cursor control keys.

Accordingly, the Unicode Standard includes a fairly extensive set of generic arrow shapes, especially those for which there are established usages with well-defined semantics. It does not attempt to encode every possible stylistic variant of arrows separately, especially where their use is mainly decorative. For most arrow variants, the Unicode Standard provides encodings in the two horizontal directions, often in the four cardinal directions. For the single and double arrows, the Unicode Standard provides encodings in eight directions.

Bidirectional Layout. In bidirectional layout, arrows are not automatically mirrored, because the direction of the arrow could be relative to the text direction or relative to an absolute direction. Therefore, if text is copied from a left-to-right to a right-to-left context, or vice versa, the character code for the desired arrow direction in the new context must be used. For example, it might be necessary to change U+21D2 rightwards double arrow to U+21D0 leftwards double arrow to maintain the semantics of “implies” in a right-to-left context. For more information on bidirectional layout, see Unicode Standard Annex #9, “The Bidirectional Algorithm.”

Standards. The Unicode Standard encodes arrows from many different international and national standards as well as corporate collections.

Unifications. Arrows expressing mathematical relations have been encoded in the Arrows block as well as in the supplemental arrows blocks. An example is U+21D2 ⇒ rightwards double arrow, which may be used to denote implies. Where available, such usage information is indicated in the annotations to individual characters in Chapter 17, Code Charts. However, because the arrows have such a wide variety of applications, there may be several semantic values for the same Unicode character value.
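The difference between arrows and mirrored operators in bidirectional layout can be illustrated as follows. This is a sketch only: `OPPOSITE_ARROW` and `flip_arrows` are hypothetical names, the table covers just four arrows, and whether an arrow should be swapped at all depends on whether its direction is meant relative to the text or absolutely.

```python
import unicodedata

# Hypothetical, deliberately tiny swap table for horizontal arrows.
OPPOSITE_ARROW = {
    "\u21D2": "\u21D0",  # rightwards double arrow -> leftwards double arrow
    "\u21D0": "\u21D2",
    "\u2192": "\u2190",  # rightwards arrow -> leftwards arrow
    "\u2190": "\u2192",
}

def flip_arrows(text):
    """Replace each listed arrow with its opposite; leave other characters alone."""
    return "".join(OPPOSITE_ARROW.get(ch, ch) for ch in text)

# Arrows do not carry the mirrored property, whereas an operator such as
# U+2208 element of does, so only the latter is adjusted by the renderer.
arrow_is_mirrored = bool(unicodedata.mirrored("\u21D2"))
element_of_is_mirrored = bool(unicodedata.mirrored("\u2208"))

flipped = flip_arrows("p \u21D2 q")
```

The explicit swap is a decision for the author or the application copying the text; it is not something a rendering process performs automatically.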

Supplemental Arrows The Supplemental Arrows-A (U+27F0..U+27FF), Supplemental Arrows-B (U+2900..U+297F), and Miscellaneous Symbols and Arrows (U+2B00..U+2BFF) blocks contain a large repertoire of arrows to supplement the main set in the Arrows block.

Long Arrows. The long arrows encoded in the range U+27F5..U+27FF map to standard SGML entity sets supported by MathML. Long arrows represent distinct semantics from their short counterparts, rather than mere stylistic glyph differences. For example, the shorter forms of arrows are often used in connection with limits, whereas the longer ones are associated with mappings. The use of the long arrows is so common that they were assigned entity names in the ISOAMSA entity set, one of the suite of mathematical symbol entity sets covered by the Unicode Standard.

Standardized Variants of Mathematical Symbols These mathematical variants are all produced with the addition of U+FE00 variation selector-1 (VS1) to mathematical operator base characters. The valid combinations are listed in the file StandardizedVariants.txt in the Unicode Character Database. All combinations not listed there are unspecified and are reserved for future standardization; no conformant process may interpret them as standardized variants. Change in Representative Glyphs for U+2278 and U+2279. In Version 3.2 of the Unicode Standard, the representative glyphs for U+2278 neither less-than nor greater-than and U+2279 neither greater-than nor less-than were changed from using a vertical cancellation to using a slanted cancellation. This change was made to match the longstanding canonical decompositions for these characters, which use U+0338 combining long solidus overlay. The symmetric forms using the vertical stroke continue to be acceptable glyph variants. Using U+2278 or U+2279 with VS1 will request these variants explicitly, as will using U+2276 less-than or greater-than or U+2277 greater-than or less-than with U+20D2 combining long vertical line overlay. Unless fonts are created with the intention to add support for both forms (via VS1 for the upright forms), there is no need to revise the glyphs in existing fonts; the glyphic range implied by using the base character code alone encompasses both shapes. For more information, see Section 16.4, Variation Selectors.
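The relationships described above for U+2278 can be checked with Python's standard `unicodedata` module. The sketch below only builds and inspects character sequences; whether a font actually displays the upright form for the variation sequence is outside what this code can show.

```python
import unicodedata

VS1 = "\uFE00"                        # variation selector-1
NEITHER_LESS_NOR_GREATER = "\u2278"
LESS_OR_GREATER = "\u2276"
LONG_SOLIDUS_OVERLAY = "\u0338"
LONG_VERTICAL_LINE_OVERLAY = "\u20D2"

# Canonical decomposition of U+2278, as recorded in the character database.
decomposition = unicodedata.decomposition(NEITHER_LESS_NOR_GREATER)
decomposed = unicodedata.normalize("NFD", NEITHER_LESS_NOR_GREATER)

# Two ways, per the text above, to request the upright (vertical stroke) form.
upright_via_vs1 = NEITHER_LESS_NOR_GREATER + VS1
upright_via_overlay = LESS_OR_GREATER + LONG_VERTICAL_LINE_OVERLAY
```

The two requests are distinct character sequences; normalization does not map one to the other.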

15.5 Invisible Mathematical Operators In mathematics, some operators and punctuation are often implied but not displayed. The General Punctuation block contains several special format control characters known as invisible operators, which can be used to make such operators explicit for use in machine interpretation of mathematical expressions. Use of invisible operators is optional and is intended for interchange with math-aware programs. A more complete discussion of mathematical notation can be found in Unicode Technical Report #25, “Unicode Support for Mathematics.”

Invisible Separator. U+2063 invisible separator (also known as invisible comma) is intended for use in index expressions and other mathematical notation where two adjacent variables form a list and are not implicitly multiplied. In mathematical notation, commas are not always explicitly present, but they need to be indicated for symbolic calculation software to help it disambiguate a sequence from a multiplication. For example, the double subscript ij in the variable a_ij means a_i,j; that is, the i and j are separate indices and not a single variable with the name ij or even the product of i and j. To represent the implied list separation in the subscript ij, one can insert a nondisplaying invisible separator between the

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 16

Special Areas and Format Characters

This chapter describes several kinds of characters that have special properties as well as areas of the codespace that are set aside for special purposes:

• Control codes
• Surrogates area
• Noncharacters
• Layout controls
• Variation selectors
• Specials
• Deprecated format characters
• Private-use characters
• Tag characters

In addition to regular characters, the Unicode Standard contains a number of format characters. These characters are not normally rendered directly, but rather influence the layout of text or otherwise affect the operation of text processes. The Unicode Standard contains code positions for the 64 control characters and the DEL character found in ISO standards and many vendor character sets. The choice of control function associated with a given character code is outside the scope of the Unicode Standard, with the exception of those control characters specified in this chapter. Layout controls are not themselves rendered visibly, but influence the behavior of algorithms for line breaking, word breaking, glyph selection, and bidirectional ordering. Surrogate code points are reserved and are to be used in pairs—called surrogate pairs—to access 1,048,544 supplementary characters. Variation selectors allow the specification of standardized variants of characters. This ability is particularly useful where the majority of implementations would treat the two variants as two forms of the same character, but where some implementations need to differentiate between the two. By using a variation selector, such differentiation can be made explicit. Private-use characters are reserved for private use. Their meaning is defined by private agreement. Noncharacters are code points that are permanently reserved and will never have characters assigned to them.
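The surrogate pair mechanism and the figure of 1,048,544 can be illustrated with a little arithmetic. The sketch below assumes the usual surrogate ranges beginning at D800 and DC00 (hexadecimal), and it reads the figure as the number of pair combinations minus the two noncharacters at the end of each of the 16 supplementary planes; that reading is an interpretation, not a statement taken from the text above.

```python
HIGH_SURROGATE_START = 0xD800
LOW_SURROGATE_START = 0xDC00

def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = HIGH_SURROGATE_START + (offset >> 10)   # top 10 bits
    low = LOW_SURROGATE_START + (offset & 0x3FF)   # bottom 10 bits
    return high, low

# 1,024 high surrogates x 1,024 low surrogates
pair_combinations = 1024 * 1024
# Assumed: two noncharacters at the end of each of 16 supplementary planes
supplementary_noncharacters = 16 * 2
supplementary_characters = pair_combinations - supplementary_noncharacters

pair = to_surrogate_pair(0x1D400)  # mathematical bold capital a
```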


The Specials block contains characters that are neither graphic characters nor traditional controls. Tag characters support a general scheme for the internal tagging of text streams in the absence of other mechanisms, such as markup languages. They are reserved for use with specific plain text-based protocols that specify their usage. Their use in ordinary text is strongly discouraged.

16.1 Control Codes There are 65 code points set aside in the Unicode Standard for compatibility with the C0 and C1 control codes defined in the ISO/IEC 2022 framework. The ranges of these code points are U+0000..U+001F, U+007F, and U+0080..U+009F, which correspond to the 8-bit controls 00₁₆ to 1F₁₆ (C0 controls), 7F₁₆ (delete), and 80₁₆ to 9F₁₆ (C1 controls), respectively. For example, the 8-bit legacy control code character tabulation (or tab) is the byte value 09₁₆; the Unicode Standard encodes the corresponding control code at U+0009.

The Unicode Standard provides for the intact interchange of these code points, neither adding to nor subtracting from their semantics. The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992.

In general, the use of control codes constitutes a higher-level protocol and is beyond the scope of the Unicode Standard. For example, the use of ISO/IEC 6429 control sequences for controlling bidirectional formatting would be a legitimate higher-level protocol layered on top of the plain text of the Unicode Standard. Higher-level protocols are not specified by the Unicode Standard; their existence cannot be assumed without a separate agreement between the parties interchanging such data.
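The count of 65 control code points follows directly from the three ranges given above. As a quick check, the sketch below also compares those ranges with the set of code points whose General_Category is Cc, using Python's standard `unicodedata` module.

```python
import unicodedata

C0_CONTROLS = range(0x00, 0x20)   # U+0000..U+001F
DELETE = 0x7F                     # U+007F
C1_CONTROLS = range(0x80, 0xA0)   # U+0080..U+009F

control_code_points = [*C0_CONTROLS, DELETE, *C1_CONTROLS]

# Every code point in the codespace whose General_Category is Cc (Control).
all_cc = [
    cp for cp in range(0x110000)
    if unicodedata.category(chr(cp)) == "Cc"
]

print(len(control_code_points))  # 32 + 1 + 32
```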

Representing Control Sequences There is a simple, one-to-one mapping between 7-bit (and 8-bit) control codes and the Unicode control codes: every 7-bit (or 8-bit) control code is numerically equal to its corresponding Unicode code point. For example, if the ASCII line feed control code (0A₁₆) is to be used for line break control, then the text “WX” followed by a line break and “YZ” would be transmitted in Unicode plain text as the following coded character sequence: <0057, 0058, 000A, 0059, 005A>.

Control sequences that are part of Unicode text must be represented in terms of the Unicode encoding forms. For example, suppose that an application allows embedded font information to be transmitted by means of markup using plain text and control codes. A font tag specified as “^ATimes^B”, where ^A refers to the C0 control code 01₁₆ and ^B refers to the C0 control code 02₁₆, would then be expressed by the following coded character sequence: <0001, 0054, 0069, 006D, 0065, 0073, 0002>. The representation of the control codes in the three Unicode encoding forms simply follows the rules for any other code points in the standard:

UTF-8: <01 54 69 6D 65 73 02>
UTF-16: <0001 0054 0069 006D 0065 0073 0002>
UTF-32: <00000001 00000054 00000069 0000006D 00000065 00000073 00000002>

Escape Sequences. Escape sequences are a particular type of protocol that consists of the use of some set of ASCII characters introduced by the escape control code, 1B₁₆, to convey extra-textual information. When converting escape sequences into and out of Unicode text, they should be converted on a character-by-character basis. For instance, “ESC-A” <1B 41> would be converted into the Unicode coded character sequence <001B, 0041>. Interpretation of U+0041 as part of the escape sequence, rather than as latin capital letter a, is the responsibility of the higher-level protocol that makes use of such escape sequences. This approach allows for low-level conversion processes to conformantly convert escape sequences into and out of the Unicode Standard without needing to actually recognize the escape sequences as such.

If a process uses escape sequences or other configurations of control code sequences to embed additional information about text (such as formatting attributes or structure), then such sequences constitute a higher-level protocol that is outside the scope of the Unicode Standard.
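The code unit sequences for the font tag example can be reproduced with Python's built-in codecs. This is a minimal sketch; `hex_units` is a hypothetical helper, and the big-endian codecs are used only so that the bytes group naturally into code unit values.

```python
font_tag = "\u0001Times\u0002"   # ^A T i m e s ^B

def hex_units(data, unit_size):
    """Group encoded bytes into code units of unit_size bytes, shown in hex."""
    return [
        data[i:i + unit_size].hex().upper()
        for i in range(0, len(data), unit_size)
    ]

utf8_units = hex_units(font_tag.encode("utf-8"), 1)
utf16_units = hex_units(font_tag.encode("utf-16-be"), 2)
utf32_units = hex_units(font_tag.encode("utf-32-be"), 4)

# Character-by-character conversion of the two-byte escape sequence <1B 41>.
escape_code_points = [ord(ch) for ch in bytes([0x1B, 0x41]).decode("ascii")]
```

Nothing in the conversion needs to recognize the control codes or the escape sequence as special, which is the point made above about low-level conversion processes.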

Specification of Control Code Semantics Several control codes are commonly used in plain text, particularly those involved in line and paragraph formatting. The use of these control codes is widespread and important to interoperability. Therefore, the Unicode Standard specifies semantics for their use with the rest of the encoded characters in the standard. Table 16-1 lists those control codes.

Table 16-1. Control Codes Specified in the Unicode Standard

Code Point   Abbreviation   ISO/IEC 6429 Name
U+0009       HT             character tabulation (tab)
U+000A       LF             line feed
U+000B       VT             line tabulation (vertical tab)
U+000C       FF             form feed
U+000D       CR             carriage return
U+001C       FS             information separator four
U+001D       GS             information separator three
U+001E       RS             information separator two
U+001F       US             information separator one
U+0085       NEL            next line


Most of the control codes in Table 16-1 have the White_Space property. They have the directional property values of S, B, or WS, rather than the default of BN used for other control codes. (See Unicode Standard Annex #9, “The Bidirectional Algorithm.”) In addition, the separator semantics of the control codes U+001C..U+001F are recognized in the Bidirectional Algorithm. U+0009..U+000D and U+0085 also have line breaking property values that differ from the default CM value for other control codes. (See Unicode Standard Annex #14, “Line Breaking Properties.”)

U+0000 null may be used as a Unicode string terminator, as in the C language. Such usage is outside the scope of the Unicode Standard, which does not require any particular formal language representation of a string or any particular usage of null.

Newline Function. In particular, one or more of the control codes U+000A line feed, U+000D carriage return, and the Unicode equivalent of the EBCDIC next line can encode a newline function. A newline function can act like a line separator or a paragraph separator, depending on the application. See Section 16.2, Layout Controls, for information on how to interpret a line or paragraph separator. The exact encoding of a newline function depends on the application domain. For information on how to identify a newline function, see Section 5.8, Newline Guidelines.
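The directional property values of the control codes in Table 16-1 can be listed with Python's standard `unicodedata` module. The sketch below reports whatever version of the Unicode Character Database ships with the interpreter, which is an assumption worth keeping in mind when comparing against a particular version of the standard.

```python
import unicodedata

# Code points and abbreviations from Table 16-1.
TABLE_16_1 = {
    0x0009: "HT", 0x000A: "LF", 0x000B: "VT", 0x000C: "FF", 0x000D: "CR",
    0x001C: "FS", 0x001D: "GS", 0x001E: "RS", 0x001F: "US", 0x0085: "NEL",
}

# Map each abbreviation to its bidirectional class (S, B, or WS).
bidi_classes = {
    abbreviation: unicodedata.bidirectional(chr(code_point))
    for code_point, abbreviation in TABLE_16_1.items()
}

for abbreviation, bidi_class in bidi_classes.items():
    print(abbreviation, bidi_class)
```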

16.2 Layout Controls The effect of layout controls is specific to particular text processes. As much as possible, layout controls are transparent to those text processes for which they were not intended. In other words, their effects are mutually orthogonal.

Line and Word Breaking The following gives a brief summary of the intended behavior of certain layout controls. For a full description of line and word breaking layout controls, see Unicode Standard Annex #14, “Line Breaking Properties.” No-Break Space. U+00A0 no-break space has the same width as U+0020 space, but the no-break space indicates that, under normal circumstances, no line breaks are permitted between it and surrounding characters, unless the preceding or following character is a line or paragraph separator or space or zero width space. For a complete list of space characters in the Unicode Standard, see Table 6-2. Word Joiner. U+2060 word joiner behaves like U+00A0 no-break space in that it indicates the absence of word boundaries; however, the word joiner has no width. The function of the character is to indicate that line breaks are not allowed between the adjoining characters, except next to hard line breaks. For example, the word joiner can be inserted after the fourth character in the text “base+delta” to indicate that there should be no line break between the “e” and the “+”. The word joiner can be used to prevent line breaking with other characters that do not have nonbreaking variants, such as U+2009 thin space or U+2015 horizontal bar, by bracketing the character.
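The two uses of the word joiner described above, insertion at a given position and bracketing of another character, can be sketched as simple string operations. The helper names and the sample text around the thin space are hypothetical; the effect on line breaking depends on the process that later lays out the text.

```python
WORD_JOINER = "\u2060"
THIN_SPACE = "\u2009"

def insert_word_joiner(text, index):
    """Insert a word joiner before the character at position index."""
    return text[:index] + WORD_JOINER + text[index:]

def bracket_with_word_joiner(ch):
    """Surround a character with word joiners to inhibit breaks on both sides."""
    return WORD_JOINER + ch + WORD_JOINER

# After the fourth character of "base+delta": no break between "e" and "+".
joined = insert_word_joiner("base+delta", 4)

# A thin space that should not permit a line break on either side.
nonbreaking_thin_space = "10" + bracket_with_word_joiner(THIN_SPACE) + "kg"
```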


The word joiner must not be confused with the zero width joiner or the combining grapheme joiner, which have very different functions. In particular, inserting a word joiner between two characters has no effect on their ligating and cursive joining behavior. The word joiner should be ignored in contexts other than word or line breaking. Zero Width No-Break Space. In addition to its primary meaning of byte order mark (see “Byte Order Mark” in Section 16.8, Specials), the code point U+FEFF possesses the semantics of zero width no-break space, which matches that of word joiner. Until Unicode 3.2, U+FEFF was the only code point with word joining semantics, but because it is more commonly used as byte order mark, the use of U+2060 word joiner to indicate word joining is strongly preferred for any new text. Implementations should continue to support the word joining semantics of U+FEFF for backward compatibility. Zero Width Space. The U+200B zero width space indicates a word boundary, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word breaks, such as Thai, Khmer, and Japanese. When text is justified, ZWSP has no effect on letter spacing—for example, in English or Japanese usage. There may be circumstances with other scripts, such as Thai, where extra space is applied around ZWSP as a result of justification, as shown in Table 16-2. This approach is unlike the use of fixed-width space characters, such as U+2002 en space, that have specified width and should not be automatically expanded during justification (see Section 6.2, General Punctuation).

Table 16-2. Letter Spacing

Type        Justification Examples    Explanation
Memory      the ISP®[ZWSP]Charts      The ZWSP is inserted to allow line break after ®
Display 1   the ISP®Charts            Without letter spacing
Display 2   the ISP®Char t s          Increased letter spacing
Display 3   the ISP®Charts            “Thai-style” letter spacing
Display 4   the I S P ®Cha r t s      incorrectly inhibiting letter spacing (after ®)

In some languages such as German and Russian, increased letter spacing is used to indicate emphasis. Implementers should be aware of this issue.

Zero-Width Spaces and Joiner Characters. The zero-width spaces are not to be confused with the zero-width joiner characters. U+200C zero width non-joiner and U+200D zero width joiner have no effect on word boundaries, and zero width no-break space and zero width space have no effect on joining or linking behavior. In other words, the zero-width joiner characters should be ignored when determining word boundaries; zero width space should be ignored when determining cursive joining behavior. See “Cursive Connection” later in this section.

Hyphenation. U+00AD soft hyphen (SHY) indicates an intraword break point, where a line break is preferred if a word must be hyphenated or otherwise broken across lines. Such break points are generally determined by an automatic hyphenator. SHY can be used with any script, but its use is generally limited to situations where users need to override the behavior of such a hyphenator. The visible rendering of a line break at an intraword break point, whether automatically determined or indicated by a SHY, depends on the surrounding characters, the rules governing the script and language used, and, at times, the meaning of the word. The precise rules are outside the scope of this standard, but see Unicode Standard Annex #14, “Line Breaking Properties,” for additional information. A common default rendering is to insert a hyphen before the line break, but this is insufficient or even incorrect in many situations.

Contrast this usage with U+2027 hyphenation point, which is used for a visible indication of the place of hyphenation in dictionaries. For a complete list of dash characters in the Unicode Standard, including all the hyphens, see Table 6-3.

The Unicode Standard includes two nonbreaking hyphen characters: U+2011 non-breaking hyphen and U+0F0C tibetan mark delimiter tsheg bstar. See Section 10.2, Tibetan, for more discussion of the Tibetan-specific line breaking behavior.

Line and Paragraph Separator. The Unicode Standard provides two unambiguous characters, U+2028 line separator and U+2029 paragraph separator, to separate lines and paragraphs. They are considered the default form of denoting line and paragraph boundaries in Unicode plain text. A new line is begun after each line separator. A new paragraph is begun after each paragraph separator.
As these characters are separator codes, it is not necessary either to start the first line or paragraph or to end the last line or paragraph with them. Doing so would indicate that there was an empty paragraph or line following. The paragraph separator can be inserted between paragraphs of text. Its use allows the creation of plain text files, which can be laid out on a different line width at the receiving end. The line separator can be used to indicate an unconditional end of line. A paragraph separator indicates where a new paragraph should start. Any interparagraph formatting would be applied. This formatting could cause, for example, the line to be broken, any interparagraph line spacing to be applied, and the first line to be indented. A line separator indicates that a line break should occur at this point; although the text continues on the next line, it does not start a new paragraph—no interparagraph line spacing or paragraphic indentation is applied. For more information on line separators, see Section 5.8, Newline Guidelines.
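The separator (rather than terminator) semantics described above can be illustrated with a minimal Python sketch. The function name split_plain_text and the sample string are assumptions made for this illustration only; they are not part of the standard.

```python
LINE_SEPARATOR = "\u2028"       # U+2028 line separator
PARAGRAPH_SEPARATOR = "\u2029"  # U+2029 paragraph separator


def split_plain_text(text):
    """Return a list of paragraphs, each given as a list of lines.

    Both characters are treated as separators, not terminators, so a
    trailing separator yields an empty final line or paragraph.
    """
    return [
        paragraph.split(LINE_SEPARATOR)
        for paragraph in text.split(PARAGRAPH_SEPARATOR)
    ]


sample = "first line\u2028second line\u2029next paragraph"
structure = split_plain_text(sample)
trailing = split_plain_text("only paragraph\u2029")
```

In this sketch, a string that ends with a paragraph separator produces an extra, empty paragraph, which matches the statement that ending the last paragraph with a separator indicates an empty paragraph following.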

Cursive Connection and Ligatures

In some fonts for some scripts, consecutive characters in a text stream may be rendered via adjacent glyphs that cursively join to each other, so as to emulate connected handwriting.


For example, cursive joining is implemented in nearly all fonts for the Arabic scripts and in a few handwriting-like fonts for the Latin script.

Cursive rendering is implemented by joining glyphs in the font and by using a process that selects the particular joining glyph to represent each individual character occurrence, based on the joining nature of its neighboring characters. This glyph selection is implemented in the rendering engine, typically using information in the font.

In many cases there is an even closer binding, where a sequence of characters is represented by a single glyph, called a ligature. Ligatures can occur in both cursive and noncursive fonts. Where ligatures are available, it is the task of the rendering system to select a ligature to create the most appropriate line layout. However, the rendering system cannot define the locations where ligatures are possible because there are many languages in which ligature formation requires more information. For example, in some languages, ligatures are never formed across syllable boundaries.

On occasion, an author may wish to override the normal automatic selection of connecting glyphs or ligatures. Typically, this choice is made to achieve one of the following effects:

• Cause nondefault joining appearance (for example, as is sometimes required in writing Persian using the Arabic script)
• Exhibit the joining-variant glyphs themselves in isolation
• Request a ligature to be formed where it normally would not be
• Request a ligature not to be formed where it normally would be

The Unicode Standard provides two characters that influence joining and ligature glyph selection: U+200C zero width non-joiner and U+200D zero width joiner. The zero width joiner and non-joiner request a rendering system to have more or less of a connection between characters than they would otherwise have. Such a connection may be a simple cursive link, or it may include control of ligatures.

The zero width joiner and non-joiner characters are designed for use in plain text; they should not be used where higher-level ligation and cursive control is available. (See Unicode Technical Report #20, “Unicode in XML and Other Markup Languages,” for more information.) Moreover, they are essentially requests for the rendering system to take into account when laying out the text; while a rendering system should consider them, it is perfectly acceptable for the system to disregard these requests.

The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures or cursive connections are required or prohibited. These characters are not to be used in all cases where ligatures or cursive connections are desired; instead, they are meant only for overriding the normal behavior of the text.

Joiner. U+200D zero width joiner is intended to produce a more connected rendering of adjacent characters than would otherwise be the case, if possible. In particular:

• If the two characters could form a ligature but do not normally, ZWJ requests that the ligature be used.


• Otherwise, if either of the characters could cursively connect but do not normally, ZWJ requests that each of the characters take a cursive-connection form where possible.

In a sequence like <X, ZWJ, Y>, where a cursive form exists for X but not for Y, the presence of ZWJ requests a cursive form for X. Otherwise, where neither a ligature nor a cursive connection is available, the ZWJ has no effect. In other words, given the three broad categories below, ZWJ requests that glyphs in the highest available category (for the given font) be used:

1. Ligated
2. Cursively connected
3. Unconnected

Non-joiner. U+200C zero width non-joiner is intended to break both cursive connections and ligatures in rendering. ZWNJ requests that glyphs in the lowest available category (for the given font) be used.

For those unusual circumstances where someone wants to forbid ligatures in a sequence XY but promote cursive connection, the sequence <X, ZWJ, ZWNJ, ZWJ, Y> can be used. The ZWNJ breaks ligatures, while the two adjacent joiners cause the X and Y to take adjacent cursive forms (where they exist). Similarly, if someone wanted to have X take a cursive form but Y be isolated, then the sequence <X, ZWJ, ZWNJ, Y> could be used (as in previous versions of the Unicode Standard). Examples are shown in Figure 16-3.

Cursive Connection. For cursive connection, the joiner and non-joiner characters typically do not modify the contextual selection process itself, but instead change the context of a particular character occurrence. By providing a non-joining adjacent character where the adjacent character otherwise would be joining, or vice versa, they indicate that the rendering process should select a different joining glyph. This process can be used in two ways: to prevent a cursive joining or to exhibit joining glyphs in isolation.

In Figure 16-1, the insertion of the ZWNJ overrides the normal cursive joining of sad and lam.

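Figure 16-1. Prevention of Joining

[Figure: the sequence <U+0635, U+0644> is displayed with sad and lam joined; the sequence <U+0635, U+200C, U+0644> is displayed with the two letters unjoined. Glyph images are not reproduced here.]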


In Figure 16-2, the normal display of ghain without ZWJ before or after it uses the nominal (isolated) glyph form. When preceded and followed by ZWJ characters, however, the ghain is rendered with its medial form glyph in isolation.

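Figure 16-2. Exhibition of Joining Glyphs in Isolation

[Figure: U+063A alone is displayed in its nominal (isolated) form; the sequence <U+200D, U+063A, U+200D> is displayed as the medial form of ghain in isolation. Glyph images are not reproduced here.]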

The examples in Figure 16-1 and Figure 16-2 are adapted from the Iranian national coded character set standard, ISIRI 3342, which defines ZWNJ and ZWJ as “pseudo space” and “pseudo connection,” respectively.

Examples. Figure 16-3 provides samples of desired renderings when the joiner or non-joiner is inserted between two characters. The examples presume that all of the glyphs are available in the font. If, for example, the ligatures are not available, the display would fall back to the unligated forms. Each of the entries in the first column of Figure 16-3 shows two characters in visual display order. The column headings show characters to be inserted between those two characters. The cells below show the respective display when the joiners in the heading row are inserted between the original two characters.

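Figure 16-3. Effect of Intervening Joiners

[Figure: a table whose first column, “Character Sequences,” lists the pairs <U+0066, U+0069>, <U+0627, U+0644>, <U+062C, U+0645>, and <U+062C, U+0648>, followed by an “As Is” column and one column for each inserted joiner sequence. Glyph images are not reproduced here.]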

For backward compatibility, between Arabic characters a ZWJ acts just like the sequence <ZWJ, ZWNJ, ZWJ>, preventing a ligature from forming instead of requesting the use of a ligature that would not normally be used. As a result, there is no plain text mechanism for requesting the use of a ligature in Arabic text.
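The joiner sequences discussed in this section are ordinary character strings, so they can be assembled directly. The following Python sketch is only an illustration: the function names are hypothetical, and the code builds the requests without saying anything about how a particular font or rendering system will honor them.

```python
ZWNJ = "\u200C"  # U+200C zero width non-joiner
ZWJ = "\u200D"   # U+200D zero width joiner


def request_connection(x, y):
    """<X, ZWJ, Y>: request the most connected rendering available."""
    return x + ZWJ + y


def break_connection(x, y):
    """<X, ZWNJ, Y>: request that ligatures and cursive joining be broken."""
    return x + ZWNJ + y


def cursive_without_ligature(x, y):
    """<X, ZWJ, ZWNJ, ZWJ, Y>: forbid a ligature but promote cursive forms."""
    return x + ZWJ + ZWNJ + ZWJ + y


def cursive_x_isolated_y(x, y):
    """<X, ZWJ, ZWNJ, Y>: cursive form for X, isolated form for Y."""
    return x + ZWJ + ZWNJ + y


SAD, LAM = "\u0635", "\u0644"
unjoined_sad_lam = break_connection(SAD, LAM)  # the sequence of Figure 16-1
code_points = [f"{ord(c):04X}" for c in cursive_without_ligature("f", "i")]
```

Because these characters are requests only, a program that inserts them should still expect the unmodified default rendering on systems that disregard them.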


Transparency. The property value of Joining_Type=Transparent applies to characters that should not interfere with cursive connection, even when they occur in sequence between two characters that are connected cursively. These include all nonspacing marks and most format control characters, except for ZWJ and ZWNJ themselves. Note, in particular, that enclosing combining marks are also transparent as regards cursive connection. For example, using U+20DD combining enclosing circle to circle an Arabic letter in a sequence should not cause that Arabic letter to change its cursive connections to neighboring letters. See Section 8.2, Arabic, for more on joining classes and the details regarding Arabic cursive joining.

Joiner and Non-joiner in Indic Scripts. In Indic text, the ZWJ and ZWNJ are used to request particular display forms. A ZWJ after a sequence of consonant plus virama requests what is called a “half-form” of that consonant. A ZWNJ after a sequence of consonant plus virama requests that conjunct formation be interrupted, usually resulting in an explicit virama on that consonant. There are a few more specialized uses as well. For more information, see the discussions in Chapter 9, South Asian Scripts-I.

Implementation Notes. For modern font technologies, such as OpenType or AAT, font vendors should add ZWJ to their ligature mapping tables as appropriate. Thus, where a font had a mapping from “f” + “i” to ﬁ, the font designer should add the mapping from “f” + ZWJ + “i” to ﬁ. In contrast, ZWNJ will normally have the desired effect naturally for most fonts without any change, as it simply obstructs the normal ligature/cursive connection behavior. As with all other alternate format characters, fonts should use an invisible zero-width glyph for representation of both ZWJ and ZWNJ.

Filtering Joiner and Non-joiner. zero width joiner and zero width non-joiner are format control characters. As such, and in common with other format control characters, they are ordinarily ignored by processes that analyze text content. For example, a spellchecker or a search operation should filter them out when checking for matches. There are exceptions, however. In particular scripts, most notably the Indic scripts, ZWJ and ZWNJ have specialized usages that may be of orthographic significance. In those contexts, blind filtering of all instances of ZWJ or ZWNJ may result in ignoring distinctions relevant to the user’s notion of text content. Implementers should be aware of these exceptional circumstances, so that searching and matching operations behave as expected for those scripts.
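The filtering described above might be sketched as follows in Python, using the standard unicodedata module to recognize format control characters (General_Category Cf). The function name and the keep_joiners switch are assumptions made for this illustration; deciding when the joiners are orthographically significant is left to the caller and is not modeled here.

```python
import unicodedata

ZWNJ = "\u200C"  # U+200C zero width non-joiner
ZWJ = "\u200D"   # U+200D zero width joiner


def strip_format_controls(text, keep_joiners=False):
    """Drop format control characters (General_Category Cf) from text.

    With keep_joiners=True, ZWJ and ZWNJ are preserved, for contexts
    (such as Indic text) where they may distinguish display forms.
    """
    kept = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            if keep_joiners and ch in (ZWJ, ZWNJ):
                kept.append(ch)
            continue  # all other format controls are filtered out
        kept.append(ch)
    return "".join(kept)


latin = strip_format_controls("f\u200ci")
half_form_request = "\u0915\u094d\u200d\u0937"  # ka, virama, ZWJ, ssa
preserved = strip_format_controls(half_form_request, keep_joiners=True)
```

A search routine could apply the blind form to both the query and the text for most scripts and switch to keep_joiners=True where the distinction matters to users.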

Combining Grapheme Joiner

U+034F combining grapheme joiner (CGJ) is used to affect the collation of adjacent characters for purposes of language-sensitive collation and searching. It is also used to distinguish sequences that would otherwise be canonically equivalent.

Formally, the combining grapheme joiner is not a format control character, but rather a combining mark. It has the General_Category value gc=Mn and the canonical combining class value ccc=0.


As a result of these properties, the presence of a combining grapheme joiner in the midst of a combining character sequence does not interrupt the combining character sequence; any process that is accumulating and processing all the characters of a combining character sequence would include a combining grapheme joiner as part of that sequence. This differs from the behavior of most format control characters, whose presence would interrupt a combining character sequence.

In addition, because the combining grapheme joiner has the canonical combining class of 0, canonical reordering will not reorder any adjacent combining marks around a combining grapheme joiner. (See the definition of canonical reordering in Section 3.11, Canonical Ordering Behavior.) In turn, this means that insertion of a combining grapheme joiner between two combining marks will prevent normalization from switching the positions of those two combining marks, regardless of their own combining classes.

Blocking Reordering. The CGJ has no visible glyph and no other format effect on neighboring characters but simply blocks reordering of combining marks. It can therefore be used as a tool to distinguish two alternative orderings of a sequence of combining marks for some exceptional processing or rendering purpose, whenever normalization would otherwise eliminate the distinction between the two sequences.

For example, using CGJ to block reordering is one way to maintain distinction between differently ordered sequences of certain Hebrew accents and marks. These distinctions are necessary for analytic and text representational purposes. However, these characters were assigned fixed-position combining classes despite the fact that they interact typographically. As a result, normalization treats differently ordered sequences as equivalent. In particular, the sequence <lamed, patah, hiriq, finalmem> is canonically equivalent to <lamed, hiriq, patah, finalmem> because the canonical combining classes of U+05B4 hebrew point hiriq and U+05B7 hebrew point patah are distinct. However, the sequence <lamed, patah, CGJ, hiriq, finalmem> is not canonically equivalent to the other two. The presence of the combining grapheme joiner, which has ccc=0, blocks the reordering of hiriq before patah by canonical reordering and thus allows a patah following a hiriq and a patah preceding a hiriq to be reliably distinguished, whether for display or for other processing.

The use of CGJ with double diacritics is discussed in Section 7.9, Combining Marks; see Figure 7-10.

CGJ and Collation. The Unicode Collation Algorithm normalizes Unicode text strings before applying collation weighting. The combining grapheme joiner is ordinarily ignored in collation key weighting in the UCA. However, whenever it blocks the reordering of combining marks in a string, it affects the order of secondary key weights associated with those


combining marks, giving the two strings distinct keys. That makes it possible to treat them distinctly in searching and sorting without having to tailor the weights for either the combining grapheme joiner or the combining marks.

The CGJ can also be used to prevent the formation of contractions in the Unicode Collation Algorithm. For example, while “ch” is sorted as a single unit in a tailored Slovak collation, the sequence <c, CGJ, h> will sort as a “c” followed by an “h”. The CGJ can also be used in German, for example, to distinguish in sorting between “ü” in the meaning of u-umlaut, which is the more common case and often sorted like <u,e>, and “ü” in the meaning u-diaeresis, which is comparatively rare and sorted like “u” with a secondary key weight. This also requires no tailoring of either the combining grapheme joiner or the sequence. Because CGJ is invisible and has the default_ignorable property, data that are marked up with a CGJ should not cause problems for other processes.

It is possible to give sequences of characters that include the combining grapheme joiner special tailored weights. Thus the sequence <c, CGJ, h> could be weighted completely differently from the contraction “ch” or from the way “c” and “h” would have sorted without the contraction. However, such an application of CGJ is not recommended. For more information on the use of CGJ with sorting, matching, and searching, see Unicode Technical Standard #10, “Unicode Collation Algorithm.”

Rendering. For rendering, the combining grapheme joiner is invisible. However, some older implementations may treat a sequence of grapheme clusters linked by combining grapheme joiners as a single unit for the application of enclosing combining marks. For more information on grapheme clusters, see Unicode Standard Annex #29, “Text Boundaries.” For more information on enclosing combining marks, see Section 3.11, Canonical Ordering Behavior.

CGJ and Joiner Characters. The combining grapheme joiner must not be confused with the zero width joiner or the word joiner, which have very different functions. In particular, inserting a combining grapheme joiner between two characters should have no effect on their ligation or cursive joining behavior. Where the prevention of line breaking is the desired effect, the word joiner should be used. For more information on the behavior of these characters in line breaking, see Unicode Standard Annex #14, “Line Breaking Properties.”
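The blocking of canonical reordering described under “Blocking Reordering” can be observed with Python's standard unicodedata module. This is a small sketch rather than a normative test: the base letters lamed and final mem are chosen here only to give the two Hebrew points something to attach to, and the variable names are illustrative.

```python
import unicodedata

LAMED, FINAL_MEM = "\u05DC", "\u05DD"
HIRIQ = "\u05B4"  # hebrew point hiriq
PATAH = "\u05B7"  # hebrew point patah
CGJ = "\u034F"    # combining grapheme joiner

patah_first = LAMED + PATAH + HIRIQ + FINAL_MEM
hiriq_first = LAMED + HIRIQ + PATAH + FINAL_MEM
blocked = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM


def nfd(text):
    """Canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", text)


# Properties that the text attributes to the characters involved.
cgj_properties = (unicodedata.category(CGJ), unicodedata.combining(CGJ))
point_classes = (unicodedata.combining(HIRIQ), unicodedata.combining(PATAH))

# Without CGJ the two orders normalize to the same string; with CGJ
# between the marks, the original order survives normalization.
same_without_cgj = nfd(patah_first) == nfd(hiriq_first)
order_preserved = nfd(blocked) == blocked
```

The same comparison with "NFC" in place of "NFD" would be expected to behave alike for this data, since neither form composes these marks with their base letters.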

Bidirectional Ordering Controls

Bidirectional ordering controls are used in the Bidirectional Algorithm, described in Unicode Standard Annex #9, “The Bidirectional Algorithm.” Systems that handle right-to-left scripts such as Arabic, Syriac, and Hebrew, for example, should interpret these format control characters. The bidirectional ordering controls are shown in Table 16-3.

Table 16-3. Bidirectional Ordering Controls

Code    Name                        Abbreviation
U+200E  left-to-right mark          LRM
U+200F  right-to-left mark          RLM
U+202A  left-to-right embedding     LRE
U+202B  right-to-left embedding     RLE
U+202C  pop directional formatting  PDF
U+202D  left-to-right override      LRO
U+202E  right-to-left override      RLO

As with other format control characters, bidirectional ordering controls affect the layout of the text in which they are contained but should be ignored for other text processes, such as sorting or searching. However, text processes that modify text content must maintain these characters correctly, because matching pairs of bidirectional ordering controls must be coordinated, so as not to disrupt the layout and interpretation of bidirectional text. Each instance of an LRE, RLE, LRO, or RLO is normally paired with a corresponding PDF.

U+200E left-to-right mark and U+200F right-to-left mark have the semantics of an invisible character of zero width, except that these characters have strong directionality. They are intended to be used to resolve cases of ambiguous directionality in the context of bidirectional texts; they are not paired. Unlike U+200B zero width space, these characters carry no word breaking semantics. (See Unicode Standard Annex #9, “The Bidirectional Algorithm,” for more information.)
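For a process that edits text, the pairing of embeddings and overrides with PDF can be checked with a simple counter. The Python sketch below is an editorial lint under stated assumptions, not an implementation of the Bidirectional Algorithm: the function name embeddings_are_paired is hypothetical, and the check merely reports whether every LRE, RLE, LRO, or RLO is later closed by a PDF.

```python
import unicodedata

BIDI_CONTROLS = {
    "LRM": "\u200E",
    "RLM": "\u200F",
    "LRE": "\u202A",
    "RLE": "\u202B",
    "PDF": "\u202C",
    "LRO": "\u202D",
    "RLO": "\u202E",
}
OPENERS = {BIDI_CONTROLS[name] for name in ("LRE", "RLE", "LRO", "RLO")}
PDF = BIDI_CONTROLS["PDF"]

# Bidi_Class values as reported by the Unicode Character Database.
bidi_classes = {
    name: unicodedata.bidirectional(ch) for name, ch in BIDI_CONTROLS.items()
}


def embeddings_are_paired(text):
    """Return True if each embedding or override has a matching PDF."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # a PDF with nothing left to close
            depth -= 1
    return depth == 0  # LRM and RLM are not paired and are ignored here


balanced = embeddings_are_paired("abc \u202bxyz\u202c def")
truncated = embeddings_are_paired("abc \u202bxyz")
```

A check of this kind could be run before and after a cut or replace operation to notice when an edit has separated an opening control from its PDF.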

16.3 Deprecated Format Characters

Deprecated Format Characters: U+206A–U+206F

Three pairs of deprecated format characters are encoded in this block:

• Symmetric swapping format characters used to control the glyphs that depict characters such as “(” (The default state is activated.)
• Character shaping selectors used to control the shaping behavior of the Arabic compatibility characters (The default state is inhibited.)
• Numeric shape selectors used to override the normal shapes of the Western digits (The default state is nominal.)

The use of these character shaping selectors and codes for digit shapes is strongly discouraged in the Unicode Standard. Instead, the appropriate character codes should be used with the default state. For example, if contextual forms for Arabic characters are desired, then the nominal characters should be used, not the presentation forms with the shaping selectors. Similarly, if the Arabic digit forms are desired, then the explicit characters should be used, such as U+0660 arabic-indic digit zero.

Symmetric Swapping. The symmetric swapping format characters are used in conjunction with the class of left- and right-handed pairs of characters (symmetric characters), such as parentheses. The characters thus affected are listed in Section 4.7, Bidi Mirrored—Norma-

Cover and CD-ROM label design: Steve Mehallo, www.mehallo.com The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States please contact International Sales, [email protected] Visit us on the Web: www.awprofessional.com Library of Congress Cataloging-in-Publication Data The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. — Version 5.0. p. cm. Includes bibliographical references and index. ISBN 0-321-48091-0 (hardcover : alk. paper) 1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium. QA268.U545 2007 005.7'22—dc22 2006023526 Copyright © 1991–2007 Unicode, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116. Fax: (617) 848-7047 ISBN 0-321-48091-0 Text printed in the United States on recycled paper at Courier in Westford, Massachusetts. First printing, October 2006

Chapter 17

Code Charts

Disclaimer

Character images shown in the code charts are not prescriptive. In actual fonts, considerable variations are to be expected.

The code charts that follow present the characters of the Unicode Standard. Characters are organized into related groups called blocks. Many scripts are fully contained within a single character block, but other scripts, including some of the most widely used scripts, have characters divided across several blocks. Separate blocks contain common punctuation characters and different types of symbols.

A character names list follows each character chart. The character names list itemizes every character in the block and provides supplementary information in many cases.

Charts for CJK Unified Ideographs and for Hangul syllables are not printed in this chapter, but are available online, as discussed in Section 17.2, CJK Unified Ideographs, and Section 17.3, Hangul Syllables. An index to distinctive character names is found at the back of this book; a full set of character names appears in the Unicode Character Database.

17.1 Character Names List

The following illustration identifies the components of typical entries in the character names list.

code  image  entry
00AE  ®      REGISTERED SIGN
             = registered trade mark sign (1.0)      (Version 1.0 name)
00AF  ¯      MACRON                                  (Unicode name)
             = overline, APL overbar                 (alternative names)
             • this is a spacing character           (informative note)
             → 02C9 modifier letter macron           (cross reference)
             → 0304 combining macron
             → 0305 combining overline
             ≈ 0020 0304                             (compatibility decomposition)
00E5  å      LATIN SMALL LETTER A WITH RING ABOVE
             • Danish, Norwegian, Swedish, Walloon   (sample of language use)
             ≡ 0061 a 030A                           (canonical decomposition)
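Some of the fields shown in these sample entries (the code, the Unicode name, and the decomposition) can also be read from the Unicode Character Database. The Python sketch below uses the standard unicodedata module; the helper name describe is an assumption for this illustration, and aliases, notes, and cross references are not available through this module.

```python
import unicodedata


def describe(code_point):
    """Return code, name, and decomposition mapping for a code point."""
    ch = chr(code_point)
    return {
        "code": f"{code_point:04X}",
        "name": unicodedata.name(ch),
        # Empty string when there is no decomposition; a "<compat>" tag
        # marks a compatibility (rather than canonical) decomposition.
        "decomposition": unicodedata.decomposition(ch),
    }


registered = describe(0x00AE)
macron = describe(0x00AF)
a_ring = describe(0x00E5)
```

In the module's output, the canonical decomposition of 00E5 appears as a bare sequence of code points, while the compatibility decomposition of 00AF carries a tag in angle brackets.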

Images in the Code Charts and Character Lists

Each character in these code charts is shown with a representative glyph. A representative glyph is not a prescriptive form of the character, but rather one that enables recognition of the intended character to a knowledgeable user and facilitates lookup of the character in the code charts. In many cases, there are more or less well-established alternative glyphic representations for the same character.

Designers of high-quality fonts will do their own research into the preferred glyphic appearance of Unicode characters. In addition, many scripts require context-dependent glyph shaping, glyph positioning, or ligatures, none of which is shown in the code charts. The representative glyphs for the Latin, Greek, and Cyrillic scripts in the code charts are based on a serifed, Times-like font.

Some characters have alternative forms. For example, even the ASCII character U+0061 latin small letter a has two common alternative forms: the “a” used in Times and the single-story form that occurs in many other font styles. In a Times-like font, the character U+03A5 greek capital letter upsilon looks like “Y”; a form with curved arms is common in other font styles.

The fonts used for other scripts are similar to Times in that each represents a common, widely used design, with variable stroke width and serifs or similar devices, where applicable, to show each character as distinctly as possible. Sans-serif fonts with uniform stroke width tend to have less visibly distinct characters. In the code charts, sans-serif fonts are used for archaic scripts that predate the invention of serifs, for example.

A different case is U+010F latin small letter d with caron, which is commonly typeset with an apostrophe-like mark instead of a wedge-shaped caron. In such cases, the code charts show the more common variant in preference to a more didactic archetypical shape.

Many characters have been unified and have different appearances in different language contexts. The shape shown for U+2116 № numero sign is a fullwidth shape as it would be used in East Asian fonts. In Cyrillic usage, a different glyph is the universally recognized form. See Figure 15-2.

In certain cases, characters need to be represented by more or less condensed, shifted, or distorted glyphs to make them fit the format of the code charts. For example, U+0D10 ഐ malayalam letter ai is shown in a reduced size to fit the character cell.

Sometimes characters need to be given artificial shapes to make them recognizable in the code charts. Examples are the space characters and such characters as U+00AD soft hyphen and U+2011 non-breaking hyphen, where the special behavior of the


hyphen is indicated by the dashed box and the letters. This use of a dashed box is not correlated with the General Category value of the character.

When characters are used in context, the surrounding text gives important clues as to identity, size, and positioning. In the code charts, these clues are absent. For example, U+2075 ⁵ superscript five is shown much smaller than it would be in a Times-like text font.

Combining characters are shown with a dotted circle (for example, U+0940 ◌ी devanagari vowel sign ii). The relative position of the dotted circle gives an approximate indication of the location of the base character in relation to the combining mark. During rendering, additional adjustments are necessary. Accents such as U+0302 combining circumflex accent are adjusted vertically and horizontally based on the height and width of the base character, as in “î” versus “Ŵ”.

For non-European scripts, typical typefaces were selected that allow as much distinction as possible among the different characters.

The Unicode Standard contains many characters that are used in writing minority languages or that are historical characters, often used primarily in manuscripts or inscriptions. Where there is no strong tradition of printed materials, the typography of a character may not be settled.

Character Names

The character names in the code charts precisely match the normative character names in the Unicode Character Database. Character names are unique and stable. By convention, they are in uppercase. Because character names are stable, mistaken names will not be revised, but may be annotated. For example:

2118  ℘  SCRIPT CAPITAL P
         = Weierstrass elliptic function
         • actually this has the form of a lowercase calligraphic p, despite its name

For more information on character names, see Section 4.8, Name—Normative.

Informative Aliases

An informative alias (preceded by =) is an alternate name for a character. Characters may have several aliases, and aliases for different characters are not guaranteed to be unique. Aliases are informative and may be updated. By convention, aliases are in lowercase, except where they contain proper names. Where an alias matches the name of a character in The Unicode Standard, Version 1.0, it is listed first, followed by “1.0” in parentheses. Because the formal character names may differ in unexpected ways from commonly used names (for example, pilcrow sign = paragraph sign), some aliases may be useful alternate choices for indicating characters in user interfaces.

In the Hangul Jamo block, U+1100..U+11FF, the normative short jamo names are given as aliases.
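One way a user interface might accept informative aliases alongside formal names is sketched below in Python. The alias table is hypothetical and contains only the two aliases quoted in this chapter; informative aliases are not exposed by the standard unicodedata module, so an application would have to supply its own table. Because aliases are not guaranteed to be unique, a real table might need to map one alias to several characters, which this simple dictionary does not attempt.

```python
import unicodedata

# Hypothetical, hand-built table: informative alias -> formal character name.
INFORMATIVE_ALIASES = {
    "paragraph sign": "PILCROW SIGN",
    "weierstrass elliptic function": "SCRIPT CAPITAL P",
}


def find_character(label):
    """Resolve a formal name or a known alias to a character.

    Raises KeyError if the label is neither a known alias nor a
    formal character name.
    """
    formal_name = INFORMATIVE_ALIASES.get(label.lower(), label.upper())
    return unicodedata.lookup(formal_name)


by_alias = find_character("paragraph sign")
by_name = find_character("pilcrow sign")
script_p = find_character("Weierstrass elliptic function")
```

Keeping the alias table separate from the formal names reflects the distinction drawn above: the names are stable, while aliases are informative and may be updated.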


C0 Controls and Basic Latin

Range: 0000–007F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html. Copyright © 1991-2006 Unicode, Inc. All rights reserved.

[Code chart: C0 Controls and Basic Latin, U+0000–U+007F]
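The code chart for this block is a grid whose column labels are the leading hex digits of the code point (000 through 007) and whose rows are the final hex digit (0 through F). The sketch below shows that arithmetic; the function names `chart_cell` and `chart_rows` are hypothetical and exist only for this illustration.

```python
def chart_cell(code_point):
    """Return (column label, row label) for a code point in a 16-row chart."""
    column = f"{code_point >> 4:03X}"   # all hex digits except the last
    row = f"{code_point & 0xF:X}"       # last hex digit
    return column, row

def chart_rows(first, last):
    """Yield one list of code points per chart row, for rows 0 through F."""
    for low in range(16):
        yield [cp for cp in range(first, last + 1) if cp & 0xF == low]

# Print the graphic part of Basic Latin row by row (controls are skipped).
for row in chart_rows(0x0020, 0x007E):
    print(" ".join(chr(cp) for cp in row))
```

Reading across a printed row therefore steps through the block in increments of 16 (hex 10), while reading down a column steps by 1.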

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.

C0 controls
Alias names are those for ISO/IEC 6429:1992. Commonly used alternative aliases are also shown.

0000 = NULL
0001 = START OF HEADING
0002 = START OF TEXT
0003 = END OF TEXT
0004 = END OF TRANSMISSION
0005 = ENQUIRY
0006 = ACKNOWLEDGE
0007 = BELL
0008 = BACKSPACE
0009 = CHARACTER TABULATION
     = horizontal tabulation (HT), tab
000A = LINE FEED (LF)
     = new line (NL), end of line (EOL)
000B = LINE TABULATION
     = vertical tabulation (VT)
000C = FORM FEED (FF)
000D = CARRIAGE RETURN (CR)
000E = SHIFT OUT
000F = SHIFT IN
0010 = DATA LINK ESCAPE
0011 = DEVICE CONTROL ONE
0012 = DEVICE CONTROL TWO
0013 = DEVICE CONTROL THREE
0014 = DEVICE CONTROL FOUR
0015 = NEGATIVE ACKNOWLEDGE
0016 = SYNCHRONOUS IDLE
0017 = END OF TRANSMISSION BLOCK
0018 = CANCEL
0019 = END OF MEDIUM
001A = SUBSTITUTE
     → FFFD replacement character
001B = ESCAPE
001C = INFORMATION SEPARATOR FOUR
     = file separator (FS)
001D = INFORMATION SEPARATOR THREE
     = group separator (GS)
001E = INFORMATION SEPARATOR TWO
     = record separator (RS)
001F = INFORMATION SEPARATOR ONE
     = unit separator (US)
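The C0 entries are listed by alias rather than by character name. One way to observe this, assuming Python's standard `unicodedata` module (which tracks a later Unicode version than 5.0), is that a name query on a control code finds nothing while the General Category is still reported. The `describe` helper and its placeholder string are hypothetical.

```python
import unicodedata

def describe(code_point):
    """Summarize a code point as 'U+XXXX <category> <name or placeholder>'."""
    ch = chr(code_point)
    category = unicodedata.category(ch)
    # Control codes have no character name, so supply a default.
    name = unicodedata.name(ch, "<no character name>")
    return f"U+{code_point:04X} {category} {name}"

for cp in (0x0009, 0x000A, 0x001B, 0x0020):
    print(describe(cp))
```

U+0020 is included for contrast: it has a character name and a separator category instead of the control category.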

ASCII punctuation and symbols
Based on ISO/IEC 646.

0020 SPACE
     • sometimes considered a control code
     • other space characters: 2000–200A
     → 00A0 no-break space
     → 200B zero width space
     → 2060 word joiner
     → 3000 ideographic space
     → FEFF zero width no-break space
0021 ! EXCLAMATION MARK
     = factorial
     = bang
     → 00A1 ¡ inverted exclamation mark
     → 01C3 latin letter retroflex click
     → 203C double exclamation mark
     → 203D interrobang
     → 2762 heavy exclamation mark ornament
0022 " QUOTATION MARK
     • neutral (vertical), used as opening or closing quotation mark
     • preferred characters in English for paired quotation marks are 201C “ & 201D ”
     → 02BA modifier letter double prime
     → 030B combining double acute accent
     → 030E combining double vertical line above
     → 2033 double prime
     → 3003 〃 ditto mark
0023 # NUMBER SIGN
     = pound sign, hash, crosshatch, octothorpe
     → 2114 l b bar symbol
     → 266F music sharp sign
0024 $ DOLLAR SIGN
     = milreis, escudo
     • glyph may have one or two vertical bars
     • other currency symbol characters: 20A0–20B5
     → 00A4 ¤ currency sign
0025 % PERCENT SIGN
     → 066A arabic percent sign
     → 2030 ‰ per mille sign
     → 2031 per ten thousand sign
     → 2052 commercial minus sign
0026 & AMPERSAND
     → 204A ⁊ tironian sign et
     → 214B turned ampersand


C1 Controls and Latin-1 Supplement
Range: 0080–00FF

[Code chart: C1 Controls and Latin-1 Supplement, U+0080–U+00FF]


Latin Extended-A
Range: 0100–017F

[Code chart: Latin Extended-A, U+0100–U+017F]


Latin Extended-B
Range: 0180–024F

[Code chart: Latin Extended-B, U+0180–U+024F]

Non-European and historic Latin

0180 ƀ LATIN SMALL LETTER B WITH STROKE
     • Americanist and Indo-Europeanist usage for phonetic beta
     • Americanist orthographies use an alternate glyph with the stroke through the bowl
     • Old Saxon
     • uppercase is 0243 Ƀ
     → 03B2 β greek small letter beta
     → 2422 ␢ blank symbol
0181 Ɓ LATIN CAPITAL LETTER B WITH HOOK
     • Zulu, Pan-Nigerian alphabet
     → 0253 ɓ latin small letter b with hook
0182 Ƃ LATIN CAPITAL LETTER B WITH TOPBAR
0183 ƃ LATIN SMALL LETTER B WITH TOPBAR
     • Zhuang (old orthography)
     • former Soviet minority language scripts
     → 0411 Б cyrillic capital letter be
0184 Ƅ LATIN CAPITAL LETTER TONE SIX
0185 ƅ LATIN SMALL LETTER TONE SIX
     • Zhuang (old orthography)
     • Zhuang tone three is Cyrillic ze
     • Zhuang tone four is Cyrillic che
     → 01A8 ƨ latin small letter tone two
     → 01BD ƽ latin small letter tone five
     → 0437 з cyrillic small letter ze
     → 0447 ч cyrillic small letter che
     → 044C ь cyrillic small letter soft sign
0186 Ɔ LATIN CAPITAL LETTER OPEN O
     • typographically a turned C
     • African
     → 0254 ɔ latin small letter open o
0187 Ƈ LATIN CAPITAL LETTER C WITH HOOK
0188 ƈ LATIN SMALL LETTER C WITH HOOK
     • African
0189 Ɖ LATIN CAPITAL LETTER AFRICAN D
     • Ewe
     → 00D0 Ð latin capital letter eth
     → 0110 Đ latin capital letter d with stroke
     → 0256 ɖ latin small letter d with tail
018A Ɗ LATIN CAPITAL LETTER D WITH HOOK
     • Pan-Nigerian alphabet
     → 0257 ɗ latin small letter d with hook
018B Ƌ LATIN CAPITAL LETTER D WITH TOPBAR
018C ƌ LATIN SMALL LETTER D WITH TOPBAR
     • former-Soviet minority language scripts
     • Zhuang (old orthography)
018D ƍ LATIN SMALL LETTER TURNED DELTA
     = reversed Polish-hook o
     • archaic phonetic for labialized alveolar fricative
     • recommended spellings 007A z 02B7 ʷ or 007A z 032B
018E Ǝ LATIN CAPITAL LETTER REVERSED E
     = turned e
     • Pan-Nigerian alphabet
     • lowercase is 01DD ǝ
018F Ə LATIN CAPITAL LETTER SCHWA
     • Azerbaijani, ...
     → 0259 ə latin small letter schwa
     → 04D8 Ә cyrillic capital letter schwa
0190 Ɛ LATIN CAPITAL LETTER OPEN E
     = epsilon
     • African
     → 025B ɛ latin small letter open e
     → 2107 ℇ euler constant
0191 Ƒ LATIN CAPITAL LETTER F WITH HOOK
     • African
0192 ƒ LATIN SMALL LETTER F WITH HOOK
     = script f
     = Florin currency symbol (Netherlands)
     = function symbol
     • used as abbreviation convention for folder
0193 Ɠ LATIN CAPITAL LETTER G WITH HOOK
     • African
     → 0260 ɠ latin small letter g with hook
0194 Ɣ LATIN CAPITAL LETTER GAMMA
     • African
     → 0263 ɣ latin small letter gamma
0195 ƕ LATIN SMALL LETTER HV
     • Gothic transliteration
     • uppercase is 01F6 Ƕ
0196 Ɩ LATIN CAPITAL LETTER IOTA
     • African
     → 0269 ɩ latin small letter iota
0197 Ɨ LATIN CAPITAL LETTER I WITH STROKE
     = barred i, i bar
     • African
     • ISO 6438 gives lowercase as 026A ɪ, not 0268 ɨ
     → 026A ɪ latin letter small capital i
0198 Ƙ LATIN CAPITAL LETTER K WITH HOOK
0199 ƙ LATIN SMALL LETTER K WITH HOOK
     • Hausa, Pan-Nigerian alphabet
019A ƚ LATIN SMALL LETTER L WITH BAR
     = barred l
     • Americanist phonetic usage for 026C ɬ
     → 0142 ł latin small letter l with stroke
     → 023D Ƚ latin capital letter l with bar
019B ƛ LATIN SMALL LETTER LAMBDA WITH STROKE
     = barred lambda, lambda bar
     • Americanist phonetic usage
019C Ɯ LATIN CAPITAL LETTER TURNED M
     • Zhuang (old orthography)
     → 026F ɯ latin small letter turned m
019D Ɲ LATIN CAPITAL LETTER N WITH LEFT HOOK
     • African
     → 0272 ɲ latin small letter n with left hook
019E ƞ LATIN SMALL LETTER N WITH LONG RIGHT LEG
     • archaic phonetic for Japanese 3093 ん
     • recommended spelling for syllabic n is 006E n 0329
     • Lakota (indicates nasalization of vowel)
     → 0220 Ƞ latin capital letter n with long right leg
019F Ɵ LATIN CAPITAL LETTER O WITH MIDDLE TILDE
     = barred o, o bar
     • lowercase is 0275 ɵ
     • African
     → 04E8 Ө cyrillic capital letter barred o
01A0 Ơ LATIN CAPITAL LETTER O WITH HORN
     ≡ 004F O 031B
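Two kinds of annotation in these entries correspond to machine-readable character properties: the ≡ line gives a canonical decomposition, and the "lowercase is" and "uppercase is" notes give case mappings. The sketch below checks a few of them with Python's standard `unicodedata` module and built-in string methods. It assumes the interpreter's bundled Unicode data, which is newer than Version 5.0 but agrees with it for these characters; `canonical_decomposition` is a hypothetical helper.

```python
import unicodedata

def canonical_decomposition(ch):
    """Return the code points of the canonical decomposition, or [] if none."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility mappings start with a <tag>; they are not canonical.
    if not fields or fields[0].startswith("<"):
        return []
    return [int(f, 16) for f in fields]

o_horn = "\u01A0"
parts = canonical_decomposition(o_horn)
print([f"{cp:04X}" for cp in parts])

# Case pairs taken from the notes on 018E and 0195.
reversed_e_lower = chr(0x018E).lower()
hv_upper = chr(0x0195).upper()
print(f"{ord(reversed_e_lower):04X} {ord(hv_upper):04X}")
```

Characters without a ≡ line, such as 0181, return an empty list from the helper.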


IPA Extensions
Range: 0250–02AF

[Code chart: IPA Extensions, U+0250–U+02AF]


Spacing Modifier Letters
Range: 02B0–02FF

[Code chart: Spacing Modifier Letters, U+02B0–U+02FF]


Combining Diacritical Marks
Range: 0300–036F

[Code chart: Combining Diacritical Marks, 0300–036F]
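As a rough illustration of how the 0300–036F range stated above can be used in software, the following Python sketch tests whether a character falls inside the block and composes a base letter with a combining mark. It relies only on the standard `unicodedata` module; the helper name `in_combining_diacritical_marks` is hypothetical, and the Unicode data bundled with a given Python release is newer than Version 5.0, so treat this as a sketch under those assumptions rather than a reference implementation.

```python
import unicodedata

# Block range taken from the heading above: 0300–036F inclusive.
COMBINING_DIACRITICAL_MARKS = range(0x0300, 0x0370)


def in_combining_diacritical_marks(ch: str) -> bool:
    """Return True if the single character ch lies in the 0300–036F block."""
    return ord(ch) in COMBINING_DIACRITICAL_MARKS


acute = "\u0301"                 # COMBINING ACUTE ACCENT
decomposed = "e" + acute         # two code points: base letter + mark
composed = unicodedata.normalize("NFC", decomposed)  # one code point, U+00E9

print(in_combining_diacritical_marks(acute))
print(unicodedata.name(acute), unicodedata.combining(acute))
print(len(decomposed), len(composed))
```

A nonzero result from `unicodedata.combining` indicates a combining mark; the block test alone only says where a code point is allocated, not how it behaves.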


Greek and Coptic Range: 0370–03FF

[Code chart: Greek and Coptic, 0370–03FF]

Cyrillic Range: 0400–04FF

[Code chart: Cyrillic, 0400–04FF]

Cyrillic

Cyrillic extensions
0400 Ѐ CYRILLIC CAPITAL LETTER IE WITH GRAVE
       ≡ 0415 Е 0300
0401 Ё CYRILLIC CAPITAL LETTER IO
       ≡ 0415 Е 0308
0402 Ђ CYRILLIC CAPITAL LETTER DJE
0403 Ѓ CYRILLIC CAPITAL LETTER GJE
       ≡ 0413 Г 0301
0404 Є CYRILLIC CAPITAL LETTER UKRAINIAN IE
0405 Ѕ CYRILLIC CAPITAL LETTER DZE
0406 І CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
       → 0049 I latin capital letter i
       → 0456 і cyrillic small letter byelorussian-ukrainian i
       → 04C0 Ӏ cyrillic letter palochka
0407 Ї CYRILLIC CAPITAL LETTER YI
       ≡ 0406 І 0308
0408 Ј CYRILLIC CAPITAL LETTER JE
0409 Љ CYRILLIC CAPITAL LETTER LJE
040A Њ CYRILLIC CAPITAL LETTER NJE
040B Ћ CYRILLIC CAPITAL LETTER TSHE
040C Ќ CYRILLIC CAPITAL LETTER KJE
       ≡ 041A К 0301
040D Ѝ CYRILLIC CAPITAL LETTER I WITH GRAVE
       ≡ 0418 И 0300
040E Ў CYRILLIC CAPITAL LETTER SHORT U
       ≡ 0423 У 0306
040F Џ CYRILLIC CAPITAL LETTER DZHE

Basic Russian alphabet
0410 А CYRILLIC CAPITAL LETTER A
0411 Б CYRILLIC CAPITAL LETTER BE
       → 0183 ƃ latin small letter b with topbar
0412 В CYRILLIC CAPITAL LETTER VE
0413 Г CYRILLIC CAPITAL LETTER GHE
0414 Д CYRILLIC CAPITAL LETTER DE
0415 Е CYRILLIC CAPITAL LETTER IE
0416 Ж CYRILLIC CAPITAL LETTER ZHE
0417 З CYRILLIC CAPITAL LETTER ZE
0418 И CYRILLIC CAPITAL LETTER I
0419 Й CYRILLIC CAPITAL LETTER SHORT I
       ≡ 0418 И 0306
041A К CYRILLIC CAPITAL LETTER KA
041B Л CYRILLIC CAPITAL LETTER EL
041C М CYRILLIC CAPITAL LETTER EM
041D Н CYRILLIC CAPITAL LETTER EN
041E О CYRILLIC CAPITAL LETTER O
041F П CYRILLIC CAPITAL LETTER PE
0420 Р CYRILLIC CAPITAL LETTER ER
0421 С CYRILLIC CAPITAL LETTER ES
0422 Т CYRILLIC CAPITAL LETTER TE
0423 У CYRILLIC CAPITAL LETTER U
       → 0478 Ѹ cyrillic capital letter uk
       → 04AF ү cyrillic small letter straight u
0424 Ф CYRILLIC CAPITAL LETTER EF
0425 Х CYRILLIC CAPITAL LETTER HA
0426 Ц CYRILLIC CAPITAL LETTER TSE
0427 Ч CYRILLIC CAPITAL LETTER CHE
0428 Ш CYRILLIC CAPITAL LETTER SHA
0429 Щ CYRILLIC CAPITAL LETTER SHCHA
042A Ъ CYRILLIC CAPITAL LETTER HARD SIGN
042B Ы CYRILLIC CAPITAL LETTER YERU
042C Ь CYRILLIC CAPITAL LETTER SOFT SIGN
042D Э CYRILLIC CAPITAL LETTER E
042E Ю CYRILLIC CAPITAL LETTER YU
042F Я CYRILLIC CAPITAL LETTER YA
0430 а CYRILLIC SMALL LETTER A
0431 б CYRILLIC SMALL LETTER BE
0432 в CYRILLIC SMALL LETTER VE
0433 г CYRILLIC SMALL LETTER GHE
0434 д CYRILLIC SMALL LETTER DE
0435 е CYRILLIC SMALL LETTER IE
0436 ж CYRILLIC SMALL LETTER ZHE
0437 з CYRILLIC SMALL LETTER ZE
0438 и CYRILLIC SMALL LETTER I
0439 й CYRILLIC SMALL LETTER SHORT I
       ≡ 0438 и 0306
043A к CYRILLIC SMALL LETTER KA
043B л CYRILLIC SMALL LETTER EL
043C м CYRILLIC SMALL LETTER EM
043D н CYRILLIC SMALL LETTER EN
043E о CYRILLIC SMALL LETTER O
043F п CYRILLIC SMALL LETTER PE
0440 р CYRILLIC SMALL LETTER ER
0441 с CYRILLIC SMALL LETTER ES
0442 т CYRILLIC SMALL LETTER TE
0443 у CYRILLIC SMALL LETTER U
0444 ф CYRILLIC SMALL LETTER EF
0445 х CYRILLIC SMALL LETTER HA
0446 ц CYRILLIC SMALL LETTER TSE
0447 ч CYRILLIC SMALL LETTER CHE
0448 ш CYRILLIC SMALL LETTER SHA
0449 щ CYRILLIC SMALL LETTER SHCHA
044A ъ CYRILLIC SMALL LETTER HARD SIGN
044B ы CYRILLIC SMALL LETTER YERU
044C ь CYRILLIC SMALL LETTER SOFT SIGN
       → 0185 ƅ latin small letter tone six
044D э CYRILLIC SMALL LETTER E
044E ю CYRILLIC SMALL LETTER YU
044F я CYRILLIC SMALL LETTER YA

Cyrillic extensions
0450 ѐ CYRILLIC SMALL LETTER IE WITH GRAVE
       • Macedonian
       ≡ 0435 е 0300
0451 ё CYRILLIC SMALL LETTER IO
       • Russian, ...
       ≡ 0435 е 0308
0452 ђ CYRILLIC SMALL LETTER DJE
       • Serbian
       → 0111 đ latin small letter d with stroke
0453 ѓ CYRILLIC SMALL LETTER GJE
       • Macedonian
       ≡ 0433 г 0301
0454 є CYRILLIC SMALL LETTER UKRAINIAN IE
       = Old Cyrillic yest
0455 ѕ CYRILLIC SMALL LETTER DZE
       = Old Cyrillic zelo
       • Macedonian
0456 і CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
       = Old Cyrillic i
0457 ї CYRILLIC SMALL LETTER YI
       • Ukrainian
       ≡ 0456 і 0308
0458 ј CYRILLIC SMALL LETTER JE
       • Serbian, Azerbaijani, Altay
0459 љ CYRILLIC SMALL LETTER LJE
       • Serbian, Macedonian
       → 01C9 ǉ latin small letter lj
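The ≡ lines in the list above give canonical decompositions, for example 0400 ≡ 0415 0300. A minimal Python sketch of how those mappings can be checked is shown below. It uses the standard `unicodedata` module; the helper name `canonical_decomposition` is hypothetical, and the interpreter's bundled Unicode data is newer than Version 5.0, so this assumes the mappings shown have not changed.

```python
import unicodedata


def canonical_decomposition(ch: str) -> list[str]:
    """Return the canonical decomposition of ch as hex code point strings.

    Characters with no decomposition, or with only a compatibility
    mapping (tagged like "<compat>"), yield an empty list.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return []
    return mapping.split()


# A few entries from the names list: three with "≡" lines, one without.
for ch in ("\u0400", "\u0419", "\u0457", "\u0402"):
    print(f"{ord(ch):04X}", unicodedata.name(ch), canonical_decomposition(ch))

# Normalization applies the same mappings in both directions.
decomposed_io = unicodedata.normalize("NFD", "\u0401")        # 0415 0308
composed_gje = unicodedata.normalize("NFC", "\u0433\u0301")   # 0453
```

Comparing strings after normalizing both to the same form (NFC or NFD) avoids treating 0401 and the sequence 0415 0308 as different text.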


Cyrillic Supplement Range: 0500–052F

Armenian Range: 0530–058F
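The block headings in this part of the charts each state a code point range. As a hedged sketch, the Python snippet below collects those ranges into a small table and looks up the block for a character by binary search. The names `BLOCKS` and `block_of` are hypothetical, and the table covers only the blocks whose headings appear in this section, not the full set of Unicode blocks.

```python
from bisect import bisect_right

# (first, last, block name), copied from the block headings in this section.
BLOCKS = [
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0500, 0x052F, "Cyrillic Supplement"),
    (0x0530, 0x058F, "Armenian"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
]
_STARTS = [first for first, _, _ in BLOCKS]  # sorted start points for bisect


def block_of(ch: str):
    """Return the block name for ch, or None if it is outside the table."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1        # last block starting at or before cp
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


for ch in ("\u0416", "\u05D0", "\u060C", "A"):
    print(f"{ord(ch):04X}", block_of(ch))
```

The explicit upper-bound check matters because a table like this may have gaps; a code point past the last listed block should not be attributed to it.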

Hebrew Range: 0590–05FF

[Code chart: Hebrew, 0590–05FF]

Arabic Range: 0600–06FF

[Code chart: Arabic, 0600–06FF]

Arabic: character names, 0600–0652

Subtending marks
0600 ARABIC NUMBER SIGN
0601 ARABIC SIGN SANAH
0602 ARABIC FOOTNOTE MARKER
0603 ARABIC SIGN SAFHA

Currency sign
060B AFGHANI SIGN

Punctuation
060C ، ARABIC COMMA
    • also used with Thaana and Syriac in modern text
    → 002C , comma
060D ARABIC DATE SEPARATOR

Poetic marks
060E ARABIC POETIC VERSE SIGN
060F ARABIC SIGN MISRA

Honorifics
0610 ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM
    • represents sallallahu alayhe wasallam “may God’s peace and blessings be upon him”
0611 ARABIC SIGN ALAYHE ASSALLAM
    • represents alayhe assalam “upon him be peace”
0612 ARABIC SIGN RAHMATULLAH ALAYHE
    • represents rahmatullah alayhe “may God have mercy upon him”
0613 ARABIC SIGN RADI ALLAHOU ANHU
    • represents radi allahu ’anhu “may God be pleased with him”
0614 ARABIC SIGN TAKHALLUS
    • sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names

Koranic annotation sign
0615 ARABIC SMALL HIGH TAH
    • marks a recommended pause position in some Korans published in Iran and Pakistan
    • should not be confused with the small TAH sign used as a diacritic for some letters such as 0679

Punctuation
061B ؛ ARABIC SEMICOLON
    • also used with Thaana and Syriac in modern text
    → 003B ; semicolon
061C <reserved>
061D <reserved>
061E ARABIC TRIPLE DOT PUNCTUATION MARK
061F ؟ ARABIC QUESTION MARK
    • also used with Thaana and Syriac in modern text
    → 003F ? question mark

Based on ISO 8859-6
0621 ARABIC LETTER HAMZA
    → 02BE modifier letter right half ring
0622 ARABIC LETTER ALEF WITH MADDA ABOVE
    ≡ 0627 0653
0623 ARABIC LETTER ALEF WITH HAMZA ABOVE
    ≡ 0627 0654
0624 ARABIC LETTER WAW WITH HAMZA ABOVE
    ≡ 0648 0654
0625 ARABIC LETTER ALEF WITH HAMZA BELOW
    ≡ 0627 0655
0626 ARABIC LETTER YEH WITH HAMZA ABOVE
    ≡ 064A 0654
0627 ARABIC LETTER ALEF
0628 ARABIC LETTER BEH
0629 ARABIC LETTER TEH MARBUTA
062A ARABIC LETTER TEH
062B ARABIC LETTER THEH
062C ARABIC LETTER JEEM
062D ARABIC LETTER HAH
062E ARABIC LETTER KHAH
062F ARABIC LETTER DAL
0630 ARABIC LETTER THAL
0631 ARABIC LETTER REH
0632 ARABIC LETTER ZAIN
0633 ARABIC LETTER SEEN
0634 ARABIC LETTER SHEEN
0635 ARABIC LETTER SAD
0636 ARABIC LETTER DAD
0637 ARABIC LETTER TAH
0638 ARABIC LETTER ZAH
0639 ARABIC LETTER AIN
    → 01B9 latin small letter ezh reversed
    → 02BF modifier letter left half ring
063A ARABIC LETTER GHAIN
063B <reserved>
063C <reserved>
063D <reserved>
063E <reserved>
063F <reserved>
0640 ARABIC TATWEEL
    = kashida
    • inserted to stretch characters
    • also used with Syriac
0641 ARABIC LETTER FEH
0642 ARABIC LETTER QAF
0643 ARABIC LETTER KAF
0644 ARABIC LETTER LAM
0645 ARABIC LETTER MEEM
0646 ARABIC LETTER NOON
0647 ARABIC LETTER HEH
0648 ARABIC LETTER WAW
0649 ARABIC LETTER ALEF MAKSURA
    • represents YEH-shaped letter with no dots in any positional form
064A ARABIC LETTER YEH

Points from ISO 8859-6
064B ARABIC FATHATAN
064C ARABIC DAMMATAN
064D ARABIC KASRATAN
064E ARABIC FATHA
064F ARABIC DAMMA
0650 ARABIC KASRA
0651 ARABIC SHADDA
0652 ARABIC SUKUN
    • marks absence of a vowel after the base consonant
    • used in some Korans to mark a long vowel as ignored
    • can have a variety of shapes, including a circular one and a shape that looks like ‘’
    → 06E1 arabic small high dotless head of khah
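The ≡ entries in the Arabic names list give canonical decompositions, for example 0622 ≡ 0627 0653. As a minimal sketch, assuming Python's standard `unicodedata` module (which tracks a later Unicode version than 5.0, although these particular mappings are expected to be the same), the listed pairs can be checked against NFD and NFC normalization. The `LISTED` table and the helper names are illustrative and not part of the standard.

```python
import unicodedata

# Canonical decompositions copied from the "≡" lines of the Arabic names list.
LISTED = {
    0x0622: (0x0627, 0x0653),  # ALEF WITH MADDA ABOVE
    0x0623: (0x0627, 0x0654),  # ALEF WITH HAMZA ABOVE
    0x0624: (0x0648, 0x0654),  # WAW WITH HAMZA ABOVE
    0x0625: (0x0627, 0x0655),  # ALEF WITH HAMZA BELOW
    0x0626: (0x064A, 0x0654),  # YEH WITH HAMZA ABOVE
}


def decompose(cp):
    """Return the NFD form of one code point as a tuple of code points."""
    return tuple(ord(ch) for ch in unicodedata.normalize("NFD", chr(cp)))


def compose(cps):
    """Return the NFC form of a code point sequence as a tuple of code points."""
    text = "".join(chr(cp) for cp in cps)
    return tuple(ord(ch) for ch in unicodedata.normalize("NFC", text))


def mismatches():
    """List precomposed code points whose NFD/NFC round trip differs from LISTED."""
    return [
        cp
        for cp, parts in LISTED.items()
        if decompose(cp) != parts or compose(parts) != (cp,)
    ]


if __name__ == "__main__":
    for cp, parts in LISTED.items():
        print(f"{cp:04X} decomposes to " + " ".join(f"{p:04X}" for p in parts))
    print("mismatches:", mismatches())
```

Comparing code points rather than rendered glyphs keeps the check independent of fonts and of bidirectional display order.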


Syriac Range: 0700–074F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata.
See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts.
See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0.
See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/

A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard, can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.

[Code chart: Syriac (0700–074F)]

Arabic Supplement Range: 0750–077F

Thaana Range: 0780–07BF

NKo Range: 07C0–07FF
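Each block header in this excerpt states a code point range, such as 0700–074F for Syriac. The sketch below shows one way to turn those "Range:" lines into a lookup from code point to block name. It covers only the ranges that appear in this excerpt, and the names `BLOCKS` and `block_of` are illustrative assumptions rather than anything defined by the standard.

```python
from bisect import bisect_right

# (first, last, name) copied from the "Range:" lines of the block headers
# in this excerpt. Blocks not shown in the excerpt are deliberately absent.
BLOCKS = [
    (0x0700, 0x074F, "Syriac"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0780, 0x07BF, "Thaana"),
    (0x07C0, 0x07FF, "NKo"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Oriya"),
    (0x0B80, 0x0BFF, "Tamil"),
]

# Sorted list of range starts, used for binary search.
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp):
    """Return the block name for a code point, or None if it is outside the table."""
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


if __name__ == "__main__":
    for cp in (0x0710, 0x0750, 0x0800, 0x0915, 0x0BFF):
        print(f"{cp:04X}: {block_of(cp)}")
```

A binary search over the sorted range starts keeps the lookup cheap even if the table is extended to every block, and the explicit upper-bound check makes gaps between blocks return `None` instead of the preceding block.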

Devanagari Range: 0900–097F

[Code chart: Devanagari (0900–097F)]

Bengali Range: 0980–09FF

[Code chart: Bengali (0980–09FF)]

Gurmukhi Range: 0A00–0A7F

[Code chart: Gurmukhi (0A00–0A7F)]

Gujarati Range: 0A80–0AFF

Oriya Range: 0B00–0B7F

[Code chart: Oriya, 0B00–0B7F]

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.


Tamil
Range: 0B80–0BFF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
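The block ranges stated in the chart headers of this part of the standard can be used to decide which chart a given code point belongs to. The following is a minimal sketch, not part of the standard: the names `BLOCKS`, `block_of`, and `chart_cell` are hypothetical, and the table covers only the ranges quoted in these headers. It also assumes the usual chart layout, in which a column is labeled with the code point minus its last hexadecimal digit and a row with that last digit.

```python
from typing import Optional, Tuple

# Block ranges copied from the chart headers in this part of the standard.
# This is an illustrative subset, not the full list of Unicode blocks.
BLOCKS = [
    (0x0B80, 0x0BFF, "Tamil"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
    (0x0D80, 0x0DFF, "Sinhala"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x0E80, 0x0EFF, "Lao"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x1000, 0x109F, "Myanmar"),
    (0x10A0, 0x10FF, "Georgian"),
]


def block_of(code_point: int) -> Optional[str]:
    """Return the block name for code_point, or None if it is not in BLOCKS."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def chart_cell(code_point: int) -> Tuple[str, str]:
    """Return (column label, row label) for a code point in a code chart.

    Assumes the column label is the code point without its last hex digit
    (padded to three digits) and the row label is that last hex digit.
    """
    column = f"{code_point >> 4:03X}"
    row = f"{code_point & 0xF:X}"
    return column, row


if __name__ == "__main__":
    cp = 0x0B85
    print(block_of(cp), chart_cell(cp))
```

A linear scan is enough for a table this small; for the full set of blocks, the machine-readable data in the Unicode Character Database would be a more reliable source than a hand-copied list.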


[Code chart: Tamil, 0B80–0BFF]


Telugu
Range: 0C00–0C7F

Kannada
Range: 0C80–0CFF

Malayalam
Range: 0D00–0D7F

[Code chart: Malayalam, 0D00–0D7F]


Sinhala
Range: 0D80–0DFF

[Code chart: Sinhala, 0D80–0DFF]


Thai
Range: 0E00–0E7F

[Code chart: Thai, 0E00–0E7F]


Lao
Range: 0E80–0EFF

[Code chart: Lao, 0E80–0EFF]


Tibetan
Range: 0F00–0FFF

[Code chart: Tibetan, 0F00–0FFF]


Myanmar
Range: 1000–109F

Georgian
Range: 10A0–10FF

Hangul Jamo
Range: 1100–11FF

[Code chart: Hangul Jamo, 1100–11FF]

Ethiopic
Range: 1200–137F

[Code chart: Ethiopic, 1200–137F]

Ethiopic Supplement
Range: 1380–139F

Cherokee
Range: 13A0–13FF
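The block headings in this part of the charts each give a name and an inclusive code point range. As a minimal sketch of how those ranges can be used, the following Python snippet maps a code point to its block. The table holds only the eight ranges quoted in this part of the charts, and `BLOCKS` and `block_of` are hypothetical names chosen for illustration; a full implementation would more likely read the block ranges from the Unicode Character Database.

```python
from bisect import bisect_right

# (first, last, name): inclusive ranges copied from the block headings
# in this part of the charts. Not a complete list of Unicode blocks.
BLOCKS = [
    (0x1000, 0x109F, "Myanmar"),
    (0x10A0, 0x10FF, "Georgian"),
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x1200, 0x137F, "Ethiopic"),
    (0x1380, 0x139F, "Ethiopic Supplement"),
    (0x13A0, 0x13FF, "Cherokee"),
    (0x1400, 0x167F, "Unified Canadian Aboriginal Syllabics"),
    (0x1680, 0x169F, "Ogham"),
]

# Sorted list of range starts, used for binary search.
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(code_point):
    """Return the block name for code_point, or None if it falls
    outside the ranges listed above (hypothetical helper)."""
    i = bisect_right(_STARTS, code_point) - 1
    if i < 0:
        return None
    first, last, name = BLOCKS[i]
    return name if first <= code_point <= last else None


if __name__ == "__main__":
    for cp in (0x1100, 0x137F, 0x1380, 0x1401):
        print(f"{cp:04X}", block_of(cp))
```

Because the ranges are sorted and do not overlap, a binary search over the range starts is enough; the final bounds check guards against code points that lie past the end of the last listed block.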

Unified Canadian Aboriginal Syllabics
Range: 1400–167F

[Code chart: Unified Canadian Aboriginal Syllabics, 1400–167F]

Unified Canadian Aboriginal Syllabics (1401–145B)

Syllables

1401 CANADIAN SYLLABICS E
     • Inuktitut (AI), Carrier (U)
1402 CANADIAN SYLLABICS AAI
     • Inuktitut
1403 CANADIAN SYLLABICS I
     • Carrier (O)
1404 CANADIAN SYLLABICS II
1405 CANADIAN SYLLABICS O
     • Inuktitut (U), Carrier (E)
1406 CANADIAN SYLLABICS OO
     • Inuktitut (UU)
1407 CANADIAN SYLLABICS Y-CREE OO
1408 CANADIAN SYLLABICS CARRIER EE
1409 CANADIAN SYLLABICS CARRIER I
140A CANADIAN SYLLABICS A
140B CANADIAN SYLLABICS AA
140C CANADIAN SYLLABICS WE
140D CANADIAN SYLLABICS WEST-CREE WE
140E CANADIAN SYLLABICS WI
140F CANADIAN SYLLABICS WEST-CREE WI
1410 CANADIAN SYLLABICS WII
1411 CANADIAN SYLLABICS WEST-CREE WII
1412 CANADIAN SYLLABICS WO
1413 CANADIAN SYLLABICS WEST-CREE WO
1414 CANADIAN SYLLABICS WOO
1415 CANADIAN SYLLABICS WEST-CREE WOO
1416 CANADIAN SYLLABICS NASKAPI WOO
1417 CANADIAN SYLLABICS WA
1418 CANADIAN SYLLABICS WEST-CREE WA
1419 CANADIAN SYLLABICS WAA
141A CANADIAN SYLLABICS WEST-CREE WAA
141B CANADIAN SYLLABICS NASKAPI WAA
141C CANADIAN SYLLABICS AI
     • East Cree
141D CANADIAN SYLLABICS Y-CREE W
141E CANADIAN SYLLABICS GLOTTAL STOP
     • Moose Cree (Y), Algonquian (GLOTTAL STOP)
141F CANADIAN SYLLABICS FINAL ACUTE
     • West Cree (T), East Cree (Y), Inuktitut (GLOTTAL STOP)
     • Athapascan (B/P), Sayisi (I), Carrier (G)
1420 CANADIAN SYLLABICS FINAL GRAVE
     • West Cree (K), Athapascan (K), Carrier (KH)
1421 CANADIAN SYLLABICS FINAL BOTTOM HALF RING
     • N Cree (SH), Sayisi (R), Carrier (NG)
1422 CANADIAN SYLLABICS FINAL TOP HALF RING
     • Algonquian (S), Chipewyan (R), Sayisi (S)
1423 CANADIAN SYLLABICS FINAL RIGHT HALF RING
     • West Cree (N), Athapascan (D/T), Sayisi (N), Carrier (N)
1424 CANADIAN SYLLABICS FINAL RING
     • West Cree (W), Sayisi (O)
1425 CANADIAN SYLLABICS FINAL DOUBLE ACUTE
     • Chipewyan (TT), South Slavey (GH)
1426 CANADIAN SYLLABICS FINAL DOUBLE SHORT VERTICAL STROKES
     • Algonquian (H), Carrier (R)
1427 CANADIAN SYLLABICS FINAL MIDDLE DOT
     • Moose Cree (W), Athapascan (Y), Sayisi (YU)
1428 CANADIAN SYLLABICS FINAL SHORT HORIZONTAL STROKE
     • West Cree (C), Sayisi (D)
1429 CANADIAN SYLLABICS FINAL PLUS
     • Athapascan (N), Sayisi (AI)
142A CANADIAN SYLLABICS FINAL DOWN TACK
     • N Cree (L), Carrier (D)
     → 22A4 ⊤ down tack
142B CANADIAN SYLLABICS EN
142C CANADIAN SYLLABICS IN
142D CANADIAN SYLLABICS ON
142E CANADIAN SYLLABICS AN
142F CANADIAN SYLLABICS PE
     • Inuktitut (PAI), Athapascan (BE), Carrier (HU)
1430 CANADIAN SYLLABICS PAAI
     • Inuktitut
1431 CANADIAN SYLLABICS PI
1432 CANADIAN SYLLABICS PII
1433 CANADIAN SYLLABICS PO
     • Inuktitut (PU), Athapascan (BO), Carrier (HE)
1434 CANADIAN SYLLABICS POO
     • Inuktitut (PUU)
1435 CANADIAN SYLLABICS Y-CREE POO
1436 CANADIAN SYLLABICS CARRIER HEE
1437 CANADIAN SYLLABICS CARRIER HI
1438 CANADIAN SYLLABICS PA
     • Athapascan (BA), Carrier (HA)
1439 CANADIAN SYLLABICS PAA
143A CANADIAN SYLLABICS PWE
143B CANADIAN SYLLABICS WEST-CREE PWE
143C CANADIAN SYLLABICS PWI
143D CANADIAN SYLLABICS WEST-CREE PWI
143E CANADIAN SYLLABICS PWII
143F CANADIAN SYLLABICS WEST-CREE PWII
1440 CANADIAN SYLLABICS PWO
1441 CANADIAN SYLLABICS WEST-CREE PWO
1442 CANADIAN SYLLABICS PWOO
1443 CANADIAN SYLLABICS WEST-CREE PWOO
1444 CANADIAN SYLLABICS PWA
1445 CANADIAN SYLLABICS WEST-CREE PWA
1446 CANADIAN SYLLABICS PWAA
1447 CANADIAN SYLLABICS WEST-CREE PWAA
1448 CANADIAN SYLLABICS Y-CREE PWAA
1449 CANADIAN SYLLABICS P
144A CANADIAN SYLLABICS WEST-CREE P
     • Sayisi (G)
144B CANADIAN SYLLABICS CARRIER H
144C CANADIAN SYLLABICS TE
     • Inuktitut (TAI), Athapascan (DI), Carrier (DU)
144D CANADIAN SYLLABICS TAAI
     • Inuktitut
144E CANADIAN SYLLABICS TI
     • Athapascan (DE), Carrier (DO)
144F CANADIAN SYLLABICS TII
1450 CANADIAN SYLLABICS TO
     • Inuktitut (TU), Athapascan (DO), Carrier (DE), Sayisi (DU)
1451 CANADIAN SYLLABICS TOO
     • Inuktitut (TUU)
1452 CANADIAN SYLLABICS Y-CREE TOO
1453 CANADIAN SYLLABICS CARRIER DEE
1454 CANADIAN SYLLABICS CARRIER DI
1455 CANADIAN SYLLABICS TA
     • Athapascan (DA)
1456 CANADIAN SYLLABICS TAA
1457 CANADIAN SYLLABICS TWE
1458 CANADIAN SYLLABICS WEST-CREE TWE
1459 CANADIAN SYLLABICS TWI
145A CANADIAN SYLLABICS WEST-CREE TWI
145B CANADIAN SYLLABICS TWII

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
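The code point and name pairs in the Canadian Syllabics names list above can be cross-checked against a machine-readable copy of the Unicode Character Database. The following is a minimal sketch in Python, assuming the standard-library `unicodedata` module is available; that module reflects whichever Unicode version is bundled with the interpreter (later than 5.0), but character names are not changed once assigned, so the names should agree. The helper `names_in_range` is a hypothetical name introduced only for this illustration.

```python
import unicodedata


def names_in_range(first, last):
    """Return (hex code point, name) pairs for named characters in [first, last]."""
    rows = []
    for cp in range(first, last + 1):
        # unicodedata.name() returns the default (None) for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            rows.append((f"{cp:04X}", name))
    return rows


# The portion of the Unified Canadian Aboriginal Syllabics list shown above.
rows = names_in_range(0x1401, 0x145B)
for code, name in rows:
    print(code, name)
```

The output is one line per character, in the same "code point, name" layout as the names list; the annotations (the bulleted language notes) are not part of `unicodedata` and would have to be taken from the NamesList file instead.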

Ogham Range: 1680–169F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard, can be found on the Unicode web site. See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Runic Range: 16A0–16FF

Tagalog Range: 1700–171F

Hanunoo Range: 1720–173F

Buhid Range: 1740–175F

Tagbanwa Range: 1760–177F

Khmer Range: 1780–17FF

[Code chart: Khmer, 1780–17FF. Columns 178–17F, rows 0–F.]


Mongolian Range: 1800–18AF

[Code chart: Mongolian, 1800–18AF. Columns 180–18A, rows 0–F.]


Limbu Range: 1900–194F

Tai Le Range: 1950–197F This file contains an excerpt from the character code tables and list of character names for

The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list.


New Tai Lue
Range: 1980–19DF

Khmer Symbols
Range: 19E0–19FF

Buginese
Range: 1A00–1A1F

Balinese
Range: 1B00–1B7F

[Code chart: Balinese, columns 1B0–1B7, rows 0–F]

Phonetic Extensions
Range: 1D00–1D7F

[Code chart: Phonetic Extensions, columns 1D0–1D7, rows 0–F]

Phonetic Extensions Supplement
Range: 1D80–1DBF

[Code chart: Phonetic Extensions Supplement, columns 1D8–1DB, rows 0–F]

Combining Diacritical Marks Supplement
Range: 1DC0–1DFF

Latin Extended Additional
Range: 1E00–1EFF

[Code chart: Latin Extended Additional, columns 1E0–1EF, rows 0–F]

1E00

Latin Extended Additional

In this block the names "WITH LINE BELOW" refer to a macron below the letter.

Latin general use extensions 1E00

1E08

Ḁ LATIN CAPITAL LETTER A WITH RING BELOW ≡ 0041 A 0325
1E01 ḁ LATIN SMALL LETTER A WITH RING BELOW ≡ 0061 a 0325
1E02 Ḃ LATIN CAPITAL LETTER B WITH DOT ABOVE ≡ 0042 B 0307
1E03 ḃ LATIN SMALL LETTER B WITH DOT ABOVE • Irish Gaelic (old orthography) ≡ 0062 b 0307
1E04 Ḅ LATIN CAPITAL LETTER B WITH DOT BELOW ≡ 0042 B 0323
1E05 ḅ LATIN SMALL LETTER B WITH DOT BELOW ≡ 0062 b 0323
1E06 Ḇ LATIN CAPITAL LETTER B WITH LINE BELOW ≡ 0042 B 0331
1E07 ḇ LATIN SMALL LETTER B WITH LINE BELOW ≡ 0062 b 0331
1E08 Ḉ LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE ≡ 00C7 Ç 0301
1E09 ḉ LATIN SMALL LETTER C WITH CEDILLA AND ACUTE ≡ 00E7 ç 0301
1E0A Ḋ LATIN CAPITAL LETTER D WITH DOT ABOVE ≡ 0044 D 0307
1E0B ḋ LATIN SMALL LETTER D WITH DOT ABOVE • Irish Gaelic (old orthography) ≡ 0064 d 0307
1E0C Ḍ LATIN CAPITAL LETTER D WITH DOT BELOW ≡ 0044 D 0323
1E0D ḍ LATIN SMALL LETTER D WITH DOT BELOW • Indic transliteration ≡ 0064 d 0323
1E0E Ḏ LATIN CAPITAL LETTER D WITH LINE BELOW ≡ 0044 D 0331
1E0F ḏ LATIN SMALL LETTER D WITH LINE BELOW ≡ 0064 d 0331
1E10 Ḑ LATIN CAPITAL LETTER D WITH CEDILLA ≡ 0044 D 0327
1E11 ḑ LATIN SMALL LETTER D WITH CEDILLA • Livonian ≡ 0064 d 0327
1E12 Ḓ LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW ≡ 0044 D 032D
1E13 ḓ LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW ≡ 0064 d 032D
1E14 Ḕ LATIN CAPITAL LETTER E WITH MACRON AND GRAVE ≡ 0112 Ē 0300
1E15 ḕ LATIN SMALL LETTER E WITH MACRON AND GRAVE ≡ 0113 ē 0300
1E16 Ḗ LATIN CAPITAL LETTER E WITH MACRON AND ACUTE ≡ 0112 Ē 0301
1E17 ḗ LATIN SMALL LETTER E WITH MACRON AND ACUTE ≡ 0113 ē 0301
1E18 Ḙ LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW ≡ 0045 E 032D
1E19 ḙ LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW ≡ 0065 e 032D
1E1A Ḛ LATIN CAPITAL LETTER E WITH TILDE BELOW ≡ 0045 E 0330
1E1B ḛ LATIN SMALL LETTER E WITH TILDE BELOW ≡ 0065 e 0330
1E1C Ḝ LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE ≡ 0228 Ȩ 0306
1E1D ḝ LATIN SMALL LETTER E WITH CEDILLA AND BREVE ≡ 0229 ȩ 0306
1E1E Ḟ LATIN CAPITAL LETTER F WITH DOT ABOVE ≡ 0046 F 0307
1E1F ḟ LATIN SMALL LETTER F WITH DOT ABOVE • Irish Gaelic (old orthography) ≡ 0066 f 0307
1E20 Ḡ LATIN CAPITAL LETTER G WITH MACRON ≡ 0047 G 0304
1E21 ḡ LATIN SMALL LETTER G WITH MACRON ≡ 0067 g 0304
1E22 Ḣ LATIN CAPITAL LETTER H WITH DOT ABOVE ≡ 0048 H 0307
1E23 ḣ LATIN SMALL LETTER H WITH DOT ABOVE ≡ 0068 h 0307
1E24 Ḥ LATIN CAPITAL LETTER H WITH DOT BELOW ≡ 0048 H 0323
1E25 ḥ LATIN SMALL LETTER H WITH DOT BELOW • Indic transliteration ≡ 0068 h 0323
1E26 Ḧ LATIN CAPITAL LETTER H WITH DIAERESIS ≡ 0048 H 0308
1E27 ḧ LATIN SMALL LETTER H WITH DIAERESIS ≡ 0068 h 0308
1E28 Ḩ LATIN CAPITAL LETTER H WITH CEDILLA ≡ 0048 H 0327
1E29 ḩ LATIN SMALL LETTER H WITH CEDILLA ≡ 0068 h 0327
1E2A Ḫ LATIN CAPITAL LETTER H WITH BREVE BELOW ≡ 0048 H 032E
1E2B ḫ LATIN SMALL LETTER H WITH BREVE BELOW • Semitic transliteration ≡ 0068 h 032E
1E2C Ḭ LATIN CAPITAL LETTER I WITH TILDE BELOW ≡ 0049 I 0330
1E2D ḭ LATIN SMALL LETTER I WITH TILDE BELOW ≡ 0069 i 0330
1E2E Ḯ LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE ≡ 00CF Ï 0301
1E2F ḯ LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE ≡ 00EF ï 0301
1E30 Ḱ LATIN CAPITAL LETTER K WITH ACUTE ≡ 004B K 0301
1E31 ḱ LATIN SMALL LETTER K WITH ACUTE • Macedonian transliteration ≡ 006B k 0301
1E32 Ḳ LATIN CAPITAL LETTER K WITH DOT BELOW ≡ 004B K 0323
1E33 ḳ LATIN SMALL LETTER K WITH DOT BELOW ≡ 006B k 0323
1E34 Ḵ LATIN CAPITAL LETTER K WITH LINE BELOW ≡ 004B K 0331
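The ≡ entries in this list are canonical decompositions. The following is a minimal sketch for checking a few of them, assuming Python's standard `unicodedata` module. That module ships a later version of the Unicode Character Database than 5.0, but the mappings used here are stable. The helper name `canonical_mapping` is hypothetical and not part of the standard.

```python
import unicodedata

def canonical_mapping(ch):
    """Return the one-step canonical decomposition of ch as hex code points.

    An empty list means ch has no decomposition, or only a compatibility
    decomposition (those carry a <tag> prefix in the character database).
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0].startswith("<"):
        return []
    return fields

# U+1E08 maps to C WITH CEDILLA followed by COMBINING ACUTE ACCENT.
print(canonical_mapping("\u1e08"))

# NFD applies the mappings recursively, so 00C7 is expanded further.
print([f"{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u1e08")])
```

The first call reports the single step listed in the chart (00C7 0301), while NFD yields the fully decomposed sequence of base letter plus combining marks.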

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Greek Extended Range: 1F00–1FFF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard, can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html. Copyright © 1991-2006 Unicode, Inc. All rights reserved.

[Code chart grid: Greek Extended, 1F00–1FFF. The grid of reference glyphs is not reproduced in this text version.]

Greek Extended

Precomposed polytonic Greek

1F00 ἀ GREEK SMALL LETTER ALPHA WITH PSILI ≡ 03B1 α 0313
1F01 ἁ GREEK SMALL LETTER ALPHA WITH DASIA ≡ 03B1 α 0314
1F02 ἂ GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA ≡ 1F00 ἀ 0300
1F03 ἃ GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA ≡ 1F01 ἁ 0300
1F04 ἄ GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA ≡ 1F00 ἀ 0301
1F05 ἅ GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA ≡ 1F01 ἁ 0301
1F06 ἆ GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI ≡ 1F00 ἀ 0342
1F07 ἇ GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI ≡ 1F01 ἁ 0342
1F08 Ἀ GREEK CAPITAL LETTER ALPHA WITH PSILI ≡ 0391 Α 0313
1F09 Ἁ GREEK CAPITAL LETTER ALPHA WITH DASIA ≡ 0391 Α 0314
1F0A Ἂ GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA ≡ 1F08 Ἀ 0300
1F0B Ἃ GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA ≡ 1F09 Ἁ 0300
1F0C Ἄ GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA ≡ 1F08 Ἀ 0301
1F0D Ἅ GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA ≡ 1F09 Ἁ 0301
1F0E Ἆ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI ≡ 1F08 Ἀ 0342
1F0F Ἇ GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI ≡ 1F09 Ἁ 0342
1F10 ἐ GREEK SMALL LETTER EPSILON WITH PSILI ≡ 03B5 ε 0313
1F11 ἑ GREEK SMALL LETTER EPSILON WITH DASIA ≡ 03B5 ε 0314
1F12 ἒ GREEK SMALL LETTER EPSILON WITH PSILI AND VARIA ≡ 1F10 ἐ 0300
1F13 ἓ GREEK SMALL LETTER EPSILON WITH DASIA AND VARIA ≡ 1F11 ἑ 0300
1F14 ἔ GREEK SMALL LETTER EPSILON WITH PSILI AND OXIA ≡ 1F10 ἐ 0301
1F15 ἕ GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA ≡ 1F11 ἑ 0301
1F16 <reserved>
1F17 <reserved>
1F18 Ἐ GREEK CAPITAL LETTER EPSILON WITH PSILI ≡ 0395 Ε 0313
1F19 Ἑ GREEK CAPITAL LETTER EPSILON WITH DASIA ≡ 0395 Ε 0314
1F1A Ἒ GREEK CAPITAL LETTER EPSILON WITH PSILI AND VARIA ≡ 1F18 Ἐ 0300
1F1B Ἓ GREEK CAPITAL LETTER EPSILON WITH DASIA AND VARIA ≡ 1F19 Ἑ 0300
1F1C Ἔ GREEK CAPITAL LETTER EPSILON WITH PSILI AND OXIA ≡ 1F18 Ἐ 0301
1F1D Ἕ GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA ≡ 1F19 Ἑ 0301
1F1E <reserved>
1F1F <reserved>
1F20 ἠ GREEK SMALL LETTER ETA WITH PSILI ≡ 03B7 η 0313
1F21 ἡ GREEK SMALL LETTER ETA WITH DASIA ≡ 03B7 η 0314
1F22 ἢ GREEK SMALL LETTER ETA WITH PSILI AND VARIA ≡ 1F20 ἠ 0300
1F23 ἣ GREEK SMALL LETTER ETA WITH DASIA AND VARIA ≡ 1F21 ἡ 0300
1F24 ἤ GREEK SMALL LETTER ETA WITH PSILI AND OXIA ≡ 1F20 ἠ 0301
1F25 ἥ GREEK SMALL LETTER ETA WITH DASIA AND OXIA ≡ 1F21 ἡ 0301
1F26 ἦ GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI ≡ 1F20 ἠ 0342
1F27 ἧ GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI ≡ 1F21 ἡ 0342
1F28 Ἠ GREEK CAPITAL LETTER ETA WITH PSILI ≡ 0397 Η 0313
1F29 Ἡ GREEK CAPITAL LETTER ETA WITH DASIA ≡ 0397 Η 0314
1F2A Ἢ GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA ≡ 1F28 Ἠ 0300
1F2B Ἣ GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA ≡ 1F29 Ἡ 0300
1F2C Ἤ GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA ≡ 1F28 Ἠ 0301
1F2D Ἥ GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA ≡ 1F29 Ἡ 0301
1F2E Ἦ GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI ≡ 1F28 Ἠ 0342
1F2F Ἧ GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI ≡ 1F29 Ἡ 0342
1F30 ἰ GREEK SMALL LETTER IOTA WITH PSILI ≡ 03B9 ι 0313
1F31 ἱ GREEK SMALL LETTER IOTA WITH DASIA ≡ 03B9 ι 0314
1F32 ἲ GREEK SMALL LETTER IOTA WITH PSILI AND VARIA ≡ 1F30 ἰ 0300
1F33 ἳ GREEK SMALL LETTER IOTA WITH DASIA AND VARIA ≡ 1F31 ἱ 0300
1F34 ἴ GREEK SMALL LETTER IOTA WITH PSILI AND OXIA ≡ 1F30 ἰ 0301
1F35 ἵ GREEK SMALL LETTER IOTA WITH DASIA AND OXIA ≡ 1F31 ἱ 0301
1F36 ἶ GREEK SMALL LETTER IOTA WITH PSILI AND PERISPOMENI ≡ 1F30 ἰ 0342
1F37 ἷ GREEK SMALL LETTER IOTA WITH DASIA AND PERISPOMENI ≡ 1F31 ἱ 0342
1F38 Ἰ GREEK CAPITAL LETTER IOTA WITH PSILI ≡ 0399 Ι 0313
1F39 Ἱ GREEK CAPITAL LETTER IOTA WITH DASIA ≡ 0399 Ι 0314
1F3A Ἲ GREEK CAPITAL LETTER IOTA WITH PSILI AND VARIA ≡ 1F38 Ἰ 0300
1F3B Ἳ GREEK CAPITAL LETTER IOTA WITH DASIA AND VARIA ≡ 1F39 Ἱ 0300
1F3C Ἴ GREEK CAPITAL LETTER IOTA WITH PSILI AND OXIA ≡ 1F38 Ἰ 0301
1F3D Ἵ GREEK CAPITAL LETTER IOTA WITH DASIA AND OXIA ≡ 1F39 Ἱ 0301
1F3E Ἶ GREEK CAPITAL LETTER IOTA WITH PSILI AND PERISPOMENI ≡ 1F38 Ἰ 0342
1F3F Ἷ GREEK CAPITAL LETTER IOTA WITH DASIA AND PERISPOMENI ≡ 1F39 Ἱ 0342
1F40 ὀ GREEK SMALL LETTER OMICRON WITH PSILI ≡ 03BF ο 0313
1F41 ὁ GREEK SMALL LETTER OMICRON WITH DASIA ≡ 03BF ο 0314
1F42 ὂ GREEK SMALL LETTER OMICRON WITH PSILI AND VARIA ≡ 1F40 ὀ 0300
1F43 ὃ GREEK SMALL LETTER OMICRON WITH DASIA AND VARIA ≡ 1F41 ὁ 0300
1F44 ὄ GREEK SMALL LETTER OMICRON WITH PSILI AND OXIA ≡ 1F40 ὀ 0301
1F45 ὅ GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA ≡ 1F41 ὁ 0301
1F46 <reserved>
1F47 <reserved>
1F48 Ὀ GREEK CAPITAL LETTER OMICRON WITH PSILI ≡ 039F Ο 0313
1F49 Ὁ GREEK CAPITAL LETTER OMICRON WITH DASIA ≡ 039F Ο 0314
1F4A Ὂ GREEK CAPITAL LETTER OMICRON WITH PSILI AND VARIA ≡ 1F48 Ὀ 0300
1F4B Ὃ GREEK CAPITAL LETTER OMICRON WITH DASIA AND VARIA ≡ 1F49 Ὁ 0300
1F4C Ὄ GREEK CAPITAL LETTER OMICRON WITH PSILI AND OXIA ≡ 1F48 Ὀ 0301
1F4D Ὅ GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA ≡ 1F49 Ὁ 0301
1F4E <reserved>
1F4F <reserved>
1F50 ὐ GREEK SMALL LETTER UPSILON WITH PSILI ≡ 03C5 υ 0313
1F51 ὑ GREEK SMALL LETTER UPSILON WITH DASIA ≡ 03C5 υ 0314
1F52 ὒ GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA ≡ 1F50 ὐ 0300
1F53 ὓ GREEK SMALL LETTER UPSILON WITH DASIA AND VARIA ≡ 1F51 ὑ 0300
1F54 ὔ GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA ≡ 1F50 ὐ 0301
1F55 ὕ GREEK SMALL LETTER UPSILON WITH DASIA AND OXIA ≡ 1F51 ὑ 0301
1F56 ὖ GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI ≡ 1F50 ὐ 0342
1F57 ὗ GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI ≡ 1F51 ὑ 0342
1F58 <reserved>
1F59 Ὑ GREEK CAPITAL LETTER UPSILON WITH DASIA ≡ 03A5 Υ 0314
1F5A <reserved>
1F5B Ὓ GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA ≡ 1F59 Ὑ 0300
1F5C <reserved>
1F5D Ὕ GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA ≡ 1F59 Ὑ 0301
1F5E <reserved>
1F5F Ὗ GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI ≡ 1F59 Ὑ 0342
1F60 ὠ GREEK SMALL LETTER OMEGA WITH PSILI ≡ 03C9 ω 0313
1F61 ὡ GREEK SMALL LETTER OMEGA WITH DASIA ≡ 03C9 ω 0314
1F62 ὢ GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA ≡ 1F60 ὠ 0300
1F63 ὣ GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA ≡ 1F61 ὡ 0300
1F64 ὤ GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA ≡ 1F60 ὠ 0301
1F65 ὥ GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA ≡ 1F61 ὡ 0301
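Many Greek Extended decompositions point at another precomposed character in the same block, so full decomposition takes more than one step. The sketch below illustrates this for U+1F02, assuming Python's standard `unicodedata` module (which carries a later Unicode version than 5.0). The helper name `code_points` is hypothetical.

```python
import unicodedata

def code_points(s):
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(c):04X}" for c in s)

alpha_psili_varia = "\u1f02"

# One step, as listed in the chart: 1F00 followed by 0300.
print(unicodedata.decomposition(alpha_psili_varia))

# NFD keeps decomposing until only the base letter and combining marks remain.
full = unicodedata.normalize("NFD", alpha_psili_varia)
print(code_points(full))

# NFC recomposes the sequence into the original precomposed character.
print(unicodedata.normalize("NFC", full) == alpha_psili_varia)

# Reserved positions such as 1F16 are unassigned (general category Cn).
print(unicodedata.category("\u1f16"))
```

Comparing text after normalizing both sides to the same form (NFC or NFD) avoids mismatches between precomposed and decomposed spellings of the same polytonic letter.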


General Punctuation Range: 2000–206F

[Code chart grid: General Punctuation, 2000–206F. The grid of reference glyphs is not reproduced in this text version.]

General Punctuation

For additional general punctuation characters see also Basic Latin, Latin-1, Supplemental Punctuation and CJK Symbols and Punctuation.

Spaces

2000 EN QUAD ≡ 2002 en space
2001 EM QUAD = mutton quad ≡ 2003 em space
2002 EN SPACE = nut • half an em ≈ 0020 space
2003 EM SPACE = mutton • nominally, a space equal to the type size in points • may scale by the condensation factor of a font ≈ 0020 space
2004 THREE-PER-EM SPACE = thick space ≈ 0020 space
2005 FOUR-PER-EM SPACE = mid space ≈ 0020 space
2006 SIX-PER-EM SPACE • in computer typography sometimes equated to thin space ≈ 0020 space
2007 FIGURE SPACE • space equal to tabular width of a font • this is equivalent to the digit width of fonts with fixed-width digits ≈ <noBreak> 0020
2008 PUNCTUATION SPACE • space equal to narrow punctuation of a font ≈ 0020 space
2009 THIN SPACE • a fifth of an em (or sometimes a sixth) ≈ 0020 space
200A HAIR SPACE • thinner than a thin space • in traditional typography, the thinnest space available ≈ 0020 space
200B ZERO WIDTH SPACE • commonly abbreviated ZWSP • this character is intended for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification

Format characters

200C ZERO WIDTH NON-JOINER • commonly abbreviated ZWNJ
200D ZERO WIDTH JOINER • commonly abbreviated ZWJ
200E LEFT-TO-RIGHT MARK • commonly abbreviated LRM
200F RIGHT-TO-LEFT MARK • commonly abbreviated RLM

Dashes

2010 ‐ HYPHEN → 002D - hyphen-minus → 00AD soft hyphen
2011 NON-BREAKING HYPHEN → 002D - hyphen-minus → 00AD soft hyphen ≈ <noBreak> 2010
2012 ‒ FIGURE DASH
2013 – EN DASH
2014 — EM DASH • may be used in pairs to offset parenthetical text → 30FC ー katakana-hiragana prolonged sound mark
2015 HORIZONTAL BAR = quotation dash • long dash introducing quoted text

General punctuation

2016 DOUBLE VERTICAL LINE • used in pairs to indicate norm of a matrix → 20E6 combining double vertical stroke overlay → 2225 parallel to
2017 ‗ DOUBLE LOW LINE • this is a spacing character → 005F _ low line → 0333 combining double low line ≈ 0020 0333
2018 ‘ LEFT SINGLE QUOTATION MARK = single turned comma quotation mark • this is the preferred character (as opposed to 201B) → 0027 ' apostrophe → 02BB modifier letter turned comma → 275B heavy single turned comma quotation mark ornament
2019 ’ RIGHT SINGLE QUOTATION MARK = single comma quotation mark • this is the preferred character to use for apostrophe → 0027 ' apostrophe → 02BC modifier letter apostrophe → 275C heavy single comma quotation mark ornament
201A ‚ SINGLE LOW-9 QUOTATION MARK = low single comma quotation mark • used as opening single quotation mark in some languages
201B SINGLE HIGH-REVERSED-9 QUOTATION MARK = single reversed comma quotation mark • has same semantic as 2018 ‘, but differs in appearance → 02BD modifier letter reversed comma
201C “ LEFT DOUBLE QUOTATION MARK = double turned comma quotation mark • this is the preferred character (as opposed to 201F) → 0022 " quotation mark → 275D heavy double turned comma quotation mark ornament → 301D reversed double prime quotation mark
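The space and dash entries mix canonical mappings (such as EN QUAD to EN SPACE) with compatibility mappings that carry a tag such as `<noBreak>`. The sketch below shows how these appear in practice, assuming Python's standard `unicodedata` module, whose data is later than Unicode 5.0. The helper name `describe` is hypothetical.

```python
import unicodedata

def describe(ch):
    """Return (code point, decomposition field, general category) for ch."""
    return (f"{ord(ch):04X}",
            unicodedata.decomposition(ch),
            unicodedata.category(ch))

# EN QUAD, EN SPACE, FIGURE SPACE, ZERO WIDTH SPACE, NON-BREAKING HYPHEN
for ch in ("\u2000", "\u2002", "\u2007", "\u200b", "\u2011"):
    print(describe(ch))

# NFKC applies canonical and compatibility mappings, so the fixed-width
# spaces become U+0020. ZERO WIDTH SPACE has no mapping and is kept.
folded = unicodedata.normalize("NFKC", "a\u2003b\u2009c\u200bd")
print([f"{ord(c):04X}" for c in folded])
```

NFKC also changes many characters outside this block, so it suits search or comparison keys better than text that will be displayed or stored verbatim.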


Superscripts and Subscripts Range: 2070–209F

Currency Symbols Range: 20A0–20CF

Combining Diacritical Marks for Symbols Range: 20D0–20FF

Letterlike Symbols Range: 2100–214F
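The block ranges quoted in the chart headers up to this point can be turned into a small lookup table. This is a minimal sketch, not an official API: the names `BLOCKS` and `block_of` are hypothetical, and the table covers only the ranges stated in this part of the charts rather than the full block list.

```python
from bisect import bisect_right

# (first, last, name) for the block ranges quoted in the chart headers above.
BLOCKS = [
    (0x1F00, 0x1FFF, "Greek Extended"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2070, 0x209F, "Superscripts and Subscripts"),
    (0x20A0, 0x20CF, "Currency Symbols"),
    (0x20D0, 0x20FF, "Combining Diacritical Marks for Symbols"),
    (0x2100, 0x214F, "Letterlike Symbols"),
]
_STARTS = [first for first, _, _ in BLOCKS]

def block_of(ch):
    """Return the block name for ch, or None if it is outside the table."""
    cp = ord(ch)
    # Find the last block whose first code point is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None

print(block_of("\u2122"))  # TRADE MARK SIGN
print(block_of("\u2014"))  # EM DASH
print(block_of("A"))       # not covered by this partial table
```

Because the ranges are sorted and do not overlap, a binary search over the starting code points is enough. A complete table would be loaded from the Blocks.txt file of the Unicode Character Database instead.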

[Code chart: Letterlike Symbols, 2100–214F]

Number Forms Range: 2150–218F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.


Arrows Range: 2190–21FF

[Code chart: Arrows, 2190–21FF]

Mathematical Operators Range: 2200–22FF

[Code chart: Mathematical Operators, 2200–22FF]

Mathematical Operators (2200–222A)

Miscellaneous mathematical symbols
2200 ∀ FOR ALL
    = universal quantifier
2201 ∁ COMPLEMENT
    → 0297 ʗ latin letter stretched c
2202 ∂ PARTIAL DIFFERENTIAL
2203 ∃ THERE EXISTS
    = existential quantifier
2204 ∄ THERE DOES NOT EXIST
    ≡ 2203 ∃ 0338
2205 ∅ EMPTY SET
    = null set
    • used in linguistics to indicate a null morpheme or phonological “zero”
    → 00D8 Ø latin capital letter o with stroke
    → 2300 ⌀ diameter sign
2206 ∆ INCREMENT
    = Laplace operator
    = forward difference
    = symmetric difference of sets
    → 0394 Δ greek capital letter delta
    → 25B3 △ white up-pointing triangle
2207 ∇ NABLA
    = backward difference
    = gradient, del
    • used for Laplacian operator (written with superscript 2)
    → 25BD ▽ white down-pointing triangle

Set membership
2208 ∈ ELEMENT OF
2209 ∉ NOT AN ELEMENT OF
    ≡ 2208 ∈ 0338
220A ∊ SMALL ELEMENT OF
    • originates in math pi fonts; not the straight epsilon
    → 03F5 ϵ greek lunate epsilon symbol
220B ∋ CONTAINS AS MEMBER
    = such that
220C ∌ DOES NOT CONTAIN AS MEMBER
    ≡ 220B ∋ 0338
220D ∍ SMALL CONTAINS AS MEMBER
    → 03F6 ϶ greek reversed lunate epsilon symbol

Miscellaneous mathematical symbol
220E ∎ END OF PROOF
    = q.e.d.
    → 2023 ‣ triangular bullet
    → 25AE ▮ black vertical rectangle

N-ary operators
220F ∏ N-ARY PRODUCT
    = product sign
    → 03A0 Π greek capital letter pi
2210 ∐ N-ARY COPRODUCT
    = coproduct sign
2211 ∑ N-ARY SUMMATION
    = summation sign
    → 03A3 Σ greek capital letter sigma
    → 2140 ⅀ double-struck n-ary summation

Operators
2212 − MINUS SIGN
    → 002D - hyphen-minus
2213 ∓ MINUS-OR-PLUS SIGN
    → 00B1 ± plus-minus sign
2214 ∔ DOT PLUS
2215 ∕ DIVISION SLASH
    • generic division operator
    → 002F / solidus
    → 2044 ⁄ fraction slash
2216 ∖ SET MINUS
    → 005C \ reverse solidus
2217 ∗ ASTERISK OPERATOR
    → 002A * asterisk
2218 ∘ RING OPERATOR
    = composite function
    = APL jot
    → 00B0 ° degree sign
    → 25E6 ◦ white bullet
2219 ∙ BULLET OPERATOR
    → 00B7 · middle dot
    → 2022 • bullet
    → 2024 ․ one dot leader
221A √ SQUARE ROOT
    = radical sign
    → 2713 ✓ check mark
221B ∛ CUBE ROOT
221C ∜ FOURTH ROOT
221D ∝ PROPORTIONAL TO
    → 03B1 α greek small letter alpha

Miscellaneous mathematical symbols
221E ∞ INFINITY
221F ∟ RIGHT ANGLE
2220 ∠ ANGLE
2221 ∡ MEASURED ANGLE
2222 ∢ SPHERICAL ANGLE
    = angle arc

Operators
2223 ∣ DIVIDES
    = such that
    = APL stile
    → 007C | vertical line
    → 01C0 ǀ latin letter dental click
2224 ∤ DOES NOT DIVIDE
    ≡ 2223 ∣ 0338
2225 ∥ PARALLEL TO
    → 01C1 ǁ latin letter lateral click
    → 2016 ‖ double vertical line
2226 ∦ NOT PARALLEL TO
    ≡ 2225 ∥ 0338

Logical and set operators
2227 ∧ LOGICAL AND
    = wedge, conjunction
    → 22C0 ⋀ n-ary logical and
    → 2303 ⌃ up arrowhead
2228 ∨ LOGICAL OR
    = vee, disjunction
    → 22C1 ⋁ n-ary logical or
    → 2304 ⌄ down arrowhead
2229 ∩ INTERSECTION
    = cap, hat
    → 22C2 ⋂ n-ary intersection
222A ∪ UNION
    = cup
    → 22C3 ⋃ n-ary union
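The ≡ entries in the names list above record canonical decompositions: each negated operator is equivalent to its base operator followed by U+0338 COMBINING LONG SOLIDUS OVERLAY. The following is a minimal sketch of how that can be checked, assuming Python's standard-library `unicodedata` module (its data tracks a later Unicode version than 5.0, but these mappings should be unchanged). The table `NEGATED` and the helper `decomposition_of` are hypothetical names introduced only for illustration.

```python
import unicodedata

# Negated operators from the names list, mapped to the base operator named
# on their "≡" line. U+0338 is COMBINING LONG SOLIDUS OVERLAY.
NEGATED = {
    "\u2204": "\u2203",  # THERE DOES NOT EXIST        ≡ 2203 0338
    "\u2209": "\u2208",  # NOT AN ELEMENT OF           ≡ 2208 0338
    "\u220C": "\u220B",  # DOES NOT CONTAIN AS MEMBER  ≡ 220B 0338
    "\u2224": "\u2223",  # DOES NOT DIVIDE             ≡ 2223 0338
    "\u2226": "\u2225",  # NOT PARALLEL TO             ≡ 2225 0338
}


def decomposition_of(ch: str) -> str:
    """Return the canonical decomposition (NFD) of ch as hex code points."""
    return " ".join(f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch))


for negated, base in NEGATED.items():
    # NFD splits the precomposed character into base + overlay ...
    assert unicodedata.normalize("NFD", negated) == base + "\u0338"
    # ... and NFC recombines the pair into the precomposed character.
    assert unicodedata.normalize("NFC", base + "\u0338") == negated
    print(f"U+{ord(negated):04X} {unicodedata.name(negated)}: {decomposition_of(negated)}")
```

Because the two spellings are canonically equivalent, a search or comparison routine that normalizes both sides first will treat them as the same text.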

Miscellaneous Technical Range: 2300–23FF

[Code chart: Miscellaneous Technical, 2300–23FF]

Miscellaneous Technical (2300–2330)

Miscellaneous technical
2300 ⌀ DIAMETER SIGN
    → 2205 ∅ empty set
2301 ⌁ ELECTRIC ARROW
    • from ISO 2047
    • symbol for End of Transmission
2302 ⌂ HOUSE
2303 ⌃ UP ARROWHEAD
    → 005E ^ circumflex accent
    → 02C4 ˄ modifier letter up arrowhead
    → 2038 ‸ caret
    → 2227 ∧ logical and
2304 ⌄ DOWN ARROWHEAD
    → 02C5 ˅ modifier letter down arrowhead
    → 2228 ∨ logical or
    → 2335 ⌵ countersink
2305 ⌅ PROJECTIVE
    → 22BC ⊼ nand
2306 ⌆ PERSPECTIVE
2307 ⌇ WAVY LINE
    → 3030 〰 wavy dash

Corner brackets
The ceiling and floor characters are recommended for general-purpose corner brackets, rather than the CJK corner brackets, which are wide quotation marks.
2308 ⌈ LEFT CEILING
    = APL upstile
    → 300C 「 left corner bracket
2309 ⌉ RIGHT CEILING
    → 20E7 combining annuity symbol
230A ⌊ LEFT FLOOR
    = APL downstile
230B ⌋ RIGHT FLOOR
    → 300D 」 right corner bracket

Crops
230C ⌌ BOTTOM RIGHT CROP
    • set of four “crop” corners, arranged facing outward
230D ⌍ BOTTOM LEFT CROP
230E ⌎ TOP RIGHT CROP
230F ⌏ TOP LEFT CROP

Miscellaneous technical
2310 ⌐ REVERSED NOT SIGN
    = beginning of line
    → 00AC ¬ not sign
2311 ⌑ SQUARE LOZENGE
    = Kissen (pillow)
    • used as a command delimiter in some very old computers
2312 ⌒ ARC
    → 25E0 ◠ upper half circle
2313 ⌓ SEGMENT
2314 ⌔ SECTOR
2315 ⌕ TELEPHONE RECORDER
2316 ⌖ POSITION INDICATOR
2317 ⌗ VIEWDATA SQUARE
    → 22D5 ⋕ equal and parallel to
2318 ⌘ PLACE OF INTEREST SIGN
    = command key (1.0)
2319 ⌙ TURNED NOT SIGN
    = line marker

GUI icons
231A ⌚ WATCH
231B ⌛ HOURGLASS

Quine corners
231C ⌜ TOP LEFT CORNER
    • set of four “quine” corners, for quincuncial arrangement
    • these are also used in mathematics in upper and lower pairs
    → 2E00 ⸀ right angle substitution marker
231D ⌝ TOP RIGHT CORNER
231E ⌞ BOTTOM LEFT CORNER
231F ⌟ BOTTOM RIGHT CORNER

Integral pieces
2320 ⌠ TOP HALF INTEGRAL
    → 23AE ⎮ integral extension
2321 ⌡ BOTTOM HALF INTEGRAL

Frown and smile
2322 ⌢ FROWN
    → 2040 ⁀ character tie
2323 ⌣ SMILE
    → 203F ‿ undertie

Keyboard symbols
2324 ⌤ UP ARROWHEAD BETWEEN TWO HORIZONTAL BARS
    = enter key
2325 ⌥ OPTION KEY
2326 ⌦ ERASE TO THE RIGHT
    = delete to the right key
2327 ⌧ X IN A RECTANGLE BOX
    = clear key
2328 ⌨ KEYBOARD

Angle brackets
These are discouraged for mathematical use because of their canonical equivalence to CJK punctuation.
2329 〈 LEFT-POINTING ANGLE BRACKET
    → 003C < less-than sign
    → 2039 ‹ single left-pointing angle quotation mark
    → 27E8 ⟨ mathematical left angle bracket
    ≡ 3008 〈 left angle bracket
232A 〉 RIGHT-POINTING ANGLE BRACKET
    → 003E > greater-than sign
    → 203A › single right-pointing angle quotation mark
    → 27E9 ⟩ mathematical right angle bracket
    ≡ 3009 〉 right angle bracket

Keyboard symbol
232B ⌫ ERASE TO THE LEFT
    = delete to the left key

Chemistry symbol
232C ⌬ BENZENE RING

Drafting symbols
232D ⌭ CYLINDRICITY
232E ⌮ ALL AROUND-PROFILE
232F ⌯ SYMMETRY
2330 ⌰ TOTAL RUNOUT
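The notes on angle brackets and corner brackets in the list above have a practical consequence that a short check can illustrate. Because U+2329 and U+232A are canonically equivalent to U+3008 and U+3009, normalization replaces them with the CJK characters, while the mathematical angle brackets and the ceiling and floor characters pass through unchanged. The sketch below assumes Python's standard-library `unicodedata` module, whose data reflects a later Unicode version than 5.0. The helper name `survives_normalization` is hypothetical.

```python
import unicodedata


def survives_normalization(ch: str) -> bool:
    """Return True if ch is unchanged by all four normalization forms."""
    return all(
        unicodedata.normalize(form, ch) == ch
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


# U+2329 / U+232A are replaced by the CJK angle brackets U+3008 / U+3009.
assert unicodedata.normalize("NFC", "\u2329") == "\u3008"
assert unicodedata.normalize("NFC", "\u232A") == "\u3009"

# The mathematical angle brackets and the ceiling/floor characters are stable.
for ch in "\u27E8\u27E9\u2308\u2309\u230A\u230B":
    assert survives_normalization(ch)

# The CJK corner brackets are "wide" in East Asian typography; printing the
# East_Asian_Width property next to LEFT CEILING shows the contrast.
for ch in ("\u2308", "\u300C"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"East_Asian_Width={unicodedata.east_asian_width(ch)}")
```

A pipeline that normalizes its input therefore cannot rely on U+2329 or U+232A surviving, which is one reason to prefer U+27E8 and U+27E9 in mathematical text.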

Control Pictures Range: 2400–243F

Optical Character Recognition Range: 2440–245F

Enclosed Alphanumerics Range: 2460–24FF

[Code chart: Enclosed Alphanumerics, 2460–24FF]
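As a small illustration of how this block is laid out, the circled numbers 1 through 20 occupy the contiguous run U+2460–U+2473, so a number can be mapped to its enclosed form by offset arithmetic. The sketch below is only an illustration in Python; the function name `circled_number` is hypothetical and not part of the standard. CIRCLED DIGIT ZERO sits apart at U+24EA and is treated as a special case.

```python
def circled_number(n: int) -> str:
    """Return the circled form of n, for 0 <= n <= 20 (illustrative helper)."""
    if n == 0:
        return "\u24EA"  # CIRCLED DIGIT ZERO lies outside the 1..20 run
    if 1 <= n <= 20:
        return chr(0x2460 + n - 1)  # U+2460 is CIRCLED DIGIT ONE
    raise ValueError("this block only provides circled numbers 0 through 20")


if __name__ == "__main__":
    # Print the circled forms of 1 through 10, separated by spaces.
    print(" ".join(circled_number(i) for i in range(1, 11)))
```

Whether the result displays correctly depends on the font in use, since the reference glyphs in the charts are not prescriptive.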

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Box Drawing Range: 2500–257F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.


[Code chart: Box Drawing, 2500–257F]
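To show how characters from this block are typically combined, the following Python sketch frames one line of text using the light horizontal and vertical lines (U+2500, U+2502) and the four light corners (U+250C, U+2510, U+2514, U+2518). The function name `draw_box` is hypothetical, and the sketch assumes a monospaced display in which every character of the text occupies one cell.

```python
# Light box-drawing characters, named by their role in the frame.
H, V = "\u2500", "\u2502"                                 # horizontal, vertical
TL, TR, BL, BR = "\u250C", "\u2510", "\u2514", "\u2518"   # the four corners


def draw_box(text: str) -> str:
    """Frame a single line of text with light box-drawing characters."""
    inner = f" {text} "                  # one space of padding on each side
    top = TL + H * len(inner) + TR
    middle = V + inner + V
    bottom = BL + H * len(inner) + BR
    return "\n".join([top, middle, bottom])


if __name__ == "__main__":
    print(draw_box("Box Drawing"))
```

The width calculation uses `len`, which counts code points rather than display cells, so text containing wide or combining characters would need a different measure.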


Block Elements Range: 2580–259F

Geometric Shapes Range: 25A0–25FF

[Code chart: Geometric Shapes, 25A0–25FF]


Miscellaneous Symbols Range: 2600–26FF

[Code chart: Miscellaneous Symbols, 2600–26FF]


Dingbats Range: 2700–27BF

[Code chart: Dingbats, 2700–27BF]


Miscellaneous Mathematical Symbols-A Range: 27C0–27EF

Supplemental Arrows-A Range: 27F0–27FF
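The block ranges given in the chart headings can be used directly to classify a character by block. The Python sketch below hard-codes the ranges from Enclosed Alphanumerics through Supplemental Arrows-A for illustration; the names `BLOCKS` and `block_of` are hypothetical, and a real implementation would more likely read the full list from the Unicode Character Database.

```python
from typing import Optional

# (first, last, name) for the blocks from 2460 through 27FF.
BLOCKS = [
    (0x2460, 0x24FF, "Enclosed Alphanumerics"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x2580, 0x259F, "Block Elements"),
    (0x25A0, 0x25FF, "Geometric Shapes"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x2700, 0x27BF, "Dingbats"),
    (0x27C0, 0x27EF, "Miscellaneous Mathematical Symbols-A"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for a single character, or None if not listed."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for sample in ("\u2500", "\u2665", "\u2708", "A"):
        print(f"U+{ord(sample):04X}", block_of(sample))
```

A linear scan is sufficient for a table this small; for the full set of blocks, a binary search over the sorted start points would be a natural alternative.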

Braille Patterns Range: 2800–28FF

[Code chart: Braille Patterns, 2800–28FF]

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Supplemental Arrows-B
Range: 2900–297F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.


Supplemental Arrows-B code chart (2900–297F)

Row  290 291 292 293 294 295 296 297
0    ⤀   ⤐   ⤠   ⤰   ⥀   ⥐   ⥠   ⥰
1    ⤁   ⤑   ⤡   ⤱   ⥁   ⥑   ⥡   ⥱
2    ⤂   ⤒   ⤢   ⤲   ⥂   ⥒   ⥢   ⥲
3    ⤃   ⤓   ⤣   ⤳   ⥃   ⥓   ⥣   ⥳
4    ⤄   ⤔   ⤤   ⤴   ⥄   ⥔   ⥤   ⥴
5    ⤅   ⤕   ⤥   ⤵   ⥅   ⥕   ⥥   ⥵
6    ⤆   ⤖   ⤦   ⤶   ⥆   ⥖   ⥦   ⥶
7    ⤇   ⤗   ⤧   ⤷   ⥇   ⥗   ⥧   ⥷
8    ⤈   ⤘   ⤨   ⤸   ⥈   ⥘   ⥨   ⥸
9    ⤉   ⤙   ⤩   ⤹   ⥉   ⥙   ⥩   ⥹
A    ⤊   ⤚   ⤪   ⤺   ⥊   ⥚   ⥪   ⥺
B    ⤋   ⤛   ⤫   ⤻   ⥋   ⥛   ⥫   ⥻
C    ⤌   ⤜   ⤬   ⤼   ⥌   ⥜   ⥬   ⥼
D    ⤍   ⤝   ⤭   ⤽   ⥍   ⥝   ⥭   ⥽
E    ⤎   ⤞   ⤮   ⤾   ⥎   ⥞   ⥮   ⥾
F    ⤏   ⤟   ⤯   ⤿   ⥏   ⥟   ⥯   ⥿
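The code charts in this section address each cell by a column label (the leading hexadecimal digits, such as 290) and a row label (the final digit, 0 through F). As a minimal sketch of that arithmetic, the Python below rebuilds a code point from the two labels. The function names `chart_cell` and `chart_rows` are hypothetical and are not part of the standard.

```python
def chart_cell(column: str, row: str) -> str:
    """Return the character at a chart cell, given hex column and row labels.

    The column label supplies all but the last hex digit of the code point,
    and the row label supplies the last digit.
    """
    code_point = int(column, 16) * 16 + int(row, 16)
    return chr(code_point)


def chart_rows(first_column: int, last_column: int):
    """Yield the 16 rows of a chart, each as a list of characters.

    first_column and last_column are the numeric values of the column
    labels (for example 0x290 and 0x297), both inclusive.
    """
    for row in range(16):
        yield [chr(col * 16 + row) for col in range(first_column, last_column + 1)]


if __name__ == "__main__":
    # Column 290, row 0 is the first cell of the Supplemental Arrows-B chart.
    print(f"U+{ord(chart_cell('290', '0')):04X}")
    for row, cells in enumerate(chart_rows(0x290, 0x297)):
        print(f"{row:X}", " ".join(cells))
```

The same two helpers should work for any chart laid out this way, since only the column labels change from block to block.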


Miscellaneous Mathematical Symbols-B
Range: 2980–29FF

Miscellaneous Mathematical Symbols-B code chart (2980–29FF)

Row  298 299 29A 29B 29C 29D 29E 29F
0    ⦀   ⦐   ⦠   ⦰   ⧀   ⧐   ⧠   ⧰
1    ⦁   ⦑   ⦡   ⦱   ⧁   ⧑   ⧡   ⧱
2    ⦂   ⦒   ⦢   ⦲   ⧂   ⧒   ⧢   ⧲
3    ⦃   ⦓   ⦣   ⦳   ⧃   ⧓   ⧣   ⧳
4    ⦄   ⦔   ⦤   ⦴   ⧄   ⧔   ⧤   ⧴
5    ⦅   ⦕   ⦥   ⦵   ⧅   ⧕   ⧥   ⧵
6    ⦆   ⦖   ⦦   ⦶   ⧆   ⧖   ⧦   ⧶
7    ⦇   ⦗   ⦧   ⦷   ⧇   ⧗   ⧧   ⧷
8    ⦈   ⦘   ⦨   ⦸   ⧈   ⧘   ⧨   ⧸
9    ⦉   ⦙   ⦩   ⦹   ⧉   ⧙   ⧩   ⧹
A    ⦊   ⦚   ⦪   ⦺   ⧊   ⧚   ⧪   ⧺
B    ⦋   ⦛   ⦫   ⦻   ⧋   ⧛   ⧫   ⧻
C    ⦌   ⦜   ⦬   ⦼   ⧌   ⧜   ⧬   ⧼
D    ⦍   ⦝   ⦭   ⦽   ⧍   ⧝   ⧭   ⧽
E    ⦎   ⦞   ⦮   ⦾   ⧎   ⧞   ⧮   ⧾
F    ⦏   ⦟   ⦯   ⦿   ⧏   ⧟   ⧯   ⧿


Supplemental Mathematical Operators
Range: 2A00–2AFF

Supplemental Mathematical Operators code chart (2A00–2AFF)

Row  2A0 2A1 2A2 2A3 2A4 2A5 2A6 2A7 2A8 2A9 2AA 2AB 2AC 2AD 2AE 2AF
0    ⨀   ⨐   ⨠   ⨰   ⩀   ⩐   ⩠   ⩰   ⪀   ⪐   ⪠   ⪰   ⫀   ⫐   ⫠   ⫰
1    ⨁   ⨑   ⨡   ⨱   ⩁   ⩑   ⩡   ⩱   ⪁   ⪑   ⪡   ⪱   ⫁   ⫑   ⫡   ⫱
2    ⨂   ⨒   ⨢   ⨲   ⩂   ⩒   ⩢   ⩲   ⪂   ⪒   ⪢   ⪲   ⫂   ⫒   ⫢   ⫲
3    ⨃   ⨓   ⨣   ⨳   ⩃   ⩓   ⩣   ⩳   ⪃   ⪓   ⪣   ⪳   ⫃   ⫓   ⫣   ⫳
4    ⨄   ⨔   ⨤   ⨴   ⩄   ⩔   ⩤   ⩴   ⪄   ⪔   ⪤   ⪴   ⫄   ⫔   ⫤   ⫴
5    ⨅   ⨕   ⨥   ⨵   ⩅   ⩕   ⩥   ⩵   ⪅   ⪕   ⪥   ⪵   ⫅   ⫕   ⫥   ⫵
6    ⨆   ⨖   ⨦   ⨶   ⩆   ⩖   ⩦   ⩶   ⪆   ⪖   ⪦   ⪶   ⫆   ⫖   ⫦   ⫶
7    ⨇   ⨗   ⨧   ⨷   ⩇   ⩗   ⩧   ⩷   ⪇   ⪗   ⪧   ⪷   ⫇   ⫗   ⫧   ⫷
8    ⨈   ⨘   ⨨   ⨸   ⩈   ⩘   ⩨   ⩸   ⪈   ⪘   ⪨   ⪸   ⫈   ⫘   ⫨   ⫸
9    ⨉   ⨙   ⨩   ⨹   ⩉   ⩙   ⩩   ⩹   ⪉   ⪙   ⪩   ⪹   ⫉   ⫙   ⫩   ⫹
A    ⨊   ⨚   ⨪   ⨺   ⩊   ⩚   ⩪   ⩺   ⪊   ⪚   ⪪   ⪺   ⫊   ⫚   ⫪   ⫺
B    ⨋   ⨛   ⨫   ⨻   ⩋   ⩛   ⩫   ⩻   ⪋   ⪛   ⪫   ⪻   ⫋   ⫛   ⫫   ⫻
C    ⨌   ⨜   ⨬   ⨼   ⩌   ⩜   ⩬   ⩼   ⪌   ⪜   ⪬   ⪼   ⫌   ⫝̸   ⫬   ⫼
D    ⨍   ⨝   ⨭   ⨽   ⩍   ⩝   ⩭   ⩽   ⪍   ⪝   ⪭   ⪽   ⫍   ⫝   ⫭   ⫽
E    ⨎   ⨞   ⨮   ⨾   ⩎   ⩞   ⩮   ⩾   ⪎   ⪞   ⪮   ⪾   ⫎   ⫞   ⫮   ⫾
F    ⨏   ⨟   ⨯   ⨿   ⩏   ⩟   ⩯   ⩿   ⪏   ⪟   ⪯   ⪿   ⫏   ⫟   ⫯   ⫿

Supplemental Mathematical Operators: character names, 2A00–2A46

N-ary operators

2A00 ⨀ N-ARY CIRCLED DOT OPERATOR
     → 2299 ⊙ circled dot operator
     → 25C9 ◉ fisheye
2A01 ⨁ N-ARY CIRCLED PLUS OPERATOR
     → 2295 ⊕ circled plus
2A02 ⨂ N-ARY CIRCLED TIMES OPERATOR
     → 2297 ⊗ circled times
2A03 ⨃ N-ARY UNION OPERATOR WITH DOT
2A04 ⨄ N-ARY UNION OPERATOR WITH PLUS
     → 228E ⊎ multiset union
2A05 ⨅ N-ARY SQUARE INTERSECTION OPERATOR
     → 2293 ⊓ square cap
2A06 ⨆ N-ARY SQUARE UNION OPERATOR
     → 2294 ⊔ square cup
2A07 ⨇ TWO LOGICAL AND OPERATOR
     = merge
     → 2A55 ⩕ two intersecting logical and
2A08 ⨈ TWO LOGICAL OR OPERATOR
     → 2A56 ⩖ two intersecting logical or
2A09 ⨉ N-ARY TIMES OPERATOR
     → 00D7 × multiplication sign

Summations and integrals

2A0A ⨊ MODULO TWO SUM
     → 2211 ∑ n-ary summation
2A0B ⨋ SUMMATION WITH INTEGRAL
2A0C ⨌ QUADRUPLE INTEGRAL OPERATOR
     → 222D ∭ triple integral
     ≈ 222B ∫ 222B ∫ 222B ∫ 222B ∫
2A0D ⨍ FINITE PART INTEGRAL
2A0E ⨎ INTEGRAL WITH DOUBLE STROKE
2A0F ⨏ INTEGRAL AVERAGE WITH SLASH
2A10 ⨐ CIRCULATION FUNCTION
2A11 ⨑ ANTICLOCKWISE INTEGRATION
2A12 ⨒ LINE INTEGRATION WITH RECTANGULAR PATH AROUND POLE
2A13 ⨓ LINE INTEGRATION WITH SEMICIRCULAR PATH AROUND POLE
2A14 ⨔ LINE INTEGRATION NOT INCLUDING THE POLE
2A15 ⨕ INTEGRAL AROUND A POINT OPERATOR
     → 222E ∮ contour integral
2A16 ⨖ QUATERNION INTEGRAL OPERATOR
2A17 ⨗ INTEGRAL WITH LEFTWARDS ARROW WITH HOOK
2A18 ⨘ INTEGRAL WITH TIMES SIGN
2A19 ⨙ INTEGRAL WITH INTERSECTION
2A1A ⨚ INTEGRAL WITH UNION
2A1B ⨛ INTEGRAL WITH OVERBAR
     = upper integral
2A1C ⨜ INTEGRAL WITH UNDERBAR
     = lower integral

Miscellaneous large operators

2A1D ⨝ JOIN
     = large bowtie
     • relational database theory
     → 22C8 ⋈ bowtie
     → 27D7 ⟗ full outer join
2A1E ⨞ LARGE LEFT TRIANGLE OPERATOR
     • relational database theory
     → 25C1 ◁ white left-pointing triangle
2A1F ⨟ Z NOTATION SCHEMA COMPOSITION
     → 2A3E ⨾ z notation relational composition
2A20 ⨠ Z NOTATION SCHEMA PIPING
     → 226B ≫ much greater-than
2A21 ⨡ Z NOTATION SCHEMA PROJECTION
     → 21BE ↾ upwards harpoon with barb rightwards

Plus and minus sign operators

2A22 ⨢ PLUS SIGN WITH SMALL CIRCLE ABOVE
2A23 ⨣ PLUS SIGN WITH CIRCUMFLEX ACCENT ABOVE
2A24 ⨤ PLUS SIGN WITH TILDE ABOVE
     = positive difference or sum
2A25 ⨥ PLUS SIGN WITH DOT BELOW
     → 2214 ∔ dot plus
2A26 ⨦ PLUS SIGN WITH TILDE BELOW
     = sum or positive difference
2A27 ⨧ PLUS SIGN WITH SUBSCRIPT TWO
     = nim-addition
2A28 ⨨ PLUS SIGN WITH BLACK TRIANGLE
2A29 ⨩ MINUS SIGN WITH COMMA ABOVE
2A2A ⨪ MINUS SIGN WITH DOT BELOW
     → 2238 ∸ dot minus
2A2B ⨫ MINUS SIGN WITH FALLING DOTS
2A2C ⨬ MINUS SIGN WITH RISING DOTS
2A2D ⨭ PLUS SIGN IN LEFT HALF CIRCLE
2A2E ⨮ PLUS SIGN IN RIGHT HALF CIRCLE

Multiplication and division sign operators

2A2F ⨯ VECTOR OR CROSS PRODUCT
     → 00D7 × multiplication sign
2A30 ⨰ MULTIPLICATION SIGN WITH DOT ABOVE
2A31 ⨱ MULTIPLICATION SIGN WITH UNDERBAR
2A32 ⨲ SEMIDIRECT PRODUCT WITH BOTTOM CLOSED
2A33 ⨳ SMASH PRODUCT
2A34 ⨴ MULTIPLICATION SIGN IN LEFT HALF CIRCLE
2A35 ⨵ MULTIPLICATION SIGN IN RIGHT HALF CIRCLE
2A36 ⨶ CIRCLED MULTIPLICATION SIGN WITH CIRCUMFLEX ACCENT
2A37 ⨷ MULTIPLICATION SIGN IN DOUBLE CIRCLE
2A38 ⨸ CIRCLED DIVISION SIGN

Miscellaneous mathematical operators

2A39 ⨹ PLUS SIGN IN TRIANGLE
2A3A ⨺ MINUS SIGN IN TRIANGLE
2A3B ⨻ MULTIPLICATION SIGN IN TRIANGLE
2A3C ⨼ INTERIOR PRODUCT
     → 230B ⌋ right floor
2A3D ⨽ RIGHTHAND INTERIOR PRODUCT
     → 230A ⌊ left floor
     → 2319 ⌙ turned not sign
2A3E ⨾ Z NOTATION RELATIONAL COMPOSITION
     → 2A1F ⨟ z notation schema composition
2A3F ⨿ AMALGAMATION OR COPRODUCT
     → 2210 ∐ n-ary coproduct

Intersections and unions

2A40 ⩀ INTERSECTION WITH DOT
     → 2227 ∧ logical and
     → 27D1 ⟑ and with dot
2A41 ⩁ UNION WITH MINUS SIGN
     = z notation bag subtraction
     → 228E ⊎ multiset union
2A42 ⩂ UNION WITH OVERBAR
2A43 ⩃ INTERSECTION WITH OVERBAR
2A44 ⩄ INTERSECTION WITH LOGICAL AND
2A45 ⩅ UNION WITH LOGICAL OR
2A46 ⩆ UNION ABOVE INTERSECTION
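The names and the compatibility mapping listed for these operators can be spot-checked programmatically. The following is a rough sketch that assumes Python's standard `unicodedata` module, whose data tracks a later Unicode version than 5.0 (character names do not change between versions, so they should agree). The helper names `describe` and `compatibility_mapping` are hypothetical.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Format one entry as 'XXXX <glyph> NAME', similar to the names list."""
    ch = chr(code_point)
    # The second argument is returned if the character has no name.
    name = unicodedata.name(ch, "<unnamed>")
    return f"{code_point:04X} {ch} {name}"


def compatibility_mapping(code_point: int) -> list:
    """Return the decomposition of a character as a list of code points.

    unicodedata.decomposition() returns a string such as
    '<compat> 222B 222B 222B 222B'; the tag in angle brackets is skipped.
    An empty list means the character has no decomposition.
    """
    decomposition = unicodedata.decomposition(chr(code_point))
    return [int(part, 16) for part in decomposition.split()
            if not part.startswith("<")]


if __name__ == "__main__":
    print(describe(0x2A00))
    # QUADRUPLE INTEGRAL OPERATOR maps to four INTEGRAL characters.
    print([f"{cp:04X}" for cp in compatibility_mapping(0x2A0C)])
```

The cross-references marked with an arrow in the names list are informative notes only and are not exposed by `unicodedata`, so this sketch covers just the name and the decomposition.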


Miscellaneous Symbols and Arrows
Range: 2B00–2BFF

Glagolitic
Range: 2C00–2C5F

Latin Extended-C
Range: 2C60–2C7F
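The block ranges quoted in the headings of this section can be turned into a simple lookup table. The sketch below is one possible approach, assuming only the six ranges listed here from Supplemental Arrows-B through Latin Extended-C. `BLOCKS` and `block_of` are illustrative names, and a fuller implementation would more likely read the Blocks.txt file of the Unicode Character Database.

```python
import bisect

# (first code point, last code point, block name), sorted by first code point.
# Only the ranges quoted in this section are included.
BLOCKS = [
    (0x2900, 0x297F, "Supplemental Arrows-B"),
    (0x2980, 0x29FF, "Miscellaneous Mathematical Symbols-B"),
    (0x2A00, 0x2AFF, "Supplemental Mathematical Operators"),
    (0x2B00, 0x2BFF, "Miscellaneous Symbols and Arrows"),
    (0x2C00, 0x2C5F, "Glagolitic"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
]
_STARTS = [start for start, _, _ in BLOCKS]


def block_of(code_point: int):
    """Return the block name for a code point, or None if it is not covered."""
    # Find the last block whose start is <= code_point.
    index = bisect.bisect_right(_STARTS, code_point) - 1
    if index >= 0:
        _, end, name = BLOCKS[index]
        if code_point <= end:
            return name
    return None


if __name__ == "__main__":
    for cp in (0x2900, 0x2A0C, 0x2C60, 0x2C80):
        print(f"U+{cp:04X}", block_of(cp))
```

Binary search is used because block ranges never overlap, so the candidate block is always the one with the greatest start value not exceeding the code point.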

Coptic
Range: 2C80–2CFF

[Code chart: Coptic. Columns 2C8 to 2CF, rows 0 to F; assigned code points 2C80–2CEA and 2CF9–2CFF.]


Georgian Supplement
Range: 2D00–2D2F

Tifinagh Range: 2D30–2D7F

Ethiopic Extended Range: 2D80–2DDF

Supplemental Punctuation Range: 2E00–2E7F
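Because each chart heading states its block as an inclusive range of code points, a code point can be classified by a simple range lookup. The following is a minimal sketch in Python, not part of the standard; the names `BLOCKS` and `block_of` are illustrative, and the table holds only the four ranges stated in the headings so far.

```python
from bisect import bisect_right

# Inclusive block ranges copied from the chart headings above.
# This table is deliberately partial: it covers only these four blocks.
BLOCKS = [
    (0x2D00, 0x2D2F, "Georgian Supplement"),
    (0x2D30, 0x2D7F, "Tifinagh"),
    (0x2D80, 0x2DDF, "Ethiopic Extended"),
    (0x2E00, 0x2E7F, "Supplemental Punctuation"),
]
_STARTS = [start for start, _, _ in BLOCKS]


def block_of(code_point):
    """Return the block name for code_point, or None if no listed block contains it."""
    # Find the last block whose start is <= code_point, then check its end.
    i = bisect_right(_STARTS, code_point) - 1
    if i < 0:
        return None
    _, end, name = BLOCKS[i]
    return name if code_point <= end else None


print(block_of(0x2D30))  # first code point of the Tifinagh range
print(block_of(0x2DE0))  # falls in the gap between 2DDF and 2E00
```

The explicit end check matters because the listed ranges are not contiguous: 2DE0 through 2DFF lies between two of them and belongs to neither.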

CJK Radicals Supplement Range: 2E80–2EFF

CJK Radicals Supplement code chart (2E80–2EFF). Columns give the first three hex digits of the code point and rows give the last digit; "-" marks a position with no character in the chart.

     2E8  2E9  2EA  2EB  2EC  2ED  2EE  2EF
0    ⺀   ⺐   ⺠   ⺰   ⻀   ⻐   ⻠   ⻰
1    ⺁   ⺑   ⺡   ⺱   ⻁   ⻑   ⻡   ⻱
2    ⺂   ⺒   ⺢   ⺲   ⻂   ⻒   ⻢   ⻲
3    ⺃   ⺓   ⺣   ⺳   ⻃   ⻓   ⻣   ⻳
4    ⺄   ⺔   ⺤   ⺴   ⻄   ⻔   ⻤   -
5    ⺅   ⺕   ⺥   ⺵   ⻅   ⻕   ⻥   -
6    ⺆   ⺖   ⺦   ⺶   ⻆   ⻖   ⻦   -
7    ⺇   ⺗   ⺧   ⺷   ⻇   ⻗   ⻧   -
8    ⺈   ⺘   ⺨   ⺸   ⻈   ⻘   ⻨   -
9    ⺉   ⺙   ⺩   ⺹   ⻉   ⻙   ⻩   -
A    ⺊   -    ⺪   ⺺   ⻊   ⻚   ⻪   -
B    ⺋   ⺛   ⺫   ⺻   ⻋   ⻛   ⻫   -
C    ⺌   ⺜   ⺬   ⺼   ⻌   ⻜   ⻬   -
D    ⺍   ⺝   ⺭   ⺽   ⻍   ⻝   ⻭   -
E    ⺎   ⺞   ⺮   ⺾   ⻎   ⻞   ⻮   -
F    ⺏   ⺟   ⺯   ⺿   ⻏   ⻟   ⻯   -
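In the chart above, the column label supplies the leading hex digits of a code point (2E8 through 2EF) and the row label supplies the final digit, so a cell's code point is the column value shifted left by four bits plus the row. The sketch below illustrates that arithmetic in Python; `chart_cell` and `chart_rows` are hypothetical helper names. It uses the standard `unicodedata` module to decide whether a cell holds a character, and that module reflects the Unicode version bundled with the interpreter rather than Version 5.0, so other blocks may show characters added after 5.0.

```python
import unicodedata


def chart_cell(column, row):
    """Code point of a chart cell: column is the leading hex digits (e.g. 0x2E8), row is 0x0..0xF."""
    return (column << 4) | row


def chart_rows(first_column, last_column):
    """Yield (row, cells) for a chart; a cell is a character, or None if it has no name."""
    for row in range(16):
        cells = []
        for column in range(first_column, last_column + 1):
            ch = chr(chart_cell(column, row))
            # unicodedata.name returns the default (None) for unassigned code points.
            cells.append(ch if unicodedata.name(ch, None) is not None else None)
        yield row, cells


# Print the CJK Radicals Supplement grid, one chart row per line.
for row, cells in chart_rows(0x2E8, 0x2EF):
    print(f"{row:X}", " ".join(cell or "-" for cell in cells))
```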


Kangxi Radicals Range: 2F00–2FDF

[Code chart: Kangxi Radicals, 2F00–2FDF]

Kangxi radicals

2F00 KANGXI RADICAL ONE ≈ 4E00 一
2F01 KANGXI RADICAL LINE ≈ 4E28 丨
2F02 KANGXI RADICAL DOT ≈ 4E36 丶
2F03 KANGXI RADICAL SLASH ≈ 4E3F 丿
2F04 KANGXI RADICAL SECOND ≈ 4E59 乙
2F05 KANGXI RADICAL HOOK ≈ 4E85 亅
2F06 KANGXI RADICAL TWO ≈ 4E8C 二
2F07 KANGXI RADICAL LID ≈ 4EA0 亠
2F08 KANGXI RADICAL MAN ≈ 4EBA 人
2F09 KANGXI RADICAL LEGS ≈ 513F 儿
2F0A KANGXI RADICAL ENTER ≈ 5165 入
2F0B KANGXI RADICAL EIGHT ≈ 516B 八
2F0C KANGXI RADICAL DOWN BOX ≈ 5182 冂
2F0D KANGXI RADICAL COVER ≈ 5196 冖
2F0E KANGXI RADICAL ICE ≈ 51AB 冫
2F0F KANGXI RADICAL TABLE ≈ 51E0 几
2F10 KANGXI RADICAL OPEN BOX ≈ 51F5 凵
2F11 KANGXI RADICAL KNIFE ≈ 5200 刀
2F12 KANGXI RADICAL POWER ≈ 529B 力
2F13 KANGXI RADICAL WRAP ≈ 52F9 勹
2F14 KANGXI RADICAL SPOON ≈ 5315 匕
2F15 KANGXI RADICAL RIGHT OPEN BOX ≈ 531A 匚
2F16 KANGXI RADICAL HIDING ENCLOSURE ≈ 5338 匸
2F17 KANGXI RADICAL TEN ≈ 5341 十
2F18 KANGXI RADICAL DIVINATION ≈ 535C 卜
2F19 KANGXI RADICAL SEAL ≈ 5369 卩
2F1A KANGXI RADICAL CLIFF ≈ 5382 厂
2F1B KANGXI RADICAL PRIVATE ≈ 53B6 厶
2F1C KANGXI RADICAL AGAIN ≈ 53C8 又
2F1D KANGXI RADICAL MOUTH ≈ 53E3 口
2F1E KANGXI RADICAL ENCLOSURE ≈ 56D7 囗
2F1F KANGXI RADICAL EARTH ≈ 571F 土
2F20 KANGXI RADICAL SCHOLAR ≈ 58EB 士
2F21 KANGXI RADICAL GO ≈ 5902 夂
2F22 KANGXI RADICAL GO SLOWLY ≈ 590A 夊
2F23 KANGXI RADICAL EVENING ≈ 5915 夕
2F24 KANGXI RADICAL BIG ≈ 5927 大
2F25 KANGXI RADICAL WOMAN ≈ 5973 女
2F26 KANGXI RADICAL CHILD ≈ 5B50 子
2F27 KANGXI RADICAL ROOF ≈ 5B80 宀
2F28 KANGXI RADICAL INCH ≈ 5BF8 寸
2F29 KANGXI RADICAL SMALL ≈ 5C0F 小
2F2A KANGXI RADICAL LAME ≈ 5C22 尢
2F2B KANGXI RADICAL CORPSE ≈ 5C38 尸
2F2C KANGXI RADICAL SPROUT ≈ 5C6E 屮
2F2D KANGXI RADICAL MOUNTAIN ≈ 5C71 山
2F2E KANGXI RADICAL RIVER ≈ 5DDB 巛
2F2F KANGXI RADICAL WORK ≈ 5DE5 工
2F30 KANGXI RADICAL ONESELF ≈ 5DF1 己
2F31 KANGXI RADICAL TURBAN ≈ 5DFE 巾
2F32 KANGXI RADICAL DRY ≈ 5E72 干
2F33 KANGXI RADICAL SHORT THREAD ≈ 5E7A 幺
2F34 KANGXI RADICAL DOTTED CLIFF ≈ 5E7F 广
2F35 KANGXI RADICAL LONG STRIDE ≈ 5EF4 廴
2F36 KANGXI RADICAL TWO HANDS ≈ 5EFE 廾
2F37 KANGXI RADICAL SHOOT ≈ 5F0B 弋
2F38 KANGXI RADICAL BOW ≈ 5F13 弓
2F39 KANGXI RADICAL SNOUT ≈ 5F50 彐
2F3A KANGXI RADICAL BRISTLE ≈ 5F61 彡
2F3B KANGXI RADICAL STEP ≈ 5F73 彳
2F3C KANGXI RADICAL HEART ≈ 5FC3 心
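Each entry in the list above pairs a Kangxi radical with a CJK unified ideograph, for example 2F00 with 4E00. These pairings are compatibility mappings, so compatibility normalization (NFKC or NFKD) replaces the radical with the ideograph. The sketch below shows this with Python's standard `unicodedata` module; the helper name `kangxi_equivalent` is illustrative, and the module's data comes from the Unicode version bundled with the interpreter rather than Version 5.0.

```python
import unicodedata


def kangxi_equivalent(radical):
    """Return the character a Kangxi radical becomes under NFKC normalization."""
    return unicodedata.normalize("NFKC", radical)


# Three entries taken from the list above: ONE, MAN, and HEART.
for code_point in (0x2F00, 0x2F08, 0x2F3C):
    radical = chr(code_point)
    target = kangxi_equivalent(radical)
    print(f"{code_point:04X} {unicodedata.name(radical)} -> {ord(target):04X} {target}")
```

A practical consequence is that text containing Kangxi radicals will not match the visually similar unified ideographs in a plain comparison unless both sides are normalized with a compatibility form first.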


Ideographic Description Characters Range: 2FF0–2FFF

CJK Symbols and Punctuation Range: 3000–303F

[Code chart: CJK Symbols and Punctuation, 3000–303F]


Hiragana Range: 3040–309F

Hiragana code chart (3040–309F). Columns give the first three hex digits of the code point and rows give the last digit; "-" marks a position with no character in the chart. The combining marks at 3099 and 309A are shown on a dotted circle.

     304  305  306  307  308  309
0    -    ぐ   だ   ば   む   ゐ
1    ぁ   け   ち   ぱ   め   ゑ
2    あ   げ   ぢ   ひ   も   を
3    ぃ   こ   っ   び   ゃ   ん
4    い   ご   つ   ぴ   や   ゔ
5    ぅ   さ   づ   ふ   ゅ   ゕ
6    う   ざ   て   ぶ   ゆ   ゖ
7    ぇ   し   で   ぷ   ょ   -
8    え   じ   と   へ   よ   -
9    ぉ   す   ど   べ   ら   ◌゙
A    お   ず   な   ぺ   り   ◌゚
B    か   せ   に   ほ   る   ゛
C    が   ぜ   ぬ   ぼ   れ   ゜
D    き   そ   ね   ぽ   ろ   ゝ
E    ぎ   ぞ   の   ま   ゎ   ゞ
F    く   た   は   み   わ   ゟ


Katakana Range: 30A0–30FF This file contains an excerpt from the character code tables and list of character names for

The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.


Katakana code chart, 30A0–30FF

Each row is labeled with the final hexadecimal digit and lists code point and reference glyph pairs.

0  30A0 ゠  30B0 グ  30C0 ダ  30D0 バ  30E0 ム  30F0 ヰ
1  30A1 ァ  30B1 ケ  30C1 チ  30D1 パ  30E1 メ  30F1 ヱ
2  30A2 ア  30B2 ゲ  30C2 ヂ  30D2 ヒ  30E2 モ  30F2 ヲ
3  30A3 ィ  30B3 コ  30C3 ッ  30D3 ビ  30E3 ャ  30F3 ン
4  30A4 イ  30B4 ゴ  30C4 ツ  30D4 ピ  30E4 ヤ  30F4 ヴ
5  30A5 ゥ  30B5 サ  30C5 ヅ  30D5 フ  30E5 ュ  30F5 ヵ
6  30A6 ウ  30B6 ザ  30C6 テ  30D6 ブ  30E6 ユ  30F6 ヶ
7  30A7 ェ  30B7 シ  30C7 デ  30D7 プ  30E7 ョ  30F7 ヷ
8  30A8 エ  30B8 ジ  30C8 ト  30D8 ヘ  30E8 ヨ  30F8 ヸ
9  30A9 ォ  30B9 ス  30C9 ド  30D9 ベ  30E9 ラ  30F9 ヹ
A  30AA オ  30BA ズ  30CA ナ  30DA ペ  30EA リ  30FA ヺ
B  30AB カ  30BB セ  30CB ニ  30DB ホ  30EB ル  30FB ・
C  30AC ガ  30BC ゼ  30CC ヌ  30DC ボ  30EC レ  30FC ー
D  30AD キ  30BD ソ  30CD ネ  30DD ポ  30ED ロ  30FD ヽ
E  30AE ギ  30BE ゾ  30CE ノ  30DE マ  30EE ヮ  30FE ヾ
F  30AF ク  30BF タ  30CF ハ  30DF ミ  30EF ワ  30FF ヿ
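Comparing the Hiragana and Katakana charts position by position, each Hiragana letter in 3041–3096 sits exactly 0x60 below its Katakana counterpart in 30A1–30F6. The following is a minimal sketch of that relationship, assuming Python 3; the helper name `hiragana_to_katakana` is hypothetical and chosen for illustration only, and the sketch deliberately ignores the sound marks and iteration marks at 3099–309F.

```python
# Hypothetical helper: shift Hiragana letters to the parallel Katakana positions.
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KANA_OFFSET = 0x30A1 - 0x3041  # distance between the two blocks' first letters


def hiragana_to_katakana(text: str) -> str:
    """Return text with Hiragana letters replaced by Katakana letters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if HIRAGANA_START <= cp <= HIRAGANA_END:
            out.append(chr(cp + KANA_OFFSET))
        else:
            out.append(ch)  # leave everything else untouched
    return "".join(out)


print(hiragana_to_katakana("\u3072\u3089\u304C\u306A"))
```

The reverse direction only needs the offset subtracted over 30A1–30F6; Katakana letters such as 30F7–30FA have no Hiragana counterpart in this chart and would have to be left as they are.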


Bopomofo Range: 3100–312F

Hangul Compatibility Jamo Range: 3130–318F

Hangul Compatibility Jamo code chart, 3130–318F

Each row is labeled with the final hexadecimal digit and lists code point and reference glyph pairs; unassigned code points are omitted.

0  3140 ㅀ  3150 ㅐ  3160 ㅠ  3170 ㅰ  3180 ㆀ
1  3131 ㄱ  3141 ㅁ  3151 ㅑ  3161 ㅡ  3171 ㅱ  3181 ㆁ
2  3132 ㄲ  3142 ㅂ  3152 ㅒ  3162 ㅢ  3172 ㅲ  3182 ㆂ
3  3133 ㄳ  3143 ㅃ  3153 ㅓ  3163 ㅣ  3173 ㅳ  3183 ㆃ
4  3134 ㄴ  3144 ㅄ  3154 ㅔ  3164 (HANGUL FILLER)  3174 ㅴ  3184 ㆄ
5  3135 ㄵ  3145 ㅅ  3155 ㅕ  3165 ㅥ  3175 ㅵ  3185 ㆅ
6  3136 ㄶ  3146 ㅆ  3156 ㅖ  3166 ㅦ  3176 ㅶ  3186 ㆆ
7  3137 ㄷ  3147 ㅇ  3157 ㅗ  3167 ㅧ  3177 ㅷ  3187 ㆇ
8  3138 ㄸ  3148 ㅈ  3158 ㅘ  3168 ㅨ  3178 ㅸ  3188 ㆈ
9  3139 ㄹ  3149 ㅉ  3159 ㅙ  3169 ㅩ  3179 ㅹ  3189 ㆉ
A  313A ㄺ  314A ㅊ  315A ㅚ  316A ㅪ  317A ㅺ  318A ㆊ
B  313B ㄻ  314B ㅋ  315B ㅛ  316B ㅫ  317B ㅻ  318B ㆋ
C  313C ㄼ  314C ㅌ  315C ㅜ  316C ㅬ  317C ㅼ  318C ㆌ
D  313D ㄽ  314D ㅍ  315D ㅝ  316D ㅭ  317D ㅽ  318D ㆍ
E  313E ㄾ  314E ㅎ  315E ㅞ  316E ㅮ  317E ㅾ  318E ㆎ
F  313F ㄿ  314F ㅏ  315F ㅟ  316F ㅯ  317F ㅿ


Kanbun Range: 3190–319F

Bopomofo Extended Range: 31A0–31BF

CJK Strokes Range: 31C0–31EF

Katakana Phonetic Extensions Range: 31F0–31FF
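The Range lines in this part of the charts describe contiguous, non-overlapping blocks, so a code point can be mapped to its block with a sorted-range lookup. The sketch below assumes Python 3 and hard-codes only the ranges quoted in these chart headings (the Hiragana range is taken from the 3040 and 309F labels on its chart); `BLOCKS` and `block_of` are illustrative names, not an official API.

```python
from bisect import bisect_right

# (first, last, name) for the blocks named in this part of the charts.
BLOCKS = [
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3100, 0x312F, "Bopomofo"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0x3190, 0x319F, "Kanbun"),
    (0x31A0, 0x31BF, "Bopomofo Extended"),
    (0x31C0, 0x31EF, "CJK Strokes"),
    (0x31F0, 0x31FF, "Katakana Phonetic Extensions"),
    (0x3200, 0x32FF, "Enclosed CJK Letters and Months"),
    (0x3300, 0x33FF, "CJK Compatibility"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(ch: str):
    """Return the block name for a single character, or None if not listed."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


print(block_of("\u30AB"))
```

A block range says nothing about whether a particular code point inside it is assigned; for that, the charts themselves or the Unicode Character Database remain the reference.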

Enclosed CJK Letters and Months Range: 3200–32FF

Enclosed CJK Letters and Months code chart, 3200–32FF

Each row is labeled with the final hexadecimal digit and lists code point and reference glyph pairs; unassigned code points are omitted.

0  3200 ㈀  3210 ㈐  3220 ㈠  3230 ㈰  3240 ㉀  3250 ㉐  3260 ㉠  3270 ㉰  3280 ㊀  3290 ㊐  32A0 ㊠  32B0 ㊰  32C0 ㋀  32D0 ㋐  32E0 ㋠  32F0 ㋰
1  3201 ㈁  3211 ㈑  3221 ㈡  3231 ㈱  3241 ㉁  3251 ㉑  3261 ㉡  3271 ㉱  3281 ㊁  3291 ㊑  32A1 ㊡  32B1 ㊱  32C1 ㋁  32D1 ㋑  32E1 ㋡  32F1 ㋱
2  3202 ㈂  3212 ㈒  3222 ㈢  3232 ㈲  3242 ㉂  3252 ㉒  3262 ㉢  3272 ㉲  3282 ㊂  3292 ㊒  32A2 ㊢  32B2 ㊲  32C2 ㋂  32D2 ㋒  32E2 ㋢  32F2 ㋲
3  3203 ㈃  3213 ㈓  3223 ㈣  3233 ㈳  3243 ㉃  3253 ㉓  3263 ㉣  3273 ㉳  3283 ㊃  3293 ㊓  32A3 ㊣  32B3 ㊳  32C3 ㋃  32D3 ㋓  32E3 ㋣  32F3 ㋳
4  3204 ㈄  3214 ㈔  3224 ㈤  3234 ㈴  3254 ㉔  3264 ㉤  3274 ㉴  3284 ㊄  3294 ㊔  32A4 ㊤  32B4 ㊴  32C4 ㋄  32D4 ㋔  32E4 ㋤  32F4 ㋴
5  3205 ㈅  3215 ㈕  3225 ㈥  3235 ㈵  3255 ㉕  3265 ㉥  3275 ㉵  3285 ㊅  3295 ㊕  32A5 ㊥  32B5 ㊵  32C5 ㋅  32D5 ㋕  32E5 ㋥  32F5 ㋵
6  3206 ㈆  3216 ㈖  3226 ㈦  3236 ㈶  3256 ㉖  3266 ㉦  3276 ㉶  3286 ㊆  3296 ㊖  32A6 ㊦  32B6 ㊶  32C6 ㋆  32D6 ㋖  32E6 ㋦  32F6 ㋶
7  3207 ㈇  3217 ㈗  3227 ㈧  3237 ㈷  3257 ㉗  3267 ㉧  3277 ㉷  3287 ㊇  3297 ㊗  32A7 ㊧  32B7 ㊷  32C7 ㋇  32D7 ㋗  32E7 ㋧  32F7 ㋷
8  3208 ㈈  3218 ㈘  3228 ㈨  3238 ㈸  3258 ㉘  3268 ㉨  3278 ㉸  3288 ㊈  3298 ㊘  32A8 ㊨  32B8 ㊸  32C8 ㋈  32D8 ㋘  32E8 ㋨  32F8 ㋸
9  3209 ㈉  3219 ㈙  3229 ㈩  3239 ㈹  3259 ㉙  3269 ㉩  3279 ㉹  3289 ㊉  3299 ㊙  32A9 ㊩  32B9 ㊹  32C9 ㋉  32D9 ㋙  32E9 ㋩  32F9 ㋹
A  320A ㈊  321A ㈚  322A ㈪  323A ㈺  325A ㉚  326A ㉪  327A ㉺  328A ㊊  329A ㊚  32AA ㊪  32BA ㊺  32CA ㋊  32DA ㋚  32EA ㋪  32FA ㋺
B  320B ㈋  321B ㈛  322B ㈫  323B ㈻  325B ㉛  326B ㉫  327B ㉻  328B ㊋  329B ㊛  32AB ㊫  32BB ㊻  32CB ㋋  32DB ㋛  32EB ㋫  32FB ㋻
C  320C ㈌  321C ㈜  322C ㈬  323C ㈼  325C ㉜  326C ㉬  327C ㉼  328C ㊌  329C ㊜  32AC ㊬  32BC ㊼  32CC ㋌  32DC ㋜  32EC ㋬  32FC ㋼
D  320D ㈍  321D ㈝  322D ㈭  323D ㈽  325D ㉝  326D ㉭  327D ㉽  328D ㊍  329D ㊝  32AD ㊭  32BD ㊽  32CD ㋍  32DD ㋝  32ED ㋭  32FD ㋽
E  320E ㈎  321E ㈞  322E ㈮  323E ㈾  325E ㉞  326E ㉮  327E ㉾  328E ㊎  329E ㊞  32AE ㊮  32BE ㊾  32CE ㋎  32DE ㋞  32EE ㋮  32FE ㋾
F  320F ㈏  322F ㈯  323F ㈿  325F ㉟  326F ㉯  327F ㉿  328F ㊏  329F ㊟  32AF ㊯  32BF ㊿  32CF ㋏  32DF ㋟  32EF ㋯


Enclosed CJK Letters and Months (names list, 3200–3235)

Parenthesized Hangul elements

3200 ㈀ PARENTHESIZED HANGUL KIYEOK ≈ 0028 ( 1100 ᄀ 0029 )
3201 ㈁ PARENTHESIZED HANGUL NIEUN ≈ 0028 ( 1102 ᄂ 0029 )
3202 ㈂ PARENTHESIZED HANGUL TIKEUT ≈ 0028 ( 1103 ᄃ 0029 )
3203 ㈃ PARENTHESIZED HANGUL RIEUL ≈ 0028 ( 1105 ᄅ 0029 )
3204 ㈄ PARENTHESIZED HANGUL MIEUM ≈ 0028 ( 1106 ᄆ 0029 )
3205 ㈅ PARENTHESIZED HANGUL PIEUP ≈ 0028 ( 1107 ᄇ 0029 )
3206 ㈆ PARENTHESIZED HANGUL SIOS ≈ 0028 ( 1109 ᄉ 0029 )
3207 ㈇ PARENTHESIZED HANGUL IEUNG ≈ 0028 ( 110B ᄋ 0029 )
3208 ㈈ PARENTHESIZED HANGUL CIEUC ≈ 0028 ( 110C ᄌ 0029 )
3209 ㈉ PARENTHESIZED HANGUL CHIEUCH ≈ 0028 ( 110E ᄎ 0029 )
320A ㈊ PARENTHESIZED HANGUL KHIEUKH ≈ 0028 ( 110F ᄏ 0029 )
320B ㈋ PARENTHESIZED HANGUL THIEUTH ≈ 0028 ( 1110 ᄐ 0029 )
320C ㈌ PARENTHESIZED HANGUL PHIEUPH ≈ 0028 ( 1111 ᄑ 0029 )
320D ㈍ PARENTHESIZED HANGUL HIEUH ≈ 0028 ( 1112 ᄒ 0029 )
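The names and compatibility mappings in this list are also recorded in the Unicode Character Database, so they can be checked programmatically. A minimal sketch, assuming Python 3 and its standard `unicodedata` module; the interpreter bundles its own (usually later) version of the database, and `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'CODE NAME DECOMPOSITION' for one character."""
    name = unicodedata.name(ch)
    # decomposition() returns the mapping as hex code points, prefixed with a
    # tag such as <compat> for compatibility mappings.
    decomp = unicodedata.decomposition(ch)
    return f"{ord(ch):04X} {name} {decomp}"


for cp in range(0x3200, 0x320E):
    print(describe(chr(cp)))
```

The `<compat>` tag in the output marks the same compatibility mapping that the names list gives after each character name.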

Parenthesized Hangul syllables

320E ㈎ PARENTHESIZED HANGUL KIYEOK A ≈ 0028 ( 1100 ᄀ 1161 ᅡ 0029 )
320F ㈏ PARENTHESIZED HANGUL NIEUN A ≈ 0028 ( 1102 ᄂ 1161 ᅡ 0029 )
3210 ㈐ PARENTHESIZED HANGUL TIKEUT A ≈ 0028 ( 1103 ᄃ 1161 ᅡ 0029 )
3211 ㈑ PARENTHESIZED HANGUL RIEUL A ≈ 0028 ( 1105 ᄅ 1161 ᅡ 0029 )
3212 ㈒ PARENTHESIZED HANGUL MIEUM A ≈ 0028 ( 1106 ᄆ 1161 ᅡ 0029 )
3213 ㈓ PARENTHESIZED HANGUL PIEUP A ≈ 0028 ( 1107 ᄇ 1161 ᅡ 0029 )
3214 ㈔ PARENTHESIZED HANGUL SIOS A ≈ 0028 ( 1109 ᄉ 1161 ᅡ 0029 )
3215 ㈕ PARENTHESIZED HANGUL IEUNG A ≈ 0028 ( 110B ᄋ 1161 ᅡ 0029 )
3216 ㈖ PARENTHESIZED HANGUL CIEUC A ≈ 0028 ( 110C ᄌ 1161 ᅡ 0029 )
3217 ㈗ PARENTHESIZED HANGUL CHIEUCH A ≈ 0028 ( 110E ᄎ 1161 ᅡ 0029 )
3218 ㈘ PARENTHESIZED HANGUL KHIEUKH A ≈ 0028 ( 110F ᄏ 1161 ᅡ 0029 )
3219 ㈙ PARENTHESIZED HANGUL THIEUTH A ≈ 0028 ( 1110 ᄐ 1161 ᅡ 0029 )
321A ㈚ PARENTHESIZED HANGUL PHIEUPH A ≈ 0028 ( 1111 ᄑ 1161 ᅡ 0029 )
321B ㈛ PARENTHESIZED HANGUL HIEUH A ≈ 0028 ( 1112 ᄒ 1161 ᅡ 0029 )
321C ㈜ PARENTHESIZED HANGUL CIEUC U ≈ 0028 ( 110C ᄌ 116E ᅮ 0029 )
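Each mapping in this group spells the syllable with conjoining jamo: a leading consonant in the 1100 range followed by a vowel in the 1161 range. Such a pair corresponds to one precomposed Hangul syllable, whose code point follows from the arithmetic used for Hangul syllable composition. The sketch below assumes Python 3; the constant and function names are illustrative, and the inputs are not range-checked.

```python
# Constants of the Hangul syllable composition arithmetic.
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first leading consonant
V_BASE = 0x1161   # first vowel
T_BASE = 0x11A7   # one before the first trailing consonant
V_COUNT = 21
T_COUNT = 28


def compose_syllable(lead: int, vowel: int, trail: int = T_BASE) -> str:
    """Compose conjoining jamo code points into one precomposed syllable."""
    l_index = lead - L_BASE
    v_index = vowel - V_BASE
    t_index = trail - T_BASE  # 0 means "no trailing consonant"
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# 320E maps to ( 1100 1161 ), 321C to ( 110C 116E ).
print(f"{ord(compose_syllable(0x1100, 0x1161)):04X}")
print(f"{ord(compose_syllable(0x110C, 0x116E)):04X}")
```

The optional third argument covers mappings that include a trailing consonant, such as the 110C 1165 11AB sequence inside PARENTHESIZED KOREAN CHARACTER OJEON.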

Parenthesized Korean words

321D ㈝ PARENTHESIZED KOREAN CHARACTER OJEON ≈ 0028 ( 110B ᄋ 1169 ᅩ 110C ᄌ 1165 ᅥ 11AB ᆫ 0029 )
321E ㈞ PARENTHESIZED KOREAN CHARACTER O HU ≈ 0028 ( 110B ᄋ 1169 ᅩ 1112 ᄒ 116E ᅮ 0029 )

Parenthesized ideographs

3220 ㈠ PARENTHESIZED IDEOGRAPH ONE ≈ 0028 ( 4E00 一 0029 )
3221 ㈡ PARENTHESIZED IDEOGRAPH TWO ≈ 0028 ( 4E8C 二 0029 )
3222 ㈢ PARENTHESIZED IDEOGRAPH THREE ≈ 0028 ( 4E09 三 0029 )
3223 ㈣ PARENTHESIZED IDEOGRAPH FOUR ≈ 0028 ( 56DB 四 0029 )
3224 ㈤ PARENTHESIZED IDEOGRAPH FIVE ≈ 0028 ( 4E94 五 0029 )
3225 ㈥ PARENTHESIZED IDEOGRAPH SIX ≈ 0028 ( 516D 六 0029 )
3226 ㈦ PARENTHESIZED IDEOGRAPH SEVEN ≈ 0028 ( 4E03 七 0029 )
3227 ㈧ PARENTHESIZED IDEOGRAPH EIGHT ≈ 0028 ( 516B 八 0029 )
3228 ㈨ PARENTHESIZED IDEOGRAPH NINE ≈ 0028 ( 4E5D 九 0029 )
3229 ㈩ PARENTHESIZED IDEOGRAPH TEN ≈ 0028 ( 5341 十 0029 )
322A ㈪ PARENTHESIZED IDEOGRAPH MOON • Monday ≈ 0028 ( 6708 月 0029 )
322B ㈫ PARENTHESIZED IDEOGRAPH FIRE • Tuesday ≈ 0028 ( 706B 火 0029 )
322C ㈬ PARENTHESIZED IDEOGRAPH WATER • Wednesday ≈ 0028 ( 6C34 水 0029 )
322D ㈭ PARENTHESIZED IDEOGRAPH WOOD • Thursday ≈ 0028 ( 6728 木 0029 )
322E ㈮ PARENTHESIZED IDEOGRAPH METAL • Friday ≈ 0028 ( 91D1 金 0029 )
322F ㈯ PARENTHESIZED IDEOGRAPH EARTH • Saturday ≈ 0028 ( 571F 土 0029 )
3230 ㈰ PARENTHESIZED IDEOGRAPH SUN • Sunday ≈ 0028 ( 65E5 日 0029 )
3231 ㈱ PARENTHESIZED IDEOGRAPH STOCK • incorporated ≈ 0028 ( 682A 株 0029 )
3232 ㈲ PARENTHESIZED IDEOGRAPH HAVE • limited ≈ 0028 ( 6709 有 0029 )
3233 ㈳ PARENTHESIZED IDEOGRAPH SOCIETY • company ≈ 0028 ( 793E 社 0029 )
3234 ㈴ PARENTHESIZED IDEOGRAPH NAME ≈ 0028 ( 540D 名 0029 )
3235 ㈵ PARENTHESIZED IDEOGRAPH SPECIAL ≈ 0028 ( 7279 特 0029 )
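Because every character in this group carries a compatibility mapping to a parenthesized sequence, compatibility normalization (described in Unicode Standard Annex #15) expands it to plain parentheses around the ideograph. A minimal sketch, assuming Python 3 and its standard `unicodedata` module; `expand_enclosed` is a hypothetical wrapper name.

```python
import unicodedata


def expand_enclosed(text: str) -> str:
    """Apply NFKC, which replaces enclosed forms by their compatibility mappings."""
    return unicodedata.normalize("NFKC", text)


# 3231 PARENTHESIZED IDEOGRAPH STOCK expands to 0028 682A 0029.
expanded = expand_enclosed("\u3231")
print(" ".join(f"{ord(c):04X}" for c in expanded))
```

NFKC is lossy by design: after normalization the text no longer records that a single enclosed character was used, so it suits searching and matching better than round-trip storage.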


CJK Compatibility Range: 3300–33FF

CJK Compatibility, code chart 3300–33FF
Columns: 330 331 332 333 334 335 336 337 338 339 33A 33B 33C 33D 33E 33F
Row 0: ㌀㌐㌠㌰㍀㍐㍠㍰㎀㎐㎠㎰㏀㏐㏠㏰
Row 1: ㌁㌑㌡㌱㍁㍑㍡㍱㎁㎑㎡㎱㏁㏑㏡㏱
Row 2: ㌂㌒㌢㌲㍂㍒㍢㍲㎂㎒㎢㎲㏂㏒㏢㏲
Row 3: ㌃㌓㌣㌳㍃㍓㍣㍳㎃㎓㎣㎳㏃㏓㏣㏳
Row 4: ㌄㌔㌤㌴㍄㍔㍤㍴㎄㎔㎤㎴㏄㏔㏤㏴
Row 5: ㌅㌕㌥㌵㍅㍕㍥㍵㎅㎕㎥㎵㏅㏕㏥㏵
Row 6: ㌆㌖㌦㌶㍆㍖㍦㍶㎆㎖㎦㎶㏆㏖㏦㏶
Row 7: ㌇㌗㌧㌷㍇㍗㍧㍷㎇㎗㎧㎷㏇㏗㏧㏷
Row 8: ㌈㌘㌨㌸㍈㍘㍨㍸㎈㎘㎨㎸㏈㏘㏨㏸
Row 9: ㌉㌙㌩㌹㍉㍙㍩㍹㎉㎙㎩㎹㏉㏙㏩㏹
Row A: ㌊㌚㌪㌺㍊㍚㍪㍺㎊㎚㎪㎺㏊㏚㏪㏺
Row B: ㌋㌛㌫㌻㍋㍛㍫㍻㎋㎛㎫㎻㏋㏛㏫㏻
Row C: ㌌㌜㌬㌼㍌㍜㍬㍼㎌㎜㎬㎼㏌㏜㏬㏼
Row D: ㌍㌝㌭㌽㍍㍝㍭㍽㎍㎝㎭㎽㏍㏝㏭㏽
Row E: ㌎㌞㌮㌾㍎㍞㍮㍾㎎㎞㎮㎾㏎㏞㏮㏾
Row F: ㌏㌟㌯㌿㍏㍟㍯㍿㎏㎟㎯㎿㏏㏟㏯㏿
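In a code chart, each column heading gives the leading hexadecimal digits of a code point and each row label gives the final digit, so the code point of a cell is the two concatenated. The following is a small sketch of that arithmetic; the function names are hypothetical and chosen only for illustration.

```python
def cell_to_code_point(column, row):
    """Combine a column heading such as '33A' with a row label such as '1'."""
    return int(column + row, 16)

def code_point_to_cell(cp):
    """Split a code point into its (column heading, row label) pair."""
    return f"{cp >> 4:X}", f"{cp & 0xF:X}"

cp = cell_to_code_point("33A", "1")
print(f"U+{cp:04X}", chr(cp))      # the character in column 33A, row 1
print(code_point_to_cell(0x3300))  # column and row of the first chart cell
```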

Squared Katakana words

3300 ㌀ SQUARE APAATO • apartment <square> 30A2 ア 30D1 パ 30FC ー 30C8 ト
3301 ㌁ SQUARE ARUHUA • alpha <square> 30A2 ア 30EB ル 30D5 フ 30A1 ァ
3302 ㌂ SQUARE ANPEA • ampere <square> 30A2 ア 30F3 ン 30DA ペ 30A2 ア
3303 ㌃ SQUARE AARU • are (unit of area) <square> 30A2 ア 30FC ー 30EB ル
3304 ㌄ SQUARE ININGU • inning <square> 30A4 イ 30CB ニ 30F3 ン 30B0 グ
3305 ㌅ SQUARE INTI • inch <square> 30A4 イ 30F3 ン 30C1 チ
3306 ㌆ SQUARE UON • won (Korean currency) <square> 30A6 ウ 30A9 ォ 30F3 ン
3307 ㌇ SQUARE ESUKUUDO • escudo (Portuguese currency) <square> 30A8 エ 30B9 ス 30AF ク 30FC ー 30C9 ド
3308 ㌈ SQUARE EEKAA • acre <square> 30A8 エ 30FC ー 30AB カ 30FC ー
3309 ㌉ SQUARE ONSU • ounce <square> 30AA オ 30F3 ン 30B9 ス
330A ㌊ SQUARE OOMU • ohm <square> 30AA オ 30FC ー 30E0 ム
330B ㌋ SQUARE KAIRI • kai-ri: nautical mile <square> 30AB カ 30A4 イ 30EA リ
330C ㌌ SQUARE KARATTO • carat <square> 30AB カ 30E9 ラ 30C3 ッ 30C8 ト
330D ㌍ SQUARE KARORII • calorie <square> 30AB カ 30ED ロ 30EA リ 30FC ー
330E ㌎ SQUARE GARON • gallon <square> 30AC ガ 30ED ロ 30F3 ン
330F ㌏ SQUARE GANMA • gamma <square> 30AC ガ 30F3 ン 30DE マ
3310 ㌐ SQUARE GIGA • giga <square> 30AE ギ 30AC ガ
3311 ㌑ SQUARE GINII • guinea <square> 30AE ギ 30CB ニ 30FC ー

3312 ㌒ SQUARE KYURII • curie <square> 30AD キ 30E5 ュ 30EA リ 30FC ー
3313 ㌓ SQUARE GIRUDAA • guilder <square> 30AE ギ 30EB ル 30C0 ダ 30FC ー
3314 ㌔ SQUARE KIRO • kilo <square> 30AD キ 30ED ロ
3315 ㌕ SQUARE KIROGURAMU • kilogram <square> 30AD キ 30ED ロ 30B0 グ 30E9 ラ 30E0 ム
3316 ㌖ SQUARE KIROMEETORU • kilometer <square> 30AD キ 30ED ロ 30E1 メ 30FC ー 30C8 ト 30EB ル
3317 ㌗ SQUARE KIROWATTO • kilowatt <square> 30AD キ 30ED ロ 30EF ワ 30C3 ッ 30C8 ト
3318 ㌘ SQUARE GURAMU • gram <square> 30B0 グ 30E9 ラ 30E0 ム
3319 ㌙ SQUARE GURAMUTON • gram ton <square> 30B0 グ 30E9 ラ 30E0 ム 30C8 ト 30F3 ン
331A ㌚ SQUARE KURUZEIRO • cruzeiro (Brazilian currency) <square> 30AF ク 30EB ル 30BC ゼ 30A4 イ 30ED ロ
331B ㌛ SQUARE KUROONE • krone <square> 30AF ク 30ED ロ 30FC ー 30CD ネ
331C ㌜ SQUARE KEESU • case <square> 30B1 ケ 30FC ー 30B9 ス
331D ㌝ SQUARE KORUNA • koruna (Czech currency) <square> 30B3 コ 30EB ル 30CA ナ
331E ㌞ SQUARE KOOPO • co-op <square> 30B3 コ 30FC ー 30DD ポ
331F ㌟ SQUARE SAIKURU • cycle <square> 30B5 サ 30A4 イ 30AF ク 30EB ル
3320 ㌠ SQUARE SANTIIMU • centime <square> 30B5 サ 30F3 ン 30C1 チ 30FC ー 30E0 ム
3321 ㌡ SQUARE SIRINGU • shilling <square> 30B7 シ 30EA リ 30F3 ン 30B0 グ
3322 ㌢ SQUARE SENTI • centi <square> 30BB セ 30F3 ン 30C1 チ
3323 ㌣ SQUARE SENTO • cent <square> 30BB セ 30F3 ン 30C8 ト


3324 ㌤ SQUARE DAASU • dozen <square> 30C0 ダ 30FC ー 30B9 ス
3325 ㌥ SQUARE DESI • deci <square> 30C7 デ 30B7 シ
3326 ㌦ SQUARE DORU • dollar <square> 30C9 ド 30EB ル
3327 ㌧ SQUARE TON • ton <square> 30C8 ト 30F3 ン
3328 ㌨ SQUARE NANO • nano <square> 30CA ナ 30CE ノ
3329 ㌩ SQUARE NOTTO • knot, nautical mile <square> 30CE ノ 30C3 ッ 30C8 ト
332A ㌪ SQUARE HAITU • heights <square> 30CF ハ 30A4 イ 30C4 ツ
332B ㌫ SQUARE PAASENTO • percent <square> 30D1 パ 30FC ー 30BB セ 30F3 ン 30C8 ト
332C ㌬ SQUARE PAATU • parts <square> 30D1 パ 30FC ー 30C4 ツ
332D ㌭ SQUARE BAARERU • barrel <square> 30D0 バ 30FC ー 30EC レ 30EB ル
332E ㌮ SQUARE PIASUTORU • piaster <square> 30D4 ピ 30A2 ア 30B9 ス 30C8 ト 30EB ル
332F ㌯ SQUARE PIKURU • picul (unit of weight) <square> 30D4 ピ 30AF ク 30EB ル
3330 ㌰ SQUARE PIKO • pico <square> 30D4 ピ 30B3 コ
3331 ㌱ SQUARE BIRU • building <square> 30D3 ビ 30EB ル
3332 ㌲ SQUARE HUARADDO • farad <square> 30D5 フ 30A1 ァ 30E9 ラ 30C3 ッ 30C9 ド
3333 ㌳ SQUARE HUIITO • feet <square> 30D5 フ 30A3 ィ 30FC ー 30C8 ト
3334 ㌴ SQUARE BUSSYERU • bushel <square> 30D6 ブ 30C3 ッ 30B7 シ 30A7 ェ 30EB ル
3335 ㌵ SQUARE HURAN • franc <square> 30D5 フ 30E9 ラ 30F3 ン
3336 ㌶ SQUARE HEKUTAARU • hectare <square> 30D8 ヘ 30AF ク 30BF タ 30FC ー 30EB ル

3337 ㌷ SQUARE PESO • peso <square> 30DA ペ 30BD ソ
3338 ㌸ SQUARE PENIHI • pfennig <square> 30DA ペ 30CB ニ 30D2 ヒ
3339 ㌹ SQUARE HERUTU • hertz <square> 30D8 ヘ 30EB ル 30C4 ツ
333A ㌺ SQUARE PENSU • pence <square> 30DA ペ 30F3 ン 30B9 ス
333B ㌻ SQUARE PEEZI • page <square> 30DA ペ 30FC ー 30B8 ジ
333C ㌼ SQUARE BEETA • beta <square> 30D9 ベ 30FC ー 30BF タ
333D ㌽ SQUARE POINTO • point <square> 30DD ポ 30A4 イ 30F3 ン 30C8 ト
333E ㌾ SQUARE BORUTO • volt, bolt <square> 30DC ボ 30EB ル 30C8 ト
333F ㌿ SQUARE HON • hon: volume <square> 30DB ホ 30F3 ン
3340 ㍀ SQUARE PONDO • pound <square> 30DD ポ 30F3 ン 30C9 ド
3341 ㍁ SQUARE HOORU • hall <square> 30DB ホ 30FC ー 30EB ル
3342 ㍂ SQUARE HOON • horn <square> 30DB ホ 30FC ー 30F3 ン
3343 ㍃ SQUARE MAIKURO • micro <square> 30DE マ 30A4 イ 30AF ク 30ED ロ
3344 ㍄ SQUARE MAIRU • mile <square> 30DE マ 30A4 イ 30EB ル
3345 ㍅ SQUARE MAHHA • mach <square> 30DE マ 30C3 ッ 30CF ハ
3346 ㍆ SQUARE MARUKU • mark <square> 30DE マ 30EB ル 30AF ク
3347 ㍇ SQUARE MANSYON • mansion (i.e. better quality apartment) <square> 30DE マ 30F3 ン 30B7 シ 30E7 ョ 30F3 ン
3348 ㍈ SQUARE MIKURON • micron <square> 30DF ミ 30AF ク 30ED ロ 30F3 ン
3349 ㍉ SQUARE MIRI • milli <square> 30DF ミ 30EA リ
334A ㍊ SQUARE MIRIBAARU • millibar <square> 30DF ミ 30EA リ 30D0 バ 30FC ー 30EB ル
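Each squared Katakana entry carries a tagged mapping such as "<square> 30A2 ア 30D1 パ 30FC ー 30C8 ト", in which every code point is followed by its glyph. As a hedged sketch of how such a field could be read, the hypothetical helper parse_mapping below keeps only the hexadecimal code points and then compares the expansion with the result of NFKC normalization from Python's standard unicodedata module. The parsing rule (four to six uppercase hex digits) is an assumption about this listing's layout, not something the standard defines.

```python
import re
import unicodedata

HEX = re.compile(r"^[0-9A-F]{4,6}$")  # assumed shape of a code point token

def parse_mapping(field):
    """Split a chart mapping into (tag, code points), skipping the glyph tokens."""
    tokens = field.split()
    tag = ""
    if tokens and tokens[0].startswith("<"):
        tag = tokens.pop(0).strip("<>")
    codes = [int(t, 16) for t in tokens if HEX.match(t)]
    return tag, codes

tag, codes = parse_mapping("<square> 30A2 ア 30D1 パ 30FC ー 30C8 ト")
expansion = "".join(map(chr, codes))
print(tag, expansion)

# The printed mapping should agree with compatibility normalization of U+3300.
print(expansion == unicodedata.normalize("NFKC", "\u3300"))
```

NFKC is used for the comparison rather than NFKD because NFKD would further split characters such as パ into a base letter and a combining mark.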

CJK Unified Ideographs Extension A
Range: 3400–4DBF
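The ranges quoted in the chart headings delimit blocks, so a code point can be assigned to a block with a simple interval test. The sketch below uses only the two ranges that appear in this part of the charts; the table and the function name block_of are illustrative assumptions, and a position inside a block's range is not necessarily an assigned character.

```python
# (first, last, block name) for the two blocks charted in this section
BLOCKS = [
    (0x3300, 0x33FF, "CJK Compatibility"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
]

def block_of(ch):
    """Return the block name for a character, or None if it is in neither block."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None

for ch in ("\u3300", "\u3400", "A"):
    print(f"U+{ord(ch):04X}", block_of(ch))
```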

CJK Unified Ideographs Extension A, code chart 3400–34FF
Columns: 340 341 342 343 344 345 346 347 348 349 34A 34B 34C 34D 34E 34F
Row 0: 㐀㐐㐠㐰㑀㑐㑠㑰㒀㒐㒠㒰㓀㓐㓠㓰
Row 1: 㐁㐑㐡㐱㑁㑑㑡㑱㒁㒑㒡㒱㓁㓑㓡㓱
Row 2: 㐂㐒㐢㐲㑂㑒㑢㑲㒂㒒㒢㒲㓂㓒㓢㓲
Row 3: 㐃㐓㐣㐳㑃㑓㑣㑳㒃㒓㒣㒳㓃㓓㓣㓳
Row 4: 㐄㐔㐤㐴㑄㑔㑤㑴㒄㒔㒤㒴㓄㓔㓤㓴
Row 5: 㐅㐕㐥㐵㑅㑕㑥㑵㒅㒕㒥㒵㓅㓕㓥㓵
Row 6: 㐆㐖㐦㐶㑆㑖㑦㑶㒆㒖㒦㒶㓆㓖㓦㓶
Row 7: 㐇㐗㐧㐷㑇㑗㑧㑷㒇㒗㒧㒷㓇㓗㓧㓷
Row 8: 㐈㐘㐨㐸㑈㑘㑨㑸㒈㒘㒨㒸㓈㓘㓨㓸
Row 9: 㐉㐙㐩㐹㑉㑙㑩㑹㒉㒙㒩㒹㓉㓙㓩㓹
Row A: 㐊㐚㐪㐺㑊㑚㑪㑺㒊㒚㒪㒺㓊㓚㓪㓺
Row B: 㐋㐛㐫㐻㑋㑛㑫㑻㒋㒛㒫㒻㓋㓛㓫㓻
Row C: 㐌㐜㐬㐼㑌㑜㑬㑼㒌㒜㒬㒼㓌㓜㓬㓼
Row D: 㐍㐝㐭㐽㑍㑝㑭㑽㒍㒝㒭㒽㓍㓝㓭㓽
Row E: 㐎㐞㐮㐾㑎㑞㑮㑾㒎㒞㒮㒾㓎㓞㓮㓾
Row F: 㐏㐟㐯㐿㑏㑟㑯㑿㒏㒟㒯㒿㓏㓟㓯㓿

CJK Unified Ideographs Extension A, code chart 3500–35FF
Columns: 350 351 352 353 354 355 356 357 358 359 35A 35B 35C 35D 35E 35F
Row 0: 㔀㔐㔠㔰㕀㕐㕠㕰㖀㖐㖠㖰㗀㗐㗠㗰
Row 1: 㔁㔑㔡㔱㕁㕑㕡㕱㖁㖑㖡㖱㗁㗑㗡㗱
Row 2: 㔂㔒㔢㔲㕂㕒㕢㕲㖂㖒㖢㖲㗂㗒㗢㗲
Row 3: 㔃㔓㔣㔳㕃㕓㕣㕳㖃㖓㖣㖳㗃㗓㗣㗳
Row 4: 㔄㔔㔤㔴㕄㕔㕤㕴㖄㖔㖤㖴㗄㗔㗤㗴
Row 5: 㔅㔕㔥㔵㕅㕕㕥㕵㖅㖕㖥㖵㗅㗕㗥㗵
Row 6: 㔆㔖㔦㔶㕆㕖㕦㕶㖆㖖㖦㖶㗆㗖㗦㗶
Row 7: 㔇㔗㔧㔷㕇㕗㕧㕷㖇㖗㖧㖷㗇㗗㗧㗷
Row 8: 㔈㔘㔨㔸㕈㕘㕨㕸㖈㖘㖨㖸㗈㗘㗨㗸
Row 9: 㔉㔙㔩㔹㕉㕙㕩㕹㖉㖙㖩㖹㗉㗙㗩㗹
Row A: 㔊㔚㔪㔺㕊㕚㕪㕺㖊㖚㖪㖺㗊㗚㗪㗺
Row B: 㔋㔛㔫㔻㕋㕛㕫㕻㖋㖛㖫㖻㗋㗛㗫㗻
Row C: 㔌㔜㔬㔼㕌㕜㕬㕼㖌㖜㖬㖼㗌㗜㗬㗼
Row D: 㔍㔝㔭㔽㕍㕝㕭㕽㖍㖝㖭㖽㗍㗝㗭㗽
Row E: 㔎㔞㔮㔾㕎㕞㕮㕾㖎㖞㖮㖾㗎㗞㗮㗾
Row F: 㔏㔟㔯㔿㕏㕟㕯㕿㖏㖟㖯㖿㗏㗟㗯㗿

CJK Unified Ideographs Extension A, code chart 3600–36FF
Columns: 360 361 362 363 364 365 366 367 368 369 36A 36B 36C 36D 36E 36F
Row 0: 㘀㘐㘠㘰㙀㙐㙠㙰㚀㚐㚠㚰㛀㛐㛠㛰
Row 1: 㘁㘑㘡㘱㙁㙑㙡㙱㚁㚑㚡㚱㛁㛑㛡㛱
Row 2: 㘂㘒㘢㘲㙂㙒㙢㙲㚂㚒㚢㚲㛂㛒㛢㛲
Row 3: 㘃㘓㘣㘳㙃㙓㙣㙳㚃㚓㚣㚳㛃㛓㛣㛳
Row 4: 㘄㘔㘤㘴㙄㙔㙤㙴㚄㚔㚤㚴㛄㛔㛤㛴
Row 5: 㘅㘕㘥㘵㙅㙕㙥㙵㚅㚕㚥㚵㛅㛕㛥㛵
Row 6: 㘆㘖㘦㘶㙆㙖㙦㙶㚆㚖㚦㚶㛆㛖㛦㛶
Row 7: 㘇㘗㘧㘷㙇㙗㙧㙷㚇㚗㚧㚷㛇㛗㛧㛷
Row 8: 㘈㘘㘨㘸㙈㙘㙨㙸㚈㚘㚨㚸㛈㛘㛨㛸
Row 9: 㘉㘙㘩㘹㙉㙙㙩㙹㚉㚙㚩㚹㛉㛙㛩㛹
Row A: 㘊㘚㘪㘺㙊㙚㙪㙺㚊㚚㚪㚺㛊㛚㛪㛺
Row B: 㘋㘛㘫㘻㙋㙛㙫㙻㚋㚛㚫㚻㛋㛛㛫㛻
Row C: 㘌㘜㘬㘼㙌㙜㙬㙼㚌㚜㚬㚼㛌㛜㛬㛼
Row D: 㘍㘝㘭㘽㙍㙝㙭㙽㚍㚝㚭㚽㛍㛝㛭㛽
Row E: 㘎㘞㘮㘾㙎㙞㙮㙾㚎㚞㚮㚾㛎㛞㛮㛾
Row F: 㘏㘟㘯㘿㙏㙟㙯㙿㚏㚟㚯㚿㛏㛟㛯㛿

CJK Unified Ideographs Extension A, code chart 3700–37FF
Columns: 370 371 372 373 374 375 376 377 378 379 37A 37B 37C 37D 37E 37F
Row 0: 㜀㜐㜠㜰㝀㝐㝠㝰㞀㞐㞠㞰㟀㟐㟠㟰
Row 1: 㜁㜑㜡㜱㝁㝑㝡㝱㞁㞑㞡㞱㟁㟑㟡㟱
Row 2: 㜂㜒㜢㜲㝂㝒㝢㝲㞂㞒㞢㞲㟂㟒㟢㟲
Row 3: 㜃㜓㜣㜳㝃㝓㝣㝳㞃㞓㞣㞳㟃㟓㟣㟳
Row 4: 㜄㜔㜤㜴㝄㝔㝤㝴㞄㞔㞤㞴㟄㟔㟤㟴
Row 5: 㜅㜕㜥㜵㝅㝕㝥㝵㞅㞕㞥㞵㟅㟕㟥㟵
Row 6: 㜆㜖㜦㜶㝆㝖㝦㝶㞆㞖㞦㞶㟆㟖㟦㟶
Row 7: 㜇㜗㜧㜷㝇㝗㝧㝷㞇㞗㞧㞷㟇㟗㟧㟷
Row 8: 㜈㜘㜨㜸㝈㝘㝨㝸㞈㞘㞨㞸㟈㟘㟨㟸
Row 9: 㜉㜙㜩㜹㝉㝙㝩㝹㞉㞙㞩㞹㟉㟙㟩㟹
Row A: 㜊㜚㜪㜺㝊㝚㝪㝺㞊㞚㞪㞺㟊㟚㟪㟺
Row B: 㜋㜛㜫㜻㝋㝛㝫㝻㞋㞛㞫㞻㟋㟛㟫㟻
Row C: 㜌㜜㜬㜼㝌㝜㝬㝼㞌㞜㞬㞼㟌㟜㟬㟼
Row D: 㜍㜝㜭㜽㝍㝝㝭㝽㞍㞝㞭㞽㟍㟝㟭㟽
Row E: 㜎㜞㜮㜾㝎㝞㝮㝾㞎㞞㞮㞾㟎㟞㟮㟾
Row F: 㜏㜟㜯㜿㝏㝟㝯㝿㞏㞟㞯㞿㟏㟟㟯㟿

CJK Unified Ideographs Extension A, code chart 3800–38FF
Columns: 380 381 382 383 384 385 386 387 388 389 38A 38B 38C 38D 38E 38F
Row 0: 㠀㠐㠠㠰㡀㡐㡠㡰㢀㢐㢠㢰㣀㣐㣠㣰
Row 1: 㠁㠑㠡㠱㡁㡑㡡㡱㢁㢑㢡㢱㣁㣑㣡㣱
Row 2: 㠂㠒㠢㠲㡂㡒㡢㡲㢂㢒㢢㢲㣂㣒㣢㣲
Row 3: 㠃㠓㠣㠳㡃㡓㡣㡳㢃㢓㢣㢳㣃㣓㣣㣳
Row 4: 㠄㠔㠤㠴㡄㡔㡤㡴㢄㢔㢤㢴㣄㣔㣤㣴
Row 5: 㠅㠕㠥㠵㡅㡕㡥㡵㢅㢕㢥㢵㣅㣕㣥㣵
Row 6: 㠆㠖㠦㠶㡆㡖㡦㡶㢆㢖㢦㢶㣆㣖㣦㣶
Row 7: 㠇㠗㠧㠷㡇㡗㡧㡷㢇㢗㢧㢷㣇㣗㣧㣷
Row 8: 㠈㠘㠨㠸㡈㡘㡨㡸㢈㢘㢨㢸㣈㣘㣨㣸
Row 9: 㠉㠙㠩㠹㡉㡙㡩㡹㢉㢙㢩㢹㣉㣙㣩㣹
Row A: 㠊㠚㠪㠺㡊㡚㡪㡺㢊㢚㢪㢺㣊㣚㣪㣺
Row B: 㠋㠛㠫㠻㡋㡛㡫㡻㢋㢛㢫㢻㣋㣛㣫㣻
Row C: 㠌㠜㠬㠼㡌㡜㡬㡼㢌㢜㢬㢼㣌㣜㣬㣼
Row D: 㠍㠝㠭㠽㡍㡝㡭㡽㢍㢝㢭㢽㣍㣝㣭㣽
Row E: 㠎㠞㠮㠾㡎㡞㡮㡾㢎㢞㢮㢾㣎㣞㣮㣾
Row F: 㠏㠟㠯㠿㡏㡟㡯㡿㢏㢟㢯㢿㣏㣟㣯㣿

CJK Unified Ideographs Extension A, code chart 3900–39FF
Columns: 390 391 392 393 394 395 396 397 398 399 39A 39B 39C 39D 39E 39F
Row 0: 㤀㤐㤠㤰㥀㥐㥠㥰㦀㦐㦠㦰㧀㧐㧠㧰
Row 1: 㤁㤑㤡㤱㥁㥑㥡㥱㦁㦑㦡㦱㧁㧑㧡㧱
Row 2: 㤂㤒㤢㤲㥂㥒㥢㥲㦂㦒㦢㦲㧂㧒㧢㧲
Row 3: 㤃㤓㤣㤳㥃㥓㥣㥳㦃㦓㦣㦳㧃㧓㧣㧳
Row 4: 㤄㤔㤤㤴㥄㥔㥤㥴㦄㦔㦤㦴㧄㧔㧤㧴
Row 5: 㤅㤕㤥㤵㥅㥕㥥㥵㦅㦕㦥㦵㧅㧕㧥㧵
Row 6: 㤆㤖㤦㤶㥆㥖㥦㥶㦆㦖㦦㦶㧆㧖㧦㧶
Row 7: 㤇㤗㤧㤷㥇㥗㥧㥷㦇㦗㦧㦷㧇㧗㧧㧷
Row 8: 㤈㤘㤨㤸㥈㥘㥨㥸㦈㦘㦨㦸㧈㧘㧨㧸
Row 9: 㤉㤙㤩㤹㥉㥙㥩㥹㦉㦙㦩㦹㧉㧙㧩㧹
Row A: 㤊㤚㤪㤺㥊㥚㥪㥺㦊㦚㦪㦺㧊㧚㧪㧺
Row B: 㤋㤛㤫㤻㥋㥛㥫㥻㦋㦛㦫㦻㧋㧛㧫㧻
Row C: 㤌㤜㤬㤼㥌㥜㥬㥼㦌㦜㦬㦼㧌㧜㧬㧼
Row D: 㤍㤝㤭㤽㥍㥝㥭㥽㦍㦝㦭㦽㧍㧝㧭㧽
Row E: 㤎㤞㤮㤾㥎㥞㥮㥾㦎㦞㦮㦾㧎㧞㧮㧾
Row F: 㤏㤟㤯㤿㥏㥟㥯㥿㦏㦟㦯㦿㧏㧟㧯㧿

CJK Unified Ideographs Extension A, code chart 3A00–3AFF
Columns: 3A0 3A1 3A2 3A3 3A4 3A5 3A6 3A7 3A8 3A9 3AA 3AB 3AC 3AD 3AE 3AF
Row 0: 㨀㨐㨠㨰㩀㩐㩠㩰㪀㪐㪠㪰㫀㫐㫠㫰
Row 1: 㨁㨑㨡㨱㩁㩑㩡㩱㪁㪑㪡㪱㫁㫑㫡㫱
Row 2: 㨂㨒㨢㨲㩂㩒㩢㩲㪂㪒㪢㪲㫂㫒㫢㫲
Row 3: 㨃㨓㨣㨳㩃㩓㩣㩳㪃㪓㪣㪳㫃㫓㫣㫳
Row 4: 㨄㨔㨤㨴㩄㩔㩤㩴㪄㪔㪤㪴㫄㫔㫤㫴
Row 5: 㨅㨕㨥㨵㩅㩕㩥㩵㪅㪕㪥㪵㫅㫕㫥㫵
Row 6: 㨆㨖㨦㨶㩆㩖㩦㩶㪆㪖㪦㪶㫆㫖㫦㫶
Row 7: 㨇㨗㨧㨷㩇㩗㩧㩷㪇㪗㪧㪷㫇㫗㫧㫷
Row 8: 㨈㨘㨨㨸㩈㩘㩨㩸㪈㪘㪨㪸㫈㫘㫨㫸
Row 9: 㨉㨙㨩㨹㩉㩙㩩㩹㪉㪙㪩㪹㫉㫙㫩㫹
Row A: 㨊㨚㨪㨺㩊㩚㩪㩺㪊㪚㪪㪺㫊㫚㫪㫺
Row B: 㨋㨛㨫㨻㩋㩛㩫㩻㪋㪛㪫㪻㫋㫛㫫㫻
Row C: 㨌㨜㨬㨼㩌㩜㩬㩼㪌㪜㪬㪼㫌㫜㫬㫼
Row D: 㨍㨝㨭㨽㩍㩝㩭㩽㪍㪝㪭㪽㫍㫝㫭㫽
Row E: 㨎㨞㨮㨾㩎㩞㩮㩾㪎㪞㪮㪾㫎㫞㫮㫾
Row F: 㨏㨟㨯㨿㩏㩟㩯㩿㪏㪟㪯㪿㫏㫟㫯㫿

CJK Unified Ideographs Extension A, code chart 3B00–3BFF
Columns: 3B0 3B1 3B2 3B3 3B4 3B5 3B6 3B7 3B8 3B9 3BA 3BB 3BC 3BD 3BE 3BF
Row 0: 㬀㬐㬠㬰㭀㭐㭠㭰㮀㮐㮠㮰㯀㯐㯠㯰
Row 1: 㬁㬑㬡㬱㭁㭑㭡㭱㮁㮑㮡㮱㯁㯑㯡㯱
Row 2: 㬂㬒㬢㬲㭂㭒㭢㭲㮂㮒㮢㮲㯂㯒㯢㯲
Row 3: 㬃㬓㬣㬳㭃㭓㭣㭳㮃㮓㮣㮳㯃㯓㯣㯳
Row 4: 㬄㬔㬤㬴㭄㭔㭤㭴㮄㮔㮤㮴㯄㯔㯤㯴
Row 5: 㬅㬕㬥㬵㭅㭕㭥㭵㮅㮕㮥㮵㯅㯕㯥㯵
Row 6: 㬆㬖㬦㬶㭆㭖㭦㭶㮆㮖㮦㮶㯆㯖㯦㯶
Row 7: 㬇㬗㬧㬷㭇㭗㭧㭷㮇㮗㮧㮷㯇㯗㯧㯷
Row 8: 㬈㬘㬨㬸㭈㭘㭨㭸㮈㮘㮨㮸㯈㯘㯨㯸
Row 9: 㬉㬙㬩㬹㭉㭙㭩㭹㮉㮙㮩㮹㯉㯙㯩㯹
Row A: 㬊㬚㬪㬺㭊㭚㭪㭺㮊㮚㮪㮺㯊㯚㯪㯺
Row B: 㬋㬛㬫㬻㭋㭛㭫㭻㮋㮛㮫㮻㯋㯛㯫㯻
Row C: 㬌㬜㬬㬼㭌㭜㭬㭼㮌㮜㮬㮼㯌㯜㯬㯼
Row D: 㬍㬝㬭㬽㭍㭝㭭㭽㮍㮝㮭㮽㯍㯝㯭㯽
Row E: 㬎㬞㬮㬾㭎㭞㭮㭾㮎㮞㮮㮾㯎㯞㯮㯾
Row F: 㬏㬟㬯㬿㭏㭟㭯㭿㮏㮟㮯㮿㯏㯟㯯㯿

CJK Unified Ideographs Extension A, code chart 3C00–3CFF
Columns: 3C0 3C1 3C2 3C3 3C4 3C5 3C6 3C7 3C8 3C9 3CA 3CB 3CC 3CD 3CE 3CF
Row 0: 㰀㰐㰠㰰㱀㱐㱠㱰㲀㲐㲠㲰㳀㳐㳠㳰
Row 1: 㰁㰑㰡㰱㱁㱑㱡㱱㲁㲑㲡㲱㳁㳑㳡㳱
Row 2: 㰂㰒㰢㰲㱂㱒㱢㱲㲂㲒㲢㲲㳂㳒㳢㳲
Row 3: 㰃㰓㰣㰳㱃㱓㱣㱳㲃㲓㲣㲳㳃㳓㳣㳳
Row 4: 㰄㰔㰤㰴㱄㱔㱤㱴㲄㲔㲤㲴㳄㳔㳤㳴
Row 5: 㰅㰕㰥㰵㱅㱕㱥㱵㲅㲕㲥㲵㳅㳕㳥㳵
Row 6: 㰆㰖㰦㰶㱆㱖㱦㱶㲆㲖㲦㲶㳆㳖㳦㳶
Row 7: 㰇㰗㰧㰷㱇㱗㱧㱷㲇㲗㲧㲷㳇㳗㳧㳷
Row 8: 㰈㰘㰨㰸㱈㱘㱨㱸㲈㲘㲨㲸㳈㳘㳨㳸
Row 9: 㰉㰙㰩㰹㱉㱙㱩㱹㲉㲙㲩㲹㳉㳙㳩㳹
Row A: 㰊㰚㰪㰺㱊㱚㱪㱺㲊㲚㲪㲺㳊㳚㳪㳺
Row B: 㰋㰛㰫㰻㱋㱛㱫㱻㲋㲛㲫㲻㳋㳛㳫㳻
Row C: 㰌㰜㰬㰼㱌㱜㱬㱼㲌㲜㲬㲼㳌㳜㳬㳼
Row D: 㰍㰝㰭㰽㱍㱝㱭㱽㲍㲝㲭㲽㳍㳝㳭㳽
Row E: 㰎㰞㰮㰾㱎㱞㱮㱾㲎㲞㲮㲾㳎㳞㳮㳾
Row F: 㰏㰟㰯㰿㱏㱟㱯㱿㲏㲟㲯㲿㳏㳟㳯㳿

3D00  CJK Unified Ideographs Extension A  3DFF

  3D0 3D1 3D2 3D3 3D4 3D5 3D6 3D7 3D8 3D9 3DA 3DB 3DC 3DD 3DE 3DF
0 㴀㴐㴠㴰㵀㵐㵠㵰㶀㶐㶠㶰㷀㷐㷠㷰
1 㴁㴑㴡㴱㵁㵑㵡㵱㶁㶑㶡㶱㷁㷑㷡㷱
2 㴂㴒㴢㴲㵂㵒㵢㵲㶂㶒㶢㶲㷂㷒㷢㷲
3 㴃㴓㴣㴳㵃㵓㵣㵳㶃㶓㶣㶳㷃㷓㷣㷳
4 㴄㴔㴤㴴㵄㵔㵤㵴㶄㶔㶤㶴㷄㷔㷤㷴
5 㴅㴕㴥㴵㵅㵕㵥㵵㶅㶕㶥㶵㷅㷕㷥㷵
6 㴆㴖㴦㴶㵆㵖㵦㵶㶆㶖㶦㶶㷆㷖㷦㷶
7 㴇㴗㴧㴷㵇㵗㵧㵷㶇㶗㶧㶷㷇㷗㷧㷷
8 㴈㴘㴨㴸㵈㵘㵨㵸㶈㶘㶨㶸㷈㷘㷨㷸
9 㴉㴙㴩㴹㵉㵙㵩㵹㶉㶙㶩㶹㷉㷙㷩㷹
A 㴊㴚㴪㴺㵊㵚㵪㵺㶊㶚㶪㶺㷊㷚㷪㷺
B 㴋㴛㴫㴻㵋㵛㵫㵻㶋㶛㶫㶻㷋㷛㷫㷻
C 㴌㴜㴬㴼㵌㵜㵬㵼㶌㶜㶬㶼㷌㷜㷬㷼
D 㴍㴝㴭㴽㵍㵝㵭㵽㶍㶝㶭㶽㷍㷝㷭㷽
E 㴎㴞㴮㴾㵎㵞㵮㵾㶎㶞㶮㶾㷎㷞㷮㷾
F 㴏㴟㴯㴿㵏㵟㵯㵿㶏㶟㶯㶿㷏㷟㷯㷿

3E00  CJK Unified Ideographs Extension A  3EFF

  3E0 3E1 3E2 3E3 3E4 3E5 3E6 3E7 3E8 3E9 3EA 3EB 3EC 3ED 3EE 3EF
0 㸀㸐㸠㸰㹀㹐㹠㹰㺀㺐㺠㺰㻀㻐㻠㻰
1 㸁㸑㸡㸱㹁㹑㹡㹱㺁㺑㺡㺱㻁㻑㻡㻱
2 㸂㸒㸢㸲㹂㹒㹢㹲㺂㺒㺢㺲㻂㻒㻢㻲
3 㸃㸓㸣㸳㹃㹓㹣㹳㺃㺓㺣㺳㻃㻓㻣㻳
4 㸄㸔㸤㸴㹄㹔㹤㹴㺄㺔㺤㺴㻄㻔㻤㻴
5 㸅㸕㸥㸵㹅㹕㹥㹵㺅㺕㺥㺵㻅㻕㻥㻵
6 㸆㸖㸦㸶㹆㹖㹦㹶㺆㺖㺦㺶㻆㻖㻦㻶
7 㸇㸗㸧㸷㹇㹗㹧㹷㺇㺗㺧㺷㻇㻗㻧㻷
8 㸈㸘㸨㸸㹈㹘㹨㹸㺈㺘㺨㺸㻈㻘㻨㻸
9 㸉㸙㸩㸹㹉㹙㹩㹹㺉㺙㺩㺹㻉㻙㻩㻹
A 㸊㸚㸪㸺㹊㹚㹪㹺㺊㺚㺪㺺㻊㻚㻪㻺
B 㸋㸛㸫㸻㹋㹛㹫㹻㺋㺛㺫㺻㻋㻛㻫㻻
C 㸌㸜㸬㸼㹌㹜㹬㹼㺌㺜㺬㺼㻌㻜㻬㻼
D 㸍㸝㸭㸽㹍㹝㹭㹽㺍㺝㺭㺽㻍㻝㻭㻽
E 㸎㸞㸮㸾㹎㹞㹮㹾㺎㺞㺮㺾㻎㻞㻮㻾
F 㸏㸟㸯㸿㹏㹟㹯㹿㺏㺟㺯㺿㻏㻟㻯㻿

3F00  CJK Unified Ideographs Extension A  3FFF

  3F0 3F1 3F2 3F3 3F4 3F5 3F6 3F7 3F8 3F9 3FA 3FB 3FC 3FD 3FE 3FF
0 㼀㼐㼠㼰㽀㽐㽠㽰㾀㾐㾠㾰㿀㿐㿠㿰
1 㼁㼑㼡㼱㽁㽑㽡㽱㾁㾑㾡㾱㿁㿑㿡㿱
2 㼂㼒㼢㼲㽂㽒㽢㽲㾂㾒㾢㾲㿂㿒㿢㿲
3 㼃㼓㼣㼳㽃㽓㽣㽳㾃㾓㾣㾳㿃㿓㿣㿳
4 㼄㼔㼤㼴㽄㽔㽤㽴㾄㾔㾤㾴㿄㿔㿤㿴
5 㼅㼕㼥㼵㽅㽕㽥㽵㾅㾕㾥㾵㿅㿕㿥㿵
6 㼆㼖㼦㼶㽆㽖㽦㽶㾆㾖㾦㾶㿆㿖㿦㿶
7 㼇㼗㼧㼷㽇㽗㽧㽷㾇㾗㾧㾷㿇㿗㿧㿷
8 㼈㼘㼨㼸㽈㽘㽨㽸㾈㾘㾨㾸㿈㿘㿨㿸
9 㼉㼙㼩㼹㽉㽙㽩㽹㾉㾙㾩㾹㿉㿙㿩㿹
A 㼊㼚㼪㼺㽊㽚㽪㽺㾊㾚㾪㾺㿊㿚㿪㿺
B 㼋㼛㼫㼻㽋㽛㽫㽻㾋㾛㾫㾻㿋㿛㿫㿻
C 㼌㼜㼬㼼㽌㽜㽬㽼㾌㾜㾬㾼㿌㿜㿬㿼
D 㼍㼝㼭㼽㽍㽝㽭㽽㾍㾝㾭㾽㿍㿝㿭㿽
E 㼎㼞㼮㼾㽎㽞㽮㽾㾎㾞㾮㾾㿎㿞㿮㿾
F 㼏㼟㼯㼿㽏㽟㽯㽿㾏㾟㾯㾿㿏㿟㿯㿿

Yijing Hexagram Symbols Range: 4DC0–4DFF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html. Copyright © 1991-2006 Unicode, Inc. All rights reserved.

CJK Unified Ideographs Range: 4E00–9FBF

4E00  CJK Unified Ideographs  4EFF

  4E0 4E1 4E2 4E3 4E4 4E5 4E6 4E7 4E8 4E9 4EA 4EB 4EC 4ED 4EE 4EF
0 一丐丠丰乀乐习买亀亐亠亰什仐仠仰
1 丁丑両丱乁乑乡乱亁云亡亱仁仑仡仱
2 丂丒丢串乂乒乢乲亂互亢亲仂仒仢仲
3 七专丣丳乃乓乣乳亃亓亣亳仃仓代仳
4 丄且两临乄乔乤乴亄五交亴仄仔令仴
5 丅丕严丵久乕乥乵亅井亥亵仅仕以仵
6 丆世並丶乆乖书乶了亖亦亶仆他仦件
7 万丗丧丷乇乗乧乷亇亗产亷仇仗仧价
8 丈丘丨丸么乘乨乸予亘亨亸仈付仨仸
9 三丙丩丹义乙乩乹争亙亩亹仉仙仩仹
A 上业个为乊乚乪乺亊亚亪人今仚仪仺
B 下丛丫主之乛乫乻事些享亻介仛仫任
C 丌东丬丼乌乜乬乼二亜京亼仌仜们仼
D 不丝中丽乍九乭乽亍亝亭亽仍仝仭份
E 与丞丮举乎乞乮乾于亞亮亾从仞仮仾
F 丏丟丯丿乏也乯乿亏亟亯亿仏仟仯仿

4F00  CJK Unified Ideographs  4FFF

  4F0 4F1 4F2 4F3 4F4 4F5 4F6 4F7 4F8 4F9 4FA 4FB 4FC 4FD 4FE 4FF
0 伀伐传估佀佐你佰侀侐侠侰俀俐俠俰
1 企休伡伱佁佑佡佱侁侑価侱俁俑信俱
2 伂伒伢伲佂佒佢佲侂侒侢侲係俒俢俲
3 伃伓伣伳佃体佣佳侃侓侣侳促俓俣俳
4 伄伔伤伴佄佔佤佴侄侔侤侴俄俔俤俴
5 伅伕伥伵佅何佥併侅侕侥侵俅俕俥俵
6 伆伖伦伶但佖佦佶來侖侦侶俆俖俦俶
7 伇众伧伷佇佗佧佷侇侗侧侷俇俗俧俷
8 伈优伨伸佈佘佨佸侈侘侨侸俈俘俨俸
9 伉伙伩伹佉余佩佹侉侙侩侹俉俙俩俹
A 伊会伪伺佊佚佪佺侊侚侪侺俊俚俪俺
B 伋伛伫伻佋佛佫佻例供侫侻俋俛俫俻
C 伌伜伬似佌作佬佼侌侜侬侼俌俜俬俼
D 伍伝伭伽位佝佭佽侍依侭侽俍保俭俽
E 伎伞伮伾低佞佮佾侎侞侮侾俎俞修俾
F 伏伟伯伿住佟佯使侏侟侯便俏俟俯俿

5000  CJK Unified Ideographs  50FF

  500 501 502 503 504 505 506 507 508 509 50A 50B 50C 50D 50E 50F
0 倀倐倠倰偀偐偠偰傀傐傠傰僀僐僠僰
1 倁們倡倱偁偑偡偱傁傑傡傱僁僑僡僱
2 倂倒倢倲偂偒偢偲傂傒傢傲僂僒僢僲
3 倃倓倣倳偃偓偣偳傃傓傣傳僃僓僣僳
4 倄倔値倴偄偔偤側傄傔傤傴僄僔僤僴
5 倅倕倥倵偅偕健偵傅傕傥債僅僕僥僵
6 倆倖倦倶偆偖偦偶傆傖傦傶僆僖僦僶
7 倇倗倧倷假偗偧偷傇傗傧傷僇僗僧僷
8 倈倘倨倸偈偘偨偸傈傘储傸僈僘僨僸
9 倉候倩倹偉偙偩偹傉備傩傹僉僙僩價
A 倊倚倪债偊做偪偺傊傚傪傺僊僚僪僺
B 個倛倫倻偋偛偫偻傋傛傫傻僋僛僫僻
C 倌倜倬值偌停偬偼傌傜催傼僌僜僬僼
D 倍倝倭倽偍偝偭偽傍傝傭傽働僝僭僽
E 倎倞倮倾偎偞偮偾傎傞傮傾僎僞僮僾
F 倏借倯倿偏偟偯偿傏傟傯傿像僟僯僿

5100  CJK Unified Ideographs  51FF

  510 511 512 513 514 515 516 517 518 519 51A 51B 51C 51D 51E 51F
0 儀儐儠儰兀児兠兰冀冐冠冰净凐几凰
1 儁儑儡儱允兑兡共冁冑冡冱凁凑凡凱
2 儂儒儢儲兂兒兢兲冂冒冢冲凂凒凢凲
3 儃儓儣儳元兓兣关冃冓冣决凃凓凣凳
4 億儔儤儴兄兔兤兴冄冔冤冴凄凔凤凴
5 儅儕儥儵充兕入兵内冕冥况凅凕凥凵
6 儆儖儦儶兆兖兦其円冖冦冶准凖処凶
7 儇儗儧儷兇兗內具冇冗冧冷凇凗凧凷
8 儈儘儨儸先兘全典冈冘冨冸凈凘凨凸
9 儉儙儩儹光兙兩兹冉写冩冹凉凙凩凹
A 儊儚優儺兊党兪兺冊冚冪冺凊凚凪出
B 儋儛儫儻克兛八养冋军冫冻凋凛凫击
C 儌儜儬儼兌兜公兼册农冬冼凌凜凬凼
D 儍儝儭儽免兝六兽再冝冭冽凍凝凭函
E 儎儞儮儾兎兞兮兾冎冞冮冾凎凞凮凾
F 儏償儯儿兏兟兯兿冏冟冯冿减凟凯凿

5200  CJK Unified Ideographs  52FF

  520 521 522 523 524 525 526 527 528 529 52A 52B 52C 52D 52E 52F
0 刀刐删到剀剐剠剰劀劐加劰勀勐勠勰
1 刁刑刡刱剁剑剡剱劁劑务励勁勑勡勱
2 刂划刢刲剂剒剢割劂劒劢劲勂勒勢勲
3 刃刓刣刳剃剓剣剳劃劓劣劳勃勓勣勳
4 刄刔判刴剄剔剤剴劄劔劤労勄勔勤勴
5 刅刕別刵剅剕剥創劅劕劥劵勅動勥勵
6 分刖刦制剆剖剦剶劆劖劦劶勆勖勦勶
7 切列刧刷則剗剧剷劇劗劧劷勇勗勧勷
8 刈刘刨券剈剘剨剸劈劘动劸勈勘勨勸
9 刉则利刹剉剙剩剹劉劙助効勉務勩勹
A 刊刚刪刺削剚剪剺劊劚努劺勊勚勪勺
B 刋创别刻剋剛剫剻劋力劫劻勋勛勫勻
C 刌刜刬刼剌剜剬剼劌劜劬劼勌勜勬勼
D 刍初刭刽前剝剭剽劍劝劭劽勍勝勭勽
E 刎刞刮刾剎剞剮剾劎办劮劾勎勞勮勾
F 刏刟刯刿剏剟副剿劏功劯势勏募勯勿

5300  CJK Unified Ideographs  53FF

  530 531 532 533 534 535 536 537 538 539 53A 53B 53C 53D 53E 53F
0 匀匐匠匰區卐占印厀厐厠厰叀叐叠台
1 匁匑匡匱十卑卡危厁厑厡厱叁发叡叱
2 匂匒匢匲卂卒卢卲厂厒厢厲参叒叢史
3 匃匓匣匳千卓卣即厃厓厣厳參叓口右
4 匄匔匤匴卄協卤却厄厔厤厴叄叔古叴
5 包匕匥匵卅单卥卵厅厕厥厵叅叕句叵
6 匆化匦匶卆卖卦卶历厖厦厶叆取另叶
7 匇北匧匷升南卧卷厇厗厧厷叇受叧号
8 匈匘匨匸午単卨卸厈厘厨厸又变叨司
9 匉匙匩匹卉卙卩卹厉厙厩厹叉叙叩叹
A 匊匚匪区半博卪卺厊厚厪厺及叚只叺
B 匋匛匫医卋卛卫卻压厛厫去友叛叫叻
C 匌匜匬匼卌卜卬卼厌厜厬厼双叜召叼
D 匍匝匭匽卍卝卭卽厍厝厭厽反叝叭叽
E 匎匞匮匾华卞卮卾厎厞厮厾収叞叮叾
F 匏匟匯匿协卟卯卿厏原厯县叏叟可叿

5400  CJK Unified Ideographs  54FF

  540 541 542 543 544 545 546 547 548 549 54A 54B 54C 54D 54E 54F
0 吀吐吠吰呀呐呠呰咀咐咠咰哀哐哠哰
1 吁向吡吱呁呑呡呱咁咑咡咱品哑員哱
2 吂吒吢吲呂呒呢呲咂咒咢咲哂哒哢哲
3 吃吓吣吳呃呓呣味咃咓咣咳哃哓哣哳
4 各吔吤吴呄呔呤呴咄咔咤咴哄哔哤哴
5 吅吕吥吵呅呕呥呵咅咕咥咵哅哕哥哵
6 吆吖否吶呆呖呦呶咆咖咦咶哆哖哦哶
7 吇吗吧吷呇呗呧呷咇咗咧咷哇哗哧哷
8 合吘吨吸呈员周呸咈咘咨咸哈哘哨哸
9 吉吙吩吹呉呙呩呹咉咙咩咹哉哙哩哹
A 吊吚吪吺告呚呪呺咊咚咪咺哊哚哪哺
B 吋君含吻呋呛呫呻咋咛咫咻哋哛哫哻
C 同吜听吼呌呜呬呼和咜咬咼哌哜哬哼
D 名吝吭吽呍呝呭命咍咝咭咽响哝哭哽
E 后吞吮吾呎呞呮呾咎咞咮咾哎哞哮哾
F 吏吟启吿呏呟呯呿咏咟咯咿哏哟哯哿

5500  CJK Unified Ideographs  55FF

  550 551 552 553 554 555 556 557 558 559 55A 55B 55C 55D 55E 55F
0 唀唐唠唰啀啐啠啰喀喐喠喰嗀嗐嗠嗰
1 唁唑唡唱啁啑啡啱喁喑喡喱嗁嗑嗡嗱
2 唂唒唢唲啂啒啢啲喂喒喢喲嗂嗒嗢嗲
3 唃唓唣唳啃啓啣啳喃喓喣喳嗃嗓嗣嗳
4 唄唔唤唴啄啔啤啴善喔喤喴嗄嗔嗤嗴
5 唅唕唥唵啅啕啥啵喅喕喥喵嗅嗕嗥嗵
6 唆唖唦唶商啖啦啶喆喖喦営嗆嗖嗦嗶
7 唇唗唧唷啇啗啧啷喇喗喧喷嗇嗗嗧嗷
8 唈唘唨唸啈啘啨啸喈喘喨喸嗈嗘嗨嗸
9 唉唙唩唹啉啙啩啹喉喙喩喹嗉嗙嗩嗹
A 唊唚唪唺啊啚啪啺喊喚喪喺嗊嗚嗪嗺
B 唋唛唫唻啋啛啫啻喋喛喫喻嗋嗛嗫嗻
C 唌唜唬唼啌啜啬啼喌喜喬喼嗌嗜嗬嗼
D 唍唝唭唽啍啝啭啽喍喝喭喽嗍嗝嗭嗽
E 唎唞售唾啎啞啮啾喎喞單喾嗎嗞嗮嗾
F 唏唟唯唿問啟啯啿喏喟喯喿嗏嗟嗯嗿

5600  CJK Unified Ideographs  56FF

  560 561 562 563 564 565 566 567 568 569 56A 56B 56C 56D 56E 56F
0 嘀嘐嘠嘰噀噐噠噰嚀嚐嚠嚰囀囐因困
1 嘁嘑嘡嘱噁噑噡噱嚁嚑嚡嚱囁囑囡囱
2 嘂嘒嘢嘲噂噒噢噲嚂嚒嚢嚲囂囒团囲
3 嘃嘓嘣嘳噃噓噣噳嚃嚓嚣嚳囃囓団図
4 嘄嘔嘤嘴噄噔噤噴嚄嚔嚤嚴囄囔囤围
5 嘅嘕嘥嘵噅噕噥噵嚅嚕嚥嚵囅囕囥囵
6 嘆嘖嘦嘶噆噖噦噶嚆嚖嚦嚶囆囖囦囶
7 嘇嘗嘧嘷噇噗噧噷嚇嚗嚧嚷囇囗囧囷
8 嘈嘘嘨嘸噈噘器噸嚈嚘嚨嚸囈囘囨囸
9 嘉嘙嘩嘹噉噙噩噹嚉嚙嚩嚹囉囙囩囹
A 嘊嘚嘪嘺噊噚噪噺嚊嚚嚪嚺囊囚囪固
B 嘋嘛嘫嘻噋噛噫噻嚋嚛嚫嚻囋四囫囻
C 嘌嘜嘬嘼噌噜噬噼嚌嚜嚬嚼囌囜囬囼
D 嘍嘝嘭嘽噍噝噭噽嚍嚝嚭嚽囍囝园国
E 嘎嘞嘮嘾噎噞噮噾嚎嚞嚮嚾囎回囮图
F 嘏嘟嘯嘿噏噟噯噿嚏嚟嚯嚿囏囟囯囿

5700  CJK Unified Ideographs  57FF

  570 571 572 573 574 575 576 577 578 579 57A 57B 57C 57D 57E 57F
0 圀圐圠地址坐坠坰垀垐垠垰埀埐埠埰
1 圁圑圡圱坁坑坡坱垁垑垡垱埁埑埡埱
2 圂園圢圲坂坒坢坲垂垒垢垲埂埒埢埲
3 圃圓圣圳坃坓坣坳垃垓垣垳埃埓埣埳
4 圄圔圤圴坄坔坤坴垄垔垤垴埄埔埤埴
5 圅圕圥圵坅坕坥坵垅垕垥垵埅埕埥埵
6 圆圖圦圶坆坖坦坶垆垖垦垶埆埖埦埶
7 圇圗圧圷均块坧坷垇垗垧垷埇埗埧執
8 圈團在圸坈坘坨坸垈垘垨垸埈埘埨埸
9 圉圙圩圹坉坙坩坹垉垙垩垹埉埙埩培
A 圊圚圪场坊坚坪坺垊垚垪垺埊埚埪基
B 國圛圫圻坋坛坫坻型垛垫垻埋埛埫埻
C 圌圜圬圼坌坜坬坼垌垜垬垼埌埜埬埼
D 圍圝圭圽坍坝坭坽垍垝垭垽埍埝埭埽
E 圎圞圮圾坎坞坮坾垎垞垮垾城埞埮埾
F 圏土圯圿坏坟坯坿垏垟垯垿埏域埯埿

5800 580

5800

582

583

584

585

586

587

588

589

58A 58B 58C 58D 58E

58F

5810

5820

5830

5840

5850

5860

5870

5880

5890

58A0

58B0

58C0

58D0

58E0

58F0

堁堑堡報塁塑塡塱墁墑墡墱壁壑壡壱

1

5801

5811

5821

5831

5841

5851

5861

5871

5881

5891

58A1

58B1

58C1

58D1

58E1

58F1

堂堒堢堲塂塒塢塲墂墒墢墲壂壒壢売

2

5802

5812

5822

5832

5842

5852

5862

5872

5882

5892

58A2

58B2

58C2

58D2

58E2

58F2

堃堓堣堳塃塓塣塳境墓墣墳壃壓壣壳

3

5803

5813

5823

5833

5843

5853

5863

5873

5883

5893

58A3

58B3

58C3

58D3

58E3

58F3

堄堔堤場塄塔塤塴墄墔墤墴壄壔壤壴

4

5804

5814

5824

5834

5844

5854

5864

5874

5884

5894

58A4

58B4

58C4

58D4

58E4

58F4

堅堕堥堵塅塕塥塵墅墕墥墵壅壕壥壵

5

5805

5815

5825

5835

5845

5855

5865

5875

5885

5895

58A5

58B5

58C5

58D5

58E5

58F5

堆堖堦堶塆塖塦塶墆墖墦墶壆壖壦壶

6

5806

5816

5826

5836

5846

5856

5866

5876

5886

5896

58A6

58B6

58C6

58D6

58E6

58F6

堇堗堧堷塇塗塧塷墇増墧墷壇壗壧壷

7

5807

5817

5827

5837

5847

5857

5867

5877

5887

5897

58A7

58B7

58C7

58D7

58E7

58F7

堈堘堨堸塈塘塨塸墈墘墨墸壈壘壨壸

8

5808

5818

5828

5838

5848

5858

5868

5878

5888

5898

58A8

58B8

58C8

58D8

58E8

58F8

堉堙堩堹塉塙塩塹墉墙墩墹壉壙壩壹

9

5809

5819

5829

5839

5849

5859

5869

5879

5889

5899

58A9

58B9

58C9

58D9

58E9

58F9

堊堚堪堺塊塚塪塺墊墚墪墺壊壚壪壺

A

580A

581A

582A

583A

584A

585A

586A

587A

588A

589A

58AA

58BA

58CA

58DA

58EA

58FA

堋堛堫堻塋塛填塻墋墛墫墻壋壛士壻

B

580B

C

D

581B

582B

583B

584B

585B

586B

587B

588B

589B

58AB

58BB

58CB

58DB

58EB

58FB

堌堜堬堼塌塜塬塼墌墜墬墼壌壜壬壼 580C

581C

582C

583C

584C

585C

586C

587C

588C

589C

58AC

58BC

58CC

58DC

58EC

58FC

堍堝堭堽塍塝塭塽墍墝墭墽壍壝壭壽 580D

581D

582D

583D

584D

585D

586D

587D

588D

589D

58AD

58BD

58CD

58DD

58ED

58FD

堎堞堮堾塎塞塮塾墎增墮墾壎壞壮壾 580E

F

581

58FF

堀堐堠堰塀塐塠塰墀墐墠墰壀壐壠声

0

E

581E

582E

583E

584E

585E

586E

587E

588E

589E

58AE

58BE

58CE

58DE

58EE

58FE

堏堟堯堿塏塟塯塿墏墟墯墿壏壟壯壿 580F

581F

582F

583F

584F

585F

586F

587F

588F

589F

58AF

58BF

58CF

58DF

58EF

58FF

5900 590

5900

592

593

594

595

596

597

598

599

59A 59B 59C 59D 59E

59F

5910

5920

5930

5940

5950

5960

5970

5980

5990

59A0

59B0

59C0

59D0

59E0

59F0

夁夑夡失奁契奡奱妁妑妡妱姁姑姡姱

1

5901

5911

5921

5931

5941

5951

5961

5971

5981

5991

59A1

59B1

59C1

59D1

59E1

59F1

夂夒夢夲奂奒奢奲如妒妢妲姂姒姢姲

2

5902

5912

5922

5932

5942

5952

5962

5972

5982

5992

59A2

59B2

59C2

59D2

59E2

59F2

夃夓夣夳奃奓奣女妃妓妣妳姃姓姣姳

3

5903

5913

5923

5933

5943

5953

5963

5973

5983

5993

59A3

59B3

59C3

59D3

59E3

59F3

处夔夤头奄奔奤奴妄妔妤妴姄委姤姴

4

5904

5914

5924

5934

5944

5954

5964

5974

5984

5994

59A4

59B4

59C4

59D4

59E4

59F4

夅夕夥夵奅奕奥奵妅妕妥妵姅姕姥姵

5

5905

5915

5925

5935

5945

5955

5965

5975

5985

5995

59A5

59B5

59C5

59D5

59E5

59F5

夆外夦夶奆奖奦奶妆妖妦妶姆姖姦姶

6

5906

5916

5926

5936

5946

5956

5966

5976

5986

5996

59A6

59B6

59C6

59D6

59E6

59F6

备夗大夷奇套奧奷妇妗妧妷姇姗姧姷

7

5907

5917

5927

5937

5947

5957

5967

5977

5987

5997

59A7

59B7

59C7

59D7

59E7

59F7

夈夘夨夸奈奘奨奸妈妘妨妸姈姘姨姸

8

5908

5918

5928

5938

5948

5958

5968

5978

5988

5998

59A8

59B8

59C8

59D8

59E8

59F8

変夙天夹奉奙奩她妉妙妩妹姉姙姩姹

9

5909

5919

5929

5939

5949

5959

5969

5979

5989

5999

59A9

59B9

59C9

59D9

59E9

59F9

夊多太夺奊奚奪奺妊妚妪妺姊姚姪姺

A

590A

591A

592A

593A

594A

595A

596A

597A

598A

599A

59AA

59BA

59CA

59DA

59EA

59FA

夋夛夫夻奋奛奫奻妋妛妫妻始姛姫姻

B

590B

C

D

591B

592B

593B

594B

595B

596B

597B

598B

599B

59AB

59BB

59CB

59DB

59EB

59FB

夌夜夬夼奌奜奬奼妌妜妬妼姌姜姬姼 590C

591C

592C

593C

594C

595C

596C

597C

598C

599C

59AC

59BC

59CC

59DC

59EC

59FC

复夝夭夽奍奝奭好妍妝妭妽姍姝姭姽 590D

591D

592D

593D

594D

595D

596D

597D

598D

599D

59AD

59BD

59CD

59DD

59ED

59FD

夎夞央夾奎奞奮奾妎妞妮妾姎姞姮姾 590E

F

591

59FF

夀夐夠夰奀奐奠奰妀妐妠妰姀姐姠姰

0

E

591E

592E

593E

594E

595E

596E

597E

598E

599E

59AE

59BE

59CE

59DE

59EE

59FE

夏够夯夿奏奟奯奿妏妟妯妿姏姟姯姿 590F

591F

592F

593F

594F

595F

596F

597F

598F

599F

59AF

59BF

59CF

59DF

59EF

59FF

5A00

5AFF

5A0 5A1 5A2 5A3 5A4 5A5 5A6 5A7 5A8 5A9 5AA 5AB 5AC 5AD 5AE 5AF

娀娐娠娰婀婐婠婰媀媐媠媰嫀嫐嫠嫰

0

5A00

5A01

5A30

5A40

5A50

5A60

5A70

5A80

5A90

5AA0

5AB0

5AC0

5AD0

5AE0

5AF0

5A11

5A21

5A31

5A41

5A51

5A61

5A71

5A81

5A91

5AA1

5AB1

5AC1

5AD1

5AE1

5AF1

娂娒娢娲婂婒婢婲媂媒媢媲嫂嫒嫢嫲

2

5A02

5A12

5A22

5A32

5A42

5A52

5A62

5A72

5A82

5A92

5AA2

5AB2

5AC2

5AD2

5AE2

5AF2

娃娓娣娳婃婓婣婳媃媓媣媳嫃嫓嫣嫳

3

5A03

5A13

5A23

5A33

5A43

5A53

5A63

5A73

5A83

5A93

5AA3

5AB3

5AC3

5AD3

5AE3

5AF3

娄娔娤娴婄婔婤婴媄媔媤媴嫄嫔嫤嫴

4

5A04

5A14

5A24

5A34

5A44

5A54

5A64

5A74

5A84

5A94

5AA4

5AB4

5AC4

5AD4

5AE4

5AF4

娅娕娥娵婅婕婥婵媅媕媥媵嫅嫕嫥嫵

5

5A05

5A15

5A25

5A35

5A45

5A55

5A65

5A75

5A85

5A95

5AA5

5AB5

5AC5

5AD5

5AE5

5AF5

娆娖娦娶婆婖婦婶媆媖媦媶嫆嫖嫦嫶

6

5A06

5A16

5A26

5A36

5A46

5A56

5A66

5A76

5A86

5A96

5AA6

5AB6

5AC6

5AD6

5AE6

5AF6

娇娗娧娷婇婗婧婷媇媗媧媷嫇嫗嫧嫷

7

5A07

5A17

5A27

5A37

5A47

5A57

5A67

5A77

5A87

5A97

5AA7

5AB7

5AC7

5AD7

5AE7

5AF7

娈娘娨娸婈婘婨婸媈媘媨媸嫈嫘嫨嫸

8

5A08

5A18

5A28

5A38

5A48

5A58

5A68

5A78

5A88

5A98

5AA8

5AB8

5AC8

5AD8

5AE8

5AF8

娉娙娩娹婉婙婩婹媉媙媩媹嫉嫙嫩嫹

9

5A09

5A19

5A29

5A39

5A49

5A59

5A69

5A79

5A89

5A99

5AA9

5AB9

5AC9

5AD9

5AE9

5AF9

娊娚娪娺婊婚婪婺媊媚媪媺嫊嫚嫪嫺

A

5A0A

5A1A

5A2A

5A3A

5A4A

5A5A

5A6A

5A7A

5A8A

5A9A

5AAA

5ABA

5ACA

5ADA

5AEA

5AFA

娋娛娫娻婋婛婫婻媋媛媫媻嫋嫛嫫嫻

B

5A0B

C

D

5A1B

5A2B

5A3B

5A4B

5A5B

5A6B

5A7B

5A8B

5A9B

5AAB

5ABB

5ACB

5ADB

5AEB

5AFB

娌娜娬娼婌婜婬婼媌媜媬媼嫌嫜嫬嫼 5A0C

5A1C

5A2C

5A3C

5A4C

5A5C

5A6C

5A7C

5A8C

5A9C

5AAC

5ABC

5ACC

5ADC

5AEC

5AFC

娍娝娭娽婍婝婭婽媍媝媭媽嫍嫝嫭嫽 5A0D

5A1D

5A2D

5A3D

5A4D

5A5D

5A6D

5A7D

5A8D

5A9D

5AAD

5ABD

5ACD

5ADD

5AED

5AFD

娎娞娮娾婎婞婮婾媎媞媮媾嫎嫞嫮嫾 5A0E

F

5A20

威娑娡娱婁婑婡婱媁媑媡媱嫁嫑嫡嫱

1

E

5A10

5A1E

5A2E

5A3E

5A4E

5A5E

5A6E

5A7E

5A8E

5A9E

5AAE

5ABE

5ACE

5ADE

5AEE

5AFE

娏娟娯娿婏婟婯婿媏媟媯媿嫏嫟嫯嫿 5A0F

5A1F

5A2F

5A3F

5A4F

5A5F

5A6F

5A7F

5A8F

5A9F

5AAF

5ABF

5ACF

5ADF

5AEF

5AFF

5B00

5BFF

5B0 5B1 5B2 5B3 5B4 5B5 5B6 5B7 5B8 5B9 5BA 5BB 5BC 5BD 5BE 5BF

嬀嬐嬠嬰孀子孠孰宀宐宠宰寀寐寠寰

0

5B00

5B01

5B30

5B40

5B50

5B60

5B70

5B80

5B90

5BA0

5BB0

5BC0

5BD0

5BE0

5BF0

5B11

5B21

5B31

5B41

5B51

5B61

5B71

5B81

5B91

5BA1

5BB1

5BC1

5BD1

5BE1

5BF1

嬂嬒嬢嬲孂孒孢孲宂宒客宲寂寒寢寲

2

5B02

5B12

5B22

5B32

5B42

5B52

5B62

5B72

5B82

5B92

5BA2

5BB2

5BC2

5BD2

5BE2

5BF2

嬃嬓嬣嬳孃孓季孳它宓宣害寃寓寣寳

3

5B03

5B13

5B23

5B33

5B43

5B53

5B63

5B73

5B83

5B93

5BA3

5BB3

5BC3

5BD3

5BE3

5BF3

嬄嬔嬤嬴孄孔孤孴宄宔室宴寄寔寤寴

4

5B04

5B14

5B24

5B34

5B44

5B54

5B64

5B74

5B84

5B94

5BA4

5BB4

5BC4

5BD4

5BE4

5BF4

嬅嬕嬥嬵孅孕孥孵宅宕宥宵寅寕寥寵

5

5B05

5B15

5B25

5B35

5B45

5B55

5B65

5B75

5B85

5B95

5BA5

5BB5

5BC5

5BD5

5BE5

5BF5

嬆嬖嬦嬶孆孖学孶宆宖宦家密寖實寶

6

5B06

5B16

5B26

5B36

5B46

5B56

5B66

5B76

5B86

5B96

5BA6

5BB6

5BC6

5BD6

5BE6

5BF6

嬇嬗嬧嬷孇字孧孷宇宗宧宷寇寗寧寷

7

5B07

5B17

5B27

5B37

5B47

5B57

5B67

5B77

5B87

5B97

5BA7

5BB7

5BC7

5BD7

5BE7

5BF7

嬈嬘嬨嬸孈存孨學守官宨宸寈寘寨寸

8

5B08

5B18

5B28

5B38

5B48

5B58

5B68

5B78

5B88

5B98

5BA8

5BB8

5BC8

5BD8

5BE8

5BF8

嬉嬙嬩嬹孉孙孩孹安宙宩容寉寙審对

9

5B09

5B19

5B29

5B39

5B49

5B59

5B69

5B79

5B89

5B99

5BA9

5BB9

5BC9

5BD9

5BE9

5BF9

嬊嬚嬪嬺孊孚孪孺宊定宪宺寊寚寪寺

A

5B0A

5B1A

5B2A

5B3A

5B4A

5B5A

5B6A

5B7A

5B8A

5B9A

5BAA

5BBA

5BCA

5BDA

5BEA

5BFA

嬋嬛嬫嬻孋孛孫孻宋宛宫宻寋寛寫寻

B

5B0B

C

D

5B1B

5B2B

5B3B

5B4B

5B5B

5B6B

5B7B

5B8B

5B9B

5BAB

5BBB

5BCB

5BDB

5BEB

5BFB

嬌嬜嬬嬼孌孜孬孼完宜宬宼富寜寬导 5B0C

5B1C

5B2C

5B3C

5B4C

5B5C

5B6C

5B7C

5B8C

5B9C

5BAC

5BBC

5BCC

5BDC

5BEC

5BFC

嬍嬝嬭嬽孍孝孭孽宍宝宭宽寍寝寭寽 5B0D

5B1D

5B2D

5B3D

5B4D

5B5D

5B6D

5B7D

5B8D

5B9D

5BAD

5BBD

5BCD

5BDD

5BED

5BFD

嬎嬞嬮嬾孎孞孮孾宎实宮宾寎寞寮対 5B0E

F

5B20

嬁嬑嬡嬱孁孑孡孱宁宑审宱寁寑寡寱

1

E

5B10

5B1E

5B2E

5B3E

5B4E

5B5E

5B6E

5B7E

5B8E

5B9E

5BAE

5BBE

5BCE

5BDE

5BEE

5BFE

嬏嬟嬯嬿孏孟孯孿宏実宯宿寏察寯寿 5B0F

5B1F

5B2F

5B3F

5B4F

5B5F

5B6F

5B7F

5B8F

5B9F

5BAF

5BBF

5BCF

5BDF

5BEF

5BFF

5C00

5CFF

5C0 5C1 5C2 5C3 5C4 5C5 5C6 5C7 5C8 5C9 5CA 5CB 5CC 5CD 5CE 5CF

尀尐尠尰局屐屠屰岀岐岠岰峀峐峠峰

0

5C00

5C01

5C30

5C40

5C50

5C60

5C70

5C80

5C90

5CA0

5CB0

5CC0

5CD0

5CE0

5CF0

5C11

5C21

5C31

5C41

5C51

5C61

5C71

5C81

5C91

5CA1

5CB1

5CC1

5CD1

5CE1

5CF1

専尒尢尲层屒屢屲岂岒岢岲峂峒峢峲

2

5C02

5C12

5C22

5C32

5C42

5C52

5C62

5C72

5C82

5C92

5CA2

5CB2

5CC2

5CD2

5CE2

5CF2

尃尓尣尳屃屓屣屳岃岓岣岳峃峓峣峳

3

5C03

5C13

5C23

5C33

5C43

5C53

5C63

5C73

5C83

5C93

5CA3

5CB3

5CC3

5CD3

5CE3

5CF3

射尔尤尴屄屔層屴岄岔岤岴峄峔峤峴

4

5C04

5C14

5C24

5C34

5C44

5C54

5C64

5C74

5C84

5C94

5CA4

5CB4

5CC4

5CD4

5CE4

5CF4

尅尕尥尵居展履屵岅岕岥岵峅峕峥峵

5

5C05

5C15

5C25

5C35

5C45

5C55

5C65

5C75

5C85

5C95

5CA5

5CB5

5CC5

5CD5

5CE5

5CF5

将尖尦尶屆屖屦屶岆岖岦岶峆峖峦島

6

5C06

5C16

5C26

5C36

5C46

5C56

5C66

5C76

5C86

5C96

5CA6

5CB6

5CC6

5CD6

5CE6

5CF6

將尗尧尷屇屗屧屷岇岗岧岷峇峗峧峷

7

5C07

5C17

5C27

5C37

5C47

5C57

5C67

5C77

5C87

5C97

5CA7

5CB7

5CC7

5CD7

5CE7

5CF7

專尘尨尸屈屘屨屸岈岘岨岸峈峘峨峸

8

5C08

5C18

5C28

5C38

5C48

5C58

5C68

5C78

5C88

5C98

5CA8

5CB8

5CC8

5CD8

5CE8

5CF8

尉尙尩尹屉屙屩屹岉岙岩岹峉峙峩峹

9

5C09

5C19

5C29

5C39

5C49

5C59

5C69

5C79

5C89

5C99

5CA9

5CB9

5CC9

5CD9

5CE9

5CF9

尊尚尪尺届屚屪屺岊岚岪岺峊峚峪峺

A

5C0A

5C1A

5C2A

5C3A

5C4A

5C5A

5C6A

5C7A

5C8A

5C9A

5CAA

5CBA

5CCA

5CDA

5CEA

5CFA

尋尛尫尻屋屛屫屻岋岛岫岻峋峛峫峻

B

5C0B

C

D

5C1B

5C2B

5C3B

5C4B

5C5B

5C6B

5C7B

5C8B

5C9B

5CAB

5CBB

5CCB

5CDB

5CEB

5CFB

尌尜尬尼屌屜屬屼岌岜岬岼峌峜峬峼 5C0C

5C1C

5C2C

5C3C

5C4C

5C5C

5C6C

5C7C

5C8C

5C9C

5CAC

5CBC

5CCC

5CDC

5CEC

5CFC

對尝尭尽屍屝屭屽岍岝岭岽峍峝峭峽 5C0D

5C1D

5C2D

5C3D

5C4D

5C5D

5C6D

5C7D

5C8D

5C9D

5CAD

5CBD

5CCD

5CDD

5CED

5CFD

導尞尮尾屎属屮屾岎岞岮岾峎峞峮峾 5C0E

F

5C20

封少尡就屁屑屡山岁岑岡岱峁峑峡峱

1

E

5C10

5C1E

5C2E

5C3E

5C4E

5C5E

5C6E

5C7E

5C8E

5C9E

5CAE

5CBE

5CCE

5CDE

5CEE

5CFE

小尟尯尿屏屟屯屿岏岟岯岿峏峟峯峿 5C0F

5C1F

5C2F

5C3F

5C4F

5C5F

5C6F

5C7F

5C8F

5C9F

5CAF

5CBF

5CCF

5CDF

5CEF

5CFF

5D00

5DFF

5D0 5D1 5D2 5D3 5D4 5D5 5D6 5D7 5D8 5D9 5DA 5DB 5DC 5DD 5DE 5DF

崀崐崠崰嵀嵐嵠嵰嶀嶐嶠嶰巀巐巠巰

0

5D00

5D01

5D30

5D40

5D50

5D60

5D70

5D80

5D90

5DA0

5DB0

5DC0

5DD0

5DE0

5DF0

5D11

5D21

5D31

5D41

5D51

5D61

5D71

5D81

5D91

5DA1

5DB1

5DC1

5DD1

5DE1

5DF1

崂崒崢崲嵂嵒嵢嵲嶂嶒嶢嶲巂巒巢已

2

5D02

5D12

5D22

5D32

5D42

5D52

5D62

5D72

5D82

5D92

5DA2

5DB2

5DC2

5DD2

5DE2

5DF2

崃崓崣崳嵃嵓嵣嵳嶃嶓嶣嶳巃巓巣巳

3

5D03

5D13

5D23

5D33

5D43

5D53

5D63

5D73

5D83

5D93

5DA3

5DB3

5DC3

5DD3

5DE3

5DF3

崄崔崤崴嵄嵔嵤嵴嶄嶔嶤嶴巄巔巤巴

4

5D04

5D14

5D24

5D34

5D44

5D54

5D64

5D74

5D84

5D94

5DA4

5DB4

5DC4

5DD4

5DE4

5DF4

崅崕崥崵嵅嵕嵥嵵嶅嶕嶥嶵巅巕工巵

5

5D05

5D15

5D25

5D35

5D45

5D55

5D65

5D75

5D85

5D95

5DA5

5DB5

5DC5

5DD5

5DE5

5DF5

崆崖崦崶嵆嵖嵦嵶嶆嶖嶦嶶巆巖左巶

6

5D06

5D16

5D26

5D36

5D46

5D56

5D66

5D76

5D86

5D96

5DA6

5DB6

5DC6

5DD6

5DE6

5DF6

崇崗崧崷嵇嵗嵧嵷嶇嶗嶧嶷巇巗巧巷

7

5D07

5D17

5D27

5D37

5D47

5D57

5D67

5D77

5D87

5D97

5DA7

5DB7

5DC7

5DD7

5DE7

5DF7

崈崘崨崸嵈嵘嵨嵸嶈嶘嶨嶸巈巘巨巸

8

5D08

5D18

5D28

5D38

5D48

5D58

5D68

5D78

5D88

5D98

5DA8

5DB8

5DC8

5DD8

5DE8

5DF8

崉崙崩崹嵉嵙嵩嵹嶉嶙嶩嶹巉巙巩巹

9

5D09

5D19

5D29

5D39

5D49

5D59

5D69

5D79

5D89

5D99

5DA9

5DB9

5DC9

5DD9

5DE9

5DF9

崊崚崪崺嵊嵚嵪嵺嶊嶚嶪嶺巊巚巪巺

A

5D0A

5D1A

5D2A

5D3A

5D4A

5D5A

5D6A

5D7A

5D8A

5D9A

5DAA

5DBA

5DCA

5DDA

5DEA

5DFA

崋崛崫崻嵋嵛嵫嵻嶋嶛嶫嶻巋巛巫巻

B

5D0B

C

D

5D1B

5D2B

5D3B

5D4B

5D5B

5D6B

5D7B

5D8B

5D9B

5DAB

5DBB

5DCB

5DDB

5DEB

5DFB

崌崜崬崼嵌嵜嵬嵼嶌嶜嶬嶼巌巜巬巼 5D0C

5D1C

5D2C

5D3C

5D4C

5D5C

5D6C

5D7C

5D8C

5D9C

5DAC

5DBC

5DCC

5DDC

5DEC

5DFC

崍崝崭崽嵍嵝嵭嵽嶍嶝嶭嶽巍川巭巽 5D0D

5D1D

5D2D

5D3D

5D4D

5D5D

5D6D

5D7D

5D8D

5D9D

5DAD

5DBD

5DCD

5DDD

5DED

5DFD

崎崞崮崾嵎嵞嵮嵾嶎嶞嶮嶾巎州差巾 5D0E

F

5D20

崁崑崡崱嵁嵑嵡嵱嶁嶑嶡嶱巁巑巡己

1

E

5D10

5D1E

5D2E

5D3E

5D4E

5D5E

5D6E

5D7E

5D8E

5D9E

5DAE

5DBE

5DCE

5DDE

5DEE

5DFE

崏崟崯崿嵏嵟嵯嵿嶏嶟嶯嶿巏巟巯巿 5D0F

5D1F

5D2F

5D3F

5D4F

5D5F

5D6F

5D7F

5D8F

5D9F

5DAF

5DBF

5DCF

5DDF

5DEF

5DFF

5E00

5EFF

5E0 5E1 5E2 5E3 5E4 5E5 5E6 5E7 5E8 5E9 5EA 5EB 5EC 5ED 5EE 5EF

帀帐帠帰幀幐幠幰庀庐庠庰廀廐廠廰

0

5E00

5E01

5E30

5E40

5E50

5E60

5E70

5E80

5E90

5EA0

5EB0

5EC0

5ED0

5EE0

5EF0

5E11

5E21

5E31

5E41

5E51

5E61

5E71

5E81

5E91

5EA1

5EB1

5EC1

5ED1

5EE1

5EF1

市帒帢帲幂幒幢干庂庒庢庲廂廒廢廲

2

5E02

5E12

5E22

5E32

5E42

5E52

5E62

5E72

5E82

5E92

5EA2

5EB2

5EC2

5ED2

5EE2

5EF2

布帓帣帳幃幓幣平広库庣庳廃廓廣廳

3

5E03

5E13

5E23

5E33

5E43

5E53

5E63

5E73

5E83

5E93

5EA3

5EB3

5EC3

5ED3

5EE3

5EF3

帄帔帤帴幄幔幤年庄应庤庴廄廔廤廴

4

5E04

5E14

5E24

5E34

5E44

5E54

5E64

5E74

5E84

5E94

5EA4

5EB4

5EC4

5ED4

5EE4

5EF4

帅帕帥帵幅幕幥幵庅底庥庵廅廕廥廵

5

5E05

5E15

5E25

5E35

5E45

5E55

5E65

5E75

5E85

5E95

5EA5

5EB5

5EC5

5ED5

5EE5

5EF5

帆帖带帶幆幖幦并庆庖度庶廆廖廦延

6

5E06

5E16

5E26

5E36

5E46

5E56

5E66

5E76

5E86

5E96

5EA6

5EB6

5EC6

5ED6

5EE6

5EF6

帇帗帧帷幇幗幧幷庇店座康廇廗廧廷

7

5E07

5E17

5E27

5E37

5E47

5E57

5E67

5E77

5E87

5E97

5EA7

5EB7

5EC7

5ED7

5EE7

5EF7

师帘帨常幈幘幨幸庈庘庨庸廈廘廨廸

8

5E08

5E18

5E28

5E38

5E48

5E58

5E68

5E78

5E88

5E98

5EA8

5EB8

5EC8

5ED8

5EE8

5EF8

帉帙帩帹幉幙幩幹庉庙庩庹廉廙廩廹

9

5E09

5E19

5E29

5E39

5E49

5E59

5E69

5E79

5E89

5E99

5EA9

5EB9

5EC9

5ED9

5EE9

5EF9

帊帚帪帺幊幚幪幺床庚庪庺廊廚廪建

A

5E0A

5E1A

5E2A

5E3A

5E4A

5E5A

5E6A

5E7A

5E8A

5E9A

5EAA

5EBA

5ECA

5EDA

5EEA

5EFA

帋帛師帻幋幛幫幻庋庛庫庻廋廛廫廻

B

5E0B

C

D

5E1B

5E2B

5E3B

5E4B

5E5B

5E6B

5E7B

5E8B

5E9B

5EAB

5EBB

5ECB

5EDB

5EEB

5EFB

希帜帬帼幌幜幬幼庌府庬庼廌廜廬廼 5E0C

5E1C

5E2C

5E3C

5E4C

5E5C

5E6C

5E7C

5E8C

5E9C

5EAC

5EBC

5ECC

5EDC

5EEC

5EFC

帍帝席帽幍幝幭幽庍庝庭庽廍廝廭廽 5E0D

5E1D

5E2D

5E3D

5E4D

5E5D

5E6D

5E7D

5E8D

5E9D

5EAD

5EBD

5ECD

5EDD

5EED

5EFD

帎帞帮帾幎幞幮幾庎庞庮庾廎廞廮廾 5E0E

F

5E20

币帑帡帱幁幑幡幱庁庑庡庱廁廑廡廱

1

E

5E10

5E1E

5E2E

5E3E

5E4E

5E5E

5E6E

5E7E

5E8E

5E9E

5EAE

5EBE

5ECE

5EDE

5EEE

5EFE

帏帟帯帿幏幟幯广序废庯庿廏廟廯廿 5E0F

5E1F

5E2F

5E3F

5E4F

5E5F

5E6F

5E7F

5E8F

5E9F

5EAF

5EBF

5ECF

5EDF

5EEF

5EFF

5F00 5F0

5F00

5F2

5F3

5F4

5F5

5F6

5F7

5F8

5F9 5FA 5FB 5FC 5FD 5FE 5FF

5F10

5F20

5F30

5F40

5F50

5F60

5F70

5F80

5F90

5FA0

5FB0

5FC0

5FD0

5FE0

5FF0

弁弑弡弱彁彑彡影征徑御徱忁忑忡忱

1

5F01

5F11

5F21

5F31

5F41

5F51

5F61

5F71

5F81

5F91

5FA1

5FB1

5FC1

5FD1

5FE1

5FF1

异弒弢弲彂归形彲徂徒徢徲忂忒忢忲

2

5F02

5F12

5F22

5F32

5F42

5F52

5F62

5F72

5F82

5F92

5FA2

5FB2

5FC2

5FD2

5FE2

5FF2

弃弓弣弳彃当彣彳徃従徣徳心忓忣忳

3

5F03

5F13

5F23

5F33

5F43

5F53

5F63

5F73

5F83

5F93

5FA3

5FB3

5FC3

5FD3

5FE3

5FF3

弄弔弤弴彄彔彤彴径徔徤徴忄忔忤忴

4

5F04

5F14

5F24

5F34

5F44

5F54

5F64

5F74

5F84

5F94

5FA4

5FB4

5FC4

5FD4

5FE4

5FF4

弅引弥張彅录彥彵待徕徥徵必忕忥念

5

5F05

5F15

5F25

5F35

5F45

5F55

5F65

5F75

5F85

5F95

5FA5

5FB5

5FC5

5FD5

5FE5

5FF5

弆弖弦弶彆彖彦彶徆徖徦徶忆忖忦忶

6

5F06

5F16

5F26

5F36

5F46

5F56

5F66

5F76

5F86

5F96

5FA6

5FB6

5FC6

5FD6

5FE6

5FF6

弇弗弧強彇彗彧彷徇得徧德忇志忧忷

7

5F07

5F17

5F27

5F37

5F47

5F57

5F67

5F77

5F87

5F97

5FA7

5FB7

5FC7

5FD7

5FE7

5FF7

弈弘弨弸彈彘彨彸很徘徨徸忈忘忨忸

8

5F08

5F18

5F28

5F38

5F48

5F58

5F68

5F78

5F88

5F98

5FA8

5FB8

5FC8

5FD8

5FE8

5FF8

弉弙弩弹彉彙彩役徉徙復徹忉忙忩忹

9

5F09

5F19

5F29

5F39

5F49

5F59

5F69

5F79

5F89

5F99

5FA9

5FB9

5FC9

5FD9

5FE9

5FF9

弊弚弪强彊彚彪彺徊徚循徺忊忚忪忺

A

5F0A

5F1A

5F2A

5F3A

5F4A

5F5A

5F6A

5F7A

5F8A

5F9A

5FAA

5FBA

5FCA

5FDA

5FEA

5FFA

弋弛弫弻彋彛彫彻律徛徫徻忋忛快忻

B

5F0B

C

D

5F1B

5F2B

5F3B

5F4B

5F5B

5F6B

5F7B

5F8B

5F9B

5FAB

5FBB

5FCB

5FDB

5FEB

5FFB

弌弜弬弼彌彜彬彼後徜徬徼忌応忬忼 5F0C

5F1C

5F2C

5F3C

5F4C

5F5C

5F6C

5F7C

5F8C

5F9C

5FAC

5FBC

5FCC

5FDC

5FEC

5FFC

弍弝弭弽彍彝彭彽徍徝徭徽忍忝忭忽 5F0D

5F1D

5F2D

5F3D

5F4D

5F5D

5F6D

5F7D

5F8D

5F9D

5FAD

5FBD

5FCD

5FDD

5FED

5FFD

弎弞弮弾彎彞彮彾徎從微徾忎忞忮忾 5F0E

F

5F1

5FFF

开弐张弰彀彐彠彰往徐徠徰忀忐忠忰

0

E

5F1E

5F2E

5F3E

5F4E

5F5E

5F6E

5F7E

5F8E

5F9E

5FAE

5FBE

5FCE

5FDE

5FEE

5FFE

式弟弯弿彏彟彯彿徏徟徯徿忏忟忯忿 5F0F

5F1F

5F2F

5F3F

5F4F

5F5F

5F6F

5F7F

5F8F

5F9F

5FAF

5FBF

5FCF

5FDF

5FEF

5FFF

6000 600

6000

602

603

604

605

606

607

608

609

60A 60B 60C 60D 60E

60F

6010

6020

6030

6040

6050

6060

6070

6080

6090

60A0

60B0

60C0

60D0

60E0

60F0

态怑怡怱恁恑恡恱悁悑悡悱惁惑惡惱

1

6001

6011

6021

6031

6041

6051

6061

6071

6081

6091

60A1

60B1

60C1

60D1

60E1

60F1

怂怒怢怲恂恒恢恲悂悒悢悲惂惒惢惲

2

6002

6012

6022

6032

6042

6052

6062

6072

6082

6092

60A2

60B2

60C2

60D2

60E2

60F2

怃怓怣怳恃恓恣恳悃悓患悳惃惓惣想

3

6003

6013

6023

6033

6043

6053

6063

6073

6083

6093

60A3

60B3

60C3

60D3

60E3

60F3

怄怔怤怴恄恔恤恴悄悔悤悴惄惔惤惴

4

6004

6014

6024

6034

6044

6054

6064

6074

6084

6094

60A4

60B4

60C4

60D4

60E4

60F4

怅怕急怵恅恕恥恵悅悕悥悵情惕惥惵

5

6005

6015

6025

6035

6045

6055

6065

6075

6085

6095

60A5

60B5

60C5

60D5

60E5

60F5

怆怖怦怶恆恖恦恶悆悖悦悶惆惖惦惶

6

6006

6016

6026

6036

6046

6056

6066

6076

6086

6096

60A6

60B6

60C6

60D6

60E6

60F6

怇怗性怷恇恗恧恷悇悗悧悷惇惗惧惷

7

6007

6017

6027

6037

6047

6057

6067

6077

6087

6097

60A7

60B7

60C7

60D7

60E7

60F7

怈怘怨怸恈恘恨恸悈悘您悸惈惘惨惸

8

6008

6018

6028

6038

6048

6058

6068

6078

6088

6098

60A8

60B8

60C8

60D8

60E8

60F8

怉怙怩怹恉恙恩恹悉悙悩悹惉惙惩惹

9

6009

6019

6029

6039

6049

6059

6069

6079

6089

6099

60A9

60B9

60C9

60D9

60E9

60F9

怊怚怪怺恊恚恪恺悊悚悪悺惊惚惪惺

A

600A

601A

602A

603A

604A

605A

606A

607A

608A

609A

60AA

60BA

60CA

60DA

60EA

60FA

怋怛怫总恋恛恫恻悋悛悫悻惋惛惫惻

B

600B

C

D

601B

602B

603B

604B

605B

606B

607B

608B

609B

60AB

60BB

60CB

60DB

60EB

60FB

怌怜怬怼恌恜恬恼悌悜悬悼惌惜惬惼 600C

601C

602C

603C

604C

605C

606C

607C

608C

609C

60AC

60BC

60CC

60DC

60EC

60FC

怍思怭怽恍恝恭恽悍悝悭悽惍惝惭惽 600D

601D

602D

603D

604D

605D

606D

607D

608D

609D

60AD

60BD

60CD

60DD

60ED

60FD

怎怞怮怾恎恞恮恾悎悞悮悾惎惞惮惾 600E

F

601

60FF

怀怐怠怰恀恐恠恰悀悐悠悰惀惐惠惰

0

E

601E

602E

603E

604E

605E

606E

607E

608E

609E

60AE

60BE

60CE

60DE

60EE

60FE

怏怟怯怿恏恟息恿悏悟悯悿惏惟惯惿 600F

601F

602F

603F

604F

605F

606F

607F

608F

609F

60AF

60BF

60CF

60DF

60EF

60FF

6100 610

6100

612

613

614

615

616

617

618

619

61A 61B 61C 61D 61E

61F

6110

6120

6130

6140

6150

6160

6170

6180

6190

61A0

61B0

61C0

61D0

61E0

61F0

愁愑愡愱慁慑慡慱憁憑憡憱懁懑懡懱

1

6101

6111

6121

6131

6141

6151

6161

6171

6181

6191

61A1

61B1

61C1

61D1

61E1

61F1

愂愒愢愲慂慒慢慲憂憒憢憲懂懒懢懲

2

6102

6112

6122

6132

6142

6152

6162

6172

6182

6192

61A2

61B2

61C2

61D2

61E2

61F2

愃愓愣愳慃慓慣慳憃憓憣憳懃懓懣懳

3

6103

6113

6123

6133

6143

6153

6163

6173

6183

6193

61A3

61B3

61C3

61D3

61E3

61F3

愄愔愤愴慄慔慤慴憄憔憤憴懄懔懤懴

4

6104

6114

6124

6134

6144

6154

6164

6174

6184

6194

61A4

61B4

61C4

61D4

61E4

61F4

愅愕愥愵慅慕慥慵憅憕憥憵懅懕懥懵

5

6105

6115

6125

6135

6145

6155

6165

6175

6185

6195

61A5

61B5

61C5

61D5

61E5

61F5

愆愖愦愶慆慖慦慶憆憖憦憶懆懖懦懶

6

6106

6116

6126

6136

6146

6156

6166

6176

6186

6196

61A6

61B6

61C6

61D6

61E6

61F6

愇愗愧愷慇慗慧慷憇憗憧憷懇懗懧懷

7

6107

6117

6127

6137

6147

6157

6167

6177

6187

6197

61A7

61B7

61C7

61D7

61E7

61F7

愈愘愨愸慈慘慨慸憈憘憨憸懈懘懨懸

8

6108

6118

6128

6138

6148

6158

6168

6178

6188

6198

61A8

61B8

61C8

61D8

61E8

61F8

愉愙愩愹慉慙慩慹憉憙憩憹應懙懩懹

9

6109

6119

6129

6139

6149

6159

6169

6179

6189

6199

61A9

61B9

61C9

61D9

61E9

61F9

愊愚愪愺慊慚慪慺憊憚憪憺懊懚懪懺

A

610A

611A

612A

613A

614A

615A

616A

617A

618A

619A

61AA

61BA

61CA

61DA

61EA

61FA

愋愛愫愻態慛慫慻憋憛憫憻懋懛懫懻

B

610B

C

D

611B

612B

613B

614B

615B

616B

617B

618B

619B

61AB

61BB

61CB

61DB

61EB

61FB

愌愜愬愼慌慜慬慼憌憜憬憼懌懜懬懼 610C

611C

612C

613C

614C

615C

616C

617C

618C

619C

61AC

61BC

61CC

61DC

61EC

61FC

愍愝愭愽慍慝慭慽憍憝憭憽懍懝懭懽 610D

611D

612D

613D

614D

615D

616D

617D

618D

619D

61AD

61BD

61CD

61DD

61ED

61FD

愎愞愮愾慎慞慮慾憎憞憮憾懎懞懮懾 610E

F

611

61FF

愀愐愠愰慀慐慠慰憀憐憠憰懀懐懠懰

0

E

611E

612E

613E

614E

615E

616E

617E

618E

619E

61AE

61BE

61CE

61DE

61EE

61FE

意感愯愿慏慟慯慿憏憟憯憿懏懟懯懿 610F

611F

612F

613F

614F

615F

616F

617F

618F

619F

61AF

61BF

61CF

61DF

61EF

61FF

6200 620

6200

622

623

624

625

626

627

628

629

62A 62B 62C 62D 62E

62F

6210

6220

6230

6240

6250

6260

6270

6280

6290

62A0

62B0

62C0

62D0

62E0

62F0

戁我戡戱扁扑扡扱抁抑抡抱拁拑拡拱

1

6201

6211

6221

6231

6241

6251

6261

6271

6281

6291

62A1

62B1

62C1

62D1

62E1

62F1

戂戒戢戲扂扒扢扲抂抒抢抲拂拒拢拲

2

6202

6212

6222

6232

6242

6252

6262

6272

6282

6292

62A2

62B2

62C2

62D2

62E2

62F2

戃戓戣戳扃打扣扳抃抓抣抳拃拓拣拳

3

6203

6213

6223

6233

6243

6253

6263

6273

6283

6293

62A3

62B3

62C3

62D3

62E3

62F3

戄戔戤戴扄扔扤扴抄抔护抴拄拔拤拴

4

6204

6214

6224

6234

6244

6254

6264

6274

6284

6294

62A4

62B4

62C4

62D4

62E4

62F4

戅戕戥戵扅払扥扵抅投报抵担拕拥拵

5

6205

6215

6225

6235

6245

6255

6265

6275

6285

6295

62A5

62B5

62C5

62D5

62E5

62F5

戆或戦戶扆扖扦扶抆抖抦抶拆拖拦拶

6

6206

6216

6226

6236

6246

6256

6266

6276

6286

6296

62A6

62B6

62C6

62D6

62E6

62F6

戇戗戧户扇扗执扷抇抗抧抷拇拗拧拷

7

6207

6217

6227

6237

6247

6257

6267

6277

6287

6297

62A7

62B7

62C7

62D7

62E7

62F7

戈战戨戸扈托扨扸抈折抨抸拈拘拨拸

8

6208

6218

6228

6238

6248

6258

6268

6278

6288

6298

62A8

62B8

62C8

62D8

62E8

62F8

戉戙戩戹扉扙扩批抉抙抩抹拉拙择拹

9

6209

6219

6229

6239

6249

6259

6269

6279

6289

6299

62A9

62B9

62C9

62D9

62E9

62F9

戊戚截戺扊扚扪扺把抚抪抺拊拚拪拺

A

620A

621A

622A

623A

624A

625A

626A

627A

628A

629A

62AA

62BA

62CA

62DA

62EA

62FA

戋戛戫戻手扛扫扻抋抛披抻拋招拫拻

B

620B

C

D

621B

622B

623B

624B

625B

626B

627B

628B

629B

62AB

62BB

62CB

62DB

62EB

62FB

戌戜戬戼扌扜扬扼抌抜抬押拌拜括拼 620C

621C

622C

623C

624C

625C

626C

627C

628C

629C

62AC

62BC

62CC

62DC

62EC

62FC

戍戝戭戽才扝扭扽抍抝抭抽拍拝拭拽 620D

621D

622D

623D

624D

625D

626D

627D

628D

629D

62AD

62BD

62CD

62DD

62ED

62FD

戎戞戮戾扎扞扮找抎択抮抾拎拞拮拾 620E

F

621

62FF

戀成戠戰所扐扠扰技抐抠抰拀拐拠拰

0

E

621E

622E

623E

624E

625E

626E

627E

628E

629E

62AE

62BE

62CE

62DE

62EE

62FE

戏戟戯房扏扟扯承抏抟抯抿拏拟拯拿 620F

621F

622F

623F

624F

625F

626F

627F

628F

629F

62AF

62BF

62CF

62DF

62EF

62FF

6300 630

6300

632

633

634

635

636

637

638

639

63A 63B 63C 63D 63E

63F

6310

6320

6330

6340

6350

6360

6370

6380

6390

63A0

63B0

63C0

63D0

63E0

63F0

持挑挡挱捁捑捡捱掁掑採掱揁揑握揱

1

6301

6311

6321

6331

6341

6351

6361

6371

6381

6391

63A1

63B1

63C1

63D1

63E1

63F1

挂挒挢挲捂捒换捲掂排探掲揂插揢揲

2

6302

6312

6322

6332

6342

6352

6362

6372

6382

6392

63A2

63B2

63C2

63D2

63E2

63F2

挃挓挣挳捃捓捣捳掃掓掣掳揃揓揣揳

3

6303

6313

6323

6333

6343

6353

6363

6373

6383

6393

63A3

63B3

63C3

63D3

63E3

63F3

挄挔挤挴捄捔捤捴掄掔掤掴揄揔揤援

4

6304

6314

6324

6334

6344

6354

6364

6374

6384

6394

63A4

63B4

63C4

63D4

63E4

63F4

挅挕挥挵捅捕捥捵掅掕接掵揅揕揥揵

5

6305

6315

6325

6335

6345

6355

6365

6375

6385

6395

63A5

63B5

63C5

63D5

63E5

63F5

挆挖挦挶捆捖捦捶掆掖掦掶揆揖揦揶

6

6306

6316

6326

6336

6346

6356

6366

6376

6386

6396

63A6

63B6

63C6

63D6

63E6

63F6

指挗挧挷捇捗捧捷掇掗控掷揇揗揧揷

7

6307

6317

6327

6337

6347

6357

6367

6377

6387

6397

63A7

63B7

63C7

63D7

63E7

63F7

挈挘挨挸捈捘捨捸授掘推掸揈揘揨揸

8

6308

6318

6328

6338

6348

6358

6368

6378

6388

6398

63A8

63B8

63C8

63D8

63E8

63F8

按挙挩挹捉捙捩捹掉掙掩掹揉揙揩揹

9

6309

6319

6329

6339

6349

6359

6369

6379

6389

6399

63A9

63B9

63C9

63D9

63E9

63F9

挊挚挪挺捊捚捪捺掊掚措掺揊揚揪揺

A

630A

631A

632A

633A

634A

635A

636A

637A

638A

639A

63AA

63BA

63CA

63DA

63EA

63FA

挋挛挫挻捋捛捫捻掋掛掫掻揋換揫揻

B

630B

C

D

631B

632B

633B

634B

635B

636B

637B

638B

639B

63AB

63BB

63CB

63DB

63EB

63FB

挌挜挬挼捌捜捬捼掌掜掬掼揌揜揬揼 630C

631C

632C

633C

634C

635C

636C

637C

638C

639C

63AC

63BC

63CC

63DC

63EC

63FC

挍挝挭挽捍捝捭捽掍掝掭掽揍揝揭揽 630D

631D

632D

633D

634D

635D

636D

637D

638D

639D

63AD

63BD

63CD

63DD

63ED

63FD

挎挞挮挾捎捞据捾掎掞掮掾揎揞揮揾 630E

F

631

63FF

挀挐挠挰捀捐捠捰掀掐掠掰揀提揠揰

0

E

631E

632E

633E

634E

635E

636E

637E

638E

639E

63AE

63BE

63CE

63DE

63EE

63FE

挏挟振挿捏损捯捿掏掟掯掿描揟揯揿 630F

631F

632F

633F

634F

635F

636F

637F

638F

639F

63AF

63BF

63CF

63DF

63EF

63FF

6400 640

6400

642

643

644

645

646

647

648

649

64A 64B 64C 64D 64E

64F

6410

6420

6430

6440

6450

6460

6470

6480

6490

64A0

64B0

64C0

64D0

64E0

64F0

搁搑搡搱摁摑摡摱撁撑撡撱擁擑擡擱

1

6401

6411

6421

6431

6441

6451

6461

6471

6481

6491

64A1

64B1

64C1

64D1

64E1

64F1

搂搒搢搲摂摒摢摲撂撒撢撲擂擒擢擲

2

6402

6412

6422

6432

6442

6452

6462

6472

6482

6492

64A2

64B2

64C2

64D2

64E2

64F2

搃搓搣搳摃摓摣摳撃撓撣撳擃擓擣擳

3

6403

6413

6423

6433

6443

6453

6463

6473

6483

6493

64A3

64B3

64C3

64D3

64E3

64F3

搄搔搤搴摄摔摤摴撄撔撤撴擄擔擤擴

4

6404

6414

6424

6434

6444

6454

6464

6474

6484

6494

64A4

64B4

64C4

64D4

64E4

64F4

搅搕搥搵摅摕摥摵撅撕撥撵擅擕擥擵

5

6405

6415

6425

6435

6445

6455

6465

6475

6485

6495

64A5

64B5

64C5

64D5

64E5

64F5

搆搖搦搶摆摖摦摶撆撖撦撶擆擖擦擶

6

6406

6416

6426

6436

6446

6456

6466

6476

6486

6496

64A6

64B6

64C6

64D6

64E6

64F6

搇搗搧搷摇摗摧摷撇撗撧撷擇擗擧擷

7

6407

6417

6427

6437

6447

6457

6467

6477

6487

6497

64A7

64B7

64C7

64D7

64E7

64F7

搈搘搨搸摈摘摨摸撈撘撨撸擈擘擨擸

8

6408

6418

6428

6438

6448

6458

6468

6478

6488

6498

64A8

64B8

64C8

64D8

64E8

64F8

搉搙搩搹摉摙摩摹撉撙撩撹擉擙擩擹

9

6409

6419

6429

6439

6449

6459

6469

6479

6489

6499

64A9

64B9

64C9

64D9

64E9

64F9

搊搚搪携摊摚摪摺撊撚撪撺擊據擪擺

A

640A

641A

642A

643A

644A

645A

646A

647A

648A

649A

64AA

64BA

64CA

64DA

64EA

64FA

搋搛搫搻摋摛摫摻撋撛撫撻擋擛擫擻

B

640B

C

D

641B

642B

643B

644B

645B

646B

647B

648B

649B

64AB

64BB

64CB

64DB

64EB

64FB

搌搜搬搼摌摜摬摼撌撜撬撼擌擜擬擼 640C

641C

642C

643C

644C

645C

646C

647C

648C

649C

64AC

64BC

64CC

64DC

64EC

64FC

損搝搭搽摍摝摭摽撍撝播撽操擝擭擽 640D

641D

642D

643D

644D

645D

646D

647D

648D

649D

64AD

64BD

64CD

64DD

64ED

64FD

搎搞搮搾摎摞摮摾撎撞撮撾擎擞擮擾 640E

F

641

64FF

搀搐搠搰摀摐摠摰撀撐撠撰擀擐擠擰

0

E

641E

642E

643E

644E

645E

646E

647E

648E

649E

64AE

64BE

64CE

64DE

64EE

64FE

搏搟搯搿摏摟摯摿撏撟撯撿擏擟擯擿 640F

641F

642F

643F

644F

645F

646F

647F

648F

649F

64AF

64BF

64CF

64DF

64EF

64FF

6500 650

6500

652

653

654

655

656

657

658

659

65A 65B 65C 65D 65E

65F

6510

6520

6530

6540

6550

6560

6570

6580

6590

65A0

65B0

65C0

65D0

65E0

65F0

攁攑攡攱敁救敡敱斁斑斡斱旁旑旡旱

1

6501

6511

6521

6531

6541

6551

6561

6571

6581

6591

65A1

65B1

65C1

65D1

65E1

65F1

攂攒攢攲敂敒敢敲斂斒斢斲旂旒既旲

2

6502

6512

6522

6532

6542

6552

6562

6572

6582

6592

65A2

65B2

65C2

65D2

65E2

65F2

攃攓攣攳敃敓散敳斃斓斣斳旃旓旣旳

3

6503

6513

6523

6533

6543

6553

6563

6573

6583

6593

65A3

65B3

65C3

65D3

65E3

65F3

攄攔攤攴敄敔敤整斄斔斤斴旄旔旤旴

4

6504

6514

6524

6534

6544

6554

6564

6574

6584

6594

65A4

65B4

65C4

65D4

65E4

65F4

攅攕攥攵故敕敥敵斅斕斥斵旅旕日旵

5

6505

6515

6525

6535

6545

6555

6565

6575

6585

6595

65A5

65B5

65C5

65D5

65E5

65F5

攆攖攦收敆敖敦敶斆斖斦斶旆旖旦时

6

6506

6516

6526

6536

6546

6556

6566

6576

6586

6596

65A6

65B6

65C6

65D6

65E6

65F6

攇攗攧攷敇敗敧敷文斗斧斷旇旗旧旷

7

6507

6517

6527

6537

6547

6557

6567

6577

6587

6597

65A7

65B7

65C7

65D7

65E7

65F7

攈攘攨攸效敘敨數斈斘斨斸旈旘旨旸

8

6508

6518

6528

6538

6548

6558

6568

6578

6588

6598

65A8

65B8

65C8

65D8

65E8

65F8

攉攙攩改敉教敩敹斉料斩方旉旙早旹

9

6509

6519

6529

6539

6549

6559

6569

6579

6589

6599

65A9

65B9

65C9

65D9

65E9

65F9

攊攚攪攺敊敚敪敺斊斚斪斺旊旚旪旺

A

650A

651A

652A

653A

654A

655A

656A

657A

658A

659A

65AA

65BA

65CA

65DA

65EA

65FA

攋攛攫攻敋敛敫敻斋斛斫斻旋旛旫旻

B

650B

C

D

651B

652B

653B

654B

655B

656B

657B

658B

659B

65AB

65BB

65CB

65DB

65EB

65FB

攌攜攬攼敌敜敬敼斌斜斬於旌旜旬旼 650C

651C

652C

653C

654C

655C

656C

657C

658C

659C

65AC

65BC

65CC

65DC

65EC

65FC

攍攝攭攽敍敝敭敽斍斝断施旍旝旭旽 650D

651D

652D

653D

654D

655D

656D

657D

658D

659D

65AD

65BD

65CD

65DD

65ED

65FD

攎攞攮放敎敞敮敾斎斞斮斾旎旞旮旾 650E

F

651

65FF

攀攐攠攰敀敐敠数斀斐斠新旀旐无旰

0

E

651E

652E

653E

654E

655E

656E

657E

658E

659E

65AE

65BE

65CE

65DE

65EE

65FE

攏攟支政敏敟敯敿斏斟斯斿族旟旯旿 650F

651F

652F

653F

654F

655F

656F

657F

658F

659F

65AF

65BF

65CF

65DF

65EF

65FF

6600 660

6600

662

663

664

665

666

667

668

669

66A 66B 66C 66D 66E

66F

6610

6620

6630

6640

6650

6660

6670

6680

6690

66A0

66B0

66C0

66D0

66E0

66F0

昁昑昡昱晁晑晡晱暁暑暡暱曁曑曡曱

1

6601

6611

6621

6631

6641

6651

6661

6671

6681

6691

66A1

66B1

66C1

66D1

66E1

66F1

昂昒昢昲時晒晢晲暂暒暢暲曂曒曢曲

2

6602

6612

6622

6632

6642

6652

6662

6672

6682

6692

66A2

66B2

66C2

66D2

66E2

66F2

昃易昣昳晃晓晣晳暃暓暣暳曃曓曣曳

3

6603

6613

6623

6633

6643

6653

6663

6673

6683

6693

66A3

66B3

66C3

66D3

66E3

66F3

昄昔昤昴晄晔晤晴暄暔暤暴曄曔曤更

4

6604

6614

6624

6634

6644

6654

6664

6674

6684

6694

66A4

66B4

66C4

66D4

66E4

66F4

昅昕春昵晅晕晥晵暅暕暥暵曅曕曥曵

5

6605

6615

6625

6635

6645

6655

6665

6675

6685

6695

66A5

66B5

66C5

66D5

66E5

66F5

昆昖昦昶晆晖晦晶暆暖暦暶曆曖曦曶

6

6606

6616

6626

6636

6646

6656

6666

6676

6686

6696

66A6

66B6

66C6

66D6

66E6

66F6

昇昗昧昷晇晗晧晷暇暗暧暷曇曗曧曷

7

6607

6617

6627

6637

6647

6657

6667

6677

6687

6697

66A7

66B7

66C7

66D7

66E7

66F7

昈昘昨昸晈晘晨晸暈暘暨暸曈曘曨書

8

6608

6618

6628

6638

6648

6658

6668

6678

6688

6698

66A8

66B8

66C8

66D8

66E8

66F8

昉昙昩昹晉晙晩晹暉暙暩暹曉曙曩曹

9

6609

6619

6629

6639

6649

6659

6669

6679

6689

6699

66A9

66B9

66C9

66D9

66E9

66F9

昊昚昪昺晊晚晪智暊暚暪暺曊曚曪曺

A

660A

661A

662A

663A

664A

665A

666A

667A

668A

669A

66AA

66BA

66CA

66DA

66EA

66FA

昋昛昫昻晋晛晫晻暋暛暫暻曋曛曫曻

B

660B

C

D

661B

662B

663B

664B

665B

666B

667B

668B

669B

66AB

66BB

66CB

66DB

66EB

66FB

昌昜昬昼晌晜晬晼暌暜暬暼曌曜曬曼 660C

661C

662C

663C

664C

665C

666C

667C

668C

669C

66AC

66BC

66CC

66DC

66EC

66FC

昍昝昭昽晍晝晭晽暍暝暭暽曍曝曭曽 660D

661D

662D

663D

664D

665D

666D

667D

668D

669D

66AD

66BD

66CD

66DD

66ED

66FD

明昞昮显晎晞普晾暎暞暮暾曎曞曮曾 660E

F

661

66FF

昀昐映昰晀晐晠晰暀暐暠暰曀曐曠曰

0

E

661E

662E

663E

664E

665E

666E

667E

668E

669E

66AE

66BE

66CE

66DE

66EE

66FE

昏星是昿晏晟景晿暏暟暯暿曏曟曯替 660F

661F

662F

663F

664F

665F

666F

667F

668F

669F

66AF

66BF

66CF

66DF

66EF

66FF

6700 670

6700

672

673

674

675

676

677

678

679

67A 67B 67C 67D 67E

67F

6710

6720

6730

6740

6750

6760

6770

6780

6790

67A0

67B0

67C0

67D0

67E0

67F0

朁朑朡朱杁村条東极枑枡枱柁柑柡柱

1

6701

6711

6721

6731

6741

6751

6761

6771

6781

6791

67A1

67B1

67C1

67D1

67E1

67F1

朂朒朢朲杂杒杢杲枂枒枢枲柂柒柢柲

2

6702

6712

6722

6732

6742

6752

6762

6772

6782

6792

67A2

67B2

67C2

67D2

67E2

67F2

會朓朣朳权杓杣杳枃枓枣枳柃染柣柳

3

6703

6713

6723

6733

6743

6753

6763

6773

6783

6793

67A3

67B3

67C3

67D3

67E3

67F3

朄朔朤朴杄杔杤杴构枔枤枴柄柔柤柴

4

6704

6714

6724

6734

6744

6754

6764

6774

6784

6794

67A4

67B4

67C4

67D4

67E4

67F4

朅朕朥朵杅杕来杵枅枕枥枵柅柕查柵

5

6705

6715

6725

6735

6745

6755

6765

6775

6785

6795

67A5

67B5

67C5

67D5

67E5

67F5

朆朖朦朶杆杖杦杶枆枖枦架柆柖柦柶

6

6706

6716

6726

6736

6746

6756

6766

6776

6786

6796

67A6

67B6

67C6

67D6

67E6

67F6

朇朗朧朷杇杗杧杷枇林枧枷柇柗柧柷

7

6707

6717

6727

6737

6747

6757

6767

6777

6787

6797

67A7

67B7

67C7

67D7

67E7

67F7

月朘木朸杈杘杨杸枈枘枨枸柈柘柨柸

8

6708

6718

6728

6738

6748

6758

6768

6778

6788

6798

67A8

67B8

67C8

67D8

67E8

67F8

有朙朩朹杉杙杩杹枉枙枩枹柉柙柩柹

9

6709

6719

6729

6739

6749

6759

6769

6779

6789

6799

67A9

67B9

67C9

67D9

67E9

67F9

朊朚未机杊杚杪杺枊枚枪枺柊柚柪柺

A

670A

671A

672A

673A

674A

675A

676A

677A

678A

679A

67AA

67BA

67CA

67DA

67EA

67FA

朋望末朻杋杛杫杻枋枛枫枻柋柛柫査

B

670B

C

D

671B

672B

673B

674B

675B

676B

677B

678B

679B

67AB

67BB

67CB

67DB

67EB

67FB

朌朜本朼杌杜杬杼枌果枬枼柌柜柬柼 670C

671C

672C

673C

674C

675C

676C

677C

678C

679C

67AC

67BC

67CC

67DC

67EC

67FC

服朝札朽杍杝杭杽枍枝枭枽柍柝柭柽 670D

671D

672D

673D

674D

675D

676D

677D

678D

679D

67AD

67BD

67CD

67DD

67ED

67FD

朎朞朮朾李杞杮松枎枞枮枾柎柞柮柾 670E

F

671

67FF

最朐朠朰杀材杠杰枀析枠枰柀某柠柰

0

E

671E

672E

673E

674E

675E

676E

677E

678E

679E

67AE

67BE

67CE

67DE

67EE

67FE

朏期术朿杏束杯板枏枟枯枿柏柟柯柿 670F

671F

672F

673F

674F

675F

676F

677F

678F

679F

67AF

67BF

67CF

67DF

67EF

67FF

341

CJK Unified Ideographs 6800–68FF

Columns, left to right: 680 681 682 683 684 685 686 687 688 689 68A 68B 68C 68D 68E 68F
0  栀栐栠栰桀桐桠桰梀梐梠械检棐棠棰
1  栁树校栱桁桑桡桱梁梑梡梱棁棑棡棱
2  栂栒栢栲桂桒桢桲梂梒梢梲棂棒棢棲
3  栃栓栣栳桃桓档桳梃梓梣梳棃棓棣棳
4  栄栔栤栴桄桔桤桴梄梔梤梴棄棔棤棴
5  栅栕栥栵桅桕桥桵梅梕梥梵棅棕棥棵
6  栆栖栦栶框桖桦桶梆梖梦梶棆棖棦棶
7  标栗栧样桇桗桧桷梇梗梧梷棇棗棧棷
8  栈栘栨核案桘桨桸梈梘梨梸棈棘棨棸
9  栉栙栩根桉桙桩桹梉梙梩梹棉棙棩棹
A  栊栚株栺桊桚桪桺梊梚梪梺棊棚棪棺
B  栋栛栫栻桋桛桫桻梋梛梫梻棋棛棫棻
C  栌栜栬格桌桜桬桼梌梜梬梼棌棜棬棼
D  栍栝栭栽桍桝桭桽梍條梭梽棍棝棭棽
E  栎栞栮栾桎桞桮桾梎梞梮梾棎棞森棾
F  栏栟栯栿桏桟桯桿梏梟梯梿棏棟棯棿

CJK Unified Ideographs 6900–69FF

Columns, left to right: 690 691 692 693 694 695 696 697 698 699 69A 69B 69C 69D 69E 69F
0  椀椐椠椰楀楐楠楰榀榐榠榰槀槐槠槰
1  椁椑椡椱楁楑楡楱榁榑榡榱槁槑槡槱
2  椂椒椢椲楂楒楢楲概榒榢榲槂槒槢槲
3  椃椓椣椳楃楓楣楳榃榓榣榳槃槓槣槳
4  椄椔椤椴楄楔楤楴榄榔榤榴槄槔槤槴
5  椅椕椥椵楅楕楥極榅榕榥榵槅槕槥槵
6  椆椖椦椶楆楖楦楶榆榖榦榶槆槖槦槶
7  椇椗椧椷楇楗楧楷榇榗榧榷槇槗槧槷
8  椈椘椨椸楈楘楨楸榈榘榨榸槈様槨槸
9  椉椙椩椹楉楙楩楹榉榙榩榹槉槙槩槹
A  椊椚椪椺楊楚楪楺榊榚榪榺槊槚槪槺
B  椋椛椫椻楋楛楫楻榋榛榫榻構槛槫槻
C  椌検椬椼楌楜楬楼榌榜榬榼槌槜槬槼
D  植椝椭椽楍楝業楽榍榝榭榽槍槝槭槽
E  椎椞椮椾楎楞楮楾榎榞榮榾槎槞槮槾
F  椏椟椯椿楏楟楯楿榏榟榯榿槏槟槯槿

CJK Unified Ideographs 6A00–6AFF

Columns, left to right: 6A0 6A1 6A2 6A3 6A4 6A5 6A6 6A7 6A8 6A9 6AA 6AB 6AC 6AD 6AE 6AF
0  樀樐樠樰橀橐橠橰檀檐檠檰櫀櫐櫠櫰
1  樁樑模樱橁橑橡橱檁檑檡檱櫁櫑櫡櫱
2  樂樒樢樲橂橒橢橲檂檒檢檲櫂櫒櫢櫲
3  樃樓樣樳橃橓橣橳檃檓檣檳櫃櫓櫣櫳
4  樄樔樤樴橄橔橤橴檄檔檤檴櫄櫔櫤櫴
5  樅樕樥樵橅橕橥橵檅檕檥檵櫅櫕櫥櫵
6  樆樖樦樶橆橖橦橶檆檖檦檶櫆櫖櫦櫶
7  樇樗樧樷橇橗橧橷檇檗檧檷櫇櫗櫧櫷
8  樈樘樨樸橈橘橨橸檈檘檨檸櫈櫘櫨櫸
9  樉標権樹橉橙橩橹檉檙檩檹櫉櫙櫩櫹
A  樊樚横樺橊橚橪橺檊檚檪檺櫊櫚櫪櫺
B  樋樛樫樻橋橛橫橻檋檛檫檻櫋櫛櫫櫻
C  樌樜樬樼橌橜橬橼檌檜檬檼櫌櫜櫬櫼
D  樍樝樭樽橍橝橭橽檍檝檭檽櫍櫝櫭櫽
E  樎樞樮樾橎橞橮橾檎檞檮檾櫎櫞櫮櫾
F  樏樟樯樿橏機橯橿檏檟檯檿櫏櫟櫯櫿

CJK Unified Ideographs 6B00–6BFF

Columns, left to right: 6B0 6B1 6B2 6B3 6B4 6B5 6B6 6B7 6B8 6B9 6BA 6BB 6BC 6BD 6BE 6BF
0  欀欐欠欰歀歐歠歰殀殐殠殰毀毐毠毰
1  欁欑次欱歁歑歡歱殁殑殡殱毁毑毡毱
2  欂欒欢欲歂歒止歲殂殒殢殲毂毒毢毲
3  欃欓欣欳歃歓正歳殃殓殣殳毃毓毣毳
4  欄欔欤欴歄歔此歴殄殔殤殴毄比毤毴
5  欅欕欥欵歅歕步歵殅殕殥段毅毕毥毵
6  欆欖欦欶歆歖武歶殆殖殦殶毆毖毦毶
7  欇欗欧欷歇歗歧歷殇殗殧殷毇毗毧毷
8  欈欘欨欸歈歘歨歸殈殘殨殸毈毘毨毸
9  欉欙欩欹歉歙歩歹殉殙殩殹毉毙毩毹
A  權欚欪欺歊歚歪歺殊殚殪殺毊毚毪毺
B  欋欛欫欻歋歛歫死残殛殫殻毋毛毫毻
C  欌欜欬欼歌歜歬歼殌殜殬殼毌毜毬毼
D  欍欝欭欽歍歝歭歽殍殝殭殽母毝毭毽
E  欎欞欮款歎歞歮歾殎殞殮殾毎毞毮毾
F  欏欟欯欿歏歟歯歿殏殟殯殿每毟毯毿

CJK Unified Ideographs 6C00–6CFF

Columns, left to right: 6C0 6C1 6C2 6C3 6C4 6C5 6C6 6C7 6C8 6C9 6CA 6CB 6CC 6CD 6CE 6CF
0  氀氐氠氰汀汐池汰沀沐沠沰泀泐泠泰
1  氁民氡氱汁汑污汱沁沑没沱況泑泡泱
2  氂氒氢氲求汒汢汲沂沒沢沲泂泒波泲
3  氃氓氣氳汃汓汣汳沃沓沣河泃泓泣泳
4  氄气氤水汄汔汤汴沄沔沤沴泄泔泤泴
5  氅氕氥氵汅汕汥汵沅沕沥沵泅法泥泵
6  氆氖氦氶汆汖汦汶沆沖沦沶泆泖泦泶
7  氇気氧氷汇汗汧汷沇沗沧沷泇泗泧泷
8  氈氘氨永汈汘汨汸沈沘沨沸泈泘注泸
9  氉氙氩氹汉汙汩汹沉沙沩油泉泙泩泹
A  氊氚氪氺汊汚汪決沊沚沪沺泊泚泪泺
B  氋氛氫氻汋汛汫汻沋沛沫治泋泛泫泻
C  氌氜氬氼汌汜汬汼沌沜沬沼泌泜泬泼
D  氍氝氭氽汍汝汭汽沍沝沭沽泍泝泭泽
E  氎氞氮氾汎汞汮汾沎沞沮沾泎泞泮泾
F  氏氟氯氿汏江汯汿沏沟沯沿泏泟泯泿

CJK Unified Ideographs 6D00–6DFF

Columns, left to right: 6D0 6D1 6D2 6D3 6D4 6D5 6D6 6D7 6D8 6D9 6DA 6DB 6DC 6DD 6DE 6DF
0  洀洐洠洰浀浐浠浰涀涐涠涰淀淐淠淰
1  洁洑洡洱流浑浡浱涁涑涡涱淁淑淡深
2  洂洒洢洲浂浒浢浲涂涒涢液淂淒淢淲
3  洃洓洣洳浃浓浣浳涃涓涣涳淃淓淣淳
4  洄洔洤洴浄浔浤浴涄涔涤涴淄淔淤淴
5  洅洕津洵浅浕浥浵涅涕涥涵淅淕淥淵
6  洆洖洦洶浆浖浦浶涆涖润涶淆淖淦淶
7  洇洗洧洷浇浗浧海涇涗涧涷淇淗淧混
8  洈洘洨洸浈浘浨浸消涘涨涸淈淘淨淸
9  洉洙洩洹浉浙浩浹涉涙涩涹淉淙淩淹
A  洊洚洪洺浊浚浪浺涊涚涪涺淊淚淪淺
B  洋洛洫活测浛浫浻涋涛涫涻淋淛淫添
C  洌洜洬洼浌浜浬浼涌涜涬涼淌淜淬淼
D  洍洝洭洽浍浝浭浽涍涝涭涽淍淝淭淽
E  洎洞洮派济浞浮浾涎涞涮涾淎淞淮淾
F  洏洟洯洿浏浟浯浿涏涟涯涿淏淟淯淿

CJK Unified Ideographs 6E00–6EFF

Columns, left to right: 6E0 6E1 6E2 6E3 6E4 6E5 6E6 6E7 6E8 6E9 6EA 6EB 6EC 6ED 6EE 6EF
0  渀渐渠渰湀湐湠湰満源溠溰滀滐滠滰
1  渁渑渡渱湁湑湡湱溁溑溡溱滁滑满滱
2  渂渒渢渲湂湒湢湲溂溒溢溲滂滒滢滲
3  渃渓渣渳湃湓湣湳溃溓溣溳滃滓滣滳
4  渄渔渤渴湄湔湤湴溄溔溤溴滄滔滤滴
5  清渕渥渵湅湕湥湵溅溕溥溵滅滕滥滵
6  渆渖渦渶湆湖湦湶溆準溦溶滆滖滦滶
7  渇渗渧渷湇湗湧湷溇溗溧溷滇滗滧滷
8  済渘渨游湈湘湨湸溈溘溨溸滈滘滨滸
9  渉渙温渹湉湙湩湹溉溙溩溹滉滙滩滹
A  渊渚渪渺湊湚湪湺溊溚溪溺滊滚滪滺
B  渋減渫渻湋湛湫湻溋溛溫溻滋滛滫滻
C  渌渜測渼湌湜湬湼溌溜溬溼滌滜滬滼
D  渍渝渭渽湍湝湭湽溍溝溭溽滍滝滭滽
E  渎渞渮渾湎湞湮湾溎溞溮溾滎滞滮滾
F  渏渟港渿湏湟湯湿溏溟溯溿滏滟滯滿

CJK Unified Ideographs 6F00–6FFF

Columns, left to right: 6F0 6F1 6F2 6F3 6F4 6F5 6F6 6F7 6F8 6F9 6FA 6FB 6FC 6FD 6FE 6FF
0  漀漐漠漰潀潐潠潰澀澐澠澰激濐濠濰
1  漁漑漡漱潁潑潡潱澁澑澡澱濁濑濡濱
2  漂漒漢漲潂潒潢潲澂澒澢澲濂濒濢濲
3  漃漓漣漳潃潓潣潳澃澓澣澳濃濓濣濳
4  漄演漤漴潄潔潤潴澄澔澤澴濄濔濤濴
5  漅漕漥漵潅潕潥潵澅澕澥澵濅濕濥濵
6  漆漖漦漶潆潖潦潶澆澖澦澶濆濖濦濶
7  漇漗漧漷潇潗潧潷澇澗澧澷濇濗濧濷
8  漈漘漨漸潈潘潨潸澈澘澨澸濈濘濨濸
9  漉漙漩漹潉潙潩潹澉澙澩澹濉濙濩濹
A  漊漚漪漺潊潚潪潺澊澚澪澺濊濚濪濺
B  漋漛漫漻潋潛潫潻澋澛澫澻濋濛濫濻
C  漌漜漬漼潌潜潬潼澌澜澬澼濌濜濬濼
D  漍漝漭漽潍潝潭潽澍澝澭澽濍濝濭濽
E  漎漞漮漾潎潞潮潾澎澞澮澾濎濞濮濾
F  漏漟漯漿潏潟潯潿澏澟澯澿濏濟濯濿

CJK Unified Ideographs 7000–70FF

Columns, left to right: 700 701 702 703 704 705 706 707 708 709 70A 70B 70C 70D 70E 70F
0  瀀瀐瀠瀰灀灐灠灰炀炐炠炰烀烐烠烰
1  瀁瀑瀡瀱灁灑灡灱炁炑炡炱烁烑烡烱
2  瀂瀒瀢瀲灂灒灢灲炂炒炢炲烂烒烢烲
3  瀃瀓瀣瀳灃灓灣灳炃炓炣炳烃烓烣烳
4  瀄瀔瀤瀴灄灔灤灴炄炔炤炴烄烔烤烴
5  瀅瀕瀥瀵灅灕灥灵炅炕炥炵烅烕烥烵
6  瀆瀖瀦瀶灆灖灦灶炆炖炦炶烆烖烦烶
7  瀇瀗瀧瀷灇灗灧灷炇炗炧炷烇烗烧烷
8  瀈瀘瀨瀸灈灘灨灸炈炘炨炸烈烘烨烸
9  瀉瀙瀩瀹灉灙灩灹炉炙炩点烉烙烩烹
A  瀊瀚瀪瀺灊灚灪灺炊炚炪為烊烚烪烺
B  瀋瀛瀫瀻灋灛火灻炋炛炫炻烋烛烫烻
C  瀌瀜瀬瀼灌灜灬灼炌炜炬炼烌烜烬烼
D  瀍瀝瀭瀽灍灝灭災炍炝炭炽烍烝热烽
E  瀎瀞瀮瀾灎灞灮灾炎炞炮炾烎烞烮烾
F  瀏瀟瀯瀿灏灟灯灿炏炟炯炿烏烟烯烿

CJK Unified Ideographs 7100–71FF

Columns, left to right: 710 711 712 713 714 715 716 717 718 719 71A 71B 71C 71D 71E 71F
0  焀焐焠焰煀煐煠煰熀熐熠熰燀燐燠燰
1  焁焑無焱煁煑煡煱熁熑熡熱燁燑燡燱
2  焂焒焢焲煂煒煢煲熂熒熢熲燂燒燢燲
3  焃焓焣焳煃煓煣煳熃熓熣熳燃燓燣燳
4  焄焔焤焴煄煔煤煴熄熔熤熴燄燔燤燴
5  焅焕焥焵煅煕煥煵熅熕熥熵燅燕燥燵
6  焆焖焦然煆煖煦煶熆熖熦熶燆燖燦燶
7  焇焗焧焷煇煗照煷熇熗熧熷燇燗燧燷
8  焈焘焨焸煈煘煨煸熈熘熨熸燈燘燨燸
9  焉焙焩焹煉煙煩煹熉熙熩熹燉燙燩燹
A  焊焚焪焺煊煚煪煺熊熚熪熺燊燚燪燺
B  焋焛焫焻煋煛煫煻熋熛熫熻燋燛燫燻
C  焌焜焬焼煌煜煬煼熌熜熬熼燌燜燬燼
D  焍焝焭焽煍煝煭煽熍熝熭熽燍燝燭燽
E  焎焞焮焾煎煞煮煾熎熞熮熾燎燞燮燾
F  焏焟焯焿煏煟煯煿熏熟熯熿燏營燯燿

CJK Unified Ideographs 7200–72FF

Columns, left to right: 720 721 722 723 724 725 726 727 728 729 72A 72B 72C 72D 72E 72F
0  爀爐爠爰牀牐牠牰犀犐犠犰狀狐狠狰
1  爁爑爡爱牁牑牡牱犁犑犡犱狁狑狡狱
2  爂爒爢爲牂牒牢牲犂犒犢犲狂狒狢狲
3  爃爓爣爳牃牓牣牳犃犓犣犳狃狓狣狳
4  爄爔爤爴牄牔牤牴犄犔犤犴狄狔狤狴
5  爅爕爥爵牅牕牥牵犅犕犥犵狅狕狥狵
6  爆爖爦父牆牖牦牶犆犖犦状狆狖狦狶
7  爇爗爧爷片牗牧牷犇犗犧犷狇狗狧狷
8  爈爘爨爸版牘牨牸犈犘犨犸狈狘狨狸
9  爉爙爩爹牉牙物特犉犙犩犹狉狙狩狹
A  爊爚爪爺牊牚牪牺犊犚犪犺狊狚狪狺
B  爋爛爫爻牋牛牫牻犋犛犫犻狋狛狫狻
C  爌爜爬爼牌牜牬牼犌犜犬犼狌狜独狼
D  爍爝爭爽牍牝牭牽犍犝犭犽狍狝狭狽
E  爎爞爮爾牎牞牮牾犎犞犮犾狎狞狮狾
F  爏爟爯爿牏牟牯牿犏犟犯犿狏狟狯狿

CJK Unified Ideographs 7300–73FF

Columns, left to right: 730 731 732 733 734 735 736 737 738 739 73A 73B 73C 73D 73E 73F
0  猀猐猠猰獀獐獠獰玀玐玠现珀珐珠珰
1  猁猑猡猱獁獑獡獱玁玑玡玱珁珑珡珱
2  猂猒猢猲獂獒獢獲玂玒玢玲珂珒珢珲
3  猃猓猣猳獃獓獣獳玃玓玣玳珃珓珣珳
4  猄猔猤猴獄獔獤獴玄玔玤玴珄珔珤珴
5  猅猕猥猵獅獕獥獵玅玕玥玵珅珕珥珵
6  猆猖猦猶獆獖獦獶玆玖玦玶珆珖珦珶
7  猇猗猧猷獇獗獧獷率玗玧玷珇珗珧珷
8  猈猘猨猸獈獘獨獸玈玘玨玸珈珘珨珸
9  猉猙猩猹獉獙獩獹玉玙玩玹珉珙珩珹
A  猊猚猪猺獊獚獪獺玊玚玪玺珊珚珪珺
B  猋猛猫猻獋獛獫獻王玛玫玻珋珛珫珻
C  猌猜猬猼獌獜獬獼玌玜玬玼珌珜珬珼
D  猍猝猭猽獍獝獭獽玍玝玭玽珍珝班珽
E  猎猞献猾獎獞獮獾玎玞玮玾珎珞珮現
F  猏猟猯猿獏獟獯獿玏玟环玿珏珟珯珿

CJK Unified Ideographs 7400–74FF

Columns, left to right: 740 741 742 743 744 745 746 747 748 749 74A 74B 74C 74D 74E 74F
0  琀琐琠琰瑀瑐瑠瑰璀璐璠環瓀瓐瓠瓰
1  琁琑琡琱瑁瑑瑡瑱璁璑璡璱瓁瓑瓡瓱
2  琂琒琢琲瑂瑒瑢瑲璂璒璢璲瓂瓒瓢瓲
3  球琓琣琳瑃瑓瑣瑳璃璓璣璳瓃瓓瓣瓳
4  琄琔琤琴瑄瑔瑤瑴璄璔璤璴瓄瓔瓤瓴
5  琅琕琥琵瑅瑕瑥瑵璅璕璥璵瓅瓕瓥瓵
6  理琖琦琶瑆瑖瑦瑶璆璖璦璶瓆瓖瓦瓶
7  琇琗琧琷瑇瑗瑧瑷璇璗璧璷瓇瓗瓧瓷
8  琈琘琨琸瑈瑘瑨瑸璈璘璨璸瓈瓘瓨瓸
9  琉琙琩琹瑉瑙瑩瑹璉璙璩璹瓉瓙瓩瓹
A  琊琚琪琺瑊瑚瑪瑺璊璚璪璺瓊瓚瓪瓺
B  琋琛琫琻瑋瑛瑫瑻璋璛璫璻瓋瓛瓫瓻
C  琌琜琬琼瑌瑜瑬瑼璌璜璬璼瓌瓜瓬瓼
D  琍琝琭琽瑍瑝瑭瑽璍璝璭璽瓍瓝瓭瓽
E  琎琞琮琾瑎瑞瑮瑾璎璞璮璾瓎瓞瓮瓾
F  琏琟琯琿瑏瑟瑯瑿璏璟璯璿瓏瓟瓯瓿

CJK Unified Ideographs 7500–75FF

Columns, left to right: 750 751 752 753 754 755 756 757 758 759 75A 75B 75C 75D 75E 75F
0  甀甐甠田畀畐畠異疀疐疠疰痀痐痠痰
1  甁甑甡由畁畑畡畱疁疑疡疱痁痑痡痱
2  甂甒產甲畂畒畢畲疂疒疢疲痂痒痢痲
3  甃甓産申畃畓畣畳疃疓疣疳痃痓痣痳
4  甄甔甤甴畄畔畤畴疄疔疤疴痄痔痤痴
5  甅甕甥电畅畕略畵疅疕疥疵病痕痥痵
6  甆甖甦甶畆畖畦當疆疖疦疶痆痖痦痶
7  甇甗甧男畇畗畧畷疇疗疧疷症痗痧痷
8  甈甘用甸畈畘畨畸疈疘疨疸痈痘痨痸
9  甉甙甩甹畉留畩畹疉疙疩疹痉痙痩痹
A  甊甚甪町畊畚番畺疊疚疪疺痊痚痪痺
B  甋甛甫画畋畛畫畻疋疛疫疻痋痛痫痻
C  甌甜甬甼界畜畬畼疌疜疬疼痌痜痬痼
D  甍甝甭甽畍畝畭畽疍疝疭疽痍痝痭痽
E  甎甞甮甾畎畞畮畾疎疞疮疾痎痞痮痾
F  甏生甯甿畏畟畯畿疏疟疯疿痏痟痯痿

Yi Syllables
Range: A000–A48F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.
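As a minimal sketch of how the stated block range and the chart grid can be used programmatically, the following Python helpers test whether a character lies in A000–A48F and compute a cell's code point from its column label and row digit. The function and constant names are hypothetical and chosen only for illustration; they are not part of the Unicode Standard or of any library.

```python
# Bounds of the Yi Syllables block, taken from the chart header (A000–A48F).
YI_SYLLABLES_FIRST = 0xA000
YI_SYLLABLES_LAST = 0xA48F


def in_yi_syllables_block(ch: str) -> bool:
    """Return True if the single character ch lies in the Yi Syllables block."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return YI_SYLLABLES_FIRST <= ord(ch) <= YI_SYLLABLES_LAST


def chart_cell(column: str, row: str) -> int:
    """Code point of a chart cell: column label (e.g. 'A0E') followed by row digit (e.g. 'F')."""
    return int(column + row, 16)
```

For example, chart_cell("A0E", "F") gives the code point of the last cell on the first Yi Syllables chart page, and passing chr() of that value to in_yi_syllables_block confirms it is inside the block. Note that a block range check says nothing about whether a given code point is actually assigned.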

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard, can be found on the Unicode web site. See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Yi Syllables A000–A0EF

[Code chart: columns A00–A0E, rows 0–F, code points A000 through A0EF. The reference glyphs are not recoverable from the extracted text.]

[Code chart: Yi Syllables, A0F0–A1DF]

[Code chart: Yi Syllables, A1E0–A2CF]

[Code chart: Yi Syllables, A2D0–A3AF]

[Code chart: Yi Syllables, A3B0–A48F]

Yi Syllables

Syllables
A000 YI SYLLABLE IT
A001 YI SYLLABLE IX
A002 YI SYLLABLE I
A003 YI SYLLABLE IP
A004 YI SYLLABLE IET
A005 YI SYLLABLE IEX
A006 YI SYLLABLE IE
A007 YI SYLLABLE IEP
A008 YI SYLLABLE AT
A009 YI SYLLABLE AX
A00A YI SYLLABLE A
A00B YI SYLLABLE AP
A00C YI SYLLABLE UOX
A00D YI SYLLABLE UO
A00E YI SYLLABLE UOP
A00F YI SYLLABLE OT
A010 YI SYLLABLE OX
A011 YI SYLLABLE O
A012 YI SYLLABLE OP
A013 YI SYLLABLE EX
A014 YI SYLLABLE E

Syllable iteration mark
A015 YI SYLLABLE WU
	= YI SYLLABLE ITERATION MARK
	• name is a misnomer

Syllables
A016 YI SYLLABLE BIT
A017 YI SYLLABLE BIX
A018 YI SYLLABLE BI
A019 YI SYLLABLE BIP
A01A YI SYLLABLE BIET
A01B YI SYLLABLE BIEX
A01C YI SYLLABLE BIE
A01D YI SYLLABLE BIEP
A01E YI SYLLABLE BAT
A01F YI SYLLABLE BAX
A020 YI SYLLABLE BA
A021 YI SYLLABLE BAP
A022 YI SYLLABLE BUOX
A023 YI SYLLABLE BUO
A024 YI SYLLABLE BUOP
A025 YI SYLLABLE BOT
A026 YI SYLLABLE BOX
A027 YI SYLLABLE BO
A028 YI SYLLABLE BOP
A029 YI SYLLABLE BEX
A02A YI SYLLABLE BE
A02B YI SYLLABLE BEP
A02C YI SYLLABLE BUT
A02D YI SYLLABLE BUX
A02E YI SYLLABLE BU
A02F YI SYLLABLE BUP
A030 YI SYLLABLE BURX
A031 YI SYLLABLE BUR
A032 YI SYLLABLE BYT
A033 YI SYLLABLE BYX
A034 YI SYLLABLE BY
A035 YI SYLLABLE BYP
A036 YI SYLLABLE BYRX
A037 YI SYLLABLE BYR
A038 YI SYLLABLE PIT
A039 YI SYLLABLE PIX
A03A YI SYLLABLE PI
A03B YI SYLLABLE PIP
A03C YI SYLLABLE PIEX
A03D YI SYLLABLE PIE
A03E YI SYLLABLE PIEP
A03F YI SYLLABLE PAT
A040 YI SYLLABLE PAX
A041 YI SYLLABLE PA
A042 YI SYLLABLE PAP
A043 YI SYLLABLE PUOX
A044 YI SYLLABLE PUO
A045 YI SYLLABLE PUOP
A046 YI SYLLABLE POT
A047 YI SYLLABLE POX
A048 YI SYLLABLE PO
A049 YI SYLLABLE POP
A04A YI SYLLABLE PUT
A04B YI SYLLABLE PUX
A04C YI SYLLABLE PU
A04D YI SYLLABLE PUP
A04E YI SYLLABLE PURX
A04F YI SYLLABLE PUR
A050 YI SYLLABLE PYT
A051 YI SYLLABLE PYX
A052 YI SYLLABLE PY
A053 YI SYLLABLE PYP
A054 YI SYLLABLE PYRX
A055 YI SYLLABLE PYR
A056 YI SYLLABLE BBIT
A057 YI SYLLABLE BBIX
A058 YI SYLLABLE BBI
A059 YI SYLLABLE BBIP
A05A YI SYLLABLE BBIET
A05B YI SYLLABLE BBIEX
A05C YI SYLLABLE BBIE
A05D YI SYLLABLE BBIEP
A05E YI SYLLABLE BBAT
A05F YI SYLLABLE BBAX
A060 YI SYLLABLE BBA
A061 YI SYLLABLE BBAP
A062 YI SYLLABLE BBUOX
A063 YI SYLLABLE BBUO
A064 YI SYLLABLE BBUOP
A065 YI SYLLABLE BBOT
A066 YI SYLLABLE BBOX
A067 YI SYLLABLE BBO
A068 YI SYLLABLE BBOP
A069 YI SYLLABLE BBEX
A06A YI SYLLABLE BBE
A06B YI SYLLABLE BBEP
A06C YI SYLLABLE BBUT
A06D YI SYLLABLE BBUX
A06E YI SYLLABLE BBU
A06F YI SYLLABLE BBUP
A070 YI SYLLABLE BBURX
A071 YI SYLLABLE BBUR
A072 YI SYLLABLE BBYT
A073 YI SYLLABLE BBYX
A074 YI SYLLABLE BBY
A075 YI SYLLABLE BBYP
A076 YI SYLLABLE NBIT
A077 YI SYLLABLE NBIX
A078 YI SYLLABLE NBI
A079 YI SYLLABLE NBIP
A07A YI SYLLABLE NBIEX
A07B YI SYLLABLE NBIE
A07C YI SYLLABLE NBIEP
A07D YI SYLLABLE NBAT
A07E YI SYLLABLE NBAX
A07F YI SYLLABLE NBA
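The names listed above can be cross-checked programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; its bundled character database is newer than Version 5.0, but the names for these code points are not expected to differ. The helper `names_in_range` is a hypothetical name introduced here for illustration and is not part of the standard.

```python
import unicodedata


def names_in_range(first: int, last: int) -> list[tuple[str, str]]:
    """Return (hex code point, character name) pairs for an inclusive range."""
    return [
        (f"{cp:04X}", unicodedata.name(chr(cp)))
        for cp in range(first, last + 1)
    ]


# A000..A014 are ordinary syllables; A015 is the syllable iteration mark,
# which keeps the (misnomer) name YI SYLLABLE WU.
yi_names = names_in_range(0xA000, 0xA015)

for code, name in yi_names:
    print(code, name)
```

Each printed line pairs a code point with its character name in the same order as the names list, so the output can be compared against the list entry by entry.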


Yi Radicals
Range: A490–A4CF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.


Modifier Tone Letters
Range: A700–A71F

Latin Extended-D
Range: A720–A7FF

Syloti Nagri
Range: A800–A82F

Phags-pa
Range: A840–A87F

[Code chart: Phags-pa, A840–A87F]


Hangul Syllables
Range: AC00–D7AF

Hangul Syllables: code chart, AC00–ACFF

   AC0 AC1 AC2 AC3 AC4 AC5 AC6 AC7 AC8 AC9 ACA ACB ACC ACD ACE ACF
0  가감갠갰걀걐걠거검겐겠결곀곐고곰
1  각갑갡갱걁걑걡걱겁겑겡겱곁곑곡곱
2  갂값갢갲걂걒걢걲겂겒겢겲곂곒곢곲
3  갃갓갣갳걃걓걣걳것겓겣겳곃곓곣곳
4  간갔갤갴걄걔걤건겄겔겤겴계곔곤곴
5  갅강갥갵걅걕걥걵겅겕겥겵곅곕곥공
6  갆갖갦갶걆걖걦걶겆겖겦겶곆곖곦곶
7  갇갗갧갷걇걗걧걷겇겗겧겷곇곗곧곷
8  갈갘갨갸걈걘걨걸겈겘겨겸곈곘골곸
9  갉같갩갹걉걙걩걹겉겙격겹곉곙곩곹
A  갊갚갪갺걊걚걪걺겊겚겪겺곊곚곪곺
B  갋갛갫갻걋걛걫걻겋겛겫겻곋곛곫곻
C  갌개갬갼걌걜걬걼게겜견겼곌곜곬과
D  갍객갭갽걍걝걭걽겍겝겭경곍곝곭곽
E  갎갞갮갾걎걞걮걾겎겞겮겾곎곞곮곾
F  갏갟갯갿걏걟걯걿겏겟겯겿곏곟곯곿

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.
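
The Hangul Syllables charts in this section arrange each run of 256 code points as 16 columns by 16 rows, with the code point of a cell given by its column prefix followed by its row digit. A minimal sketch of that layout rule follows, assuming Python 3; the function names `chart_rows` and `print_chart` are hypothetical and are not part of the standard or of any library.

```python
def chart_rows(block_start: int) -> list[str]:
    """Return the 16 chart rows (row digit 0..F) for one 256-code-point page.

    Assumes block_start is a multiple of 0x100, such as 0xAC00.
    """
    if block_start % 0x100 != 0:
        raise ValueError("block_start must be a multiple of 0x100")
    rows = []
    for row in range(16):
        # The cell in column `col` and row `row` holds code point
        # block_start + 16 * col + row.
        rows.append(
            "".join(chr(block_start + 16 * col + row) for col in range(16))
        )
    return rows


def print_chart(block_start: int) -> None:
    """Print the column prefixes, then one labeled line per row."""
    prefixes = " ".join(f"{(block_start >> 4) + col:03X}" for col in range(16))
    print(prefixes)
    for row, glyphs in enumerate(chart_rows(block_start)):
        print(f"{row:X}  {glyphs}")


if __name__ == "__main__":
    print_chart(0xAC00)
```

Whether the glyphs display correctly depends on the fonts available to the terminal; the sketch only reproduces the code point arrangement, not the reference glyph shapes.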

Hangul Syllables (AD00-ADFF)

Columns, left to right: AD0 AD1 AD2 AD3 AD4 AD5 AD6 AD7 AD8 AD9 ADA ADB ADC ADD ADE ADF. Each code point is the column prefix followed by the row digit.

0  관괐괠괰굀교굠군궀궐궠궰귀귐균귰
1  괁광괡괱굁굑굡굱궁궑궡궱귁귑귡귱
2  괂괒괢괲굂굒굢굲궂궒궢궲귂귒귢귲
3  괃괓괣괳굃굓굣굳궃궓궣궳귃귓귣귳
4  괄괔괤괴굄굔굤굴궄궔궤궴귄귔귤귴
5  괅괕괥괵굅굕굥굵궅궕궥궵귅귕귥귵
6  괆괖괦괶굆굖굦굶궆궖궦궶귆귖귦귶
7  괇괗괧괷굇굗굧굷궇궗궧궷귇귗귧귷
8  괈괘괨괸굈굘굨굸궈궘궨궸귈귘귨그
9  괉괙괩괹굉굙굩굹궉궙궩궹귉귙귩극
A  괊괚괪괺굊굚굪굺궊궚궪궺귊귚귪귺
B  괋괛괫괻굋굛굫굻궋궛궫궻귋귛귫귻
C  괌괜괬괼굌굜구굼권궜궬궼귌규귬근
D  괍괝괭괽굍굝국굽궍궝궭궽귍귝귭귽
E  괎괞괮괾굎굞굮굾궎궞궮궾귎귞귮귾
F  괏괟괯괿굏굟굯굿궏궟궯궿귏귟귯귿

Hangul Syllables (AE00-AEFF)

Columns, left to right: AE0 AE1 AE2 AE3 AE4 AE5 AE6 AE7 AE8 AE9 AEA AEB AEC AED AEE AEF. Each code point is the column prefix followed by the row digit.

0  글긐긠기김깐깠깰꺀꺐꺠꺰껀껐껠껰
1  긁긑긡긱깁깑깡깱꺁꺑꺡꺱껁껑껡껱
2  긂긒긢긲깂깒깢깲꺂꺒꺢꺲껂껒껢껲
3  긃긓긣긳깃깓깣깳꺃꺓꺣꺳껃껓껣껳
4  긄긔긤긴깄깔깤깴꺄꺔꺤꺴껄껔껤껴
5  긅긕긥긵깅깕깥깵꺅꺕꺥꺵껅껕껥껵
6  긆긖긦긶깆깖깦깶꺆꺖꺦꺶껆껖껦껶
7  긇긗긧긷깇깗깧깷꺇꺗꺧꺷껇껗껧껷
8  금긘긨길깈깘깨깸꺈꺘꺨꺸껈께껨껸
9  급긙긩긹깉깙깩깹꺉꺙꺩꺹껉껙껩껹
A  긊긚긪긺깊깚깪깺꺊꺚꺪꺺껊껚껪껺
B  긋긛긫긻깋깛깫깻꺋꺛꺫꺻껋껛껫껻
C  긌긜긬긼까깜깬깼꺌꺜꺬꺼껌껜껬껼
D  긍긝긭긽깍깝깭깽꺍꺝꺭꺽껍껝껭껽
E  긎긞긮긾깎깞깮깾꺎꺞꺮꺾껎껞껮껾
F  긏긟긯긿깏깟깯깿꺏꺟꺯꺿껏껟껯껿


Hangul Syllables (AF00-AFFF)

Columns, left to right: AF0 AF1 AF2 AF3 AF4 AF5 AF6 AF7 AF8 AF9 AFA AFB AFC AFD AFE AFF. Each code point is the column prefix followed by the row digit.

0  꼀꼐꼠꼰꽀꽐꽠꽰꾀꾐꾠꾰꿀꿐꿠꿰
1  꼁꼑꼡꼱꽁꽑꽡꽱꾁꾑꾡꾱꿁꿑꿡꿱
2  꼂꼒꼢꼲꽂꽒꽢꽲꾂꾒꾢꾲꿂꿒꿢꿲
3  꼃꼓꼣꼳꽃꽓꽣꽳꾃꾓꾣꾳꿃꿓꿣꿳
4  꼄꼔꼤꼴꽄꽔꽤꽴꾄꾔꾤꾴꿄꿔꿤꿴
5  꼅꼕꼥꼵꽅꽕꽥꽵꾅꾕꾥꾵꿅꿕꿥꿵
6  꼆꼖꼦꼶꽆꽖꽦꽶꾆꾖꾦꾶꿆꿖꿦꿶
7  꼇꼗꼧꼷꽇꽗꽧꽷꾇꾗꾧꾷꿇꿗꿧꿷
8  꼈꼘꼨꼸꽈꽘꽨꽸꾈꾘꾨꾸꿈꿘꿨꿸
9  꼉꼙꼩꼹꽉꽙꽩꽹꾉꾙꾩꾹꿉꿙꿩꿹
A  꼊꼚꼪꼺꽊꽚꽪꽺꾊꾚꾪꾺꿊꿚꿪꿺
B  꼋꼛꼫꼻꽋꽛꽫꽻꾋꾛꾫꾻꿋꿛꿫꿻
C  꼌꼜꼬꼼꽌꽜꽬꽼꾌꾜꾬꾼꿌꿜꿬꿼
D  꼍꼝꼭꼽꽍꽝꽭꽽꾍꾝꾭꾽꿍꿝꿭꿽
E  꼎꼞꼮꼾꽎꽞꽮꽾꾎꾞꾮꾾꿎꿞꿮꿾
F  꼏꼟꼯꼿꽏꽟꽯꽿꾏꾟꾯꾿꿏꿟꿯꿿

Hangul Syllables (B000-B0FF)

Columns, left to right: B00 B01 B02 B03 B04 B05 B06 B07 B08 B09 B0A B0B B0C B0D B0E B0F. Each code point is the column prefix followed by the row digit.

0  뀀뀐뀠뀰끀끐끠끰낀낐날낰냀냐냠냰
1  뀁뀑뀡뀱끁끑끡끱낁낑낡낱냁냑냡냱
2  뀂뀒뀢뀲끂끒끢끲낂낒낢낲냂냒냢냲
3  뀃뀓뀣뀳끃끓끣끳낃낓낣낳냃냓냣냳
4  뀄뀔뀤뀴끄끔끤끴낄낔낤내냄냔냤냴
5  뀅뀕뀥뀵끅끕끥끵낅낕낥낵냅냕냥냵
6  뀆뀖뀦뀶끆끖끦끶낆낖낦낶냆냖냦냶
7  뀇뀗뀧뀷끇끗끧끷낇낗낧낷냇냗냧냷
8  뀈뀘뀨뀸끈끘끨끸낈나남낸냈냘냨냸
9  뀉뀙뀩뀹끉끙끩끹낉낙납낹냉냙냩냹
A  뀊뀚뀪뀺끊끚끪끺낊낚낪낺냊냚냪냺
B  뀋뀛뀫뀻끋끛끫끻낋낛낫낻냋냛냫냻
C  뀌뀜뀬뀼끌끜끬끼낌난났낼냌냜냬냼
D  뀍뀝뀭뀽끍끝끭끽낍낝낭낽냍냝냭냽
E  뀎뀞뀮뀾끎끞끮끾낎낞낮낾냎냞냮냾
F  뀏뀟뀯뀿끏끟끯끿낏낟낯낿냏냟냯냿


Hangul Syllables (B100-B1FF)

Columns, left to right: B10 B11 B12 B13 B14 B15 B16 B17 B18 B19 B1A B1B B1C B1D B1E B1F. Each code point is the column prefix followed by the row digit.

0  넀널넠넰녀념녠녰놀놐놠놰뇀뇐뇠뇰
1  넁넑넡넱녁녑녡녱놁놑놡놱뇁뇑뇡뇱
2  넂넒넢넲녂녒녢녲놂높놢놲뇂뇒뇢뇲
3  넃넓넣넳녃녓녣녳놃놓놣놳뇃뇓뇣뇳
4  넄넔네넴년녔녤녴놄놔놤놴뇄뇔뇤뇴
5  넅넕넥넵녅녕녥녵놅놕놥놵뇅뇕뇥뇵
6  넆넖넦넶녆녖녦녶놆놖놦놶뇆뇖뇦뇶
7  넇넗넧넷녇녗녧녷놇놗놧놷뇇뇗뇧뇷
8  너넘넨넸녈녘녨노놈놘놨놸뇈뇘뇨뇸
9  넉넙넩넹녉녙녩녹놉놙놩놹뇉뇙뇩뇹
A  넊넚넪넺녊녚녪녺놊놚놪놺뇊뇚뇪뇺
B  넋넛넫넻녋녛녫녻놋놛놫놻뇋뇛뇫뇻
C  넌넜넬넼녌녜녬논놌놜놬놼뇌뇜뇬뇼
D  넍넝넭넽녍녝녭녽농놝놭놽뇍뇝뇭뇽
E  넎넞넮넾녎녞녮녾놎놞놮놾뇎뇞뇮뇾
F  넏넟넯넿녏녟녯녿놏놟놯놿뇏뇟뇯뇿

Hangul Syllables (B200-B2FF)

Columns, left to right: B20 B21 B22 B23 B24 B25 B26 B27 B28 B29 B2A B2B B2C B2D B2E B2F. Each code point is the column prefix followed by the row digit.

0  눀눐눠눰뉀뉐뉠뉰늀느늠늰닀닐닠닰
1  눁눑눡눱뉁뉑뉡뉱늁늑늡늱닁닑닡닱
2  눂눒눢눲뉂뉒뉢뉲늂늒늢늲닂닒닢닲
3  눃눓눣눳뉃뉓뉣뉳늃늓늣늳닃닓닣닳
4  누눔눤눴뉄뉔뉤뉴늄는늤늴닄닔다담
5  눅눕눥눵뉅뉕뉥뉵늅늕능늵닅닕닥답
6  눆눖눦눶뉆뉖뉦뉶늆늖늦늶닆닖닦닶
7  눇눗눧눷뉇뉗뉧뉷늇늗늧늷닇닗닧닷
8  눈눘눨눸뉈뉘뉨뉸늈늘늨늸니님단닸
9  눉눙눩눹뉉뉙뉩뉹늉늙늩늹닉닙닩당
A  눊눚눪눺뉊뉚뉪뉺늊늚늪늺닊닚닪닺
B  눋눛눫눻뉋뉛뉫뉻늋늛늫늻닋닛닫닻
C  눌눜눬눼뉌뉜뉬뉼늌늜늬늼닌닜달닼
D  눍눝눭눽뉍뉝뉭뉽늍늝늭늽닍닝닭닽
E  눎눞눮눾뉎뉞뉮뉾늎늞늮늾닎닞닮닾
F  눏눟눯눿뉏뉟뉯뉿늏늟늯늿닏닟닯닿


Hangul Syllables (B300-B3FF)

Columns, left to right: B30 B31 B32 B33 B34 B35 B36 B37 B38 B39 B3A B3B B3C B3D B3E B3F. Each code point is the column prefix followed by the row digit.

0  대댐댠댰덀덐덠데뎀뎐뎠뎰돀돐돠돰
1  댁댑댡댱덁덑덡덱뎁뎑뎡뎱돁돑돡돱
2  댂댒댢댲덂덒덢덲뎂뎒뎢뎲돂돒돢돲
3  댃댓댣댳덃덓덣덳뎃뎓뎣뎳돃돓돣돳
4  댄댔댤댴덄더덤덴뎄뎔뎤뎴도돔돤돴
5  댅댕댥댵덅덕덥덵뎅뎕뎥뎵독돕돥돵
6  댆댖댦댶덆덖덦덶뎆뎖뎦뎶돆돖돦돶
7  댇댗댧댷덇덗덧덷뎇뎗뎧뎷돇돗돧돷
8  댈댘댨댸덈던덨델뎈뎘뎨뎸돈돘돨돸
9  댉댙댩댹덉덙덩덹뎉뎙뎩뎹돉동돩돹
A  댊댚댪댺덊덚덪덺뎊뎚뎪뎺돊돚돪돺
B  댋댛댫댻덋덛덫덻뎋뎛뎫뎻돋돛돫돻
C  댌댜댬댼덌덜덬덼뎌뎜뎬뎼돌돜돬돼
D  댍댝댭댽덍덝덭덽뎍뎝뎭뎽돍돝돭돽
E  댎댞댮댾덎덞덮덾뎎뎞뎮뎾돎돞돮돾
F  댏댟댯댿덏덟덯덿뎏뎟뎯뎿돏돟돯돿

Hangul Syllables (B400-B4FF)

Columns, left to right: B40 B41 B42 B43 B44 B45 B46 B47 B48 B49 B4A B4B B4C B4D B4E B4F. Each code point is the column prefix followed by the row digit.

0  됀됐될됰둀두둠둰뒀뒐뒠뒰듀듐든듰
1  됁됑됡됱둁둑둡둱뒁뒑뒡뒱듁듑듡등
2  됂됒됢됲둂둒둢둲뒂뒒뒢뒲듂듒듢듲
3  됃됓됣됳둃둓둣둳뒃뒓뒣뒳듃듓듣듳
4  됄됔됤됴둄둔둤둴뒄뒔뒤뒴듄듔들듴
5  됅됕됥됵둅둕둥둵뒅뒕뒥뒵듅듕듥듵
6  됆됖됦됶둆둖둦둶뒆뒖뒦뒶듆듖듦듶
7  됇됗됧됷둇둗둧둷뒇뒗뒧뒷듇듗듧듷
8  됈되됨됸둈둘둨둸뒈뒘뒨뒸듈듘듨듸
9  됉됙됩됹둉둙둩둹뒉뒙뒩뒹듉듙듩듹
A  됊됚됪됺둊둚둪둺뒊뒚뒪뒺듊듚듪듺
B  됋됛됫됻둋둛둫둻뒋뒛뒫뒻듋듛듫듻
C  됌된됬됼둌둜둬둼뒌뒜뒬뒼듌드듬듼
D  됍됝됭됽둍둝둭둽뒍뒝뒭뒽듍득듭듽
E  됎됞됮됾둎둞둮둾뒎뒞뒮뒾듎듞듮듾
F  됏됟됯됿둏둟둯둿뒏뒟뒯뒿듏듟듯듿


Hangul Syllables (B500-B5FF)

Columns, left to right: B50 B51 B52 B53 B54 B55 B56 B57 B58 B59 B5A B5B B5C B5D B5E B5F. Each code point is the column prefix followed by the row digit.

0  딀딐딠따땀땐땠땰떀떐떠떰뗀뗐뗠뗰
1  딁딑딡딱땁땑땡땱떁떑떡떱뗁뗑뗡뗱
2  딂딒딢딲땂땒땢땲떂떒떢떲뗂뗒뗢뗲
3  딃딓딣딳땃땓땣땳떃떓떣떳뗃뗓뗣뗳
4  딄디딤딴땄땔땤땴떄떔떤떴뗄뗔뗤뗴
5  딅딕딥딵땅땕땥땵떅떕떥떵뗅뗕뗥뗵
6  딆딖딦딶땆땖땦땶떆떖떦떶뗆뗖뗦뗶
7  딇딗딧딷땇땗땧땷떇떗떧떷뗇뗗뗧뗷
8  딈딘딨딸땈땘땨땸떈떘떨떸뗈뗘뗨뗸
9  딉딙딩딹땉땙땩땹떉떙떩떹뗉뗙뗩뗹
A  딊딚딪딺땊땚땪땺떊떚떪떺뗊뗚뗪뗺
B  딋딛딫딻땋땛땫땻떋떛떫떻뗋뗛뗫뗻
C  딌딜딬딼때땜땬땼떌떜떬떼뗌뗜뗬뗼
D  딍딝딭딽땍땝땭땽떍떝떭떽뗍뗝뗭뗽
E  딎딞딮딾땎땞땮땾떎떞떮떾뗎뗞뗮뗾
F  딏딟딯딿땏땟땯땿떏떟떯떿뗏뗟뗯뗿

Hangul Syllables (B600-B6FF)

Columns, left to right: B60 B61 B62 B63 B64 B65 B66 B67 B68 B69 B6A B6B B6C B6D B6E B6F. Each code point is the column prefix followed by the row digit.

0  똀또똠똰뙀뙐뙠뙰뚀뚐뚠뚰뛀뛐뛠뛰
1  똁똑똡똱뙁뙑뙡뙱뚁뚑뚡뚱뛁뛑뛡뛱
2  똂똒똢똲뙂뙒뙢뙲뚂뚒뚢뚲뛂뛒뛢뛲
3  똃똓똣똳뙃뙓뙣뙳뚃뚓뚣뚳뛃뛓뛣뛳
4  똄똔똤똴뙄뙔뙤뙴뚄뚔뚤뚴뛄뛔뛤뛴
5  똅똕똥똵뙅뙕뙥뙵뚅뚕뚥뚵뛅뛕뛥뛵
6  똆똖똦똶뙆뙖뙦뙶뚆뚖뚦뚶뛆뛖뛦뛶
7  똇똗똧똷뙇뙗뙧뙷뚇뚗뚧뚷뛇뛗뛧뛷
8  똈똘똨똸뙈뙘뙨뙸뚈뚘뚨뚸뛈뛘뛨뛸
9  똉똙똩똹뙉뙙뙩뙹뚉뚙뚩뚹뛉뛙뛩뛹
A  똊똚똪똺뙊뙚뙪뙺뚊뚚뚪뚺뛊뛚뛪뛺
B  똋똛똫똻뙋뙛뙫뙻뚋뚛뚫뚻뛋뛛뛫뛻
C  똌똜똬똼뙌뙜뙬뙼뚌뚜뚬뚼뛌뛜뛬뛼
D  똍똝똭똽뙍뙝뙭뙽뚍뚝뚭뚽뛍뛝뛭뛽
E  똎똞똮똾뙎뙞뙮뙾뚎뚞뚮뚾뛎뛞뛮뛾
F  똏똟똯똿뙏뙟뙯뙿뚏뚟뚯뚿뛏뛟뛯뛿


Hangul Syllables (B700-B7FF)

Columns, left to right: B70 B71 B72 B73 B74 B75 B76 B77 B78 B79 B7A B7B B7C B7D B7E B7F. Each code point is the column prefix followed by the row digit.

0  뜀뜐뜠뜰띀띐띠띰란랐랠랰럀럐럠런
1  뜁뜑뜡뜱띁띑띡띱랁랑랡랱럁럑럡럱
2  뜂뜒뜢뜲띂띒띢띲랂랒랢랲럂럒럢럲
3  뜃뜓뜣뜳띃띓띣띳랃랓랣랳럃럓럣럳
4  뜄뜔뜤뜴띄띔띤띴랄랔랤랴럄럔럤럴
5  뜅뜕뜥뜵띅띕띥띵랅랕랥략럅럕럥럵
6  뜆뜖뜦뜶띆띖띦띶랆랖랦랶럆럖럦럶
7  뜇뜗뜧뜷띇띗띧띷랇랗랧랷럇럗럧럷
8  뜈뜘뜨뜸띈띘띨띸랈래램랸럈럘럨럸
9  뜉뜙뜩뜹띉띙띩띹랉랙랩랹량럙럩럹
A  뜊뜚뜪뜺띊띚띪띺랊랚랪랺럊럚럪럺
B  뜋뜛뜫뜻띋띛띫띻랋랛랫랻럋럛럫럻
C  뜌뜜뜬뜼띌띜띬라람랜랬랼럌럜러럼
D  뜍뜝뜭뜽띍띝띭락랍랝랭랽럍럝럭럽
E  뜎뜞뜮뜾띎띞띮띾랎랞랮랾럎럞럮럾
F  뜏뜟뜯뜿띏띟띯띿랏랟랯랿럏럟럯럿

Hangul Syllables (B800-B8FF)

Columns, left to right: B80 B81 B82 B83 B84 B85 B86 B87 B88 B89 B8A B8B B8C B8D B8E B8F. Each code point is the column prefix followed by the row digit.

0  렀렐렠렰례롐론롰뢀뢐뢠뢰룀룐룠룰
1  렁렑렡렱롁롑롡롱뢁뢑뢡뢱룁룑룡룱
2  렂렒렢렲롂롒롢롲뢂뢒뢢뢲룂룒룢룲
3  렃렓렣렳롃롓롣롳뢃뢓뢣뢳룃룓룣룳
4  렄렔려렴롄롔롤롴뢄뢔뢤뢴룄룔룤룴
5  렅렕력렵롅롕롥롵뢅뢕뢥뢵룅룕룥룵
6  렆렖렦렶롆롖롦롶뢆뢖뢦뢶룆룖룦룶
7  렇렗렧렷롇롗롧롷뢇뢗뢧뢷룇룗룧룷
8  레렘련렸롈롘롨롸뢈뢘뢨뢸룈룘루룸
9  렉렙렩령롉롙롩롹뢉뢙뢩뢹룉룙룩룹
A  렊렚렪렺롊롚롪롺뢊뢚뢪뢺룊룚룪룺
B  렋렛렫렻롋롛롫롻뢋뢛뢫뢻룋룛룫룻
C  렌렜렬렼롌로롬롼뢌뢜뢬뢼료룜룬룼
D  렍렝렭렽롍록롭롽뢍뢝뢭뢽룍룝룭룽
E  렎렞렮렾롎롞롮롾뢎뢞뢮뢾룎룞룮룾
F  렏렟렯렿롏롟롯롿뢏뢟뢯뢿룏룟룯룿


Hangul Syllables (B900-B9FF)

Columns, left to right: B90 B91 B92 B93 B94 B95 B96 B97 B98 B99 B9A B9B B9C B9D B9E B9F. Each code point is the column prefix followed by the row digit.

0  뤀뤐뤠뤰륀륐률륰릀릐릠린맀말맠맰
1  뤁뤑뤡뤱륁륑륡륱릁릑릡릱링맑맡맱
2  뤂뤒뤢뤲륂륒륢륲릂릒릢릲맂맒맢맲
3  뤃뤓뤣뤳륃륓륣륳릃릓릣릳맃맓맣맳
4  뤄뤔뤤뤴륄륔륤르름릔릤릴맄맔매맴
5  뤅뤕뤥뤵륅륕륥륵릅릕릥릵맅맕맥맵
6  뤆뤖뤦뤶륆륖륦륶릆릖릦릶맆맖맦맶
7  뤇뤗뤧뤷륇륗륧륷릇릗릧릷맇맗맧맷
8  뤈뤘뤨뤸륈류륨른릈릘릨릸마맘맨맸
9  뤉뤙뤩뤹륉륙륩륹릉릙릩릹막맙맩맹
A  뤊뤚뤪뤺륊륚륪륺릊릚릪릺맊맚맪맺
B  뤋뤛뤫뤻륋륛륫륻릋릛릫릻맋맛맫맻
C  뤌뤜뤬뤼륌륜륬를릌릜리림만맜맬맼
D  뤍뤝뤭뤽륍륝륭륽릍릝릭립맍망맭맽
E  뤎뤞뤮뤾륎륞륮륾릎릞릮릾많맞맮맾
F  뤏뤟뤯뤿륏륟륯륿릏릟릯릿맏맟맯맿

Hangul Syllables (BA00-BAFF)

Columns, left to right: BA0 BA1 BA2 BA3 BA4 BA5 BA6 BA7 BA8 BA9 BAA BAB BAC BAD BAE BAF. Each code point is the column prefix followed by the row digit.

0  먀먐먠먰멀멐멠며몀몐몠몰뫀뫐뫠뫰
1  먁먑먡먱멁멑멡멱몁몑몡몱뫁뫑뫡뫱
2  먂먒먢먲멂멒멢멲몂몒몢몲뫂뫒뫢뫲
3  먃먓먣먳멃멓멣멳몃몓몣몳뫃뫓뫣뫳
4  먄먔먤먴멄메멤면몄몔몤몴뫄뫔뫤뫴
5  먅먕먥먵멅멕멥멵명몕몥몵뫅뫕뫥뫵
6  먆먖먦먶멆멖멦멶몆몖몦몶뫆뫖뫦뫶
7  먇먗먧먷멇멗멧멷몇몗몧몷뫇뫗뫧뫷
8  먈먘먨머멈멘멨멸몈몘모몸뫈뫘뫨뫸
9  먉먙먩먹멉멙멩멹몉몙목몹뫉뫙뫩뫹
A  먊먚먪먺멊멚멪멺몊몚몪몺뫊뫚뫪뫺
B  먋먛먫먻멋멛멫멻몋몛몫못뫋뫛뫫뫻
C  먌먜먬먼멌멜멬멼몌몜몬몼뫌뫜뫬뫼
D  먍먝먭먽멍멝멭멽몍몝몭몽뫍뫝뫭뫽
E  먎먞먮먾멎멞멮멾몎몞몮몾뫎뫞뫮뫾
F  먏먟먯먿멏멟멯멿몏몟몯몿뫏뫟뫯뫿


Hangul Syllables (BB00-BBFF)

Columns, left to right: BB0 BB1 BB2 BB3 BB4 BB5 BB6 BB7 BB8 BB9 BBA BBB BBC BBD BBE BBF. Each code point is the column prefix followed by the row digit.

0  묀묐묠묰뭀뭐뭠뭰뮀뮐뮠뮰므믐믠믰
1  묁묑묡묱뭁뭑뭡뭱뮁뮑뮡뮱믁믑믡믱
2  묂묒묢묲뭂뭒뭢뭲뮂뮒뮢뮲믂믒믢믲
3  묃묓묣묳뭃뭓뭣뭳뮃뮓뮣뮳믃믓믣믳
4  묄묔묤무뭄뭔뭤뭴뮄뮔뮤뮴믄믔믤믴
5  묅묕묥묵뭅뭕뭥뭵뮅뮕뮥뮵믅믕믥믵
6  묆묖묦묶뭆뭖뭦뭶뮆뮖뮦뮶믆믖믦믶
7  묇묗묧묷뭇뭗뭧뭷뮇뮗뮧뮷믇믗믧믷
8  묈묘묨문뭈뭘뭨뭸뮈뮘뮨뮸믈믘믨미
9  묉묙묩묹뭉뭙뭩뭹뮉뮙뮩뮹믉믙믩믹
A  묊묚묪묺뭊뭚뭪뭺뮊뮚뮪뮺믊믚믪믺
B  묋묛묫묻뭋뭛뭫뭻뮋뮛뮫뮻믋믛믫믻
C  묌묜묬물뭌뭜뭬뭼뮌뮜뮬뮼믌믜믬민
D  묍묝묭묽뭍뭝뭭뭽뮍뮝뮭뮽믍믝믭믽
E  묎묞묮묾뭎뭞뭮뭾뮎뮞뮮뮾믎믞믮믾
F  묏묟묯묿뭏뭟뭯뭿뮏뮟뮯뮿믏믟믯믿

BC00 Hangul Syllables BCFF

BC0 BC1 BC2 BC3 BC4 BC5 BC6 BC7 BC8 BC9 BCA BCB BCC BCD BCE BCF

0 밀밐밠배뱀뱐뱠뱰벀벐베벰변볐볠볰
1 밁밑밡백뱁뱑뱡뱱벁벑벡벱볁병볡볱
2 밂밒밢밲뱂뱒뱢뱲벂벒벢벲볂볒볢볲
3 밃밓밣밳뱃뱓뱣뱳벃벓벣벳볃볓볣볳
4 밄바밤밴뱄뱔뱤뱴버범벤벴별볔볤보
5 밅박밥밵뱅뱕뱥뱵벅법벥벵볅볕볥복
6 밆밖밦밶뱆뱖뱦뱶벆벖벦벶볆볖볦볶
7 밇밗밧밷뱇뱗뱧뱷벇벗벧벷볇볗볧볷
8 밈반밨밸뱈뱘뱨뱸번벘벨벸볈볘볨본
9 밉밙방밹뱉뱙뱩뱹벉벙벩벹볉볙볩볹
A 밊밚밪밺뱊뱚뱪뱺벊벚벪벺볊볚볪볺
B 밋받밫밻뱋뱛뱫뱻벋벛벫벻볋볛볫볻
C 밌발밬밼뱌뱜뱬뱼벌벜벬벼볌볜볬볼
D 밍밝밭밽뱍뱝뱭뱽벍벝벭벽볍볝볭볽
E 밎밞밮밾뱎뱞뱮뱾벎벞벮벾볎볞볮볾
F 및밟밯밿뱏뱟뱯뱿벏벟벯벿볏볟볯볿
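The chart layout follows a simple rule: the column label gives the first three hexadecimal digits of a code point and the row label gives the last one. As a minimal sketch (the helper names `chart_row` and `print_chart` are illustrative and are not part of the standard), a 256-cell block can be regenerated from code points alone:

```python
import unicodedata


def chart_row(block_start: int, row: int) -> str:
    """Return the 16 characters in one chart row of a 256-code-point block.

    Column c (0..15) and row r (0..15) address code point
    block_start + 16 * c + r, matching labels such as BC0..BCF and 0..F.
    """
    if block_start % 0x100 != 0:
        raise ValueError("block_start must be a multiple of 0x100")
    if not 0 <= row <= 0xF:
        raise ValueError("row must be in 0..15")
    return "".join(chr(block_start + 16 * col + row) for col in range(16))


def print_chart(block_start: int) -> None:
    """Print a compact chart: one header line, then 16 rows of 16 characters."""
    header = " ".join(f"{(block_start >> 4) + col:X}" for col in range(16))
    print(header)
    for row in range(16):
        print(f"{row:X} {chart_row(block_start, row)}")


if __name__ == "__main__":
    print_chart(0xBC00)
    # Hangul syllable names are derived algorithmically from the code point.
    print(unicodedata.name(chr(0xBC00)))
```

The same helper applies to any chart page that covers a full xx00 to xxFF range; pages that cover fewer columns would need a different column count.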


BD00 Hangul Syllables BDFF

BD0 BD1 BD2 BD3 BD4 BD5 BD6 BD7 BD8 BD9 BDA BDB BDC BDD BDE BDF

0 봀봐봠봰뵀뵐뵠뵰부붐붠붰뷀뷐뷠뷰
1 봁봑봡봱뵁뵑뵡뵱북붑붡붱뷁뷑뷡뷱
2 봂봒봢봲뵂뵒뵢뵲붂붒붢붲뷂뷒뷢뷲
3 봃봓봣봳뵃뵓뵣뵳붃붓붣붳뷃뷓뷣뷳
4 봄봔봤봴뵄뵔뵤뵴분붔붤붴뷄뷔뷤뷴
5 봅봕봥봵뵅뵕뵥뵵붅붕붥붵뷅뷕뷥뷵
6 봆봖봦봶뵆뵖뵦뵶붆붖붦붶뷆뷖뷦뷶
7 봇봗봧봷뵇뵗뵧뵷붇붗붧붷뷇뷗뷧뷷
8 봈봘봨봸뵈뵘뵨뵸불붘붨붸뷈뷘뷨뷸
9 봉봙봩봹뵉뵙뵩뵹붉붙붩붹뷉뷙뷩뷹
A 봊봚봪봺뵊뵚뵪뵺붊붚붪붺뷊뷚뷪뷺
B 봋봛봫봻뵋뵛뵫뵻붋붛붫붻뷋뷛뷫뷻
C 봌봜봬봼뵌뵜뵬뵼붌붜붬붼뷌뷜뷬뷼
D 봍봝봭봽뵍뵝뵭뵽붍붝붭붽뷍뷝뷭뷽
E 봎봞봮봾뵎뵞뵮뵾붎붞붮붾뷎뷞뷮뷾
F 봏봟봯봿뵏뵟뵯뵿붏붟붯붿뷏뷟뷯뷿

BE00 Hangul Syllables BEFF

BE0 BE1 BE2 BE3 BE4 BE5 BE6 BE7 BE8 BE9 BEA BEB BEC BED BEE BEF

0 븀븐븠븰빀빐빠빰뺀뺐뺠뺰뻀뻐뻠뻰
1 븁븑븡븱빁빑빡빱뺁뺑뺡뺱뻁뻑뻡뻱
2 븂븒븢븲빂빒빢빲뺂뺒뺢뺲뻂뻒뻢뻲
3 븃븓븣븳빃빓빣빳뺃뺓뺣뺳뻃뻓뻣뻳
4 븄블븤븴비빔빤빴뺄뺔뺤뺴뻄뻔뻤뻴
5 븅븕븥븵빅빕빥빵뺅뺕뺥뺵뻅뻕뻥뻵
6 븆븖븦븶빆빖빦빶뺆뺖뺦뺶뻆뻖뻦뻶
7 븇븗븧븷빇빗빧빷뺇뺗뺧뺷뻇뻗뻧뻷
8 븈븘븨븸빈빘빨빸뺈뺘뺨뺸뻈뻘뻨뻸
9 븉븙븩븹빉빙빩빹뺉뺙뺩뺹뻉뻙뻩뻹
A 븊븚븪븺빊빚빪빺뺊뺚뺪뺺뻊뻚뻪뻺
B 븋븛븫븻빋빛빫빻뺋뺛뺫뺻뻋뻛뻫뻻
C 브븜븬븼빌빜빬빼뺌뺜뺬뺼뻌뻜뻬뻼
D 븍븝븭븽빍빝빭빽뺍뺝뺭뺽뻍뻝뻭뻽
E 븎븞븮븾빎빞빮빾뺎뺞뺮뺾뻎뻞뻮뻾
F 븏븟븯븿빏빟빯빿뺏뺟뺯뺿뻏뻟뻯뻿


BF00 Hangul Syllables BFFF

BF0 BF1 BF2 BF3 BF4 BF5 BF6 BF7 BF8 BF9 BFA BFB BFC BFD BFE BFF

0 뼀뼐뼠뼰뽀뽐뽠뽰뾀뾐뾠뾰뿀뿐뿠뿰
1 뼁뼑뼡뼱뽁뽑뽡뽱뾁뾑뾡뾱뿁뿑뿡뿱
2 뼂뼒뼢뼲뽂뽒뽢뽲뾂뾒뾢뾲뿂뿒뿢뿲
3 뼃뼓뼣뼳뽃뽓뽣뽳뾃뾓뾣뾳뿃뿓뿣뿳
4 뼄뼔뼤뼴뽄뽔뽤뽴뾄뾔뾤뾴뿄뿔뿤뿴
5 뼅뼕뼥뼵뽅뽕뽥뽵뾅뾕뾥뾵뿅뿕뿥뿵
6 뼆뼖뼦뼶뽆뽖뽦뽶뾆뾖뾦뾶뿆뿖뿦뿶
7 뼇뼗뼧뼷뽇뽗뽧뽷뾇뾗뾧뾷뿇뿗뿧뿷
8 뼈뼘뼨뼸뽈뽘뽨뽸뾈뾘뾨뾸뿈뿘뿨뿸
9 뼉뼙뼩뼹뽉뽙뽩뽹뾉뾙뾩뾹뿉뿙뿩뿹
A 뼊뼚뼪뼺뽊뽚뽪뽺뾊뾚뾪뾺뿊뿚뿪뿺
B 뼋뼛뼫뼻뽋뽛뽫뽻뾋뾛뾫뾻뿋뿛뿫뿻
C 뼌뼜뼬뼼뽌뽜뽬뽼뾌뾜뾬뾼뿌뿜뿬뿼
D 뼍뼝뼭뼽뽍뽝뽭뽽뾍뾝뾭뾽뿍뿝뿭뿽
E 뼎뼞뼮뼾뽎뽞뽮뽾뾎뾞뾮뾾뿎뿞뿮뿾
F 뼏뼟뼯뼿뽏뽟뽯뽿뾏뾟뾯뾿뿏뿟뿯뿿

C000 Hangul Syllables C0FF

C00 C01 C02 C03 C04 C05 C06 C07 C08 C09 C0A C0B C0C C0D C0E C0F

0 쀀쀐쀠쀰쁀쁐쁠쁰삀삐삠산샀샐샠샰
1 쀁쀑쀡쀱쁁쁑쁡쁱삁삑삡삱상샑샡샱
2 쀂쀒쀢쀲쁂쁒쁢쁲삂삒삢삲샂샒샢샲
3 쀃쀓쀣쀳쁃쁓쁣쁳삃삓삣삳샃샓샣샳
4 쀄쀔쀤쀴쁄쁔쁤쁴삄삔삤살샄샔샤샴
5 쀅쀕쀥쀵쁅쁕쁥쁵삅삕삥삵샅샕샥샵
6 쀆쀖쀦쀶쁆쁖쁦쁶삆삖삦삶샆샖샦샶
7 쀇쀗쀧쀷쁇쁗쁧쁷삇삗삧삷샇샗샧샷
8 쀈쀘쀨쀸쁈쁘쁨쁸삈삘삨삸새샘샨샸
9 쀉쀙쀩쀹쁉쁙쁩쁹삉삙삩삹색샙샩샹
A 쀊쀚쀪쀺쁊쁚쁪쁺삊삚삪삺샊샚샪샺
B 쀋쀛쀫쀻쁋쁛쁫쁻삋삛삫삻샋샛샫샻
C 쀌쀜쀬쀼쁌쁜쁬쁼삌삜사삼샌샜샬샼
D 쀍쀝쀭쀽쁍쁝쁭쁽삍삝삭삽샍생샭샽
E 쀎쀞쀮쀾쁎쁞쁮쁾삎삞삮삾샎샞샮샾
F 쀏쀟쀯쀿쁏쁟쁯쁿삏삟삯삿샏샟샯샿


CJK Compatibility Ideographs
Range: F900–FAFF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0, but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard, can be found on the Unicode web site. See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.

F900 CJK Compatibility Ideographs F9FF

F90 F91 F92 F93 F94 F95 F96 F97 F98 F99 F9A F9B F9C F9D F9E F9F

0 豈蘿鸞擄鹿縷怒殺呂戀裂聆燎類易藺
1 更螺嵐櫓論陋率辰女撚說鈴療六李隣
2 車裸濫爐壟勒異沈廬漣廉零蓼戮梨鱗
3 賈邏藍盧弄肋北拾旅煉念靈遼陸泥麟
4 滑樂襤老籠凜磻若濾璉捻領龍倫理林
5 串洛拉蘆聾凌便掠礪秊殮例暈崙痢淋
6 句烙臘虜牢稜復略閭練簾禮阮淪罹臨
7 龜珞蠟路磊綾不亮驪聯獵醴劉輪裏立
8 龜落廊露賂菱泌兩麗輦令隸杻律裡笠
9 契酪朗魯雷陵數凉黎蓮囹惡柳慄里粒
A 金駱浪鷺壘讀索梁力連寧了流栗離狀
B 喇亂狼碌屢拏參糧曆鍊嶺僚溜率匿炙
C 奈卵郎祿樓樂塞良歷列怜寮琉隆溺識
D 懶欄來綠淚諾省諒轢劣玲尿留利吝什
E 癩爛冷菉漏丹葉量年咽瑩料硫吏燐茶
F 羅蘭勞錄累寧說勵憐烈羚樂紐履璘刺


FA00 CJK Compatibility Ideographs FAFF

FA0 FA1 FA2 FA3 FA4 FA5 FA6 FA7 FA8 FA9 FAA FAB FAC FAD FAE FAF

[Only the glyphs for columns FA0 to FA2 survive in this copy; each row lists them in that order.]

0 切塚蘒
1 度﨑﨡
2 拓晴諸
3 糖﨓﨣
4 宅﨔﨤
5 洞凞逸
6 暴猪都
7 輻益﨧
8 行礼﨨
9 降神﨩
A 見祥飯
B 廓福飼
C 兀靖館
D 嗀精鶴
E 﨎羽
F 﨏﨟


F900

CJK Compatibility Ideographs

Pronunciation variants from KS X 1001:1998

F900 豈 CJK COMPATIBILITY IDEOGRAPH-F900 ≡ 8C48 豈
F901 更 CJK COMPATIBILITY IDEOGRAPH-F901 ≡ 66F4 更
F902 車 CJK COMPATIBILITY IDEOGRAPH-F902 ≡ 8ECA 車
F903 賈 CJK COMPATIBILITY IDEOGRAPH-F903 ≡ 8CC8 賈
F904 滑 CJK COMPATIBILITY IDEOGRAPH-F904 ≡ 6ED1 滑
F905 串 CJK COMPATIBILITY IDEOGRAPH-F905 ≡ 4E32 串
F906 句 CJK COMPATIBILITY IDEOGRAPH-F906 ≡ 53E5 句
F907 龜 CJK COMPATIBILITY IDEOGRAPH-F907 ≡ 9F9C 龜
F908 龜 CJK COMPATIBILITY IDEOGRAPH-F908 ≡ 9F9C 龜
F909 契 CJK COMPATIBILITY IDEOGRAPH-F909 ≡ 5951 契
F90A 金 CJK COMPATIBILITY IDEOGRAPH-F90A ≡ 91D1 金
F90B 喇 CJK COMPATIBILITY IDEOGRAPH-F90B ≡ 5587 喇
F90C 奈 CJK COMPATIBILITY IDEOGRAPH-F90C ≡ 5948 奈
F90D 懶 CJK COMPATIBILITY IDEOGRAPH-F90D ≡ 61F6 懶
F90E 癩 CJK COMPATIBILITY IDEOGRAPH-F90E ≡ 7669 癩
F90F 羅 CJK COMPATIBILITY IDEOGRAPH-F90F ≡ 7F85 羅
F910 蘿 CJK COMPATIBILITY IDEOGRAPH-F910 ≡ 863F 蘿
F911 螺 CJK COMPATIBILITY IDEOGRAPH-F911 ≡ 87BA 螺
F912 裸 CJK COMPATIBILITY IDEOGRAPH-F912 ≡ 88F8 裸
F913 邏 CJK COMPATIBILITY IDEOGRAPH-F913 ≡ 908F 邏
F914 樂 CJK COMPATIBILITY IDEOGRAPH-F914 ≡ 6A02 樂
F915 洛 CJK COMPATIBILITY IDEOGRAPH-F915 ≡ 6D1B 洛
F916 烙 CJK COMPATIBILITY IDEOGRAPH-F916 ≡ 70D9 烙
F917 珞 CJK COMPATIBILITY IDEOGRAPH-F917 ≡ 73DE 珞
F918 落 CJK COMPATIBILITY IDEOGRAPH-F918 ≡ 843D 落
F919 酪 CJK COMPATIBILITY IDEOGRAPH-F919 ≡ 916A 酪
F91A 駱 CJK COMPATIBILITY IDEOGRAPH-F91A ≡ 99F1 駱
F91B 亂 CJK COMPATIBILITY IDEOGRAPH-F91B ≡ 4E82 亂
F91C 卵 CJK COMPATIBILITY IDEOGRAPH-F91C ≡ 5375 卵


F91D 欄 CJK COMPATIBILITY IDEOGRAPH-F91D ≡ 6B04 欄
F91E 爛 CJK COMPATIBILITY IDEOGRAPH-F91E ≡ 721B 爛
F91F 蘭 CJK COMPATIBILITY IDEOGRAPH-F91F ≡ 862D 蘭
F920 鸞 CJK COMPATIBILITY IDEOGRAPH-F920 ≡ 9E1E 鸞
F921 嵐 CJK COMPATIBILITY IDEOGRAPH-F921 ≡ 5D50 嵐
F922 濫 CJK COMPATIBILITY IDEOGRAPH-F922 ≡ 6FEB 濫
F923 藍 CJK COMPATIBILITY IDEOGRAPH-F923 ≡ 85CD 藍
F924 襤 CJK COMPATIBILITY IDEOGRAPH-F924 ≡ 8964 襤
F925 拉 CJK COMPATIBILITY IDEOGRAPH-F925 ≡ 62C9 拉
F926 臘 CJK COMPATIBILITY IDEOGRAPH-F926 ≡ 81D8 臘
F927 蠟 CJK COMPATIBILITY IDEOGRAPH-F927 ≡ 881F 蠟
F928 廊 CJK COMPATIBILITY IDEOGRAPH-F928 ≡ 5ECA 廊
F929 朗 CJK COMPATIBILITY IDEOGRAPH-F929 ≡ 6717 朗
F92A 浪 CJK COMPATIBILITY IDEOGRAPH-F92A ≡ 6D6A 浪
F92B 狼 CJK COMPATIBILITY IDEOGRAPH-F92B ≡ 72FC 狼
F92C 郎 CJK COMPATIBILITY IDEOGRAPH-F92C ≡ 90CE 郎
F92D 來 CJK COMPATIBILITY IDEOGRAPH-F92D ≡ 4F86 來
F92E 冷 CJK COMPATIBILITY IDEOGRAPH-F92E ≡ 51B7 冷
F92F 勞 CJK COMPATIBILITY IDEOGRAPH-F92F ≡ 52DE 勞
F930 擄 CJK COMPATIBILITY IDEOGRAPH-F930 ≡ 64C4 擄
F931 櫓 CJK COMPATIBILITY IDEOGRAPH-F931 ≡ 6AD3 櫓
F932 爐 CJK COMPATIBILITY IDEOGRAPH-F932 ≡ 7210 爐
F933 盧 CJK COMPATIBILITY IDEOGRAPH-F933 ≡ 76E7 盧
F934 老 CJK COMPATIBILITY IDEOGRAPH-F934 ≡ 8001 老
F935 蘆 CJK COMPATIBILITY IDEOGRAPH-F935 ≡ 8606 蘆
F936 虜 CJK COMPATIBILITY IDEOGRAPH-F936 ≡ 865C 虜
F937 路 CJK COMPATIBILITY IDEOGRAPH-F937 ≡ 8DEF 路
F938 露 CJK COMPATIBILITY IDEOGRAPH-F938 ≡ 9732 露
F939 魯 CJK COMPATIBILITY IDEOGRAPH-F939 ≡ 9B6F 魯
F93A 鷺 CJK COMPATIBILITY IDEOGRAPH-F93A ≡ 9DFA 鷺
F93B 碌 CJK COMPATIBILITY IDEOGRAPH-F93B ≡ 788C 碌


F93C 祿 CJK COMPATIBILITY IDEOGRAPH-F93C ≡ 797F 祿
F93D 綠 CJK COMPATIBILITY IDEOGRAPH-F93D ≡ 7DA0 綠
F93E 菉 CJK COMPATIBILITY IDEOGRAPH-F93E ≡ 83C9 菉
F93F 錄 CJK COMPATIBILITY IDEOGRAPH-F93F ≡ 9304 錄
F940 鹿 CJK COMPATIBILITY IDEOGRAPH-F940 ≡ 9E7F 鹿
F941 論 CJK COMPATIBILITY IDEOGRAPH-F941 ≡ 8AD6 論
F942 壟 CJK COMPATIBILITY IDEOGRAPH-F942 ≡ 58DF 壟
F943 弄 CJK COMPATIBILITY IDEOGRAPH-F943 ≡ 5F04 弄
F944 籠 CJK COMPATIBILITY IDEOGRAPH-F944 ≡ 7C60 籠
F945 聾 CJK COMPATIBILITY IDEOGRAPH-F945 ≡ 807E 聾
F946 牢 CJK COMPATIBILITY IDEOGRAPH-F946 ≡ 7262 牢
F947 磊 CJK COMPATIBILITY IDEOGRAPH-F947 ≡ 78CA 磊
F948 賂 CJK COMPATIBILITY IDEOGRAPH-F948 ≡ 8CC2 賂
F949 雷 CJK COMPATIBILITY IDEOGRAPH-F949 ≡ 96F7 雷
F94A 壘 CJK COMPATIBILITY IDEOGRAPH-F94A ≡ 58D8 壘
F94B 屢 CJK COMPATIBILITY IDEOGRAPH-F94B ≡ 5C62 屢
F94C 樓 CJK COMPATIBILITY IDEOGRAPH-F94C ≡ 6A13 樓
F94D 淚 CJK COMPATIBILITY IDEOGRAPH-F94D ≡ 6DDA 淚
F94E 漏 CJK COMPATIBILITY IDEOGRAPH-F94E ≡ 6F0F 漏
F94F 累 CJK COMPATIBILITY IDEOGRAPH-F94F ≡ 7D2F 累
F950 縷 CJK COMPATIBILITY IDEOGRAPH-F950 ≡ 7E37 縷
F951 陋 CJK COMPATIBILITY IDEOGRAPH-F951 ≡ 964B 陋
F952 勒 CJK COMPATIBILITY IDEOGRAPH-F952 ≡ 52D2 勒
F953 肋 CJK COMPATIBILITY IDEOGRAPH-F953 ≡ 808B 肋
F954 凜 CJK COMPATIBILITY IDEOGRAPH-F954 ≡ 51DC 凜
F955 凌 CJK COMPATIBILITY IDEOGRAPH-F955 ≡ 51CC 凌
F956 稜 CJK COMPATIBILITY IDEOGRAPH-F956 ≡ 7A1C 稜
F957 綾 CJK COMPATIBILITY IDEOGRAPH-F957 ≡ 7DBE 綾
F958 菱 CJK COMPATIBILITY IDEOGRAPH-F958 ≡ 83F1 菱
F959 陵 CJK COMPATIBILITY IDEOGRAPH-F959 ≡ 9675 陵
F95A 讀 CJK COMPATIBILITY IDEOGRAPH-F95A ≡ 8B80 讀


F95B 拏 CJK COMPATIBILITY IDEOGRAPH-F95B ≡ 62CF 拏
F95C 樂 CJK COMPATIBILITY IDEOGRAPH-F95C ≡ 6A02 樂
F95D 諾 CJK COMPATIBILITY IDEOGRAPH-F95D ≡ 8AFE 諾
F95E 丹 CJK COMPATIBILITY IDEOGRAPH-F95E ≡ 4E39 丹
F95F 寧 CJK COMPATIBILITY IDEOGRAPH-F95F ≡ 5BE7 寧
F960 怒 CJK COMPATIBILITY IDEOGRAPH-F960 ≡ 6012 怒
F961 率 CJK COMPATIBILITY IDEOGRAPH-F961 ≡ 7387 率
F962 異 CJK COMPATIBILITY IDEOGRAPH-F962 ≡ 7570 異
F963 北 CJK COMPATIBILITY IDEOGRAPH-F963 ≡ 5317 北
F964 磻 CJK COMPATIBILITY IDEOGRAPH-F964 ≡ 78FB 磻
F965 便 CJK COMPATIBILITY IDEOGRAPH-F965 ≡ 4FBF 便
F966 復 CJK COMPATIBILITY IDEOGRAPH-F966 ≡ 5FA9 復
F967 不 CJK COMPATIBILITY IDEOGRAPH-F967 ≡ 4E0D 不
F968 泌 CJK COMPATIBILITY IDEOGRAPH-F968 ≡ 6CCC 泌
F969 數 CJK COMPATIBILITY IDEOGRAPH-F969 ≡ 6578 數
F96A 索 CJK COMPATIBILITY IDEOGRAPH-F96A ≡ 7D22 索
F96B 參 CJK COMPATIBILITY IDEOGRAPH-F96B ≡ 53C3 參
F96C 塞 CJK COMPATIBILITY IDEOGRAPH-F96C ≡ 585E 塞
F96D 省 CJK COMPATIBILITY IDEOGRAPH-F96D ≡ 7701 省
F96E 葉 CJK COMPATIBILITY IDEOGRAPH-F96E ≡ 8449 葉
F96F 說 CJK COMPATIBILITY IDEOGRAPH-F96F ≡ 8AAA 說
F970 殺 CJK COMPATIBILITY IDEOGRAPH-F970 ≡ 6BBA 殺
F971 辰 CJK COMPATIBILITY IDEOGRAPH-F971 ≡ 8FB0 辰
F972 沈 CJK COMPATIBILITY IDEOGRAPH-F972 ≡ 6C88 沈
F973 拾 CJK COMPATIBILITY IDEOGRAPH-F973 ≡ 62FE 拾
F974 若 CJK COMPATIBILITY IDEOGRAPH-F974 ≡ 82E5 若
F975 掠 CJK COMPATIBILITY IDEOGRAPH-F975 ≡ 63A0 掠
F976 略 CJK COMPATIBILITY IDEOGRAPH-F976 ≡ 7565 略
F977 亮 CJK COMPATIBILITY IDEOGRAPH-F977 ≡ 4EAE 亮
F978 兩 CJK COMPATIBILITY IDEOGRAPH-F978 ≡ 5169 兩
F979 凉 CJK COMPATIBILITY IDEOGRAPH-F979 ≡ 51C9 凉
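Each ≡ entry in this list records a canonical (singleton) decomposition, so normalization replaces a compatibility ideograph with the unified ideograph named after the ≡ sign. A minimal sketch using Python's standard `unicodedata` module follows; the helper name `unified_equivalent` is illustrative, and the results depend on the Unicode data bundled with the interpreter, which is assumed here to agree with the entries listed.

```python
import unicodedata


def unified_equivalent(ch: str) -> str:
    """Map a CJK compatibility ideograph to its canonically equivalent character.

    Canonical singletons are never recomposed, so NFC (like NFD) returns
    the unified ideograph shown after the "≡" sign in the names list.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return unicodedata.normalize("NFC", ch)


if __name__ == "__main__":
    for cp in (0xF900, 0xF907, 0xF908):
        target = unified_equivalent(chr(cp))
        print(f"{cp:04X} -> {ord(target):04X}")
```

Because F907 and F908 both map to 9F9C, the mapping cannot be reversed; text that has to round-trip to KS X 1001 would presumably need to be kept unnormalized.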


Alphabetic Presentation Forms
Range: FB00–FB4F

FB00 Alphabetic Presentation Forms FB4F

[Code chart: columns FB0 to FB4, rows 0 to F. Most glyphs did not survive extraction; the legible cells are FB00 ﬀ, FB01 ﬁ, FB02 ﬂ, FB03 ﬃ, FB04 ﬄ, and FB1E ﬞ.]
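The Latin ligatures at the start of this block carry compatibility decompositions in the Unicode Character Database. As a hedged illustration (this behavior comes from the character database and Unicode Standard Annex #15 rather than from the chart itself, and `expand_ligatures` is a hypothetical helper name), compatibility normalization expands them into plain letters:

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Apply NFKC, which replaces presentation forms such as U+FB01 with "fi".

    NFKC also folds every other compatibility character in the text,
    so it is a lossy transformation.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for cp in range(0xFB00, 0xFB05):
        ch = chr(cp)
        print(f"{cp:04X} {unicodedata.name(ch)} -> {expand_ligatures(ch)}")
```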


Arabic Presentation Forms-A
Range: FB50–FDFF

FB50 Arabic Presentation Forms-A FC3F

[Code chart: columns FB5 to FC3, rows 0 to F. The glyphs did not survive extraction.]

FC40 Arabic Presentation Forms-A FD1F

[Code chart: columns FC4 to FD1, rows 0 to F. The glyphs did not survive extraction.]

FD20 FD2 0

FDA

FDB

FDC

FDD

FDE

FDF

Ñ á ñ

! 1 A – ‡ I

Ò â ò

" 2 B — · J

FD50

FD51

FD60

FD61

FD70

FD71

FD62

FD72

FD82

Ô äô

FD53

FD63

FD73

FDA1

FD81

Ó ã ó

FD52

FDA0

FD80

FD83

FDB0

FDB1

FD64

FD74

FD84

Ö

æ

ö

FD55

FD65

FD75

FD85

FDC1

FDD0

FDD1

FDE0

FDE1

FDF0

FDF1

# 3 C “ ‚ K

FD92

FDA2

FDB2

FDC2

FDD2

FDE2

FDF2

$ 4 D ” „ L

FD93

FDA3

FDB3

Õ å õ % 5

FD54

FDC0

FD94

FDA4

FDB4

FDC3

FDD3

FDE3

FDF3

E ‘ ‰ M

FDC4

FDD4

FDE4

FDF4

& 6 F’ Â N FD95

FDA5

FDB5

FDC5

FDD5

FDE5

FDF5

× ç ÷ ' 7 G ÷ ÊO

FD56

Ø

FD57

FD66

FD76

FD86

è ø

FD67

FD77

FD87

FD96

FDA6

FDB6

FDC6

FDD6

FDE6

FDF6

( 8 H ◊ Á P

FD97

FDA7

FDB7

FDC7

FDD7

FDE7

FDF7

Ÿ È R

½ Í

Û ë û + ;

⁄ ÍS

¾ Î

Ü ì ü , <

¤ Î T

FD38

FD39

FD3A

FD3B

¿ Ï À

FD3C

Ð

FD3D

FD58

FD59

FD5A

FD5B

FD68

FD69

FD6A

FD6B

FD78

FD79

FD7A

FD7B

Ý í ý

FD5C

FD6C

FD7C

FD88

FD89

FD8A

FD8B

FD98

FD99

FD9A

FD9B

FDA8

FDA9

FDAA

FDAB

-

FD8C

FD9C

FDAC

Þ î þ .

FD5D

FD6D

FD7D

FD8D

FD9D

FDAD

FDB8

FDB9

FDBA

FDBB

FDD8

FDD9

FDDA

FDDB

FDE8

FDE9

FDEA

FDEB

FDDD

FDED

à ð

0 @

ﬂ Ô

FD2F

FD3F

FD5F

FD6F

FD7E

FD7F

FD8E

FD8F

FD9E

FD9F

FDAE

FDAF

FDFB

› Ìs

FDEC

Â

FD6E

FDFA

>

FDBD

FDDC

ﬁ Ó

FD5E

FDF9

‹ Ï U

ß ï / ?

FD3E

FDF8

=

FDBC

Á

FD2E

F

FD9

Ú ê ú * :

FD2D

E

FD8

¼ Ì

FD2C

D

FD7

ÿ Ë Q

FD2B

C

FD6

) 9

FD2A

B

FD5

Ù é ù

FD29

A

FD37

FD4

FDFF

» Ë FD28

9

FD36

º Ê FD27

8

FD35

¹ É FD26

7

FD34

¸ È FD25

6

FD33

· Ç FD24

5

FD32

¶ Æ FD23

4

FD31

µ Å FD22

3

FD30

´ Ä FD21

2

FD3

³ Ã FD20

1

Arabic Presentation Forms-A

FDBE

FDBF

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.

FDDE

FDDF

FDFC

FDFD

FDEE

FDEF

479

FB50

Arabic Presentation Forms-A

Preferred characters are found in the Arabic block, 0600–06FF. This block also contains 32 noncharacters in the range FDD0–FDEF.
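As a rough illustration of what "preferred characters" means in practice, the sketch below uses Python's standard `unicodedata` module to read a presentation form's compatibility decomposition and to fold it back to the Arabic block with NFKC. The helper names `to_preferred` and `is_block_noncharacter` are hypothetical, chosen only for this example. The noncharacter range is taken to be U+FDD0 through U+FDEF, which is the 32 code points the note refers to. NFKC also rewrites every other compatibility character in the text, so this is an assumption-laden sketch rather than a recommended pipeline.

```python
import unicodedata

# The 32 noncharacters that sit inside the Arabic Presentation Forms-A block.
NONCHAR_FIRST, NONCHAR_LAST = 0xFDD0, 0xFDEF


def is_block_noncharacter(cp: int) -> bool:
    """Hypothetical helper: True for the noncharacters U+FDD0..U+FDEF."""
    return NONCHAR_FIRST <= cp <= NONCHAR_LAST


def to_preferred(text: str) -> str:
    """Hypothetical helper: fold presentation forms to preferred characters.

    NFKC applies every compatibility mapping, not only the Arabic ones.
    """
    return unicodedata.normalize("NFKC", text)


peh_initial = "\uFB58"  # ARABIC LETTER PEH INITIAL FORM
print(unicodedata.name(peh_initial))
print(unicodedata.decomposition(peh_initial))  # decomposition tag and base code point
print(to_preferred(peh_initial) == "\u067E")   # folds to ARABIC LETTER PEH
```

Normalizing on input and leaving the choice of contextual glyph to the rendering engine is one common way to avoid storing presentation forms at all.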

Glyphs for contextual forms of letters for Persian, Urdu, Sindhi, etc.

FB50  ARABIC LETTER ALEF WASLA ISOLATED FORM ≈ <isolated> 0671
FB51  ARABIC LETTER ALEF WASLA FINAL FORM ≈ <final> 0671
FB52  ARABIC LETTER BEEH ISOLATED FORM ≈ <isolated> 067B
FB53  ARABIC LETTER BEEH FINAL FORM ≈ <final> 067B
FB54  ARABIC LETTER BEEH INITIAL FORM ≈ <initial> 067B
FB55  ARABIC LETTER BEEH MEDIAL FORM ≈ <medial> 067B
FB56  ARABIC LETTER PEH ISOLATED FORM ≈ <isolated> 067E
FB57  ARABIC LETTER PEH FINAL FORM ≈ <final> 067E
FB58  ARABIC LETTER PEH INITIAL FORM ≈ <initial> 067E
FB59  ARABIC LETTER PEH MEDIAL FORM ≈ <medial> 067E
FB5A  ARABIC LETTER BEHEH ISOLATED FORM ≈ <isolated> 0680
FB5B  ARABIC LETTER BEHEH FINAL FORM ≈ <final> 0680
FB5C  ARABIC LETTER BEHEH INITIAL FORM ≈ <initial> 0680
FB5D  ARABIC LETTER BEHEH MEDIAL FORM ≈ <medial> 0680
FB5E  ARABIC LETTER TTEHEH ISOLATED FORM ≈ <isolated> 067A
FB5F  ARABIC LETTER TTEHEH FINAL FORM ≈ <final> 067A
FB60  ARABIC LETTER TTEHEH INITIAL FORM ≈ <initial> 067A
FB61  ARABIC LETTER TTEHEH MEDIAL FORM ≈ <medial> 067A
FB62  ARABIC LETTER TEHEH ISOLATED FORM ≈ <isolated> 067F
FB63  ARABIC LETTER TEHEH FINAL FORM ≈ <final> 067F
FB64  ARABIC LETTER TEHEH INITIAL FORM ≈ <initial> 067F
FB65  ARABIC LETTER TEHEH MEDIAL FORM ≈ <medial> 067F
FB66  ARABIC LETTER TTEH ISOLATED FORM ≈ <isolated> 0679
FB67  ARABIC LETTER TTEH FINAL FORM ≈ <final> 0679
FB68  ARABIC LETTER TTEH INITIAL FORM ≈ <initial> 0679
FB69  ARABIC LETTER TTEH MEDIAL FORM ≈ <medial> 0679
FB6A  ARABIC LETTER VEH ISOLATED FORM ≈ <isolated> 06A4
FB6B  ARABIC LETTER VEH FINAL FORM ≈ <final> 06A4
FB6C  ARABIC LETTER VEH INITIAL FORM ≈ <initial> 06A4
FB6D  ARABIC LETTER VEH MEDIAL FORM ≈ <medial> 06A4
FB6E  ARABIC LETTER PEHEH ISOLATED FORM ≈ <isolated> 06A6
FB6F  ARABIC LETTER PEHEH FINAL FORM ≈ <final> 06A6
FB70  ARABIC LETTER PEHEH INITIAL FORM ≈ <initial> 06A6
FB71  ARABIC LETTER PEHEH MEDIAL FORM ≈ <medial> 06A6
FB72  ARABIC LETTER DYEH ISOLATED FORM ≈ <isolated> 0684
FB73  ARABIC LETTER DYEH FINAL FORM ≈ <final> 0684
FB74  ARABIC LETTER DYEH INITIAL FORM ≈ <initial> 0684
FB75  ARABIC LETTER DYEH MEDIAL FORM ≈ <medial> 0684
FB76  ARABIC LETTER NYEH ISOLATED FORM ≈ <isolated> 0683
FB77  ARABIC LETTER NYEH FINAL FORM ≈ <final> 0683
FB78  ARABIC LETTER NYEH INITIAL FORM ≈ <initial> 0683
FB79  ARABIC LETTER NYEH MEDIAL FORM ≈ <medial> 0683
FB7A  ARABIC LETTER TCHEH ISOLATED FORM ≈ <isolated> 0686
FB7B  ARABIC LETTER TCHEH FINAL FORM ≈ <final> 0686
FB7C  ARABIC LETTER TCHEH INITIAL FORM ≈ <initial> 0686
FB7D  ARABIC LETTER TCHEH MEDIAL FORM ≈ <medial> 0686
FB7E  ARABIC LETTER TCHEHEH ISOLATED FORM ≈ <isolated> 0687
FB7F  ARABIC LETTER TCHEHEH FINAL FORM ≈ <final> 0687
FB80  ARABIC LETTER TCHEHEH INITIAL FORM ≈ <initial> 0687
FB81  ARABIC LETTER TCHEHEH MEDIAL FORM ≈ <medial> 0687
FB82  ARABIC LETTER DDAHAL ISOLATED FORM ≈ <isolated> 068D
FB83  ARABIC LETTER DDAHAL FINAL FORM ≈ <final> 068D
FB84  ARABIC LETTER DAHAL ISOLATED FORM ≈ <isolated> 068C
FB85  ARABIC LETTER DAHAL FINAL FORM ≈ <final> 068C
FB86  ARABIC LETTER DUL ISOLATED FORM ≈ <isolated> 068E
FB87  ARABIC LETTER DUL FINAL FORM ≈ <final> 068E
FB88  ARABIC LETTER DDAL ISOLATED FORM ≈ <isolated> 0688

FB89  ARABIC LETTER DDAL FINAL FORM ≈ <final> 0688
FB8A  ARABIC LETTER JEH ISOLATED FORM ≈ <isolated> 0698
FB8B  ARABIC LETTER JEH FINAL FORM ≈ <final> 0698
FB8C  ARABIC LETTER RREH ISOLATED FORM ≈ <isolated> 0691
FB8D  ARABIC LETTER RREH FINAL FORM ≈ <final> 0691
FB8E  ARABIC LETTER KEHEH ISOLATED FORM ≈ <isolated> 06A9
FB8F  ARABIC LETTER KEHEH FINAL FORM ≈ <final> 06A9
FB90  ARABIC LETTER KEHEH INITIAL FORM ≈ <initial> 06A9
FB91  ARABIC LETTER KEHEH MEDIAL FORM ≈ <medial> 06A9
FB92  ARABIC LETTER GAF ISOLATED FORM ≈ <isolated> 06AF
FB93  ARABIC LETTER GAF FINAL FORM ≈ <final> 06AF
FB94  ARABIC LETTER GAF INITIAL FORM ≈ <initial> 06AF
FB95  ARABIC LETTER GAF MEDIAL FORM ≈ <medial> 06AF
FB96  ARABIC LETTER GUEH ISOLATED FORM ≈ <isolated> 06B3
FB97  ARABIC LETTER GUEH FINAL FORM ≈ <final> 06B3
FB98  ARABIC LETTER GUEH INITIAL FORM ≈ <initial> 06B3
FB99  ARABIC LETTER GUEH MEDIAL FORM ≈ <medial> 06B3
FB9A  ARABIC LETTER NGOEH ISOLATED FORM ≈ <isolated> 06B1
FB9B  ARABIC LETTER NGOEH FINAL FORM ≈ <final> 06B1
FB9C  ARABIC LETTER NGOEH INITIAL FORM ≈ <initial> 06B1
FB9D  ARABIC LETTER NGOEH MEDIAL FORM ≈ <medial> 06B1
FB9E  ARABIC LETTER NOON GHUNNA ISOLATED FORM ≈ <isolated> 06BA
FB9F  ARABIC LETTER NOON GHUNNA FINAL FORM ≈ <final> 06BA
FBA0  ARABIC LETTER RNOON ISOLATED FORM ≈ <isolated> 06BB
FBA1  ARABIC LETTER RNOON FINAL FORM ≈ <final> 06BB
FBA2  ARABIC LETTER RNOON INITIAL FORM ≈ <initial> 06BB
FBA3  ARABIC LETTER RNOON MEDIAL FORM ≈ <medial> 06BB
FBA4  ARABIC LETTER HEH WITH YEH ABOVE ISOLATED FORM ≈ <isolated> 06C0
FBA5  ARABIC LETTER HEH WITH YEH ABOVE FINAL FORM ≈ <final> 06C0
FBA6  ARABIC LETTER HEH GOAL ISOLATED FORM ≈ <isolated> 06C1
FBA7  ARABIC LETTER HEH GOAL FINAL FORM ≈ <final> 06C1
FBA8  ARABIC LETTER HEH GOAL INITIAL FORM ≈ <initial> 06C1
FBA9  ARABIC LETTER HEH GOAL MEDIAL FORM ≈ <medial> 06C1
FBAA  ARABIC LETTER HEH DOACHASHMEE ISOLATED FORM ≈ <isolated> 06BE
FBAB  ARABIC LETTER HEH DOACHASHMEE FINAL FORM ≈ <final> 06BE
FBAC  ARABIC LETTER HEH DOACHASHMEE INITIAL FORM ≈ <initial> 06BE
FBAD  ARABIC LETTER HEH DOACHASHMEE MEDIAL FORM ≈ <medial> 06BE
FBAE  ARABIC LETTER YEH BARREE ISOLATED FORM ≈ <isolated> 06D2
FBAF  ARABIC LETTER YEH BARREE FINAL FORM ≈ <final> 06D2
FBB0  ARABIC LETTER YEH BARREE WITH HAMZA ABOVE ISOLATED FORM ≈ <isolated> 06D3
FBB1  ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM ≈ <final> 06D3

Glyphs for contextual forms of letters for Central Asian languages

FBD3  ARABIC LETTER NG ISOLATED FORM ≈ <isolated> 06AD
FBD4  ARABIC LETTER NG FINAL FORM ≈ <final> 06AD
FBD5  ARABIC LETTER NG INITIAL FORM ≈ <initial> 06AD
FBD6  ARABIC LETTER NG MEDIAL FORM ≈ <medial> 06AD
FBD7  ARABIC LETTER U ISOLATED FORM ≈ <isolated> 06C7
FBD8  ARABIC LETTER U FINAL FORM ≈ <final> 06C7
FBD9  ARABIC LETTER OE ISOLATED FORM ≈ <isolated> 06C6
FBDA  ARABIC LETTER OE FINAL FORM ≈ <final> 06C6
FBDB  ARABIC LETTER YU ISOLATED FORM ≈ <isolated> 06C8
FBDC  ARABIC LETTER YU FINAL FORM ≈ <final> 06C8
FBDD  ARABIC LETTER U WITH HAMZA ABOVE ISOLATED FORM ≈ <isolated> 0677
FBDE  ARABIC LETTER VE ISOLATED FORM ≈ <isolated> 06CB
FBDF  ARABIC LETTER VE FINAL FORM ≈ <final> 06CB
FBE0  ARABIC LETTER KIRGHIZ OE ISOLATED FORM ≈ <isolated> 06C5

FBE1  ARABIC LETTER KIRGHIZ OE FINAL FORM ≈ <final> 06C5
FBE2  ARABIC LETTER KIRGHIZ YU ISOLATED FORM ≈ <isolated> 06C9
FBE3  ARABIC LETTER KIRGHIZ YU FINAL FORM ≈ <final> 06C9
FBE4  ARABIC LETTER E ISOLATED FORM ≈ <isolated> 06D0
FBE5  ARABIC LETTER E FINAL FORM ≈ <final> 06D0
FBE6  ARABIC LETTER E INITIAL FORM ≈ <initial> 06D0
FBE7  ARABIC LETTER E MEDIAL FORM ≈ <medial> 06D0
FBE8  ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA INITIAL FORM ≈ <initial> 0649
FBE9  ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA MEDIAL FORM ≈ <medial> 0649

Ligatures (two elements)

FBEA  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM ≈ <isolated> 0626 0627
FBEB  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF FINAL FORM ≈ <final> 0626 0627
FBEC  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE ISOLATED FORM ≈ <isolated> 0626 06D5
FBED  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE FINAL FORM ≈ <final> 0626 06D5
FBEE  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW ISOLATED FORM ≈ <isolated> 0626 0648
FBEF  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW FINAL FORM ≈ <final> 0626 0648
FBF0  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U ISOLATED FORM ≈ <isolated> 0626 06C7
FBF1  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U FINAL FORM ≈ <final> 0626 06C7
FBF2  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE ISOLATED FORM ≈ <isolated> 0626 06C6
FBF3  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE FINAL FORM ≈ <final> 0626 06C6
FBF4  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU ISOLATED FORM ≈ <isolated> 0626 06C8
FBF5  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU FINAL FORM ≈ <final> 0626 06C8
FBF6  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E ISOLATED FORM ≈ <isolated> 0626 06D0
FBF7  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E FINAL FORM ≈ <final> 0626 06D0
FBF8  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E INITIAL FORM ≈ <initial> 0626 06D0
FBF9  ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0626 0649
FBFA  ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA FINAL FORM ≈ <final> 0626 0649
FBFB  ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM ≈ <initial> 0626 0649
FBFC  ARABIC LETTER FARSI YEH ISOLATED FORM ≈ <isolated> 06CC
FBFD  ARABIC LETTER FARSI YEH FINAL FORM ≈ <final> 06CC
FBFE  ARABIC LETTER FARSI YEH INITIAL FORM ≈ <initial> 06CC
FBFF  ARABIC LETTER FARSI YEH MEDIAL FORM ≈ <medial> 06CC
FC00  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH JEEM ISOLATED FORM ≈ <isolated> 0626 062C
FC01  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH HAH ISOLATED FORM ≈ <isolated> 0626 062D
FC02  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH MEEM ISOLATED FORM ≈ <isolated> 0626 0645
FC03  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0626 0649
FC04  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YEH ISOLATED FORM ≈ <isolated> 0626 064A
FC05  ARABIC LIGATURE BEH WITH JEEM ISOLATED FORM ≈ <isolated> 0628 062C
FC06  ARABIC LIGATURE BEH WITH HAH ISOLATED FORM ≈ <isolated> 0628 062D
FC07  ARABIC LIGATURE BEH WITH KHAH ISOLATED FORM ≈ <isolated> 0628 062E
FC08  ARABIC LIGATURE BEH WITH MEEM ISOLATED FORM ≈ <isolated> 0628 0645
FC09  ARABIC LIGATURE BEH WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0628 0649
FC0A  ARABIC LIGATURE BEH WITH YEH ISOLATED FORM ≈ <isolated> 0628 064A
FC0B  ARABIC LIGATURE TEH WITH JEEM ISOLATED FORM ≈ <isolated> 062A 062C
FC0C  ARABIC LIGATURE TEH WITH HAH ISOLATED FORM ≈ <isolated> 062A 062D
FC0D  ARABIC LIGATURE TEH WITH KHAH ISOLATED FORM ≈ <isolated> 062A 062E

FC0E  ARABIC LIGATURE TEH WITH MEEM ISOLATED FORM ≈ <isolated> 062A 0645
FC0F  ARABIC LIGATURE TEH WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 062A 0649
FC10  ARABIC LIGATURE TEH WITH YEH ISOLATED FORM ≈ <isolated> 062A 064A
FC11  ARABIC LIGATURE THEH WITH JEEM ISOLATED FORM ≈ <isolated> 062B 062C
FC12  ARABIC LIGATURE THEH WITH MEEM ISOLATED FORM ≈ <isolated> 062B 0645
FC13  ARABIC LIGATURE THEH WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 062B 0649
FC14  ARABIC LIGATURE THEH WITH YEH ISOLATED FORM ≈ <isolated> 062B 064A
FC15  ARABIC LIGATURE JEEM WITH HAH ISOLATED FORM ≈ <isolated> 062C 062D
FC16  ARABIC LIGATURE JEEM WITH MEEM ISOLATED FORM ≈ <isolated> 062C 0645
FC17  ARABIC LIGATURE HAH WITH JEEM ISOLATED FORM ≈ <isolated> 062D 062C
FC18  ARABIC LIGATURE HAH WITH MEEM ISOLATED FORM ≈ <isolated> 062D 0645
FC19  ARABIC LIGATURE KHAH WITH JEEM ISOLATED FORM ≈ <isolated> 062E 062C
FC1A  ARABIC LIGATURE KHAH WITH HAH ISOLATED FORM ≈ <isolated> 062E 062D
FC1B  ARABIC LIGATURE KHAH WITH MEEM ISOLATED FORM ≈ <isolated> 062E 0645
FC1C  ARABIC LIGATURE SEEN WITH JEEM ISOLATED FORM ≈ <isolated> 0633 062C
FC1D  ARABIC LIGATURE SEEN WITH HAH ISOLATED FORM ≈ <isolated> 0633 062D
FC1E  ARABIC LIGATURE SEEN WITH KHAH ISOLATED FORM ≈ <isolated> 0633 062E
FC1F  ARABIC LIGATURE SEEN WITH MEEM ISOLATED FORM ≈ <isolated> 0633 0645
FC20  ARABIC LIGATURE SAD WITH HAH ISOLATED FORM ≈ <isolated> 0635 062D
FC21  ARABIC LIGATURE SAD WITH MEEM ISOLATED FORM ≈ <isolated> 0635 0645
FC22  ARABIC LIGATURE DAD WITH JEEM ISOLATED FORM ≈ <isolated> 0636 062C
FC23  ARABIC LIGATURE DAD WITH HAH ISOLATED FORM ≈ <isolated> 0636 062D
FC24  ARABIC LIGATURE DAD WITH KHAH ISOLATED FORM ≈ <isolated> 0636 062E
FC25  ARABIC LIGATURE DAD WITH MEEM ISOLATED FORM ≈ <isolated> 0636 0645
FC26  ARABIC LIGATURE TAH WITH HAH ISOLATED FORM ≈ <isolated> 0637 062D
FC27  ARABIC LIGATURE TAH WITH MEEM ISOLATED FORM ≈ <isolated> 0637 0645
FC28  ARABIC LIGATURE ZAH WITH MEEM ISOLATED FORM ≈ <isolated> 0638 0645
FC29  ARABIC LIGATURE AIN WITH JEEM ISOLATED FORM ≈ <isolated> 0639 062C
FC2A  ARABIC LIGATURE AIN WITH MEEM ISOLATED FORM ≈ <isolated> 0639 0645
FC2B  ARABIC LIGATURE GHAIN WITH JEEM ISOLATED FORM ≈ <isolated> 063A 062C
FC2C  ARABIC LIGATURE GHAIN WITH MEEM ISOLATED FORM ≈ <isolated> 063A 0645
FC2D  ARABIC LIGATURE FEH WITH JEEM ISOLATED FORM ≈ <isolated> 0641 062C
FC2E  ARABIC LIGATURE FEH WITH HAH ISOLATED FORM ≈ <isolated> 0641 062D
FC2F  ARABIC LIGATURE FEH WITH KHAH ISOLATED FORM ≈ <isolated> 0641 062E
FC30  ARABIC LIGATURE FEH WITH MEEM ISOLATED FORM ≈ <isolated> 0641 0645
FC31  ARABIC LIGATURE FEH WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0641 0649
FC32  ARABIC LIGATURE FEH WITH YEH ISOLATED FORM ≈ <isolated> 0641 064A
FC33  ARABIC LIGATURE QAF WITH HAH ISOLATED FORM ≈ <isolated> 0642 062D
FC34  ARABIC LIGATURE QAF WITH MEEM ISOLATED FORM ≈ <isolated> 0642 0645
FC35  ARABIC LIGATURE QAF WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0642 0649
FC36  ARABIC LIGATURE QAF WITH YEH ISOLATED FORM ≈ <isolated> 0642 064A
FC37  ARABIC LIGATURE KAF WITH ALEF ISOLATED FORM ≈ <isolated> 0643 0627
FC38  ARABIC LIGATURE KAF WITH JEEM ISOLATED FORM ≈ <isolated> 0643 062C
FC39  ARABIC LIGATURE KAF WITH HAH ISOLATED FORM ≈ <isolated> 0643 062D

FC3A  ARABIC LIGATURE KAF WITH KHAH ISOLATED FORM ≈ <isolated> 0643 062E
FC3B  ARABIC LIGATURE KAF WITH LAM ISOLATED FORM ≈ <isolated> 0643 0644
FC3C  ARABIC LIGATURE KAF WITH MEEM ISOLATED FORM ≈ <isolated> 0643 0645
FC3D  ARABIC LIGATURE KAF WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0643 0649
FC3E  ARABIC LIGATURE KAF WITH YEH ISOLATED FORM ≈ <isolated> 0643 064A
FC3F  ARABIC LIGATURE LAM WITH JEEM ISOLATED FORM ≈ <isolated> 0644 062C
FC40  ARABIC LIGATURE LAM WITH HAH ISOLATED FORM ≈ <isolated> 0644 062D
FC41  ARABIC LIGATURE LAM WITH KHAH ISOLATED FORM ≈ <isolated> 0644 062E
FC42  ARABIC LIGATURE LAM WITH MEEM ISOLATED FORM ≈ <isolated> 0644 0645
FC43  ARABIC LIGATURE LAM WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0644 0649
FC44  ARABIC LIGATURE LAM WITH YEH ISOLATED FORM ≈ <isolated> 0644 064A
FC45  ARABIC LIGATURE MEEM WITH JEEM ISOLATED FORM ≈ <isolated> 0645 062C
FC46  ARABIC LIGATURE MEEM WITH HAH ISOLATED FORM ≈ <isolated> 0645 062D
FC47  ARABIC LIGATURE MEEM WITH KHAH ISOLATED FORM ≈ <isolated> 0645 062E
FC48  ARABIC LIGATURE MEEM WITH MEEM ISOLATED FORM ≈ <isolated> 0645 0645
FC49  ARABIC LIGATURE MEEM WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0645 0649
FC4A  ARABIC LIGATURE MEEM WITH YEH ISOLATED FORM ≈ <isolated> 0645 064A
FC4B  ARABIC LIGATURE NOON WITH JEEM ISOLATED FORM ≈ <isolated> 0646 062C
FC4C  ARABIC LIGATURE NOON WITH HAH ISOLATED FORM ≈ <isolated> 0646 062D
FC4D  ARABIC LIGATURE NOON WITH KHAH ISOLATED FORM ≈ <isolated> 0646 062E
FC4E  ARABIC LIGATURE NOON WITH MEEM ISOLATED FORM ≈ <isolated> 0646 0645
FC4F  ARABIC LIGATURE NOON WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0646 0649
FC50  ARABIC LIGATURE NOON WITH YEH ISOLATED FORM ≈ <isolated> 0646 064A
FC51  ARABIC LIGATURE HEH WITH JEEM ISOLATED FORM ≈ <isolated> 0647 062C
FC52  ARABIC LIGATURE HEH WITH MEEM ISOLATED FORM ≈ <isolated> 0647 0645
FC53  ARABIC LIGATURE HEH WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 0647 0649
FC54  ARABIC LIGATURE HEH WITH YEH ISOLATED FORM ≈ <isolated> 0647 064A
FC55  ARABIC LIGATURE YEH WITH JEEM ISOLATED FORM ≈ <isolated> 064A 062C
FC56  ARABIC LIGATURE YEH WITH HAH ISOLATED FORM ≈ <isolated> 064A 062D
FC57  ARABIC LIGATURE YEH WITH KHAH ISOLATED FORM ≈ <isolated> 064A 062E
FC58  ARABIC LIGATURE YEH WITH MEEM ISOLATED FORM ≈ <isolated> 064A 0645
FC59  ARABIC LIGATURE YEH WITH ALEF MAKSURA ISOLATED FORM ≈ <isolated> 064A 0649
FC5A  ARABIC LIGATURE YEH WITH YEH ISOLATED FORM ≈ <isolated> 064A 064A
FC5B  ARABIC LIGATURE THAL WITH SUPERSCRIPT ALEF ISOLATED FORM ≈ <isolated> 0630 0670
FC5C  ARABIC LIGATURE REH WITH SUPERSCRIPT ALEF ISOLATED FORM ≈ <isolated> 0631 0670
FC5D  ARABIC LIGATURE ALEF MAKSURA WITH SUPERSCRIPT ALEF ISOLATED FORM ≈ <isolated> 0649 0670
FC5E  ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM ≈ <isolated> 0020 064C 0651
FC5F  ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM ≈ <isolated> 0020 064D 0651
FC60  ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM ≈ <isolated> 0020 064E 0651
FC61  ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM ≈ <isolated> 0020 064F 0651
FC62  ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM ≈ <isolated> 0020 0650 0651
FC63  ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM ≈ <isolated> 0020 0651 0670
FC64  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH REH FINAL FORM ≈ <final> 0626 0631
FC65  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ZAIN FINAL FORM ≈ <final> 0626 0632
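The ligature entries above each map to a sequence of ordinary Arabic characters. A minimal sketch of reading those mappings programmatically, using Python's standard `unicodedata` module, is given below. The function name `split_decomposition` is hypothetical and exists only for this example. Note that the shadda ligatures decompose to a space followed by two combining marks, so a sequence can hold three code points even under the "two elements" heading.

```python
import unicodedata


def split_decomposition(ch):
    """Hypothetical helper: return (tag, [code points]) for one character.

    The tag is "" when the character has no decomposition or only a
    canonical one, since canonical mappings carry no <...> tag.
    """
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0].startswith("<"):
        return fields[0], fields[1:]
    return "", fields


for ligature in ("\uFC00", "\uFC5E", "\uFC64"):
    tag, elements = split_decomposition(ligature)
    # Print the code point, its form tag, and the elements it stands for.
    print(f"{ord(ligature):04X}", tag, " ".join(elements))
```

Splitting the tag from the code points keeps the form (isolated, final, initial, medial) available separately from the elements, which is how the names list itself presents each entry.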


Variation Selectors
Range: FE00–FE0F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard, can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Vertical forms
Range: FE10–FE1F

Combining Half Marks
Range: FE20–FE2F

CJK Compatibility Forms
Range: FE30–FE4F

Small Form Variants Range: FE50–FE6F This file contains an excerpt from the character code tables and list of character names for

The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.


Arabic Presentation Forms-B Range: FE70–FEFF

[Code chart: Arabic Presentation Forms-B, FE70–FEFF (columns FE7–FEF, rows 0–F)]

Halfwidth and Fullwidth Forms Range: FF00–FFEF

[Code chart: Halfwidth and Fullwidth Forms, FF00–FFEF (columns FF0–FFE, rows 0–F)]

Halfwidth and Fullwidth Forms: FF01–FF3F

Fullwidth ASCII variants
See ASCII 0020 - 007E

FF01 ！ FULLWIDTH EXCLAMATION MARK <wide> 0021 !
FF02 ＂ FULLWIDTH QUOTATION MARK <wide> 0022 "
FF03 ＃ FULLWIDTH NUMBER SIGN <wide> 0023 #
FF04 ＄ FULLWIDTH DOLLAR SIGN <wide> 0024 $
FF05 ％ FULLWIDTH PERCENT SIGN <wide> 0025 %
FF06 ＆ FULLWIDTH AMPERSAND <wide> 0026 &
FF07 ＇ FULLWIDTH APOSTROPHE <wide> 0027 '
FF08 （ FULLWIDTH LEFT PARENTHESIS <wide> 0028 (
FF09 ） FULLWIDTH RIGHT PARENTHESIS <wide> 0029 )
FF0A ＊ FULLWIDTH ASTERISK <wide> 002A *
FF0B ＋ FULLWIDTH PLUS SIGN <wide> 002B +
FF0C ， FULLWIDTH COMMA <wide> 002C ,
FF0D － FULLWIDTH HYPHEN-MINUS <wide> 002D -
FF0E ． FULLWIDTH FULL STOP <wide> 002E .
FF0F ／ FULLWIDTH SOLIDUS <wide> 002F /
FF10 ０ FULLWIDTH DIGIT ZERO <wide> 0030 0
FF11 １ FULLWIDTH DIGIT ONE <wide> 0031 1
FF12 ２ FULLWIDTH DIGIT TWO <wide> 0032 2
FF13 ３ FULLWIDTH DIGIT THREE <wide> 0033 3
FF14 ４ FULLWIDTH DIGIT FOUR <wide> 0034 4
FF15 ５ FULLWIDTH DIGIT FIVE <wide> 0035 5
FF16 ６ FULLWIDTH DIGIT SIX <wide> 0036 6
FF17 ７ FULLWIDTH DIGIT SEVEN <wide> 0037 7
FF18 ８ FULLWIDTH DIGIT EIGHT <wide> 0038 8
FF19 ９ FULLWIDTH DIGIT NINE <wide> 0039 9
FF1A ： FULLWIDTH COLON <wide> 003A :
FF1B ； FULLWIDTH SEMICOLON <wide> 003B ;
FF1C ＜ FULLWIDTH LESS-THAN SIGN <wide> 003C <
FF1D ＝ FULLWIDTH EQUALS SIGN <wide> 003D =
FF1E ＞ FULLWIDTH GREATER-THAN SIGN <wide> 003E >
FF1F ？ FULLWIDTH QUESTION MARK <wide> 003F ?
FF20 ＠ FULLWIDTH COMMERCIAL AT <wide> 0040 @
FF21 Ａ FULLWIDTH LATIN CAPITAL LETTER A <wide> 0041 A
FF22 Ｂ FULLWIDTH LATIN CAPITAL LETTER B <wide> 0042 B
FF23 Ｃ FULLWIDTH LATIN CAPITAL LETTER C <wide> 0043 C
FF24 Ｄ FULLWIDTH LATIN CAPITAL LETTER D <wide> 0044 D
FF25 Ｅ FULLWIDTH LATIN CAPITAL LETTER E <wide> 0045 E
FF26 Ｆ FULLWIDTH LATIN CAPITAL LETTER F <wide> 0046 F
FF27 Ｇ FULLWIDTH LATIN CAPITAL LETTER G <wide> 0047 G
FF28 Ｈ FULLWIDTH LATIN CAPITAL LETTER H <wide> 0048 H
FF29 Ｉ FULLWIDTH LATIN CAPITAL LETTER I <wide> 0049 I
FF2A Ｊ FULLWIDTH LATIN CAPITAL LETTER J <wide> 004A J
FF2B Ｋ FULLWIDTH LATIN CAPITAL LETTER K <wide> 004B K
FF2C Ｌ FULLWIDTH LATIN CAPITAL LETTER L <wide> 004C L
FF2D Ｍ FULLWIDTH LATIN CAPITAL LETTER M <wide> 004D M
FF2E Ｎ FULLWIDTH LATIN CAPITAL LETTER N <wide> 004E N
FF2F Ｏ FULLWIDTH LATIN CAPITAL LETTER O <wide> 004F O
FF30 Ｐ FULLWIDTH LATIN CAPITAL LETTER P <wide> 0050 P
FF31 Ｑ FULLWIDTH LATIN CAPITAL LETTER Q <wide> 0051 Q
FF32 Ｒ FULLWIDTH LATIN CAPITAL LETTER R <wide> 0052 R
FF33 Ｓ FULLWIDTH LATIN CAPITAL LETTER S <wide> 0053 S
FF34 Ｔ FULLWIDTH LATIN CAPITAL LETTER T <wide> 0054 T
FF35 Ｕ FULLWIDTH LATIN CAPITAL LETTER U <wide> 0055 U
FF36 Ｖ FULLWIDTH LATIN CAPITAL LETTER V <wide> 0056 V
FF37 Ｗ FULLWIDTH LATIN CAPITAL LETTER W <wide> 0057 W
FF38 Ｘ FULLWIDTH LATIN CAPITAL LETTER X <wide> 0058 X
FF39 Ｙ FULLWIDTH LATIN CAPITAL LETTER Y <wide> 0059 Y
FF3A Ｚ FULLWIDTH LATIN CAPITAL LETTER Z <wide> 005A Z
FF3B ［ FULLWIDTH LEFT SQUARE BRACKET <wide> 005B [
FF3C ＼ FULLWIDTH REVERSE SOLIDUS <wide> 005C \
FF3D ］ FULLWIDTH RIGHT SQUARE BRACKET <wide> 005D ]
FF3E ＾ FULLWIDTH CIRCUMFLEX ACCENT <wide> 005E ^
FF3F ＿ FULLWIDTH LOW LINE <wide> 005F _
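Each fullwidth ASCII variant listed above sits at a fixed distance of FEE0 from its ASCII counterpart (FF01 maps to 0021, FF3F to 005F). The following is a minimal Python sketch of that arithmetic, not a definitive implementation. The function names are hypothetical, and the upper bound FF5E is an assumption taken from the "See ASCII 0020 - 007E" cross-reference rather than from the entries shown on this page.

```python
# Assumed bounds of the fullwidth ASCII variants: FF01..FF5E <-> 0021..007E.
FULLWIDTH_FIRST = 0xFF01   # FULLWIDTH EXCLAMATION MARK
FULLWIDTH_LAST = 0xFF5E    # assumed end of the range, counterpart of 007E
OFFSET = 0xFEE0            # FF01 - 0021


def fullwidth_to_ascii(text: str) -> str:
    """Replace fullwidth ASCII variants with their ASCII counterparts."""
    out = []
    for ch in text:
        cp = ord(ch)
        if FULLWIDTH_FIRST <= cp <= FULLWIDTH_LAST:
            out.append(chr(cp - OFFSET))
        else:
            out.append(ch)  # leave every other character untouched
    return "".join(out)


def ascii_to_fullwidth(text: str) -> str:
    """Replace printable ASCII 0021..007E with the fullwidth variants."""
    out = []
    for ch in text:
        cp = ord(ch)
        if FULLWIDTH_FIRST - OFFSET <= cp <= FULLWIDTH_LAST - OFFSET:
            out.append(chr(cp + OFFSET))
        else:
            out.append(ch)  # space (0020) and non-ASCII are not mapped here
    return "".join(out)


if __name__ == "__main__":
    print(fullwidth_to_ascii("ＡＢＣ１２３！"))
    print(ascii_to_fullwidth("ABC123!"))
```

The sketch deliberately leaves U+0020 alone, since the list above starts at FF01. In practice, Python's `unicodedata.normalize("NFKC", text)` also applies the `<wide>` mappings, together with many other compatibility mappings that may or may not be wanted.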


Specials Range: FFF0–FFFF
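The block ranges stated in the chart headers so far (Small Form Variants, Arabic Presentation Forms-B, Halfwidth and Fullwidth Forms, Specials) are contiguous, non-overlapping intervals, so a code point can be assigned to its block with a sorted-range lookup. The sketch below is illustrative only: the `BLOCKS` table and the `block_of` helper are hypothetical names, and the table holds just these four ranges, not the full set of blocks in the Unicode Character Database.

```python
from bisect import bisect_right

# (first, last, name) triples, sorted by first code point.
# Only the four ranges named in the chart headers above are included.
BLOCKS = [
    (0xFE50, 0xFE6F, "Small Form Variants"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
    (0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms"),
    (0xFFF0, 0xFFFF, "Specials"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp: int):
    """Return the block name for code point cp, or None if not in the table."""
    i = bisect_right(_STARTS, cp) - 1   # last block starting at or before cp
    if i >= 0:
        first, last, name = BLOCKS[i]
        if cp <= last:                  # first <= cp already holds
            return name
    return None


if __name__ == "__main__":
    for cp in (0xFE70, 0xFF21, 0xFFFD, 0x0041):
        print(f"{cp:04X}", block_of(cp))
```

A binary search is used here only because it scales to a full block table; for four entries a linear scan would serve equally well.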

Linear B Syllabary Range: 10000–1007F

Linear B Ideograms Range: 10080–100FF

[Code chart: Linear B Ideograms, 10080–100FF (columns 1008–100F, rows 0–F)]

Aegean Numbers Range: 10100–1013F

Ancient Greek Numbers Range: 10140–1018F

Old Italic Range: 10300–1032F

Gothic Range: 10330–1034F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.


Ugaritic Range: 10380–1039F

Old Persian Range: 103A0–103DF

Deseret Range: 10400–1044F

Shavian Range: 10450–1047F

Osmanya Range: 10480–104AF

Cypriot Syllabary Range: 10800–1083F

Phoenician Range: 10900–1091F

Kharoshthi Range: 10A00–10A5F

Cuneiform Range: 12000–123FF

Cuneiform 12000–120FF

[Code chart: columns 1200 to 120F, rows 0 to F, covering code points 12000 through 120FF. The reference glyphs are not recoverable from this extraction.]
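The chart above is indexed by a column label (1200 to 120F) and a row label (0 to F); the code point in each cell is the column label followed by the row digit. A minimal sketch of that arithmetic follows. The function names `chart_code_point` and `format_code_point` are hypothetical helpers introduced only for illustration and are not part of the Unicode Standard or the Unicode Character Database.

```python
def chart_code_point(column: str, row: str) -> int:
    """Combine a chart column label (e.g. '1200') with a row label (e.g. 'A').

    Assumption: column labels are the code point with its last hex digit
    removed, and row labels are that last hex digit, as in the chart grid.
    """
    if len(row) != 1:
        raise ValueError("row label must be a single hexadecimal digit")
    # Shifting the column left by one hex digit is a multiplication by 16.
    return int(column, 16) * 16 + int(row, 16)


def format_code_point(cp: int) -> str:
    """Render a code point the way the chart prints it (uppercase hex)."""
    return f"{cp:04X}"


# Row A of the first three and the last column of this chart page.
cells = [format_code_point(chart_code_point(col, "A"))
         for col in ("1200", "1201", "1202", "120F")]
print(cells)
```

Reading a cell this way gives only the code point; character names and properties still have to be taken from the names list and the Unicode Character Database.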

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Cuneiform 12100–121FF

[Code chart: columns 1210 to 121F, rows 0 to F, covering code points 12100 through 121FF. The reference glyphs are not recoverable from this extraction.]

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.

121DF

121EF

121FF

531

Cuneiform, 12200–122FF

[Code chart: glyph grid for code points 12200–122FF not reproduced.]

Cuneiform, 12300–123FF

[Code chart: glyph grid not reproduced. Code points shown: 12300–1236E.]

Cuneiform, 12000–1207E

Signs

12000 CUNEIFORM SIGN A
12001 CUNEIFORM SIGN A TIMES A
12002 CUNEIFORM SIGN A TIMES BAD
12003 CUNEIFORM SIGN A TIMES GAN2 TENU
12004 CUNEIFORM SIGN A TIMES HA
12005 CUNEIFORM SIGN A TIMES IGI
12006 CUNEIFORM SIGN A TIMES LAGAR GUNU
12007 CUNEIFORM SIGN A TIMES MUSH
12008 CUNEIFORM SIGN A TIMES SAG
12009 CUNEIFORM SIGN A2
1200A CUNEIFORM SIGN AB
1200B CUNEIFORM SIGN AB TIMES ASH2
1200C CUNEIFORM SIGN AB TIMES DUN3 GUNU
1200D CUNEIFORM SIGN AB TIMES GAL
1200E CUNEIFORM SIGN AB TIMES GAN2 TENU
1200F CUNEIFORM SIGN AB TIMES HA
12010 CUNEIFORM SIGN AB TIMES IGI GUNU
12011 CUNEIFORM SIGN AB TIMES IMIN
12012 CUNEIFORM SIGN AB TIMES LAGAB
12013 CUNEIFORM SIGN AB TIMES SHESH
12014 CUNEIFORM SIGN AB TIMES U PLUS U PLUS U
12015 CUNEIFORM SIGN AB GUNU
12016 CUNEIFORM SIGN AB2
12017 CUNEIFORM SIGN AB2 TIMES BALAG
12018 CUNEIFORM SIGN AB2 TIMES GAN2 TENU
12019 CUNEIFORM SIGN AB2 TIMES ME PLUS EN
1201A CUNEIFORM SIGN AB2 TIMES SHA3
1201B CUNEIFORM SIGN AB2 TIMES TAK4
1201C CUNEIFORM SIGN AD
1201D CUNEIFORM SIGN AK
1201E CUNEIFORM SIGN AK TIMES ERIN2
1201F CUNEIFORM SIGN AK TIMES SHITA PLUS GISH
12020 CUNEIFORM SIGN AL
12021 CUNEIFORM SIGN AL TIMES AL
12022 CUNEIFORM SIGN AL TIMES DIM2
12023 CUNEIFORM SIGN AL TIMES GISH
12024 CUNEIFORM SIGN AL TIMES HA
12025 CUNEIFORM SIGN AL TIMES KAD3
12026 CUNEIFORM SIGN AL TIMES KI
12027 CUNEIFORM SIGN AL TIMES SHE
12028 CUNEIFORM SIGN AL TIMES USH
12029 CUNEIFORM SIGN ALAN
1202A CUNEIFORM SIGN ALEPH
1202B CUNEIFORM SIGN AMAR
1202C CUNEIFORM SIGN AMAR TIMES SHE
1202D CUNEIFORM SIGN AN
1202E CUNEIFORM SIGN AN OVER AN
1202F CUNEIFORM SIGN AN THREE TIMES
12030 CUNEIFORM SIGN AN PLUS NAGA OPPOSING AN PLUS NAGA
12031 CUNEIFORM SIGN AN PLUS NAGA SQUARED
12032 CUNEIFORM SIGN ANSHE
12033 CUNEIFORM SIGN APIN
12034 CUNEIFORM SIGN ARAD
12035 CUNEIFORM SIGN ARAD TIMES KUR
12036 CUNEIFORM SIGN ARKAB
12037 CUNEIFORM SIGN ASAL2
12038 CUNEIFORM SIGN ASH
12039 CUNEIFORM SIGN ASH ZIDA TENU
1203A CUNEIFORM SIGN ASH KABA TENU
1203B CUNEIFORM SIGN ASH OVER ASH TUG2 OVER TUG2 TUG2 OVER TUG2 PAP
1203C CUNEIFORM SIGN ASH OVER ASH OVER ASH
1203D CUNEIFORM SIGN ASH OVER ASH OVER ASH CROSSING ASH OVER ASH OVER ASH
1203E CUNEIFORM SIGN ASH2
1203F CUNEIFORM SIGN ASHGAB
12040 CUNEIFORM SIGN BA
12041 CUNEIFORM SIGN BAD
12042 CUNEIFORM SIGN BAG3
12043 CUNEIFORM SIGN BAHAR2
12044 CUNEIFORM SIGN BAL
12045 CUNEIFORM SIGN BAL OVER BAL
12046 CUNEIFORM SIGN BALAG
12047 CUNEIFORM SIGN BAR
12048 CUNEIFORM SIGN BARA2
12049 CUNEIFORM SIGN BI
1204A CUNEIFORM SIGN BI TIMES A
1204B CUNEIFORM SIGN BI TIMES GAR
1204C CUNEIFORM SIGN BI TIMES IGI GUNU
1204D CUNEIFORM SIGN BU
1204E CUNEIFORM SIGN BU OVER BU AB
1204F CUNEIFORM SIGN BU OVER BU UN
12050 CUNEIFORM SIGN BU CROSSING BU
12051 CUNEIFORM SIGN BULUG
12052 CUNEIFORM SIGN BULUG OVER BULUG
12053 CUNEIFORM SIGN BUR
12054 CUNEIFORM SIGN BUR2
12055 CUNEIFORM SIGN DA
12056 CUNEIFORM SIGN DAG
12057 CUNEIFORM SIGN DAG KISIM5 TIMES A PLUS MASH
12058 CUNEIFORM SIGN DAG KISIM5 TIMES AMAR
12059 CUNEIFORM SIGN DAG KISIM5 TIMES BALAG
1205A CUNEIFORM SIGN DAG KISIM5 TIMES BI
1205B CUNEIFORM SIGN DAG KISIM5 TIMES GA
1205C CUNEIFORM SIGN DAG KISIM5 TIMES GA PLUS MASH
1205D CUNEIFORM SIGN DAG KISIM5 TIMES GI
1205E CUNEIFORM SIGN DAG KISIM5 TIMES GIR2
1205F CUNEIFORM SIGN DAG KISIM5 TIMES GUD
12060 CUNEIFORM SIGN DAG KISIM5 TIMES HA
12061 CUNEIFORM SIGN DAG KISIM5 TIMES IR
12062 CUNEIFORM SIGN DAG KISIM5 TIMES IR PLUS LU
12063 CUNEIFORM SIGN DAG KISIM5 TIMES KAK
12064 CUNEIFORM SIGN DAG KISIM5 TIMES LA
12065 CUNEIFORM SIGN DAG KISIM5 TIMES LU
12066 CUNEIFORM SIGN DAG KISIM5 TIMES LU PLUS MASH2
12067 CUNEIFORM SIGN DAG KISIM5 TIMES LUM
12068 CUNEIFORM SIGN DAG KISIM5 TIMES NE
12069 CUNEIFORM SIGN DAG KISIM5 TIMES PAP PLUS PAP
1206A CUNEIFORM SIGN DAG KISIM5 TIMES SI
1206B CUNEIFORM SIGN DAG KISIM5 TIMES TAK4
1206C CUNEIFORM SIGN DAG KISIM5 TIMES U2 PLUS GIR2
1206D CUNEIFORM SIGN DAG KISIM5 TIMES USH
1206E CUNEIFORM SIGN DAM
1206F CUNEIFORM SIGN DAR
12070 CUNEIFORM SIGN DARA3
12071 CUNEIFORM SIGN DARA4
12072 CUNEIFORM SIGN DI
12073 CUNEIFORM SIGN DIB
12074 CUNEIFORM SIGN DIM
12075 CUNEIFORM SIGN DIM TIMES SHE
12076 CUNEIFORM SIGN DIM2
12077 CUNEIFORM SIGN DIN
12078 CUNEIFORM SIGN DIN KASKAL U GUNU DISH
12079 CUNEIFORM SIGN DISH
1207A CUNEIFORM SIGN DU
1207B CUNEIFORM SIGN DU OVER DU
1207C CUNEIFORM SIGN DU GUNU
1207D CUNEIFORM SIGN DU SHESHIG
1207E CUNEIFORM SIGN DUB
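The code point and name pairs in this list can be cross-checked programmatically. The following is a minimal sketch, assuming a Python 3 interpreter whose bundled `unicodedata` tables are at Unicode 5.0 or later (so that the Cuneiform block is present). The helper name `names_in_range` is hypothetical; it is not part of the standard or of Python.

```python
import unicodedata


def names_in_range(first, last):
    """Yield (hex label, character name) for each named code point in [first, last]."""
    for cp in range(first, last + 1):
        # The empty-string default lets unassigned code points be skipped.
        name = unicodedata.name(chr(cp), "")
        if name:
            yield f"{cp:04X}", name


if __name__ == "__main__":
    # Print the part of the Cuneiform block covered by the names list above.
    for label, name in names_in_range(0x12000, 0x1207E):
        print(label, name)
```

Passing an empty default to `unicodedata.name` avoids a `ValueError` for code points that have no name, so the same helper can be pointed at ranges that contain reserved positions.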


Cuneiform Numbers and Punctuation
Range: 12400–1247F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.

Disclaimer

These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/

A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts. The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s). The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2006 Unicode, Inc. All rights reserved.

Cuneiform Numbers and Punctuation, 12400–1247F

[Code chart: glyph grid not reproduced. Code points shown: 12400–12462 and 12470–12473.]

Byzantine Musical Symbols
Range: 1D000–1D0FF

Byzantine Musical Symbols, 1D000–1D0FF

[Code chart: glyph grid not reproduced. Code points shown: 1D000–1D0F5.]

Byzantine Musical Symbols, 1D000–1D059

Prosodies (Prosodics)

These three characters are not actually attested in musical contexts.

1D000 BYZANTINE MUSICAL SYMBOL PSILI
1D001 BYZANTINE MUSICAL SYMBOL DASEIA
1D002 BYZANTINE MUSICAL SYMBOL PERISPOMENI

Ekfonetika

1D003 BYZANTINE MUSICAL SYMBOL OXEIA EKFONITIKON
1D004 BYZANTINE MUSICAL SYMBOL OXEIA DIPLI
1D005 BYZANTINE MUSICAL SYMBOL VAREIA EKFONITIKON
1D006 BYZANTINE MUSICAL SYMBOL VAREIA DIPLI
1D007 BYZANTINE MUSICAL SYMBOL KATHISTI
1D008 BYZANTINE MUSICAL SYMBOL SYRMATIKI
1D009 BYZANTINE MUSICAL SYMBOL PARAKLITIKI
1D00A BYZANTINE MUSICAL SYMBOL YPOKRISIS
1D00B BYZANTINE MUSICAL SYMBOL YPOKRISIS DIPLI
1D00C BYZANTINE MUSICAL SYMBOL KREMASTI
1D00D BYZANTINE MUSICAL SYMBOL APESO EKFONITIKON
1D00E BYZANTINE MUSICAL SYMBOL EXO EKFONITIKON
1D00F BYZANTINE MUSICAL SYMBOL TELEIA
1D010 BYZANTINE MUSICAL SYMBOL KENTIMATA
1D011 BYZANTINE MUSICAL SYMBOL APOSTROFOS
1D012 BYZANTINE MUSICAL SYMBOL APOSTROFOS DIPLI
1D013 BYZANTINE MUSICAL SYMBOL SYNEVMA
1D014 BYZANTINE MUSICAL SYMBOL THITA

Melodimata (Melodics)

1D015 BYZANTINE MUSICAL SYMBOL OLIGON ARCHAION
1D016 BYZANTINE MUSICAL SYMBOL GORGON ARCHAION
1D017 BYZANTINE MUSICAL SYMBOL PSILON
1D018 BYZANTINE MUSICAL SYMBOL CHAMILON
1D019 BYZANTINE MUSICAL SYMBOL VATHY
1D01A BYZANTINE MUSICAL SYMBOL ISON ARCHAION
1D01B BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION
1D01C BYZANTINE MUSICAL SYMBOL KENTIMATA ARCHAION
1D01D BYZANTINE MUSICAL SYMBOL SAXIMATA
1D01E BYZANTINE MUSICAL SYMBOL PARICHON
1D01F BYZANTINE MUSICAL SYMBOL STAVROS APODEXIA
1D020 BYZANTINE MUSICAL SYMBOL OXEIAI ARCHAION
1D021 BYZANTINE MUSICAL SYMBOL VAREIAI ARCHAION
1D022 BYZANTINE MUSICAL SYMBOL APODERMA ARCHAION
1D023 BYZANTINE MUSICAL SYMBOL APOTHEMA
1D024 BYZANTINE MUSICAL SYMBOL KLASMA
1D025 BYZANTINE MUSICAL SYMBOL REVMA
1D026 BYZANTINE MUSICAL SYMBOL PIASMA ARCHAION
1D027 BYZANTINE MUSICAL SYMBOL TINAGMA
1D028 BYZANTINE MUSICAL SYMBOL ANATRICHISMA
1D029 BYZANTINE MUSICAL SYMBOL SEISMA
1D02A BYZANTINE MUSICAL SYMBOL SYNAGMA ARCHAION
1D02B BYZANTINE MUSICAL SYMBOL SYNAGMA META STAVROU
1D02C BYZANTINE MUSICAL SYMBOL OYRANISMA ARCHAION
1D02D BYZANTINE MUSICAL SYMBOL THEMA
1D02E BYZANTINE MUSICAL SYMBOL LEMOI
1D02F BYZANTINE MUSICAL SYMBOL DYO
1D030 BYZANTINE MUSICAL SYMBOL TRIA
1D031 BYZANTINE MUSICAL SYMBOL TESSERA
1D032 BYZANTINE MUSICAL SYMBOL KRATIMATA
1D033 BYZANTINE MUSICAL SYMBOL APESO EXO NEO
1D034 BYZANTINE MUSICAL SYMBOL FTHORA ARCHAION
1D035 BYZANTINE MUSICAL SYMBOL IMIFTHORA
1D036 BYZANTINE MUSICAL SYMBOL TROMIKON ARCHAION
1D037 BYZANTINE MUSICAL SYMBOL KATAVA TROMIKON
1D038 BYZANTINE MUSICAL SYMBOL PELASTON
1D039 BYZANTINE MUSICAL SYMBOL PSIFISTON
1D03A BYZANTINE MUSICAL SYMBOL KONTEVMA
1D03B BYZANTINE MUSICAL SYMBOL CHOREVMA ARCHAION
1D03C BYZANTINE MUSICAL SYMBOL RAPISMA
1D03D BYZANTINE MUSICAL SYMBOL PARAKALESMA ARCHAION
1D03E BYZANTINE MUSICAL SYMBOL PARAKLITIKI ARCHAION
1D03F BYZANTINE MUSICAL SYMBOL ICHADIN
1D040 BYZANTINE MUSICAL SYMBOL NANA
1D041 BYZANTINE MUSICAL SYMBOL PETASMA
1D042 BYZANTINE MUSICAL SYMBOL KONTEVMA ALLO
1D043 BYZANTINE MUSICAL SYMBOL TROMIKON ALLO
1D044 BYZANTINE MUSICAL SYMBOL STRAGGISMATA
1D045 BYZANTINE MUSICAL SYMBOL GRONTHISMATA

Fonitika (Vocals)

1D046 BYZANTINE MUSICAL SYMBOL ISON NEO
1D047 BYZANTINE MUSICAL SYMBOL OLIGON NEO
1D048 BYZANTINE MUSICAL SYMBOL OXEIA NEO
1D049 BYZANTINE MUSICAL SYMBOL PETASTI
1D04A BYZANTINE MUSICAL SYMBOL KOUFISMA
1D04B BYZANTINE MUSICAL SYMBOL PETASTOKOUFISMA
1D04C BYZANTINE MUSICAL SYMBOL KRATIMOKOUFISMA
1D04D BYZANTINE MUSICAL SYMBOL PELASTON NEO
1D04E BYZANTINE MUSICAL SYMBOL KENTIMATA NEO ANO
1D04F BYZANTINE MUSICAL SYMBOL KENTIMA NEO ANO
1D050 BYZANTINE MUSICAL SYMBOL YPSILI
1D051 BYZANTINE MUSICAL SYMBOL APOSTROFOS NEO
1D052 BYZANTINE MUSICAL SYMBOL APOSTROFOI SYNDESMOS NEO
1D053 BYZANTINE MUSICAL SYMBOL YPORROI
1D054 BYZANTINE MUSICAL SYMBOL KRATIMOYPORROON
1D055 BYZANTINE MUSICAL SYMBOL ELAFRON
1D056 BYZANTINE MUSICAL SYMBOL CHAMILI

Afona or Ypostaseis (Mutes or Hypostases)

1D057 BYZANTINE MUSICAL SYMBOL MIKRON ISON
1D058 BYZANTINE MUSICAL SYMBOL VAREIA NEO
1D059 BYZANTINE MUSICAL SYMBOL PIASMA NEO


Musical Symbols
Range: 1D100–1D1FF

Musical Symbols, 1D100–1D1FF

[Code chart: glyph grid not reproduced. Code points shown: 1D100–1D126 and 1D12A–1D1DD.]

Ancient Greek Musical Notation
Range: 1D200–1D24F

Ancient Greek Musical Notation
Range: 1D200–1D24F

[Code chart: columns 1D20–1D24, rows 0–F; characters are assigned at 1D200–1D245.]
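A code chart addresses each character by a column header (the code point without its last hexadecimal digit, such as 1D20) and a row digit (0 to F). The following is a minimal Python sketch of that arithmetic, not part of the standard; the function name `chart_cell` is hypothetical and introduced only for illustration.

```python
def chart_cell(column: str, row: str) -> int:
    """Combine a chart column header and a row digit into a code point.

    `column` is the hexadecimal column header (for example "1D20") and
    `row` is a single hexadecimal row digit ("0" through "F").
    """
    if len(row) != 1:
        raise ValueError("row must be a single hexadecimal digit")
    # Appending the row digit to the column header is the same as
    # multiplying the column value by 16 and adding the row value.
    return int(column, 16) * 16 + int(row, 16)


if __name__ == "__main__":
    # First and last assigned cells of the Ancient Greek Musical Notation chart.
    print(f"{chart_cell('1D20', '0'):04X}")
    print(f"{chart_cell('1D24', '5'):04X}")
```

For the chart above, column 1D20 with row 0 gives 1D200, and column 1D24 with row 5 gives 1D245.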

The Unicode Standard 5.0, Copyright © 1991-2006 Unicode, Inc. All rights reserved.


Tai Xuan Jing Symbols
Range: 1D300–1D35F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0.


Counting Rod Numerals
Range: 1D360–1D37F

Mathematical Alphanumeric Symbols
Range: 1D400–1D7FF

[Code chart: Mathematical Alphanumeric Symbols, 1D400–1D4FF; columns 1D40–1D4F, rows 0–F.]

[Code chart: Mathematical Alphanumeric Symbols, 1D500–1D5FF; columns 1D50–1D5F, rows 0–F.]

[Code chart: Mathematical Alphanumeric Symbols, 1D600–1D6FF; columns 1D60–1D6F, rows 0–F.]

[Code chart: Mathematical Alphanumeric Symbols, 1D700–1D7FF; columns 1D70–1D7F, rows 0–F.]
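Each entry in the names list for this block pairs a styled mathematical symbol with the ordinary letter it corresponds to (for example 1D400 with 0041). The Unicode Character Database records that pairing as a compatibility decomposition, so it can be checked programmatically. The following is a minimal Python sketch using the standard `unicodedata` module; the helper name `describe` is hypothetical, and the data it returns comes from the Unicode version bundled with the Python interpreter rather than from Version 5.0 specifically.

```python
import unicodedata
from typing import Tuple


def describe(code_point: int) -> Tuple[str, str]:
    """Return (character name, NFKC-normalized form) for a code point."""
    ch = chr(code_point)
    # unicodedata.name raises ValueError for unassigned code points.
    name = unicodedata.name(ch)
    # NFKC applies compatibility decompositions, which fold the styled
    # mathematical letters back to plain Latin letters.
    plain = unicodedata.normalize("NFKC", ch)
    return name, plain


if __name__ == "__main__":
    for cp in (0x1D400, 0x1D41E, 0x1D434):
        name, plain = describe(cp)
        print(f"{cp:04X} {name} -> {plain}")
```

Because normalization discards the style distinction, it suits searching or plain-text fallback, not contexts where the bold or italic form carries mathematical meaning.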

Mathematical Alphanumeric Symbols

To be used for mathematical variables where style variations are important semantically. For general text, use standard Latin and Greek letters with markup.

Bold symbols

1D400 𝐀 MATHEMATICAL BOLD CAPITAL A
      0041 A latin capital letter a
1D401 𝐁 MATHEMATICAL BOLD CAPITAL B
      0042 B latin capital letter b
1D402 𝐂 MATHEMATICAL BOLD CAPITAL C
      0043 C latin capital letter c
1D403 𝐃 MATHEMATICAL BOLD CAPITAL D
      0044 D latin capital letter d
1D404 𝐄 MATHEMATICAL BOLD CAPITAL E
      0045 E latin capital letter e
1D405 𝐅 MATHEMATICAL BOLD CAPITAL F
      0046 F latin capital letter f
1D406 𝐆 MATHEMATICAL BOLD CAPITAL G
      0047 G latin capital letter g
1D407 𝐇 MATHEMATICAL BOLD CAPITAL H
      0048 H latin capital letter h
1D408 𝐈 MATHEMATICAL BOLD CAPITAL I
      0049 I latin capital letter i
1D409 𝐉 MATHEMATICAL BOLD CAPITAL J
      004A J latin capital letter j
1D40A 𝐊 MATHEMATICAL BOLD CAPITAL K
      004B K latin capital letter k
1D40B 𝐋 MATHEMATICAL BOLD CAPITAL L
      004C L latin capital letter l
1D40C 𝐌 MATHEMATICAL BOLD CAPITAL M
      004D M latin capital letter m
1D40D 𝐍 MATHEMATICAL BOLD CAPITAL N
      004E N latin capital letter n
1D40E 𝐎 MATHEMATICAL BOLD CAPITAL O
      004F O latin capital letter o
1D40F 𝐏 MATHEMATICAL BOLD CAPITAL P
      0050 P latin capital letter p
1D410 𝐐 MATHEMATICAL BOLD CAPITAL Q
      0051 Q latin capital letter q
1D411 𝐑 MATHEMATICAL BOLD CAPITAL R
      0052 R latin capital letter r
1D412 𝐒 MATHEMATICAL BOLD CAPITAL S
      0053 S latin capital letter s
1D413 𝐓 MATHEMATICAL BOLD CAPITAL T
      0054 T latin capital letter t
1D414 𝐔 MATHEMATICAL BOLD CAPITAL U
      0055 U latin capital letter u
1D415 𝐕 MATHEMATICAL BOLD CAPITAL V
      0056 V latin capital letter v
1D416 𝐖 MATHEMATICAL BOLD CAPITAL W
      0057 W latin capital letter w
1D417 𝐗 MATHEMATICAL BOLD CAPITAL X
      0058 X latin capital letter x
1D418 𝐘 MATHEMATICAL BOLD CAPITAL Y
      0059 Y latin capital letter y
1D419 𝐙 MATHEMATICAL BOLD CAPITAL Z
      005A Z latin capital letter z
1D41A 𝐚 MATHEMATICAL BOLD SMALL A
      0061 a latin small letter a
1D41B 𝐛 MATHEMATICAL BOLD SMALL B
      0062 b latin small letter b
1D41C 𝐜 MATHEMATICAL BOLD SMALL C
      0063 c latin small letter c
1D41D 𝐝 MATHEMATICAL BOLD SMALL D
      0064 d latin small letter d
1D41E 𝐞 MATHEMATICAL BOLD SMALL E
      0065 e latin small letter e
1D41F 𝐟 MATHEMATICAL BOLD SMALL F
      0066 f latin small letter f
1D420 𝐠 MATHEMATICAL BOLD SMALL G
      0067 g latin small letter g
1D421 𝐡 MATHEMATICAL BOLD SMALL H
      0068 h latin small letter h
1D422 𝐢 MATHEMATICAL BOLD SMALL I
      0069 i latin small letter i
1D423 𝐣 MATHEMATICAL BOLD SMALL J
      006A j latin small letter j
1D424 𝐤 MATHEMATICAL BOLD SMALL K
      006B k latin small letter k
1D425 𝐥 MATHEMATICAL BOLD SMALL L
      006C l latin small letter l
1D426 𝐦 MATHEMATICAL BOLD SMALL M
      006D m latin small letter m
1D427 𝐧 MATHEMATICAL BOLD SMALL N
      006E n latin small letter n
1D428 𝐨 MATHEMATICAL BOLD SMALL O
      006F o latin small letter o
1D429 𝐩 MATHEMATICAL BOLD SMALL P
      0070 p latin small letter p
1D42A 𝐪 MATHEMATICAL BOLD SMALL Q
      0071 q latin small letter q
1D42B 𝐫 MATHEMATICAL BOLD SMALL R
      0072 r latin small letter r
1D42C 𝐬 MATHEMATICAL BOLD SMALL S
      0073 s latin small letter s
1D42D 𝐭 MATHEMATICAL BOLD SMALL T
      0074 t latin small letter t
1D42E 𝐮 MATHEMATICAL BOLD SMALL U
      0075 u latin small letter u
1D42F 𝐯 MATHEMATICAL BOLD SMALL V
      0076 v latin small letter v
1D430 𝐰 MATHEMATICAL BOLD SMALL W
      0077 w latin small letter w
1D431 𝐱 MATHEMATICAL BOLD SMALL X
      0078 x latin small letter x
1D432 𝐲 MATHEMATICAL BOLD SMALL Y
      0079 y latin small letter y
1D433 𝐳 MATHEMATICAL BOLD SMALL Z
      007A z latin small letter z

Italic symbols

Several italic symbols have been previously coded in the Letterlike Symbols block and are retained there to ensure unambiguous representation.

1D434 𝐴 MATHEMATICAL ITALIC CAPITAL A
      0041 A latin capital letter a
1D435 𝐵 MATHEMATICAL ITALIC CAPITAL B
      0042 B latin capital letter b
1D436 𝐶 MATHEMATICAL ITALIC CAPITAL C
      0043 C latin capital letter c
1D437 𝐷 MATHEMATICAL ITALIC CAPITAL D
      0044 D latin capital letter d
1D438 𝐸 MATHEMATICAL ITALIC CAPITAL E
      0045 E latin capital letter e
1D439 𝐹 MATHEMATICAL ITALIC CAPITAL F
      0046 F latin capital letter f
1D43A 𝐺 MATHEMATICAL ITALIC CAPITAL G
      0047 G latin capital letter g
1D43B 𝐻 MATHEMATICAL ITALIC CAPITAL H
      0048 H latin capital letter h

1D43C 𝐼 MATHEMATICAL ITALIC CAPITAL I
      0049 I latin capital letter i
1D43D 𝐽 MATHEMATICAL ITALIC CAPITAL J
      004A J latin capital letter j
1D43E 𝐾 MATHEMATICAL ITALIC CAPITAL K
      004B K latin capital letter k
1D43F 𝐿 MATHEMATICAL ITALIC CAPITAL L
      004C L latin capital letter l
1D440 𝑀 MATHEMATICAL ITALIC CAPITAL M
      004D M latin capital letter m
1D441 𝑁 MATHEMATICAL ITALIC CAPITAL N
      004E N latin capital letter n
1D442 𝑂 MATHEMATICAL ITALIC CAPITAL O
      004F O latin capital letter o
1D443 𝑃 MATHEMATICAL ITALIC CAPITAL P
      0050 P latin capital letter p
1D444 𝑄 MATHEMATICAL ITALIC CAPITAL Q
      0051 Q latin capital letter q
1D445 𝑅 MATHEMATICAL ITALIC CAPITAL R
      0052 R latin capital letter r
1D446 𝑆 MATHEMATICAL ITALIC CAPITAL S
      0053 S latin capital letter s
1D447 𝑇 MATHEMATICAL ITALIC CAPITAL T
      0054 T latin capital letter t
1D448 𝑈 MATHEMATICAL ITALIC CAPITAL U
      0055 U latin capital letter u
1D449 𝑉 MATHEMATICAL ITALIC CAPITAL V
      0056 V latin capital letter v
1D44A 𝑊 MATHEMATICAL ITALIC CAPITAL W
      0057 W latin capital letter w
1D44B 𝑋 MATHEMATICAL ITALIC CAPITAL X
      0058 X latin capital letter x
1D44C 𝑌 MATHEMATICAL ITALIC CAPITAL Y
      0059 Y latin capital letter y
1D44D 𝑍 MATHEMATICAL ITALIC CAPITAL Z
      005A Z latin capital letter z
1D44E 𝑎 MATHEMATICAL ITALIC SMALL A
      0061 a latin small letter a
1D44F 𝑏 MATHEMATICAL ITALIC SMALL B
      0062 b latin small letter b
1D450 𝑐 MATHEMATICAL ITALIC SMALL C
      0063 c latin small letter c
1D451 𝑑 MATHEMATICAL ITALIC SMALL D
      0064 d latin small letter d
1D452 𝑒 MATHEMATICAL ITALIC SMALL E
      0065 e latin small letter e
1D453 𝑓 MATHEMATICAL ITALIC SMALL F