# Khmer character tables #

This document lists the per-character shaping information needed to
[shape Khmer text](../opentype-shaping-khmer.md).

**Contents**

  - [Khmer character table](#khmer-character-table)
  - [Khmer Symbols character table](#khmer-symbols-character-table)
  - [Miscellaneous character table](#miscellaneous-character-table)


## Khmer character table ##

Khmer glyphs should be classified as in the following
table. Codepoints in the Khmer block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.

Assigned codepoints with a _null_ in the _Shaping class_ column evoke
no special behavior from the shaping engine. Note that this does
include some valid codepoints, such as currency marks, punctuation,
and other symbols.

> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.

The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.

Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.

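As a rough illustration of how the _Unicode category_ labels (including the [Mn] and [Mc] tags) relate to the General Category values in the Unicode Character Database, the sketch below derives a table-style label with Python's standard `unicodedata` module. The function name `unicode_category_label` and the label mapping are assumptions made for this example; they are not part of any shaping engine, and the results depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Coarse labels keyed by the first letter of the General Category value.
# The label strings mirror the "Unicode category" column; the mapping
# itself is an assumption of this sketch.
_COARSE_LABELS = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def unicode_category_label(codepoint: int) -> str:
    """Return a table-style label for a codepoint's General Category."""
    gc = unicodedata.category(chr(codepoint))
    if gc == "Cn":
        # Not assigned in the Unicode data shipped with this Python build.
        return "unassigned"
    label = _COARSE_LABELS[gc[0]]
    if label == "Mark":
        # Keep the two-letter value so that non-spacing (Mn) and
        # spacing-combining (Mc) marks remain distinguishable.
        return f"Mark [{gc}]"
    return label
```

For example, `unicode_category_label(0x17B6)` yields `Mark [Mc]` and `unicode_category_label(0x17B7)` yields `Mark [Mn]`. The _Shaping class_ is not derivable this way and still has to come from a script-specific table.
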
:::{table} Khmer character table

| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1780`   | Letter           | CONSONANT         | _null_                     | ក Ka                         |
|`U+1781`   | Letter           | CONSONANT         | _null_                     | ខ Kha                        |
|`U+1782`   | Letter           | CONSONANT         | _null_                     | គ Ko                         |
|`U+1783`   | Letter           | CONSONANT         | _null_                     | ឃ Kho                        |
|`U+1784`   | Letter           | CONSONANT         | _null_                     | ង Ngo                        |
|`U+1785`   | Letter           | CONSONANT         | _null_                     | ច Ca                         |
|`U+1786`   | Letter           | CONSONANT         | _null_                     | ឆ Cha                        |
|`U+1787`   | Letter           | CONSONANT         | _null_                     | ជ Co                         |
|`U+1788`   | Letter           | CONSONANT         | _null_                     | ឈ Cho                        |
|`U+1789`   | Letter           | CONSONANT         | _null_                     | ញ Nyo                        |
|`U+178A`   | Letter           | CONSONANT         | _null_                     | ដ Da                         |
|`U+178B`   | Letter           | CONSONANT         | _null_                     | ឋ Ttha                       |
|`U+178C`   | Letter           | CONSONANT         | _null_                     | ឌ Do                         |
|`U+178D`   | Letter           | CONSONANT         | _null_                     | ឍ Ttho                       |
|`U+178E`   | Letter           | CONSONANT         | _null_                     | ណ Nno                        |
|`U+178F`   | Letter           | CONSONANT         | _null_                     | ត Ta                         |
|           |                  |                   |                            |                              |
|`U+1790`   | Letter           | CONSONANT         | _null_                     | ថ Tha                        |
|`U+1791`   | Letter           | CONSONANT         | _null_                     | ទ To                         |
|`U+1792`   | Letter           | CONSONANT         | _null_                     | ធ Tho                        |
|`U+1793`   | Letter           | CONSONANT         | _null_                     | ន No                         |
|`U+1794`   | Letter           | CONSONANT         | _null_                     | ប Ba                         |
|`U+1795`   | Letter           | CONSONANT         | _null_                     | ផ Pha                        |
|`U+1796`   | Letter           | CONSONANT         | _null_                     | ព Po                         |
|`U+1797`   | Letter           | CONSONANT         | _null_                     | ភ Pho                        |
|`U+1798`   | Letter           | CONSONANT         | _null_                     | ម Mo                         |
|`U+1799`   | Letter           | CONSONANT         | _null_                     | យ Yo                         |
|`U+179A`   | Letter           | CONSONANT         | _null_                     | រ Ro                         |
|`U+179B`   | Letter           | CONSONANT         | _null_                     | ល Lo                         |
|`U+179C`   | Letter           | CONSONANT         | _null_                     | វ Vo                         |
|`U+179D`   | Letter           | CONSONANT         | _null_                     | ឝ Sha                        |
|`U+179E`   | Letter           | CONSONANT         | _null_                     | ឞ Sso                        |
|`U+179F`   | Letter           | CONSONANT         | _null_                     | ស Sa                         |
|           |                  |                   |                            |                              |
|`U+17A0`   | Letter           | CONSONANT         | _null_                     | ហ Ha                         |
|`U+17A1`   | Letter           | CONSONANT         | _null_                     | ឡ La                         |
|`U+17A2`   | Letter           | CONSONANT         | _null_                     | អ Qa                         |
|`U+17A3`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឣ Qaq                        |
|`U+17A4`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឤ Qaa                        |
|`U+17A5`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឥ Qi                         |
|`U+17A6`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឦ Qii                        |
|`U+17A7`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឧ Qu                         |
|`U+17A8`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឨ Quk                        |
|`U+17A9`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឩ Quu                        |
|`U+17AA`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឪ Quuv                       |
|`U+17AB`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឫ Ry                         |
|`U+17AC`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឬ Ryy                        |
|`U+17AD`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឭ Ly                         |
|`U+17AE`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឮ Lyy                        |
|`U+17AF`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឯ Qe                         |
|           |                  |                   |                            |                              |
|`U+17B0`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឰ Qai                        |
|`U+17B1`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឱ Qoo Type One               |
|`U+17B2`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឲ Qoo Type Two               |
|`U+17B3`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ឳ Qau                        |
|`U+17B4`   | Mark [Mn]        | _null_            | _null_                     | ឴ Inherent Aq                 |
|`U+17B5`   | Mark [Mn]        | _null_            | _null_                     | ឵ Inherent Aa                 |
|`U+17B6`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | ា Sign Aa                    |
|`U+17B7`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | ិ Sign I                     |
|`U+17B8`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | ី Sign Ii                    |
|`U+17B9`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | ឹ Sign Y                     |
|`U+17BA`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | ឺ Sign Yy                    |
|`U+17BB`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ុ Sign U                     |
|`U+17BC`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ូ Sign Uu                    |
|`U+17BD`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ួ Sign Ua                    |
|`U+17BE`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_LEFT_POSITION      | ើ Sign Oe                    |
|`U+17BF`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_LEFT_AND_RIGHT_POSITION| ឿ Sign Ya                    |
|           |                  |                   |                            |                              |
|`U+17C0`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | ៀ Sign Ie                    |
|`U+17C1`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | េ Sign E                     |
|`U+17C2`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | ែ Sign Ae                    |
|`U+17C3`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | ៃ Sign Ai                    |
|`U+17C4`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | ោ Sign Oo                    |
|`U+17C5`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | ៅ Sign Au                    |
|`U+17C6`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | ំ Nikahit                    |
|`U+17C7`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | ះ Reahmuk                    |
|`U+17C8`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | ៈ Yuukaleapintu              |
|`U+17C9`   | Mark [Mn]        | REGISTER_SHIFTER  | TOP_POSITION               | ៉ Muusikatoan                |
|`U+17CA`   | Mark [Mn]        | REGISTER_SHIFTER  | TOP_POSITION               | ៊ Triisap                    |
|`U+17CB`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | ់ Bantoc                     |
|`U+17CC`   | Mark [Mn]        | CONSONANT_POST_REPHA| TOP_POSITION             | ៌ Robat                      |
|`U+17CD`   | Mark [Mn]        | CONSONANT_KILLER  | TOP_POSITION               | ៍ Toandakhiat                |
|`U+17CE`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | ៎ Kakabat                    |
|`U+17CF`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | ៏ Ahsda                      |
|           |                  |                   |                            |                              |
|`U+17D0`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | ័ Samyok Sannya              |
|`U+17D1`   | Mark [Mn]        | PURE_KILLER       | TOP_POSITION               | ៑ Viriam                     |
|`U+17D2`   | Mark [Mn]        | INVISIBLE_STACKER | _null_                     | ្ Sign Coeng                 |
|`U+17D3`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | ៓ Bathamasat                 |
|`U+17D4`   | Punctuation      | _null_            | _null_                     | ។ Khan                       |
|`U+17D5`   | Punctuation      | _null_            | _null_                     | ៕ Bariyoosan                 |
|`U+17D6`   | Punctuation      | _null_            | _null_                     | ៖ Camnuc Pii Kuuh            |
|`U+17D7`   | Letter           | _null_            | _null_                     | ៗ Lek Too                    |
|`U+17D8`   | Punctuation      | _null_            | _null_                     | ៘ Beyyal                     |
|`U+17D9`   | Punctuation      | _null_            | _null_                     | ៙ Phnaek Muan                |
|`U+17DA`   | Punctuation      | _null_            | _null_                     | ៚ Koomuut                    |
|`U+17DB`   | Symbol           | SYMBOL            | _null_                     | ៛ Riel                       |
|`U+17DC`   | Letter           | AVAGRAHA          | _null_                     | ៜ Avakrahasanya              |
|`U+17DD`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | ៝ Atthacan                   |
|`U+17DE`   | _unassigned_     |                   |                            |                              |
|`U+17DF`   | _unassigned_     |                   |                            |                              |
|           |                  |                   |                            |                              |
|`U+17E0`   | Number           | NUMBER            | _null_                     | ០ Digit Zero                 |
|`U+17E1`   | Number           | NUMBER            | _null_                     | ១ Digit One                  |
|`U+17E2`   | Number           | NUMBER            | _null_                     | ២ Digit Two                  |
|`U+17E3`   | Number           | NUMBER            | _null_                     | ៣ Digit Three                |
|`U+17E4`   | Number           | NUMBER            | _null_                     | ៤ Digit Four                 |
|`U+17E5`   | Number           | NUMBER            | _null_                     | ៥ Digit Five                 |
|`U+17E6`   | Number           | NUMBER            | _null_                     | ៦ Digit Six                  |
|`U+17E7`   | Number           | NUMBER            | _null_                     | ៧ Digit Seven                |
|`U+17E8`   | Number           | NUMBER            | _null_                     | ៨ Digit Eight                |
|`U+17E9`   | Number           | NUMBER            | _null_                     | ៩ Digit Nine                 |
|`U+17EA`   | _unassigned_     |                   |                            |                              |
|`U+17EB`   | _unassigned_     |                   |                            |                              |
|`U+17EC`   | _unassigned_     |                   |                            |                              |
|`U+17ED`   | _unassigned_     |                   |                            |                              |
|`U+17EE`   | _unassigned_     |                   |                            |                              |
|`U+17EF`   | _unassigned_     |                   |                            |                              |
|           |                  |                   |                            |                              |
|`U+17F0`   | Number           | _null_            | _null_                     | ៰ Lek Attak Son              |
|`U+17F1`   | Number           | _null_            | _null_                     | ៱ Lek Attak Muoy             |
|`U+17F2`   | Number           | _null_            | _null_                     | ៲ Lek Attak Pii              |
|`U+17F3`   | Number           | _null_            | _null_                     | ៳ Lek Attak Bei              |
|`U+17F4`   | Number           | _null_            | _null_                     | ៴ Lek Attak Buon             |
|`U+17F5`   | Number           | _null_            | _null_                     | ៵ Lek Attak Pram             |
|`U+17F6`   | Number           | _null_            | _null_                     | ៶ Lek Attak Pram-Muoy        |
|`U+17F7`   | Number           | _null_            | _null_                     | ៷ Lek Attak Pram-Pii         |
|`U+17F8`   | Number           | _null_            | _null_                     | ៸ Lek Attak Pram-Bei         |
|`U+17F9`   | Number           | _null_            | _null_                     | ៹ Lek Attak Pram-Buon        |
|`U+17FA`   | _unassigned_     |                   |                            |                              |
|`U+17FB`   | _unassigned_     |                   |                            |                              |
|`U+17FC`   | _unassigned_     |                   |                            |                              |
|`U+17FD`   | _unassigned_     |                   |                            |                              |
|`U+17FE`   | _unassigned_     |                   |                            |                              |
|`U+17FF`   | _unassigned_     |                   |                            |                              |
:::


## Khmer Symbols character table ##

The Khmer Symbols block contains miscellaneous symbols used for
lunar-date calendars. None evoke any special behavior from the shaping
engine.

:::{table} Khmer Symbols character table

| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+19E0`   | Symbol           | _null_            | _null_                     | ᧠ Pathamasat                 |
|`U+19E1`   | Symbol           | _null_            | _null_                     | ᧡ Muoy Koet                  |
|`U+19E2`   | Symbol           | _null_            | _null_                     | ᧢ Pii Koet                   |
|`U+19E3`   | Symbol           | _null_            | _null_                     | ᧣ Bei Koet                   |
|`U+19E4`   | Symbol           | _null_            | _null_                     | ᧤ Buon Koet                  |
|`U+19E5`   | Symbol           | _null_            | _null_                     | ᧥ Pram Koet                  |
|`U+19E6`   | Symbol           | _null_            | _null_                     | ᧦ Pram-Muoy Koet             |
|`U+19E7`   | Symbol           | _null_            | _null_                     | ᧧ Pram-Pii Koet              |
|`U+19E8`   | Symbol           | _null_            | _null_                     | ᧨ Pram-Bei Koet              |
|`U+19E9`   | Symbol           | _null_            | _null_                     | ᧩ Pram-Buon Koet             |
|`U+19EA`   | Symbol           | _null_            | _null_                     | ᧪ Dap Koet                   |
|`U+19EB`   | Symbol           | _null_            | _null_                     | ᧫ Dap-Muoy Koet              |
|`U+19EC`   | Symbol           | _null_            | _null_                     | ᧬ Dap-Pii Koet               |
|`U+19ED`   | Symbol           | _null_            | _null_                     | ᧭ Dap-Bei Koet               |
|`U+19EE`   | Symbol           | _null_            | _null_                     | ᧮ Dap-Buon Koet              |
|`U+19EF`   | Symbol           | _null_            | _null_                     | ᧯ Dap-Pram Koet              |
|           |                  |                   |                            |                              |
|`U+19F0`   | Symbol           | _null_            | _null_                     | ᧰ Tuteyasat                  |
|`U+19F1`   | Symbol           | _null_            | _null_                     | ᧱ Muoy Roc                   |
|`U+19F2`   | Symbol           | _null_            | _null_                     | ᧲ Pii Roc                    |
|`U+19F3`   | Symbol           | _null_            | _null_                     | ᧳ Bei Roc                    |
|`U+19F4`   | Symbol           | _null_            | _null_                     | ᧴ Buon Roc                   |
|`U+19F5`   | Symbol           | _null_            | _null_                     | ᧵ Pram Roc                   |
|`U+19F6`   | Symbol           | _null_            | _null_                     | ᧶ Pram-Muoy Roc              |
|`U+19F7`   | Symbol           | _null_            | _null_                     | ᧷ Pram-Pii Roc               |
|`U+19F8`   | Symbol           | _null_            | _null_                     | ᧸ Pram-Bei Roc               |
|`U+19F9`   | Symbol           | _null_            | _null_                     | ᧹ Pram-Buon Roc              |
|`U+19FA`   | Symbol           | _null_            | _null_                     | ᧺ Dap Roc                    |
|`U+19FB`   | Symbol           | _null_            | _null_                     | ᧻ Dap-Muoy Roc               |
|`U+19FC`   | Symbol           | _null_            | _null_                     | ᧼ Dap-Pii Roc                |
|`U+19FD`   | Symbol           | _null_            | _null_                     | ᧽ Dap-Bei Roc                |
|`U+19FE`   | Symbol           | _null_            | _null_                     | ᧾ Dap-Buon Roc               |
|`U+19FF`   | Symbol           | _null_            | _null_                     | ᧿ Dap-Pram Roc               |
:::

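The _Shaping class_ column of the Khmer and Khmer Symbols tables can be condensed into a small range lookup. The following is a minimal sketch, not a reference implementation: the names `KHMER_SHAPING_CLASSES` and `khmer_shaping_class` are hypothetical, and `None` is used here to stand in for both _null_ and _unassigned_ entries.

```python
from typing import Optional

# (first, last, shaping class) ranges transcribed from the Khmer
# character table. Codepoints not covered (null or unassigned rows,
# and the whole Khmer Symbols block) fall through to None.
KHMER_SHAPING_CLASSES = [
    (0x1780, 0x17A2, "CONSONANT"),
    (0x17A3, 0x17B3, "VOWEL_INDEPENDENT"),
    (0x17B6, 0x17C5, "VOWEL_DEPENDENT"),
    (0x17C6, 0x17C6, "NUKTA"),
    (0x17C7, 0x17C7, "VISARGA"),
    (0x17C8, 0x17C8, "VOWEL_DEPENDENT"),
    (0x17C9, 0x17CA, "REGISTER_SHIFTER"),
    (0x17CB, 0x17CB, "SYLLABLE_MODIFIER"),
    (0x17CC, 0x17CC, "CONSONANT_POST_REPHA"),
    (0x17CD, 0x17CD, "CONSONANT_KILLER"),
    (0x17CE, 0x17D0, "SYLLABLE_MODIFIER"),
    (0x17D1, 0x17D1, "PURE_KILLER"),
    (0x17D2, 0x17D2, "INVISIBLE_STACKER"),
    (0x17D3, 0x17D3, "SYLLABLE_MODIFIER"),
    (0x17DB, 0x17DB, "SYMBOL"),
    (0x17DC, 0x17DC, "AVAGRAHA"),
    (0x17DD, 0x17DD, "SYLLABLE_MODIFIER"),
    (0x17E0, 0x17E9, "NUMBER"),
]


def khmer_shaping_class(codepoint: int) -> Optional[str]:
    """Return the shaping class for a codepoint, or None if the table
    lists it as null, unassigned, or does not list it at all."""
    for first, last, shaping_class in KHMER_SHAPING_CLASSES:
        if first <= codepoint <= last:
            return shaping_class
    return None
```

A linear scan keeps the sketch short; an engine classifying long runs would more likely precompute a per-codepoint array for the block.
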

## Miscellaneous character table ##

Other important characters that may be encountered when shaping runs
of Khmer text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).

The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or
dashes, in a similar placeholder fashion; shaping engines should cope
with this situation gracefully.

:::{table} Miscellaneous character table

| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     |   No-break space               |
|`U+200C`   | Other            | NON_JOINER        | _null_                     | ‌ Zero-width non-joiner         |
|`U+200D`   | Other            | JOINER            | _null_                     | ‍ Zero-width joiner             |
|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | ‐ Hyphen                       |
|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | ‑ No-break hyphen              |
|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | ‒ Figure dash                  |
|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | – En dash                      |
|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | — Em dash                      |
|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | ◌ Dotted circle                |
:::

The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The sequence
"_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of a
conjunct between the two consonants.

Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead.
The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce
the first consonant in its standard form, followed by an explicit
"Halant".

A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.

The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or
"NBSP,_matra_".

In addition to general punctuation, runs of Khmer text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block.
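
The joiner and no-break-space rules described in this section can be summarized in a short sketch. Everything named here (`JoinerEffect`, `joiner_effect`, `ISOLATED_DISPLAY_PATTERNS`, `is_isolated_display_sequence`, and the upper-case class tokens) is hypothetical and chosen for illustration. The sketch works on already-classified tokens rather than raw codepoints, so it makes no claim about which codepoints fill the "Halant", "mark", or "matra" roles. It also assumes that a sequence without any joiner permits both the conjunct and the half form.

```python
from typing import NamedTuple, Optional, Sequence

ZWJ = "\u200D"   # zero-width joiner
ZWNJ = "\u200C"  # zero-width non-joiner


class JoinerEffect(NamedTuple):
    conjunct_allowed: bool   # may Consonant,Halant,Consonant form a conjunct?
    half_form_allowed: bool  # may Consonant,Halant trigger a half form?


def joiner_effect(joiner: Optional[str]) -> JoinerEffect:
    """Effect of the character between Halant and the second consonant."""
    if joiner is None:
        # No joiner: nothing is blocked (assumption of this sketch).
        return JoinerEffect(conjunct_allowed=True, half_form_allowed=True)
    if joiner == ZWJ:
        # ZWJ blocks the conjunct but may still allow a half form.
        return JoinerEffect(conjunct_allowed=False, half_form_allowed=True)
    if joiner == ZWNJ:
        # ZWNJ blocks both, leaving an explicit Halant.
        return JoinerEffect(conjunct_allowed=False, half_form_allowed=False)
    raise ValueError(f"not a joiner: {joiner!r}")


# Token patterns for NBSP-based isolated display, as listed in the text.
ISOLATED_DISPLAY_PATTERNS = (
    ("NBSP", "ZWJ", "HALANT", "CONSONANT"),
    ("NBSP", "MARK"),
    ("NBSP", "MATRA"),
)


def is_isolated_display_sequence(tokens: Sequence[str]) -> bool:
    """True if the classified tokens match one of the NBSP patterns."""
    return tuple(tokens) in ISOLATED_DISPLAY_PATTERNS
```

For instance, `joiner_effect(ZWNJ)` reports that neither the conjunct nor the half form is allowed, while `is_isolated_display_sequence(["NBSP", "MATRA"])` is true.
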