# Khmer character tables #
This document lists the per-character shaping information needed to
[shape Khmer text](../opentype-shaping-khmer.md).
**Contents**
- [Khmer character table](#khmer-character-table)
- [Khmer Symbols character table](#khmer-symbols-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Khmer character table ##
Khmer glyphs should be classified as in the following
table. Codepoints in the Khmer block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this group includes some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates the mark-placement
position for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
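
The [Mn] and [Mc] tags correspond to Unicode General Category values, so they can be cross-checked against a Unicode database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `mark_spacing` is an assumption made for illustration and is not part of any shaping engine.

```python
import unicodedata
from typing import Optional


def mark_spacing(ch: str) -> Optional[str]:
    """Map a character's General Category to the wording used in this document.

    Hypothetical helper: returns None for anything that is not Mn or Mc.
    """
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return None


print(mark_spacing("\u17B6"))  # Sign Aa, tagged [Mc] in the table
print(mark_spacing("\u17B7"))  # Sign I, tagged [Mn] in the table
print(mark_spacing("\u1780"))  # Ka, a Letter, so not a mark
```

The result reflects the Unicode version bundled with the Python interpreter in use, which may differ from the version a given font or shaping engine targets.
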
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Khmer character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1780` | Letter | CONSONANT | _null_ | ក Ka |
|`U+1781` | Letter | CONSONANT | _null_ | ខ Kha |
|`U+1782` | Letter | CONSONANT | _null_ | គ Ko |
|`U+1783` | Letter | CONSONANT | _null_ | ឃ Kho |
|`U+1784` | Letter | CONSONANT | _null_ | ង Ngo |
|`U+1785` | Letter | CONSONANT | _null_ | ច Ca |
|`U+1786` | Letter | CONSONANT | _null_ | ឆ Cha |
|`U+1787` | Letter | CONSONANT | _null_ | ជ Co |
|`U+1788` | Letter | CONSONANT | _null_ | ឈ Cho |
|`U+1789` | Letter | CONSONANT | _null_ | ញ Nyo |
|`U+178A` | Letter | CONSONANT | _null_ | ដ Da |
|`U+178B` | Letter | CONSONANT | _null_ | ឋ Ttha |
|`U+178C` | Letter | CONSONANT | _null_ | ឌ Do |
|`U+178D` | Letter | CONSONANT | _null_ | ឍ Ttho |
|`U+178E` | Letter | CONSONANT | _null_ | ណ Nno |
|`U+178F` | Letter | CONSONANT | _null_ | ត Ta |
| | | | |
|`U+1790` | Letter | CONSONANT | _null_ | ថ Tha |
|`U+1791` | Letter | CONSONANT | _null_ | ទ To |
|`U+1792` | Letter | CONSONANT | _null_ | ធ Tho |
|`U+1793` | Letter | CONSONANT | _null_ | ន No |
|`U+1794` | Letter | CONSONANT | _null_ | ប Ba |
|`U+1795` | Letter | CONSONANT | _null_ | ផ Pha |
|`U+1796` | Letter | CONSONANT | _null_ | ព Po |
|`U+1797` | Letter | CONSONANT | _null_ | ភ Pho |
|`U+1798` | Letter | CONSONANT | _null_ | ម Mo |
|`U+1799` | Letter | CONSONANT | _null_ | យ Yo |
|`U+179A` | Letter | CONSONANT | _null_ | រ Ro |
|`U+179B` | Letter | CONSONANT | _null_ | ល Lo |
|`U+179C` | Letter | CONSONANT | _null_ | វ Vo |
|`U+179D` | Letter | CONSONANT | _null_ | ឝ Sha |
|`U+179E` | Letter | CONSONANT | _null_ | ឞ Sso |
|`U+179F` | Letter | CONSONANT | _null_ | ស Sa |
| | | | |
|`U+17A0` | Letter | CONSONANT | _null_ | ហ Ha |
|`U+17A1` | Letter | CONSONANT | _null_ | ឡ La |
|`U+17A2` | Letter | CONSONANT | _null_ | អ Qa |
|`U+17A3` | Letter | VOWEL_INDEPENDENT | _null_ | ឣ Qaq |
|`U+17A4` | Letter | VOWEL_INDEPENDENT | _null_ | ឤ Qaa |
|`U+17A5` | Letter | VOWEL_INDEPENDENT | _null_ | ឥ Qi |
|`U+17A6` | Letter | VOWEL_INDEPENDENT | _null_ | ឦ Qii |
|`U+17A7` | Letter | VOWEL_INDEPENDENT | _null_ | ឧ Qu |
|`U+17A8` | Letter | VOWEL_INDEPENDENT | _null_ | ឨ Quk |
|`U+17A9` | Letter | VOWEL_INDEPENDENT | _null_ | ឩ Quu |
|`U+17AA` | Letter | VOWEL_INDEPENDENT | _null_ | ឪ Quuv |
|`U+17AB` | Letter | VOWEL_INDEPENDENT | _null_ | ឫ Ry |
|`U+17AC` | Letter | VOWEL_INDEPENDENT | _null_ | ឬ Ryy |
|`U+17AD` | Letter | VOWEL_INDEPENDENT | _null_ | ឭ Ly |
|`U+17AE` | Letter | VOWEL_INDEPENDENT | _null_ | ឮ Lyy |
|`U+17AF` | Letter | VOWEL_INDEPENDENT | _null_ | ឯ Qe |
| | | | |
|`U+17B0` | Letter | VOWEL_INDEPENDENT | _null_ | ឰ Qai |
|`U+17B1` | Letter | VOWEL_INDEPENDENT | _null_ | ឱ Qoo Type One |
|`U+17B2` | Letter | VOWEL_INDEPENDENT | _null_ | ឲ Qoo Type Two |
|`U+17B3` | Letter | VOWEL_INDEPENDENT | _null_ | ឳ Qau |
|`U+17B4` | Mark [Mn] | _null_ | _null_ | ឴ Inherent Aq |
|`U+17B5` | Mark [Mn] | _null_ | _null_ | ឵ Inherent Aa |
|`U+17B6` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ា Sign Aa |
|`U+17B7` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ិ Sign I |
|`U+17B8` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ី Sign Ii |
|`U+17B9` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ឹ Sign Y |
|`U+17BA` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ឺ Sign Yy |
|`U+17BB` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ុ Sign U |
|`U+17BC` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ូ Sign Uu |
|`U+17BD` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ួ Sign Ua |
|`U+17BE` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_LEFT_POSITION | ើ Sign Oe |
|`U+17BF` | Mark [Mc] | VOWEL_DEPENDENT | TOP_LEFT_AND_RIGHT_POSITION| ឿ Sign Ya |
| | | | |
|`U+17C0` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ៀ Sign Ie |
|`U+17C1` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | េ Sign E |
|`U+17C2` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ែ Sign Ae |
|`U+17C3` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ៃ Sign Ai |
|`U+17C4` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ោ Sign Oo |
|`U+17C5` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ៅ Sign Au |
|`U+17C6` | Mark [Mn] | ANUSVARA | TOP_POSITION | ំ Nikahit |
|`U+17C7` | Mark [Mc] | VISARGA | RIGHT_POSITION | ះ Reahmuk |
|`U+17C8` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ៈ Yuukaleapintu |
|`U+17C9` | Mark [Mn] | REGISTER_SHIFTER | TOP_POSITION | ៉ Muusikatoan |
|`U+17CA` | Mark [Mn] | REGISTER_SHIFTER | TOP_POSITION | ៊ Triisap |
|`U+17CB` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ់ Bantoc |
|`U+17CC` | Mark [Mn] | CONSONANT_POST_REPHA| TOP_POSITION | ៌ Robat |
|`U+17CD` | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | ៍ Toandakhiat |
|`U+17CE` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៎ Kakabat |
|`U+17CF` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៏ Ahsda |
| | | | |
|`U+17D0` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ័ Samyok Sannya |
|`U+17D1` | Mark [Mn] | PURE_KILLER | TOP_POSITION | ៑ Viriam |
|`U+17D2` | Mark [Mn] | INVISIBLE_STACKER | _null_ | ្ Sign Coeng |
|`U+17D3` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៓ Bathamasat |
|`U+17D4` | Punctuation | _null_ | _null_ | ។ Khan |
|`U+17D5` | Punctuation | _null_ | _null_ | ៕ Bariyoosan |
|`U+17D6` | Punctuation | _null_ | _null_ | ៖ Camnuc Pii Kuuh |
|`U+17D7` | Letter | _null_ | _null_ | ៗ Lek Too |
|`U+17D8` | Punctuation | _null_ | _null_ | ៘ Beyyal |
|`U+17D9` | Punctuation | _null_ | _null_ | ៙ Phnaek Muan |
|`U+17DA` | Punctuation | _null_ | _null_ | ៚ Koomuut |
|`U+17DB` | Symbol | SYMBOL | _null_ | ៛ Riel |
|`U+17DC` | Letter | AVAGRAHA | _null_ | ៜ Avakrahasanya |
|`U+17DD` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៝ Atthacan |
|`U+17DE` | _unassigned_ | | | |
|`U+17DF` | _unassigned_ | | | |
| | | | |
|`U+17E0` | Number | NUMBER | _null_ | ០ Digit Zero |
|`U+17E1` | Number | NUMBER | _null_ | ១ Digit One |
|`U+17E2` | Number | NUMBER | _null_ | ២ Digit Two |
|`U+17E3` | Number | NUMBER | _null_ | ៣ Digit Three |
|`U+17E4` | Number | NUMBER | _null_ | ៤ Digit Four |
|`U+17E5` | Number | NUMBER | _null_ | ៥ Digit Five |
|`U+17E6` | Number | NUMBER | _null_ | ៦ Digit Six |
|`U+17E7` | Number | NUMBER | _null_ | ៧ Digit Seven |
|`U+17E8` | Number | NUMBER | _null_ | ៨ Digit Eight |
|`U+17E9` | Number | NUMBER | _null_ | ៩ Digit Nine |
|`U+17EA` | _unassigned_ | | | |
|`U+17EB` | _unassigned_ | | | |
|`U+17EC` | _unassigned_ | | | |
|`U+17ED` | _unassigned_ | | | |
|`U+17EE` | _unassigned_ | | | |
|`U+17EF` | _unassigned_ | | | |
| | | | |
|`U+17F0` | Number | _null_ | _null_ | ៰ Lek Attak Son |
|`U+17F1` | Number | _null_ | _null_ | ៱ Lek Attak Muoy |
|`U+17F2` | Number | _null_ | _null_ | ៲ Lek Attak Pii |
|`U+17F3` | Number | _null_ | _null_ | ៳ Lek Attak Bei |
|`U+17F4` | Number | _null_ | _null_ | ៴ Lek Attak Buon |
|`U+17F5` | Number | _null_ | _null_ | ៵ Lek Attak Pram |
|`U+17F6` | Number | _null_ | _null_ | ៶ Lek Attak Pram-Muoy |
|`U+17F7` | Number | _null_ | _null_ | ៷ Lek Attak Pram-Pii |
|`U+17F8` | Number | _null_ | _null_ | ៸ Lek Attak Pram-Bei |
|`U+17F9` | Number | _null_ | _null_ | ៹ Lek Attak Pram-Buon |
|`U+17FA` | _unassigned_ | | | |
|`U+17FB` | _unassigned_ | | | |
|`U+17FC` | _unassigned_ | | | |
|`U+17FD` | _unassigned_ | | | |
|`U+17FE` | _unassigned_ | | | |
|`U+17FF` | _unassigned_ | | | |
:::
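
Because most shaping classes in the table cover contiguous codepoint ranges, an implementation might store them as ranges rather than as one entry per codepoint. The sketch below transcribes only a subset of the rows above; the names `KHMER_RANGES` and `shaping_class` are hypothetical, and a result of `None` means either a _null_ class or a row that this partial sketch does not cover.

```python
from typing import Optional

# (first, last, shaping class): a partial transcription of the table above.
KHMER_RANGES = [
    (0x1780, 0x17A2, "CONSONANT"),
    (0x17A3, 0x17B3, "VOWEL_INDEPENDENT"),
    (0x17B6, 0x17C5, "VOWEL_DEPENDENT"),
    (0x17C9, 0x17CA, "REGISTER_SHIFTER"),
    (0x17D2, 0x17D2, "INVISIBLE_STACKER"),
    (0x17DB, 0x17DB, "SYMBOL"),
    (0x17E0, 0x17E9, "NUMBER"),
]


def shaping_class(codepoint: int) -> Optional[str]:
    """Return the shaping class for a codepoint listed in KHMER_RANGES."""
    for first, last, name in KHMER_RANGES:
        if first <= codepoint <= last:
            return name
    # Not in the subset above: a null class or an untranscribed row.
    return None


print(shaping_class(0x1780))  # Ka
print(shaping_class(0x17D2))  # Sign Coeng
print(shaping_class(0x17B4))  # Inherent Aq, null in the table
```

A linear scan is adequate for a list this short; a production engine would more likely use a sorted table with binary search or a precomputed array.
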
## Khmer Symbols character table ##
The Khmer Symbols block contains miscellaneous symbols used for
lunar-date calendars. None evoke any special behavior from the shaping engine.
:::{table} Khmer Symbols character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+19E0` | Symbol | _null_ | _null_ | ᧠ Pathamasat |
|`U+19E1` | Symbol | _null_ | _null_ | ᧡ Muoy Koet |
|`U+19E2` | Symbol | _null_ | _null_ | ᧢ Pii Koet |
|`U+19E3` | Symbol | _null_ | _null_ | ᧣ Bei Koet |
|`U+19E4` | Symbol | _null_ | _null_ | ᧤ Buon Koet |
|`U+19E5` | Symbol | _null_ | _null_ | ᧥ Pram Koet |
|`U+19E6` | Symbol | _null_ | _null_ | ᧦ Pram-Muoy Koet |
|`U+19E7` | Symbol | _null_ | _null_ | ᧧ Pram-Pii Koet |
|`U+19E8` | Symbol | _null_ | _null_ | ᧨ Pram-Bei Koet |
|`U+19E9` | Symbol | _null_ | _null_ | ᧩ Pram-Buon Koet |
|`U+19EA` | Symbol | _null_ | _null_ | ᧪ Dap Koet |
|`U+19EB` | Symbol | _null_ | _null_ | ᧫ Dap-Muoy Koet |
|`U+19EC` | Symbol | _null_ | _null_ | ᧬ Dap-Pii Koet |
|`U+19ED` | Symbol | _null_ | _null_ | ᧭ Dap-Bei Koet |
|`U+19EE` | Symbol | _null_ | _null_ | ᧮ Dap-Buon Koet |
|`U+19EF` | Symbol | _null_ | _null_ | ᧯ Dap-Pram Koet |
| | | | |
|`U+19F0` | Symbol | _null_ | _null_ | ᧰ Tuteyasat |
|`U+19F1` | Symbol | _null_ | _null_ | ᧱ Muoy Roc |
|`U+19F2` | Symbol | _null_ | _null_ | ᧲ Pii Roc |
|`U+19F3` | Symbol | _null_ | _null_ | ᧳ Bei Roc |
|`U+19F4` | Symbol | _null_ | _null_ | ᧴ Buon Roc |
|`U+19F5` | Symbol | _null_ | _null_ | ᧵ Pram Roc |
|`U+19F6` | Symbol | _null_ | _null_ | ᧶ Pram-Muoy Roc |
|`U+19F7` | Symbol | _null_ | _null_ | ᧷ Pram-Pii Roc |
|`U+19F8` | Symbol | _null_ | _null_ | ᧸ Pram-Bei Roc |
|`U+19F9` | Symbol | _null_ | _null_ | ᧹ Pram-Buon Roc |
|`U+19FA` | Symbol | _null_ | _null_ | ᧺ Dap Roc |
|`U+19FB` | Symbol | _null_ | _null_ | ᧻ Dap-Muoy Roc |
|`U+19FC` | Symbol | _null_ | _null_ | ᧼ Dap-Pii Roc |
|`U+19FD` | Symbol | _null_ | _null_ | ᧽ Dap-Bei Roc |
|`U+19FE` | Symbol | _null_ | _null_ | ᧾ Dap-Buon Roc |
|`U+19FF` | Symbol | _null_ | _null_ | ᧿ Dap-Pram Roc |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Khmer text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation of a
conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The sequence
"_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of a
conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence
"_Consonant_,Halant,ZWNJ,_Consonant_" should produce the first
consonant in its standard form, followed by an explicit "Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
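
The joiner rules described above can be summarized as a small decision function. This is only an illustrative sketch over symbolic tokens: the token strings (`"ZWJ"`, `"ZWNJ"`, `"Ra"`, `"Halant"`), the form labels, and both function names are assumptions, and whether a permitted form actually appears still depends on the features the active font provides.

```python
from typing import Optional, Sequence, Set


def permitted_forms(joiner: Optional[str]) -> Set[str]:
    """Forms still permitted for 'Consonant,Halant,<joiner>,Consonant'.

    joiner is "ZWJ", "ZWNJ", or None when no joiner follows the Halant.
    """
    if joiner == "ZWNJ":
        # Conjunct and half form are both blocked: the first consonant
        # keeps its standard form and is followed by an explicit Halant.
        return {"explicit_halant"}
    if joiner == "ZWJ":
        # The conjunct is blocked, but a half form may still be triggered.
        return {"half_form"}
    # No joiner: nothing is suppressed by the input sequence itself.
    return {"conjunct", "half_form"}


def reph_permitted(tokens: Sequence[str]) -> bool:
    """An initial 'Ra,Halant' may form a Reph unless a ZWJ follows it."""
    if list(tokens[:2]) != ["Ra", "Halant"]:
        return False
    return list(tokens[2:3]) != ["ZWJ"]


print(permitted_forms("ZWJ"))
print(permitted_forms("ZWNJ"))
print(reph_permitted(["Ra", "Halant", "ZWJ", "Consonant"]))
```
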
The no-break space (NBSP) is primarily used to display
those codepoints that are defined as non-spacing (marks, dependent
vowels (matras), below-base consonant forms, and post-base consonant
forms) in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or
"NBSP,_matra_".
In addition to general punctuation, runs of Khmer text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block.
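
The three no-break-space patterns listed earlier in this section ("NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", and "NBSP,_matra_") can be recognized with a simple comparison once the input has been reduced to class tokens. The following sketch assumes that reduction has already happened; the token strings and the names `NBSP_TAILS` and `is_isolated_nbsp_sequence` are hypothetical.

```python
from typing import Sequence

# Token sequences that may follow NBSP, transcribed from the text above.
NBSP_TAILS = (
    ("ZWJ", "Halant", "Consonant"),
    ("mark",),
    ("matra",),
)


def is_isolated_nbsp_sequence(tokens: Sequence[str]) -> bool:
    """Return True if tokens match one of the NBSP display patterns."""
    if not tokens or tokens[0] != "NBSP":
        return False
    return tuple(tokens[1:]) in NBSP_TAILS


print(is_isolated_nbsp_sequence(["NBSP", "matra"]))
print(is_isolated_nbsp_sequence(["NBSP", "ZWJ", "Halant", "Consonant"]))
print(is_isolated_nbsp_sequence(["Consonant", "matra"]))
```
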