Khmer character tables

This document lists the per-character shaping information needed to shape Khmer text.

Khmer character table

Khmer codepoints should be classified as shown in the following table. Codepoints in the Khmer block with no assigned meaning are marked as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as punctuation and the Lek Attak numbers (U+17F0 through U+17F9).

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
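As a rough illustration of that precedence, the following Python sketch looks up the Shaping class for a codepoint in the Khmer block and prints it next to the Unicode General Category reported by the standard `unicodedata` module. The names `khmer_shaping_class`, `_RANGES`, and `_SINGLES` are hypothetical and chosen for this example only; the data is transcribed from Table 39, and the range-plus-exceptions layout is just one possible encoding.

```python
import unicodedata
from typing import Optional

# Contiguous runs that share one Shaping class (transcribed from Table 39).
_RANGES = [
    (0x1780, 0x17A2, "CONSONANT"),
    (0x17A3, 0x17B3, "VOWEL_INDEPENDENT"),
    (0x17B6, 0x17C5, "VOWEL_DEPENDENT"),
    (0x17C9, 0x17CA, "REGISTER_SHIFTER"),
    (0x17CE, 0x17D0, "SYLLABLE_MODIFIER"),
    (0x17E0, 0x17E9, "NUMBER"),
]

# Individual codepoints that do not fit one of the runs above.
_SINGLES = {
    0x17C6: "NUKTA",
    0x17C7: "VISARGA",
    0x17C8: "VOWEL_DEPENDENT",
    0x17CB: "SYLLABLE_MODIFIER",
    0x17CC: "CONSONANT_POST_REPHA",
    0x17CD: "CONSONANT_KILLER",
    0x17D1: "PURE_KILLER",
    0x17D2: "INVISIBLE_STACKER",
    0x17D3: "SYLLABLE_MODIFIER",
    0x17DB: "SYMBOL",
    0x17DC: "AVAGRAHA",
    0x17DD: "SYLLABLE_MODIFIER",
}


def khmer_shaping_class(cp: int) -> Optional[str]:
    """Return the Shaping class for a Khmer-block codepoint.

    None is returned both for unassigned codepoints and for assigned
    codepoints whose Shaping class is null in the table.
    """
    if cp in _SINGLES:
        return _SINGLES[cp]
    for first, last, shaping_class in _RANGES:
        if first <= cp <= last:
            return shaping_class
    return None


if __name__ == "__main__":
    # Print the General Category next to the Shaping class for comparison.
    for cp in (0x1780, 0x17C6, 0x17D2, 0x17DC):
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)), khmer_shaping_class(cp))
```

A shaping engine following this document would consult the Shaping class first and fall back to the General Category only where the class is null.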

Table 39 Khmer character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+1780 | Letter | CONSONANT | null | ក Ka |
| U+1781 | Letter | CONSONANT | null | ខ Kha |
| U+1782 | Letter | CONSONANT | null | គ Ko |
| U+1783 | Letter | CONSONANT | null | ឃ Kho |
| U+1784 | Letter | CONSONANT | null | ង Ngo |
| U+1785 | Letter | CONSONANT | null | ច Ca |
| U+1786 | Letter | CONSONANT | null | ឆ Cha |
| U+1787 | Letter | CONSONANT | null | ជ Co |
| U+1788 | Letter | CONSONANT | null | ឈ Cho |
| U+1789 | Letter | CONSONANT | null | ញ Nyo |
| U+178A | Letter | CONSONANT | null | ដ Da |
| U+178B | Letter | CONSONANT | null | ឋ Ttha |
| U+178C | Letter | CONSONANT | null | ឌ Do |
| U+178D | Letter | CONSONANT | null | ឍ Ttho |
| U+178E | Letter | CONSONANT | null | ណ Nno |
| U+178F | Letter | CONSONANT | null | ត Ta |
| U+1790 | Letter | CONSONANT | null | ថ Tha |
| U+1791 | Letter | CONSONANT | null | ទ To |
| U+1792 | Letter | CONSONANT | null | ធ Tho |
| U+1793 | Letter | CONSONANT | null | ន No |
| U+1794 | Letter | CONSONANT | null | ប Ba |
| U+1795 | Letter | CONSONANT | null | ផ Pha |
| U+1796 | Letter | CONSONANT | null | ព Po |
| U+1797 | Letter | CONSONANT | null | ភ Pho |
| U+1798 | Letter | CONSONANT | null | ម Mo |
| U+1799 | Letter | CONSONANT | null | យ Yo |
| U+179A | Letter | CONSONANT | null | រ Ro |
| U+179B | Letter | CONSONANT | null | ល Lo |
| U+179C | Letter | CONSONANT | null | វ Vo |
| U+179D | Letter | CONSONANT | null | ឝ Sha |
| U+179E | Letter | CONSONANT | null | ឞ Sso |
| U+179F | Letter | CONSONANT | null | ស Sa |
| U+17A0 | Letter | CONSONANT | null | ហ Ha |
| U+17A1 | Letter | CONSONANT | null | ឡ La |
| U+17A2 | Letter | CONSONANT | null | អ Qa |
| U+17A3 | Letter | VOWEL_INDEPENDENT | null | ឣ Qaq |
| U+17A4 | Letter | VOWEL_INDEPENDENT | null | ឤ Qaa |
| U+17A5 | Letter | VOWEL_INDEPENDENT | null | ឥ Qi |
| U+17A6 | Letter | VOWEL_INDEPENDENT | null | ឦ Qii |
| U+17A7 | Letter | VOWEL_INDEPENDENT | null | ឧ Qu |
| U+17A8 | Letter | VOWEL_INDEPENDENT | null | ឨ Quk |
| U+17A9 | Letter | VOWEL_INDEPENDENT | null | ឩ Quu |
| U+17AA | Letter | VOWEL_INDEPENDENT | null | ឪ Quuv |
| U+17AB | Letter | VOWEL_INDEPENDENT | null | ឫ Ry |
| U+17AC | Letter | VOWEL_INDEPENDENT | null | ឬ Ryy |
| U+17AD | Letter | VOWEL_INDEPENDENT | null | ឭ Ly |
| U+17AE | Letter | VOWEL_INDEPENDENT | null | ឮ Lyy |
| U+17AF | Letter | VOWEL_INDEPENDENT | null | ឯ Qe |
| U+17B0 | Letter | VOWEL_INDEPENDENT | null | ឰ Qai |
| U+17B1 | Letter | VOWEL_INDEPENDENT | null | ឱ Qoo Type One |
| U+17B2 | Letter | VOWEL_INDEPENDENT | null | ឲ Qoo Type Two |
| U+17B3 | Letter | VOWEL_INDEPENDENT | null | ឳ Qau |
| U+17B4 | Mark [Mn] | null | null | ឴ Inherent Aq |
| U+17B5 | Mark [Mn] | null | null | ឵ Inherent Aa |
| U+17B6 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ា Sign Aa |
| U+17B7 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ិ Sign I |
| U+17B8 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ី Sign Ii |
| U+17B9 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ឹ Sign Y |
| U+17BA | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ឺ Sign Yy |
| U+17BB | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ុ Sign U |
| U+17BC | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ូ Sign Uu |
| U+17BD | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ួ Sign Ua |
| U+17BE | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_LEFT_POSITION | ើ Sign Oe |
| U+17BF | Mark [Mc] | VOWEL_DEPENDENT | TOP_LEFT_AND_RIGHT_POSITION | ឿ Sign Ya |
| U+17C0 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ៀ Sign Ie |
| U+17C1 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | េ Sign E |
| U+17C2 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ែ Sign Ae |
| U+17C3 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ៃ Sign Ai |
| U+17C4 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ោ Sign Oo |
| U+17C5 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ៅ Sign Au |
| U+17C6 | Mark [Mn] | NUKTA | TOP_POSITION | ំ Nikahit |
| U+17C7 | Mark [Mc] | VISARGA | RIGHT_POSITION | ះ Reahmuk |
| U+17C8 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ៈ Yuukaleapintu |
| U+17C9 | Mark [Mn] | REGISTER_SHIFTER | TOP_POSITION | ៉ Muusikatoan |
| U+17CA | Mark [Mn] | REGISTER_SHIFTER | TOP_POSITION | ៊ Triisap |
| U+17CB | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ់ Bantoc |
| U+17CC | Mark [Mn] | CONSONANT_POST_REPHA | TOP_POSITION | ៌ Robat |
| U+17CD | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | ៍ Toandakhiat |
| U+17CE | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៎ Kakabat |
| U+17CF | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៏ Ahsda |
| U+17D0 | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ័ Samyok Sannya |
| U+17D1 | Mark [Mn] | PURE_KILLER | TOP_POSITION | ៑ Viriam |
| U+17D2 | Mark [Mn] | INVISIBLE_STACKER | null | ្ Sign Coeng |
| U+17D3 | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៓ Bathamasat |
| U+17D4 | Punctuation | null | null | ។ Khan |
| U+17D5 | Punctuation | null | null | ៕ Bariyoosan |
| U+17D6 | Punctuation | null | null | ៖ Camnuc Pii Kuuh |
| U+17D7 | Letter | null | null | ៗ Lek Too |
| U+17D8 | Punctuation | null | null | ៘ Beyyal |
| U+17D9 | Punctuation | null | null | ៙ Phnaek Muan |
| U+17DA | Punctuation | null | null | ៚ Koomuut |
| U+17DB | Symbol | SYMBOL | null | ៛ Riel |
| U+17DC | Letter | AVAGRAHA | null | ៜ Avakrahasanya |
| U+17DD | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៝ Atthacan |
| U+17DE | unassigned | | | |
| U+17DF | unassigned | | | |
| U+17E0 | Number | NUMBER | null | ០ Digit Zero |
| U+17E1 | Number | NUMBER | null | ១ Digit One |
| U+17E2 | Number | NUMBER | null | ២ Digit Two |
| U+17E3 | Number | NUMBER | null | ៣ Digit Three |
| U+17E4 | Number | NUMBER | null | ៤ Digit Four |
| U+17E5 | Number | NUMBER | null | ៥ Digit Five |
| U+17E6 | Number | NUMBER | null | ៦ Digit Six |
| U+17E7 | Number | NUMBER | null | ៧ Digit Seven |
| U+17E8 | Number | NUMBER | null | ៨ Digit Eight |
| U+17E9 | Number | NUMBER | null | ៩ Digit Nine |
| U+17EA | unassigned | | | |
| U+17EB | unassigned | | | |
| U+17EC | unassigned | | | |
| U+17ED | unassigned | | | |
| U+17EE | unassigned | | | |
| U+17EF | unassigned | | | |
| U+17F0 | Number | null | null | ៰ Lek Attak Son |
| U+17F1 | Number | null | null | ៱ Lek Attak Muoy |
| U+17F2 | Number | null | null | ៲ Lek Attak Pii |
| U+17F3 | Number | null | null | ៳ Lek Attak Bei |
| U+17F4 | Number | null | null | ៴ Lek Attak Buon |
| U+17F5 | Number | null | null | ៵ Lek Attak Pram |
| U+17F6 | Number | null | null | ៶ Lek Attak Pram-Muoy |
| U+17F7 | Number | null | null | ៷ Lek Attak Pram-Pii |
| U+17F8 | Number | null | null | ៸ Lek Attak Pram-Bei |
| U+17F9 | Number | null | null | ៹ Lek Attak Pram-Buon |
| U+17FA | unassigned | | | |
| U+17FB | unassigned | | | |
| U+17FC | unassigned | | | |
| U+17FD | unassigned | | | |
| U+17FE | unassigned | | | |
| U+17FF | unassigned | | | |
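
The Mark-placement subclass column of the Khmer character table can be encoded in the same way as the Shaping class. The Python sketch below is illustrative only: `mark_placement` and `placement_components` are hypothetical helper names, and splitting a subclass name into its individual sides is a convenience invented for this example, not something the table itself prescribes.

```python
from typing import List, Optional

# (first, last, Mark-placement subclass), transcribed from the Khmer table.
_MARK_PLACEMENT_RANGES = [
    (0x17B6, 0x17B6, "RIGHT_POSITION"),
    (0x17B7, 0x17BA, "TOP_POSITION"),
    (0x17BB, 0x17BD, "BOTTOM_POSITION"),
    (0x17BE, 0x17BE, "TOP_AND_LEFT_POSITION"),
    (0x17BF, 0x17BF, "TOP_LEFT_AND_RIGHT_POSITION"),
    (0x17C0, 0x17C0, "LEFT_AND_RIGHT_POSITION"),
    (0x17C1, 0x17C3, "LEFT_POSITION"),
    (0x17C4, 0x17C5, "LEFT_AND_RIGHT_POSITION"),
    (0x17C6, 0x17C6, "TOP_POSITION"),
    (0x17C7, 0x17C8, "RIGHT_POSITION"),
    (0x17C9, 0x17D1, "TOP_POSITION"),
    (0x17D3, 0x17D3, "TOP_POSITION"),
    (0x17DD, 0x17DD, "TOP_POSITION"),
]

_SUFFIX = "_POSITION"


def mark_placement(cp: int) -> Optional[str]:
    """Return the Mark-placement subclass, or None where the table has null."""
    for first, last, subclass in _MARK_PLACEMENT_RANGES:
        if first <= cp <= last:
            return subclass
    return None


def placement_components(cp: int) -> List[str]:
    """Split a subclass such as TOP_LEFT_AND_RIGHT_POSITION into its sides."""
    subclass = mark_placement(cp)
    if subclass is None:
        return []
    stem = subclass[: -len(_SUFFIX)]  # every subclass name ends in _POSITION
    return [part for part in stem.split("_") if part != "AND"]
```

For example, `placement_components(0x17BF)` yields the three sides of Sign Ya, while U+17D2 (Sign Coeng) and all non-mark codepoints yield an empty list.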

Khmer Symbols character table

The Khmer Symbols block contains miscellaneous symbols used for lunar-date calendars. None evoke any special behavior from the shaping engine.

Table 40 Khmer Symbols character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+19E0 | Symbol | null | null | ᧠ Pathamasat |
| U+19E1 | Symbol | null | null | ᧡ Muoy Koet |
| U+19E2 | Symbol | null | null | ᧢ Pii Koet |
| U+19E3 | Symbol | null | null | ᧣ Bei Koet |
| U+19E4 | Symbol | null | null | ᧤ Buon Koet |
| U+19E5 | Symbol | null | null | ᧥ Pram Koet |
| U+19E6 | Symbol | null | null | ᧦ Pram-Muoy Koet |
| U+19E7 | Symbol | null | null | ᧧ Pram-Pii Koet |
| U+19E8 | Symbol | null | null | ᧨ Pram-Bei Koet |
| U+19E9 | Symbol | null | null | ᧩ Pram-Buon Koet |
| U+19EA | Symbol | null | null | ᧪ Dap Koet |
| U+19EB | Symbol | null | null | ᧫ Dap-Muoy Koet |
| U+19EC | Symbol | null | null | ᧬ Dap-Pii Koet |
| U+19ED | Symbol | null | null | ᧭ Dap-Bei Koet |
| U+19EE | Symbol | null | null | ᧮ Dap-Buon Koet |
| U+19EF | Symbol | null | null | ᧯ Dap-Pram Koet |
| U+19F0 | Symbol | null | null | ᧰ Tuteyasat |
| U+19F1 | Symbol | null | null | ᧱ Muoy Roc |
| U+19F2 | Symbol | null | null | ᧲ Pii Roc |
| U+19F3 | Symbol | null | null | ᧳ Bei Roc |
| U+19F4 | Symbol | null | null | ᧴ Buon Roc |
| U+19F5 | Symbol | null | null | ᧵ Pram Roc |
| U+19F6 | Symbol | null | null | ᧶ Pram-Muoy Roc |
| U+19F7 | Symbol | null | null | ᧷ Pram-Pii Roc |
| U+19F8 | Symbol | null | null | ᧸ Pram-Bei Roc |
| U+19F9 | Symbol | null | null | ᧹ Pram-Buon Roc |
| U+19FA | Symbol | null | null | ᧺ Dap Roc |
| U+19FB | Symbol | null | null | ᧻ Dap-Muoy Roc |
| U+19FC | Symbol | null | null | ᧼ Dap-Pii Roc |
| U+19FD | Symbol | null | null | ᧽ Dap-Bei Roc |
| U+19FE | Symbol | null | null | ᧾ Dap-Buon Roc |
| U+19FF | Symbol | null | null | ᧿ Dap-Pram Roc |

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Khmer text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Table 41 Miscellaneous character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+00A0 | Separator | PLACEHOLDER | null |   No-break space |
| U+200C | Other | NON_JOINER | null | ‌ Zero-width non-joiner |
| U+200D | Other | JOINER | null | ‍ Zero-width joiner |
| U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
| U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
| U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
| U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
| U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
| U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
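
The miscellaneous classes are few enough to keep in a single mapping. The Python sketch below transcribes the Shaping class column of the miscellaneous character table and adds one hypothetical helper, `can_stand_in_for_base`, which reports whether a codepoint is one of the characters this section describes as a placeholder for displaying marks in isolation. Both names are assumptions made for illustration.

```python
from typing import Dict

# Shaping classes for the miscellaneous characters listed in the table.
MISC_SHAPING_CLASS: Dict[int, str] = {
    0x00A0: "PLACEHOLDER",    # no-break space
    0x200C: "NON_JOINER",     # zero-width non-joiner
    0x200D: "JOINER",         # zero-width joiner
    0x2010: "PLACEHOLDER",    # hyphen
    0x2011: "PLACEHOLDER",    # no-break hyphen
    0x2012: "PLACEHOLDER",    # figure dash
    0x2013: "PLACEHOLDER",    # en dash
    0x2014: "PLACEHOLDER",    # em dash
    0x25CC: "DOTTED_CIRCLE",  # dotted circle
}


def can_stand_in_for_base(cp: int) -> bool:
    """True if the codepoint is classed as a placeholder or dotted circle."""
    return MISC_SHAPING_CLASS.get(cp) in ("PLACEHOLDER", "DOTTED_CIRCLE")
```

Keeping DOTTED_CIRCLE separate from PLACEHOLDER in the mapping preserves the distinction the table draws, even though the helper treats both as usable bases.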

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a “Consonant,Halant,Consonant” sequence. The sequence “Consonant,Halant,ZWJ,Consonant” blocks the formation of a conjunct between the two consonants.

Note, however, that the “Consonant,Halant” subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence “Consonant,Halant,ZWNJ,Consonant” should produce the first consonant in its standard form, followed by an explicit “Halant”.

A secondary usage of the zero-width joiner is to prevent the formation of “Reph”. An initial “Ra,Halant,ZWJ” sequence should not produce a “Reph”, where an initial “Ra,Halant” sequence without the zero-width joiner otherwise would.
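
The joiner rules in the three paragraphs above can be summarized in a few lines of Python. This is a minimal sketch over symbolic tokens ("Consonant", "Ra", "Halant", "ZWJ", "ZWNJ") rather than real codepoints; the function name `halant_sequence_forms` and the result keys are hypothetical. The text does not say what happens to "Reph" after "Ra,Halant,ZWNJ", so treating ZWNJ as also blocking it is an assumption of this sketch.

```python
from typing import Dict, Sequence


def halant_sequence_forms(tokens: Sequence[str]) -> Dict[str, bool]:
    """Report which forms a 'Consonant,Halant[,joiner]' prefix still permits."""
    if len(tokens) < 2 or tokens[0] not in ("Consonant", "Ra") or tokens[1] != "Halant":
        raise ValueError("expected a sequence starting with Consonant,Halant or Ra,Halant")

    # Only a joiner directly after the Halant changes the outcome.
    joiner = tokens[2] if len(tokens) > 2 and tokens[2] in ("ZWJ", "ZWNJ") else None

    return {
        # Either joiner blocks the conjunct.
        "conjunct_allowed": joiner is None,
        # Only ZWNJ also blocks the half-forms feature.
        "half_form_allowed": joiner != "ZWNJ",
        # ZWJ blocks Reph; blocking it for ZWNJ as well is an assumption.
        "reph_allowed": tokens[0] == "Ra" and joiner is None,
    }
```

With this helper, `halant_sequence_forms(["Consonant", "Halant", "ZWJ", "Consonant"])` reports that the conjunct is blocked while a half form remains possible, matching the ZWJ description above.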

The no-break space (NBSP) is primarily used to display codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match “NBSP,ZWJ,Halant,Consonant”, “NBSP,mark”, or “NBSP,matra”.
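
A simple way to recognize these isolated-display sequences is to compare a tokenized run against the three patterns listed above. The Python sketch below assumes the run has already been reduced to the same symbolic tokens used in the text; `NBSP_SEQUENCES` and `is_nbsp_display_sequence` are hypothetical names, and a real shaping engine would more likely fold these patterns into its syllable-identification expressions.

```python
from typing import Sequence

# The three NBSP-based patterns named in the paragraph above.
NBSP_SEQUENCES = {
    ("NBSP", "ZWJ", "Halant", "Consonant"),
    ("NBSP", "mark"),
    ("NBSP", "matra"),
}


def is_nbsp_display_sequence(tokens: Sequence[str]) -> bool:
    """True if the token run is exactly one of the NBSP display patterns."""
    return tuple(tokens) in NBSP_SEQUENCES
```

The check is an exact match on the whole run, so longer runs or runs that begin with something other than NBSP are rejected.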

In addition to general punctuation, runs of Khmer text often use the danda (U+0964) and double danda (U+0965) punctuation marks from the Devanagari block.