# Thai character tables #

This document lists the per-character shaping information needed to [shape Thai text](../opentype-shaping-thai-lao.md#the-thailao-shaping-model).

**Contents**

- [Thai character table](#thai-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)

## Thai character table ##

Thai codepoints should be classified as in the following table. Codepoints in the Thai block with no assigned meaning are designated as _unassigned_ in the _Unicode category_ column.

Assigned codepoints with a _null_ in the _Shaping class_ column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as punctuation and other symbols.

> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.

The _Mark-placement subclass_ column indicates mark-placement positioning for codepoints in the _Mark_ category and for dependent vowels. Other assigned codepoints have a _null_ in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the _Unicode category_ column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

The _Combining class_ column gives each codepoint's Unicode canonical combining class, which determines how adjacent marks are reordered during normalization. The _PUA_ column gives each codepoint's class in the Private Use Area (PUA) fallback scheme used with legacy Thai fonts. For consonants, it distinguishes normal consonants (NC) from consonants with an ascender (AC), a removable descender (RC), or a strict descender (DC), each of which may require differently positioned mark glyphs.

Some codepoints in the following table use a _Shaping class_ that differs from the codepoint's Unicode _General Category_. The _Shaping class_ takes precedence during OpenType shaping because it captures more specific, script-aware behavior.
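The following sketch illustrates how the columns relate. The _Unicode category_ and _Combining class_ values come straight from the Unicode Character Database, available here through Python's standard `unicodedata` module. The _Shaping class_ and _Mark-placement subclass_ values are script-specific and override the general category. The `THAI_SHAPING` dictionary and the `classify()` helper are hypothetical names for illustration only. The dictionary holds a small subset of rows copied from the table below, not a complete implementation.

```python
import unicodedata

# Hypothetical subset of rows copied from the Thai character table;
# a real shaping engine would carry every row.
# Maps codepoint -> (Shaping class, Mark-placement subclass).
THAI_SHAPING = {
    0x0E01: ("CONSONANT", None),                       # KO KAI
    0x0E30: ("VOWEL_DEPENDENT", "RIGHT_POSITION"),     # SARA A
    0x0E31: ("VOWEL_DEPENDENT", "TOP_POSITION"),       # MAI HAN-AKAT
    0x0E38: ("VOWEL_DEPENDENT", "BOTTOM_POSITION"),    # SARA U
    0x0E3A: ("PURE_KILLER", "BOTTOM_POSITION"),        # PHINTHU
    0x0E3F: ("SYMBOL", None),                          # BAHT SIGN
    0x0E40: ("VOWEL_DEPENDENT", "VISUAL_ORDER_LEFT"),  # SARA E
    0x0E48: ("TONE_MARKER", "TOP_POSITION"),           # MAI EK
    0x0E4C: ("CONSONANT_KILLER", "TOP_POSITION"),      # THANTHAKHAT
    0x0E50: ("NUMBER", None),                          # DIGIT ZERO
}


def classify(cp: int) -> dict:
    """Combine Unicode Character Database properties with the shaping class."""
    ch = chr(cp)
    # Codepoints missing from the (partial) dictionary get no shaping class.
    shaping_class, placement = THAI_SHAPING.get(cp, (None, None))
    return {
        "general_category": unicodedata.category(ch),  # from the UCD, e.g. "Lo"
        "combining_class": unicodedata.combining(ch),  # canonical combining class
        "shaping_class": shaping_class,                # takes precedence when shaping
        "placement": placement,
    }
```

For example, `classify(0x0E30)` reports SARA A with the general category `"Lo"` (a letter) but the shaping class `VOWEL_DEPENDENT`. The combining classes matter for input order. SARA U has class 103 and MAI EK has class 107, so normalization moves a tone mark typed before SARA U to after it. For example, `unicodedata.normalize("NFC", "\u0E01\u0E48\u0E38")` returns `"\u0E01\u0E38\u0E48"`.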
:::{table} Thai character table
| Codepoint | Unicode category | Shaping class     | Mark-placement subclass | Combining class | PUA    | Glyph                         |
|:----------|:-----------------|:------------------|:------------------------|:----------------|:-------|:------------------------------|
|`U+0E00` | _unassigned_ | | | | | |
|`U+0E01` | Letter | CONSONANT | _null_ | _0_ | NC | ก Ko Kai |
|`U+0E02` | Letter | CONSONANT | _null_ | _0_ | NC | ข Kho Khai |
|`U+0E03` | Letter | CONSONANT | _null_ | _0_ | NC | ฃ Kho Khuat |
|`U+0E04` | Letter | CONSONANT | _null_ | _0_ | NC | ค Kho Khwai |
|`U+0E05` | Letter | CONSONANT | _null_ | _0_ | NC | ฅ Kho Khon |
|`U+0E06` | Letter | CONSONANT | _null_ | _0_ | NC | ฆ Kho Rakhang |
|`U+0E07` | Letter | CONSONANT | _null_ | _0_ | NC | ง Ngo Ngu |
|`U+0E08` | Letter | CONSONANT | _null_ | _0_ | NC | จ Cho Chan |
|`U+0E09` | Letter | CONSONANT | _null_ | _0_ | NC | ฉ Cho Ching |
|`U+0E0A` | Letter | CONSONANT | _null_ | _0_ | NC | ช Cho Chang |
|`U+0E0B` | Letter | CONSONANT | _null_ | _0_ | NC | ซ So So |
|`U+0E0C` | Letter | CONSONANT | _null_ | _0_ | NC | ฌ Cho Choe |
|`U+0E0D` | Letter | CONSONANT | _null_ | _0_ | RC | ญ Yo Ying |
|`U+0E0E` | Letter | CONSONANT | _null_ | _0_ | DC | ฎ Do Chada |
|`U+0E0F` | Letter | CONSONANT | _null_ | _0_ | DC | ฏ To Patak |
| | | | | | | |
|`U+0E10` | Letter | CONSONANT | _null_ | _0_ | RC | ฐ Tho Than |
|`U+0E11` | Letter | CONSONANT | _null_ | _0_ | NC | ฑ Tho Nangmontho |
|`U+0E12` | Letter | CONSONANT | _null_ | _0_ | NC | ฒ Tho Phuthao |
|`U+0E13` | Letter | CONSONANT | _null_ | _0_ | NC | ณ No Nen |
|`U+0E14` | Letter | CONSONANT | _null_ | _0_ | NC | ด Do Dek |
|`U+0E15` | Letter | CONSONANT | _null_ | _0_ | NC | ต To Tao |
|`U+0E16` | Letter | CONSONANT | _null_ | _0_ | NC | ถ Tho Thung |
|`U+0E17` | Letter | CONSONANT | _null_ | _0_ | NC | ท Tho Thahan |
|`U+0E18` | Letter | CONSONANT | _null_ | _0_ | NC | ธ Tho Thong |
|`U+0E19` | Letter | CONSONANT | _null_ | _0_ | NC | น No Nu |
|`U+0E1A` | Letter | CONSONANT | _null_ | _0_ | NC | บ Bo Baimai |
|`U+0E1B` | Letter | CONSONANT | _null_ | _0_ | AC | ป Po Pla |
|`U+0E1C` | Letter | CONSONANT | _null_ | _0_ | NC | ผ Pho Phung |
|`U+0E1D` | Letter | CONSONANT | _null_ | _0_ | AC | ฝ Fo Fa |
|`U+0E1E` | Letter | CONSONANT | _null_ | _0_ | NC | พ Pho Phan |
|`U+0E1F` | Letter | CONSONANT | _null_ | _0_ | AC | ฟ Fo Fan |
| | | | | | | |
|`U+0E20` | Letter | CONSONANT | _null_ | _0_ | NC | ภ Pho Samphao |
|`U+0E21` | Letter | CONSONANT | _null_ | _0_ | NC | ม Mo Ma |
|`U+0E22` | Letter | CONSONANT | _null_ | _0_ | NC | ย Yo Yak |
|`U+0E23` | Letter | CONSONANT | _null_ | _0_ | NC | ร Ro Rua |
|`U+0E24` | Letter | CONSONANT | _null_ | _0_ | NC | ฤ Ru |
|`U+0E25` | Letter | CONSONANT | _null_ | _0_ | NC | ล Lo Ling |
|`U+0E26` | Letter | CONSONANT | _null_ | _0_ | NC | ฦ Lu |
|`U+0E27` | Letter | CONSONANT | _null_ | _0_ | NC | ว Wo Waen |
|`U+0E28` | Letter | CONSONANT | _null_ | _0_ | NC | ศ So Sala |
|`U+0E29` | Letter | CONSONANT | _null_ | _0_ | NC | ษ So Rusi |
|`U+0E2A` | Letter | CONSONANT | _null_ | _0_ | NC | ส So Sua |
|`U+0E2B` | Letter | CONSONANT | _null_ | _0_ | NC | ห Ho Hip |
|`U+0E2C` | Letter | CONSONANT | _null_ | _0_ | NC | ฬ Lo Chula |
|`U+0E2D` | Letter | CONSONANT | _null_ | _0_ | NC | อ O Ang |
|`U+0E2E` | Letter | CONSONANT | _null_ | _0_ | NC | ฮ Ho Nokhuk |
|`U+0E2F` | Letter | CONSONANT | _null_ | _0_ | _null_ | ฯ Paiyannoi |
| | | | | | | |
|`U+0E30` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | CV | ะ Sara A |
|`U+0E31` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ั Mai Han-akat |
|`U+0E32` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | CV | า Sara Aa |
|`U+0E33` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | _null_ | ำ Sara Am |
|`U+0E34` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ิ Sara I |
|`U+0E35` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ี Sara Ii |
|`U+0E36` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ึ Sara Ue |
|`U+0E37` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ื Sara Uee |
|`U+0E38` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ุ Sara U |
|`U+0E39` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ู Sara Uu |
|`U+0E3A` | Mark [Mn] | PURE_KILLER | BOTTOM_POSITION | 9 | BV | ฺ Phinthu |
|`U+0E3B` | _unassigned_ | | | | | |
|`U+0E3C` | _unassigned_ | | | | | |
|`U+0E3D` | _unassigned_ | | | | | |
|`U+0E3E` | _unassigned_ | | | | | |
|`U+0E3F` | Symbol | SYMBOL | _null_ | _0_ | _null_ | ฿ Currency symbol Baht |
| | | | | | | |
|`U+0E40` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | เ Sara E |
|`U+0E41` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | แ Sara Ae |
|`U+0E42` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | โ Sara O |
|`U+0E43` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | ใ Sara Ai Maimuan |
|`U+0E44` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | ไ Sara Ai Maimalai |
|`U+0E45` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | CV | ๅ Lakkhangyao |
|`U+0E46` | Letter Modifier | _null_ | _null_ | _0_ | _null_ | ๆ Maiyamok |
|`U+0E47` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ็ Maitaikhu |
|`U+0E48` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ่ Mai Ek |
|`U+0E49` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ้ Mai Tho |
|`U+0E4A` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๊ Mai Tri |
|`U+0E4B` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๋ Mai Chattawa |
|`U+0E4C` | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | _0_ | TV | ์ Thanthakhat |
|`U+0E4D` | Mark [Mn] | BINDU | TOP_POSITION | _0_ | AV | ํ Nikhahit |
|`U+0E4E` | Mark [Mn] | PURE_KILLER | TOP_POSITION | _0_ | AV | ๎ Yamakkan |
|`U+0E4F` | Punctuation | _null_ | _null_ | _0_ | _null_ | ๏ Fongman |
| | | | | | | |
|`U+0E50` | Number | NUMBER | _null_ | _0_ | _null_ | ๐ Digit zero |
|`U+0E51` | Number | NUMBER | _null_ | _0_ | _null_ | ๑ Digit one |
|`U+0E52` | Number | NUMBER | _null_ | _0_ | _null_ | ๒ Digit two |
|`U+0E53` | Number | NUMBER | _null_ | _0_ | _null_ | ๓ Digit three |
|`U+0E54` | Number | NUMBER | _null_ | _0_ | _null_ | ๔ Digit four |
|`U+0E55` | Number | NUMBER | _null_ | _0_ | _null_ | ๕ Digit five |
|`U+0E56` | Number | NUMBER | _null_ | _0_ | _null_ | ๖ Digit six |
|`U+0E57` | Number | NUMBER | _null_ | _0_ | _null_ | ๗ Digit seven |
|`U+0E58` | Number | NUMBER | _null_ | _0_ | _null_ | ๘ Digit eight |
|`U+0E59` | Number | NUMBER | _null_ | _0_ | _null_ | ๙ Digit nine |
|`U+0E5A` | Punctuation | _null_ | _null_ | _0_ | _null_ | ๚ Angkhankhu |
|`U+0E5B` | Punctuation | _null_ | _null_ | _0_ | _null_ | ๛ Khomut |
|`U+0E5C` | _unassigned_ | | | | | |
|`U+0E5D` | _unassigned_ | | | | | |
|`U+0E5E` | _unassigned_ | | | | | |
|`U+0E5F` | _unassigned_ | | | | | |
| | | | | | | |
|`U+0E60` | _unassigned_ | | | | | |
|`U+0E61` | _unassigned_ | | | | | |
|`U+0E62` | _unassigned_ | | | | | |
|`U+0E63` | _unassigned_ | | | | | |
|`U+0E64` | _unassigned_ | | | | | |
|`U+0E65` | _unassigned_ | | | | | |
|`U+0E66` | _unassigned_ | | | | | |
|`U+0E67` | _unassigned_ | | | | | |
|`U+0E68` | _unassigned_ | | | | | |
|`U+0E69` | _unassigned_ | | | | | |
|`U+0E6A` | _unassigned_ | | | | | |
|`U+0E6B` | _unassigned_ | | | | | |
|`U+0E6C` | _unassigned_ | | | | | |
|`U+0E6D` | _unassigned_ | | | | | |
|`U+0E6E` | _unassigned_ | | | | | |
|`U+0E6F` | _unassigned_ | | | | | |
| | | | | | | |
|`U+0E70` | _unassigned_ | | | | | |
|`U+0E71` | _unassigned_ | | | | | |
|`U+0E72` | _unassigned_ | | | | | |
|`U+0E73` | _unassigned_ | | | | | |
|`U+0E74` | _unassigned_ | | | | | |
|`U+0E75` | _unassigned_ | | | | | |
|`U+0E76` | _unassigned_ | | | | | |
|`U+0E77` | _unassigned_ | | | | | |
|`U+0E78` | _unassigned_ | | | | | |
|`U+0E79` | _unassigned_ | | | | | |
|`U+0E7A` | _unassigned_ | | | | | |
|`U+0E7B` | _unassigned_ | | | | | |
|`U+0E7C` | _unassigned_ | | | | | |
|`U+0E7D` | _unassigned_ | | | | | |
|`U+0E7E` | _unassigned_ | | | | | |
|`U+0E7F` | _unassigned_ | | | | | |
:::

## Miscellaneous character table ##

In addition to general punctuation, runs of Thai text often use the combining macron below (`U+0331`) and combining tilde (`U+0303`) from the Combining Diacritical Marks block, as well as the modifier letter apostrophe (`U+02BC`) and modifier letter minus sign (`U+02D7`) from the Spacing Modifier Letters block. These characters are particularly common when Thai script is used to write minority languages.

Thai text is typically written without spaces between words. Consequently, the zero-width space (`U+200B`) is often used to insert invisible break opportunities at which a line may be broken.

:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+02BC` | Letter Modifier | TONE_MARKER | TOP_POSITION | ʼ Modifier apostrophe |
|`U+02D7` | Symbol | TONE_MARKER | BOTTOM_POSITION | ˗ Modifier minus sign |
|`U+0303` | Mark [Mn] | TONE_MARKER | TOP_POSITION | ̃ Combining tilde |
|`U+0331` | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ̱ Combining macron below |
|`U+200B` | Other | PLACEHOLDER | _null_ | ​ Zero-width space |
:::

Other important characters that may be encountered when shaping runs of Thai text include the dotted-circle placeholder (`U+25CC`), the zero-width joiner (`U+200D`), the zero-width non-joiner (`U+200C`), and the no-break space (`U+00A0`).

The dotted-circle placeholder is frequently used when displaying a dependent vowel or a combining mark in isolation. Real-world text may also use other characters, such as hyphens or dashes, as placeholders in the same way; shaping engines should handle this gracefully.
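One common way to handle a mark that has no base is to insert a dotted circle (`U+25CC`) in front of it, so that it renders on a visible placeholder. The sketch below is a simplified approach, not the algorithm of any particular shaping engine. It treats Thai consonants (`U+0E01` through `U+0E2E`) and the placeholder characters from the miscellaneous table as valid bases. A mark that directly follows another mark is assumed to belong to the same cluster. The function names are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

# Placeholder-style bases taken from the Miscellaneous character table:
# no-break space, hyphen, no-break hyphen, figure/en/em dashes, dotted circle.
PLACEHOLDERS = {"\u00A0", "\u2010", "\u2011", "\u2012", "\u2013", "\u2014", DOTTED_CIRCLE}


def is_thai_consonant(ch: str) -> bool:
    # U+0E01..U+0E2E carry the CONSONANT shaping class and a consonant PUA type.
    return "\u0E01" <= ch <= "\u0E2E"


def insert_dotted_circles(text: str) -> str:
    """Prefix U+25CC to any non-spacing mark that has no usable base.

    Simplifying assumption: a mark is orphaned if it starts the string or
    follows a character that is not a Thai consonant, not a placeholder,
    and not another mark (so stacked marks stay in one cluster).
    """
    out = []
    prev = None
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            has_base = prev is not None and (
                is_thai_consonant(prev)
                or prev in PLACEHOLDERS
                or unicodedata.category(prev) == "Mn"
            )
            if not has_base:
                out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)
```

With this sketch, an isolated MAI EK (`"\u0E48"`) becomes `"\u25CC\u0E48"`. A mark written after a hyphen (`"\u2010\u0E38"`) is left unchanged, because the hyphen already serves as its placeholder.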
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ |   No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | ‌ Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | ‍ Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
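Thai is written without spaces between words, so the zero-width space (`U+200B`) from the additional punctuation table is often used to mark invisible break opportunities. The sketch below assumes the words have already been segmented; dictionary-based word segmentation is outside its scope. It joins the words with `U+200B` and then recovers the break offsets, counted in code points of the visible text. The helper names are hypothetical.

```python
from typing import List

ZWSP = "\u200B"  # ZERO WIDTH SPACE


def join_words(words: List[str]) -> str:
    """Join already-segmented Thai words with invisible break points."""
    return ZWSP.join(words)


def break_opportunities(text: str) -> List[int]:
    """Code-point offsets into the ZWSP-free text where a line may break."""
    offsets = []
    visible = 0  # number of non-ZWSP code points seen so far
    for ch in text:
        if ch == ZWSP:
            offsets.append(visible)
        else:
            visible += 1
    return offsets


# "Thai language", split by hand into two words.
text = join_words(["ภาษา", "ไทย"])
```

Here `break_opportunities(text)` returns `[4]`, a single break opportunity after the four code points of ภาษา. Removing the zero-width spaces restores the visible string ภาษาไทย.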