Thai character tables

This document lists the per-character shaping information needed to shape Thai text.

Thai character table

Thai characters should be classified as shown in the following table. Codepoints in the Thai block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.

Table 81 Thai character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Combining class | PUA | Glyph |
| --- | --- | --- | --- | --- | --- | --- |
| U+0E00 | unassigned | | | | | |
| U+0E01 | Letter | CONSONANT | null | 0 | NC | ก Ko Kai |
| U+0E02 | Letter | CONSONANT | null | 0 | NC | ข Kho Khai |
| U+0E03 | Letter | CONSONANT | null | 0 | NC | ฃ Kho Khuat |
| U+0E04 | Letter | CONSONANT | null | 0 | NC | ค Kho Khwai |
| U+0E05 | Letter | CONSONANT | null | 0 | NC | ฅ Kho Khon |
| U+0E06 | Letter | CONSONANT | null | 0 | NC | ฆ Kho Rakhang |
| U+0E07 | Letter | CONSONANT | null | 0 | NC | ง Ngo Ngu |
| U+0E08 | Letter | CONSONANT | null | 0 | NC | จ Cho Chan |
| U+0E09 | Letter | CONSONANT | null | 0 | NC | ฉ Cho Ching |
| U+0E0A | Letter | CONSONANT | null | 0 | NC | ช Cho Chang |
| U+0E0B | Letter | CONSONANT | null | 0 | NC | ซ So So |
| U+0E0C | Letter | CONSONANT | null | 0 | NC | ฌ Cho Choe |
| U+0E0D | Letter | CONSONANT | null | 0 | RC | ญ Yo Ying |
| U+0E0E | Letter | CONSONANT | null | 0 | DC | ฎ Do Chada |
| U+0E0F | Letter | CONSONANT | null | 0 | DC | ฏ To Patak |
| U+0E10 | Letter | CONSONANT | null | 0 | RC | ฐ Tho Than |
| U+0E11 | Letter | CONSONANT | null | 0 | NC | ฑ Tho Nangmontho |
| U+0E12 | Letter | CONSONANT | null | 0 | NC | ฒ Tho Phuthao |
| U+0E13 | Letter | CONSONANT | null | 0 | NC | ณ No Nen |
| U+0E14 | Letter | CONSONANT | null | 0 | NC | ด Do Dek |
| U+0E15 | Letter | CONSONANT | null | 0 | NC | ต To Tao |
| U+0E16 | Letter | CONSONANT | null | 0 | NC | ถ Tho Thung |
| U+0E17 | Letter | CONSONANT | null | 0 | NC | ท Tho Thahan |
| U+0E18 | Letter | CONSONANT | null | 0 | NC | ธ Tho Thong |
| U+0E19 | Letter | CONSONANT | null | 0 | NC | น No Nu |
| U+0E1A | Letter | CONSONANT | null | 0 | NC | บ Bo Baimai |
| U+0E1B | Letter | CONSONANT | null | 0 | AC | ป Po Pla |
| U+0E1C | Letter | CONSONANT | null | 0 | NC | ผ Pho Phung |
| U+0E1D | Letter | CONSONANT | null | 0 | AC | ฝ Fo Fa |
| U+0E1E | Letter | CONSONANT | null | 0 | NC | พ Pho Phan |
| U+0E1F | Letter | CONSONANT | null | 0 | AC | ฟ Fo Fan |
| U+0E20 | Letter | CONSONANT | null | 0 | NC | ภ Pho Samphao |
| U+0E21 | Letter | CONSONANT | null | 0 | NC | ม Mo Ma |
| U+0E22 | Letter | CONSONANT | null | 0 | NC | ย Yo Yak |
| U+0E23 | Letter | CONSONANT | null | 0 | NC | ร Ro Rua |
| U+0E24 | Letter | CONSONANT | null | 0 | NC | ฤ Ru |
| U+0E25 | Letter | CONSONANT | null | 0 | NC | ล Lo Ling |
| U+0E26 | Letter | CONSONANT | null | 0 | NC | ฦ Lu |
| U+0E27 | Letter | CONSONANT | null | 0 | NC | ว Wo Waen |
| U+0E28 | Letter | CONSONANT | null | 0 | NC | ศ So Sala |
| U+0E29 | Letter | CONSONANT | null | 0 | NC | ษ So Rusi |
| U+0E2A | Letter | CONSONANT | null | 0 | NC | ส So Sua |
| U+0E2B | Letter | CONSONANT | null | 0 | NC | ห Ho Hip |
| U+0E2C | Letter | CONSONANT | null | 0 | NC | ฬ Lo Chula |
| U+0E2D | Letter | CONSONANT | null | 0 | NC | อ O Ang |
| U+0E2E | Letter | CONSONANT | null | 0 | NC | ฮ Ho Nokhuk |
| U+0E2F | Letter | CONSONANT | null | 0 | null | ฯ Paiyannoi |
| U+0E30 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | ะ Sara A |
| U+0E31 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ั Mai Han-akat |
| U+0E32 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | า Sara Aa |
| U+0E33 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | ำ Sara Am |
| U+0E34 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ิ Sara I |
| U+0E35 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ี Sara Ii |
| U+0E36 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ึ Sara Ue |
| U+0E37 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ื Sara Uee |
| U+0E38 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ุ Sara U |
| U+0E39 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ู Sara Uu |
| U+0E3A | Mark [Mn] | PURE_KILLER | BOTTOM_POSITION | 9 | BV | ฺ Phinthu |
| U+0E3B | unassigned | | | | | |
| U+0E3C | unassigned | | | | | |
| U+0E3D | unassigned | | | | | |
| U+0E3E | unassigned | | | | | |
| U+0E3F | Symbol | SYMBOL | null | 0 | null | ฿ Currency symbol Baht |
| U+0E40 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | เ Sara E |
| U+0E41 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | แ Sara Ae |
| U+0E42 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | โ Sara O |
| U+0E43 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | ใ Sara Ai Maimuan |
| U+0E44 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | ไ Sara Ai Maimalai |
| U+0E45 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | ๅ Lakkhangyao |
| U+0E46 | Letter Modifier | null | null | 0 | null | ๆ Maiyamok |
| U+0E47 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ็ Maitaikhu |
| U+0E48 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ่ Mai Ek |
| U+0E49 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ้ Mai Tho |
| U+0E4A | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๊ Mai Tri |
| U+0E4B | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๋ Mai Chattawa |
| U+0E4C | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | 0 | TV | ์ Thanthakhat |
| U+0E4D | Mark [Mn] | BINDU | TOP_POSITION | 0 | AV | ํ Nikhahit |
| U+0E4E | Mark [Mn] | PURE_KILLER | TOP_POSITION | 0 | AV | ๎ Yamakkan |
| U+0E4F | Punctuation | null | null | 0 | null | ๏ Fongman |
| U+0E50 | Number | NUMBER | null | 0 | null | ๐ Digit zero |
| U+0E51 | Number | NUMBER | null | 0 | null | ๑ Digit one |
| U+0E52 | Number | NUMBER | null | 0 | null | ๒ Digit two |
| U+0E53 | Number | NUMBER | null | 0 | null | ๓ Digit three |
| U+0E54 | Number | NUMBER | null | 0 | null | ๔ Digit four |
| U+0E55 | Number | NUMBER | null | 0 | null | ๕ Digit five |
| U+0E56 | Number | NUMBER | null | 0 | null | ๖ Digit six |
| U+0E57 | Number | NUMBER | null | 0 | null | ๗ Digit seven |
| U+0E58 | Number | NUMBER | null | 0 | null | ๘ Digit eight |
| U+0E59 | Number | NUMBER | null | 0 | null | ๙ Digit nine |
| U+0E5A | Punctuation | null | null | 0 | null | ๚ Angkhankhu |
| U+0E5B | Punctuation | null | null | 0 | null | ๛ Khomut |
| U+0E5C | unassigned | | | | | |
| U+0E5D | unassigned | | | | | |
| U+0E5E | unassigned | | | | | |
| U+0E5F | unassigned | | | | | |
| U+0E60 | unassigned | | | | | |
| U+0E61 | unassigned | | | | | |
| U+0E62 | unassigned | | | | | |
| U+0E63 | unassigned | | | | | |
| U+0E64 | unassigned | | | | | |
| U+0E65 | unassigned | | | | | |
| U+0E66 | unassigned | | | | | |
| U+0E67 | unassigned | | | | | |
| U+0E68 | unassigned | | | | | |
| U+0E69 | unassigned | | | | | |
| U+0E6A | unassigned | | | | | |
| U+0E6B | unassigned | | | | | |
| U+0E6C | unassigned | | | | | |
| U+0E6D | unassigned | | | | | |
| U+0E6E | unassigned | | | | | |
| U+0E6F | unassigned | | | | | |
| U+0E70 | unassigned | | | | | |
| U+0E71 | unassigned | | | | | |
| U+0E72 | unassigned | | | | | |
| U+0E73 | unassigned | | | | | |
| U+0E74 | unassigned | | | | | |
| U+0E75 | unassigned | | | | | |
| U+0E76 | unassigned | | | | | |
| U+0E77 | unassigned | | | | | |
| U+0E78 | unassigned | | | | | |
| U+0E79 | unassigned | | | | | |
| U+0E7A | unassigned | | | | | |
| U+0E7B | unassigned | | | | | |
| U+0E7C | unassigned | | | | | |
| U+0E7D | unassigned | | | | | |
| U+0E7E | unassigned | | | | | |
| U+0E7F | unassigned | | | | | |
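
As a rough illustration of how a shaping engine might consume the Thai character table, the following Python sketch stores a handful of its rows in a dictionary keyed by codepoint. The names `ThaiEntry`, `THAI_TABLE`, and `classify` are hypothetical and do not come from any particular shaping engine, and only a few sample rows are included. The standard-library `unicodedata` module is used to read the canonical combining class independently of the table.

```python
import unicodedata
from typing import NamedTuple, Optional


class ThaiEntry(NamedTuple):
    """One row of the character table (hypothetical structure)."""
    shaping_class: Optional[str]
    mark_placement: Optional[str]
    pua_class: Optional[str]


# A few sample rows copied from the table; a complete implementation
# would cover every assigned codepoint in U+0E00..U+0E7F.
THAI_TABLE = {
    0x0E01: ThaiEntry("CONSONANT", None, "NC"),
    0x0E0D: ThaiEntry("CONSONANT", None, "RC"),
    0x0E1B: ThaiEntry("CONSONANT", None, "AC"),
    0x0E31: ThaiEntry("VOWEL_DEPENDENT", "TOP_POSITION", "AV"),
    0x0E3A: ThaiEntry("PURE_KILLER", "BOTTOM_POSITION", "BV"),
    0x0E48: ThaiEntry("TONE_MARKER", "TOP_POSITION", "TV"),
    0x0E4F: ThaiEntry(None, None, None),   # Fongman: no special behavior
    0x0E50: ThaiEntry("NUMBER", None, None),
}


def classify(ch: str) -> Optional[ThaiEntry]:
    """Return the table row for ch, or None if it is not in the sample."""
    return THAI_TABLE.get(ord(ch))


# The Shaping class comes from the table, not from the General Category:
# Mai Ek (U+0E48) is a non-spacing mark that the table tags as TONE_MARKER.
mai_ek = "\u0E48"
print(unicodedata.category(mai_ek), classify(mai_ek).shaping_class)
print(unicodedata.combining(mai_ek))
```

A lookup that returns `None` here only means the codepoint is missing from the sample dictionary; it says nothing about whether the codepoint is unassigned.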

Miscellaneous character table

In addition to general punctuation, runs of Thai text often use the combining macron below (U+0331) and combining tilde (U+0303) from the Combining Diacritical Marks block, as well as the modifier letter apostrophe (U+02BC) and modifier letter minus sign (U+02D7) from the Spacing Modifier Letters block, particularly when used to write minority languages.

In addition, Thai text typically does not insert spaces between words. Consequently, the Zero-Width Space (U+200B) character is often used to insert invisible break points that may be converted to line breaks.
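
To make the role of U+200B concrete, here is a minimal Python sketch of a greedy line wrapper that breaks only at zero-width spaces. The function name `wrap_at_zwsp` is hypothetical, and measuring line length by counting codepoints is a simplifying assumption; a real layout engine would measure shaped glyph advances instead.

```python
from typing import List

ZWSP = "\u200b"  # zero-width space


def wrap_at_zwsp(text: str, max_len: int) -> List[str]:
    """Greedily fill lines, breaking only where a U+200B occurs.

    Length is approximated by the number of codepoints (an assumption
    made for this sketch; combining marks are counted like any other
    codepoint).
    """
    lines: List[str] = []
    current = ""
    for segment in text.split(ZWSP):
        if current and len(current) + len(segment) > max_len:
            # The segment does not fit: turn the break point into a line break.
            lines.append(current)
            current = segment
        else:
            # The segment fits: the break point stays invisible.
            current += segment
    if current:
        lines.append(current)
    return lines


# Two words separated by an invisible break point.
print(wrap_at_zwsp("ภาษา\u200bไทย", 4))
```

Break points that are not needed are dropped from the output, which mirrors how an unused U+200B remains invisible in rendered text.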

Table 82 Additional punctuation character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
| --- | --- | --- | --- | --- |
| U+02BC | Mark [Mn] | TONE_MARKER | TOP_POSITION | ʼ Modifier apostrophe |
| U+02D7 | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ˗ Modifier minus sign |
| U+0303 | Mark [Mn] | TONE_MARKER | TOP_POSITION | ̃ Combining tilde |
| U+0331 | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ̱ Combining macron below |
| U+200B | Separator | PLACEHOLDER | null | ​ Zero-width space |

Other important characters that may be encountered when shaping runs of Thai text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Table 83 Miscellaneous character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
| --- | --- | --- | --- | --- |
| U+00A0 | Separator | PLACEHOLDER | null |   No-break space |
| U+200C | Other | NON_JOINER | null | ‌ Zero-width non-joiner |
| U+200D | Other | JOINER | null | ‍ Zero-width joiner |
| U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
| U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
| U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
| U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
| U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
| U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
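
The placeholder behavior described above can be sketched in Python as follows. This is a deliberately simplified illustration, not the procedure of any particular shaping engine: the function name `insert_dotted_circles` is hypothetical, any character in a Unicode Letter category is assumed to be a valid base, and the placeholder set is limited to the PLACEHOLDER and DOTTED_CIRCLE codepoints from the miscellaneous character table.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"

# PLACEHOLDER and DOTTED_CIRCLE codepoints from the miscellaneous table.
PLACEHOLDERS = {0x00A0, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x25CC}


def insert_dotted_circles(text: str) -> str:
    """Insert U+25CC before any combining mark that has no base.

    Simplifying assumption: letters and placeholder characters can carry
    marks; everything else (spaces, digits, most punctuation) cannot.
    """
    out = []
    has_base = False
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            if not has_base:
                # Orphaned mark: supply a dotted circle as its base.
                out.append(DOTTED_CIRCLE)
                has_base = True
        else:
            has_base = category.startswith("L") or ord(ch) in PLACEHOLDERS
        out.append(ch)
    return "".join(out)


# Mai Ek alone gains a dotted circle; after Ko Kai it is left untouched.
print(insert_dotted_circles("\u0E48") == "\u25cc\u0E48")
print(insert_dotted_circles("\u0E01\u0E48") == "\u0E01\u0E48")
```

Because a hyphen or no-break space already counts as a base in this sketch, text that uses those characters as placeholders passes through unchanged.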