Lao character tables

This document lists the per-character shaping information needed to shape Lao text.


Lao character table

Lao glyphs should be classified as in the following table. Codepoints in the Lao block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
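The [Mn] and [Mc] tags correspond to the Unicode General Category values Mn and Mc. As a minimal sketch (not part of any shaping engine), Python's standard `unicodedata` module can report both the General Category and the canonical combining class of a codepoint. The helper name `mark_kind` is hypothetical, and the values returned depend on the Unicode version bundled with the Python build in use.

```python
import unicodedata

def mark_kind(ch):
    """Return 'non-spacing', 'spacing-combining', or None for non-marks."""
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return None

# Print category, combining class, and mark kind for a few Lao codepoints.
for cp in (0x0E81, 0x0EB1, 0x0EB8, 0x0EC8):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.category(ch),
          unicodedata.combining(ch), mark_kind(ch))
```

Every mark listed in the Lao character table is tagged [Mn], so `mark_kind` should not report a spacing-combining mark for this block.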

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
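One way to picture this precedence is a lookup that consults the Shaping class first. The sketch below copies only a handful of rows from the Lao character table; the names `LAO_SHAPING_CLASS`, `classify`, and `effective_class` are hypothetical, and falling back to the General Category when no Shaping class is listed is an assumption made for illustration, not a rule stated in this document.

```python
import unicodedata

# Excerpt of the Lao character table: codepoint -> Shaping class.
# Only a few rows are copied here for illustration.
LAO_SHAPING_CLASS = {
    0x0E81: "CONSONANT",
    0x0EB0: "VOWEL_DEPENDENT",
    0x0EBA: "VIRAMA",
    0x0EBD: "CONSONANT_MEDIAL",
    0x0EC8: "TONE_MARKER",
    0x0ECD: "BINDU",
    0x0ED0: "NUMBER",
}

def classify(ch):
    """Return (shaping_class, general_category) for one character.

    shaping_class is None when the codepoint is not in the excerpt above.
    """
    return LAO_SHAPING_CLASS.get(ord(ch)), unicodedata.category(ch)

def effective_class(ch):
    """Prefer the Shaping class; otherwise fall back to the General Category
    (the fallback is an assumption of this sketch)."""
    shaping_class, general_category = classify(ch)
    return shaping_class if shaping_class is not None else general_category

# U+0EB0 is a Letter (Lo) in Unicode but is shaped as a dependent vowel.
print(classify("\u0eb0"))
```

Here U+0EB0 Sign A illustrates the mismatch: its General Category is a letter, yet the table assigns it the VOWEL_DEPENDENT Shaping class.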

Table 42 Lao character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Combining class | PUA | Glyph |
| --- | --- | --- | --- | --- | --- | --- |
| U+0E80 | unassigned | | | | | |
| U+0E81 | Letter | CONSONANT | null | 0 | null | ກ Ko |
| U+0E82 | Letter | CONSONANT | null | 0 | null | ຂ Kho Sung |
| U+0E83 | unassigned | | | | | |
| U+0E84 | Letter | CONSONANT | null | 0 | null | ຄ Kho Tam |
| U+0E85 | unassigned | | | | | |
| U+0E86 | Letter | CONSONANT | null | 0 | null | ຆ Pali Gha |
| U+0E87 | Letter | CONSONANT | null | 0 | null | ງ Ngo |
| U+0E88 | Letter | CONSONANT | null | 0 | null | ຈ Co |
| U+0E89 | Letter | CONSONANT | null | 0 | null | ຉ Pali Cha |
| U+0E8A | Letter | CONSONANT | null | 0 | null | ຊ So Tam |
| U+0E8B | unassigned | | | | | |
| U+0E8C | Letter | CONSONANT | null | 0 | null | ຌ Pali Jha |
| U+0E8D | Letter | CONSONANT | null | 0 | null | ຍ Nyo |
| U+0E8E | Letter | CONSONANT | null | 0 | null | ຎ Pali Nya |
| U+0E8F | Letter | CONSONANT | null | 0 | null | ຏ Pali Tta |
| U+0E90 | Letter | CONSONANT | null | 0 | null | ຐ Pali Ttha |
| U+0E91 | Letter | CONSONANT | null | 0 | null | ຑ Pali Dda |
| U+0E92 | Letter | CONSONANT | null | 0 | null | ຒ Pali Ddha |
| U+0E93 | Letter | CONSONANT | null | 0 | null | ຓ Pali Nna |
| U+0E94 | Letter | CONSONANT | null | 0 | null | ດ Do |
| U+0E95 | Letter | CONSONANT | null | 0 | null | ຕ To |
| U+0E96 | Letter | CONSONANT | null | 0 | null | ຖ Tho Sung |
| U+0E97 | Letter | CONSONANT | null | 0 | null | ທ Tho Tam |
| U+0E98 | Letter | CONSONANT | null | 0 | null | ຘ Pali Dha |
| U+0E99 | Letter | CONSONANT | null | 0 | null | ນ No |
| U+0E9A | Letter | CONSONANT | null | 0 | null | ບ Bo |
| U+0E9B | Letter | CONSONANT | null | 0 | null | ປ Po |
| U+0E9C | Letter | CONSONANT | null | 0 | null | ຜ Pho Sung |
| U+0E9D | Letter | CONSONANT | null | 0 | null | ຝ Fo Tam |
| U+0E9E | Letter | CONSONANT | null | 0 | null | ພ Pho Tam |
| U+0E9F | Letter | CONSONANT | null | 0 | null | ຟ Fo Sung |
| U+0EA0 | Letter | CONSONANT | null | 0 | null | ຠ Pali Bha |
| U+0EA1 | Letter | CONSONANT | null | 0 | null | ມ Mo |
| U+0EA2 | Letter | CONSONANT | null | 0 | null | ຢ Yo |
| U+0EA3 | Letter | CONSONANT | null | 0 | null | ຣ Lo Ling |
| U+0EA4 | unassigned | | | | | |
| U+0EA5 | Letter | CONSONANT | null | 0 | null | ລ Lo Loot |
| U+0EA6 | unassigned | | | | | |
| U+0EA7 | Letter | CONSONANT | null | 0 | null | ວ Wo |
| U+0EA8 | Letter | CONSONANT | null | 0 | null | ຨ Sanskrit Sha |
| U+0EA9 | Letter | CONSONANT | null | 0 | null | ຩ Sanskrit Ssa |
| U+0EAA | Letter | CONSONANT | null | 0 | null | ສ So Sung |
| U+0EAB | Letter | CONSONANT | null | 0 | null | ຫ Ho Sung |
| U+0EAC | Letter | CONSONANT | null | 0 | null | ຬ Pali Lla |
| U+0EAD | Letter | CONSONANT | null | 0 | null | ອ O |
| U+0EAE | Letter | CONSONANT | null | 0 | null | ຮ Ho Tam |
| U+0EAF | Letter | null | null | 0 | null | ຯ Ellipsis |
| U+0EB0 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | ະ Sign A |
| U+0EB1 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ັ Sign Mai Kan |
| U+0EB2 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | າ Sign Aa |
| U+0EB3 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | ຳ Sign Am |
| U+0EB4 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ິ Sign I |
| U+0EB5 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ີ Sign Ii |
| U+0EB6 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ຶ Sign Y |
| U+0EB7 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ື Sign Yy |
| U+0EB8 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 118 | null | ຸ Sign U |
| U+0EB9 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 118 | null | ູ Sign Uu |
| U+0EBA | Mark [Mn] | VIRAMA | BOTTOM_POSITION | 9 | null | ຺ Pali Virama |
| U+0EBB | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ົ Sign Mai Kon |
| U+0EBC | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | 0 | null | ຼ Semivowel Sign Lo |
| U+0EBD | Letter | CONSONANT_MEDIAL | null | 0 | null | ຽ Semivowel Sign Nyo |
| U+0EBE | unassigned | | | | | |
| U+0EBF | unassigned | | | | | |
| U+0EC0 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ເ Sign E |
| U+0EC1 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ແ Sign Ei |
| U+0EC2 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ໂ Sign O |
| U+0EC3 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ໃ Sign Ay |
| U+0EC4 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ໄ Sign Ai |
| U+0EC5 | unassigned | | | | | |
| U+0EC6 | Letter Modifier | null | null | 0 | null | ໆ Ko La |
| U+0EC7 | unassigned | | | | | |
| U+0EC8 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ່ Tone Mai Ek |
| U+0EC9 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ້ Tone Mai Tho |
| U+0ECA | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ໊ Tone Mai Ti |
| U+0ECB | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ໋ Tone Mai Catawa |
| U+0ECC | Mark [Mn] | null | TOP_POSITION | 0 | null | ໌ Cancellation mark |
| U+0ECD | Mark [Mn] | BINDU | TOP_POSITION | 0 | null | ໍ Niggahita |
| U+0ECE | Mark [Mn] | TONE_MARKER | TOP_POSITION | 0 | null | ໎ Yamakkan |
| U+0ECF | unassigned | | | | | |
| U+0ED0 | Number | NUMBER | null | 0 | null | ໐ Digit Zero |
| U+0ED1 | Number | NUMBER | null | 0 | null | ໑ Digit One |
| U+0ED2 | Number | NUMBER | null | 0 | null | ໒ Digit Two |
| U+0ED3 | Number | NUMBER | null | 0 | null | ໓ Digit Three |
| U+0ED4 | Number | NUMBER | null | 0 | null | ໔ Digit Four |
| U+0ED5 | Number | NUMBER | null | 0 | null | ໕ Digit Five |
| U+0ED6 | Number | NUMBER | null | 0 | null | ໖ Digit Six |
| U+0ED7 | Number | NUMBER | null | 0 | null | ໗ Digit Seven |
| U+0ED8 | Number | NUMBER | null | 0 | null | ໘ Digit Eight |
| U+0ED9 | Number | NUMBER | null | 0 | null | ໙ Digit Nine |
| U+0EDA | unassigned | | | | | |
| U+0EDB | unassigned | | | | | |
| U+0EDC | Letter | CONSONANT | null | 0 | null | ໜ Ho No |
| U+0EDD | Letter | CONSONANT | null | 0 | null | ໝ Ho Mo |
| U+0EDE | Letter | CONSONANT | null | 0 | null | ໞ Khmu Go |
| U+0EDF | Letter | CONSONANT | null | 0 | null | ໟ Khmu Nyo |
| U+0EE0 | unassigned | | | | | |
| U+0EE1 | unassigned | | | | | |
| U+0EE2 | unassigned | | | | | |
| U+0EE3 | unassigned | | | | | |
| U+0EE4 | unassigned | | | | | |
| U+0EE5 | unassigned | | | | | |
| U+0EE6 | unassigned | | | | | |
| U+0EE7 | unassigned | | | | | |
| U+0EE8 | unassigned | | | | | |
| U+0EE9 | unassigned | | | | | |
| U+0EEA | unassigned | | | | | |
| U+0EEB | unassigned | | | | | |
| U+0EEC | unassigned | | | | | |
| U+0EED | unassigned | | | | | |
| U+0EEE | unassigned | | | | | |
| U+0EEF | unassigned | | | | | |
| U+0EF0 | unassigned | | | | | |
| U+0EF1 | unassigned | | | | | |
| U+0EF2 | unassigned | | | | | |
| U+0EF3 | unassigned | | | | | |
| U+0EF4 | unassigned | | | | | |
| U+0EF5 | unassigned | | | | | |
| U+0EF6 | unassigned | | | | | |
| U+0EF7 | unassigned | | | | | |
| U+0EF8 | unassigned | | | | | |
| U+0EF9 | unassigned | | | | | |
| U+0EFA | unassigned | | | | | |
| U+0EFB | unassigned | | | | | |
| U+0EFC | unassigned | | | | | |
| U+0EFD | unassigned | | | | | |
| U+0EFE | unassigned | | | | | |
| U+0EFF | unassigned | | | | | |

Miscellaneous character table

Runs of Lao text typically do not insert spaces between words. Consequently, in addition to general punctuation, the Zero-Width Space (U+200B) character is often used to insert invisible break points that may be converted to line breaks.
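As a rough illustration of how these invisible break points can be located, the following sketch scans a string for U+200B. The function names `break_offsets` and `segments` are hypothetical, and a real line breaker would weigh many other factors; this only shows the role the character plays.

```python
ZWSP = "\u200b"  # Zero-Width Space

def break_offsets(text):
    """Return the string indices at which a Zero-Width Space occurs."""
    return [i for i, ch in enumerate(text) if ch == ZWSP]

def segments(text):
    """Split text at Zero-Width Spaces, dropping empty pieces."""
    return [part for part in text.split(ZWSP) if part]

# Two Lao words separated only by an invisible break point.
sample = "\u0eaa\u0eb0\u0e9a\u0eb2\u0e8d" + ZWSP + "\u0e94\u0eb5"
print(break_offsets(sample))
print(len(segments(sample)))
```

Each returned segment is a candidate unit that a layout engine could keep together when choosing where to wrap a line.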

Table 43 Additional punctuation character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
| --- | --- | --- | --- | --- |
| U+200B | Separator | PLACEHOLDER | null | ​ Zero-width space |

Other important characters that may be encountered when shaping runs of Lao text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
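The placeholder behavior described above can be sketched as a simple membership test. The set below is drawn from the PLACEHOLDER and DOTTED_CIRCLE rows of Table 44; the names `PLACEHOLDER_BASES`, `on_dotted_circle`, and `starts_with_placeholder` are hypothetical, and treating only the first character of a syllable as the base is a simplifying assumption.

```python
DOTTED_CIRCLE = "\u25cc"

# No-break space, dotted circle, and the hyphens and dashes U+2010..U+2014.
PLACEHOLDER_BASES = frozenset(
    ["\u00a0", DOTTED_CIRCLE] + [chr(cp) for cp in range(0x2010, 0x2015)]
)

def on_dotted_circle(sign):
    """Prefix a dependent vowel or combining mark with a dotted circle."""
    return DOTTED_CIRCLE + sign

def starts_with_placeholder(syllable):
    """Report whether the syllable's first character is a placeholder base."""
    return bool(syllable) and syllable[0] in PLACEHOLDER_BASES

# Sign I (U+0EB4) shown in isolation, and the same mark typed on an en dash.
print(on_dotted_circle("\u0eb4"))
print(starts_with_placeholder("\u2013\u0eb4"))
```

A shaping engine that recognizes these bases can treat a mark typed after a hyphen or dash the same way it treats a mark on a dotted circle.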

Table 44 Miscellaneous character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
| --- | --- | --- | --- | --- |
| U+00A0 | Separator | PLACEHOLDER | null |   No-break space |
| U+200C | Other | NON_JOINER | null | ‌ Zero-width non-joiner |
| U+200D | Other | JOINER | null | ‍ Zero-width joiner |
| U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
| U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
| U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
| U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
| U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
| U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |