Sinhala character tables

This document lists the per-character shaping information needed to shape Sinhala text.

Sinhala character table

Sinhala characters should be classified as in the following table. Codepoints in the Sinhala block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this group includes some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
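As a rough illustration of the [Mn] and [Mc] distinction, the following Python sketch reads the Unicode General Category through the standard `unicodedata` module. The function name `mark_kind` is hypothetical, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata
from typing import Optional


def mark_kind(ch: str) -> Optional[str]:
    """Return 'non-spacing' for Mn, 'spacing-combining' for Mc, else None."""
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    # Non-mark codepoints have no mark-placement behavior.
    return None


# U+0DCA (Virama) is tagged [Mn]; U+0DCF (Sign Aa) is tagged [Mc].
print(mark_kind("\u0DCA"), mark_kind("\u0DCF"), mark_kind("\u0D9A"))
```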

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
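One way to model this precedence is a lookup that consults a shaping-class table first and falls back to the General Category only when the table has no entry. The sketch below is illustrative only: `SHAPING_CLASS` holds a handful of rows copied from the tables in this document, and `effective_class` is a hypothetical helper name, not part of any shaping engine's API.

```python
import unicodedata

# A few rows copied from the tables in this document; a real engine
# would carry every assigned codepoint.
SHAPING_CLASS = {
    0x0D82: "BINDU",
    0x0D85: "VOWEL_INDEPENDENT",
    0x0D9A: "CONSONANT",
    0x0DCA: "VIRAMA",
    0x0DD9: "VOWEL_DEPENDENT",
    0x0DE6: "NUMBER",
    0x00A0: "PLACEHOLDER",
    0x25CC: "DOTTED_CIRCLE",
}


def effective_class(cp: int) -> str:
    """Prefer the script-aware shaping class; fall back to the General Category."""
    shaping_class = SHAPING_CLASS.get(cp)
    if shaping_class is not None:
        return shaping_class
    return unicodedata.category(chr(cp))
```

For example, U+00A0 has the General Category Zs (a separator), but the lookup returns PLACEHOLDER because the shaping class wins. A codepoint with a null shaping class, such as U+0DF4, falls through to its General Category.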

Table 64 Sinhala character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+0D80 | unassigned | | | |
| U+0D81 | Mark [Mn] | BINDU | TOP_POSITION | ඁ Candrabindu |
| U+0D82 | Mark [Mc] | BINDU | RIGHT_POSITION | ං Anusvara |
| U+0D83 | Mark [Mc] | VISARGA | RIGHT_POSITION | ඃ Visarga |
| U+0D84 | unassigned | | | |
| U+0D85 | Letter | VOWEL_INDEPENDENT | null | අ A |
| U+0D86 | Letter | VOWEL_INDEPENDENT | null | ආ Aa |
| U+0D87 | Letter | VOWEL_INDEPENDENT | null | ඇ Ae |
| U+0D88 | Letter | VOWEL_INDEPENDENT | null | ඈ Aae |
| U+0D89 | Letter | VOWEL_INDEPENDENT | null | ඉ I |
| U+0D8A | Letter | VOWEL_INDEPENDENT | null | ඊ Ii |
| U+0D8B | Letter | VOWEL_INDEPENDENT | null | උ U |
| U+0D8C | Letter | VOWEL_INDEPENDENT | null | ඌ Uu |
| U+0D8D | Letter | VOWEL_INDEPENDENT | null | ඍ Vocalic R |
| U+0D8E | Letter | VOWEL_INDEPENDENT | null | ඎ Vocalic Rr |
| U+0D8F | Letter | VOWEL_INDEPENDENT | null | ඏ Vocalic L |
| U+0D90 | Letter | VOWEL_INDEPENDENT | null | ඐ Vocalic Ll |
| U+0D91 | Letter | VOWEL_INDEPENDENT | null | එ E |
| U+0D92 | Letter | VOWEL_INDEPENDENT | null | ඒ Ee |
| U+0D93 | Letter | VOWEL_INDEPENDENT | null | ඓ Ai |
| U+0D94 | Letter | VOWEL_INDEPENDENT | null | ඔ O |
| U+0D95 | Letter | VOWEL_INDEPENDENT | null | ඕ Oo |
| U+0D96 | Letter | VOWEL_INDEPENDENT | null | ඖ Au |
| U+0D97 | unassigned | | | |
| U+0D98 | unassigned | | | |
| U+0D99 | unassigned | | | |
| U+0D9A | Letter | CONSONANT | null | ක Ka |
| U+0D9B | Letter | CONSONANT | null | ඛ Kha |
| U+0D9C | Letter | CONSONANT | null | ග Ga |
| U+0D9D | Letter | CONSONANT | null | ඝ Gha |
| U+0D9E | Letter | CONSONANT | null | ඞ Nga |
| U+0D9F | Letter | CONSONANT | null | ඟ Nnga |
| U+0DA0 | Letter | CONSONANT | null | ච Ca |
| U+0DA1 | Letter | CONSONANT | null | ඡ Cha |
| U+0DA2 | Letter | CONSONANT | null | ජ Ja |
| U+0DA3 | Letter | CONSONANT | null | ඣ Jha |
| U+0DA4 | Letter | CONSONANT | null | ඤ Nya |
| U+0DA5 | Letter | CONSONANT | null | ඥ Jnya |
| U+0DA6 | Letter | CONSONANT | null | ඦ Nyja |
| U+0DA7 | Letter | CONSONANT | null | ට Tta |
| U+0DA8 | Letter | CONSONANT | null | ඨ Ttha |
| U+0DA9 | Letter | CONSONANT | null | ඩ Dda |
| U+0DAA | Letter | CONSONANT | null | ඪ Ddha |
| U+0DAB | Letter | CONSONANT | null | ණ Nna |
| U+0DAC | Letter | CONSONANT | null | ඬ Nndda |
| U+0DAD | Letter | CONSONANT | null | ත Ta |
| U+0DAE | Letter | CONSONANT | null | ථ Tha |
| U+0DAF | Letter | CONSONANT | null | ද Da |
| U+0DB0 | Letter | CONSONANT | null | ධ Dha |
| U+0DB1 | Letter | CONSONANT | null | න Na |
| U+0DB2 | unassigned | | | |
| U+0DB3 | Letter | CONSONANT | null | ඳ Nda |
| U+0DB4 | Letter | CONSONANT | null | ප Pa |
| U+0DB5 | Letter | CONSONANT | null | ඵ Pha |
| U+0DB6 | Letter | CONSONANT | null | බ Ba |
| U+0DB7 | Letter | CONSONANT | null | භ Bha |
| U+0DB8 | Letter | CONSONANT | null | ම Ma |
| U+0DB9 | Letter | CONSONANT | null | ඹ Mba |
| U+0DBA | Letter | CONSONANT | null | ය Ya |
| U+0DBB | Letter | CONSONANT | null | ර Ra |
| U+0DBC | unassigned | | | |
| U+0DBD | Letter | CONSONANT | null | ල La |
| U+0DBE | unassigned | | | |
| U+0DBF | unassigned | | | |
| U+0DC0 | Letter | CONSONANT | null | ව Va |
| U+0DC1 | Letter | CONSONANT | null | ශ Sha |
| U+0DC2 | Letter | CONSONANT | null | ෂ Ssa |
| U+0DC3 | Letter | CONSONANT | null | ස Sa |
| U+0DC4 | Letter | CONSONANT | null | හ Ha |
| U+0DC5 | Letter | CONSONANT | null | ළ Lla |
| U+0DC6 | Letter | CONSONANT | null | ෆ Fa |
| U+0DC7 | unassigned | | | |
| U+0DC8 | unassigned | | | |
| U+0DC9 | unassigned | | | |
| U+0DCA | Mark [Mn] | VIRAMA | TOP_POSITION | ් Virama |
| U+0DCB | unassigned | | | |
| U+0DCC | unassigned | | | |
| U+0DCD | unassigned | | | |
| U+0DCE | unassigned | | | |
| U+0DCF | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ා Sign Aa |
| U+0DD0 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ැ Sign Ae |
| U+0DD1 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෑ Sign Aae |
| U+0DD2 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ි Sign I |
| U+0DD3 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ී Sign Ii |
| U+0DD4 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ු Sign U |
| U+0DD5 | unassigned | | | |
| U+0DD6 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ූ Sign Uu |
| U+0DD7 | unassigned | | | |
| U+0DD8 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෘ Sign Vocalic R |
| U+0DD9 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ෙ Sign E |
| U+0DDA | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_LEFT_POSITION | ේ Sign Ee |
| U+0DDB | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ෛ Sign Ai |
| U+0DDC | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ො Sign O |
| U+0DDD | Mark [Mc] | VOWEL_DEPENDENT | TOP_LEFT_AND_RIGHT_POSITION | ෝ Sign Oo |
| U+0DDE | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ෞ Sign Au |
| U+0DDF | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෟ Sign Vocalic L |
| U+0DE0 | unassigned | | | |
| U+0DE1 | unassigned | | | |
| U+0DE2 | unassigned | | | |
| U+0DE3 | unassigned | | | |
| U+0DE4 | unassigned | | | |
| U+0DE5 | unassigned | | | |
| U+0DE6 | Number | NUMBER | null | ෦ Digit Zero |
| U+0DE7 | Number | NUMBER | null | ෧ Digit One |
| U+0DE8 | Number | NUMBER | null | ෨ Digit Two |
| U+0DE9 | Number | NUMBER | null | ෩ Digit Three |
| U+0DEA | Number | NUMBER | null | ෪ Digit Four |
| U+0DEB | Number | NUMBER | null | ෫ Digit Five |
| U+0DEC | Number | NUMBER | null | ෬ Digit Six |
| U+0DED | Number | NUMBER | null | ෭ Digit Seven |
| U+0DEE | Number | NUMBER | null | ෮ Digit Eight |
| U+0DEF | Number | NUMBER | null | ෯ Digit Nine |
| U+0DF0 | unassigned | | | |
| U+0DF1 | unassigned | | | |
| U+0DF2 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෲ Sign Vocalic Rr |
| U+0DF3 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෳ Sign Vocalic Ll |
| U+0DF4 | Punctuation | null | null | ෴ Kunddaliya |
| U+0DF5 | unassigned | | | |
| U+0DF6 | unassigned | | | |
| U+0DF7 | unassigned | | | |
| U+0DF8 | unassigned | | | |
| U+0DF9 | unassigned | | | |
| U+0DFA | unassigned | | | |
| U+0DFB | unassigned | | | |
| U+0DFC | unassigned | | | |
| U+0DFD | unassigned | | | |
| U+0DFE | unassigned | | | |
| U+0DFF | unassigned | | | |

Sinhala Archaic Numbers character table

Sinhala text runs may also include glyphs from the Sinhala Archaic Numbers block. These characters should be classified as follows.

Table 65 Sinhala Archaic Numbers character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+111E0 | unassigned | | | |
| U+111E1 | Number | NUMBER | null | 𑇡 Archaic Digit One |
| U+111E2 | Number | NUMBER | null | 𑇢 Archaic Digit Two |
| U+111E3 | Number | NUMBER | null | 𑇣 Archaic Digit Three |
| U+111E4 | Number | NUMBER | null | 𑇤 Archaic Digit Four |
| U+111E5 | Number | NUMBER | null | 𑇥 Archaic Digit Five |
| U+111E6 | Number | NUMBER | null | 𑇦 Archaic Digit Six |
| U+111E7 | Number | NUMBER | null | 𑇧 Archaic Digit Seven |
| U+111E8 | Number | NUMBER | null | 𑇨 Archaic Digit Eight |
| U+111E9 | Number | NUMBER | null | 𑇩 Archaic Digit Nine |
| U+111EA | Number | NUMBER | null | 𑇪 Archaic Number Ten |
| U+111EB | Number | NUMBER | null | 𑇫 Archaic Number 20 |
| U+111EC | Number | NUMBER | null | 𑇬 Archaic Number 30 |
| U+111ED | Number | NUMBER | null | 𑇭 Archaic Number 40 |
| U+111EE | Number | NUMBER | null | 𑇮 Archaic Number 50 |
| U+111EF | Number | NUMBER | null | 𑇯 Archaic Number 60 |
| U+111F0 | Number | NUMBER | null | 𑇰 Archaic Number 70 |
| U+111F1 | Number | NUMBER | null | 𑇱 Archaic Number 80 |
| U+111F2 | Number | NUMBER | null | 𑇲 Archaic Number 90 |
| U+111F3 | Number | NUMBER | null | 𑇳 Archaic Number 100 |
| U+111F4 | Number | NUMBER | null | 𑇴 Archaic Number 1000 |
| U+111F5 | unassigned | | | |
| U+111F6 | unassigned | | | |
| U+111F7 | unassigned | | | |
| U+111F8 | unassigned | | | |
| U+111F9 | unassigned | | | |
| U+111FA | unassigned | | | |
| U+111FB | unassigned | | | |
| U+111FC | unassigned | | | |
| U+111FD | unassigned | | | |
| U+111FE | unassigned | | | |
| U+111FF | unassigned | | | |
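Because every assigned codepoint in the Sinhala Archaic Numbers table carries the NUMBER class, a classifier can cover the block with a single range check. The following Python sketch assumes the assigned range U+111E1 through U+111F4 listed above; the name `archaic_number_class` is hypothetical.

```python
from typing import Optional

# Assigned range taken from the Sinhala Archaic Numbers table above.
ARCHAIC_FIRST = 0x111E1
ARCHAIC_LAST = 0x111F4


def archaic_number_class(cp: int) -> Optional[str]:
    """Return 'NUMBER' for assigned archaic numbers, None otherwise."""
    if ARCHAIC_FIRST <= cp <= ARCHAIC_LAST:
        return "NUMBER"
    return None
```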

Vedic Extensions character table

Sanskrit runs written in the Sinhala script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

Table 66 Vedic Extensions character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+1CD0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
| U+1CD1 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
| U+1CD2 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
| U+1CD3 | Punctuation | null | null | ᳓ Sign Nihshvasa |
| U+1CD4 | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
| U+1CD5 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
| U+1CD6 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
| U+1CD7 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
| U+1CD8 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
| U+1CD9 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
| U+1CDA | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
| U+1CDB | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
| U+1CDC | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
| U+1CDD | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
| U+1CDE | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
| U+1CDF | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| U+1CE0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
| U+1CE1 | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
| U+1CE2 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
| U+1CE3 | Mark [Mn] | null | OVERSTRUCK | ᳣ Sign Visarga Udatta |
| U+1CE4 | Mark [Mn] | null | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
| U+1CE5 | Mark [Mn] | null | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
| U+1CE6 | Mark [Mn] | null | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
| U+1CE7 | Mark [Mn] | null | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
| U+1CE8 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
| U+1CE9 | Letter | SYMBOL | null | ᳩ Sign Anusvara Antargomukha |
| U+1CEA | Letter | null | null | ᳪ Sign Anusvara Bahirgomukha |
| U+1CEB | Letter | null | null | ᳫ Sign Anusvara Vamagomukha |
| U+1CEC | Letter | SYMBOL | null | ᳬ Sign Anusvara Vamagomukha With Tail |
| U+1CED | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
| U+1CEE | Letter | SYMBOL | null | ᳮ Sign Hexiform Long Anusvara |
| U+1CEF | Letter | null | null | ᳯ Sign Long Anusvara |
| U+1CF0 | Letter | null | null | ᳰ Sign Rthang Long Anusvara |
| U+1CF2 | Letter | CONSONANT_DEAD | null | ᳲ Sign Ardhavisarga |
| U+1CF3 | Letter | CONSONANT_DEAD | null | ᳳ Sign Rotated Ardhavisarga |
| U+1CF3 | Mark [Mc] | VISARGA | null | ᳳ Sign Rotated Ardhavisarga |
| U+1CF4 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
| U+1CF5 | Letter | CONSONANT_WITH_STACKER | null | ᳵ Sign Jihvamuliya |
| U+1CF6 | Letter | CONSONANT_WITH_STACKER | null | ᳶ Sign Upadhmaniya |
| U+1CF7 | Mark [Mc] | null | null | ᳷ Sign Atikrama |
| U+1CF8 | Mark [Mn] | CANTILLATION | null | ᳸ Tone Ring Above |
| U+1CF9 | Mark [Mn] | CANTILLATION | null | ᳹ Tone Double Ring Above |
| U+1CFA | Letter | PLACEHOLDER | null | ᳺ Sign Double Anusvara Antargomukha |
| U+1CFB | unassigned | | | |
| U+1CFC | unassigned | | | |
| U+1CFD | unassigned | | | |
| U+1CFE | unassigned | | | |
| U+1CFF | unassigned | | | |

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Sinhala text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Table 67 Miscellaneous character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------|
| U+00A0 | Separator | PLACEHOLDER | null |   No-break space |
| U+200C | Other | NON_JOINER | null | ‌ Zero-width non-joiner |
| U+200D | Other | JOINER | null | ‍ Zero-width joiner |
| U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
| U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
| U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
| U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
| U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
| U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
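The placeholder behavior described above can be approximated by treating the PLACEHOLDER and DOTTED_CIRCLE rows of the miscellaneous table as acceptable bases for an isolated mark or matra. The Python sketch below is a minimal illustration; `is_placeholder_base` is a hypothetical name, and a real engine may accept further characters.

```python
# Codepoints with the PLACEHOLDER class in the miscellaneous table.
PLACEHOLDERS = frozenset({0x00A0, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014})
DOTTED_CIRCLE = 0x25CC


def is_placeholder_base(cp: int) -> bool:
    """True if cp may stand in for a base when showing a mark in isolation."""
    return cp == DOTTED_CIRCLE or cp in PLACEHOLDERS
```

The joiners U+200C and U+200D are deliberately excluded, since the table gives them their own NON_JOINER and JOINER classes.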

The zero-width joiner (ZWJ) is used to request the subjoined form of a consonant. The sequence “Consonant_1,Halant,ZWJ,Consonant_2” is used to specify the subjoined form of “Consonant_2”.

A secondary usage of the zero-width joiner is to explicitly request the formation of “Reph”. An initial “Ra,Halant,ZWJ” sequence should produce a “Reph”.
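The two ZWJ sequences above can be detected with a simple scan over codepoints. The following Python sketch is an illustration under stated assumptions: the consonant ranges are read off the Sinhala character table, "Halant" is taken to be U+0DCA (listed as Virama in that table), and the function names are hypothetical. It only reports matches; it does not decide which interpretation wins when a run begins with "Ra,Halant,ZWJ,Consonant", which matches both patterns.

```python
from typing import List, Sequence

HALANT = 0x0DCA  # assumption: the Virama row of the Sinhala table
ZWJ = 0x200D
RA = 0x0DBB


def is_consonant(cp: int) -> bool:
    """CONSONANT rows of the Sinhala character table."""
    return (0x0D9A <= cp <= 0x0DB1 or 0x0DB3 <= cp <= 0x0DBB
            or cp == 0x0DBD or 0x0DC0 <= cp <= 0x0DC6)


def requests_reph(cps: Sequence[int]) -> bool:
    """True if the run starts with Ra,Halant,ZWJ."""
    return len(cps) >= 3 and tuple(cps[:3]) == (RA, HALANT, ZWJ)


def subjoined_requests(cps: Sequence[int]) -> List[int]:
    """Indices of each Consonant_2 in Consonant_1,Halant,ZWJ,Consonant_2."""
    hits = []
    for i in range(len(cps) - 3):
        if (is_consonant(cps[i]) and cps[i + 1] == HALANT
                and cps[i + 2] == ZWJ and is_consonant(cps[i + 3])):
            hits.append(i + 3)
    return hits
```

For instance, `subjoined_requests([0x0D9A, 0x0DCA, 0x200D, 0x0DC2])` reports index 3, the Ssa that follows Ka,Halant,ZWJ.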

The zero-width non-joiner (ZWNJ) is not used in shaping runs of Sinhala text. The ZWNJ is referenced below in various regular expressions and shaping rules, however, because it is used by other Indic scripts.

The no-break space (NBSP) is primarily used to display codepoints that are defined as non-spacing in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. Such codepoints include marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms. These sequences will match “NBSP,ZWJ,Halant,Consonant”, “NBSP,mark”, or “NBSP,matra”.
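The three NBSP patterns can be matched by first mapping each codepoint to a one-letter class symbol and then running a regular expression over the symbol string. The Python sketch below is illustrative only. It assumes that "mark" means the BINDU and VISARGA rows and that "matra" means the VOWEL_DEPENDENT rows of the Sinhala character table; the symbol letters and function names are hypothetical.

```python
import re
from typing import List, Tuple

NBSP, ZWJ, HALANT = 0x00A0, 0x200D, 0x0DCA
MARKS = {0x0D81, 0x0D82, 0x0D83}  # assumption: BINDU and VISARGA rows
MATRAS = (set(range(0x0DCF, 0x0DD5)) | {0x0DD6}
          | set(range(0x0DD8, 0x0DE0)) | {0x0DF2, 0x0DF3})


def is_consonant(cp: int) -> bool:
    return (0x0D9A <= cp <= 0x0DB1 or 0x0DB3 <= cp <= 0x0DBB
            or cp == 0x0DBD or 0x0DC0 <= cp <= 0x0DC6)


def symbol(cp: int) -> str:
    """Map a codepoint to a one-letter class symbol."""
    if cp == NBSP:
        return "N"
    if cp == ZWJ:
        return "J"
    if cp == HALANT:
        return "H"
    if cp in MARKS:
        return "K"
    if cp in MATRAS:
        return "M"
    if is_consonant(cp):
        return "C"
    return "X"


# NBSP,ZWJ,Halant,Consonant | NBSP,mark | NBSP,matra
ISOLATED = re.compile(r"N(?:JHC|K|M)")


def isolated_forms(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) codepoint spans of NBSP-based isolated forms."""
    symbols = "".join(symbol(ord(ch)) for ch in text)
    return [m.span() for m in ISOLATED.finditer(symbols)]
```

Because each codepoint maps to exactly one symbol, the spans returned by the regular expression index directly into the original string.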