Tamil character tables

This document lists the per-character shaping information needed to shape Tamil text.

Tamil character table

Codepoints in the Tamil block should be classified as in the following table. Codepoints with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
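The [Mn] and [Mc] tags correspond to Unicode General Category values, so they can be cross-checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `mark_kind` is illustrative only and is not part of any shaping engine.

```python
import unicodedata

def mark_kind(ch: str) -> str:
    """Describe a character as non-spacing, spacing-combining, or not a mark."""
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return "not a mark"

# Sign Aa, Sign Ii, Virama, and the letter Ka from the Tamil block.
for cp in (0x0BBE, 0x0BC0, 0x0BCD, 0x0B95):
    print(f"U+{cp:04X}", mark_kind(chr(cp)))
```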

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.

Table 71 Tamil character table

Codepoint

Unicode category

Shaping class

Mark-placement subclass

Glyph

U+0B80

unassigned

U+0B81

unassigned

U+0B82

Mark [Mn]

BINDU

TOP_POSITION

ஂ Anusvara

U+0B83

Letter

MODIFYING_LETTER

null

ஃ Visarga

U+0B84

unassigned

U+0B85

Letter

VOWEL_INDEPENDENT

null

அ A

U+0B86

Letter

VOWEL_INDEPENDENT

null

ஆ Aa

U+0B87

Letter

VOWEL_INDEPENDENT

null

இ I

U+0B88

Letter

VOWEL_INDEPENDENT

null

ஈ Ii

U+0B89

Letter

VOWEL_INDEPENDENT

null

உ U

U+0B8A

Letter

VOWEL_INDEPENDENT

null

ஊ Uu

U+0B8B

unassigned

U+0B8C

unassigned

U+0B8D

unassigned

U+0B8E

Letter

VOWEL_INDEPENDENT

null

எ E

U+0B8F

Letter

VOWEL_INDEPENDENT

null

ஏ Ee

U+0B90

Letter

VOWEL_INDEPENDENT

null

ஐ Ai

U+0B91

unassigned

U+0B92

Letter

VOWEL_INDEPENDENT

null

ஒ O

U+0B93

Letter

VOWEL_INDEPENDENT

null

ஓ Oo

U+0B94

Letter

VOWEL_INDEPENDENT

null

ஔ Au

U+0B95

Letter

CONSONANT

null

க Ka

U+0B96

unassigned

U+0B97

unassigned

U+0B98

unassigned

U+0B99

Letter

CONSONANT

null

ங Nga

U+0B9A

Letter

CONSONANT

null

ச Ca

U+0B9B

unassigned

U+0B9C

Letter

CONSONANT

null

ஜ Ja

U+0B9D

unassigned

U+0B9E

Letter

CONSONANT

null

ஞ Nya

U+0B9F

Letter

CONSONANT

null

ட Tta

U+0BA0

unassigned

U+0BA1

unassigned

U+0BA2

unassigned

U+0BA3

Letter

CONSONANT

null

ண Nna

U+0BA4

Letter

CONSONANT

null

த Ta

U+0BA5

unassigned

U+0BA6

unassigned

U+0BA7

unassigned

U+0BA8

Letter

CONSONANT

null

ந Na

U+0BA9

Letter

CONSONANT

null

ன Nnna

U+0BAA

Letter

CONSONANT

null

ப Pa

U+0BAB

unassigned

U+0BAC

unassigned

U+0BAD

unassigned

U+0BAE

Letter

CONSONANT

null

ம Ma

U+0BAF

Letter

CONSONANT

null

ய Ya

U+0BB0

Letter

CONSONANT

null

ர Ra

U+0BB1

Letter

CONSONANT

null

ற Rra

U+0BB2

Letter

CONSONANT

null

ல La

U+0BB3

Letter

CONSONANT

null

ள Lla

U+0BB4

Letter

CONSONANT

null

ழ Llla

U+0BB5

Letter

CONSONANT

null

வ Va

U+0BB6

Letter

CONSONANT

null

ஶ Sha

U+0BB7

Letter

CONSONANT

null

ஷ Ssa

U+0BB8

Letter

CONSONANT

null

ஸ Sa

U+0BB9

Letter

CONSONANT

null

ஹ Ha

U+0BBA

unassigned

U+0BBB

unassigned

U+0BBC

unassigned

U+0BBD

unassigned

U+0BBE

Mark [Mc]

VOWEL_DEPENDENT

RIGHT_POSITION

ா Sign Aa

U+0BBF

Mark [Mc]

VOWEL_DEPENDENT

RIGHT_POSITION

ி Sign I

U+0BC0

Mark [Mn]

VOWEL_DEPENDENT

TOP_POSITION

ீ Sign Ii

U+0BC1

Mark [Mc]

VOWEL_DEPENDENT

RIGHT_POSITION

ு Sign U

U+0BC2

Mark [Mc]

VOWEL_DEPENDENT

RIGHT_POSITION

ூ Sign Uu

U+0BC3

unassigned

U+0BC4

unassigned

U+0BC5

unassigned

U+0BC6

Mark [Mc]

VOWEL_DEPENDENT

LEFT_POSITION

ெ Sign E

U+0BC7

Mark [Mc]

VOWEL_DEPENDENT

LEFT_POSITION

ே Sign Ee

U+0BC8

Mark [Mc]

VOWEL_DEPENDENT

LEFT_POSITION

ை Sign Ai

U+0BC9

unassigned

U+0BCA

Mark [Mc]

VOWEL_DEPENDENT

LEFT_AND_RIGHT_POSITION

ொ Sign O

U+0BCB

Mark [Mc]

VOWEL_DEPENDENT

LEFT_AND_RIGHT_POSITION

ோ Sign Oo

U+0BCC

Mark [Mc]

VOWEL_DEPENDENT

LEFT_AND_RIGHT_POSITION

ௌ Sign Au

U+0BCD

Mark [Mn]

VIRAMA

TOP_POSITION

் Virama

U+0BCE

unassigned

U+0BCF

unassigned

U+0BD0

Letter

null

null

ௐ Om

U+0BD1

unassigned

U+0BD2

unassigned

U+0BD3

unassigned

U+0BD4

unassigned

U+0BD5

unassigned

U+0BD6

unassigned

U+0BD7

Mark [Mc]

VOWEL_DEPENDENT

RIGHT_POSITION

ௗ Au Length Mark

U+0BD8

unassigned

U+0BD9

unassigned

U+0BDA

unassigned

U+0BDB

unassigned

U+0BDC

unassigned

U+0BDD

unassigned

U+0BDE

unassigned

U+0BDF

unassigned

U+0BE0

unassigned

U+0BE1

unassigned

U+0BE2

unassigned

U+0BE3

unassigned

U+0BE4

unassigned

U+0BE5

unassigned

U+0BE6

Number

NUMBER

null

௦ Digit Zero

U+0BE7

Number

NUMBER

null

௧ Digit One

U+0BE8

Number

NUMBER

null

௨ Digit Two

U+0BE9

Number

NUMBER

null

௩ Digit Three

U+0BEA

Number

NUMBER

null

௪ Digit Four

U+0BEB

Number

NUMBER

null

௫ Digit Five

U+0BEC

Number

NUMBER

null

௬ Digit Six

U+0BED

Number

NUMBER

null

௭ Digit Seven

U+0BEE

Number

NUMBER

null

௮ Digit Eight

U+0BEF

Number

NUMBER

null

௯ Digit Nine

U+0BF0

Number

NUMBER

null

௰ Number Ten

U+0BF1

Number

NUMBER

null

௱ Number One Hundred

U+0BF2

Number

NUMBER

null

௲ Number One Thousand

U+0BF3

Symbol

SYMBOL

null

௳ Day Sign

U+0BF4

Symbol

SYMBOL

null

௴ Month Sign

U+0BF5

Symbol

SYMBOL

null

௵ Year Sign

U+0BF6

Symbol

SYMBOL

null

௶ Debit Sign

U+0BF7

Symbol

SYMBOL

null

௷ Credit Sign

U+0BF8

Symbol

SYMBOL

null

௸ As Above Sign

U+0BF9

Symbol

SYMBOL

null

௹ Tamil Rupee Sign

U+0BFA

Symbol

SYMBOL

null

௺ Number Sign

U+0BFB

unassigned

U+0BFC

unassigned

U+0BFD

unassigned

U+0BFE

unassigned

U+0BFF

unassigned
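
One way a shaping engine might hold the table above is as a mapping from codepoint to a small record. The sketch below copies only a handful of rows for illustration; the names `Entry`, `TAMIL_TABLE`, and `classify` are hypothetical, and a real implementation would need every assigned codepoint.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    unicode_category: str
    shaping_class: Optional[str]   # None stands for "null" in the table
    mark_placement: Optional[str]  # None stands for "null" in the table

# A few rows copied from the Tamil character table (not the full table).
TAMIL_TABLE = {
    0x0B82: Entry("Mark [Mn]", "BINDU", "TOP_POSITION"),
    0x0B83: Entry("Letter", "MODIFYING_LETTER", None),
    0x0B85: Entry("Letter", "VOWEL_INDEPENDENT", None),
    0x0B95: Entry("Letter", "CONSONANT", None),
    0x0BBE: Entry("Mark [Mc]", "VOWEL_DEPENDENT", "RIGHT_POSITION"),
    0x0BC6: Entry("Mark [Mc]", "VOWEL_DEPENDENT", "LEFT_POSITION"),
    0x0BCA: Entry("Mark [Mc]", "VOWEL_DEPENDENT", "LEFT_AND_RIGHT_POSITION"),
    0x0BCD: Entry("Mark [Mn]", "VIRAMA", "TOP_POSITION"),
    0x0BD0: Entry("Letter", None, None),
    0x0BE6: Entry("Number", "NUMBER", None),
    0x0BF9: Entry("Symbol", "SYMBOL", None),
}

def classify(cp: int) -> Optional[Entry]:
    """Return the sample-table entry for cp, or None if no row is stored for it."""
    return TAMIL_TABLE.get(cp)

# Ka followed by Sign O, a two-part dependent vowel.
for ch in "\u0B95\u0BCA":
    print(f"U+{ord(ch):04X}", classify(ord(ch)))
```

Because the Shaping class takes precedence over the General Category, the lookup returns both values and leaves the choice to the caller.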

Tamil Supplement character table

Tamil text runs may also include historical symbols and fractions from the Tamil Supplement block. These characters should be classified as follows.

Table 72 Tamil Supplement character table

Codepoint

Unicode category

Shaping class

Mark-placement subclass

Glyph

U+11FC0

Number

NUMBER

null

𑿀 Fraction One Three-Hundred-And-Twentieth

U+11FC1

Number

NUMBER

null

𑿁 Fraction One One-Hundred-And-Sixtieth

U+11FC2

Number

NUMBER

null

𑿂 Fraction One Eightieth

U+11FC3

Number

NUMBER

null

𑿃 Fraction One Sixty-Fourth

U+11FC4

Number

NUMBER

null

𑿄 Fraction One Fortieth

U+11FC5

Number

NUMBER

null

𑿅 Fraction One Thirty-Second

U+11FC6

Number

NUMBER

null

𑿆 Fraction Three Eightieths

U+11FC7

Number

NUMBER

null

𑿇 Fraction Three Sixty-Fourths

U+11FC8

Number

NUMBER

null

𑿈 Fraction One Twentieth

U+11FC9

Number

NUMBER

null

𑿉 Fraction One Sixteenth-1

U+11FCA

Number

NUMBER

null

𑿊 Fraction One Sixteenth-2

U+11FCB

Number

NUMBER

null

𑿋 Fraction One Tenth

U+11FCC

Number

NUMBER

null

𑿌 Fraction One Eighth

U+11FCD

Number

NUMBER

null

𑿍 Fraction Three Twentieths

U+11FCE

Number

NUMBER

null

𑿎 Fraction Three Sixteenths

U+11FCF

Number

NUMBER

null

𑿏 Fraction One Fifth

U+11FD0

Number

NUMBER

null

𑿐 Fraction One Quarter

U+11FD1

Number

NUMBER

null

𑿑 Fraction One Half-1

U+11FD2

Number

NUMBER

null

𑿒 Fraction One Half-2

U+11FD3

Number

NUMBER

null

𑿓 Fraction Three Quarters

U+11FD4

Number

NUMBER

null

𑿔 Fraction Downscaling Factor Kiizh

U+11FD5

Symbol

SYMBOL

null

𑿕 Sign Nel

U+11FD6

Symbol

SYMBOL

null

𑿖 Sign Cevitu

U+11FD7

Symbol

SYMBOL

null

𑿗 Sign Aazhaakku

U+11FD8

Symbol

SYMBOL

null

𑿘 Sign Uzhakku

U+11FD9

Symbol

SYMBOL

null

𑿙 Sign Muuvuzhakku

U+11FDA

Symbol

SYMBOL

null

𑿚 Sign Kuruni

U+11FDB

Symbol

SYMBOL

null

𑿛 Sign Pathakku

U+11FDC

Symbol

SYMBOL

null

𑿜 Sign Mukkuruni

U+11FDD

Symbol

SYMBOL

null

𑿝 Sign Kaacu

U+11FDE

Symbol

SYMBOL

null

𑿞 Sign Panam

U+11FDF

Symbol

SYMBOL

null

𑿟 Sign Pon

U+11FE0

Symbol

SYMBOL

null

𑿠 Sign Varaakan

U+11FE1

Symbol

SYMBOL

null

𑿡 Sign Paaram

U+11FE2

Symbol

SYMBOL

null

𑿢 Sign Kuzhi

U+11FE3

Symbol

SYMBOL

null

𑿣 Sign Veli

U+11FE4

Symbol

SYMBOL

null

𑿤 Wet Cultivation Sign

U+11FE5

Symbol

SYMBOL

null

𑿥 Dry Cultivation Sign

U+11FE6

Symbol

SYMBOL

null

𑿦 Land Sign

U+11FE7

Symbol

SYMBOL

null

𑿧 Salt Pan Sign

U+11FE8

Symbol

SYMBOL

null

𑿨 Traditional Credit Sign

U+11FE9

Symbol

SYMBOL

null

𑿩 Traditional Number Sign

U+11FEA

Symbol

SYMBOL

null

𑿪 Current Sign

U+11FEB

Symbol

SYMBOL

null

𑿫 And Odd Sign

U+11FEC

Symbol

SYMBOL

null

𑿬 Spent Sign

U+11FED

Symbol

SYMBOL

null

𑿭 Total Sign

U+11FEE

Symbol

SYMBOL

null

𑿮 In Possession Sign

U+11FEF

Symbol

SYMBOL

null

𑿯 Starting From Sign

U+11FF0

Symbol

SYMBOL

null

𑿰 Sign Muthaliya

U+11FF1

Symbol

SYMBOL

null

𑿱 Sign Vakaiyaraa

U+11FF2

unassigned

U+11FF3

unassigned

U+11FF4

unassigned

U+11FF5

unassigned

U+11FF6

unassigned

U+11FF7

unassigned

U+11FF8

unassigned

U+11FF9

unassigned

U+11FFA

unassigned

U+11FFB

unassigned

U+11FFC

unassigned

U+11FFD

unassigned

U+11FFE

unassigned

U+11FFF

Punctuation

null

null

𑿿 End Of Text
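
Because the Tamil Supplement rows fall into contiguous runs, the table above can be expressed as range checks instead of per-codepoint entries. This is a sketch only; the function name `classify_tamil_supplement` is an assumption made for illustration.

```python
from typing import Optional, Tuple

def classify_tamil_supplement(cp: int) -> Optional[Tuple[str, Optional[str]]]:
    """Return (Unicode category, Shaping class), or None for unassigned or
    out-of-block codepoints. A Shaping class of None stands for "null"."""
    if 0x11FC0 <= cp <= 0x11FD4:   # fractions
        return ("Number", "NUMBER")
    if 0x11FD5 <= cp <= 0x11FF1:   # measure, currency, and clerical signs
        return ("Symbol", "SYMBOL")
    if cp == 0x11FFF:              # End Of Text
        return ("Punctuation", None)
    return None

print(classify_tamil_supplement(0x11FD0))  # Fraction One Quarter
```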

Grantha marks character table

Tamil text runs may also include diacritical and syllable-modifier marks from the Grantha block. These characters should be classified as follows.

Table 73 Grantha marks character table

Codepoint

Unicode category

Shaping class

Mark-placement subclass

Glyph

U+11301

Mark [Mn]

BINDU

TOP_POSITION

𑌁 Grantha Candrabindu

U+11303

Mark [Mc]

VISARGA

RIGHT_POSITION

𑌃 Grantha Visarga

U+1133B

Mark [Mn]

NUKTA

BOTTOM_POSITION

𑌻 Combining Bindu Below

U+1133C

Mark [Mn]

NUKTA

BOTTOM_POSITION

𑌼 Grantha Nukta

Vedic Extensions character table

Sanskrit runs written in the Tamil script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

Table 74 Vedic Extensions character table

Codepoint

Unicode category

Shaping class

Mark-placement subclass

Glyph

U+1CD0

Mark [Mn]

CANTILLATION

TOP_POSITION

᳐ Tone Karshana

U+1CD1

Mark [Mn]

CANTILLATION

TOP_POSITION

᳑ Tone Shara

U+1CD2

Mark [Mn]

CANTILLATION

TOP_POSITION

᳒ Tone Prenkha

U+1CD3

Punctuation

null

null

᳓ Sign Nihshvasa

U+1CD4

Mark [Mn]

CANTILLATION

OVERSTRUCK

᳔ Tone Midline Svarita

U+1CD5

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳕ Tone Aggravated Independent Svarita

U+1CD6

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳖ Tone Independent Svarita

U+1CD7

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳗ Tone Kathaka Independent Svarita

U+1CD8

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳘ Tone Candra Below

U+1CD9

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳙ Tone Kathaka Independent Svarita Schroeder

U+1CDA

Mark [Mn]

CANTILLATION

TOP_POSITION

᳚ Tone Double Svarita

U+1CDB

Mark [Mn]

CANTILLATION

TOP_POSITION

᳛ Tone Triple Svarita

U+1CDC

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳜ Tone Kathaka Anudatta

U+1CDD

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳝ Tone Dot Below

U+1CDE

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳞ Tone Two Dots Below

U+1CDF

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

᳟ Tone Three Dots Below

U+1CE0

Mark [Mn]

CANTILLATION

TOP_POSITION

᳠ Tone Rigvedic Kashmiri Independent Svarita

U+1CE1

Mark [Mc]

CANTILLATION

RIGHT_POSITION

᳡ Tone Atharvavedic Independent Svarita


U+1CE2

Mark [Mn]

AVAGRAHA

OVERSTRUCK

᳢ Sign Visarga Svarita

U+1CE3

Mark [Mn]

null

OVERSTRUCK

᳣ Sign Visarga Udatta

U+1CE4

Mark [Mn]

null

OVERSTRUCK

᳤ Sign Reversed Visarga Udatta

U+1CE5

Mark [Mn]

null

OVERSTRUCK

᳥ Sign Visarga Anudatta

U+1CE6

Mark [Mn]

null

OVERSTRUCK

᳦ Sign Reversed Visarga Anudatta

U+1CE7

Mark [Mn]

null

OVERSTRUCK

᳧ Sign Visarga Udatta With Tail

U+1CE8

Mark [Mn]

AVAGRAHA

OVERSTRUCK

᳨ Sign Visarga Anudatta With Tail

U+1CE9

Letter

SYMBOL

null

ᳩ Sign Anusvara Antargomukha

U+1CEA

Letter

null

null

ᳪ Sign Anusvara Bahirgomukha

U+1CEB

Letter

null

null

ᳫ Sign Anusvara Vamagomukha

U+1CEC

Letter

SYMBOL

null

ᳬ Sign Anusvara Vamagomukha With Tail

U+1CED

Mark [Mn]

AVAGRAHA

BOTTOM_POSITION

᳭ Sign Tiryak

U+1CEE

Letter

SYMBOL

null

ᳮ Sign Hexiform Long Anusvara

U+1CEF

Letter

null

null

ᳯ Sign Long Anusvara

U+1CF0

Letter

null

null

ᳰ Sign Rthang Long Anusvara

U+1CF2

Letter

CONSONANT_DEAD

null

ᳲ Sign Ardhavisarga

U+1CF3

Letter

CONSONANT_DEAD

null

ᳳ Sign Rotated Ardhavisarga

U+1CF4

Mark [Mn]

CANTILLATION

TOP_POSITION

᳴ Tone Candra Above

U+1CF5

Letter

CONSONANT_WITH_STACKER

null

ᳵ Sign Jihvamuliya

U+1CF6

Letter

CONSONANT_WITH_STACKER

null

ᳶ Sign Upadhmaniya

U+1CF7

Mark [Mc]

null

null

᳷ Sign Atikrama

U+1CF8

Mark [Mn]

CANTILLATION

null

᳸ Tone Ring Above

U+1CF9

Mark [Mn]

CANTILLATION

null

᳹ Tone Double Ring Above

U+1CFA

Letter

PLACEHOLDER

null

ᳺ Sign Double Anusvara Antargomukha

U+1CFB

unassigned

U+1CFC

unassigned

U+1CFD

unassigned

U+1CFE

unassigned

U+1CFF

unassigned
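
With several blocks in play, a shaper first has to decide which of the preceding tables applies to a codepoint. The sketch below assumes the block ranges covered by those tables and, for Grantha, only the four marks listed; the function name `table_for` is hypothetical.

```python
from typing import Optional

# Only these Grantha codepoints appear in the Grantha marks table.
GRANTHA_MARKS = {0x11301, 0x11303, 0x1133B, 0x1133C}

def table_for(cp: int) -> Optional[str]:
    """Name the character table that covers cp, or None if none of them does."""
    if 0x0B80 <= cp <= 0x0BFF:
        return "Tamil"
    if 0x11FC0 <= cp <= 0x11FFF:
        return "Tamil Supplement"
    if cp in GRANTHA_MARKS:
        return "Grantha marks"
    if 0x1CD0 <= cp <= 0x1CFF:
        return "Vedic Extensions"
    return None

for cp in (0x0B95, 0x11FD0, 0x1133C, 0x1CD0, 0x0041):
    print(f"U+{cp:04X}", table_for(cp))
```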

Miscellaneous character table

In addition to general punctuation, runs of Tamil text often use the danda (U+0964) and double danda (U+0965) punctuation marks from the Devanagari block. Tamil text can also incorporate the udatta (U+0951) and anudatta (U+0952) signs from the Devanagari block.

Table 75 Additional punctuation character table

Codepoint

Unicode category

Shaping class

Mark-placement subclass

Glyph

U+0951

Mark [Mn]

CANTILLATION

TOP_POSITION

॑ Udatta

U+0952

Mark [Mn]

CANTILLATION

BOTTOM_POSITION

॒ Anudatta

U+0964

Punctuation

null

null

। Danda

U+0965

Punctuation

null

null

॥ Double Danda

Other important characters that may be encountered when shaping runs of Tamil text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Table 76 Miscellaneous character table

Codepoint

Unicode category

Shaping class

Mark-placement subclass

Glyph

U+00A0

Separator

PLACEHOLDER

null

  No-break space

U+00B2

Number

SYLLABLE_MODIFIER

TOP_POSITION

² Superscript Two

U+00B3

Number

SYLLABLE_MODIFIER

TOP_POSITION

³ Superscript Three

U+200C

Other

NON_JOINER

null

‌ Zero-width non-joiner

U+200D

Other

JOINER

null

‍ Zero-width joiner

U+2010

Punctuation

PLACEHOLDER

null

‐ Hyphen

U+2011

Punctuation

PLACEHOLDER

null

‑ No-break hyphen

U+2012

Punctuation

PLACEHOLDER

null

‒ Figure dash

U+2013

Punctuation

PLACEHOLDER

null

– En dash

U+2014

Punctuation

PLACEHOLDER

null

— Em dash

U+2074

Number

SYLLABLE_MODIFIER

TOP_POSITION

⁴ Superscript Four

U+2082

Number

SYLLABLE_MODIFIER

BOTTOM_POSITION

₂ Subscript Two

U+2083

Number

SYLLABLE_MODIFIER

BOTTOM_POSITION

₃ Subscript Three

U+2084

Number

SYLLABLE_MODIFIER

BOTTOM_POSITION

₄ Subscript Four

U+25CC

Symbol

DOTTED_CIRCLE

null

◌ Dotted circle
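
The placeholder and joiner rows of the table above can be folded into a small lookup that tells a shaper whether a character may serve as the base for an isolated mark or matra. This is a sketch under that reading of the table; `MISC_SHAPING_CLASS` and `can_carry_isolated_mark` are illustrative names.

```python
# Shaping classes copied from the placeholder, joiner, and dotted-circle rows.
MISC_SHAPING_CLASS = {
    0x00A0: "PLACEHOLDER",    # no-break space
    0x200C: "NON_JOINER",     # zero-width non-joiner
    0x200D: "JOINER",         # zero-width joiner
    0x2010: "PLACEHOLDER",    # hyphen
    0x2011: "PLACEHOLDER",    # no-break hyphen
    0x2012: "PLACEHOLDER",    # figure dash
    0x2013: "PLACEHOLDER",    # en dash
    0x2014: "PLACEHOLDER",    # em dash
    0x25CC: "DOTTED_CIRCLE",  # dotted circle
}

def can_carry_isolated_mark(cp: int) -> bool:
    """True if cp is classified as a placeholder-style base for a lone mark."""
    return MISC_SHAPING_CLASS.get(cp) in {"PLACEHOLDER", "DOTTED_CIRCLE"}

print(can_carry_isolated_mark(0x25CC), can_carry_isolated_mark(0x200D))
```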

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a “Consonant,Halant,Consonant” sequence. The sequence “Consonant,Halant,ZWJ,Consonant” blocks the formation of a conjunct between the two consonants.

Note, however, that the “Consonant,Halant” subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence “Consonant,Halant,ZWNJ,Consonant” should produce the first consonant in its standard form, followed by an explicit “Halant”.
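
The difference between the two joiners can be summarized as a pair of flags. The sketch below works on a list of shaping-class names, using VIRAMA for the Halant; `HalantBehavior` and `halant_behavior` are hypothetical names, and whether a conjunct or half form actually appears still depends on the font.

```python
from typing import List, NamedTuple

class HalantBehavior(NamedTuple):
    conjunct_blocked: bool
    half_form_blocked: bool

def halant_behavior(classes: List[str]) -> HalantBehavior:
    """Report what the character after "Consonant,Halant" blocks."""
    if classes[:2] != ["CONSONANT", "VIRAMA"]:
        raise ValueError("expected a sequence starting with CONSONANT, VIRAMA")
    following = classes[2] if len(classes) > 2 else None
    if following == "JOINER":
        # ZWJ blocks the conjunct, but half-forms may still apply.
        return HalantBehavior(conjunct_blocked=True, half_form_blocked=False)
    if following == "NON_JOINER":
        # ZWNJ blocks both, leaving an explicit Halant.
        return HalantBehavior(conjunct_blocked=True, half_form_blocked=True)
    return HalantBehavior(conjunct_blocked=False, half_form_blocked=False)

print(halant_behavior(["CONSONANT", "VIRAMA", "JOINER", "CONSONANT"]))
print(halant_behavior(["CONSONANT", "VIRAMA", "NON_JOINER", "CONSONANT"]))
```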

A secondary usage of the zero-width joiner is to prevent the formation of “Reph”. An initial “Ra,Halant,ZWJ” sequence should not produce a “Reph”, where an initial “Ra,Halant” sequence without the zero-width joiner otherwise would.
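
The "Reph" rule above amounts to a check on the first three codepoints of a syllable. A minimal sketch, assuming Ra is U+0BB0 and the Halant is the Tamil Virama U+0BCD as listed in the Tamil character table; the function name `may_form_reph` is illustrative, and a font may still decline to supply a "Reph" form.

```python
from typing import Sequence

RA = 0x0BB0      # Tamil letter Ra
HALANT = 0x0BCD  # Tamil Virama
ZWJ = 0x200D     # zero-width joiner

def may_form_reph(codepoints: Sequence[int]) -> bool:
    """True for an initial Ra,Halant that is not followed by ZWJ."""
    if len(codepoints) < 2 or codepoints[0] != RA or codepoints[1] != HALANT:
        return False
    return not (len(codepoints) > 2 and codepoints[2] == ZWJ)

print(may_form_reph([RA, HALANT, 0x0B95]))       # Ra,Halant,Ka
print(may_form_reph([RA, HALANT, ZWJ, 0x0B95]))  # Ra,Halant,ZWJ,Ka
```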

The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match “NBSP,ZWJ,Halant,Consonant”, “NBSP,mark”, or “NBSP,matra”.
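
The three NBSP patterns can be matched literally once the run has been reduced to tokens. In the sketch below the token names simply mirror the sequences quoted above; `ISOLATED_FORMS` and `is_isolated_display` are hypothetical names.

```python
from typing import Sequence

# Token sequences copied from the three patterns quoted in the text.
ISOLATED_FORMS = (
    ("NBSP", "ZWJ", "Halant", "Consonant"),
    ("NBSP", "mark"),
    ("NBSP", "matra"),
)

def is_isolated_display(tokens: Sequence[str]) -> bool:
    """True if the token sequence is one of the NBSP isolated-display patterns."""
    return tuple(tokens) in ISOLATED_FORMS

print(is_isolated_display(["NBSP", "matra"]))
print(is_isolated_display(["NBSP", "Consonant"]))
```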

Tamil text sometimes uses the Latin numerals 2, 3, and 4 in superscript or subscript positions to annotate Sanskrit. When used in this fashion, the superscripts and subscripts are treated as SYLLABLE_MODIFIER signs for shaping purposes.
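
A shaper can recognize these annotation digits by codepoint and, if needed, recover the base digit and its super/sub tag from the Unicode compatibility decomposition. This is a sketch using Python's standard `unicodedata` module; `SYLLABLE_MODIFIER_DIGITS` and `annotation_digit` are illustrative names.

```python
import unicodedata
from typing import Optional, Tuple

# Codepoints listed as SYLLABLE_MODIFIER in the miscellaneous character table.
SYLLABLE_MODIFIER_DIGITS = {0x00B2, 0x00B3, 0x2074, 0x2082, 0x2083, 0x2084}

def annotation_digit(ch: str) -> Optional[Tuple[str, str]]:
    """Return (base digit, 'super' or 'sub') for an annotation digit, else None."""
    if ord(ch) not in SYLLABLE_MODIFIER_DIGITS:
        return None
    # Compatibility decompositions look like '<super> 0032' or '<sub> 0034'.
    tag, base = unicodedata.decomposition(ch).split()
    return chr(int(base, 16)), tag.strip("<>")

for ch in "\u00B2\u2084":
    print(f"U+{ord(ch):04X}", annotation_digit(ch))
```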