# Tamil character tables #
This document lists the per-character shaping information needed to
[shape Tamil text](../opentype-shaping-tamil.md).
**Contents**
- [Tamil character table](#tamil-character-table)
- [Tamil Supplement character table](#tamil-supplement-character-table)
- [Grantha marks character table](#grantha-marks-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Tamil character table ##
Tamil characters should be classified as shown in the following
table. Codepoints in the Tamil block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this includes some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates the placement
position of codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Tamil character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0B80` | _unassigned_ | | | |
|`U+0B81` | _unassigned_ | | | |
|`U+0B82` | Mark [Mn] | BINDU | TOP_POSITION | ஂ Anusvara |
|`U+0B83` | Letter | MODIFYING_LETTER | _null_ | ஃ Visarga |
|`U+0B84` | _unassigned_ | | | |
|`U+0B85` | Letter | VOWEL_INDEPENDENT | _null_ | அ A |
|`U+0B86` | Letter | VOWEL_INDEPENDENT | _null_ | ஆ Aa |
|`U+0B87` | Letter | VOWEL_INDEPENDENT | _null_ | இ I |
|`U+0B88` | Letter | VOWEL_INDEPENDENT | _null_ | ஈ Ii |
|`U+0B89` | Letter | VOWEL_INDEPENDENT | _null_ | உ U |
|`U+0B8A` | Letter | VOWEL_INDEPENDENT | _null_ | ஊ Uu |
|`U+0B8B` | _unassigned_ | | | |
|`U+0B8C` | _unassigned_ | | | |
|`U+0B8D` | _unassigned_ | | | |
|`U+0B8E` | Letter | VOWEL_INDEPENDENT | _null_ | எ E |
|`U+0B8F` | Letter | VOWEL_INDEPENDENT | _null_ | ஏ Ee |
| | | | |
|`U+0B90` | Letter | VOWEL_INDEPENDENT | _null_ | ஐ Ai |
|`U+0B91` | _unassigned_ | | | |
|`U+0B92` | Letter | VOWEL_INDEPENDENT | _null_ | ஒ O |
|`U+0B93` | Letter | VOWEL_INDEPENDENT | _null_ | ஓ Oo |
|`U+0B94` | Letter | VOWEL_INDEPENDENT | _null_ | ஔ Au |
|`U+0B95` | Letter | CONSONANT | _null_ | க Ka |
|`U+0B96` | _unassigned_ | | | |
|`U+0B97` | _unassigned_ | | | |
|`U+0B98` | _unassigned_ | | | |
|`U+0B99` | Letter | CONSONANT | _null_ | ங Nga |
|`U+0B9A` | Letter | CONSONANT | _null_ | ச Ca |
|`U+0B9B` | _unassigned_ | | | |
|`U+0B9C` | Letter | CONSONANT | _null_ | ஜ Ja |
|`U+0B9D` | _unassigned_ | | | |
|`U+0B9E` | Letter | CONSONANT | _null_ | ஞ Nya |
|`U+0B9F` | Letter | CONSONANT | _null_ | ட Tta |
| | | | |
|`U+0BA0` | _unassigned_ | | | |
|`U+0BA1` | _unassigned_ | | | |
|`U+0BA2` | _unassigned_ | | | |
|`U+0BA3` | Letter | CONSONANT | _null_ | ண Nna |
|`U+0BA4` | Letter | CONSONANT | _null_ | த Ta |
|`U+0BA5` | _unassigned_ | | | |
|`U+0BA6` | _unassigned_ | | | |
|`U+0BA7` | _unassigned_ | | | |
|`U+0BA8` | Letter | CONSONANT | _null_ | ந Na |
|`U+0BA9` | Letter | CONSONANT | _null_ | ன Nnna |
|`U+0BAA` | Letter | CONSONANT | _null_ | ப Pa |
|`U+0BAB` | _unassigned_ | | | |
|`U+0BAC` | _unassigned_ | | | |
|`U+0BAD` | _unassigned_ | | | |
|`U+0BAE` | Letter | CONSONANT | _null_ | ம Ma |
|`U+0BAF` | Letter | CONSONANT | _null_ | ய Ya |
| | | | |
|`U+0BB0` | Letter | CONSONANT | _null_ | ர Ra |
|`U+0BB1` | Letter | CONSONANT | _null_ | ற Rra |
|`U+0BB2` | Letter | CONSONANT | _null_ | ல La |
|`U+0BB3` | Letter | CONSONANT | _null_ | ள Lla |
|`U+0BB4` | Letter | CONSONANT | _null_ | ழ Llla |
|`U+0BB5` | Letter | CONSONANT | _null_ | வ Va |
|`U+0BB6` | Letter | CONSONANT | _null_ | ஶ Sha |
|`U+0BB7` | Letter | CONSONANT | _null_ | ஷ Ssa |
|`U+0BB8` | Letter | CONSONANT | _null_ | ஸ Sa |
|`U+0BB9` | Letter | CONSONANT | _null_ | ஹ Ha |
|`U+0BBA` | _unassigned_ | | | |
|`U+0BBB` | _unassigned_ | | | |
|`U+0BBC` | _unassigned_ | | | |
|`U+0BBD` | _unassigned_ | | | |
|`U+0BBE` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ா Sign Aa |
|`U+0BBF` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ி Sign I |
| | | | |
|`U+0BC0` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ீ Sign Ii |
|`U+0BC1` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ு Sign U |
|`U+0BC2` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ூ Sign Uu |
|`U+0BC3` | _unassigned_ | | | |
|`U+0BC4` | _unassigned_ | | | |
|`U+0BC5` | _unassigned_ | | | |
|`U+0BC6` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ெ Sign E |
|`U+0BC7` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ே Sign Ee |
|`U+0BC8` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ை Sign Ai |
|`U+0BC9` | _unassigned_ | | | |
|`U+0BCA` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ொ Sign O |
|`U+0BCB` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ோ Sign Oo |
|`U+0BCC` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ௌ Sign Au |
|`U+0BCD` | Mark [Mn] | VIRAMA | TOP_POSITION | ் Virama |
|`U+0BCE` | _unassigned_ | | | |
|`U+0BCF` | _unassigned_ | | | |
| | | | |
|`U+0BD0` | Letter | _null_ | _null_ | ௐ Om |
|`U+0BD1` | _unassigned_ | | | |
|`U+0BD2` | _unassigned_ | | | |
|`U+0BD3` | _unassigned_ | | | |
|`U+0BD4` | _unassigned_ | | | |
|`U+0BD5` | _unassigned_ | | | |
|`U+0BD6` | _unassigned_ | | | |
|`U+0BD7` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ௗ Au Length Mark |
|`U+0BD8` | _unassigned_ | | | |
|`U+0BD9` | _unassigned_ | | | |
|`U+0BDA` | _unassigned_ | | | |
|`U+0BDB` | _unassigned_ | | | |
|`U+0BDC` | _unassigned_ | | | |
|`U+0BDD` | _unassigned_ | | | |
|`U+0BDE` | _unassigned_ | | | |
|`U+0BDF` | _unassigned_ | | | |
| | | | |
|`U+0BE0` | _unassigned_ | | | |
|`U+0BE1` | _unassigned_ | | | |
|`U+0BE2` | _unassigned_ | | | |
|`U+0BE3` | _unassigned_ | | | |
|`U+0BE4` | _unassigned_ | | | |
|`U+0BE5` | _unassigned_ | | | |
|`U+0BE6` | Number | NUMBER | _null_ | ௦ Digit Zero |
|`U+0BE7` | Number | NUMBER | _null_ | ௧ Digit One |
|`U+0BE8` | Number | NUMBER | _null_ | ௨ Digit Two |
|`U+0BE9` | Number | NUMBER | _null_ | ௩ Digit Three |
|`U+0BEA` | Number | NUMBER | _null_ | ௪ Digit Four |
|`U+0BEB` | Number | NUMBER | _null_ | ௫ Digit Five |
|`U+0BEC` | Number | NUMBER | _null_ | ௬ Digit Six |
|`U+0BED` | Number | NUMBER | _null_ | ௭ Digit Seven |
|`U+0BEE` | Number | NUMBER | _null_ | ௮ Digit Eight |
|`U+0BEF` | Number | NUMBER | _null_ | ௯ Digit Nine |
| | | | |
|`U+0BF0` | Number | NUMBER | _null_ | ௰ Number Ten |
|`U+0BF1` | Number | NUMBER | _null_ | ௱ Number One Hundred |
|`U+0BF2` | Number | NUMBER | _null_ | ௲ Number One Thousand |
|`U+0BF3` | Symbol | SYMBOL | _null_ | ௳ Day Sign |
|`U+0BF4` | Symbol | SYMBOL | _null_ | ௴ Month Sign |
|`U+0BF5` | Symbol | SYMBOL | _null_ | ௵ Year Sign |
|`U+0BF6` | Symbol | SYMBOL | _null_ | ௶ Debit Sign |
|`U+0BF7` | Symbol | SYMBOL | _null_ | ௷ Credit Sign |
|`U+0BF8` | Symbol | SYMBOL | _null_ | ௸ As Above Sign |
|`U+0BF9` | Symbol | SYMBOL | _null_ | ௹ Tamil Rupee Sign |
|`U+0BFA` | Symbol | SYMBOL | _null_ | ௺ Number Sign |
|`U+0BFB` | _unassigned_ | | | |
|`U+0BFC` | _unassigned_ | | | |
|`U+0BFD` | _unassigned_ | | | |
|`U+0BFE` | _unassigned_ | | | |
|`U+0BFF` | _unassigned_ | | | |
:::
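As a rough illustration of how the table might be consulted, the sketch below copies a handful of rows into a Python dictionary and reports the Unicode General Category alongside the shaping class. The names `TAMIL_SHAPING` and `classify` are hypothetical, only a small sample of rows is included, and `None` stands in for the table's _null_ entries.

```python
import unicodedata

# A few rows copied from the Tamil character table above:
# codepoint -> (shaping class, mark-placement subclass).
# None stands in for the table's _null_ entries.
TAMIL_SHAPING = {
    0x0B82: ("BINDU", "TOP_POSITION"),
    0x0B83: ("MODIFYING_LETTER", None),
    0x0B85: ("VOWEL_INDEPENDENT", None),
    0x0B95: ("CONSONANT", None),
    0x0BBE: ("VOWEL_DEPENDENT", "RIGHT_POSITION"),
    0x0BC6: ("VOWEL_DEPENDENT", "LEFT_POSITION"),
    0x0BCA: ("VOWEL_DEPENDENT", "LEFT_AND_RIGHT_POSITION"),
    0x0BCD: ("VIRAMA", "TOP_POSITION"),
    0x0BD0: (None, None),
    0x0BE6: ("NUMBER", None),
    0x0BF9: ("SYMBOL", None),
}


def classify(ch):
    """Return (general category, shaping class, mark-placement subclass)."""
    general_category = unicodedata.category(ch)
    # The shaping class comes from the table, not from the General Category.
    shaping_class, placement = TAMIL_SHAPING.get(ord(ch), (None, None))
    return general_category, shaping_class, placement


if __name__ == "__main__":
    for ch in "\u0B83\u0B95\u0BCD":
        print(f"U+{ord(ch):04X}", classify(ch))
```

Keeping the General Category and the shaping class as separate values makes it easy to see where the two differ, as with the visarga (`U+0B83`), which is a letter in Unicode but a `MODIFYING_LETTER` for shaping.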
## Tamil Supplement character table ##
Tamil text runs may also include historical symbols and fractions from
the Tamil Supplement block. These characters should be classified as
follows.
:::{table} Tamil Supplement character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------------------------------|
| `U+11FC0` | Number | NUMBER | _null_ | 𑿀 Fraction One Three-Hundred-And-Twentieth |
| `U+11FC1` | Number | NUMBER | _null_ | 𑿁 Fraction One One-Hundred-And-Sixtieth |
| `U+11FC2` | Number | NUMBER | _null_ | 𑿂 Fraction One Eightieth |
| `U+11FC3` | Number | NUMBER | _null_ | 𑿃 Fraction One Sixty-Fourth |
| `U+11FC4` | Number | NUMBER | _null_ | 𑿄 Fraction One Fortieth |
| `U+11FC5` | Number | NUMBER | _null_ | 𑿅 Fraction One Thirty-Second |
| `U+11FC6` | Number | NUMBER | _null_ | 𑿆 Fraction Three Eightieths |
| `U+11FC7` | Number | NUMBER | _null_ | 𑿇 Fraction Three Sixty-Fourths |
| `U+11FC8` | Number | NUMBER | _null_ | 𑿈 Fraction One Twentieth |
| `U+11FC9` | Number | NUMBER | _null_ | 𑿉 Fraction One Sixteenth-1 |
| `U+11FCA` | Number | NUMBER | _null_ | 𑿊 Fraction One Sixteenth-2 |
| `U+11FCB` | Number | NUMBER | _null_ | 𑿋 Fraction One Tenth |
| `U+11FCC` | Number | NUMBER | _null_ | 𑿌 Fraction One Eighth |
| `U+11FCD` | Number | NUMBER | _null_ | 𑿍 Fraction Three Twentieths |
| `U+11FCE` | Number | NUMBER | _null_ | 𑿎 Fraction Three Sixteenths |
| `U+11FCF` | Number | NUMBER | _null_ | 𑿏 Fraction One Fifth |
| | | | |
| `U+11FD0` | Number | NUMBER | _null_ | 𑿐 Fraction One Quarter |
| `U+11FD1` | Number | NUMBER | _null_ | 𑿑 Fraction One Half-1 |
| `U+11FD2` | Number | NUMBER | _null_ | 𑿒 Fraction One Half-2 |
| `U+11FD3` | Number | NUMBER | _null_ | 𑿓 Fraction Three Quarters |
| `U+11FD4` | Number | NUMBER | _null_ | 𑿔 Fraction Downscaling Factor Kiizh |
| `U+11FD5` | Symbol | SYMBOL | _null_ | 𑿕 Sign Nel |
| `U+11FD6` | Symbol | SYMBOL | _null_ | 𑿖 Sign Cevitu |
| `U+11FD7` | Symbol | SYMBOL | _null_ | 𑿗 Sign Aazhaakku |
| `U+11FD8` | Symbol | SYMBOL | _null_ | 𑿘 Sign Uzhakku |
| `U+11FD9` | Symbol | SYMBOL | _null_ | 𑿙 Sign Muuvuzhakku |
| `U+11FDA` | Symbol | SYMBOL | _null_ | 𑿚 Sign Kuruni |
| `U+11FDB` | Symbol | SYMBOL | _null_ | 𑿛 Sign Pathakku |
| `U+11FDC` | Symbol | SYMBOL | _null_ | 𑿜 Sign Mukkuruni |
| `U+11FDD` | Symbol | SYMBOL | _null_ | 𑿝 Sign Kaacu |
| `U+11FDE` | Symbol | SYMBOL | _null_ | 𑿞 Sign Panam |
| `U+11FDF` | Symbol | SYMBOL | _null_ | 𑿟 Sign Pon |
| | | | |
| `U+11FE0` | Symbol | SYMBOL | _null_ | 𑿠 Sign Varaakan |
| `U+11FE1` | Symbol | SYMBOL | _null_ | 𑿡 Sign Paaram |
| `U+11FE2` | Symbol | SYMBOL | _null_ | 𑿢 Sign Kuzhi |
| `U+11FE3` | Symbol | SYMBOL | _null_ | 𑿣 Sign Veli |
| `U+11FE4` | Symbol | SYMBOL | _null_ | 𑿤 Wet Cultivation Sign |
| `U+11FE5` | Symbol | SYMBOL | _null_ | 𑿥 Dry Cultivation Sign |
| `U+11FE6` | Symbol | SYMBOL | _null_ | 𑿦 Land Sign |
| `U+11FE7` | Symbol | SYMBOL | _null_ | 𑿧 Salt Pan Sign |
| `U+11FE8` | Symbol | SYMBOL | _null_ | 𑿨 Traditional Credit Sign |
| `U+11FE9` | Symbol | SYMBOL | _null_ | 𑿩 Traditional Number Sign |
| `U+11FEA` | Symbol | SYMBOL | _null_ | 𑿪 Current Sign |
| `U+11FEB` | Symbol | SYMBOL | _null_ | 𑿫 And Odd Sign |
| `U+11FEC` | Symbol | SYMBOL | _null_ | 𑿬 Spent Sign |
| `U+11FED` | Symbol | SYMBOL | _null_ | 𑿭 Total Sign |
| `U+11FEE` | Symbol | SYMBOL | _null_ | 𑿮 In Possession Sign |
| `U+11FEF` | Symbol | SYMBOL | _null_ | 𑿯 Starting From Sign |
| | | | |
| `U+11FF0` | Symbol | SYMBOL | _null_ | 𑿰 Sign Muthaliya |
| `U+11FF1` | Symbol | SYMBOL | _null_ | 𑿱 Sign Vakaiyaraa |
| `U+11FF2` | _unassigned_ | | | |
| `U+11FF3` | _unassigned_ | | | |
| `U+11FF4` | _unassigned_ | | | |
| `U+11FF5` | _unassigned_ | | | |
| `U+11FF6` | _unassigned_ | | | |
| `U+11FF7` | _unassigned_ | | | |
| `U+11FF8` | _unassigned_ | | | |
| `U+11FF9` | _unassigned_ | | | |
| `U+11FFA` | _unassigned_ | | | |
| `U+11FFB` | _unassigned_ | | | |
| `U+11FFC` | _unassigned_ | | | |
| `U+11FFD` | _unassigned_ | | | |
| `U+11FFE` | _unassigned_ | | | |
| `U+11FFF` | Punctuation | _null_ | _null_ | 𑿿 End Of Text |
:::
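Because the assignments in this block fall into contiguous runs, the table can be expressed as a few range checks. The following is a minimal sketch under that observation; the function name `classify_tamil_supplement` and the `"UNASSIGNED"` return value are illustrative choices rather than part of any shaping engine.

```python
def classify_tamil_supplement(cp):
    """Return the shaping class for a Tamil Supplement codepoint.

    Returns "NUMBER", "SYMBOL", "UNASSIGNED", or None (for _null_).
    """
    if not 0x11FC0 <= cp <= 0x11FFF:
        raise ValueError(f"U+{cp:04X} is outside the Tamil Supplement block")
    if cp <= 0x11FD4:
        return "NUMBER"      # fractions, U+11FC0..U+11FD4
    if cp <= 0x11FF1:
        return "SYMBOL"      # signs, U+11FD5..U+11FF1
    if cp <= 0x11FFE:
        return "UNASSIGNED"  # U+11FF2..U+11FFE
    return None              # U+11FFF End Of Text has a _null_ shaping class
```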
## Grantha marks character table ##
Tamil text runs may also include diacritical and syllable-modifier
marks from the Grantha block. These characters should be classified as
follows.
:::{table} Grantha marks character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+11301` | Mark [Mn] | BINDU | TOP_POSITION | 𑌁 Grantha Candrabindu|
|`U+11303` | Mark [Mc] | VISARGA | RIGHT_POSITION | 𑌃 Grantha Visarga |
|`U+1133B` | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌻 Combining Bindu Below |
|`U+1133C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌼 Grantha Nukta |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Tamil script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Tamil text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Tamil text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Tamil text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+00B2` | Number | SYLLABLE_MODIFIER | TOP_POSITION               | ² Superscript Two              |
|`U+00B3` | Number | SYLLABLE_MODIFIER | TOP_POSITION               | ³ Superscript Three            |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+2074` | Number | SYLLABLE_MODIFIER | TOP_POSITION               | ⁴ Superscript Four             |
|`U+2082` | Number | SYLLABLE_MODIFIER | TOP_POSITION               | ₂ Subscript Two                |
|`U+2083` | Number | SYLLABLE_MODIFIER | TOP_POSITION               | ₃ Subscript Three              |
|`U+2084` | Number | SYLLABLE_MODIFIER | TOP_POSITION               | ₄ Subscript Four               |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
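The joiner and placeholder rows of this table could be held in a small lookup, as in the sketch below. The names `MISC_SHAPING` and `can_carry_isolated_mark` are hypothetical, and treating both `PLACEHOLDER` and `DOTTED_CIRCLE` characters as acceptable bases for an isolated mark is an assumption drawn from the description above, not a rule stated by the table itself.

```python
# Joiner and placeholder rows copied from the miscellaneous character table.
MISC_SHAPING = {
    0x00A0: "PLACEHOLDER",    # no-break space
    0x200C: "NON_JOINER",     # zero-width non-joiner
    0x200D: "JOINER",         # zero-width joiner
    0x25CC: "DOTTED_CIRCLE",  # dotted circle
}
# Hyphen, no-break hyphen, figure dash, en dash, and em dash.
MISC_SHAPING.update({cp: "PLACEHOLDER" for cp in range(0x2010, 0x2015)})


def can_carry_isolated_mark(ch):
    """True if ch may stand in as the base for a mark or matra shown alone."""
    return MISC_SHAPING.get(ord(ch)) in ("PLACEHOLDER", "DOTTED_CIRCLE")
```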
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
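The joiner rules described above can be summarized in a short sketch. It assumes that "Halant" corresponds to the Tamil virama (`U+0BCD`) and "Ra" to `U+0BB0`; the function names and the returned labels are hypothetical, and the code only inspects the joiner that follows the halant rather than performing any real shaping.

```python
VIRAMA = "\u0BCD"  # Tamil virama, called "Halant" in the text above
RA = "\u0BB0"      # Tamil letter Ra
ZWNJ = "\u200C"
ZWJ = "\u200D"


def halant_request(sequence):
    """Describe what a 'Consonant,Halant[,ZWJ|ZWNJ],Consonant' sequence asks for."""
    index = sequence.find(VIRAMA)
    if index == -1:
        return "no-halant"
    following = sequence[index + 1:index + 2]  # "" if the halant is final
    if following == ZWJ:
        return "no-conjunct"      # a half form may still be applied
    if following == ZWNJ:
        return "explicit-halant"  # neither conjunct nor half form
    return "conjunct-allowed"


def may_form_reph(syllable):
    """True for an initial 'Ra,Halant' that is not followed by ZWJ."""
    return syllable.startswith(RA + VIRAMA) and syllable[2:3] != ZWJ
```

For example, `halant_request("\u0B95\u0BCD\u200C\u0BB7")` reports `"explicit-halant"`, matching the ZWNJ case in which the first consonant keeps its standard form and is followed by a visible halant.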
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
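One way to recognize these three NBSP patterns is with a regular expression built from the classes in the Tamil character table. The sketch below is illustrative only: the names are hypothetical, "Halant" is assumed to be the Tamil virama (`U+0BCD`), and the mark class is limited to the anusvara (`U+0B82`) for brevity.

```python
import re

NBSP = "\u00A0"
ZWJ = "\u200D"
VIRAMA = "\u0BCD"

# Character-class bodies built from the Tamil character table.
CONSONANTS = (
    "\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4"
    "\u0BA8-\u0BAA\u0BAE-\u0BB9"
)
MATRAS = "\u0BBE-\u0BC2\u0BC6-\u0BC8\u0BCA-\u0BCC\u0BD7"
MARKS = "\u0B82"  # only the anusvara is included in this sketch

# "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra".
ISOLATED_FORM = re.compile(
    f"{NBSP}(?:{ZWJ}{VIRAMA}[{CONSONANTS}]|[{MARKS}]|[{MATRAS}])"
)


def is_isolated_form(text):
    """True if text is exactly one of the NBSP-based isolated sequences."""
    return ISOLATED_FORM.fullmatch(text) is not None
```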
Tamil text sometimes uses the Latin numerals 2, 3, and 4 in
superscript or subscript positions to annotate Sanskrit. When used in
this fashion, the superscripts and subscripts are treated as
`SYLLABLE_MODIFIER` signs for shaping purposes.
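A shaping engine might separate these annotation digits from the rest of a syllable before further processing. The sketch below assumes, purely for illustration, that the modifiers appear at the end of the syllable; `SYLLABLE_MODIFIERS` and `split_syllable_modifiers` are hypothetical names.

```python
# The six superscript and subscript digits listed in the table above.
SYLLABLE_MODIFIERS = frozenset("\u00B2\u00B3\u2074\u2082\u2083\u2084")


def split_syllable_modifiers(syllable):
    """Split trailing SYLLABLE_MODIFIER digits from the rest of a syllable."""
    end = len(syllable)
    # Walk backwards over any run of modifier digits at the end.
    while end > 0 and syllable[end - 1] in SYLLABLE_MODIFIERS:
        end -= 1
    return syllable[:end], syllable[end:]
```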