# Tamil character tables #

This document lists the per-character shaping information needed to
[shape Tamil text](../opentype-shaping-tamil.md).

**Contents**

- [Tamil character table](#tamil-character-table)
- [Tamil Supplement character table](#tamil-supplement-character-table)
- [Grantha marks character table](#grantha-marks-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)


## Tamil character table ##

Tamil codepoints should be classified as in the following table. Codepoints in the Tamil block with no assigned meaning are designated as _unassigned_ in the _Unicode category_ column.

Assigned codepoints with a _null_ in the _Shaping class_ column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as the Om sign (`U+0BD0`) and several punctuation marks.

> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.

The _Mark-placement subclass_ column indicates mark-placement positioning for codepoints in the _Mark_ category. Assigned, non-mark codepoints have a _null_ in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the _Unicode category_ column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a _Shaping class_ that differs from the codepoint's Unicode _General Category_. The _Shaping class_ takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
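As a rough illustration of how the two columns interact, here is a minimal Python sketch that pairs the Unicode general category (from the standard `unicodedata` module) with a small, hand-copied excerpt of the table below. The `TAMIL_CLASSES` dictionary and `classify()` function are hypothetical names chosen for this example; a production shaper would carry the complete table, usually in a more compact form, and the categories reported by `unicodedata` depend on the Unicode version bundled with Python.

```python
from __future__ import annotations

import unicodedata

# Hypothetical excerpt of the Tamil character table:
# codepoint -> (Shaping class, Mark-placement subclass).
# None stands in for the table's _null_ entries.
TAMIL_CLASSES = {
    0x0B82: ("BINDU", "TOP_POSITION"),                       # Anusvara
    0x0B83: ("MODIFYING_LETTER", None),                      # Visarga
    0x0B85: ("VOWEL_INDEPENDENT", None),                     # A
    0x0B95: ("CONSONANT", None),                             # Ka
    0x0BBE: ("VOWEL_DEPENDENT", "RIGHT_POSITION"),           # Sign Aa
    0x0BC0: ("VOWEL_DEPENDENT", "TOP_POSITION"),             # Sign Ii
    0x0BC6: ("VOWEL_DEPENDENT", "LEFT_POSITION"),            # Sign E
    0x0BCA: ("VOWEL_DEPENDENT", "LEFT_AND_RIGHT_POSITION"),  # Sign O
    0x0BCD: ("VIRAMA", "TOP_POSITION"),                      # Virama
    0x0BD0: (None, None),                                    # Om
    0x0BE6: ("NUMBER", None),                                # Digit Zero
    0x0BF3: ("SYMBOL", None),                                # Day Sign
}


def classify(cp: int) -> tuple[str, str | None, str | None]:
    """Return (Unicode general category, Shaping class, Mark-placement subclass).

    The general category comes from Python's unicodedata module; the two
    shaping columns come from the excerpt above. Codepoints absent from the
    excerpt (including unassigned ones) get None for both shaping columns.
    """
    category = unicodedata.category(chr(cp))
    shaping_class, placement = TAMIL_CLASSES.get(cp, (None, None))
    return category, shaping_class, placement
```

With this excerpt, `classify(0x0B83)` returns `("Lo", "MODIFYING_LETTER", None)`: Unicode treats the Visarga as a letter, but the shaper should act on its `MODIFYING_LETTER` shaping class.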

:::{table} Tamil character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0B80` | _unassigned_ | | | |
|`U+0B81` | _unassigned_ | | | |
|`U+0B82` | Mark [Mn] | BINDU | TOP_POSITION | ஂ Anusvara |
|`U+0B83` | Letter | MODIFYING_LETTER | _null_ | ஃ Visarga |
|`U+0B84` | _unassigned_ | | | |
|`U+0B85` | Letter | VOWEL_INDEPENDENT | _null_ | அ A |
|`U+0B86` | Letter | VOWEL_INDEPENDENT | _null_ | ஆ Aa |
|`U+0B87` | Letter | VOWEL_INDEPENDENT | _null_ | இ I |
|`U+0B88` | Letter | VOWEL_INDEPENDENT | _null_ | ஈ Ii |
|`U+0B89` | Letter | VOWEL_INDEPENDENT | _null_ | உ U |
|`U+0B8A` | Letter | VOWEL_INDEPENDENT | _null_ | ஊ Uu |
|`U+0B8B` | _unassigned_ | | | |
|`U+0B8C` | _unassigned_ | | | |
|`U+0B8D` | _unassigned_ | | | |
|`U+0B8E` | Letter | VOWEL_INDEPENDENT | _null_ | எ E |
|`U+0B8F` | Letter | VOWEL_INDEPENDENT | _null_ | ஏ Ee |
| | | | | |
|`U+0B90` | Letter | VOWEL_INDEPENDENT | _null_ | ஐ Ai |
|`U+0B91` | _unassigned_ | | | |
|`U+0B92` | Letter | VOWEL_INDEPENDENT | _null_ | ஒ O |
|`U+0B93` | Letter | VOWEL_INDEPENDENT | _null_ | ஓ Oo |
|`U+0B94` | Letter | VOWEL_INDEPENDENT | _null_ | ஔ Au |
|`U+0B95` | Letter | CONSONANT | _null_ | க Ka |
|`U+0B96` | _unassigned_ | | | |
|`U+0B97` | _unassigned_ | | | |
|`U+0B98` | _unassigned_ | | | |
|`U+0B99` | Letter | CONSONANT | _null_ | ங Nga |
|`U+0B9A` | Letter | CONSONANT | _null_ | ச Ca |
|`U+0B9B` | _unassigned_ | | | |
|`U+0B9C` | Letter | CONSONANT | _null_ | ஜ Ja |
|`U+0B9D` | _unassigned_ | | | |
|`U+0B9E` | Letter | CONSONANT | _null_ | ஞ Nya |
|`U+0B9F` | Letter | CONSONANT | _null_ | ட Tta |
| | | | | |
|`U+0BA0` | _unassigned_ | | | |
|`U+0BA1` | _unassigned_ | | | |
|`U+0BA2` | _unassigned_ | | | |
|`U+0BA3` | Letter | CONSONANT | _null_ | ண Nna |
|`U+0BA4` | Letter | CONSONANT | _null_ | த Ta |
|`U+0BA5` | _unassigned_ | | | |
|`U+0BA6` | _unassigned_ | | | |
|`U+0BA7` | _unassigned_ | | | |
|`U+0BA8` | Letter | CONSONANT | _null_ | ந Na |
|`U+0BA9` | Letter | CONSONANT | _null_ | ன Nnna |
|`U+0BAA` | Letter | CONSONANT | _null_ | ப Pa |
|`U+0BAB` | _unassigned_ | | | |
|`U+0BAC` | _unassigned_ | | | |
|`U+0BAD` | _unassigned_ | | | |
|`U+0BAE` | Letter | CONSONANT | _null_ | ம Ma |
|`U+0BAF` | Letter | CONSONANT | _null_ | ய Ya |
| | | | | |
|`U+0BB0` | Letter | CONSONANT | _null_ | ர Ra |
|`U+0BB1` | Letter | CONSONANT | _null_ | ற Rra |
|`U+0BB2` | Letter | CONSONANT | _null_ | ல La |
|`U+0BB3` | Letter | CONSONANT | _null_ | ள Lla |
|`U+0BB4` | Letter | CONSONANT | _null_ | ழ Llla |
|`U+0BB5` | Letter | CONSONANT | _null_ | வ Va |
|`U+0BB6` | Letter | CONSONANT | _null_ | ஶ Sha |
|`U+0BB7` | Letter | CONSONANT | _null_ | ஷ Ssa |
|`U+0BB8` | Letter | CONSONANT | _null_ | ஸ Sa |
|`U+0BB9` | Letter | CONSONANT | _null_ | ஹ Ha |
|`U+0BBA` | _unassigned_ | | | |
|`U+0BBB` | _unassigned_ | | | |
|`U+0BBC` | _unassigned_ | | | |
|`U+0BBD` | _unassigned_ | | | |
|`U+0BBE` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ா Sign Aa |
|`U+0BBF` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ி Sign I |
| | | | | |
|`U+0BC0` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ீ Sign Ii |
|`U+0BC1` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ு Sign U |
|`U+0BC2` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ூ Sign Uu |
|`U+0BC3` | _unassigned_ | | | |
|`U+0BC4` | _unassigned_ | | | |
|`U+0BC5` | _unassigned_ | | | |
|`U+0BC6` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ெ Sign E |
|`U+0BC7` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ே Sign Ee |
|`U+0BC8` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ை Sign Ai |
|`U+0BC9` | _unassigned_ | | | |
|`U+0BCA` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ொ Sign O |
|`U+0BCB` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ோ Sign Oo |
|`U+0BCC` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ௌ Sign Au |
|`U+0BCD` | Mark [Mn] | VIRAMA | TOP_POSITION | ் Virama |
|`U+0BCE` | _unassigned_ | | | |
|`U+0BCF` | _unassigned_ | | | |
| | | | | |
|`U+0BD0` | Letter | _null_ | _null_ | ௐ Om |
|`U+0BD1` | _unassigned_ | | | |
|`U+0BD2` | _unassigned_ | | | |
|`U+0BD3` | _unassigned_ | | | |
|`U+0BD4` | _unassigned_ | | | |
|`U+0BD5` | _unassigned_ | | | |
|`U+0BD6` | _unassigned_ | | | |
|`U+0BD7` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ௗ Au Length Mark |
|`U+0BD8` | _unassigned_ | | | |
|`U+0BD9` | _unassigned_ | | | |
|`U+0BDA` | _unassigned_ | | | |
|`U+0BDB` | _unassigned_ | | | |
|`U+0BDC` | _unassigned_ | | | |
|`U+0BDD` | _unassigned_ | | | |
|`U+0BDE` | _unassigned_ | | | |
|`U+0BDF` | _unassigned_ | | | |
| | | | | |
|`U+0BE0` | _unassigned_ | | | |
|`U+0BE1` | _unassigned_ | | | |
|`U+0BE2` | _unassigned_ | | | |
|`U+0BE3` | _unassigned_ | | | |
|`U+0BE4` | _unassigned_ | | | |
|`U+0BE5` | _unassigned_ | | | |
|`U+0BE6` | Number | NUMBER | _null_ | ௦ Digit Zero |
|`U+0BE7` | Number | NUMBER | _null_ | ௧ Digit One |
|`U+0BE8` | Number | NUMBER | _null_ | ௨ Digit Two |
|`U+0BE9` | Number | NUMBER | _null_ | ௩ Digit Three |
|`U+0BEA` | Number | NUMBER | _null_ | ௪ Digit Four |
|`U+0BEB` | Number | NUMBER | _null_ | ௫ Digit Five |
|`U+0BEC` | Number | NUMBER | _null_ | ௬ Digit Six |
|`U+0BED` | Number | NUMBER | _null_ | ௭ Digit Seven |
|`U+0BEE` | Number | NUMBER | _null_ | ௮ Digit Eight |
|`U+0BEF` | Number | NUMBER | _null_ | ௯ Digit Nine |
| | | | | |
|`U+0BF0` | Number | NUMBER | _null_ | ௰ Number Ten |
|`U+0BF1` | Number | NUMBER | _null_ | ௱ Number One Hundred |
|`U+0BF2` | Number | NUMBER | _null_ | ௲ Number One Thousand |
|`U+0BF3` | Symbol | SYMBOL | _null_ | ௳ Day Sign |
|`U+0BF4` | Symbol | SYMBOL | _null_ | ௴ Month Sign |
|`U+0BF5` | Symbol | SYMBOL | _null_ | ௵ Year Sign |
|`U+0BF6` | Symbol | SYMBOL | _null_ | ௶ Debit Sign |
|`U+0BF7` | Symbol | SYMBOL | _null_ | ௷ Credit Sign |
|`U+0BF8` | Symbol | SYMBOL | _null_ | ௸ As Above Sign |
|`U+0BF9` | Symbol | SYMBOL | _null_ | ௹ Tamil Rupee Sign |
|`U+0BFA` | Symbol | SYMBOL | _null_ | ௺ Number Sign |
|`U+0BFB` | _unassigned_ | | | |
|`U+0BFC` | _unassigned_ | | | |
|`U+0BFD` | _unassigned_ | | | |
|`U+0BFE` | _unassigned_ | | | |
|`U+0BFF` | _unassigned_ | | | |
:::


## Tamil Supplement character table ##

Tamil text runs may also include historical symbols and fractions from the Tamil Supplement block. These characters should be classified as follows.

:::{table} Tamil Supplement character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------------------------------|
| `U+11FC0` | Number | NUMBER | _null_ | 𑿀 Fraction One Three-Hundred-And-Twentieth |
| `U+11FC1` | Number | NUMBER | _null_ | 𑿁 Fraction One One-Hundred-And-Sixtieth |
| `U+11FC2` | Number | NUMBER | _null_ | 𑿂 Fraction One Eightieth |
| `U+11FC3` | Number | NUMBER | _null_ | 𑿃 Fraction One Sixty-Fourth |
| `U+11FC4` | Number | NUMBER | _null_ | 𑿄 Fraction One Fortieth |
| `U+11FC5` | Number | NUMBER | _null_ | 𑿅 Fraction One Thirty-Second |
| `U+11FC6` | Number | NUMBER | _null_ | 𑿆 Fraction Three Eightieths |
| `U+11FC7` | Number | NUMBER | _null_ | 𑿇 Fraction Three Sixty-Fourths |
| `U+11FC8` | Number | NUMBER | _null_ | 𑿈 Fraction One Twentieth |
| `U+11FC9` | Number | NUMBER | _null_ | 𑿉 Fraction One Sixteenth-1 |
| `U+11FCA` | Number | NUMBER | _null_ | 𑿊 Fraction One Sixteenth-2 |
| `U+11FCB` | Number | NUMBER | _null_ | 𑿋 Fraction One Tenth |
| `U+11FCC` | Number | NUMBER | _null_ | 𑿌 Fraction One Eighth |
| `U+11FCD` | Number | NUMBER | _null_ | 𑿍 Fraction Three Twentieths |
| `U+11FCE` | Number | NUMBER | _null_ | 𑿎 Fraction Three Sixteenths |
| `U+11FCF` | Number | NUMBER | _null_ | 𑿏 Fraction One Fifth |
| | | | | |
| `U+11FD0` | Number | NUMBER | _null_ | 𑿐 Fraction One Quarter |
| `U+11FD1` | Number | NUMBER | _null_ | 𑿑 Fraction One Half-1 |
| `U+11FD2` | Number | NUMBER | _null_ | 𑿒 Fraction One Half-2 |
| `U+11FD3` | Number | NUMBER | _null_ | 𑿓 Fraction Three Quarters |
| `U+11FD4` | Number | NUMBER | _null_ | 𑿔 Fraction Downscaling Factor Kiizh |
| `U+11FD5` | Symbol | SYMBOL | _null_ | 𑿕 Sign Nel |
| `U+11FD6` | Symbol | SYMBOL | _null_ | 𑿖 Sign Cevitu |
| `U+11FD7` | Symbol | SYMBOL | _null_ | 𑿗 Sign Aazhaakku |
| `U+11FD8` | Symbol | SYMBOL | _null_ | 𑿘 Sign Uzhakku |
| `U+11FD9` | Symbol | SYMBOL | _null_ | 𑿙 Sign Muuvuzhakku |
| `U+11FDA` | Symbol | SYMBOL | _null_ | 𑿚 Sign Kuruni |
| `U+11FDB` | Symbol | SYMBOL | _null_ | 𑿛 Sign Pathakku |
| `U+11FDC` | Symbol | SYMBOL | _null_ | 𑿜 Sign Mukkuruni |
| `U+11FDD` | Symbol | SYMBOL | _null_ | 𑿝 Sign Kaacu |
| `U+11FDE` | Symbol | SYMBOL | _null_ | 𑿞 Sign Panam |
| `U+11FDF` | Symbol | SYMBOL | _null_ | 𑿟 Sign Pon |
| | | | | |
| `U+11FE0` | Symbol | SYMBOL | _null_ | 𑿠 Sign Varaakan |
| `U+11FE1` | Symbol | SYMBOL | _null_ | 𑿡 Sign Paaram |
| `U+11FE2` | Symbol | SYMBOL | _null_ | 𑿢 Sign Kuzhi |
| `U+11FE3` | Symbol | SYMBOL | _null_ | 𑿣 Sign Veli |
| `U+11FE4` | Symbol | SYMBOL | _null_ | 𑿤 Wet Cultivation Sign |
| `U+11FE5` | Symbol | SYMBOL | _null_ | 𑿥 Dry Cultivation Sign |
| `U+11FE6` | Symbol | SYMBOL | _null_ | 𑿦 Land Sign |
| `U+11FE7` | Symbol | SYMBOL | _null_ | 𑿧 Salt Pan Sign |
| `U+11FE8` | Symbol | SYMBOL | _null_ | 𑿨 Traditional Credit Sign |
| `U+11FE9` | Symbol | SYMBOL | _null_ | 𑿩 Traditional Number Sign |
| `U+11FEA` | Symbol | SYMBOL | _null_ | 𑿪 Current Sign |
| `U+11FEB` | Symbol | SYMBOL | _null_ | 𑿫 And Odd Sign |
| `U+11FEC` | Symbol | SYMBOL | _null_ | 𑿬 Spent Sign |
| `U+11FED` | Symbol | SYMBOL | _null_ | 𑿭 Total Sign |
| `U+11FEE` | Symbol | SYMBOL | _null_ | 𑿮 In Possession Sign |
| `U+11FEF` | Symbol | SYMBOL | _null_ | 𑿯 Starting From Sign |
| | | | | |
| `U+11FF0` | Symbol | SYMBOL | _null_ | 𑿰 Sign Muthaliya |
| `U+11FF1` | Symbol | SYMBOL | _null_ | 𑿱 Sign Vakaiyaraa |
| `U+11FF2` | _unassigned_ | | | |
| `U+11FF3` | _unassigned_ | | | |
| `U+11FF4` | _unassigned_ | | | |
| `U+11FF5` | _unassigned_ | | | |
| `U+11FF6` | _unassigned_ | | | |
| `U+11FF7` | _unassigned_ | | | |
| `U+11FF8` | _unassigned_ | | | |
| `U+11FF9` | _unassigned_ | | | |
| `U+11FFA` | _unassigned_ | | | |
| `U+11FFB` | _unassigned_ | | | |
| `U+11FFC` | _unassigned_ | | | |
| `U+11FFD` | _unassigned_ | | | |
| `U+11FFE` | _unassigned_ | | | |
| `U+11FFF` | Punctuation | _null_ | _null_ | 𑿿 End Of Text |
:::


## Grantha marks character table ##

Tamil text runs may also include diacritical and syllable-modifier marks from the Grantha block. These characters should be classified as follows.

:::{table} Grantha marks character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+11301` | Mark [Mn] | BINDU | TOP_POSITION | 𑌁 Grantha Candrabindu |
|`U+11303` | Mark [Mc] | VISARGA | RIGHT_POSITION | 𑌃 Grantha Visarga |
|`U+1133B` | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌻 Combining Bindu Below |
|`U+1133C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌼 Grantha Nukta |
:::


## Vedic Extensions character table ##

Sanskrit runs written in the Tamil script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.

:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::


## Miscellaneous character table ##

In addition to general punctuation, runs of Tamil text often use the danda (`U+0964`) and double danda (`U+0965`) punctuation marks from the Devanagari block.
Tamil text can also incorporate the udatta (`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.

:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::

Other important characters that may be encountered when shaping runs of Tamil text include the dotted-circle placeholder (`U+25CC`), the zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and the no-break space (`U+00A0`).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully, which is why those characters are assigned the `PLACEHOLDER` shaping class in the Miscellaneous character table.
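A minimal Python sketch of the placeholder convention, assuming the caller only needs to build the codepoint sequence and leaves rendering to the shaping engine and font. The `isolated_display()` function and `PLACEHOLDERS` set are hypothetical; the set simply collects the characters that the Miscellaneous character table assigns the `PLACEHOLDER` or `DOTTED_CIRCLE` shaping class.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

# Hypothetical set of characters that may serve as a base for an isolated
# mark: the dotted circle plus the no-break space, hyphens, and dashes that
# the Miscellaneous character table classifies as PLACEHOLDER.
PLACEHOLDERS = {
    "\u25CC",  # dotted circle
    "\u00A0",  # no-break space
    "\u2010", "\u2011", "\u2012", "\u2013", "\u2014",  # hyphens and dashes
}


def isolated_display(mark: str, base: str = DOTTED_CIRCLE) -> str:
    """Return a string that shows a lone combining mark on a placeholder base."""
    # Accept only a single character whose Unicode category is Mn, Mc, or Me.
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"not a single combining mark: {mark!r}")
    if base not in PLACEHOLDERS:
        raise ValueError(f"not a placeholder base: {base!r}")
    # The mark is stored after its base; any visual reordering (for example,
    # of a left-positioned matra) is left to the shaping engine.
    return base + mark
```

For example, `isolated_display("\u0BC6")` yields the dotted circle followed by Tamil Sign E. The stored order is always base then mark; the shaping engine is responsible for drawing the left-positioned matra before the circle.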

:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ |   No-break space |
|`U+00B2` | Number | SYLLABLE_MODIFIER | TOP | ² Superscript Two |
|`U+00B3` | Number | SYLLABLE_MODIFIER | TOP | ³ Superscript Three |
|`U+200C` | Other | NON_JOINER | _null_ | ‌ Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | ‍ Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+2074` | Number | SYLLABLE_MODIFIER | TOP | ⁴ Superscript Four |
|`U+2082` | Number | SYLLABLE_MODIFIER | TOP | ₂ Subscript Two |
|`U+2083` | Number | SYLLABLE_MODIFIER | TOP | ₃ Subscript Three |
|`U+2084` | Number | SYLLABLE_MODIFIER | TOP | ₄ Subscript Four |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of a conjunct between the two consonants.

Note, however, that the "_Consonant_,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the first consonant in its standard form, followed by an explicit "Halant".

A secondary usage of the zero-width joiner is to prevent the formation of "Reph".
An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", where an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.

The no-break space (NBSP) is primarily used to display non-spacing codepoints in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These non-spacing codepoints include marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms. Such sequences take the form "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".

Tamil text sometimes uses the European digits 2, 3, and 4 in superscript or subscript form to annotate Sanskrit. When used in this fashion, the superscripts and subscripts are treated as `SYLLABLE_MODIFIER` signs for shaping purposes.
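To make the control-character sequences concrete, the following hypothetical Python helpers spell out each sequence described above as a codepoint string, using Tamil Ka (`U+0B95`), Ra (`U+0BB0`), and the Tamil virama (`U+0BCD`) as the Halant. All helper names are illustrative assumptions. They only build input strings; whether a given font actually forms a conjunct or a "Reph" for these inputs is up to the font and the shaping engine.

```python
# Tamil letters and control characters used to spell out the sequences.
KA = "\u0B95"      # Tamil Ka
RA = "\u0BB0"      # Tamil Ra
HALANT = "\u0BCD"  # Tamil virama, acting as Halant
ZWNJ = "\u200C"    # zero-width non-joiner
ZWJ = "\u200D"     # zero-width joiner
NBSP = "\u00A0"    # no-break space

# Superscript and subscript digits listed as SYLLABLE_MODIFIER in the
# Miscellaneous character table.
SYLLABLE_MODIFIERS = {"\u00B2", "\u00B3", "\u2074", "\u2082", "\u2083", "\u2084"}


def no_conjunct(c1: str, c2: str) -> str:
    """Consonant,Halant,ZWJ,Consonant: blocks the conjunct."""
    return c1 + HALANT + ZWJ + c2


def explicit_halant(c1: str, c2: str) -> str:
    """Consonant,Halant,ZWNJ,Consonant: first consonant with a visible Halant."""
    return c1 + HALANT + ZWNJ + c2


def no_reph(rest: str) -> str:
    """Initial Ra,Halant,ZWJ: blocks the formation of Reph."""
    return RA + HALANT + ZWJ + rest


def mark_on_nbsp(mark: str) -> str:
    """NBSP,mark (or NBSP,matra): isolated display without a dotted circle."""
    return NBSP + mark


def consonant_form_on_nbsp(consonant: str) -> str:
    """NBSP,ZWJ,Halant,Consonant: isolated below-base or post-base form."""
    return NBSP + ZWJ + HALANT + consonant


def is_syllable_modifier(ch: str) -> bool:
    """True for the superscript/subscript digits used to annotate Sanskrit."""
    return ch in SYLLABLE_MODIFIERS
```

Note that `is_syllable_modifier()` matches only the dedicated superscript and subscript codepoints; a plain digit such as `"2"` is not a syllable modifier.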