# Malayalam character tables #
This document lists the per-character shaping information needed to
[shape Malayalam text](../opentype-shaping-malayalam.md).
**Contents**
- [Malayalam character table](#malayalam-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Malayalam character table ##
Malayalam characters should be classified as in the following
table. Codepoints in the Malayalam block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this includes some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
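As a rough cross-check of the [Mn] and [Mc] tags, the Unicode general category can be read from Python's standard `unicodedata` module. This is only a sketch: `mark_kind` is a hypothetical helper name, and the values returned depend on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata
from typing import Optional


def mark_kind(ch: str) -> Optional[str]:
    """Return the mark kind for a single character, or None for non-marks."""
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return None


print(mark_kind("\u0D4D"))  # Malayalam Virama, tagged [Mn] in the table
print(mark_kind("\u0D3E"))  # Malayalam Sign Aa, tagged [Mc] in the table
print(mark_kind("\u0D15"))  # Malayalam Ka, a letter rather than a mark
```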
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Malayalam character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0D00` | Mark [Mn] | BINDU | TOP_POSITION | ഀ Combining Anusvara Above |
|`U+0D01` | Mark [Mn] | BINDU | TOP_POSITION | ഁ Candrabindu |
|`U+0D02` | Mark [Mc] | BINDU | RIGHT_POSITION | ം Anusvara |
|`U+0D03` | Mark [Mc] | VISARGA | RIGHT_POSITION | ഃ Visarga |
|`U+0D04` | Letter | BINDU | _null_ | ഄ Vedic Anusvara |
|`U+0D05` | Letter | VOWEL_INDEPENDENT | _null_ | അ A |
|`U+0D06` | Letter | VOWEL_INDEPENDENT | _null_ | ആ Aa |
|`U+0D07` | Letter | VOWEL_INDEPENDENT | _null_ | ഇ I |
|`U+0D08` | Letter | VOWEL_INDEPENDENT | _null_ | ഈ Ii |
|`U+0D09` | Letter | VOWEL_INDEPENDENT | _null_ | ഉ U |
|`U+0D0A` | Letter | VOWEL_INDEPENDENT | _null_ | ഊ Uu |
|`U+0D0B` | Letter | VOWEL_INDEPENDENT | _null_ | ഋ Vocalic R |
|`U+0D0C` | Letter | VOWEL_INDEPENDENT | _null_ | ഌ Vocalic L |
|`U+0D0D` | _unassigned_ | | | |
|`U+0D0E` | Letter | VOWEL_INDEPENDENT | _null_ | എ E |
|`U+0D0F` | Letter | VOWEL_INDEPENDENT | _null_ | ഏ Ee |
| | | | |
|`U+0D10` | Letter | VOWEL_INDEPENDENT | _null_ | ഐ Ai |
|`U+0D11` | _unassigned_ | | | |
|`U+0D12` | Letter | VOWEL_INDEPENDENT | _null_ | ഒ O |
|`U+0D13` | Letter | VOWEL_INDEPENDENT | _null_ | ഓ Oo |
|`U+0D14` | Letter | VOWEL_INDEPENDENT | _null_ | ഔ Au |
|`U+0D15` | Letter | CONSONANT | _null_ | ക Ka |
|`U+0D16` | Letter | CONSONANT | _null_ | ഖ Kha |
|`U+0D17` | Letter | CONSONANT | _null_ | ഗ Ga |
|`U+0D18` | Letter | CONSONANT | _null_ | ഘ Gha |
|`U+0D19` | Letter | CONSONANT | _null_ | ങ Nga |
|`U+0D1A` | Letter | CONSONANT | _null_ | ച Ca |
|`U+0D1B` | Letter | CONSONANT | _null_ | ഛ Cha |
|`U+0D1C` | Letter | CONSONANT | _null_ | ജ Ja |
|`U+0D1D` | Letter | CONSONANT | _null_ | ഝ Jha |
|`U+0D1E` | Letter | CONSONANT | _null_ | ഞ Nya |
|`U+0D1F` | Letter | CONSONANT | _null_ | ട Tta |
| | | | |
|`U+0D20` | Letter | CONSONANT | _null_ | ഠ Ttha |
|`U+0D21` | Letter | CONSONANT | _null_ | ഡ Dda |
|`U+0D22` | Letter | CONSONANT | _null_ | ഢ Ddha |
|`U+0D23` | Letter | CONSONANT | _null_ | ണ Nna |
|`U+0D24` | Letter | CONSONANT | _null_ | ത Ta |
|`U+0D25` | Letter | CONSONANT | _null_ | ഥ Tha |
|`U+0D26` | Letter | CONSONANT | _null_ | ദ Da |
|`U+0D27` | Letter | CONSONANT | _null_ | ധ Dha |
|`U+0D28` | Letter | CONSONANT | _null_ | ന Na |
|`U+0D29` | Letter | CONSONANT | _null_ | ഩ Nnna |
|`U+0D2A` | Letter | CONSONANT | _null_ | പ Pa |
|`U+0D2B` | Letter | CONSONANT | _null_ | ഫ Pha |
|`U+0D2C` | Letter | CONSONANT | _null_ | ബ Ba |
|`U+0D2D` | Letter | CONSONANT | _null_ | ഭ Bha |
|`U+0D2E` | Letter | CONSONANT | _null_ | മ Ma |
|`U+0D2F` | Letter | CONSONANT | _null_ | യ Ya |
| | | | |
|`U+0D30` | Letter | CONSONANT | _null_ | ര Ra |
|`U+0D31` | Letter | CONSONANT | _null_ | റ Rra |
|`U+0D32` | Letter | CONSONANT | _null_ | ല La |
|`U+0D33` | Letter | CONSONANT | _null_ | ള Lla |
|`U+0D34` | Letter | CONSONANT | _null_ | ഴ Llla |
|`U+0D35` | Letter | CONSONANT | _null_ | വ Va |
|`U+0D36` | Letter | CONSONANT | _null_ | ശ Sha |
|`U+0D37` | Letter | CONSONANT | _null_ | ഷ Ssa |
|`U+0D38` | Letter | CONSONANT | _null_ | സ Sa |
|`U+0D39` | Letter | CONSONANT | _null_ | ഹ Ha |
|`U+0D3A` | Letter | CONSONANT | _null_ | ഺ Ttta |
|`U+0D3B` | Mark [Mn] | PURE_KILLER | TOP_POSITION | ഻ Vertical Bar Virama |
|`U+0D3C` | Mark [Mn] | PURE_KILLER | TOP_POSITION | ഼ Circular Virama |
|`U+0D3D` | Letter | AVAGRAHA | _null_ | ഽ Avagraha |
|`U+0D3E` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ാ Sign Aa |
|`U+0D3F` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ി Sign I |
| | | | |
|`U+0D40` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ീ Sign Ii |
|`U+0D41` | Mark [Mn] | VOWEL_DEPENDENT | RIGHT_POSITION | ു Sign U |
|`U+0D42` | Mark [Mn] | VOWEL_DEPENDENT | RIGHT_POSITION | ൂ Sign Uu |
|`U+0D43` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൃ Sign Vocalic R |
|`U+0D44` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൄ Sign Vocalic Rr |
|`U+0D45` | _unassigned_ | | | |
|`U+0D46` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | െ Sign E |
|`U+0D47` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | േ Sign Ee |
|`U+0D48` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ൈ Sign Ai |
|`U+0D49` | _unassigned_ | | | |
|`U+0D4A` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ൊ Sign O |
|`U+0D4B` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ോ Sign Oo |
|`U+0D4C` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ൌ Sign Au |
|`U+0D4D` | Mark [Mn] | VIRAMA | TOP_POSITION | ് Virama |
|`U+0D4E` | Letter | CONSONANT_PRE_REPHA | _null_ | ൎ Dot Reph |
|`U+0D4F` | Symbol | SYMBOL | _null_ | ൏ Para |
| | | | |
|`U+0D50` | _unassigned_ | | | |
|`U+0D51` | _unassigned_ | | | |
|`U+0D52` | _unassigned_ | | | |
|`U+0D53` | _unassigned_ | | | |
|`U+0D54` | Letter | CONSONANT_DEAD | _null_ | ൔ Chillu M |
|`U+0D55` | Letter | CONSONANT_DEAD | _null_ | ൕ Chillu Y |
|`U+0D56` | Letter | CONSONANT_DEAD | _null_ | ൖ Chillu Lll |
|`U+0D57` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ൗ Au Length Mark |
|`U+0D58` | Number | NUMBER | _null_ | ൘ Fraction 1/160 |
|`U+0D59` | Number | NUMBER | _null_ | ൙ Fraction 1/40 |
|`U+0D5A` | Number | NUMBER | _null_ | ൚ Fraction 3/80 |
|`U+0D5B` | Number | NUMBER | _null_ | ൛ Fraction 1/20 |
|`U+0D5C` | Number | NUMBER | _null_ | ൜ Fraction 1/10 |
|`U+0D5D` | Number | NUMBER | _null_ | ൝ Fraction 3/20 |
|`U+0D5E` | Number | NUMBER | _null_ | ൞ Fraction 1/5 |
|`U+0D5F` | Letter | VOWEL_INDEPENDENT | _null_ | ൟ Archaic Ii |
| | | | |
|`U+0D60` | Letter | VOWEL_INDEPENDENT | _null_ | ൠ Vocalic Rr |
|`U+0D61` | Letter | VOWEL_INDEPENDENT | _null_ | ൡ Vocalic Ll |
|`U+0D62` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൢ Sign Vocalic L |
|`U+0D63` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൣ Sign Vocalic Ll |
|`U+0D64` | _unassigned_ | | | |
|`U+0D65` | _unassigned_ | | | |
|`U+0D66` | Number | NUMBER | _null_ | ൦ Digit Zero |
|`U+0D67` | Number | NUMBER | _null_ | ൧ Digit One |
|`U+0D68` | Number | NUMBER | _null_ | ൨ Digit Two |
|`U+0D69` | Number | NUMBER | _null_ | ൩ Digit Three |
|`U+0D6A` | Number | NUMBER | _null_ | ൪ Digit Four |
|`U+0D6B` | Number | NUMBER | _null_ | ൫ Digit Five |
|`U+0D6C` | Number | NUMBER | _null_ | ൬ Digit Six |
|`U+0D6D` | Number | NUMBER | _null_ | ൭ Digit Seven |
|`U+0D6E` | Number | NUMBER | _null_ | ൮ Digit Eight |
|`U+0D6F` | Number | NUMBER | _null_ | ൯ Digit Nine |
| | | | |
|`U+0D70` | Number | NUMBER | _null_ | ൰ Number Ten |
|`U+0D71` | Number | NUMBER | _null_ | ൱ Number One Hundred |
|`U+0D72` | Number | NUMBER | _null_ | ൲ Number One Thousand |
|`U+0D73` | Number | NUMBER | _null_ | ൳ Fraction 1/4 |
|`U+0D74` | Number | NUMBER | _null_ | ൴ Fraction 1/2 |
|`U+0D75` | Number | NUMBER | _null_ | ൵ Fraction 3/4 |
|`U+0D76` | Number | NUMBER | _null_ | ൶ Fraction 1/16 |
|`U+0D77` | Number | NUMBER | _null_ | ൷ Fraction 1/8 |
|`U+0D78` | Number | NUMBER | _null_ | ൸ Fraction 3/16 |
|`U+0D79` | Symbol | SYMBOL | _null_ | ൹ Date Mark |
|`U+0D7A` | Letter | CONSONANT_DEAD | _null_ | ൺ Chillu Nn |
|`U+0D7B` | Letter | CONSONANT_DEAD | _null_ | ൻ Chillu N |
|`U+0D7C` | Letter | CONSONANT_DEAD | _null_ | ർ Chillu Rr |
|`U+0D7D` | Letter | CONSONANT_DEAD | _null_ | ൽ Chillu L |
|`U+0D7E` | Letter | CONSONANT_DEAD | _null_ | ൾ Chillu Ll |
|`U+0D7F` | Letter | CONSONANT_DEAD | _null_ | ൿ Chillu K |
:::
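One way an implementation might store the classifications above is as a small range table. The following is a minimal sketch that transcribes only part of the table; the names `MALAYALAM_RANGES` and `classify` are hypothetical and not taken from any particular shaping engine.

```python
from typing import Optional, Tuple

# Partial transcription of the Malayalam table above (an illustrative subset).
# Each entry is (first, last, shaping_class, mark_placement_subclass).
MALAYALAM_RANGES = [
    (0x0D00, 0x0D01, "BINDU", "TOP_POSITION"),
    (0x0D02, 0x0D02, "BINDU", "RIGHT_POSITION"),
    (0x0D03, 0x0D03, "VISARGA", "RIGHT_POSITION"),
    (0x0D05, 0x0D0C, "VOWEL_INDEPENDENT", None),
    (0x0D15, 0x0D3A, "CONSONANT", None),
    (0x0D3B, 0x0D3C, "PURE_KILLER", "TOP_POSITION"),
    (0x0D3E, 0x0D42, "VOWEL_DEPENDENT", "RIGHT_POSITION"),
    (0x0D43, 0x0D44, "VOWEL_DEPENDENT", "BOTTOM_POSITION"),
    (0x0D46, 0x0D48, "VOWEL_DEPENDENT", "LEFT_POSITION"),
    (0x0D4A, 0x0D4C, "VOWEL_DEPENDENT", "LEFT_AND_RIGHT_POSITION"),
    (0x0D4D, 0x0D4D, "VIRAMA", "TOP_POSITION"),
    (0x0D4E, 0x0D4E, "CONSONANT_PRE_REPHA", None),
    (0x0D66, 0x0D6F, "NUMBER", None),
    (0x0D7A, 0x0D7F, "CONSONANT_DEAD", None),
]


def classify(codepoint: int) -> Tuple[Optional[str], Optional[str]]:
    """Return (shaping class, mark-placement subclass) for a codepoint.

    (None, None) is returned both for unassigned codepoints and for
    codepoints that this partial table simply does not cover.
    """
    for first, last, shaping_class, subclass in MALAYALAM_RANGES:
        if first <= codepoint <= last:
            return shaping_class, subclass
    return None, None


print(classify(0x0D15))  # Ka
print(classify(0x0D4B))  # Sign Oo
print(classify(0x0D0D))  # unassigned
```

A linear scan is used here for readability; a real engine would more likely use a sorted table with binary search or a generated per-codepoint array.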
## Vedic Extensions character table ##
Sanskrit runs written in the Malayalam script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Malayalam text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Malayalam text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Malayalam text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
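The classes in the table above could be held in a simple mapping. The sketch below also shows one possible way to decide whether a codepoint may stand in as the base for an isolated mark or matra. The names `MISC_CLASSES` and `can_stand_in_for_base` are hypothetical, and grouping `PLACEHOLDER` with `DOTTED_CIRCLE` for this purpose is an assumption drawn from the prose above, not a rule stated by the table.

```python
# Shaping classes transcribed from the miscellaneous character table above.
MISC_CLASSES = {
    0x00A0: "PLACEHOLDER",    # no-break space
    0x200C: "NON_JOINER",     # zero-width non-joiner
    0x200D: "JOINER",         # zero-width joiner
    0x2010: "PLACEHOLDER",    # hyphen
    0x2011: "PLACEHOLDER",    # no-break hyphen
    0x2012: "PLACEHOLDER",    # figure dash
    0x2013: "PLACEHOLDER",    # en dash
    0x2014: "PLACEHOLDER",    # em dash
    0x25CC: "DOTTED_CIRCLE",  # dotted circle
}

# Assumption: both classes can act as a base for marks shown in isolation.
BASE_LIKE_CLASSES = {"PLACEHOLDER", "DOTTED_CIRCLE"}


def can_stand_in_for_base(codepoint: int) -> bool:
    """Return True if the codepoint may carry an isolated mark or matra."""
    return MISC_CLASSES.get(codepoint) in BASE_LIKE_CLASSES


print(can_stand_in_for_base(0x25CC))  # dotted circle
print(can_stand_in_for_base(0x200D))  # zero-width joiner
```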
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence, where
"Halant" is the Virama (`U+0D4D`). The sequence
"_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of a conjunct
between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
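The joiner rules described above can be summarized in a short sketch. The names `HalantEffects`, `effects_after_halant`, and `may_form_reph` are hypothetical. The functions only report what the joiners permit or block; whether a conjunct, half form, or "Reph" actually appears still depends on the font and on the rest of the shaping process.

```python
from typing import NamedTuple, Optional, Sequence

RA = 0x0D30      # Malayalam Ra
HALANT = 0x0D4D  # Malayalam Virama, called "Halant" in the text
ZWNJ = 0x200C    # zero-width non-joiner
ZWJ = 0x200D     # zero-width joiner


class HalantEffects(NamedTuple):
    conjunct_allowed: bool
    half_form_allowed: bool


def effects_after_halant(next_codepoint: Optional[int]) -> HalantEffects:
    """Report what the codepoint following "Consonant,Halant" permits."""
    if next_codepoint == ZWNJ:
        # ZWNJ blocks both the conjunct and the half form.
        return HalantEffects(conjunct_allowed=False, half_form_allowed=False)
    if next_codepoint == ZWJ:
        # ZWJ blocks the conjunct, but a half form may still be triggered.
        return HalantEffects(conjunct_allowed=False, half_form_allowed=True)
    return HalantEffects(conjunct_allowed=True, half_form_allowed=True)


def may_form_reph(syllable: Sequence[int]) -> bool:
    """Return True for an initial Ra,Halant that is not followed by ZWJ."""
    starts_with_ra_halant = (
        len(syllable) >= 2 and syllable[0] == RA and syllable[1] == HALANT
    )
    followed_by_zwj = len(syllable) >= 3 and syllable[2] == ZWJ
    return starts_with_ra_halant and not followed_by_zwj


print(effects_after_halant(ZWJ))
print(may_form_reph([RA, HALANT, ZWJ, 0x0D15]))
```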
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing in an isolated context, as
an alternative to displaying them superimposed on the dotted-circle
placeholder. This covers marks, dependent vowels (matras), below-base
consonant forms, and post-base consonant forms. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
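The three NBSP patterns can be approximated with a regular expression. This is a sketch under stated assumptions: the character classes are transcribed from the Malayalam table in this document, `MARK` is limited to the `BINDU` and `VISARGA` codepoints (which other marks should qualify is left open here), and `is_isolated_display_sequence` is a hypothetical helper name.

```python
import re

NBSP = "\u00A0"
ZWJ = "\u200D"
HALANT = "\u0D4D"

CONSONANT = "[\u0D15-\u0D3A]"
# VOWEL_DEPENDENT codepoints from the Malayalam table.
MATRA = "[\u0D3E-\u0D44\u0D46-\u0D48\u0D4A-\u0D4C\u0D57\u0D62\u0D63]"
# Assumption: only the BINDU and VISARGA marks are treated as "mark" here.
MARK = "[\u0D00-\u0D03]"

ISOLATED_SEQUENCE = re.compile(
    f"{NBSP}(?:{ZWJ}{HALANT}{CONSONANT}|{MARK}|{MATRA})"
)


def is_isolated_display_sequence(text: str) -> bool:
    """Return True if the whole string is one of the three NBSP patterns."""
    return ISOLATED_SEQUENCE.fullmatch(text) is not None


print(is_isolated_display_sequence("\u00A0\u0D3E"))              # NBSP,matra
print(is_isolated_display_sequence("\u00A0\u200D\u0D4D\u0D2F"))  # NBSP,ZWJ,Halant,Ya
print(is_isolated_display_sequence("\u00A0\u0D15"))              # NBSP,Ka
```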