Malayalam character tables

This document lists the per-character shaping information needed to shape Malayalam text.

Malayalam character table

Malayalam codepoints should be classified as in the following table. Codepoints in the Malayalam block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
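As a minimal sketch of that precedence rule, the following Python fragment looks up a codepoint in a small override dictionary before consulting the Unicode General Category from the standard library's unicodedata module. The names SHAPING_OVERRIDES and classify are hypothetical, and the dictionary holds only a handful of rows copied from the Malayalam character table; a real engine would carry the full table.

```python
import unicodedata

# A few rows copied from the Malayalam character table (illustrative subset).
# Values are (shaping class, mark-placement subclass); None stands for null.
SHAPING_OVERRIDES = {
    0x0D02: ("BINDU", "RIGHT_POSITION"),
    0x0D15: ("CONSONANT", None),
    0x0D3E: ("VOWEL_DEPENDENT", "RIGHT_POSITION"),
    0x0D4D: ("VIRAMA", "TOP_POSITION"),
    0x0D4E: ("CONSONANT_PRE_REPHA", None),
    0x0D7A: ("CONSONANT_DEAD", None),
}


def classify(codepoint):
    """Return (shaping_class, mark_placement, general_category)."""
    general_category = unicodedata.category(chr(codepoint))
    if general_category == "Cn":
        # Unassigned codepoints carry no shaping information.
        return ("unassigned", None, general_category)
    # The shaping class, when present, is what the shaper acts on; the
    # General Category is returned only for reference.
    shaping_class, placement = SHAPING_OVERRIDES.get(codepoint, (None, None))
    return (shaping_class, placement, general_category)
```

For example, U+0D4E is a Letter in Unicode terms, but the lookup reports it as CONSONANT_PRE_REPHA, which is the value a shaper would use.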

Table 45 Malayalam character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+0D00 | Mark [Mn] | BINDU | TOP_POSITION | ഀ Combining Anusvara Above |
| U+0D01 | Mark [Mn] | BINDU | TOP_POSITION | ഁ Candrabindu |
| U+0D02 | Mark [Mc] | BINDU | RIGHT_POSITION | ം Anusvara |
| U+0D03 | Mark [Mc] | VISARGA | RIGHT_POSITION | ഃ Visarga |
| U+0D04 | Letter | BINDU | null | ഄ Vedic Anusvara |
| U+0D05 | Letter | VOWEL_INDEPENDENT | null | അ A |
| U+0D06 | Letter | VOWEL_INDEPENDENT | null | ആ Aa |
| U+0D07 | Letter | VOWEL_INDEPENDENT | null | ഇ I |
| U+0D08 | Letter | VOWEL_INDEPENDENT | null | ഈ Ii |
| U+0D09 | Letter | VOWEL_INDEPENDENT | null | ഉ U |
| U+0D0A | Letter | VOWEL_INDEPENDENT | null | ഊ Uu |
| U+0D0B | Letter | VOWEL_INDEPENDENT | null | ഋ Vocalic R |
| U+0D0C | Letter | VOWEL_INDEPENDENT | null | ഌ Vocalic L |
| U+0D0D | unassigned | | | |
| U+0D0E | Letter | VOWEL_INDEPENDENT | null | എ E |
| U+0D0F | Letter | VOWEL_INDEPENDENT | null | ഏ Ee |
| U+0D10 | Letter | VOWEL_INDEPENDENT | null | ഐ Ai |
| U+0D11 | unassigned | | | |
| U+0D12 | Letter | VOWEL_INDEPENDENT | null | ഒ O |
| U+0D13 | Letter | VOWEL_INDEPENDENT | null | ഓ Oo |
| U+0D14 | Letter | VOWEL_INDEPENDENT | null | ഔ Au |
| U+0D15 | Letter | CONSONANT | null | ക Ka |
| U+0D16 | Letter | CONSONANT | null | ഖ Kha |
| U+0D17 | Letter | CONSONANT | null | ഗ Ga |
| U+0D18 | Letter | CONSONANT | null | ഘ Gha |
| U+0D19 | Letter | CONSONANT | null | ങ Nga |
| U+0D1A | Letter | CONSONANT | null | ച Ca |
| U+0D1B | Letter | CONSONANT | null | ഛ Cha |
| U+0D1C | Letter | CONSONANT | null | ജ Ja |
| U+0D1D | Letter | CONSONANT | null | ഝ Jha |
| U+0D1E | Letter | CONSONANT | null | ഞ Nya |
| U+0D1F | Letter | CONSONANT | null | ട Tta |
| U+0D20 | Letter | CONSONANT | null | ഠ Ttha |
| U+0D21 | Letter | CONSONANT | null | ഡ Dda |
| U+0D22 | Letter | CONSONANT | null | ഢ Ddha |
| U+0D23 | Letter | CONSONANT | null | ണ Nna |
| U+0D24 | Letter | CONSONANT | null | ത Ta |
| U+0D25 | Letter | CONSONANT | null | ഥ Tha |
| U+0D26 | Letter | CONSONANT | null | ദ Da |
| U+0D27 | Letter | CONSONANT | null | ധ Dha |
| U+0D28 | Letter | CONSONANT | null | ന Na |
| U+0D29 | Letter | CONSONANT | null | ഩ Nnna |
| U+0D2A | Letter | CONSONANT | null | പ Pa |
| U+0D2B | Letter | CONSONANT | null | ഫ Pha |
| U+0D2C | Letter | CONSONANT | null | ബ Ba |
| U+0D2D | Letter | CONSONANT | null | ഭ Bha |
| U+0D2E | Letter | CONSONANT | null | മ Ma |
| U+0D2F | Letter | CONSONANT | null | യ Ya |
| U+0D30 | Letter | CONSONANT | null | ര Ra |
| U+0D31 | Letter | CONSONANT | null | റ Rra |
| U+0D32 | Letter | CONSONANT | null | ല La |
| U+0D33 | Letter | CONSONANT | null | ള Lla |
| U+0D34 | Letter | CONSONANT | null | ഴ Llla |
| U+0D35 | Letter | CONSONANT | null | വ Va |
| U+0D36 | Letter | CONSONANT | null | ശ Sha |
| U+0D37 | Letter | CONSONANT | null | ഷ Ssa |
| U+0D38 | Letter | CONSONANT | null | സ Sa |
| U+0D39 | Letter | CONSONANT | null | ഹ Ha |
| U+0D3A | Letter | CONSONANT | null | ഺ Ttta |
| U+0D3B | Mark [Mn] | PURE_KILLER | TOP_POSITION | ഻ Vertical Bar Virama |
| U+0D3C | Mark [Mn] | PURE_KILLER | TOP_POSITION | ഼ Circular Virama |
| U+0D3D | Letter | AVAGRAHA | null | ഽ Avagraha |
| U+0D3E | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ാ Sign Aa |
| U+0D3F | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ി Sign I |
| U+0D40 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ീ Sign Ii |
| U+0D41 | Mark [Mn] | VOWEL_DEPENDENT | RIGHT_POSITION | ു Sign U |
| U+0D42 | Mark [Mn] | VOWEL_DEPENDENT | RIGHT_POSITION | ൂ Sign Uu |
| U+0D43 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൃ Sign Vocalic R |
| U+0D44 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൄ Sign Vocalic Rr |
| U+0D45 | unassigned | | | |
| U+0D46 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | െ Sign E |
| U+0D47 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | േ Sign Ee |
| U+0D48 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ൈ Sign Ai |
| U+0D49 | unassigned | | | |
| U+0D4A | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ൊ Sign O |
| U+0D4B | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ോ Sign Oo |
| U+0D4C | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ൌ Sign Au |
| U+0D4D | Mark [Mn] | VIRAMA | TOP_POSITION | ് Virama |
| U+0D4E | Letter | CONSONANT_PRE_REPHA | null | ൎ Dot Reph |
| U+0D4F | Symbol | SYMBOL | null | ൏ Para |
| U+0D50 | unassigned | | | |
| U+0D51 | unassigned | | | |
| U+0D52 | unassigned | | | |
| U+0D53 | unassigned | | | |
| U+0D54 | Letter | CONSONANT_DEAD | null | ൔ Chillu M |
| U+0D55 | Letter | CONSONANT_DEAD | null | ൕ Chillu Y |
| U+0D56 | Letter | CONSONANT_DEAD | null | ൖ Chillu Lll |
| U+0D57 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ൗ Au Length Mark |
| U+0D58 | Number | NUMBER | null | ൘ Fraction 1/160 |
| U+0D59 | Number | NUMBER | null | ൙ Fraction 1/40 |
| U+0D5A | Number | NUMBER | null | ൚ Fraction 3/80 |
| U+0D5B | Number | NUMBER | null | ൛ Fraction 1/20 |
| U+0D5C | Number | NUMBER | null | ൜ Fraction 1/10 |
| U+0D5D | Number | NUMBER | null | ൝ Fraction 3/20 |
| U+0D5E | Number | NUMBER | null | ൞ Fraction 1/5 |
| U+0D5F | Letter | VOWEL_INDEPENDENT | null | ൟ Archaic Ii |
| U+0D60 | Letter | VOWEL_INDEPENDENT | null | ൠ Vocalic Rr |
| U+0D61 | Letter | VOWEL_INDEPENDENT | null | ൡ Vocalic Ll |
| U+0D62 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൢ Sign Vocalic L |
| U+0D63 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൣ Sign Vocalic Ll |
| U+0D64 | unassigned | | | |
| U+0D65 | unassigned | | | |
| U+0D66 | Number | NUMBER | null | ൦ Digit Zero |
| U+0D67 | Number | NUMBER | null | ൧ Digit One |
| U+0D68 | Number | NUMBER | null | ൨ Digit Two |
| U+0D69 | Number | NUMBER | null | ൩ Digit Three |
| U+0D6A | Number | NUMBER | null | ൪ Digit Four |
| U+0D6B | Number | NUMBER | null | ൫ Digit Five |
| U+0D6C | Number | NUMBER | null | ൬ Digit Six |
| U+0D6D | Number | NUMBER | null | ൭ Digit Seven |
| U+0D6E | Number | NUMBER | null | ൮ Digit Eight |
| U+0D6F | Number | NUMBER | null | ൯ Digit Nine |
| U+0D70 | Number | NUMBER | null | ൰ Number Ten |
| U+0D71 | Number | NUMBER | null | ൱ Number One Hundred |
| U+0D72 | Number | NUMBER | null | ൲ Number One Thousand |
| U+0D73 | Number | NUMBER | null | ൳ Fraction 1/4 |
| U+0D74 | Number | NUMBER | null | ൴ Fraction 1/2 |
| U+0D75 | Number | NUMBER | null | ൵ Fraction 3/4 |
| U+0D76 | Number | NUMBER | null | ൶ Fraction 1/16 |
| U+0D77 | Number | NUMBER | null | ൷ Fraction 1/8 |
| U+0D78 | Number | NUMBER | null | ൸ Fraction 3/16 |
| U+0D79 | Symbol | SYMBOL | null | ൹ Date Mark |
| U+0D7A | Letter | CONSONANT_DEAD | null | ൺ Chillu Nn |
| U+0D7B | Letter | CONSONANT_DEAD | null | ൻ Chillu N |
| U+0D7C | Letter | CONSONANT_DEAD | null | ർ Chillu Rr |
| U+0D7D | Letter | CONSONANT_DEAD | null | ൽ Chillu L |
| U+0D7E | Letter | CONSONANT_DEAD | null | ൾ Chillu Ll |
| U+0D7F | Letter | CONSONANT_DEAD | null | ൿ Chillu K |

Vedic Extensions character table

Sanskrit runs written in the Malayalam script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

Table 46 Vedic Extensions character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+1CD0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
| U+1CD1 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
| U+1CD2 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
| U+1CD3 | Punctuation | null | null | ᳓ Sign Nihshvasa |
| U+1CD4 | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
| U+1CD5 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
| U+1CD6 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
| U+1CD7 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
| U+1CD8 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
| U+1CD9 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
| U+1CDA | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
| U+1CDB | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
| U+1CDC | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
| U+1CDD | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
| U+1CDE | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
| U+1CDF | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| U+1CE0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
| U+1CE1 | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
| U+1CE2 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
| U+1CE3 | Mark [Mn] | null | OVERSTRUCK | ᳣ Sign Visarga Udatta |
| U+1CE4 | Mark [Mn] | null | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
| U+1CE5 | Mark [Mn] | null | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
| U+1CE6 | Mark [Mn] | null | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
| U+1CE7 | Mark [Mn] | null | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
| U+1CE8 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
| U+1CE9 | Letter | SYMBOL | null | ᳩ Sign Anusvara Antargomukha |
| U+1CEA | Letter | null | null | ᳪ Sign Anusvara Bahirgomukha |
| U+1CEB | Letter | null | null | ᳫ Sign Anusvara Vamagomukha |
| U+1CEC | Letter | SYMBOL | null | ᳬ Sign Anusvara Vamagomukha With Tail |
| U+1CED | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
| U+1CEE | Letter | SYMBOL | null | ᳮ Sign Hexiform Long Anusvara |
| U+1CEF | Letter | null | null | ᳯ Sign Long Anusvara |
| U+1CF0 | Letter | null | null | ᳰ Sign Rthang Long Anusvara |
| U+1CF2 | Letter | CONSONANT_DEAD | null | ᳲ Sign Ardhavisarga |
| U+1CF3 | Letter | CONSONANT_DEAD | null | ᳳ Sign Rotated Ardhavisarga |
| U+1CF4 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
| U+1CF5 | Letter | CONSONANT_WITH_STACKER | null | ᳵ Sign Jihvamuliya |
| U+1CF6 | Letter | CONSONANT_WITH_STACKER | null | ᳶ Sign Upadhmaniya |
| U+1CF7 | Mark [Mc] | null | null | ᳷ Sign Atikrama |
| U+1CF8 | Mark [Mn] | CANTILLATION | null | ᳸ Tone Ring Above |
| U+1CF9 | Mark [Mn] | CANTILLATION | null | ᳹ Tone Double Ring Above |
| U+1CFA | Letter | PLACEHOLDER | null | ᳺ Sign Double Anusvara Antargomukha |
| U+1CFB | unassigned | | | |
| U+1CFC | unassigned | | | |
| U+1CFD | unassigned | | | |
| U+1CFE | unassigned | | | |
| U+1CFF | unassigned | | | |

Miscellaneous character table

In addition to general punctuation, runs of Malayalam text often use the danda (U+0964) and double danda (U+0965) punctuation marks from the Devanagari block. Malayalam text can also incorporate the udatta (U+0951) and anudatta (U+0952) signs from the Devanagari block.

Table 47 Additional punctuation character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+0951 | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
| U+0952 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
| U+0964 | Punctuation | null | null | । Danda |
| U+0965 | Punctuation | null | null | ॥ Double Danda |

Other important characters that may be encountered when shaping runs of Malayalam text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
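One way to cope with such placeholder usage is to treat every PLACEHOLDER or DOTTED_CIRCLE codepoint as an acceptable base for an isolated mark or matra. The sketch below assumes exactly that; the names MISC_CLASSES and can_carry_isolated_mark are hypothetical, and the class values are copied from the miscellaneous character table.

```python
# Shaping classes copied from the miscellaneous character table.
MISC_CLASSES = {
    0x00A0: "PLACEHOLDER",    # no-break space
    0x200C: "NON_JOINER",     # zero-width non-joiner
    0x200D: "JOINER",         # zero-width joiner
    0x2010: "PLACEHOLDER",    # hyphen
    0x2011: "PLACEHOLDER",    # no-break hyphen
    0x2012: "PLACEHOLDER",    # figure dash
    0x2013: "PLACEHOLDER",    # en dash
    0x2014: "PLACEHOLDER",    # em dash
    0x25CC: "DOTTED_CIRCLE",  # dotted circle
}


def can_carry_isolated_mark(codepoint):
    """True if the codepoint may stand in as the base of an isolated mark."""
    return MISC_CLASSES.get(codepoint) in ("PLACEHOLDER", "DOTTED_CIRCLE")
```

Under this assumption, a sequence such as en dash followed by a matra is handled the same way as dotted circle followed by a matra, rather than being rejected as a broken syllable.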

Table 48 Miscellaneous character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+00A0 | Separator | PLACEHOLDER | null |   No-break space |
| U+200C | Other | NON_JOINER | null | ‌ Zero-width non-joiner |
| U+200D | Other | JOINER | null | ‍ Zero-width joiner |
| U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
| U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
| U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
| U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
| U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
| U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a “Consonant,Halant,Consonant” sequence, where “Halant” denotes the virama (U+0D4D, shaping class VIRAMA). The sequence “Consonant,Halant,ZWJ,Consonant” blocks the formation of a conjunct between the two consonants.

Note, however, that the “Consonant,Halant” subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence “Consonant,Halant,ZWNJ,Consonant” should produce the first consonant in its standard form, followed by an explicit “Halant”.
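The three cases described above (no joiner, ZWJ, and ZWNJ) can be sketched as a small decision function. This is only an illustration under simplifying assumptions: it examines a single two-consonant cluster, treats U+0D15 through U+0D3A as the CONSONANT range from the Malayalam character table, and uses hypothetical names (cluster_behavior and its return labels) that do not come from any shaping engine.

```python
VIRAMA = "\u0D4D"  # the "Halant" of the sequences above
ZWNJ = "\u200C"
ZWJ = "\u200D"


def is_consonant(ch):
    # U+0D15..U+0D3A carry the CONSONANT shaping class in the table.
    return "\u0D15" <= ch <= "\u0D3A"


def cluster_behavior(text):
    """Classify a Consonant,Halant[,joiner],Consonant cluster."""
    if len(text) == 3:
        if is_consonant(text[0]) and text[1] == VIRAMA and is_consonant(text[2]):
            # No joiner: the font may form a conjunct.
            return "conjunct_allowed"
    if len(text) == 4:
        if is_consonant(text[0]) and text[1] == VIRAMA and is_consonant(text[3]):
            if text[2] == ZWJ:
                # Conjunct is blocked; a half-forms feature may still apply.
                return "conjunct_blocked_half_form_allowed"
            if text[2] == ZWNJ:
                # Standard first consonant followed by an explicit Halant.
                return "explicit_halant"
    return None
```

Whether a conjunct or half form actually appears still depends on the lookups the active font provides; the function only reports what the input sequence permits.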

A secondary usage of the zero-width joiner is to prevent the formation of “Reph”. An initial “Ra,Halant,ZWJ” sequence should not produce a “Reph”, where an initial “Ra,Halant” sequence without the zero-width joiner otherwise would.
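A minimal sketch of that check follows. It tests only the ZWJ condition stated above, not the complete set of conditions under which a reph is formed, and the function name initial_reph_allowed is a hypothetical choice for this example.

```python
RA = "\u0D30"
VIRAMA = "\u0D4D"  # the "Halant" of the sequences above
ZWJ = "\u200D"


def initial_reph_allowed(syllable):
    """True if the syllable starts with Ra,Halant not followed by ZWJ."""
    if not syllable.startswith(RA + VIRAMA):
        return False
    # An explicit ZWJ after the initial Ra,Halant suppresses the reph.
    return not syllable.startswith(RA + VIRAMA + ZWJ)
```

A shaper would typically run a test of this kind before deciding whether to tag the initial glyphs for a reph feature.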

The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match “NBSP,ZWJ,Halant,Consonant”, “NBSP,mark”, or “NBSP,matra”.
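The three NBSP patterns can be expressed as a regular expression over a sequence of class names. The sketch below encodes each class as one letter so that Python's re module can do the matching; the CODES mapping, the class names used as keys, and the function name is_nbsp_sequence are all assumptions made for this example.

```python
import re

# One-letter stand-ins for the classes named in the patterns above.
CODES = {
    "NBSP": "N",
    "ZWJ": "J",
    "HALANT": "H",
    "CONSONANT": "C",
    "MARK": "M",
    "MATRA": "V",
}

# "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra".
NBSP_SEQUENCE = re.compile(r"N(?:JHC|M|V)")


def is_nbsp_sequence(classes):
    """True if the list of class names is exactly one of the NBSP patterns."""
    encoded = "".join(CODES.get(name, "?") for name in classes)
    return NBSP_SEQUENCE.fullmatch(encoded) is not None
```

The use of fullmatch means each pattern is tested in isolation; a full syllable grammar would likely allow further trailing marks, which this sketch deliberately leaves out.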