Gujarati character tables

This document lists the per-character shaping information needed to shape Gujarati text.

Contents

Gujarati character table
Vedic Extensions character table
Miscellaneous character table

Gujarati character table

Gujarati glyphs should be classified as in the following table. Codepoints in the Gujarati block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this does include some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.

Table 18 Gujarati character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+0A80`	unassigned
`U+0A81`	Mark [Mn]	BINDU	TOP_POSITION	ઁ Candrabindu
`U+0A82`	Mark [Mn]	BINDU	TOP_POSITION	ં Anusvara
`U+0A83`	Mark [Mc]	VISARGA	RIGHT_POSITION	ઃ Visarga
`U+0A84`	unassigned
`U+0A85`	Letter	VOWEL_INDEPENDENT	null	અ A
`U+0A86`	Letter	VOWEL_INDEPENDENT	null	આ Aa
`U+0A87`	Letter	VOWEL_INDEPENDENT	null	ઇ I
`U+0A88`	Letter	VOWEL_INDEPENDENT	null	ઈ Ii
`U+0A89`	Letter	VOWEL_INDEPENDENT	null	ઉ U
`U+0A8A`	Letter	VOWEL_INDEPENDENT	null	ઊ Uu
`U+0A8B`	Letter	VOWEL_INDEPENDENT	null	ઋ Vocalic R
`U+0A8C`	Letter	VOWEL_INDEPENDENT	null	ઌ Vocalic L
`U+0A8D`	Letter	VOWEL_INDEPENDENT	null	ઍ Candra E
`U+0A8E`	unassigned
`U+0A8F`	Letter	VOWEL_INDEPENDENT	null	એ E

`U+0A90`	Letter	VOWEL_INDEPENDENT	null	ઐ Ai
`U+0A91`	Letter	VOWEL_INDEPENDENT	null	ઑ Candra O
`U+0A92`	unassigned
`U+0A93`	Letter	VOWEL_INDEPENDENT	null	ઓ O
`U+0A94`	Letter	VOWEL_INDEPENDENT	null	ઔ Au
`U+0A95`	Letter	CONSONANT	null	ક Ka
`U+0A96`	Letter	CONSONANT	null	ખ Kha
`U+0A97`	Letter	CONSONANT	null	ગ Ga
`U+0A98`	Letter	CONSONANT	null	ઘ Gha
`U+0A99`	Letter	CONSONANT	null	ઙ Nga
`U+0A9A`	Letter	CONSONANT	null	ચ Ca
`U+0A9B`	Letter	CONSONANT	null	છ Cha
`U+0A9C`	Letter	CONSONANT	null	જ Ja
`U+0A9D`	Letter	CONSONANT	null	ઝ Jha
`U+0A9E`	Letter	CONSONANT	null	ઞ Nya
`U+0A9F`	Letter	CONSONANT	null	ટ Tta

`U+0AA0`	Letter	CONSONANT	null	ઠ Ttha
`U+0AA1`	Letter	CONSONANT	null	ડ Dda
`U+0AA2`	Letter	CONSONANT	null	ઢ Ddha
`U+0AA3`	Letter	CONSONANT	null	ણ Nna
`U+0AA4`	Letter	CONSONANT	null	ત Ta
`U+0AA5`	Letter	CONSONANT	null	થ Tha
`U+0AA6`	Letter	CONSONANT	null	દ Da
`U+0AA7`	Letter	CONSONANT	null	ધ Dha
`U+0AA8`	Letter	CONSONANT	null	ન Na
`U+0AA9`	unassigned
`U+0AAA`	Letter	CONSONANT	null	પ Pa
`U+0AAB`	Letter	CONSONANT	null	ફ Pha
`U+0AAC`	Letter	CONSONANT	null	બ Ba
`U+0AAD`	Letter	CONSONANT	null	ભ Bha
`U+0AAE`	Letter	CONSONANT	null	મ Ma
`U+0AAF`	Letter	CONSONANT	null	ય Ya

`U+0AB0`	Letter	CONSONANT	null	ર Ra
`U+0AB1`	unassigned
`U+0AB2`	Letter	CONSONANT	null	લ La
`U+0AB3`	Letter	CONSONANT	null	ળ Lla
`U+0AB4`	unassigned
`U+0AB5`	Letter	CONSONANT	null	વ Va
`U+0AB6`	Letter	CONSONANT	null	શ Sha
`U+0AB7`	Letter	CONSONANT	null	ષ Ssa
`U+0AB8`	Letter	CONSONANT	null	સ Sa
`U+0AB9`	Letter	CONSONANT	null	હ Ha
`U+0ABA`	unassigned
`U+0ABB`	unassigned
`U+0ABC`	Mark [Mn]	NUKTA	BOTTOM_POSITION	઼ Nukta
`U+0ABD`	Letter	AVAGRAHA	null	ઽ Avagraha
`U+0ABE`	Mark [Mc]	VOWEL_DEPENDENT	RIGHT_POSITION	ા Sign Aa
`U+0ABF`	Mark [Mc]	VOWEL_DEPENDENT	LEFT_POSITION	િ Sign I

`U+0AC0`	Mark [Mc]	VOWEL_DEPENDENT	RIGHT_POSITION	ી Sign Ii
`U+0AC1`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ુ Sign U
`U+0AC2`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ૂ Sign Uu
`U+0AC3`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ૃ Sign Vocalic R
`U+0AC4`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ૄ Sign Vocalic Rr
`U+0AC5`	Mark [Mn]	VOWEL_DEPENDENT	TOP_POSITION	ૅ Sign Candra E
`U+0AC6`	unassigned
`U+0AC7`	Mark [Mn]	VOWEL_DEPENDENT	TOP_POSITION	ે Sign E
`U+0AC8`	Mark [Mn]	VOWEL_DEPENDENT	TOP_POSITION	ૈ Sign Ai
`U+0AC9`	Mark [Mc]	VOWEL_DEPENDENT	TOP_AND_RIGHT_POSITION	ૉ Sign Candra O
`U+0ACA`	unassigned
`U+0ACB`	Mark [Mc]	VOWEL_DEPENDENT	RIGHT_POSITION	ો Sign O
`U+0ACC`	Mark [Mc]	VOWEL_DEPENDENT	RIGHT_POSITION	ૌ Sign Au
`U+0ACD`	Mark [Mn]	VIRAMA	BOTTOM_POSITION	્ Virama
`U+0ACE`	unassigned
`U+0ACF`	unassigned

`U+0AD0`	Letter	null	null	ૐ Om
`U+0AD1`	unassigned
`U+0AD2`	unassigned
`U+0AD3`	unassigned
`U+0AD4`	unassigned
`U+0AD5`	unassigned
`U+0AD6`	unassigned
`U+0AD7`	unassigned
`U+0AD8`	unassigned
`U+0AD9`	unassigned
`U+0ADA`	unassigned
`U+0ADB`	unassigned
`U+0ADC`	unassigned
`U+0ADD`	unassigned
`U+0ADE`	unassigned
`U+0ADF`	unassigned

`U+0AE0`	Letter	VOWEL_INDEPENDENT	null	ૠ Vocalic Rr
`U+0AE1`	Letter	VOWEL_INDEPENDENT	null	ૡ Vocalic Ll
`U+0AE2`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ૢ Sign Vocalic L
`U+0AE3`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ૣ Sign Vocalic Ll
`U+0AE4`	unassigned
`U+0AE5`	unassigned
`U+0AE6`	Number	NUMBER	null	૦ Digit Zero
`U+0AE7`	Number	NUMBER	null	૧ Digit One
`U+0AE8`	Number	NUMBER	null	૨ Digit Two
`U+0AE9`	Number	NUMBER	null	૩ Digit Three
`U+0AEA`	Number	NUMBER	null	૪ Digit Four
`U+0AEB`	Number	NUMBER	null	૫ Digit Five
`U+0AEC`	Number	NUMBER	null	૬ Digit Six
`U+0AED`	Number	NUMBER	null	૭ Digit Seven
`U+0AEE`	Number	NUMBER	null	૮ Digit Eight
`U+0AEF`	Number	NUMBER	null	૯ Digit Nine

`U+0AF0`	Symbol	SYMBOL	null	૰ Abbreviation
`U+0AF1`	Symbol	SYMBOL	null	૱ Rupee Sign
`U+0AF2`	unassigned
`U+0AF3`	unassigned
`U+0AF4`	unassigned
`U+0AF5`	unassigned
`U+0AF6`	unassigned
`U+0AF7`	unassigned
`U+0AF8`	unassigned
`U+0AF9`	Letter	CONSONANT	null	ૹ Zha
`U+0AFA`	Mark [Mn]	CANTILLATION	TOP_POSITION	ૺ Sukun
`U+0AFB`	Mark [Mn]	CANTILLATION	TOP_POSITION	ૻ Shadda
`U+0AFC`	Mark [Mn]	CANTILLATION	TOP_POSITION	ૼ Maddah
`U+0AFD`	Mark [Mn]	NUKTA	TOP_POSITION	૽ Three-Dot Nukta Above
`U+0AFE`	Mark [Mn]	NUKTA	TOP_POSITION	૾ Circle Nukta Above
`U+0AFF`	Mark [Mn]	NUKTA	TOP_POSITION	૿ Two-Circle Nukta Above
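
The rows above lend themselves to a codepoint-keyed lookup. The following Python sketch is illustrative only: the names `CharInfo`, `GUJARATI_SAMPLE`, and `classify` are hypothetical, only a handful of rows are copied from the table, and `None` stands in for the table's null.

```python
from typing import NamedTuple, Optional


class CharInfo(NamedTuple):
    shaping_class: Optional[str]    # None stands for "null" in the table
    mark_placement: Optional[str]   # None stands for "null" in the table


# A few rows copied from the Gujarati table; a real table would cover
# every assigned codepoint in the block.
GUJARATI_SAMPLE = {
    0x0A81: CharInfo("BINDU", "TOP_POSITION"),
    0x0A83: CharInfo("VISARGA", "RIGHT_POSITION"),
    0x0A85: CharInfo("VOWEL_INDEPENDENT", None),
    0x0A95: CharInfo("CONSONANT", None),
    0x0ABC: CharInfo("NUKTA", "BOTTOM_POSITION"),
    0x0ABF: CharInfo("VOWEL_DEPENDENT", "LEFT_POSITION"),
    0x0ACD: CharInfo("VIRAMA", "BOTTOM_POSITION"),
    0x0AD0: CharInfo(None, None),   # Om: assigned, but no special behavior
    0x0AE6: CharInfo("NUMBER", None),
}


def classify(ch: str) -> Optional[CharInfo]:
    """Return the sample row for ch, or None if the sample has no row for it."""
    return GUJARATI_SAMPLE.get(ord(ch))
```

In this sketch a missing row and a null Shaping class are kept distinct: `classify` returns `None` for a codepoint with no row (for example, the unassigned U+0A80), while an assigned codepoint such as U+0AD0 returns a row whose fields are both `None`.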

Vedic Extensions character table

Sanskrit runs written in the Gujarati script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

Table 19 Vedic Extensions character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+1CD0`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳐ Tone Karshana
`U+1CD1`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳑ Tone Shara
`U+1CD2`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳒ Tone Prenkha
`U+1CD3`	Punctuation	null	null	᳓ Sign Nihshvasa
`U+1CD4`	Mark [Mn]	CANTILLATION	OVERSTRUCK	᳔ Tone Midline Svarita
`U+1CD5`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳕ Tone Aggravated Independent Svarita
`U+1CD6`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳖ Tone Independent Svarita
`U+1CD7`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳗ Tone Kathaka Independent Svarita
`U+1CD8`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳘ Tone Candra Below
`U+1CD9`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳙ Tone Kathaka Independent Svarita Schroeder
`U+1CDA`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳚ Tone Double Svarita
`U+1CDB`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳛ Tone Triple Svarita
`U+1CDC`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳜ Tone Kathaka Anudatta
`U+1CDD`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳝ Tone Dot Below
`U+1CDE`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳞ Tone Two Dots Below
`U+1CDF`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳟ Tone Three Dots Below

`U+1CE0`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳠ Tone Rigvedic Kashmiri Independent Svarita
`U+1CE1`	Mark [Mc]	CANTILLATION	RIGHT_POSITION	᳡ Tone Atharvavedic Independent Svarita
`U+1CE2`	Mark [Mn]	AVAGRAHA	OVERSTRUCK	᳢ Sign Visarga Svarita
`U+1CE3`	Mark [Mn]	null	OVERSTRUCK	᳣ Sign Visarga Udatta
`U+1CE4`	Mark [Mn]	null	OVERSTRUCK	᳤ Sign Reversed Visarga Udatta
`U+1CE5`	Mark [Mn]	null	OVERSTRUCK	᳥ Sign Visarga Anudatta
`U+1CE6`	Mark [Mn]	null	OVERSTRUCK	᳦ Sign Reversed Visarga Anudatta
`U+1CE7`	Mark [Mn]	null	OVERSTRUCK	᳧ Sign Visarga Udatta With Tail
`U+1CE8`	Mark [Mn]	AVAGRAHA	OVERSTRUCK	᳨ Sign Visarga Anudatta With Tail
`U+1CE9`	Letter	SYMBOL	null	ᳩ Sign Anusvara Antargomukha
`U+1CEA`	Letter	null	null	ᳪ Sign Anusvara Bahirgomukha
`U+1CEB`	Letter	null	null	ᳫ Sign Anusvara Vamagomukha
`U+1CEC`	Letter	SYMBOL	null	ᳬ Sign Anusvara Vamagomukha With Tail
`U+1CED`	Mark [Mn]	AVAGRAHA	BOTTOM_POSITION	᳭ Sign Tiryak
`U+1CEE`	Letter	SYMBOL	null	ᳮ Sign Hexiform Long Anusvara
`U+1CEF`	Letter	null	null	ᳯ Sign Long Anusvara

`U+1CF0`	Letter	null	null	ᳰ Sign Rthang Long Anusvara
`U+1CF2`	Letter	CONSONANT_DEAD	null	ᳲ Sign Ardhavisarga
`U+1CF3`	Letter	CONSONANT_DEAD	null	ᳳ Sign Rotated Ardhavisarga
`U+1CF4`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳴ Tone Candra Above
`U+1CF5`	Letter	CONSONANT_WITH_STACKER	null	ᳵ Sign Jihvamuliya
`U+1CF6`	Letter	CONSONANT_WITH_STACKER	null	ᳶ Sign Upadhmaniya
`U+1CF7`	Mark [Mc]	null	null	᳷ Sign Atikrama
`U+1CF8`	Mark [Mn]	CANTILLATION	null	᳸ Tone Ring Above
`U+1CF9`	Mark [Mn]	CANTILLATION	null	᳹ Tone Double Ring Above
`U+1CFA`	Letter	PLACEHOLDER	null	ᳺ Sign Double Anusvara Antargomukha
`U+1CFB`	unassigned
`U+1CFC`	unassigned
`U+1CFD`	unassigned
`U+1CFE`	unassigned
`U+1CFF`	unassigned

Miscellaneous character table

In addition to general punctuation, runs of Gujarati text often use the danda (U+0964) and double danda (U+0965) punctuation marks from the Devanagari block. Gujarati text can also incorporate the udatta (U+0951) and anudatta (U+0952) signs from the Devanagari block.

Table 20 Additional punctuation character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+0951`	Mark [Mn]	CANTILLATION	TOP_POSITION	॑ Udatta
`U+0952`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	॒ Anudatta
`U+0964`	Punctuation	null	null	। Danda
`U+0965`	Punctuation	null	null	॥ Double Danda

Other important characters that may be encountered when shaping runs of Gujarati text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Table 21 Miscellaneous character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+00A0`	Separator	PLACEHOLDER	null	No-break space
`U+200C`	Other	NON_JOINER	null	‌ Zero-width non-joiner
`U+200D`	Other	JOINER	null	‍ Zero-width joiner
`U+2010`	Punctuation	PLACEHOLDER	null	‐ Hyphen
`U+2011`	Punctuation	PLACEHOLDER	null	‑ No-break hyphen
`U+2012`	Punctuation	PLACEHOLDER	null	‒ Figure dash
`U+2013`	Punctuation	PLACEHOLDER	null	– En dash
`U+2014`	Punctuation	PLACEHOLDER	null	— Em dash
`U+25CC`	Symbol	DOTTED_CIRCLE	null	◌ Dotted circle
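
As a rough illustration of how the placeholder-style rows might be consulted, the Python sketch below copies the Shaping classes from the table above into a dictionary and tests whether a character can stand in for a base. The names `MISC_SHAPING_CLASS` and `can_stand_in_for_base` are hypothetical, and treating the PLACEHOLDER and DOTTED_CIRCLE classes alike is a simplifying assumption.

```python
# Shaping classes copied from the miscellaneous character table.
MISC_SHAPING_CLASS = {
    0x00A0: "PLACEHOLDER",    # No-break space
    0x200C: "NON_JOINER",     # Zero-width non-joiner
    0x200D: "JOINER",         # Zero-width joiner
    0x2010: "PLACEHOLDER",    # Hyphen
    0x2011: "PLACEHOLDER",    # No-break hyphen
    0x2012: "PLACEHOLDER",    # Figure dash
    0x2013: "PLACEHOLDER",    # En dash
    0x2014: "PLACEHOLDER",    # Em dash
    0x25CC: "DOTTED_CIRCLE",  # Dotted circle
}


def can_stand_in_for_base(ch: str) -> bool:
    """True if ch is listed as PLACEHOLDER or DOTTED_CIRCLE in the table."""
    return MISC_SHAPING_CLASS.get(ord(ch)) in {"PLACEHOLDER", "DOTTED_CIRCLE"}
```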

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a “Consonant,Halant,Consonant” sequence. The sequence “Consonant,Halant,ZWJ,Consonant” blocks the formation of a conjunct between the two consonants.

Note, however, that the “Consonant,Halant” subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence “Consonant,Halant,ZWNJ,Consonant” should produce the first consonant in its standard form, followed by an explicit “Halant”.

A secondary usage of the zero-width joiner is to prevent the formation of “Reph”. An initial “Ra,Halant,ZWJ” sequence should not produce a “Reph”, where an initial “Ra,Halant” sequence without the zero-width joiner otherwise would.
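
The joiner rules described above can be summarized in a small Python sketch. It only reports the behavior the text calls for and does not perform any shaping; the function names and the returned strings are hypothetical, and the consonant test is a simplification derived from the CONSONANT rows of the Gujarati table.

```python
ZWNJ = "\u200C"
ZWJ = "\u200D"
HALANT = "\u0ACD"   # Gujarati virama
RA = "\u0AB0"

# Codepoints inside the consonant range that the table lists as unassigned.
_UNASSIGNED = {0x0AA9, 0x0AB1, 0x0AB4}


def is_consonant(ch: str) -> bool:
    """True for codepoints the Gujarati table classifies as CONSONANT."""
    cp = ord(ch)
    if cp == 0x0AF9:  # Zha
        return True
    return 0x0A95 <= cp <= 0x0AB9 and cp not in _UNASSIGNED


def describe_cluster(text: str) -> str:
    """Describe the expected treatment of C,H,C with an optional joiner."""
    if (len(text) == 3 and is_consonant(text[0])
            and text[1] == HALANT and is_consonant(text[2])):
        return "conjunct allowed"
    if (len(text) == 4 and is_consonant(text[0])
            and text[1] == HALANT and is_consonant(text[3])):
        if text[2] == ZWJ:
            return "conjunct blocked, half form allowed"
        if text[2] == ZWNJ:
            return "conjunct and half form blocked, explicit halant"
    return "outside this sketch"


def reph_blocked(text: str) -> bool:
    """True if the run starts with Ra,Halant,ZWJ, which should not form Reph."""
    return text.startswith(RA + HALANT + ZWJ)
```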

The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match “NBSP,ZWJ,Halant,Consonant”, “NBSP,mark”, or “NBSP,matra”.
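
A minimal Python sketch of matching the three NBSP patterns named above follows. The tagging function is a deliberate simplification: it folds the BINDU, VISARGA, and NUKTA rows of the Gujarati table into a single "mark" tag and the VOWEL_DEPENDENT rows into "matra". All names here are hypothetical.

```python
import re

NBSP = "\u00A0"
ZWJ = "\u200D"
HALANT = "\u0ACD"   # Gujarati virama

# Unassigned codepoints that fall inside the ranges tested below.
_GAPS = {0x0AA9, 0x0AB1, 0x0AB4, 0x0AC6, 0x0ACA}


def tag(ch: str) -> str:
    """Map one character to a coarse tag used by the pattern below."""
    cp = ord(ch)
    if ch == NBSP:
        return "NBSP"
    if ch == ZWJ:
        return "ZWJ"
    if ch == HALANT:
        return "Halant"
    if cp in _GAPS:
        return "other"
    if 0x0A95 <= cp <= 0x0AB9 or cp == 0x0AF9:
        return "Consonant"
    if 0x0ABE <= cp <= 0x0ACC or cp in (0x0AE2, 0x0AE3):
        return "matra"
    if cp in (0x0A81, 0x0A82, 0x0A83, 0x0ABC):
        return "mark"
    return "other"


_ISOLATED = re.compile(r"NBSP,ZWJ,Halant,Consonant|NBSP,mark|NBSP,matra")


def is_isolated_display_sequence(text: str) -> bool:
    """True if text matches one of the three NBSP sequences."""
    tags = ",".join(tag(ch) for ch in text)
    return _ISOLATED.fullmatch(tags) is not None
```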
