# Gujarati character tables #

This document lists the per-character shaping information needed to
[shape Gujarati text](../opentype-shaping-gujarati.md).

**Contents**

  - [Gujarati character table](#gujarati-character-table)
  - [Vedic Extensions character table](#vedic-extensions-character-table)
  - [Miscellaneous character table](#miscellaneous-character-table)


## Gujarati character table ##

Gujarati glyphs should be classified as in the following
table. Codepoints in the Gujarati block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.

Assigned codepoints with a _null_ in the _Shaping class_ column evoke
no special behavior from the shaping engine. Note that this does
include some valid codepoints, such as currency marks, punctuation,
and other symbols.

> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.

The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.

Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
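As a rough illustration of the classification described above, the following Python sketch stores a handful of rows from the Gujarati table in a lookup keyed by codepoint. The names `Entry`, `build_partial_table`, and `classify` are hypothetical and chosen only for this example; the table is deliberately partial, and `None` stands in for both _null_ values and codepoints that this subset does not cover.

```python
from typing import Dict, NamedTuple, Optional


class Entry(NamedTuple):
    shaping_class: Optional[str]   # None stands for _null_
    mark_placement: Optional[str]  # None stands for _null_


def build_partial_table() -> Dict[int, Entry]:
    """Build a lookup for a subset of the rows in the Gujarati table."""
    table: Dict[int, Entry] = {}

    # U+0A95..U+0AB9 are CONSONANT, except for the unassigned codepoints.
    unassigned = {0x0AA9, 0x0AB1, 0x0AB4}
    for cp in range(0x0A95, 0x0ABA):
        if cp not in unassigned:
            table[cp] = Entry("CONSONANT", None)

    # U+0AE6..U+0AEF are the digits.
    for cp in range(0x0AE6, 0x0AF0):
        table[cp] = Entry("NUMBER", None)

    # A few individual rows, copied from the table.
    table.update({
        0x0A81: Entry("BINDU", "TOP_POSITION"),
        0x0A83: Entry("VISARGA", "RIGHT_POSITION"),
        0x0ABC: Entry("NUKTA", "BOTTOM_POSITION"),
        0x0ABD: Entry("AVAGRAHA", None),   # Unicode category is Letter
        0x0ABF: Entry("VOWEL_DEPENDENT", "LEFT_POSITION"),
        0x0ACD: Entry("VIRAMA", "BOTTOM_POSITION"),
        0x0AD0: Entry(None, None),         # Om: assigned, no shaping class
    })
    return table


GUJARATI_TABLE = build_partial_table()


def classify(ch: str) -> Optional[Entry]:
    """Return the entry for a character, or None if this subset omits it."""
    return GUJARATI_TABLE.get(ord(ch))
```

A caller would consult the shaping class first and fall back to the Unicode category only when no entry exists, which mirrors the precedence rule stated above.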

:::{table} Gujarati character table

| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0A80`   | _unassigned_     |                   |                            |                              |
|`U+0A81`   | Mark [Mn]        | BINDU             | TOP_POSITION               | ઁ Candrabindu                |
|`U+0A82`   | Mark [Mn]        | BINDU             | TOP_POSITION               | ં Anusvara                   |
|`U+0A83`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | ઃ Visarga                    |
|`U+0A84`   | _unassigned_     |                   |                            |                              |
|`U+0A85`   | Letter           | VOWEL_INDEPENDENT | _null_                     | અ A                          |
|`U+0A86`   | Letter           | VOWEL_INDEPENDENT | _null_                     | આ Aa                         |
|`U+0A87`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઇ I                          |
|`U+0A88`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઈ Ii                         |
|`U+0A89`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઉ U                          |
|`U+0A8A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઊ Uu                         |
|`U+0A8B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઋ Vocalic R                  |
|`U+0A8C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઌ Vocalic L                  |
|`U+0A8D`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઍ Candra E                   |
|`U+0A8E`   | _unassigned_     |                   |                            |                              |
|`U+0A8F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | એ E                          |
|           |                  |                   |                            |                              |
|`U+0A90`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઐ Ai                         |
|`U+0A91`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઑ Candra O                   |
|`U+0A92`   | _unassigned_     |                   |                            |                              |
|`U+0A93`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઓ O                          |
|`U+0A94`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ઔ Au                         |
|`U+0A95`   | Letter           | CONSONANT         | _null_                     | ક Ka                         |
|`U+0A96`   | Letter           | CONSONANT         | _null_                     | ખ Kha                        |
|`U+0A97`   | Letter           | CONSONANT         | _null_                     | ગ Ga                         |
|`U+0A98`   | Letter           | CONSONANT         | _null_                     | ઘ Gha                        |
|`U+0A99`   | Letter           | CONSONANT         | _null_                     | ઙ Nga                        |
|`U+0A9A`   | Letter           | CONSONANT         | _null_                     | ચ Ca                         |
|`U+0A9B`   | Letter           | CONSONANT         | _null_                     | છ Cha                        |
|`U+0A9C`   | Letter           | CONSONANT         | _null_                     | જ Ja                         |
|`U+0A9D`   | Letter           | CONSONANT         | _null_                     | ઝ Jha                        |
|`U+0A9E`   | Letter           | CONSONANT         | _null_                     | ઞ Nya                        |
|`U+0A9F`   | Letter           | CONSONANT         | _null_                     | ટ Tta                        |
|           |                  |                   |                            |                              |
|`U+0AA0`   | Letter           | CONSONANT         | _null_                     | ઠ Ttha                       |
|`U+0AA1`   | Letter           | CONSONANT         | _null_                     | ડ Dda                        |
|`U+0AA2`   | Letter           | CONSONANT         | _null_                     | ઢ Ddha                       |
|`U+0AA3`   | Letter           | CONSONANT         | _null_                     | ણ Nna                        |
|`U+0AA4`   | Letter           | CONSONANT         | _null_                     | ત Ta                         |
|`U+0AA5`   | Letter           | CONSONANT         | _null_                     | થ Tha                        |
|`U+0AA6`   | Letter           | CONSONANT         | _null_                     | દ Da                         |
|`U+0AA7`   | Letter           | CONSONANT         | _null_                     | ધ Dha                        |
|`U+0AA8`   | Letter           | CONSONANT         | _null_                     | ન Na                         |
|`U+0AA9`   | _unassigned_     |                   |                            |                              |
|`U+0AAA`   | Letter           | CONSONANT         | _null_                     | પ Pa                         |
|`U+0AAB`   | Letter           | CONSONANT         | _null_                     | ફ Pha                        |
|`U+0AAC`   | Letter           | CONSONANT         | _null_                     | બ Ba                         |
|`U+0AAD`   | Letter           | CONSONANT         | _null_                     | ભ Bha                        |
|`U+0AAE`   | Letter           | CONSONANT         | _null_                     | મ Ma                         |
|`U+0AAF`   | Letter           | CONSONANT         | _null_                     | ય Ya                         |
|           |                  |                   |                            |                              |
|`U+0AB0`   | Letter           | CONSONANT         | _null_                     | ર Ra                         |
|`U+0AB1`   | _unassigned_     |                   |                            |                              |
|`U+0AB2`   | Letter           | CONSONANT         | _null_                     | લ La                         |
|`U+0AB3`   | Letter           | CONSONANT         | _null_                     | ળ Lla                        |
|`U+0AB4`   | _unassigned_     |                   |                            |                              |
|`U+0AB5`   | Letter           | CONSONANT         | _null_                     | વ Va                         |
|`U+0AB6`   | Letter           | CONSONANT         | _null_                     | શ Sha                        |
|`U+0AB7`   | Letter           | CONSONANT         | _null_                     | ષ Ssa                        |
|`U+0AB8`   | Letter           | CONSONANT         | _null_                     | સ Sa                         |
|`U+0AB9`   | Letter           | CONSONANT         | _null_                     | હ Ha                         |
|`U+0ABA`   | _unassigned_     |                   |                            |                              |
|`U+0ABB`   | _unassigned_     |                   |                            |                              |
|`U+0ABC`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | ઼ Nukta                      |
|`U+0ABD`   | Letter           | AVAGRAHA          | _null_                     | ઽ Avagraha                   |
|`U+0ABE`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | ા Sign Aa                    |
|`U+0ABF`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | િ Sign I                     |
|           |                  |                   |                            |                              |
|`U+0AC0`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | ી Sign Ii                    |
|`U+0AC1`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ુ Sign U                     |
|`U+0AC2`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ૂ Sign Uu                    |
|`U+0AC3`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ૃ Sign Vocalic R             |
|`U+0AC4`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ૄ Sign Vocalic Rr            |
|`U+0AC5`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | ૅ Sign Candra E              |
|`U+0AC6`   | _unassigned_     |                   |                            |                              |
|`U+0AC7`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | ે Sign E                     |
|`U+0AC8`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | ૈ Sign Ai                    |
|`U+0AC9`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | ૉ Sign Candra O              |
|`U+0ACA`   | _unassigned_     |                   |                            |                              |
|`U+0ACB`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | ો Sign O                     |
|`U+0ACC`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | ૌ Sign Au                    |
|`U+0ACD`   | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION            | ્ Virama                     |
|`U+0ACE`   | _unassigned_     |                   |                            |                              |
|`U+0ACF`   | _unassigned_     |                   |                            |                              |
|           |                  |                   |                            |                              |
|`U+0AD0`   | Letter           | _null_            | _null_                     | ૐ Om                         |
|`U+0AD1`   | _unassigned_     |                   |                            |                              |
|`U+0AD2`   | _unassigned_     |                   |                            |                              |
|`U+0AD3`   | _unassigned_     |                   |                            |                              |
|`U+0AD4`   | _unassigned_     |                   |                            |                              |
|`U+0AD5`   | _unassigned_     |                   |                            |                              |
|`U+0AD6`   | _unassigned_     |                   |                            |                              |
|`U+0AD7`   | _unassigned_     |                   |                            |                              |
|`U+0AD8`   | _unassigned_     |                   |                            |                              |
|`U+0AD9`   | _unassigned_     |                   |                            |                              |
|`U+0ADA`   | _unassigned_     |                   |                            |                              |
|`U+0ADB`   | _unassigned_     |                   |                            |                              |
|`U+0ADC`   | _unassigned_     |                   |                            |                              |
|`U+0ADD`   | _unassigned_     |                   |                            |                              |
|`U+0ADE`   | _unassigned_     |                   |                            |                              |
|`U+0ADF`   | _unassigned_     |                   |                            |                              |
|           |                  |                   |                            |                              |
|`U+0AE0`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ૠ Vocalic Rr                 |
|`U+0AE1`   | Letter           | VOWEL_INDEPENDENT | _null_                     | ૡ Vocalic Ll                 |
|`U+0AE2`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ૢ Sign Vocalic L             |
|`U+0AE3`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | ૣ Sign Vocalic Ll            |
|`U+0AE4`   | _unassigned_     |                   |                            |                              |
|`U+0AE5`   | _unassigned_     |                   |                            |                              |
|`U+0AE6`   | Number           | NUMBER            | _null_                     | ૦ Digit Zero                 |
|`U+0AE7`   | Number           | NUMBER            | _null_                     | ૧ Digit One                  |
|`U+0AE8`   | Number           | NUMBER            | _null_                     | ૨ Digit Two                  |
|`U+0AE9`   | Number           | NUMBER            | _null_                     | ૩ Digit Three                |
|`U+0AEA`   | Number           | NUMBER            | _null_                     | ૪ Digit Four                 |
|`U+0AEB`   | Number           | NUMBER            | _null_                     | ૫ Digit Five                 |
|`U+0AEC`   | Number           | NUMBER            | _null_                     | ૬ Digit Six                  |
|`U+0AED`   | Number           | NUMBER            | _null_                     | ૭ Digit Seven                |
|`U+0AEE`   | Number           | NUMBER            | _null_                     | ૮ Digit Eight                |
|`U+0AEF`   | Number           | NUMBER            | _null_                     | ૯ Digit Nine                 |
|           |                  |                   |                            |                              |
|`U+0AF0`   | Symbol           | SYMBOL            | _null_                     | ૰ Abbreviation               |
|`U+0AF1`   | Symbol           | SYMBOL            | _null_                     | ૱ Rupee Sign                 |
|`U+0AF2`   | _unassigned_     |                   |                            |                              |
|`U+0AF3`   | _unassigned_     |                   |                            |                              |
|`U+0AF4`   | _unassigned_     |                   |                            |                              |
|`U+0AF5`   | _unassigned_     |                   |                            |                              |
|`U+0AF6`   | _unassigned_     |                   |                            |                              |
|`U+0AF7`   | _unassigned_     |                   |                            |                              |
|`U+0AF8`   | _unassigned_     |                   |                            |                              |
|`U+0AF9`   | Letter           | CONSONANT         | _null_                     | ૹ Zha                        |
|`U+0AFA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | ૺ Sukun                      |
|`U+0AFB`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | ૻ Shadda                     |
|`U+0AFC`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | ૼ Maddah                     |
|`U+0AFD`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | ૽ Three-Dot Nukta Above      |
|`U+0AFE`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | ૾ Circle Nukta Above         |
|`U+0AFF`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | ૿ Two-Circle Nukta Above     |

:::


## Vedic Extensions character table ##

Sanskrit runs written in the Gujarati script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.

> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.

:::{table} Vedic Extensions character table

| Codepoint | Unicode category | Shaping class          | Mark-placement subclass    | Glyph                        |
|:----------|:-----------------|:-----------------------|:---------------------------|:-----------------------------|
|`U+1CD0`   | Mark [Mn]        | CANTILLATION           | TOP_POSITION               | ᳐ Tone Karshana              |
|`U+1CD1`   | Mark [Mn]        | CANTILLATION           | TOP_POSITION               | ᳑ Tone Shara                 |
|`U+1CD2`   | Mark [Mn]        | CANTILLATION           | TOP_POSITION               | ᳒ Tone Prenkha               |
|`U+1CD3`   | Punctuation      | _null_                 | _null_                     | ᳓ Sign Nihshvasa             |
|`U+1CD4`   | Mark [Mn]        | CANTILLATION           | OVERSTRUCK                 | ᳔ Tone Midline Svarita       |
|`U+1CD5`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳖ Tone Independent Svarita   |
|`U+1CD7`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳘ Tone Candra Below          |
|`U+1CD9`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA`   | Mark [Mn]        | CANTILLATION           | TOP_POSITION               | ᳚ Tone Double Svarita        |
|`U+1CDB`   | Mark [Mn]        | CANTILLATION           | TOP_POSITION               | ᳛ Tone Triple Svarita        |
|`U+1CDC`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳜ Tone Kathaka Anudatta      |
|`U+1CDD`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳝ Tone Dot Below             |
|`U+1CDE`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳞ Tone Two Dots Below        |
|`U+1CDF`   | Mark [Mn]        | CANTILLATION           | BOTTOM_POSITION            | ᳟ Tone Three Dots Below      |
|           |                  |                        |                            |                              |
|`U+1CE0`   | Mark [Mn]        | CANTILLATION           | TOP_POSITION               | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1`   | Mark [Mc]        | CANTILLATION           | RIGHT_POSITION             | ᳡ Tone Atharvavedic Independent Svarita |
|`U+1CE2`   | Mark [Mn]        | AVAGRAHA               | OVERSTRUCK                 | ᳢ Sign Visarga Svarita       |
|`U+1CE3`   | Mark [Mn]        | _null_                 | OVERSTRUCK                 | ᳣ Sign Visarga Udatta        |
|`U+1CE4`   | Mark [Mn]        | _null_                 | OVERSTRUCK                 | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5`   | Mark [Mn]        | _null_                 | OVERSTRUCK                 | ᳥ Sign Visarga Anudatta      |
|`U+1CE6`   | Mark [Mn]        | _null_                 | OVERSTRUCK                 | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7`   | Mark [Mn]        | _null_                 | OVERSTRUCK                 | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8`   | Mark [Mn]        | AVAGRAHA               | OVERSTRUCK                 | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9`   | Letter           | SYMBOL                 | _null_                     | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA`   | Letter           | _null_                 | _null_                     | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB`   | Letter           | _null_                 | _null_                     | ᳫ Sign Anusvara Vamagomukha  |
|`U+1CEC`   | Letter           | SYMBOL                 | _null_                     | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED`   | Mark [Mn]        | AVAGRAHA               | BOTTOM_POSITION            | ᳭ Sign Tiryak                |
|`U+1CEE`   | Letter           | SYMBOL                 | _null_                     | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF`   | Letter           | _null_                 | _null_                     | ᳯ Sign Long Anusvara         |
|           |                  |                        |                            |                              |
|`U+1CF0`   | Letter           | _null_                 | _null_                     | ᳰ Sign Rthang Long Anusvara  |
|`U+1CF2`   | Letter           | CONSONANT_DEAD         | _null_                     | ᳲ Sign Ardhavisarga          |
|`U+1CF3`   | Letter           | CONSONANT_DEAD         | _null_                     | ᳳ Sign Rotated Ardhavisarga  |
|`U+1CF4`   | Mark [Mn]        | CANTILLATION           | TOP_POSITION               | ᳴ Tone Candra Above          |
|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                     | ᳵ Sign Jihvamuliya           |
|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                     | ᳶ Sign Upadhmaniya           |
|`U+1CF7`   | Mark [Mc]        | _null_                 | _null_                     | ᳷ Sign Atikrama              |
|`U+1CF8`   | Mark [Mn]        | CANTILLATION           | _null_                     | ᳸ Tone Ring Above            |
|`U+1CF9`   | Mark [Mn]        | CANTILLATION           | _null_                     | ᳹ Tone Double Ring Above     |
|`U+1CFA`   | Letter           | PLACEHOLDER            | _null_                     | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB`   | _unassigned_     |                        |                            |                              |
|`U+1CFC`   | _unassigned_     |                        |                            |                              |
|`U+1CFD`   | _unassigned_     |                        |                            |                              |
|`U+1CFE`   | _unassigned_     |                        |                            |                              |
|`U+1CFF`   | _unassigned_     |                        |                            |                              |

:::


## Miscellaneous character table ##

In addition to general punctuation, runs of Gujarati text often use
the danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block.
Gujarati text can also incorporate the udatta (`U+0951`) and anudatta
(`U+0952`) signs from the Devanagari block.

:::{table} Additional punctuation character table

| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | ॑ Udatta                       |
|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | ॒ Anudatta                     |
|`U+0964`   | Punctuation      | _null_            | _null_                     | । Danda                        |
|`U+0965`   | Punctuation      | _null_            | _null_                     | ॥ Double Danda                 |

:::

Other important characters that may be encountered when shaping runs
of Gujarati text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).

The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.

:::{table} Miscellaneous character table

| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     |   No-break space              |
|`U+200C`   | Other            | NON_JOINER        | _null_                     | ‌ Zero-width non-joiner        |
|`U+200D`   | Other            | JOINER            | _null_                     | ‍ Zero-width joiner            |
|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | ‐ Hyphen                       |
|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | ‑ No-break hyphen              |
|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | ‒ Figure dash                  |
|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | – En dash                      |
|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | — Em dash                      |
|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | ◌ Dotted circle                |

:::

The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The sequence
"_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of a
conjunct between the two consonants.

Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the first
consonant in its standard form, followed by an explicit "Halant".

A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
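
The joiner rules above can be summarized in a small Python sketch. It is only an illustration under stated assumptions: `JoinerEffect` and `joiner_effect` are hypothetical names, the function inspects just the first three characters of the input, and treating ZWNJ as also suppressing "Reph" is an inference from the statement that the first consonant keeps its standard form. Whether a conjunct, half form, or "Reph" actually appears still depends on the rest of the syllable and on the font.

```python
from dataclasses import dataclass

HALANT = "\u0ACD"  # Gujarati Virama
ZWNJ = "\u200C"    # zero-width non-joiner
ZWJ = "\u200D"     # zero-width joiner
RA = "\u0AB0"      # Gujarati Ra

# Unassigned codepoints inside the main consonant range of the table.
_UNASSIGNED = {0x0AA9, 0x0AB1, 0x0AB4}


def is_consonant(ch: str) -> bool:
    """True for codepoints that the Gujarati table classes as CONSONANT."""
    cp = ord(ch)
    return (0x0A95 <= cp <= 0x0AB9 and cp not in _UNASSIGNED) or cp == 0x0AF9


@dataclass(frozen=True)
class JoinerEffect:
    conjunct_allowed: bool
    half_form_allowed: bool
    reph_allowed: bool


def joiner_effect(text: str) -> JoinerEffect:
    """Describe what a leading Consonant,Halant[,ZWJ|ZWNJ] sequence permits."""
    if len(text) < 2 or not is_consonant(text[0]) or text[1] != HALANT:
        raise ValueError("expected text starting with Consonant,Halant")

    joiner = text[2] if len(text) > 2 and text[2] in (ZWJ, ZWNJ) else None

    if joiner == ZWNJ:
        # Standard-form consonant plus explicit Halant: nothing else applies.
        # (No Reph here is an inference, not a rule stated in the text.)
        return JoinerEffect(False, False, False)
    if joiner == ZWJ:
        # The conjunct and Reph are blocked; a half form may still be used.
        return JoinerEffect(False, True, False)
    # No joiner: everything is permitted, and Reph only when Ra comes first.
    return JoinerEffect(True, True, text[0] == RA)
```

For example, `joiner_effect("\u0A95\u0ACD\u200D\u0AB7")` reports that the conjunct is blocked while a half form is still permitted.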

The no-break space (NBSP) is primarily used to display codepoints that are defined as non-spacing (marks, dependent vowels or matras, below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. Such sequences match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
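
The three NBSP patterns can be checked with a short Python sketch such as the one below. The function name `is_isolated_display` is hypothetical, and the character classes are assumptions drawn from the Gujarati table: "mark" is taken here to mean the BINDU, VISARGA, NUKTA, and CANTILLATION rows of the Gujarati block, and "matra" the VOWEL_DEPENDENT rows. A real shaping engine would likely work from its full classification data instead.

```python
import re

NBSP = "\u00A0"
ZWJ = "\u200D"
HALANT = "\u0ACD"

# Character-class bodies built from the Gujarati table (assumed subsets).
CONSONANT = "\u0A95-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0AF9"
MATRA = "\u0ABE-\u0AC5\u0AC7-\u0AC9\u0ACB\u0ACC\u0AE2\u0AE3"
MARK = "\u0A81-\u0A83\u0ABC\u0AFA-\u0AFF"

# NBSP,ZWJ,Halant,Consonant  or  NBSP,mark  or  NBSP,matra
_ISOLATED = re.compile(
    NBSP
    + "(?:"
    + ZWJ + HALANT + "[" + CONSONANT + "]"
    + "|[" + MARK + "]"
    + "|[" + MATRA + "]"
    + ")"
)


def is_isolated_display(text: str) -> bool:
    """True if the whole string is one of the three NBSP sequences."""
    return _ISOLATED.fullmatch(text) is not None
```

The sketch matches only complete strings; scanning a longer run for these sequences would use `_ISOLATED.finditer` instead of `fullmatch`.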