Gurmukhi character tables

This document lists the per-character shaping information needed to shape Gurmukhi text.

Contents

Gurmukhi character table
Vedic Extensions character table
Miscellaneous character table

Gurmukhi character table

Gurmukhi codepoints should be classified as in the following table. Codepoints in the Gurmukhi block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this does include some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
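The [Mn] and [Mc] tags correspond to Unicode General Category values, so they can be cross-checked against a Unicode database. The following is a minimal sketch, assuming Python's standard `unicodedata` module (whose data reflects the Unicode version bundled with the interpreter); the helper name `mark_spacing` is illustrative and not part of any shaping engine.

```python
import unicodedata
from typing import Optional

def mark_spacing(codepoint: int) -> Optional[str]:
    """Return 'non-spacing' for Mn, 'spacing-combining' for Mc, else None."""
    category = unicodedata.category(chr(codepoint))
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return None

print(mark_spacing(0x0A3F))  # Sign I, tagged [Mc] in the table
print(mark_spacing(0x0A41))  # Sign U, tagged [Mn] in the table
print(mark_spacing(0x0A15))  # Ka, a Letter, so not a mark
```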

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
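One way to honor that precedence is to consult a shaping-class lookup before anything else. The sketch below is only an illustration: the dictionary holds a handful of rows copied from the Gurmukhi character table in this section, and the names `GURMUKHI_CLASSES` and `classify` are hypothetical.

```python
import unicodedata

# A few rows copied from the Gurmukhi character table:
# codepoint -> (shaping class, mark-placement subclass).
GURMUKHI_CLASSES = {
    0x0A02: ("BINDU", "TOP_POSITION"),
    0x0A15: ("CONSONANT", None),
    0x0A3C: ("NUKTA", "BOTTOM_POSITION"),
    0x0A3F: ("VOWEL_DEPENDENT", "LEFT_POSITION"),
    0x0A4D: ("VIRAMA", "BOTTOM_POSITION"),
    0x0A66: ("NUMBER", None),
    0x0A75: ("CONSONANT_MEDIAL", "BOTTOM_POSITION"),
}

def classify(codepoint):
    """Return (shaping class, mark-placement subclass), or (None, None)."""
    return GURMUKHI_CLASSES.get(codepoint, (None, None))

# Yakash is a non-spacing mark by General Category, but the shaping
# class from the table (CONSONANT_MEDIAL) is what the shaper acts on.
general_category = unicodedata.category(chr(0x0A75))
shaping_class, placement = classify(0x0A75)
print(general_category, shaping_class, placement)
```

A complete implementation would carry every assigned row of the table; codepoints missing from the lookup (including unassigned ones) fall through to `(None, None)` here.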

Table 22 Gurmukhi character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+0A00`	unassigned
`U+0A01`	Mark [Mn]	BINDU	TOP_POSITION	ਁ Adak Bindi
`U+0A02`	Mark [Mn]	BINDU	TOP_POSITION	ਂ Bindi
`U+0A03`	Mark [Mc]	VISARGA	RIGHT_POSITION	ਃ Visarga
`U+0A04`	unassigned
`U+0A05`	Letter	VOWEL_INDEPENDENT	null	ਅ A
`U+0A06`	Letter	VOWEL_INDEPENDENT	null	ਆ Aa
`U+0A07`	Letter	VOWEL_INDEPENDENT	null	ਇ I
`U+0A08`	Letter	VOWEL_INDEPENDENT	null	ਈ Ii
`U+0A09`	Letter	VOWEL_INDEPENDENT	null	ਉ U
`U+0A0A`	Letter	VOWEL_INDEPENDENT	null	ਊ Uu
`U+0A0B`	unassigned
`U+0A0C`	unassigned
`U+0A0D`	unassigned
`U+0A0E`	unassigned
`U+0A0F`	Letter	VOWEL_INDEPENDENT	null	ਏ Ee

`U+0A10`	Letter	VOWEL_INDEPENDENT	null	ਐ Ai
`U+0A11`	unassigned
`U+0A12`	unassigned
`U+0A13`	Letter	VOWEL_INDEPENDENT	null	ਓ Oo
`U+0A14`	Letter	VOWEL_INDEPENDENT	null	ਔ Au
`U+0A15`	Letter	CONSONANT	null	ਕ Ka
`U+0A16`	Letter	CONSONANT	null	ਖ Kha
`U+0A17`	Letter	CONSONANT	null	ਗ Ga
`U+0A18`	Letter	CONSONANT	null	ਘ Gha
`U+0A19`	Letter	CONSONANT	null	ਙ Nga
`U+0A1A`	Letter	CONSONANT	null	ਚ Ca
`U+0A1B`	Letter	CONSONANT	null	ਛ Cha
`U+0A1C`	Letter	CONSONANT	null	ਜ Ja
`U+0A1D`	Letter	CONSONANT	null	ਝ Jha
`U+0A1E`	Letter	CONSONANT	null	ਞ Nya
`U+0A1F`	Letter	CONSONANT	null	ਟ Tta

`U+0A20`	Letter	CONSONANT	null	ਠ Ttha
`U+0A21`	Letter	CONSONANT	null	ਡ Dda
`U+0A22`	Letter	CONSONANT	null	ਢ Ddha
`U+0A23`	Letter	CONSONANT	null	ਣ Nna
`U+0A24`	Letter	CONSONANT	null	ਤ Ta
`U+0A25`	Letter	CONSONANT	null	ਥ Tha
`U+0A26`	Letter	CONSONANT	null	ਦ Da
`U+0A27`	Letter	CONSONANT	null	ਧ Dha
`U+0A28`	Letter	CONSONANT	null	ਨ Na
`U+0A29`	unassigned
`U+0A2A`	Letter	CONSONANT	null	ਪ Pa
`U+0A2B`	Letter	CONSONANT	null	ਫ Pha
`U+0A2C`	Letter	CONSONANT	null	ਬ Ba
`U+0A2D`	Letter	CONSONANT	null	ਭ Bha
`U+0A2E`	Letter	CONSONANT	null	ਮ Ma
`U+0A2F`	Letter	CONSONANT	null	ਯ Ya

`U+0A30`	Letter	CONSONANT	null	ਰ Ra
`U+0A31`	unassigned
`U+0A32`	Letter	CONSONANT	null	ਲ La
`U+0A33`	Letter	CONSONANT	null	ਲ਼ Lla
`U+0A34`	unassigned
`U+0A35`	Letter	CONSONANT	null	ਵ Va
`U+0A36`	Letter	CONSONANT	null	ਸ਼ Sha
`U+0A37`	unassigned
`U+0A38`	Letter	CONSONANT	null	ਸ Sa
`U+0A39`	Letter	CONSONANT	null	ਹ Ha
`U+0A3A`	unassigned
`U+0A3B`	unassigned
`U+0A3C`	Mark [Mn]	NUKTA	BOTTOM_POSITION	਼ Nukta
`U+0A3D`	unassigned
`U+0A3E`	Mark [Mc]	VOWEL_DEPENDENT	RIGHT_POSITION	ਾ Sign Aa
`U+0A3F`	Mark [Mc]	VOWEL_DEPENDENT	LEFT_POSITION	ਿ Sign I

`U+0A40`	Mark [Mc]	VOWEL_DEPENDENT	RIGHT_POSITION	ੀ Sign Ii
`U+0A41`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ੁ Sign U
`U+0A42`	Mark [Mn]	VOWEL_DEPENDENT	BOTTOM_POSITION	ੂ Sign Uu
`U+0A43`	unassigned
`U+0A44`	unassigned
`U+0A45`	unassigned
`U+0A46`	unassigned
`U+0A47`	Mark [Mn]	VOWEL_DEPENDENT	TOP_POSITION	ੇ Sign Ee
`U+0A48`	Mark [Mn]	VOWEL_DEPENDENT	TOP_POSITION	ੈ Sign Ai
`U+0A49`	unassigned
`U+0A4A`	unassigned
`U+0A4B`	Mark [Mn]	VOWEL_DEPENDENT	TOP_POSITION	ੋ Sign Oo
`U+0A4C`	Mark [Mn]	VOWEL_DEPENDENT	TOP_POSITION	ੌ Sign Au
`U+0A4D`	Mark [Mn]	VIRAMA	BOTTOM_POSITION	੍ Virama
`U+0A4E`	unassigned
`U+0A4F`	unassigned

`U+0A50`	unassigned
`U+0A51`	Mark [Mn]	CANTILLATION	null	ੑ Udaat
`U+0A52`	unassigned
`U+0A53`	unassigned
`U+0A54`	unassigned
`U+0A55`	unassigned
`U+0A56`	unassigned
`U+0A57`	unassigned
`U+0A58`	unassigned
`U+0A59`	Letter	CONSONANT	null	ਖ਼ Khha
`U+0A5A`	Letter	CONSONANT	null	ਗ਼ Ghha
`U+0A5B`	Letter	CONSONANT	null	ਜ਼ Za
`U+0A5C`	Letter	CONSONANT	null	ੜ Rra
`U+0A5D`	unassigned
`U+0A5E`	Letter	CONSONANT	null	ਫ਼ Fa
`U+0A5F`	unassigned

`U+0A60`	unassigned
`U+0A61`	unassigned
`U+0A62`	unassigned
`U+0A63`	unassigned
`U+0A64`	unassigned
`U+0A65`	unassigned
`U+0A66`	Number	NUMBER	null	੦ Digit Zero
`U+0A67`	Number	NUMBER	null	੧ Digit One
`U+0A68`	Number	NUMBER	null	੨ Digit Two
`U+0A69`	Number	NUMBER	null	੩ Digit Three
`U+0A6A`	Number	NUMBER	null	੪ Digit Four
`U+0A6B`	Number	NUMBER	null	੫ Digit Five
`U+0A6C`	Number	NUMBER	null	੬ Digit Six
`U+0A6D`	Number	NUMBER	null	੭ Digit Seven
`U+0A6E`	Number	NUMBER	null	੮ Digit Eight
`U+0A6F`	Number	NUMBER	null	੯ Digit Nine

`U+0A70`	Mark [Mn]	BINDU	TOP_POSITION	ੰ Tippi
`U+0A71`	Mark [Mn]	GEMINATION_MARK	TOP_POSITION	ੱ Addak
`U+0A72`	Letter	CONSONANT	null	ੲ Iri
`U+0A73`	Letter	CONSONANT	null	ੳ Ura
`U+0A74`	Letter	null	null	ੴ Ek Onkar
`U+0A75`	Mark [Mn]	CONSONANT_MEDIAL	BOTTOM_POSITION	ੵ Yakash
`U+0A76`	Punctuation	null	null	੶ Abbreviation Sign
`U+0A77`	unassigned
`U+0A78`	unassigned
`U+0A79`	unassigned
`U+0A7A`	unassigned
`U+0A7B`	unassigned
`U+0A7C`	unassigned
`U+0A7D`	unassigned
`U+0A7E`	unassigned
`U+0A7F`	unassigned

Vedic Extensions character table

Sanskrit runs written in the Gurmukhi script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

Table 23 Vedic Extensions character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+1CD0`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳐ Tone Karshana
`U+1CD1`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳑ Tone Shara
`U+1CD2`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳒ Tone Prenkha
`U+1CD3`	Punctuation	null	null	᳓ Sign Nihshvasa
`U+1CD4`	Mark [Mn]	CANTILLATION	OVERSTRUCK	᳔ Tone Midline Svarita
`U+1CD5`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳕ Tone Aggravated Independent Svarita
`U+1CD6`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳖ Tone Independent Svarita
`U+1CD7`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳗ Tone Kathaka Independent Svarita
`U+1CD8`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳘ Tone Candra Below
`U+1CD9`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳙ Tone Kathaka Independent Svarita Schroeder
`U+1CDA`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳚ Tone Double Svarita
`U+1CDB`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳛ Tone Triple Svarita
`U+1CDC`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳜ Tone Kathaka Anudatta
`U+1CDD`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳝ Tone Dot Below
`U+1CDE`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳞ Tone Two Dots Below
`U+1CDF`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	᳟ Tone Three Dots Below

`U+1CE0`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳠ Tone Rigvedic Kashmiri Independent Svarita
`U+1CE1`	Mark [Mc]	CANTILLATION	RIGHT_POSITION	᳡ Tone Atharavedic Independent Svarita
`U+1CE2`	Mark [Mn]	AVAGRAHA	OVERSTRUCK	᳢ Sign Visarga Svarita
`U+1CE3`	Mark [Mn]	null	OVERSTRUCK	᳣ Sign Visarga Udatta
`U+1CE4`	Mark [Mn]	null	OVERSTRUCK	᳤ Sign Reversed Visarga Udatta
`U+1CE5`	Mark [Mn]	null	OVERSTRUCK	᳥ Sign Visarga Anudatta
`U+1CE6`	Mark [Mn]	null	OVERSTRUCK	᳦ Sign Reversed Visarga Anudatta
`U+1CE7`	Mark [Mn]	null	OVERSTRUCK	᳧ Sign Visarga Udatta With Tail
`U+1CE8`	Mark [Mn]	AVAGRAHA	OVERSTRUCK	᳨ Sign Visarga Anudatta With Tail
`U+1CE9`	Letter	SYMBOL	null	ᳩ Sign Anusvara Antargomukha
`U+1CEA`	Letter	null	null	ᳪ Sign Anusvara Bahirgomukha
`U+1CEB`	Letter	null	null	ᳫ Sign Anusvara Vamagomukha
`U+1CEC`	Letter	SYMBOL	null	ᳬ Sign Anusvara Vamagomukha With Tail
`U+1CED`	Mark [Mn]	AVAGRAHA	BOTTOM_POSITION	᳭ Sign Tiryak
`U+1CEE`	Letter	SYMBOL	null	ᳮ Sign Hexiform Long Anusvara
`U+1CEF`	Letter	null	null	ᳯ Sign Long Anusvara

`U+1CF0`	Letter	null	null	ᳰ Sign Rthang Long Anusvara
`U+1CF2`	Letter	CONSONANT_DEAD	null	ᳲ Sign Ardhavisarga
`U+1CF3`	Letter	CONSONANT_DEAD	null	ᳳ Sign Rotated Ardhavisarga
`U+1CF4`	Mark [Mn]	CANTILLATION	TOP_POSITION	᳴ Tone Candra Above
`U+1CF5`	Letter	CONSONANT_WITH_STACKER	null	ᳵ Sign Jihvamuliya
`U+1CF6`	Letter	CONSONANT_WITH_STACKER	null	ᳶ Sign Upadhmaniya
`U+1CF7`	Mark [Mc]	null	null	᳷ Sign Atikrama
`U+1CF8`	Mark [Mn]	CANTILLATION	null	᳸ Tone Ring Above
`U+1CF9`	Mark [Mn]	CANTILLATION	null	᳹ Tone Double Ring Above
`U+1CFA`	Letter	PLACEHOLDER	null	ᳺ Sign Double Anusvara Antargomukha
`U+1CFB`	unassigned
`U+1CFC`	unassigned
`U+1CFD`	unassigned
`U+1CFE`	unassigned
`U+1CFF`	unassigned
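Because the Vedic Extensions block occupies the contiguous range U+1CD0 through U+1CFF, a shaper can detect these characters in a Gurmukhi run with a simple range test before consulting the table above. A minimal sketch; the function name `vedic_codepoints` is an assumption made for illustration.

```python
# U+1CD0..U+1CFF inclusive (range() excludes its upper bound).
VEDIC_EXTENSIONS = range(0x1CD0, 0x1D00)

def vedic_codepoints(text):
    """Return the Vedic Extensions codepoints found in text, in order."""
    return [ord(ch) for ch in text if ord(ch) in VEDIC_EXTENSIONS]

sample = "\u0A15\u1CDA\u0A3E"  # Ka, Tone Double Svarita, Sign Aa
print([f"U+{cp:04X}" for cp in vedic_codepoints(sample)])  # prints ['U+1CDA']
```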

Miscellaneous character table

In addition to general punctuation, runs of Gurmukhi text often use the danda (U+0964) and double danda (U+0965) punctuation marks from the Devanagari block. Gurmukhi text can also incorporate the udatta (U+0951) and anudatta (U+0952) signs from the Devanagari block.

Table 24 Additional punctuation character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+0951`	Mark [Mn]	CANTILLATION	TOP_POSITION	॑ Udatta
`U+0952`	Mark [Mn]	CANTILLATION	BOTTOM_POSITION	॒ Anudatta
`U+0964`	Punctuation	null	null	। Danda
`U+0965`	Punctuation	null	null	॥ Double Danda

Other important characters that may be encountered when shaping runs of Gurmukhi text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Table 25 Miscellaneous character table
Codepoint	Unicode category	Shaping class	Mark-placement subclass	Glyph
`U+00A0`	Separator	PLACEHOLDER	null	No-break space
`U+200C`	Other	NON_JOINER	null	‌ Zero-width non-joiner
`U+200D`	Other	JOINER	null	‍ Zero-width joiner
`U+2010`	Punctuation	PLACEHOLDER	null	‐ Hyphen
`U+2011`	Punctuation	PLACEHOLDER	null	‑ No-break hyphen
`U+2012`	Punctuation	PLACEHOLDER	null	‒ Figure dash
`U+2013`	Punctuation	PLACEHOLDER	null	– En dash
`U+2014`	Punctuation	PLACEHOLDER	null	— Em dash
`U+25CC`	Symbol	DOTTED_CIRCLE	null	◌ Dotted circle
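The rows above can be expressed as a small lookup so that hyphens, dashes, and the no-break space are treated like the dotted circle when they stand in for a missing base. This is a sketch under that assumption; `MISC_CLASSES` and `can_carry_isolated_mark` are hypothetical names.

```python
# Shaping classes copied from the miscellaneous character table.
MISC_CLASSES = {
    0x00A0: "PLACEHOLDER",    # No-break space
    0x200C: "NON_JOINER",     # Zero-width non-joiner
    0x200D: "JOINER",         # Zero-width joiner
    0x2010: "PLACEHOLDER",    # Hyphen
    0x2011: "PLACEHOLDER",    # No-break hyphen
    0x2012: "PLACEHOLDER",    # Figure dash
    0x2013: "PLACEHOLDER",    # En dash
    0x2014: "PLACEHOLDER",    # Em dash
    0x25CC: "DOTTED_CIRCLE",  # Dotted circle
}

def can_carry_isolated_mark(codepoint):
    """True if the codepoint may serve as a base for an isolated mark or matra."""
    return MISC_CLASSES.get(codepoint) in ("PLACEHOLDER", "DOTTED_CIRCLE")

print(can_carry_isolated_mark(0x25CC))  # dotted circle
print(can_carry_isolated_mark(0x200D))  # zero-width joiner
```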

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a “Consonant,Halant,Consonant” sequence. The sequence “Consonant,Halant,ZWJ,Consonant” blocks the formation of a conjunct between the two consonants.

Note, however, that the “Consonant,Halant” subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence “Consonant,Halant,ZWNJ,Consonant” should produce the first consonant in its standard form, followed by an explicit “Halant”.
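The difference between the two joiners can be summarized as a three-way decision. The sketch below works on shaping-class names from the tables in this document ("Halant" corresponds to the VIRAMA class, U+0A4D); the function name and the returned strings are illustrative assumptions, and whether a half form actually appears still depends on the font.

```python
# Minimal class map for the example; names follow the tables in this document.
CLASS_OF = {
    0x0A15: "CONSONANT",   # Ka
    0x0A4D: "VIRAMA",      # Halant
    0x200C: "NON_JOINER",  # ZWNJ
    0x200D: "JOINER",      # ZWJ
}

def halant_outcome(classes):
    """Describe how a Consonant,Halant,[joiner],Consonant sequence is treated."""
    if classes == ["CONSONANT", "VIRAMA", "CONSONANT"]:
        return "conjunct allowed"
    if classes == ["CONSONANT", "VIRAMA", "JOINER", "CONSONANT"]:
        return "no conjunct; half form still possible"
    if classes == ["CONSONANT", "VIRAMA", "NON_JOINER", "CONSONANT"]:
        return "no conjunct, no half form; explicit halant"
    return "not a consonant-halant-consonant sequence"

with_zwj = [CLASS_OF[ord(ch)] for ch in "\u0A15\u0A4D\u200D\u0A15"]
with_zwnj = [CLASS_OF[ord(ch)] for ch in "\u0A15\u0A4D\u200C\u0A15"]
print(halant_outcome(with_zwj))
print(halant_outcome(with_zwnj))
```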

A secondary usage of the zero-width joiner is to prevent the formation of “Reph”. An initial “Ra,Halant,ZWJ” sequence should not produce a “Reph”, where an initial “Ra,Halant” sequence without the zero-width joiner otherwise would.
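This rule amounts to a check on the first three codepoints of a syllable. A minimal sketch, assuming the syllable has already been isolated and is passed as a list of codepoints; `may_form_reph` is a hypothetical helper, and a true result means only that the sequence does not rule out a "Reph", since the font must also supply one.

```python
RA, VIRAMA, ZWJ = 0x0A30, 0x0A4D, 0x200D  # Ra, Halant, zero-width joiner

def may_form_reph(codepoints):
    """True if the syllable starts with Ra,Halant not followed by ZWJ."""
    if len(codepoints) < 2 or codepoints[0] != RA or codepoints[1] != VIRAMA:
        return False
    return not (len(codepoints) > 2 and codepoints[2] == ZWJ)

print(may_form_reph([RA, VIRAMA, 0x0A15]))       # Ra,Halant,Ka
print(may_form_reph([RA, VIRAMA, ZWJ, 0x0A15]))  # Ra,Halant,ZWJ,Ka
```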

The no-break space (NBSP) is primarily used to display, in an isolated context, those codepoints that are defined as non-spacing: marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms. It serves as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match “NBSP,ZWJ,Halant,Consonant”, “NBSP,mark”, or “NBSP,matra”.
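The three NBSP patterns can be tested directly on a codepoint sequence. The sketch below is illustrative only: the class map holds a few rows from the Gurmukhi table, and the set of shaping classes treated as "mark" (`MARK_CLASSES`) is an assumption, not something this document specifies.

```python
NBSP, ZWJ = 0x00A0, 0x200D

# A few rows from the Gurmukhi character table, enough for the example.
CLASS_OF = {
    0x0A02: "BINDU",
    0x0A15: "CONSONANT",
    0x0A3F: "VOWEL_DEPENDENT",
    0x0A4D: "VIRAMA",
    0x0A71: "GEMINATION_MARK",
}

# Assumption: which shaping classes count as "mark" in "NBSP,mark".
MARK_CLASSES = {"BINDU", "VISARGA", "NUKTA", "CANTILLATION", "GEMINATION_MARK"}

def matches_nbsp_pattern(codepoints):
    """True for NBSP,ZWJ,Halant,Consonant / NBSP,mark / NBSP,matra."""
    if not codepoints or codepoints[0] != NBSP:
        return False
    rest = list(codepoints[1:])
    tail = [CLASS_OF.get(cp) for cp in rest]
    if len(rest) == 3 and rest[0] == ZWJ:
        return tail[1:] == ["VIRAMA", "CONSONANT"]
    if len(rest) == 1:
        return tail[0] == "VOWEL_DEPENDENT" or tail[0] in MARK_CLASSES
    return False

print(matches_nbsp_pattern([NBSP, 0x0A3F]))               # NBSP,matra
print(matches_nbsp_pattern([NBSP, ZWJ, 0x0A4D, 0x0A15]))  # NBSP,ZWJ,Halant,Consonant
```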
