# Vedic Extensions in OpenType # This document outlines the shaping information needed to display characters from the Unicode Vedic Extensions block, which may be used within text runs in many Indic scripts. **Contents** - [General information](#general-information) - [Terminology](#terminology) - [Glyph classification](#glyph-classification) - [Vedic Extensions character table](#vedic-extensions-character-table) - [Shaping information](#shaping-information) ## General information ## The Vedic Extensions block encodes letters and marks that are used in a large body of ancient literature written in the Vedic Sanskrit language. Primarily an oral language in the time period when the key literature originated, Vedic Sanskrit has no native script. Therefore, texts may be typeset in any one of the Indic scripts, using the Vedic Extensions to supplement the main script's character set. ## Terminology ## Individual Vedic Extension characters may be named by a combination of the Vedic text in which the mark is used, the regional or manuscript tradition involved, or a simple visual or phonetic description of the character. Some commonly used general categories are worth noting. **Udatta** is the term for a high tone on a vowel. **Anudatta** is the term for a low tone on a vowel. **Svarita** is the term for a falling or mixed tone on a vowel. **Anusvara** is the term for a nasalization sound that precedes a consonant. **Visarga** is the term for a soft breathing sound that precedes a vowel. > Note: In modern Indic languages, the terms _anusvara_ and _visarga_ > often refer to diacritical marks that have the above effects on > pronunciation. In the Vedic Sanskrit language, however, they are > generally considered independent letters. ## Glyph classification ## For most codepoints, the `General Category` property defined in the Unicode standard is correct, but it is not sufficient to fully capture the expected shaping behavior (such as how the character is treated during glyph reordering). Therefore, they must additionally be classified by how they are treated when shaping a run of text. ### Vedic Extensions character table ### Vedic Extension glyphs should be classified as in the following table. Codepoints with no assigned meaning are marked as _unassigned_ in the _Unicode category_ column. Assigned codepoints marked with a _null_ in the _Shaping class_ column evoke no special behavior from the shaping engine. The _Mark-placement subclass_ column indicates mark-placement positioning. Assigned codepoints marked with a _null_ in this column evoke no special mark-placement behavior. Marks tagged with [Mn] in the _Unicode category_ column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining. Some codepoints in the following table use a _Shaping class_ that differs from the codepoint's Unicode _General Category_. The _Shaping class_ takes precedence during OpenType shaping, as it captures more specific behavior. :::{table} Vedic Extensions character table | Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph | |:----------|:-----------------|:------------------|:---------------------------|:-----------------------------| |`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana | |`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara | |`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha | |`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa | |`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita | |`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita | |`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita | |`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita | |`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below | |`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder | |`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita | |`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita | |`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta | |`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below | |`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below | |`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below | | | | | | |`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita | |`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita | |`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita | |`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta | |`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta | |`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta | |`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta | |`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail | |`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail | |`U+1CE9` | Letter | AVAGRAHA | _null_ | ᳩ Sign Anusvara Antargomukha | |`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha | |`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha | |`U+1CEC` | Letter | AVAGRAHA | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail | |`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak | |`U+1CEE` | Letter | AVAGRAHA | _null_ | ᳮ Sign Hexiform Long Anusvara | |`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara | | | | | | |`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara | |`U+1CF1` | Letter | AVAGRAHA | _null_ | ᳱ Sign Anusvara Ubhayato Mukha | |`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga | |`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga | |`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above | |`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya | |`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya | |`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama | |`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above | |`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above | |`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha | |`U+1CFB` | _unassigned_ | | | | |`U+1CFC` | _unassigned_ | | | | |`U+1CFD` | _unassigned_ | | | | |`U+1CFE` | _unassigned_ | | | | |`U+1CFF` | _unassigned_ | | | | ::: ## Shaping information ## 31 of the characters in the block are categorized as marks. 27 of these marks are subcategorized as non-spacing; the remaining four are spacing-combining. Of the non-spacing marks, 20 are classified as `CANTILLATION` (or tone-marker) indicators, which modify the pitch of vowels. Most of these marks are generally positioned above or below the main character, using GPOS mark attachment, in a position that does not interact or interfere with the main character. In Unicode, the `CANTILLATION` classification is separate from the `TONE_MARKER` classification used in some scripts for semantic reasons; the two classifications are identical for shaping purposes. Some of the marks (cantillation and non-cantillation) are classified as `OVERSTRUCK` in the _Mark-placement subclass_ column. This indicates that the mark is intended to be rendered on top of the preceding character. During reordering, `OVERSTRUCK` marks are tagged for the ordering position `POS_AFTER_MAIN`. Some marks are classified, for shaping purposes, as `AVAGRAHA` or `VISARGA`. This indicates that the mark behaves more like the Avagraha or Visarga character than like a diacritic. Characters that are categorized in Unicode as letters vary with respect to whether or not they trigger special behavior in the shaping process. These include letters that are classified as `CONSONANT` and letters that are classified as `AVAGRAHA`.