12. Vedic Extensions in OpenType

This document outlines the shaping information needed to display characters from the Unicode Vedic Extensions block, which may be used within text runs in many Indic scripts.

12.1. General information

The Vedic Extensions block encodes letters and marks that are used in a large body of ancient literature written in the Vedic Sanskrit language.

Primarily an oral language in the time period when the key literature originated, Vedic Sanskrit has no native script. Therefore, texts may be typeset in any one of the Indic scripts, using the Vedic Extensions to supplement the main script’s character set.

12.2. Terminology

Individual Vedic Extension characters may be named by a combination of the Vedic text in which the mark is used, the regional or manuscript tradition involved, or a simple visual or phonetic description of the character. Some commonly used general categories are worth noting.

Udatta is the term for a high tone on a vowel.

Anudatta is the term for a low tone on a vowel.

Svarita is the term for a falling or mixed tone on a vowel.

Anusvara is the term for a nasalization sound that precedes a consonant.

Visarga is the term for a soft breathing sound that follows a vowel.

Note: In modern Indic languages, the terms anusvara and visarga often refer to diacritical marks that have the above effects on pronunciation. In the Vedic Sanskrit language, however, they are generally considered independent letters.

12.3. Glyph classification

For most codepoints, the General Category property defined in the Unicode standard is correct, but it is not sufficient to fully capture the expected shaping behavior (such as how the character is treated during glyph reordering). Therefore, Vedic Extensions glyphs must additionally be classified by how they are treated when shaping a run of text.

12.3.1. Vedic Extensions character table

Vedic Extension glyphs should be classified as in the following table. Codepoints with no assigned meaning are marked as unassigned in the Unicode category column.

Assigned codepoints marked with a null in the Shaping class column evoke no special behavior from the shaping engine.

The Mark-placement subclass column indicates mark-placement positioning. Assigned codepoints marked with a null in this column evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint’s Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific behavior.

Table 12.3.1 Vedic Extensions character table

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+1CD0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
| U+1CD1 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
| U+1CD2 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
| U+1CD3 | Punctuation | null | null | ᳓ Sign Nihshvasa |
| U+1CD4 | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
| U+1CD5 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
| U+1CD6 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
| U+1CD7 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
| U+1CD8 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
| U+1CD9 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
| U+1CDA | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
| U+1CDB | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
| U+1CDC | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
| U+1CDD | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
| U+1CDE | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
| U+1CDF | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| U+1CE0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
| U+1CE1 | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
| U+1CE2 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
| U+1CE3 | Mark [Mn] | null | OVERSTRUCK | ᳣ Sign Visarga Udatta |
| U+1CE4 | Mark [Mn] | null | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
| U+1CE5 | Mark [Mn] | null | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
| U+1CE6 | Mark [Mn] | null | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
| U+1CE7 | Mark [Mn] | null | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
| U+1CE8 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
| U+1CE9 | Letter | AVAGRAHA | null | ᳩ Sign Anusvara Antargomukha |
| U+1CEA | Letter | null | null | ᳪ Sign Anusvara Bahirgomukha |
| U+1CEB | Letter | null | null | ᳫ Sign Anusvara Vamagomukha |
| U+1CEC | Letter | AVAGRAHA | null | ᳬ Sign Anusvara Vamagomukha With Tail |
| U+1CED | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
| U+1CEE | Letter | AVAGRAHA | null | ᳮ Sign Hexiform Long Anusvara |
| U+1CEF | Letter | null | null | ᳯ Sign Long Anusvara |
| U+1CF0 | Letter | null | null | ᳰ Sign Rthang Long Anusvara |
| U+1CF1 | Letter | AVAGRAHA | null | ᳱ Sign Anusvara Ubhayato Mukha |
| U+1CF2 | Letter | CONSONANT_DEAD | null | ᳲ Sign Ardhavisarga |
| U+1CF3 | Letter | CONSONANT_DEAD | null | ᳳ Sign Rotated Ardhavisarga |
| U+1CF4 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
| U+1CF5 | Letter | CONSONANT_WITH_STACKER | null | ᳵ Sign Jihvamuliya |
| U+1CF6 | Letter | CONSONANT_WITH_STACKER | null | ᳶ Sign Upadhmaniya |
| U+1CF7 | Mark [Mc] | null | null | ᳷ Sign Atikrama |
| U+1CF8 | Mark [Mn] | CANTILLATION | null | ᳸ Tone Ring Above |
| U+1CF9 | Mark [Mn] | CANTILLATION | null | ᳹ Tone Double Ring Above |
| U+1CFA | Letter | PLACEHOLDER | null | ᳺ Sign Double Anusvara Antargomukha |
| U+1CFB | unassigned | | | |
| U+1CFC | unassigned | | | |
| U+1CFD | unassigned | | | |
| U+1CFE | unassigned | | | |
| U+1CFF | unassigned | | | |
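As an illustration of how the table might be consulted, the following is a minimal Python sketch rather than a reference implementation. The names `VedicEntry`, `VEDIC_TABLE`, and `classify` are hypothetical, only a few rows of the table are copied in, and falling back to the Unicode General Category when the Shaping class is null is an assumption made for the example.

```python
import unicodedata
from typing import NamedTuple, Optional


class VedicEntry(NamedTuple):
    shaping_class: Optional[str]   # None stands for "null" in the table
    mark_placement: Optional[str]  # None stands for "null" in the table


# Excerpt only: a handful of rows copied from the character table.
VEDIC_TABLE = {
    0x1CD0: VedicEntry("CANTILLATION", "TOP_POSITION"),
    0x1CD3: VedicEntry(None, None),
    0x1CD4: VedicEntry("CANTILLATION", "OVERSTRUCK"),
    0x1CE1: VedicEntry("CANTILLATION", "RIGHT_POSITION"),
    0x1CE2: VedicEntry("AVAGRAHA", "OVERSTRUCK"),
    0x1CF2: VedicEntry("CONSONANT_DEAD", None),
    0x1CF5: VedicEntry("CONSONANT_WITH_STACKER", None),
    0x1CFA: VedicEntry("PLACEHOLDER", None),
}


def classify(codepoint: int) -> str:
    """Prefer the Shaping class; otherwise report the General Category.

    The fallback is an assumption of this sketch, not a rule from the text.
    """
    entry = VEDIC_TABLE.get(codepoint)
    if entry is not None and entry.shaping_class is not None:
        return entry.shaping_class
    return unicodedata.category(chr(codepoint))
```

A real shaping engine would carry every row of the table; the excerpt here only shows the shape of the lookup and the precedence of the Shaping class.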

12.4. Shaping information

29 of the characters in the block are categorized as marks. 27 of these marks are subcategorized as non-spacing; the remaining two (U+1CE1 and U+1CF7) are spacing-combining.
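The mark counts can be cross-checked against the Unicode data bundled with a Python interpreter. This is only a sketch: the helper name `vedic_category_counts` is hypothetical, and the result depends on the Unicode version that the interpreter ships, since some codepoints in this block were added or recategorized in later Unicode releases.

```python
import unicodedata
from collections import Counter


def vedic_category_counts() -> Counter:
    """Tally General Category values across U+1CD0..U+1CFF."""
    return Counter(
        unicodedata.category(chr(cp)) for cp in range(0x1CD0, 0x1D00)
    )


counts = vedic_category_counts()
# "Mn" is non-spacing, "Mc" is spacing-combining, "Cn" is unassigned.
print(counts["Mn"], counts["Mc"], counts["Cn"])
```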

Of the marks, 20 are classified as CANTILLATION (or tone-marker) indicators, which modify the pitch of vowels: 19 are non-spacing and one (U+1CE1) is spacing-combining. Most of these marks are positioned above or below the main character, using GPOS mark attachment, in a position that does not interact or interfere with the main character. For semantic reasons, Unicode keeps the CANTILLATION classification separate from the TONE_MARKER classification used in some scripts; the two classifications are identical for shaping purposes.
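One way an implementation might express that CANTILLATION and TONE_MARKER behave identically during shaping is to fold one name into the other before any shaping decision is made. The sketch below assumes this approach; `SHAPING_ALIASES` and `shaping_key` are hypothetical names, not part of any specification.

```python
# Hypothetical alias map: classes that are shaped identically share one key.
SHAPING_ALIASES = {"CANTILLATION": "TONE_MARKER"}


def shaping_key(shaping_class: str) -> str:
    """Return the class name used for shaping decisions."""
    return SHAPING_ALIASES.get(shaping_class, shaping_class)
```

The original class name can still be kept alongside the key wherever the semantic distinction matters.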

Some of the marks (cantillation and non-cantillation) are classified as OVERSTRUCK in the Mark-placement subclass column. This indicates that the mark is intended to be rendered on top of the preceding character. During reordering, OVERSTRUCK marks are tagged for the ordering position POS_AFTER_MAIN.
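The tagging step for OVERSTRUCK marks can be sketched as follows. The function name `tag_overstruck` and the use of `None` for marks that receive no tag here are assumptions of the example; this section does not specify ordering positions for the other subclasses.

```python
from typing import List, Optional, Tuple

POS_AFTER_MAIN = "POS_AFTER_MAIN"  # ordering position named in the text


def tag_overstruck(
    marks: List[Tuple[int, Optional[str]]]
) -> List[Tuple[int, Optional[str]]]:
    """Pair each codepoint with POS_AFTER_MAIN if its subclass is OVERSTRUCK.

    Marks with any other subclass are paired with None in this sketch.
    """
    return [
        (cp, POS_AFTER_MAIN if subclass == "OVERSTRUCK" else None)
        for cp, subclass in marks
    ]


# U+1CD4 is OVERSTRUCK in the table; U+1CD5 is BOTTOM_POSITION.
tags = tag_overstruck([(0x1CD4, "OVERSTRUCK"), (0x1CD5, "BOTTOM_POSITION")])
```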

Some marks are classified, for shaping purposes, as AVAGRAHA or VISARGA. This indicates that the mark behaves more like the Avagraha or Visarga character than like a diacritic.

Characters that are categorized in Unicode as letters vary with respect to whether or not they trigger special behavior in the shaping process. Those that do include letters with a consonant-like classification (CONSONANT_DEAD or CONSONANT_WITH_STACKER) and letters that are classified as AVAGRAHA.