# Default script shaping in OpenType # This document details the default shaping procedure needed to display text runs in non-complex scripts. It may also be used as a fallback model for unrecognized scripts. **Contents** - [General information](#general-information) - [Terminology](#terminology) - [Normalization](#normalization) - [The default shaping model](#the-default-shaping-model) - [Stage 1: Applying the basic substitution features from GSUB](#stage-1-applying-the-basic-substitution-features-from-gsub) - [Stage 2: Applying typographic substitution features from GSUB](#stage-2-applying-typographic-substitution-features-from-gsub) - [Stage 3: Applying the positioning features from GPOS](#stage-3-applying-the-positioning-features-from-gpos) ## General information ## The default OpenType shaping model is used for scripts that are considered _non-complex_ from the shaper's perspective. This designation means that shaping a text run does not involve glyph reordering, contextual joining behavior, or the substitution of context-dependent forms for linguistic or orthographic correctness. Text runs in non-complex scripts may, however, involve ligature substitution, Unicode normalization, mark positioning, kerning, and the application of other features from the active font's GSUB and GPOS tables. The non-complex scripts covered by this model include Latin, Cyrillic, Greek, Armenian, Georgian, Ethiopic, Cherokee, Tifinagh, and many others. ## Terminology ## Many of these scripts support diacritics and other **marks**. Unicode may contain **precomposed** mark-and-base codepoints for some or all combinations of marks and base letters in the script. For combinations without a codepoint, the desired form can be achieved by following the **base** letter with a **combining mark** codepoint. The primary concern for the shaping engine is processing the text run into the correct normalized form, so that the best glyphs from the active font can be selected from among the available precomposed and combining alternatives. Fonts for non-complex scripts might not include a GSUB or GPOS table at all. However, GSUB and GPOS may also be used to implement a variety of OpenType smart features, including several classes of ligature, contextual alternate, or contextual positioning rules. Because these features are not required in order to render the text run orthographically correct, the features are not considered shaping features. Nevertheless, the shaping engine may be expected to apply these features in order to simplify the overall text-rendering architecture of the implementation. ## Normalization ## Unicode defines algorithms for normalizing a sequence of input codepoints into either a canonical composed form or a canonical decomposed form. The purpose of these algorithms and of the defined normalization forms is to determine equivalent representations of input sequences regardless of variations in the input sequences. For example, a base letter with an attached mark might exist in Unicode as a single codepoint, but an input sequence might consist of the base letter codepoint followed by the combining mark codepoint. Unicode normalization can be used to determine that the "letter, mark" sequence is equivalent to the single codepoint. This simplifies sorting, searching, string comparison, and many other common tasks. OpenType shaping utilizes Unicode normalization, but OpenType shaping has a distinctly different goal: to select the best or most appropriate representation of the input codepoint sequence that is available in the active font. A full description of the algorithm is available in the [normalization](opentype-shaping-normalization.md) document. Shaping some complex scripts involves explicit composition or decomposition steps. The default shaping model does not involve any such steps, but it does proceed with the general assumption that text runs have been normalized as part of input sanitization. For convenience, shaping engines may choose to implement a single normalization routine for all scripts, default and complex. If normalization is done before the shaping-model–specific processing is done, then there may be no work required in certain shaping steps (such as the processing of `ccmp` substitutions from GSUB). However, these steps will always be described in the relevant script's shaping document. ## The default shaping model ## Processing a run of text in the default shaping model involves three top-level stages: 1. Applying the basic substitution features from GSUB 2. Applying typographic substitution features from GSUB 3. Applying the positioning features from GPOS Together, these stages cover the application of all GSUB and GPOS features that are required or that have been defined by OpenType as being on by default. For convenience, shaping engines may also choose to apply any optional or off-by-default OpenType features that have been activated for the text run (including those that have been enabled by the user and those that have been enabled at the application level). However, the order in which such features should be applied and how they should interact with OpenType shaping features is beyond the scope of this document. The default shaping model does not involve syllable-identification, word-identification, or other preprocessing of the input sequence. Shaping engines may choose how to segment longer text runs for processing, or may choose to rely on higher-level applications to make segmentation decisions. ### Stage 1: Applying the basic substitution features from GSUB ### The basic-substitution stage applies mandatory substitution features using the rules in the font's GSUB table. In preparation for this stage, glyph sequences should be tagged for possible application of GSUB features. These substitutions include those features designed to provide linguistic and orthographic correctness. The order in which these features are applied is not canonical; they should be applied in the order in which they appear in the GSUB table in the font. locl ccmp rlig The `locl` feature replaces default glyphs with any language-specific variants, based on examining the language setting of the text run. > Note: Strictly speaking, the use of localized-form substitutions is > not part of the shaping process, but of the localization process, > and could take place at an earlier point while handling the text > run. However, shaping engines are expected to complete the > application of the `locl` feature before applying the subsequent > GSUB substitutions in the following steps. The `ccmp` feature allows a font to substitute mark-and-base sequences with a pre-composed glyph including the mark and the base, or to substitute a single glyph into an equivalent decomposed sequence of glyphs. If present, these composition and decomposition substitutions must be performed before applying any other GSUB lookups, because those lookups may be written to match only the `ccmp`-substituted glyphs. > Note: The `ccmp` feature may perform compositions or decompositions > of glyph sequences that do not have a canonical decomposition > defined in Unicode. The `rlig` feature substitutes glyph sequences with mandatory ligatures. Substitutions made by `rlig` cannot be disabled by application-level user interfaces. ### Stage 2: Applying typographic substitution features from GSUB ### The typographic-substitution phase applies all remaining substitution features using the rules in the font's GSUB table. In preparation for this stage, glyph sequences should be tagged for possible application of GSUB features. These substitutions include those features designed to provide typographic consistency and correctness. The order in which these features are applied is not canonical; they should be applied in the order in which they appear in the GSUB table in the font. rclt calt clig liga The `rclt` feature substitutes glyphs with contextual alternate forms. In general, the `rclt` feature is used to perform such substitutions that are required by the orthography of the active script and language. Substitutions made by `rclt` cannot be disabled by application-level user interfaces. The `calt` feature substitutes glyphs with contextual alternate forms. In general, the `calt` feature performs substitutions that are not mandatory for orthographic correctness. However, unlike `rclt`, the substitutions made by `calt` can be disabled by application-level user interfaces. The `clig` feature substitutes optional ligatures that are on by default, but which are activated only in certain contexts. Substitutions made by `clig` may be disabled by application-level user interfaces. The `liga` feature substitutes standard, optional ligatures that are on by default. Substitutions made by `liga` may be disabled by application-level user interfaces. ### Stage 3: Applying the positioning features from GPOS ### The positioning stage adjusts the positions of mark and base glyphs. In preparation for this stage, glyph sequences should be tagged for possible application of GPOS features. The order in which these features are applied is not canonical; they should be applied in the order in which they appear in the GSUB table in the font. curs dist kern mark mkmk The `curs` feature perform cursive positioning. Each glyph has an entry point and exit point; the `curs` feature positions glyphs so that the entry point of the current glyph meets the exit point of the preceding glyph. The `dist` feature adjusts the horizontal positioning of glyphs. Unlike `kern`, adjustments made with `dist` do not require the application or the user to enable any software kerning features, if such features are optional. The `kern` adjusts glyph spacing between pairs of adjacent glyphs. The `mark` feature positions marks with respect to base glyphs. The `mkmk` feature positions marks with respect to preceding marks, providing proper positioning for sequences of marks that attach to the same base glyph.