```{include} /_global.md
```
# OpenType shaping documents #
Sponsored by [YesLogic](https://yeslogic.com/)
__
:::{admonition} 🆆 🅰 🆁 🅽 🅸 🅽 🅶
:class: caution
These documents are an active WORK IN PROGRESS.
NONE of the documents you currently see here are complete
nor are they suitable for reference. PLEASE do not use
them as a guide or as a general information source.
As long as this warning text remains visible, the above
holds true.
:::
These documents are meant to provide a functional specification for
text shaping. The expectation is that an implementer of this
specification will be using fonts in the OpenType font format applied
to input text that complies with Unicode.
Because application software and end-user documents may utilize
non-OpenType fonts and non-Unicode text (in particular, when older
fonts or documents are encountered), these documents also provide
functional information that a shaping engine may use to implement a
reasonable best-effort attempt at producing useful output in the most
common of such scenarios.
## Shapers
The shaping behavior described here can be roughly divided into five
categories.
All non-complex scripts follow the same
[default](opentype-shaping-default.md) shaping model.
The _Indic Model_ is shared by ten individual scripts. These scripts
follow the same overall approach to shaping, described in the [Indic
general](opentype-shaping-indic-general.md) document, but each script
incorporates script-specific details, which are more fully described
in its own document:
- [Devanagari](opentype-shaping-devanagari.md)
- [Bengali](opentype-shaping-bengali.md)
- [Gujarati](opentype-shaping-gujarati.md)
- [Gurmukhi](opentype-shaping-gurmukhi.md)
- [Kannada](opentype-shaping-kannada.md)
- [Malayalam](opentype-shaping-malayalam.md)
- [Oriya](opentype-shaping-oriya.md)
- [Tamil](opentype-shaping-tamil.md)
- [Telugu](opentype-shaping-telugu.md)
- [Sinhala](opentype-shaping-sinhala.md)
The _Arabic Model_ is shared by four individual scripts. These scripts
follow the same overall approach to shaping, described in the [Arabic
general](opentype-shaping-arabic-general.md) document, but each script
incorporates script-specific details, which are more fully described
in its own document:
- [Arabic](opentype-shaping-arabic.md)
- [N'Ko](opentype-shaping-nko.md)
- [Syriac](opentype-shaping-syriac.md)
- [Mongolian](opentype-shaping-mongolian.md)
Five of the remaining scripts each use a distinct, script-specific
model, with two others (Thai and Lao) sharing enough details to be
handled by a common shaper:
- [Hangul](opentype-shaping-hangul.md)
- [Hebrew](opentype-shaping-hebrew.md)
- [Khmer](opentype-shaping-khmer.md)
- [Thai and Lao](opentype-shaping-thai-lao.md)
- [Tibetan](opentype-shaping-tibetan.md)
- [Myanmar](opentype-shaping-myanmar.md)
Finally, the Universal Shaping Engine (USE) model is designed to shape all
complex scripts that are not handled by a dedicated
script-specific shaping model in the lists above:
- [Universal Shaping Engine (USE)](opentype-shaping-use.md)
In addition, these documents describe the handling of emoji
sequences. Although emoji sequences do not constitute a separate
shaping model, handling emoji sequences can incorporate many of the
same shaping mechanisms and shaping engine implementations may be
expected to handle them:
- [Emoji](opentype-shaping-emoji.md)
Shaping is just one part of the overall text-handling process. These
documents assume that other components in the software stack will be
responsible for details such as handling higher-level markup, layout,
font matching and loading, rasterization, and so on. Most importantly,
these documents assume that the input text has already been segmented
into text runs that consist of a single language, script, font, and
all other markup considerations (such as size or color, for example).
Within those assumptions, the shaping of a particular text run should
be consistent, regardless of whether the higher-level processes
involve a document, user-interface element, network stream, or any
other context for displaying text.
## Normalization
However, these documents also include a description of text
[normalization](opentype-shaping-normalization.md) in the OpenType
shaping context, which differs from Unicode normalization in several
respects. Shaping engine implementations may differ as to whether the
shaping engine itself is responsible for handling normalization or
whether normalization is handled by another component
in the stack.
## Additional information
Various practical [notes](notes/index.md) about this document set and
the details of its scope, limitations, and quirks are also provided.
Some [errata](errata.md) about the "upstream" specifications and
reference documents are noted separately.
In its final form, this repository will hold documentation describing
the shaping behavior used for layout of OpenType text. In particular,
it will focus on complex scripts.
In addition to the primary, per-script documents, implementers and
other interested readers are encouraged to check the
[character tables](character-tables/index.md) for correctness and to
examine the [image-generation logs](https://github.com/n8willis/opentype-shaping-documents/images/README.md) to identify
issues seen in the inline images.
## Feedback
Interested readers, font developers, and shaping-engine implementers
are encouraged to provide feedback, ask questions, and propose
improvements to any part of these documents. Shaping is the concern of
software developers and readers across the world, and all are welcome
to participate in recording and clarifying what is required to produce
the best and most accurate text output possible, both now and in the
future.
See the upstream git repository at
[github.com/n8willis/opentype-shaping-documents](https://github.com/n8willis/opentype-shaping-documents)
to raise issues, ask questions, or add comments.
## References
These documents cite the following informative references:
1. The Microsoft [Script development
specifications](https://docs.microsoft.com/en-us/typography/script-development/standard),
which document the behaviors expected for OpenType Layout fonts and
provide guidance & examples for type designers. OpenType is a
registered trademark of Microsoft Corporation.
2. Related portions of the Microsoft OpenType specification, such as the
[OpenType Layout tag
registry](https://docs.microsoft.com/en-us/typography/opentype/spec/ttoreg)
and [OpenType Layout common table
formats](https://docs.microsoft.com/en-us/typography/opentype/spec/chapter2),
which list and define feature tags, script & language tags, and
other internals of compliant OpenType font binaries. OpenType is a
registered trademark of Microsoft Corporation.
3. The [HarfBuzz](https://github.com/harfbuzz/harfbuzz) project, which
includes a free-software/open-source implementation of OpenType
Layout shaping with full source code and documentation.
4. The [AllSorts](https://github.com/yeslogic/allsorts) project, which
includes a free-software/open-source implementation of OpenType
Layout shaping with full source code and documentation.
5. The [Unicode
Standard](http://www.unicode.org/standard/standard.html) and
related Unicode Consortium projects such as the [Unicode Character
Database](http://www.unicode.org/reports/tr44/), which defines
Unicode code points and formal character properties used in
shaping. Unicode and the Unicode Logo are registered trademarks of
Unicode, Inc. in the United States and other countries.
6. The YesLogic [text corpus](https://github.com/yeslogic/corpus),
which includes real-world text data for several Indic scripts,
scraped from Wikipedia, Reddit, and multiple online news
sources. This data is used to test shaping in AllSorts and Prince.
7. Known but unofficial information about other shaping-engine
projects. Primarily this includes tests and reproducible issues
found via [HarfBuzz](https://github.com/harfbuzz/harfbuzz), because
HarfBuzz intentionally aims to produce results that will 100% match
the output of Microsoft Uniscribe (not counting cases where
Uniscribe's output is known to be incorrect, of course).
> Note: occasionally, tests or issues documenting the behavior of
> Apple CoreText are also included, but CoreText compatibility is
> not an explicit goal for HarfBuzz.
---
Version {{ env.config.version }}, release {{ env.config.release }};
built {sub-ref}`today`.