Syriac character tables

This document lists the per-character shaping information needed to shape Syriac text.

Syriac character table

Syriac glyphs should be classified as in the following table. Codepoints in the Syriac block with no assigned meaning are designated as unassigned in the Unicode category column.
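As a rough illustration of the Unicode category column, the sketch below derives a coarse label from Python's `unicodedata` module. The `COARSE_LABELS` mapping and both helper names are hypothetical, chosen only to mirror the labels used in this document, and the result depends on the Unicode version bundled with the Python build.

```python
import unicodedata

# Hypothetical mapping from the first letter of a General_Category value
# to the coarse labels used in the Unicode category column.
COARSE_LABELS = {
    "L": "Letter",
    "M": "Mark",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def coarse_category(codepoint: int) -> str:
    """Return a table-style category label for one codepoint."""
    gc = unicodedata.category(chr(codepoint))  # e.g. "Lo", "Mn", "Po", "Cn"
    if gc == "Cn":
        return "unassigned"
    if gc.startswith("M"):
        # Marks keep their exact subcategory, as in "Mark [Mn]".
        return f"Mark [{gc}]"
    return COARSE_LABELS[gc[0]]


def unassigned_in_syriac_block() -> list:
    """List the codepoints in U+0700..U+074F that have no assigned meaning."""
    return [cp for cp in range(0x0700, 0x0750)
            if coarse_category(cp) == "unassigned"]


if __name__ == "__main__":
    for cp in (0x0700, 0x070F, 0x0710, 0x0730):
        print(f"U+{cp:04X}", coarse_category(cp))
    print([f"U+{cp:04X}" for cp in unassigned_in_syriac_block()])
```

A lookup of this kind is only a convenience for checking the table; a shaping engine would normally carry its own precomputed data.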

The Joining type column indicates whether each codepoint is defined as joining with adjacent characters on the left side, right side, left and right sides (“DUAL”), or neither side (“NON_JOINING”). Codepoints designated TRANSPARENT in the Joining type column do not join with adjacent characters and, in addition, do not affect the joining behavior of surrounding characters. Non-spacing marks are of type TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent characters to join.
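The joining rules described above can be sketched as a small function that assigns each character one of the four basic contextual forms. This is a minimal sketch under stated assumptions: the names `JoiningType`, `JOINS_TO_NEXT`, `JOINS_TO_PREV`, and `compute_forms` are hypothetical, the input is in logical (reading) order, and script-specific refinements beyond the four basic forms are ignored.

```python
from enum import Enum


class JoiningType(Enum):
    RIGHT = "RIGHT"
    LEFT = "LEFT"
    DUAL = "DUAL"
    NON_JOINING = "NON_JOINING"
    TRANSPARENT = "TRANSPARENT"
    JOIN_CAUSING = "JOIN_CAUSING"


JT = JoiningType

# Types able to connect to the character that FOLLOWS them in logical order.
JOINS_TO_NEXT = {JT.DUAL, JT.LEFT, JT.JOIN_CAUSING}
# Types able to connect to the character that PRECEDES them in logical order.
# In a right-to-left script, the preceding character sits on the right side.
JOINS_TO_PREV = {JT.DUAL, JT.RIGHT, JT.JOIN_CAUSING}


def compute_forms(types):
    """Return "isol", "init", "medi" or "fina" per entry (None if TRANSPARENT)."""
    forms = [None] * len(types)
    # TRANSPARENT entries are skipped when looking for neighbours.
    visible = [i for i, t in enumerate(types) if t is not JT.TRANSPARENT]
    for k, i in enumerate(visible):
        cur = types[i]
        prev = types[visible[k - 1]] if k > 0 else None
        nxt = types[visible[k + 1]] if k + 1 < len(visible) else None
        joined_prev = prev in JOINS_TO_NEXT and cur in JOINS_TO_PREV
        joined_next = cur in JOINS_TO_NEXT and nxt in JOINS_TO_PREV
        if joined_prev and joined_next:
            forms[i] = "medi"
        elif joined_prev:
            forms[i] = "fina"
        elif joined_next:
            forms[i] = "init"
        else:
            forms[i] = "isol"
    return forms


if __name__ == "__main__":
    # Beth (DUAL), a vowel mark (TRANSPARENT), Beth (DUAL), Alaph (RIGHT)
    print(compute_forms([JT.DUAL, JT.TRANSPARENT, JT.DUAL, JT.RIGHT]))
```

The form computed for a JOIN_CAUSING entry has no visual meaning; it is kept only so that the output list lines up with the input.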

The Joining group column lists the fundamental letter that the listed codepoint behaves like for joining purposes.

Assigned codepoints with a null in the Joining group column evoke no special behavior from the shaping engine during the join-computation stage.
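To make the joining-group idea concrete, the following sketch stores an excerpt of the Joining group column in a dictionary and collects the codepoints that share a group. The names `JOINING_GROUP`, `joining_group`, and `members_by_group` are hypothetical, and only a handful of rows from the Syriac table are included.

```python
from collections import defaultdict

# Excerpt of the Joining group column; None stands for a null entry.
JOINING_GROUP = {
    0x0712: "BETH",         # Beth
    0x072D: "BETH",         # Persian Bheth
    0x0713: "GAMAL",        # Gamal
    0x0714: "GAMAL",        # Gamal Garshuni
    0x072E: "GAMAL",        # Persian Ghamal
    0x0715: "DALATH_RISH",  # Dalath
    0x0716: "DALATH_RISH",  # Dotless Dalath Rish
    0x072A: "DALATH_RISH",  # Rish
    0x072F: "DALATH_RISH",  # Persian Dhalath
    0x0730: None,           # Pthaha Above (a mark, no joining group)
}


def joining_group(codepoint):
    """Return the joining group, or None for null or unlisted codepoints."""
    return JOINING_GROUP.get(codepoint)


def members_by_group(table):
    """Invert the table: joining group -> sorted list of codepoints."""
    groups = defaultdict(list)
    for cp, group in sorted(table.items()):
        if group is not None:
            groups[group].append(cp)
    return dict(groups)


if __name__ == "__main__":
    for group, cps in members_by_group(JOINING_GROUP).items():
        print(group, [f"U+{cp:04X}" for cp in cps])
```

Variant letters such as Persian Bheth inherit the joining behavior of their base letter this way, while null entries are simply passed over during join computation.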

The Mark class column indicates the Canonical Combining Class for the codepoint. Marks are assigned non-zero combining classes so that sequences of adjacent marks can be reordered as required by the orthography.

For Syriac, a subset of marks in the 220 and 230 classes are also designated Modifier Combining Marks (MCM). These are denoted with 220_MCM and 230_MCM in the Mark class column. The MCM marks are treated differently during the mark-reordering stage.
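The role of the combining class in reordering can be illustrated with a stable sort over each run of adjacent marks. This is a minimal sketch that uses the combining classes reported by Python's `unicodedata` module; the function name `reorder_marks` is hypothetical, and the separate treatment of MCM marks is deliberately left out because this section does not spell it out.

```python
import unicodedata


def reorder_marks(text: str) -> str:
    """Stable-sort every run of adjacent marks by Canonical Combining Class."""
    out = []
    run = []  # pending marks with a non-zero combining class
    for ch in text:
        if unicodedata.combining(ch) != 0:
            run.append(ch)
        else:
            # A class-0 character ends the run; flush it in sorted order.
            # sorted() is stable, so marks of equal class keep their order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


if __name__ == "__main__":
    # Beth, Pthaha Above (class 230), Pthaha Below (class 220)
    sample = "\u0712\u0730\u0731"
    print([f"U+{ord(c):04X}" for c in reorder_marks(sample)])
```

In this example the class-220 mark moves ahead of the class-230 mark, while the base letter (class 0) stays in place.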

Table 68 Syriac character table

| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph and name |
|---|---|---|---|---|---|
| U+0700 | Punctuation | NON_JOINING | null | 0 | ܀ End of Paragraph |
| U+0701 | Punctuation | NON_JOINING | null | 0 | ܁ Supralinear Full Stop |
| U+0702 | Punctuation | NON_JOINING | null | 0 | ܂ Sublinear Full Stop |
| U+0703 | Punctuation | NON_JOINING | null | 0 | ܃ Supralinear Colon |
| U+0704 | Punctuation | NON_JOINING | null | 0 | ܄ Sublinear Colon |
| U+0705 | Punctuation | NON_JOINING | null | 0 | ܅ Horizontal Colon |
| U+0706 | Punctuation | NON_JOINING | null | 0 | ܆ Colon Skewed Left |
| U+0707 | Punctuation | NON_JOINING | null | 0 | ܇ Colon Skewed Right |
| U+0708 | Punctuation | NON_JOINING | null | 0 | ܈ Supralinear Colon Skewed Left |
| U+0709 | Punctuation | NON_JOINING | null | 0 | ܉ Sublinear Colon Skewed Right |
| U+070A | Punctuation | NON_JOINING | null | 0 | ܊ Contraction |
| U+070B | Punctuation | NON_JOINING | null | 0 | ܋ Harklean Obelus |
| U+070C | Punctuation | NON_JOINING | null | 0 | ܌ Harklean Metobelus |
| U+070D | Punctuation | NON_JOINING | null | 0 | ܍ Harklean Asteriscus |
| U+070E | unassigned | | | | |
| U+070F | Other | TRANSPARENT | null | 0 | ܏ Syriac Abbreviation Mark |
| U+0710 | Letter | RIGHT | ALAPH | 0 | ܐ Alaph |
| U+0711 | Mark [Mn] | TRANSPARENT | null | 36 | ܑ Superscript Alaph |
| U+0712 | Letter | DUAL | BETH | 0 | ܒ Beth |
| U+0713 | Letter | DUAL | GAMAL | 0 | ܓ Gamal |
| U+0714 | Letter | DUAL | GAMAL | 0 | ܔ Gamal Garshuni |
| U+0715 | Letter | RIGHT | DALATH_RISH | 0 | ܕ Dalath |
| U+0716 | Letter | RIGHT | DALATH_RISH | 0 | ܖ Dotless Dalath Rish |
| U+0717 | Letter | RIGHT | HE | 0 | ܗ He |
| U+0718 | Letter | RIGHT | SYRIAC_WAW | 0 | ܘ Waw |
| U+0719 | Letter | RIGHT | ZAIN | 0 | ܙ Zain |
| U+071A | Letter | DUAL | HETH | 0 | ܚ Heth |
| U+071B | Letter | DUAL | TETH | 0 | ܛ Teth |
| U+071C | Letter | DUAL | TETH | 0 | ܜ Teth Garshuni |
| U+071D | Letter | DUAL | YUDH | 0 | ܝ Yudh |
| U+071E | Letter | RIGHT | YUDH_HE | 0 | ܞ Yudh He |
| U+071F | Letter | DUAL | KAPH | 0 | ܟ Kaph |
| U+0720 | Letter | DUAL | LAMADH | 0 | ܠ Lamadh |
| U+0721 | Letter | DUAL | MIM | 0 | ܡ Mim |
| U+0722 | Letter | DUAL | NUN | 0 | ܢ Nun |
| U+0723 | Letter | DUAL | SEMKATH | 0 | ܣ Semkath |
| U+0724 | Letter | DUAL | FINAL_SEMKATH | 0 | ܤ Final Semkath |
| U+0725 | Letter | DUAL | E | 0 | ܥ E |
| U+0726 | Letter | DUAL | PE | 0 | ܦ Pe |
| U+0727 | Letter | DUAL | REVERSED_PE | 0 | ܧ Reversed Pe |
| U+0728 | Letter | RIGHT | SADHE | 0 | ܨ Sadhe |
| U+0729 | Letter | DUAL | QAPH | 0 | ܩ Qaph |
| U+072A | Letter | RIGHT | DALATH_RISH | 0 | ܪ Rish |
| U+072B | Letter | DUAL | SHIN | 0 | ܫ Shin |
| U+072C | Letter | RIGHT | TAW | 0 | ܬ Taw |
| U+072D | Letter | DUAL | BETH | 0 | ܭ Persian Bheth |
| U+072E | Letter | DUAL | GAMAL | 0 | ܮ Persian Ghamal |
| U+072F | Letter | RIGHT | DALATH_RISH | 0 | ܯ Persian Dhalath |
| U+0730 | Mark [Mn] | TRANSPARENT | null | 230 | ܰ Pthaha Above |
| U+0731 | Mark [Mn] | TRANSPARENT | null | 220 | ܱ Pthaha Below |
| U+0732 | Mark [Mn] | TRANSPARENT | null | 230 | ܲ Pthaha Dotted |
| U+0733 | Mark [Mn] | TRANSPARENT | null | 230 | ܳ Zqapha Above |
| U+0734 | Mark [Mn] | TRANSPARENT | null | 220 | ܴ Zqapha Below |
| U+0735 | Mark [Mn] | TRANSPARENT | null | 230 | ܵ Zqapha Dotted |
| U+0736 | Mark [Mn] | TRANSPARENT | null | 230 | ܶ Rbasa Above |
| U+0737 | Mark [Mn] | TRANSPARENT | null | 220 | ܷ Rbasa Below |
| U+0738 | Mark [Mn] | TRANSPARENT | null | 220 | ܸ Dotted Zlama Horizontal |
| U+0739 | Mark [Mn] | TRANSPARENT | null | 220 | ܹ Dotted Zlama Angular |
| U+073A | Mark [Mn] | TRANSPARENT | null | 230 | ܺ Hbasa Above |
| U+073B | Mark [Mn] | TRANSPARENT | null | 220 | ܻ Hbasa Below |
| U+073C | Mark [Mn] | TRANSPARENT | null | 220 | ܼ Hbasa-Esasa Dotted |
| U+073D | Mark [Mn] | TRANSPARENT | null | 230 | ܽ Esasa Above |
| U+073E | Mark [Mn] | TRANSPARENT | null | 220 | ܾ Esasa Below |
| U+073F | Mark [Mn] | TRANSPARENT | null | 230 | ܿ Rwaha |
| U+0740 | Mark [Mn] | TRANSPARENT | null | 230 | ݀ Feminine Dot |
| U+0741 | Mark [Mn] | TRANSPARENT | null | 230 | ݁ Qushshaya |
| U+0742 | Mark [Mn] | TRANSPARENT | null | 220 | ݂ Rukkakha |
| U+0743 | Mark [Mn] | TRANSPARENT | null | 230 | ݃ Two Vertical Dots Above |
| U+0744 | Mark [Mn] | TRANSPARENT | null | 220 | ݄ Two Vertical Dots Below |
| U+0745 | Mark [Mn] | TRANSPARENT | null | 230 | ݅ Three Dots Above |
| U+0746 | Mark [Mn] | TRANSPARENT | null | 220 | ݆ Three Dots Below |
| U+0747 | Mark [Mn] | TRANSPARENT | null | 230 | ݇ Oblique Line Above |
| U+0748 | Mark [Mn] | TRANSPARENT | null | 220 | ݈ Oblique Line Below |
| U+0749 | Mark [Mn] | TRANSPARENT | null | 230 | ݉ Music |
| U+074A | Mark [Mn] | TRANSPARENT | null | 230 | ݊ Barrekh |
| U+074B | unassigned | | | | |
| U+074C | unassigned | | | | |
| U+074D | Letter | RIGHT | ZHAIN | 0 | ݍ Sogdian Zhain |
| U+074E | Letter | DUAL | KHAPH | 0 | ݎ Sogdian Khaph |
| U+074F | Letter | DUAL | FE | 0 | ݏ Sogdian Fe |

Syriac Supplement character table

The Syriac Supplement block includes letters needed to write Suriyani Malayalam, also known as Garshuni or Syriac Malayalam.

Table 69 Syriac Supplement character table

| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph and name |
|---|---|---|---|---|---|
| U+0860 | Letter | DUAL | MALAYALAM_NGA | 0 | ࡠ Malayalam Nga |
| U+0861 | Letter | NON_JOINING | MALAYALAM_JA | 0 | ࡡ Malayalam Ja |
| U+0862 | Letter | DUAL | MALAYALAM_NYA | 0 | ࡢ Malayalam Nya |
| U+0863 | Letter | DUAL | MALAYALAM_TTA | 0 | ࡣ Malayalam Tta |
| U+0864 | Letter | DUAL | MALAYALAM_NNA | 0 | ࡤ Malayalam Nna |
| U+0865 | Letter | DUAL | MALAYALAM_NNNA | 0 | ࡥ Malayalam Nnna |
| U+0866 | Letter | NON_JOINING | MALAYALAM_BHA | 0 | ࡦ Malayalam Bha |
| U+0867 | Letter | RIGHT | MALAYALAM_RA | 0 | ࡧ Malayalam Ra |
| U+0868 | Letter | DUAL | MALAYALAM_LLA | 0 | ࡨ Malayalam Lla |
| U+0869 | Letter | RIGHT | MALAYALAM_LLLA | 0 | ࡩ Malayalam Llla |
| U+086A | Letter | RIGHT | MALAYALAM_SSA | 0 | ࡪ Malayalam Ssa |
| U+086B | unassigned | | | | |
| U+086C | unassigned | | | | |
| U+086D | unassigned | | | | |
| U+086E | unassigned | | | | |
| U+086F | unassigned | | | | |

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Syriac text include the dotted-circle placeholder (U+25CC), the combining grapheme joiner (U+034F), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), the left-to-right text marker (U+200E) and right-to-left text marker (U+200F), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a combining mark in isolation. Real-world text may also use other characters, such as hyphens or dashes, as placeholders in the same way; shaping engines should handle such sequences gracefully.

In addition, Syriac text runs may include the “Tatweel” or kashida codepoint (U+0640) from the Arabic block, because the Syriac block does not encode a separate kashida character.
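A shaping engine needs some way to recognize the codepoints discussed in this document when it processes a Syriac run. The sketch below is one hypothetical way to express that check; the names `SYRIAC_BLOCKS`, `MISC_CODEPOINTS`, and `may_appear_in_syriac_run` are assumptions, and the set of hyphens and dashes (U+2010 to U+2014) follows the rows of the Miscellaneous character table.

```python
# The Syriac block and the Syriac Supplement block.
SYRIAC_BLOCKS = (range(0x0700, 0x0750), range(0x0860, 0x0870))

# Codepoints from other blocks that this document lists as likely
# to occur inside runs of Syriac text.
MISC_CODEPOINTS = frozenset(
    {
        0x00A0,  # no-break space
        0x034F,  # combining grapheme joiner
        0x0640,  # Arabic Tatweel (kashida)
        0x200C,  # zero-width non-joiner
        0x200D,  # zero-width joiner
        0x200E,  # left-to-right mark
        0x200F,  # right-to-left mark
        0x25CC,  # dotted circle
    }
    | set(range(0x2010, 0x2015))  # hyphen through em dash
)


def may_appear_in_syriac_run(codepoint: int) -> bool:
    """True if the codepoint is covered by one of this document's tables."""
    if any(codepoint in block for block in SYRIAC_BLOCKS):
        return True
    return codepoint in MISC_CODEPOINTS


if __name__ == "__main__":
    for cp in (0x0710, 0x0860, 0x0640, 0x2014, 0x0041):
        print(f"U+{cp:04X}", may_appear_in_syriac_run(cp))
```

The check says nothing about whether a codepoint is assigned; unassigned positions inside the two blocks still return True and would need the category lookup to be filtered out.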

Table 70 Miscellaneous character table

| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph and name |
|---|---|---|---|---|---|
| U+00A0 | Separator | NON_JOINING | null | 0 |   No-break space |
| U+034F | Other | NON_JOINING | null | 0 | ͏ Combining grapheme joiner |
| U+0640 | Letter modifier | JOIN_CAUSING | null | 0 | ـ Arabic Tatweel |
| U+200C | Other | NON_JOINING | null | 0 | ‌ Zero-width non-joiner |
| U+200D | Other | JOIN_CAUSING | null | 0 | ‍ Zero-width joiner |
| U+200E | Other | NON_JOINING | null | 0 | ‎ Left-to-Right marker |
| U+200F | Other | NON_JOINING | null | 0 | ‏ Right-to-Left marker |
| U+2010 | Punctuation | NON_JOINING | null | 0 | ‐ Hyphen |
| U+2011 | Punctuation | NON_JOINING | null | 0 | ‑ No-break hyphen |
| U+2012 | Punctuation | NON_JOINING | null | 0 | ‒ Figure dash |
| U+2013 | Punctuation | NON_JOINING | null | 0 | – En dash |
| U+2014 | Punctuation | NON_JOINING | null | 0 | — Em dash |
| U+25CC | Symbol | NON_JOINING | null | 0 | ◌ Dotted circle |

The combining grapheme joiner (CGJ) is primarily used to alter the order in which adjacent marks are positioned during the mark-reordering stage, in order to adhere to the needs of a non-default language orthography.
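The effect of CGJ on mark order can be observed with Unicode canonical ordering, used here as a stand-in for a shaping engine's own mark-reordering stage. This is a minimal sketch using `unicodedata.normalize` from the Python standard library; the helper name `canonical_order` and the sample strings are assumptions made for illustration.

```python
import unicodedata

CGJ = "\u034F"           # combining grapheme joiner, combining class 0
BETH = "\u0712"          # base letter
PTHAHA_ABOVE = "\u0730"  # combining class 230
PTHAHA_BELOW = "\u0731"  # combining class 220

plain = BETH + PTHAHA_ABOVE + PTHAHA_BELOW
with_cgj = BETH + PTHAHA_ABOVE + CGJ + PTHAHA_BELOW


def canonical_order(text: str) -> str:
    """Apply canonical ordering (NFD) to the text."""
    return unicodedata.normalize("NFD", text)


def codepoints(text: str) -> list:
    """Format a string as a list of U+XXXX labels for printing."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # Without CGJ the class-220 mark is moved before the class-230 mark.
    print(codepoints(canonical_order(plain)))
    # With CGJ between them, the two marks keep their typed order.
    print(codepoints(canonical_order(with_cgj)))
```

Because CGJ has combining class 0, it splits the two marks into separate runs, so neither can be moved past it.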

The zero-width joiner (ZWJ) is primarily used to force the usage of the cursive connecting form of a letter even when the context of the adjoining letters would not trigger the connecting form.

For example, to show the initial form of a letter in isolation (such as for displaying it in a table of forms), the sequence “Letter,ZWJ” would be used. To show the medial form of a letter in isolation, the sequence “ZWJ,Letter,ZWJ” would be used.
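The sequences above can be generated with a small helper. This is a sketch only: the function name `form_sample` and the form labels are hypothetical, and the "fina" case (ZWJ before the letter) is an extrapolation by symmetry from the two sequences given in the text.

```python
ZWJ = "\u200D"  # zero-width joiner


def form_sample(letter: str, form: str) -> str:
    """Build a string that displays one contextual form of a letter in isolation."""
    if form == "isol":
        return letter
    if form == "init":
        return letter + ZWJ        # "Letter,ZWJ"
    if form == "medi":
        return ZWJ + letter + ZWJ  # "ZWJ,Letter,ZWJ"
    if form == "fina":
        return ZWJ + letter        # assumed by symmetry with "init"
    raise ValueError(f"unknown form: {form!r}")


if __name__ == "__main__":
    beth = "\u0712"
    for form in ("isol", "init", "medi", "fina"):
        sample = form_sample(beth, form)
        print(form, [f"U+{ord(ch):04X}" for ch in sample])
```

Whether a final or medial form is actually rendered still depends on the joining type of the letter; a RIGHT-joining letter such as Alaph has no form that connects to a following character.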

The right-to-left mark (RLM) and left-to-right mark (LRM) are used by the Unicode bidirectionality algorithm (BiDi) to indicate the points in a text run at which the writing direction changes.
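The bidirectional classes that the BiDi algorithm works from can be inspected with Python's `unicodedata.bidirectional`. The sketch below is illustrative only; the helper name `bidi_classes` is hypothetical, and it reports character classes rather than running the BiDi algorithm itself.

```python
import unicodedata

LRM = "\u200E"  # left-to-right mark
RLM = "\u200F"  # right-to-left mark


def bidi_classes(text: str) -> list:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    # LRM, RLM, Alaph (a letter), Pthaha Above (a non-spacing mark)
    sample = LRM + RLM + "\u0710" + "\u0730"
    for ch, cls in zip(sample, bidi_classes(sample)):
        print(f"U+{ord(ch):04X}", cls)
```

LRM and RLM are invisible characters that carry a strong direction (classes L and R), which is how they influence the resolved direction of the neighboring text.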

The no-break space is primarily used to display those codepoints that are defined as non-spacing (such as vowel marks or other diacritical marks) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder.
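The two ways of showing a mark in isolation can be sketched as follows. The helper name `isolated_mark` is hypothetical, and the check that the input is a mark relies on the General_Category reported by Python's `unicodedata` module.

```python
import unicodedata

NBSP = "\u00A0"           # no-break space
DOTTED_CIRCLE = "\u25CC"  # dotted-circle placeholder


def isolated_mark(mark: str, base: str = NBSP) -> str:
    """Attach a combining mark to a placeholder base for isolated display."""
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return base + mark


if __name__ == "__main__":
    pthaha_above = "\u0730"
    for base in (NBSP, DOTTED_CIRCLE):
        sample = isolated_mark(pthaha_above, base)
        print([f"U+{ord(ch):04X}" for ch in sample])
```

With the no-break space as the base, only the mark is visible; with the dotted circle, the mark is drawn on a visible placeholder.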