Character Substitutions

The replacements substitution processes textual symbols such as marks, arrows, and dashes and replaces each one with the decimal form of its Unicode code point, i.e., its decimal numeric character reference.

Textual symbol replacements
Name                Syntax  Unicode replacement  Rendered
Copyright           (C)     &#169;               ©
Registered          (R)     &#174;               ®
Trademark           (TM)    &#8482;              ™
Em dash             --      &#8212;              —
Ellipsis            ...     &#8230;              …
Right single arrow  ->      &#8594;              →
Right double arrow  =>      &#8658;              ⇒
Left single arrow   <-      &#8592;              ←
Left double arrow   <=      &#8656;              ⇐
Apostrophe          Sam's   Sam&#8217;s          Sam’s

Notes

  • Em dash: Only replaced if between two word characters, between a word character and a line boundary, or flanked by spaces. When flanked by space characters (e.g., a -- b), the normal spaces are replaced by thin spaces (&#8201;).

  • Apostrophe: The vertical form apostrophe is replaced with the curved form apostrophe.
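As a rough illustration of the symbol replacements listed above, the following Python sketch applies a simplified subset of the rules. It is not Asciidoctor's implementation: the names `SIMPLE_REPLACEMENTS` and `apply_replacements` are hypothetical, the patterns run on raw text and skip any earlier substitution step, and the em dash rule for line boundaries is left out.

```python
import re

# Hypothetical, simplified subset of the symbol replacements.
# Each entry pairs a pattern with the decimal numeric character reference to emit.
SIMPLE_REPLACEMENTS = [
    (re.compile(r"\(C\)"), "&#169;"),    # copyright
    (re.compile(r"\(R\)"), "&#174;"),    # registered
    (re.compile(r"\(TM\)"), "&#8482;"),  # trademark
    (re.compile(r"\.\.\."), "&#8230;"),  # ellipsis
    (re.compile(r"->"), "&#8594;"),      # right single arrow
    (re.compile(r"=>"), "&#8658;"),      # right double arrow
    (re.compile(r"<-"), "&#8592;"),      # left single arrow
    (re.compile(r"<="), "&#8656;"),      # left double arrow
    # A vertical apostrophe between two word characters becomes the curved form.
    (re.compile(r"(?<=\w)'(?=\w)"), "&#8217;"),
    # Em dash flanked by spaces: the spaces become thin spaces (&#8201;).
    (re.compile(r" -- "), "&#8201;&#8212;&#8201;"),
    # Em dash directly between two word characters.
    (re.compile(r"(?<=\w)--(?=\w)"), "&#8212;"),
]


def apply_replacements(text: str) -> str:
    """Apply each simplified rule in order and return the converted text."""
    for pattern, reference in SIMPLE_REPLACEMENTS:
        text = pattern.sub(reference, text)
    return text


if __name__ == "__main__":
    print(apply_replacements("(C) 2024 Sam's guide -> done..."))
    print(apply_replacements("a -- b and word--word"))
```

Keeping the rules in an ordered list makes the pass easy to extend, but the order matters once patterns overlap, so a real implementation would need more careful matching than this sketch.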

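The replacements substitution depends on the substitutions completed by the specialcharacters substitution, which runs first. For instance, the arrow syntaxes contain < and >, which specialcharacters escapes before replacements runs. Keep this order in mind when applying custom substitutions to a block. See applying substitutions for more information.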
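The ordering dependency can be sketched in Python. This is only an illustration under stated assumptions: `special_characters` is a hypothetical stand-in that escapes &, <, and > with the standard library's `html.escape`, and `replace_arrows` is a hypothetical rule set that assumes the arrow patterns are written against the escaped forms (`&gt;`, `&lt;`).

```python
import html
import re


def special_characters(text: str) -> str:
    """Stand-in for the earlier step: escape &, <, and > (quotes left alone)."""
    return html.escape(text, quote=False)


# Assumption for illustration: arrow rules match the already-escaped text.
ARROWS_ESCAPED = [
    (re.compile(r"-&gt;"), "&#8594;"),  # ->
    (re.compile(r"=&gt;"), "&#8658;"),  # =>
    (re.compile(r"&lt;-"), "&#8592;"),  # <-
    (re.compile(r"&lt;="), "&#8656;"),  # <=
]


def replace_arrows(text: str) -> str:
    """Replace escaped arrow sequences with decimal numeric character references."""
    for pattern, reference in ARROWS_ESCAPED:
        text = pattern.sub(reference, text)
    return text


if __name__ == "__main__":
    source = "a -> b <= c"
    # With the earlier step applied first, the arrows are found.
    print(replace_arrows(special_characters(source)))
    # Without it, the patterns never match and the text passes through unchanged.
    print(replace_arrows(source))
```

In this sketch, a block that applies the arrow rules without the escaping step leaves the arrows untouched, which is the kind of surprise the ordering note warns about.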

The replacements substitution also recognizes HTML and XML character entity references, as well as decimal and hexadecimal numeric character references, and replaces them with the corresponding decimal numeric character reference.

For example, to produce the § symbol you could write &sect;, &#x00A7;, or &#167;. When the document is processed, replacements will replace the section symbol reference, regardless of whether it is a character entity reference or a numeric character reference, with &#167;. In turn, &#167; will display as §.
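The normalization described above can be illustrated with a short Python sketch. It is not Asciidoctor's code: `CHAR_REF`, `to_decimal_reference`, and `normalize_references` are hypothetical names, and the sketch relies on the standard library's `html.unescape` to resolve each reference to a character before re-encoding it in decimal form.

```python
import html
import re

# Matches a named, decimal, or hexadecimal character reference.
CHAR_REF = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def to_decimal_reference(match: re.Match) -> str:
    """Resolve one reference to its character, then emit the decimal form."""
    char = html.unescape(match.group(0))
    if len(char) != 1:
        # Unknown name or a multi-character result: leave the text untouched.
        return match.group(0)
    return f"&#{ord(char)};"


def normalize_references(text: str) -> str:
    """Rewrite every recognized character reference in decimal form."""
    return CHAR_REF.sub(to_decimal_reference, text)


if __name__ == "__main__":
    print(normalize_references("&sect; &#x00A7; &#167;"))
```

All three spellings of the section symbol reduce to the same decimal reference, which is why the choice of input form does not affect the rendered output.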

Anatomy of a character entity reference and a numeric character reference

A character reference is a standard sequence of characters that is substituted for a single character when Asciidoctor processes a document. There are two types of character references: character entity references and numeric character references.

A character entity reference is the name of an entity which refers to a character. The name must be prefixed with an ampersand (&) and end with a semicolon (;).

For example:

  • &dagger; displays as †

  • &euro; displays as €

  • &loz; displays as ◊

Numeric character references are the decimal or hexadecimal Universal Character Set/Unicode code points which refer to a character.

  • Decimal code point references are prefixed with an ampersand (&), followed by a hash (#), and end with a semicolon (;).

  • Hexadecimal code point references are prefixed with an ampersand (&), followed by a hash (#), followed by a lowercase x, and end with a semicolon (;).

For example:

  • &#x2020; or &#8224; displays as †

  • &#x20AC; or &#8364; displays as €

  • &#x25CA; or &#9674; displays as ◊
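The three forms described above can be told apart mechanically. The following Python sketch is illustrative only: `REFERENCE` and `describe_reference` are hypothetical names, and entity names are looked up in the standard library's `html.entities.name2codepoint` table, which covers the HTML 4 names used in the examples.

```python
import re
from html.entities import name2codepoint

# One alternative per form: entity name, decimal digits, or lowercase x plus hex digits.
REFERENCE = re.compile(
    r"&(?:(?P<name>[A-Za-z][A-Za-z0-9]*)|#(?P<dec>[0-9]+)|#x(?P<hex>[0-9A-Fa-f]+));"
)


def describe_reference(reference: str) -> tuple[str, int]:
    """Return the kind of reference and the code point it refers to."""
    match = REFERENCE.fullmatch(reference)
    if match is None:
        raise ValueError(f"not a character reference: {reference!r}")
    if match.group("dec") is not None:
        return "decimal", int(match.group("dec"))
    if match.group("hex") is not None:
        return "hexadecimal", int(match.group("hex"), 16)
    code_point = name2codepoint.get(match.group("name"))
    if code_point is None:
        raise ValueError(f"unknown entity name: {reference!r}")
    return "entity", code_point


if __name__ == "__main__":
    for ref in ("&dagger;", "&#x2020;", "&#8224;"):
        kind, code_point = describe_reference(ref)
        print(ref, kind, code_point, chr(code_point))
```

Each of the three dagger references resolves to the same code point, so they display identically.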

Developers may be more familiar with using Unicode escape sequences to perform text substitutions. For example, to produce an @ sign using a Unicode escape sequence, you would prefix the hexadecimal Unicode code point with a backslash (\) and an uppercase or lowercase u, i.e., \u0040. However, Asciidoctor doesn’t process and replace Unicode escape sequences at this time.
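For text that already contains such escape sequences, one possible workaround is to rewrite them as numeric character references before the text reaches Asciidoctor. The Python sketch below is an assumption-laden illustration, not an Asciidoctor feature: `UNICODE_ESCAPE` and `escapes_to_references` are hypothetical names, and only the four-digit lowercase `\u` form is handled.

```python
import re

# A backslash, a lowercase u, and exactly four hexadecimal digits.
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def escapes_to_references(text: str) -> str:
    """Rewrite each \\uXXXX escape as a decimal numeric character reference."""
    return UNICODE_ESCAPE.sub(lambda m: f"&#{int(m.group(1), 16)};", text)


if __name__ == "__main__":
    # The raw string keeps the backslash literal instead of letting Python decode it.
    print(escapes_to_references(r"user\u0040example.com"))
```

The hexadecimal digits of the escape and the decimal reference name the same code point; only the notation differs.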

Asciidoctor also provides numerous built-in attributes for representing characters and symbols. These attributes and their corresponding output are listed in the Character Replacement Attributes Reference Table.

The replacements substitution occurs within title, paragraph, example, quote, sidebar, and verse blocks.

Elements subject to replacements text substitution
Element                Replacements substitution
Attribute entry value  No
Comment                No
Example                Yes
Fenced                 No
Header                 No
Literal                No
Listing                No
Macro                  Yes
Open                   Varies
Paragraph              Yes
Passthrough            No
Quote                  Yes
Sidebar                Yes
Source                 No
Special sections       Yes
Table                  Varies
Title                  Yes
Verse                  Yes