Difference between revisions of "Markup Notes"

From PeformIQ Wiki
(Created page with "= Forms of Documentation Markup = In the past, many solutions have evolved to facilitate the easy creation of documents and documentation. These range from lightweight human-readable formats like Markdown, to highly capable typesetting systems like TeX/LaTeX, to structured data formats like XML. The choice of markup language typically reflects a trade-off between ease of authoring, rendering fidelity, and the intended output medium. ---++ Historical Background The co...")
 
 
Line 3: Line 3:
In the past, many solutions have evolved to facilitate the easy creation of documents and documentation. These range from lightweight human-readable formats like Markdown, to highly capable typesetting systems like TeX/LaTeX, to structured data formats like XML. The choice of markup language typically reflects a trade-off between ease of authoring, rendering fidelity, and the intended output medium.
In the past, many solutions have evolved to facilitate the easy creation of documents and documentation. These range from lightweight human-readable formats like Markdown, to highly capable typesetting systems like TeX/LaTeX, to structured data formats like XML. The choice of markup language typically reflects a trade-off between ease of authoring, rendering fidelity, and the intended output medium.


---++ Historical Background
----
 
== Historical Background ==


The concept of "markup" predates computers — editors and typesetters would literally mark up manuscripts with instructions for printers. With the arrival of electronic publishing in the 1960s and 70s, these conventions were formalised into machine-readable languages. Early systems were tightly coupled to specific hardware (typesetters, printers, terminals), and over time higher-level, more portable formats emerged.
The concept of "markup" predates computers — editors and typesetters would literally mark up manuscripts with instructions for printers. With the arrival of electronic publishing in the 1960s and 70s, these conventions were formalised into machine-readable languages. Early systems were tightly coupled to specific hardware (typesetters, printers, terminals), and over time higher-level, more portable formats emerged.


---++ Markup Languages and Formats
== Markup Languages and Formats ==


---+++ Runoff / Roff Family (1964 onwards)
=== Runoff / Roff Family (1964 onwards) ===
The roff family traces its roots to the =RUNOFF= program written by Jerome Saltzer at MIT in 1964. Its descendants roff, nroff and troff were the dominant document formatting tools on early UNIX systems.
  * *nroff* — formats text for fixed-width terminal output
  * *troff* — the typesetting variant, targeting phototypesetting devices
  * *groff* (GNU roff) — the modern open source successor, still widely used today for Unix man pages


Roff documents use dot-prefixed commands (e.g. =.PP= for paragraph, =.B= for bold) and remain the standard format for Linux/Unix manual pages.
The roff family traces its roots to the <code>RUNOFF</code> program written by Jerome Saltzer at MIT in 1964. Its descendants roff, nroff and troff were the dominant document formatting tools on early UNIX systems.
 
* '''nroff''' — formats text for fixed-width terminal output
* '''troff''' — the typesetting variant, targeting phototypesetting devices
* '''groff''' (GNU roff) — the modern open source successor, still widely used today for Unix man pages
 
Roff documents use dot-prefixed commands (e.g. <code>.PP</code> for paragraph, <code>.B</code> for bold) and remain the standard format for Linux/Unix manual pages.
 
=== SGML — Standard Generalised Markup Language (1986) ===


---+++ SGML — Standard Generalised Markup Language (1986)
SGML is a meta-language for defining markup languages, standardised as ISO 8879:1986. It introduced the concept of separating document structure from presentation, and was the parent of both HTML and XML. Its complexity limited widespread adoption outside publishing and aerospace/defence industries, but its influence is enormous.
SGML is a meta-language for defining markup languages, standardised as ISO 8879:1986. It introduced the concept of separating document structure from presentation, and was the parent of both HTML and XML. Its complexity limited widespread adoption outside publishing and aerospace/defence industries, but its influence is enormous.


---+++ TeX and LaTeX (1978 / 1984)
=== TeX and LaTeX (1978 / 1984) ===
Created by Donald Knuth, *TeX* is a typesetting system renowned for its precise, beautiful rendering of mathematical and scientific notation. *LaTeX*, built on top of TeX by Leslie Lamport, added higher-level macros that made document authoring far more approachable. LaTeX remains the dominant format for academic papers, theses, and scientific publishing worldwide. Documents use commands such as =\section{}=, =\textbf{}=, and the =equation= environment.


---+++ RTF — Rich Text Format (1987)
Created by Donald Knuth, '''TeX''' is a typesetting system renowned for its precise, beautiful rendering of mathematical and scientific notation. '''LaTeX''', built on top of TeX by Leslie Lamport, added higher-level macros that made document authoring far more approachable. LaTeX remains the dominant format for academic papers, theses, and scientific publishing worldwide. Documents use commands such as <code>\section{}</code>, <code>\textbf{}</code>, and the <code>equation</code> environment.
Developed by Microsoft for cross-application and cross-platform document exchange. RTF encodes formatting as plain-text control words (e.g. =\b= for bold), making it human-readable in theory but verbose in practice. It was the default format for WordPad, was supported by Microsoft Word as an import and export format, and served as a common interchange format before the dominance of OOXML and PDF.


---+++ HTML HyperText Markup Language (1991)
=== RTF Rich Text Format (1987) ===
Proposed by Tim Berners-Lee at CERN, HTML is an SGML-derived language that became the lingua franca of the World Wide Web. It separates document structure (tags like =<h1>=, =<p>=, =<a>=) from presentation (originally handled inline, later delegated to CSS). HTML5 (2014) significantly expanded its capabilities to include multimedia, semantic elements, and application-level features.


---+++ DocBook (1991)
Developed by Microsoft for cross-application and cross-platform document exchange. RTF encodes formatting as plain-text control words (e.g. <code>\b</code> for bold), making it human-readable in theory but verbose in practice. It was the default format for WordPad, was supported by Microsoft Word as an import and export format, and served as a common interchange format before the dominance of OOXML and PDF.
An XML/SGML-based markup language originally developed by HaL Computer Systems and O'Reilly. DocBook is specifically designed for technical documentation and supports rich semantic tagging (=<procedure>=, =<varlistentry>=, =<programlisting>=, etc.). It can be transformed to HTML, PDF, man pages and other formats via XSLT stylesheets. Widely used in open source projects and technical publishing.
 
=== HTML — HyperText Markup Language (1991) ===
 
Proposed by Tim Berners-Lee at CERN, HTML is an SGML-derived language that became the lingua franca of the World Wide Web. It separates document structure (tags like <code>&lt;h1&gt;</code>, <code>&lt;p&gt;</code>, <code>&lt;a&gt;</code>) from presentation (originally handled inline, later delegated to CSS). HTML5 (2014) significantly expanded its capabilities to include multimedia, semantic elements, and application-level features.
 
=== DocBook (1991) ===
 
An XML/SGML-based markup language originally developed by HaL Computer Systems and O'Reilly. DocBook is specifically designed for technical documentation and supports rich semantic tagging (<code>&lt;procedure&gt;</code>, <code>&lt;varlistentry&gt;</code>, <code>&lt;programlisting&gt;</code>, etc.). It can be transformed to HTML, PDF, man pages and other formats via XSLT stylesheets. Widely used in open source projects and technical publishing.
 
=== XML — eXtensible Markup Language (1998) ===


---+++ XML — eXtensible Markup Language (1998)
A simplified subset of SGML designed for general-purpose structured data. While not a document format per se, XML underpins many documentation formats (DocBook, OOXML, DITA). Its strict well-formedness rules and wide tooling support made it a foundational technology for data interchange in the 2000s, though JSON has displaced it for many API use cases.
A simplified subset of SGML designed for general-purpose structured data. While not a document format per se, XML underpins many documentation formats (DocBook, OOXML, DITA). Its strict well-formedness rules and wide tooling support made it a foundational technology for data interchange in the 2000s, though JSON has displaced it for many API use cases.


---+++ reStructuredText / RST (2002)
=== reStructuredText / RST (2002) ===
Developed by David Goodger as part of the Python =docutils= project. RST is a lightweight markup language designed to be readable as plain text while being processable into HTML, LaTeX, PDF, and man pages. It is the native format of Python's official documentation and the Sphinx documentation generator, making it extremely prevalent in the Python ecosystem.


<verbatim>
Developed by David Goodger as part of the Python <code>docutils</code> project. RST is a lightweight markup language designed to be readable as plain text while being processable into HTML, LaTeX, PDF, and man pages. It is the native format of Python's official documentation and the Sphinx documentation generator, making it extremely prevalent in the Python ecosystem.
 
<pre>
Example RST:
Example RST:
Section Title
Section Title
=============
=============
Some **bold** text and a `hyperlink <http://example.com>`_.
Some **bold** text and a `hyperlink <http://example.com>`_.
</verbatim>
</pre>
 
=== AsciiDoc (2002) and Asciidoctor (2013) ===
 
AsciiDoc fills a niche between Markdown's simplicity and DocBook's power. It was designed for writing books and technical documentation, supporting features like cross-references, includes, callouts, and multiple output formats natively. '''Asciidoctor''', a Ruby reimplementation, significantly revitalised the ecosystem. It is used by the Git project, Spring Framework, and many others for their documentation.
 
=== Markdown (2004) ===


---+++ Markdown (2004)
Created by John Gruber (with input from Aaron Swartz), Markdown was explicitly designed to be readable as-is in plain-text form, with its syntax inspired by email conventions. It converts to HTML and is now the de facto standard for README files, static site generators, wikis, and developer documentation. Its simplicity came at the cost of standardisation — numerous dialects emerged (GitHub Flavored Markdown, CommonMark, MultiMarkdown) with incompatible extensions.
Created by John Gruber (with input from Aaron Swartz), Markdown was explicitly designed to be readable as-is in plain-text form, with its syntax inspired by email conventions. It converts to HTML and is now the de facto standard for README files, static site generators, wikis, and developer documentation. Its simplicity came at the cost of standardisation — numerous dialects emerged (GitHub Flavored Markdown, CommonMark, MultiMarkdown) with incompatible extensions.


<verbatim>
<pre>
Example Markdown:
Example Markdown:
## Section Title
## Section Title
Some **bold** text and a [hyperlink](http://example.com).
Some **bold** text and a [hyperlink](http://example.com).
</verbatim>
</pre>


---+++ AsciiDoc (2002) and Asciidoctor (2013)
=== Wiki Markup ===
AsciiDoc fills a niche between Markdown's simplicity and DocBook's power. It was designed for writing books and technical documentation, supporting features like cross-references, includes, callouts, and multiple output formats natively. *Asciidoctor*, a Ruby reimplementation, significantly revitalised the ecosystem. It is used by the Git project, Spring Framework, and many others for their documentation.


---+++ Wiki Markup
Most wiki platforms developed their own lightweight markup languages before Markdown became dominant. Examples include:
Most wiki platforms developed their own lightweight markup languages before Markdown became dominant. Examples include:
  * *MediaWiki markup* — used by Wikipedia; uses =''italic''=, ='''bold'''=, ===headings===
  * *TWiki / Foswiki markup* — uses ==---+= headings, =*bold*=, =_italic_=, and TML (Topic Markup Language) for structured data
  * *Confluence wiki markup* — Atlassian's proprietary variant


---+++ Org-mode (2003)
* '''MediaWiki markup''' — used by Wikipedia; uses <code><nowiki>''italic''</nowiki></code>, <code><nowiki>'''bold'''</nowiki></code>, <code><nowiki>== headings ==</nowiki></code>
An outlining and plain-text markup system embedded within the Emacs text editor, created by Carsten Dominik. Org-mode is exceptionally powerful for note-taking, project planning, literate programming, and publishing. It can export to HTML, LaTeX, PDF, ODT and many other formats, and its =babel= feature allows executable code blocks in dozens of languages.
* '''TWiki / Foswiki markup''' — uses <code>---+</code> headings, <code>*bold*</code>, <code>_italic_</code>, and TML (Topic Markup Language) for structured data
* '''Confluence wiki markup''' — Atlassian's proprietary variant
 
=== Org-mode (2003) ===
 
An outlining and plain-text markup system embedded within the Emacs text editor, created by Carsten Dominik. Org-mode is exceptionally powerful for note-taking, project planning, literate programming, and publishing. It can export to HTML, LaTeX, PDF, ODT and many other formats, and its <code>babel</code> feature allows executable code blocks in dozens of languages.
 
=== CommonMark and Pandoc ===
 
'''CommonMark''' (2014) was an effort to produce a rigorous, unambiguous specification for Markdown to resolve the fragmentation of dialects. '''Pandoc''', created by John MacFarlane, is a universal document converter supporting dozens of input and output formats, and has become an essential tool for anyone working across multiple markup ecosystems.


---+++ CommonMark and Pandoc
== Summary Comparison ==
*CommonMark* (2014) was an effort to produce a rigorous, unambiguous specification for Markdown to resolve the fragmentation of dialects. *Pandoc*, created by John MacFarlane, is a universal document converter supporting dozens of input and output formats, and has become an essential tool for anyone working across multiple markup ecosystems.


---++ Summary Comparison
{| class="wikitable sortable"
! Format !! Created !! Primary Use !! Output Targets !! Complexity
|-
| Roff/groff || 1964+ || Man pages, UNIX docs || Terminal, PostScript || Medium
|-
| TeX/LaTeX || 1978/1984 || Academic & scientific publishing || PDF, DVI || High
|-
| SGML || 1986 || Meta-language, publishing || Varies || Very High
|-
| RTF || 1987 || Word processing interchange || Print, screen || Medium
|-
| HTML || 1991 || Web pages || Browser || Low–Medium
|-
| DocBook || 1991 || Technical documentation || HTML, PDF, man || High
|-
| XML || 1998 || Structured data & documents || Varies || Medium
|-
| RST || 2002 || Python/developer docs || HTML, PDF, man || Medium
|-
| AsciiDoc || 2002 || Technical books & docs || HTML, PDF, ePub || Medium
|-
| Org-mode || 2003 || Notes, literate programming || HTML, PDF, ODT || Medium–High
|-
| Markdown || 2004 || READMEs, wikis, web || HTML || Low
|}


| *Format* | *Created* | *Primary Use* | *Output Targets* | *Complexity* |
== See Also ==
| Roff/groff | 1964+ | Man pages, UNIX docs | Terminal, PostScript | Medium |
| TeX/LaTeX | 1978/1984 | Academic & scientific publishing | PDF, DVI | High |
| SGML | 1986 | Meta-language, publishing | Varies | Very High |
| RTF | 1987 | Word processing interchange | Print, screen | Medium |
| HTML | 1991 | Web pages | Browser | Low–Medium |
| DocBook | 1991 | Technical documentation | HTML, PDF, man | High |
| XML | 1998 | Structured data & documents | Varies | Medium |
| RST | 2002 | Python/developer docs | HTML, PDF, man | Medium |
| AsciiDoc | 2002 | Technical books & docs | HTML, PDF, ePub | Medium |
| Markdown | 2004 | READMEs, wikis, web | HTML | Low |
| Org-mode | 2003 | Notes, literate programming | HTML, PDF, ODT | Medium–High |


---++ See Also
* [[Markdown Syntax|Markdown Syntax Reference]]
  * [[MarkdownSyntax][Markdown Syntax Reference]]
* [https://pandoc.org Pandoc — Universal Document Converter]
  * [[WikiSyntax][Foswiki Markup Reference]]
* [https://commonmark.org CommonMark Specification]
  * [[https://pandoc.org][Pandoc — Universal Document Converter]]
* [https://www.mediawiki.org/wiki/Help:Formatting MediaWiki Formatting Help]
  * [[https://commonmark.org][CommonMark Specification]]


[[Category:Languages]]
[[Category:Languages]]
[[Category:Documentation]]
[[Category:Documentation]]

Latest revision as of 10:02, 26 February 2026

Forms of Documentation Markup

In the past, many solutions have evolved to facilitate the easy creation of documents and documentation. These range from lightweight human-readable formats like Markdown, to highly capable typesetting systems like TeX/LaTeX, to structured data formats like XML. The choice of markup language typically reflects a trade-off between ease of authoring, rendering fidelity, and the intended output medium.


Historical Background

The concept of "markup" predates computers — editors and typesetters would literally mark up manuscripts with instructions for printers. With the arrival of electronic publishing in the 1960s and 70s, these conventions were formalised into machine-readable languages. Early systems were tightly coupled to specific hardware (typesetters, printers, terminals), and over time higher-level, more portable formats emerged.

Markup Languages and Formats

Runoff / Roff Family (1964 onwards)

The roff family traces its roots to the RUNOFF program written by Jerome Saltzer at MIT in 1964. Its descendants roff, nroff and troff were the dominant document formatting tools on early UNIX systems.

  • nroff — formats text for fixed-width terminal output
  • troff — the typesetting variant, targeting phototypesetting devices
  • groff (GNU roff) — the modern open source successor, still widely used today for Unix man pages

Roff documents use dot-prefixed commands (e.g. .PP for paragraph, .B for bold) and remain the standard format for Linux/Unix manual pages.
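
As a rough illustration of the dot-prefixed style, the Python sketch below assembles the source of a tiny manual page using the conventional man macros .TH, .SH, .PP and .B. The function name man_page and the command name frob are hypothetical and exist only for this example.

```python
def man_page(name: str, section: int, summary: str, body: str) -> str:
    """Return roff source for a minimal manual page (man macro package)."""
    lines = [
        f".TH {name.upper()} {section}",  # title header: page name and manual section
        ".SH NAME",                        # section heading
        f"{name} \\- {summary}",           # \- is the roff minus sign used on NAME lines
        ".SH DESCRIPTION",
        ".PP",                             # start a new paragraph
        f".B {name}",                      # set the command name in bold
        body,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "frob" is a made-up command used only for illustration.
    print(man_page("frob", 1, "frobnicate files",
                   "processes each file named on the command line."), end="")
```

Saved to a file such as frob.1, output of this kind can usually be previewed with groff -man -Tutf8 frob.1, assuming groff is installed.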

SGML — Standard Generalised Markup Language (1986)

SGML is a meta-language for defining markup languages, standardised as ISO 8879:1986. It introduced the concept of separating document structure from presentation, and was the parent of both HTML and XML. Its complexity limited widespread adoption outside publishing and aerospace/defence industries, but its influence is enormous.

TeX and LaTeX (1978 / 1984)

Created by Donald Knuth, TeX is a typesetting system renowned for its precise, beautiful rendering of mathematical and scientific notation. LaTeX, built on top of TeX by Leslie Lamport, added higher-level macros that made document authoring far more approachable. LaTeX remains the dominant format for academic papers, theses, and scientific publishing worldwide. Documents use commands such as \section{}, \textbf{}, and the equation environment.

RTF — Rich Text Format (1987)

Developed by Microsoft for cross-application and cross-platform document exchange. RTF encodes formatting as plain-text control words (e.g. \b for bold), making it human-readable in theory but verbose in practice. It was the default format for WordPad, was supported by Microsoft Word as an import and export format, and served as a common interchange format before the dominance of OOXML and PDF.
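
To make the control-word idea concrete, here is a minimal Python sketch that builds a one-line RTF document containing a bold run. The helper names rtf_escape and rtf_document are hypothetical, and the sketch assumes ASCII text only; it does not attempt fonts, paragraphs or Unicode.

```python
def rtf_escape(text: str) -> str:
    """Escape the characters that are special in RTF: backslash and braces."""
    # Backslash must be handled first so the escapes added for braces survive.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def rtf_document(plain: str, bold: str) -> str:
    """Return a minimal RTF document: plain text followed by a bold run.

    {\\rtf1\\ansi ...} is the document group; {\\b ...} limits bold to its group.
    """
    return "{\\rtf1\\ansi " + rtf_escape(plain) + "{\\b " + rtf_escape(bold) + "}}"


if __name__ == "__main__":
    print(rtf_document("Some ", "bold"))
```

Run as a script, this prints {\rtf1\ansi Some {\b bold}}, which shows both why RTF is readable in principle and how quickly the braces and control words pile up.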

HTML — HyperText Markup Language (1991)

Proposed by Tim Berners-Lee at CERN, HTML is an SGML-derived language that became the lingua franca of the World Wide Web. It separates document structure (tags like <h1>, <p>, <a>) from presentation (originally handled inline, later delegated to CSS). HTML5 (2014) significantly expanded its capabilities to include multimedia, semantic elements, and application-level features.
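
The structural role of tags can be seen by listing them. The following Python sketch uses the standard library's html.parser module to record each start tag in document order; the class name TagLister and the function list_tags are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record the name of every start tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; attributes are ignored in this sketch.
        self.tags.append(tag)


def list_tags(html: str) -> list[str]:
    parser = TagLister()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    sample = '<h1>Title</h1><p>See <a href="http://example.com">this</a>.</p>'
    print(list_tags(sample))
```

For the sample string the result is the list h1, p, a: the outline of the document, with no information about how any of it should look.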

DocBook (1991)

An XML/SGML-based markup language originally developed by HaL Computer Systems and O'Reilly. DocBook is specifically designed for technical documentation and supports rich semantic tagging (<procedure>, <varlistentry>, <programlisting>, etc.). It can be transformed to HTML, PDF, man pages and other formats via XSLT stylesheets. Widely used in open source projects and technical publishing.

XML — eXtensible Markup Language (1998)

A simplified subset of SGML designed for general-purpose structured data. While not a document format per se, XML underpins many documentation formats (DocBook, OOXML, DITA). Its strict well-formedness rules and wide tooling support made it a foundational technology for data interchange in the 2000s, though JSON has displaced it for many API use cases.
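
The strictness of well-formedness is easy to demonstrate. This Python sketch relies on the standard library's xml.etree.ElementTree, which raises ParseError for any input that is not well-formed; the function name is_well_formed is a hypothetical helper for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for unclosed or mismatched tags, multiple roots, and so on.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<doc><p>ok</p></doc>"))    # properly nested
    print(is_well_formed("<doc><p>unclosed</doc>"))  # <p> is never closed
```

Unlike a typical HTML parser, which tends to recover from such mistakes, an XML parser is required to reject the second input outright.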

reStructuredText / RST (2002)

Developed by David Goodger as part of the Python docutils project. RST is a lightweight markup language designed to be readable as plain text while being processable into HTML, LaTeX, PDF, and man pages. It is the native format of Python's official documentation and the Sphinx documentation generator, making it extremely prevalent in the Python ecosystem.

Example RST:
Section Title
=============
Some **bold** text and a `hyperlink <http://example.com>`_.

AsciiDoc (2002) and Asciidoctor (2013)

AsciiDoc fills a niche between Markdown's simplicity and DocBook's power. It was designed for writing books and technical documentation, supporting features like cross-references, includes, callouts, and multiple output formats natively. Asciidoctor, a Ruby reimplementation, significantly revitalised the ecosystem. It is used by the Git project, Spring Framework, and many others for their documentation.

Markdown (2004)

Created by John Gruber (with input from Aaron Swartz), Markdown was explicitly designed to be readable as-is in plain-text form, with its syntax inspired by email conventions. It converts to HTML and is now the de facto standard for README files, static site generators, wikis, and developer documentation. Its simplicity came at the cost of standardisation — numerous dialects emerged (GitHub Flavored Markdown, CommonMark, MultiMarkdown) with incompatible extensions.

Example Markdown:
## Section Title
Some **bold** text and a [hyperlink](http://example.com).
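
The conversion to HTML can be sketched for the two example lines above. The Python function below is a toy with three rules (ATX headings, **bold** and inline links) and is not a conforming Markdown implementation; it does no HTML escaping, and the name tiny_markdown_to_html is hypothetical.

```python
import re


def tiny_markdown_to_html(line: str) -> str:
    """Convert one line using three rules: '## ' headings, **bold**, [text](url)."""
    heading = re.match(r"^(#{1,6}) +(.*)$", line)
    if heading:
        level = len(heading.group(1))  # number of '#' characters gives the level
        return f"<h{level}>{heading.group(2)}</h{level}>"
    line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
    line = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r'<a href="\2">\1</a>', line)
    return f"<p>{line}</p>"


if __name__ == "__main__":
    print(tiny_markdown_to_html("## Section Title"))
    print(tiny_markdown_to_html(
        "Some **bold** text and a [hyperlink](http://example.com)."))
```

Real converters differ on edge cases such as nested emphasis and list indentation, which is the dialect fragmentation described above.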

Wiki Markup

Most wiki platforms developed their own lightweight markup languages before Markdown became dominant. Examples include:

  • MediaWiki markup — used by Wikipedia; uses ''italic'', '''bold''', == headings ==
  • TWiki / Foswiki markup — uses ---+ headings, *bold*, _italic_, and TML (Topic Markup Language) for structured data
  • Confluence wiki markup — Atlassian's proprietary variant
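
Because these dialects differ mostly in surface syntax, simple cases can be translated mechanically. The Python sketch below rewrites TWiki-style headings and bold text into their MediaWiki equivalents, line by line. It covers only those two constructs, and the function name twiki_to_mediawiki is hypothetical.

```python
import re


def twiki_to_mediawiki(line: str) -> str:
    """Translate a TWiki heading or *bold* span on one line to MediaWiki markup."""
    heading = re.match(r"^---(\++) *(.*)$", line)
    if heading:
        marks = "=" * len(heading.group(1))  # one '=' per '+' in the TWiki heading
        return f"{marks} {heading.group(2).strip()} {marks}"
    # *bold* becomes '''bold'''; a bullet asterisk followed by a space is left alone.
    return re.sub(r"(?<!\S)\*(\S(?:.*?\S)?)\*(?!\w)", r"'''\1'''", line)


if __name__ == "__main__":
    print(twiki_to_mediawiki("---++ Historical Background"))
    print(twiki_to_mediawiki("  * *nroff* formats text"))
```

The first call yields == Historical Background ==. List indentation, =code= spans and links would each need further rules that this sketch does not attempt.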

Org-mode (2003)

An outlining and plain-text markup system embedded within the Emacs text editor, created by Carsten Dominik. Org-mode is exceptionally powerful for note-taking, project planning, literate programming, and publishing. It can export to HTML, LaTeX, PDF, ODT and many other formats, and its babel feature allows executable code blocks in dozens of languages.

CommonMark and Pandoc

CommonMark (2014) was an effort to produce a rigorous, unambiguous specification for Markdown to resolve the fragmentation of dialects. Pandoc, created by John MacFarlane, is a universal document converter supporting dozens of input and output formats, and has become an essential tool for anyone working across multiple markup ecosystems.

Summary Comparison

| Format | Created | Primary Use | Output Targets | Complexity |
| Roff/groff | 1964+ | Man pages, UNIX docs | Terminal, PostScript | Medium |
| TeX/LaTeX | 1978/1984 | Academic & scientific publishing | PDF, DVI | High |
| SGML | 1986 | Meta-language, publishing | Varies | Very High |
| RTF | 1987 | Word processing interchange | Print, screen | Medium |
| HTML | 1991 | Web pages | Browser | Low–Medium |
| DocBook | 1991 | Technical documentation | HTML, PDF, man | High |
| XML | 1998 | Structured data & documents | Varies | Medium |
| RST | 2002 | Python/developer docs | HTML, PDF, man | Medium |
| AsciiDoc | 2002 | Technical books & docs | HTML, PDF, ePub | Medium |
| Org-mode | 2003 | Notes, literate programming | HTML, PDF, ODT | Medium–High |
| Markdown | 2004 | READMEs, wikis, web | HTML | Low |

See Also