Markup Notes

From PeformIQ Wiki
Revision as of 09:58, 26 February 2026 by PeterHarding

Forms of Documentation Markup

Over the years, many solutions have evolved to make documents and documentation easier to create. These range from lightweight human-readable formats like Markdown, to highly capable typesetting systems like TeX/LaTeX, to structured data formats like XML. The choice of markup language typically reflects a trade-off between ease of authoring, rendering fidelity, and the intended output medium.

---++ Historical Background

The concept of "markup" predates computers — editors and typesetters would literally mark up manuscripts with instructions for printers. With the arrival of electronic publishing in the 1960s and 70s, these conventions were formalised into machine-readable languages. Early systems were tightly coupled to specific hardware (typesetters, printers, terminals), and over time higher-level, more portable formats emerged.

---++ Markup Languages and Formats

---+++ Runoff / Roff Family (1964 onwards)

The roff family traces its roots to the =RUNOFF= program written by Jerome Saltzer at MIT in 1964. Roff was the dominant document formatting system on early UNIX systems.

  * *nroff* — formats text for fixed-width terminal output
  * *troff* — the typesetting variant, targeting phototypesetting devices
  * *groff* (GNU roff) — the modern open source successor, still widely used today for Unix man pages

Roff documents use dot-prefixed commands (e.g. =.PP= for paragraph, =.B= for bold) and remain the standard format for Linux/Unix manual pages.
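The dot-prefixed convention can be illustrated with a minimal Python sketch. The function name =classify_roff_lines= is hypothetical, and the sketch assumes that any line starting with a dot is a request; real roff also recognises other control characters and escape sequences, which are ignored here.

```python
def classify_roff_lines(source):
    """Split roff source into ("request", name, args) and ("text", line, "") tuples.

    Simplifying assumption: a line is a request only if it starts with ".".
    """
    lines = []
    for line in source.splitlines():
        if line.startswith("."):
            # The request name is the first word after the dot; the rest are arguments.
            name, _, args = line[1:].strip().partition(" ")
            lines.append(("request", name, args.strip()))
        else:
            lines.append(("text", line, ""))
    return lines


if __name__ == "__main__":
    sample = ".PP\nSome text\n.B bold words"
    for entry in classify_roff_lines(sample):
        print(entry)
```

Each request line yields its name and arguments separately, while ordinary lines pass through as text to be filled into the current paragraph.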

---+++ SGML: Standard Generalised Markup Language (1986)

SGML is a meta-language for defining markup languages, standardised as ISO 8879:1986. It introduced the concept of separating document structure from presentation, and was the parent of both HTML and XML. Its complexity limited widespread adoption outside publishing and aerospace/defence industries, but its influence is enormous.

---+++ TeX and LaTeX (1978 / 1984)

Created by Donald Knuth, *TeX* is a typesetting system renowned for its precise, beautiful rendering of mathematical and scientific notation. *LaTeX*, built on top of TeX by Leslie Lamport, added higher-level macros that made document authoring far more approachable. LaTeX remains the dominant format for academic papers, theses, and scientific publishing worldwide. Documents use commands such as =\section{}=, =\textbf{}=, and the =equation= environment.

---+++ RTF: Rich Text Format (1987)

Developed by Microsoft for cross-application and cross-platform document exchange. RTF encodes formatting as plain-text control words (e.g. =\b= for bold), making it human-readable in theory but verbose in practice. It was the default format for WordPad, could be read and written by Word, and served as a common interchange format before the dominance of OOXML and PDF.
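To show what "plain-text control words" means in practice, here is a minimal Python sketch that lists the control words in an RTF fragment. The function name =control_words= and the sample string are hypothetical. The sketch assumes a control word is a backslash, a run of ASCII letters, and an optional signed number; it deliberately ignores control symbols and escaped backslashes, so it is not a real RTF parser.

```python
import re

# Assumed shape of a control word: backslash, letters, optional signed integer.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)?")


def control_words(rtf):
    """Return (name, parameter) pairs; parameter is None when absent."""
    return [
        (name, int(number) if number else None)
        for name, number in CONTROL_WORD.findall(rtf)
    ]


if __name__ == "__main__":
    # Hypothetical sample: \b turns bold on, \b0 turns it off again.
    sample = r"{\rtf1\ansi This is \b bold\b0  text.}"
    print(control_words(sample))
```

Even this short fragment needs four control words for one bold word, which illustrates why RTF is verbose in practice.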

---+++ HTML: HyperText Markup Language (1991)

Proposed by Tim Berners-Lee at CERN, HTML is an SGML-derived language that became the lingua franca of the World Wide Web. It separates document structure (tags such as =<a>=) from presentation (originally handled inline, later delegated to CSS). HTML5 (2014) significantly expanded its capabilities to include multimedia, semantic elements, and application-level features.

---+++ DocBook (1991)

An XML/SGML-based markup language originally developed by HaL Computer Systems and O'Reilly. DocBook is specifically designed for technical documentation and supports rich semantic tagging (=<procedure>=, =<varlistentry>=, =<programlisting>=, etc.). It can be transformed to HTML, PDF, man pages and other formats via XSLT stylesheets. It is widely used in open source projects and technical publishing.

---+++ XML: eXtensible Markup Language (1998)

A simplified subset of SGML designed for general-purpose structured data. While not a document format per se, XML underpins many documentation formats (DocBook, OOXML, DITA). Its strict well-formedness rules and wide tooling support made it a foundational technology for data interchange in the 2000s, though JSON has displaced it for many API use cases.

---+++ reStructuredText / RST (2002)

Developed by David Goodger as part of the Python =docutils= project. RST is a lightweight markup language designed to be readable as plain text while being processable into HTML, LaTeX, PDF, and man pages. It is the native format of Python's official documentation and the Sphinx documentation generator, making it extremely prevalent in the Python ecosystem.

<verbatim>
Example RST:

Section Title
=============

Some **bold** text and a `hyperlink <http://example.com>`_.
</verbatim>

---+++ Markdown (2004)

Created by John Gruber (with input from Aaron Swartz), Markdown was explicitly designed to be readable as-is in plain-text form, with its syntax inspired by email conventions. It converts to HTML and is now the de facto standard for README files, static site generators, wikis, and developer documentation. Its simplicity came at the cost of standardisation: numerous dialects emerged (GitHub Flavored Markdown, CommonMark, MultiMarkdown) with incompatible extensions.

<verbatim>
Example Markdown:

# Section Title

Some **bold** text and a [hyperlink](http://example.com).
</verbatim>

---+++ AsciiDoc (2002) and Asciidoctor (2013)

AsciiDoc fills a niche between Markdown's simplicity and DocBook's power. It was designed for writing books and technical documentation, supporting features like cross-references, includes, callouts, and multiple output formats natively. *Asciidoctor*, a Ruby reimplementation, significantly revitalised the ecosystem. AsciiDoc is used by the Git project, Spring Framework, and many others for their documentation.

---+++ Wiki Markup

Most wiki platforms developed their own lightweight markup languages before Markdown became dominant. Examples include:

  * *MediaWiki markup*: used by Wikipedia; uses =''italic''=, ='''bold'''=, ===headings===
  * *TWiki / Foswiki markup*: uses =---+= headings, =*bold*=, =_italic_=, and TML (Topic Markup Language) for structured data
  * *Confluence wiki markup*: Atlassian's proprietary variant
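Because these dialects differ mainly in surface syntax, simple constructs can often be translated line by line. The following Python sketch converts TWiki/Foswiki-style =---+= headings into Markdown =#= headings. The function name =twiki_heading_to_markdown= is hypothetical, and the sketch assumes only the plain heading form; other heading variants and all inline markup are left untouched.

```python
import re

# Assumed heading shape: three dashes, one or more "+" signs, then the title.
HEADING = re.compile(r"^---(\++)\s*(.*)$")


def twiki_heading_to_markdown(line):
    """Map "---++ Title" to "## Title"; return non-heading lines unchanged."""
    match = HEADING.match(line)
    if not match:
        return line
    level = len(match.group(1))  # the number of "+" signs gives the heading depth
    return "#" * level + " " + match.group(2).strip()


if __name__ == "__main__":
    for source_line in ["---++ Historical Background", "Plain text"]:
        print(twiki_heading_to_markdown(source_line))
```

A full conversion between wiki dialects needs a real parser or a dedicated converter; this only shows how closely the heading conventions correspond.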

---+++ Org-mode (2003)

An outlining and plain-text markup system embedded within the Emacs text editor, created by Carsten Dominik. Org-mode is exceptionally powerful for note-taking, project planning, literate programming, and publishing. It can export to HTML, LaTeX, PDF, ODT and many other formats, and its =babel= feature allows executable code blocks in dozens of languages.

---+++ CommonMark and Pandoc

*CommonMark* (2014) was an effort to produce a rigorous, unambiguous specification for Markdown to resolve the fragmentation of dialects. *Pandoc*, created by John MacFarlane, is a universal document converter supporting dozens of input and output formats, and has become an essential tool for anyone working across multiple markup ecosystems.

---++ Summary Comparison

| *Format* | *Created* | *Primary Use* | *Output Targets* | *Complexity* |
| Roff/groff | 1964+ | Man pages, UNIX docs | Terminal, PostScript | Medium |
| TeX/LaTeX | 1978/1984 | Academic & scientific publishing | PDF, DVI | High |
| SGML | 1986 | Meta-language, publishing | Varies | Very High |
| RTF | 1987 | Word processing interchange | Print, screen | Medium |
| HTML | 1991 | Web pages | Browser | Low–Medium |
| DocBook | 1991 | Technical documentation | HTML, PDF, man | High |
| XML | 1998 | Structured data & documents | Varies | Medium |
| RST | 2002 | Python/developer docs | HTML, PDF, man | Medium |
| AsciiDoc | 2002 | Technical books & docs | HTML, PDF, ePub | Medium |
| Markdown | 2004 | READMEs, wikis, web | HTML | Low |
| Org-mode | 2003 | Notes, literate programming | HTML, PDF, ODT | Medium–High |

---++ See Also

  * [[MarkdownSyntax][Markdown Syntax Reference]]
  * [[WikiSyntax][Foswiki Markup Reference]]
  * [[1][Pandoc — Universal Document Converter]]
  * [[2][CommonMark Specification]]