Gentoo DevBook XML guide

DevBook XML design goals

The DevBook XML syntax is lightweight yet expressive: it is easy to learn, yet it provides all the features we need to create web documentation. The number of tags is kept to a minimum, just those we need. This makes it easy to transform DevBook XML into other formats, such as DocBook XML/SGML or web-ready HTML.

The goal is to make it easy to create and transform DevBook XML documents.

Basic structure

Let's start learning the DevBook XML syntax with the initial tags used in a DevBook XML document:

The preamble of a DevBook XML document

<?xml version="1.0" encoding="UTF-8"?>
<devbook self="appendices/devbook-guide/">
<chapter>
<title>Gentoo DevBook XML guide</title>

On the first line, we see the XML declaration that identifies this as an XML document. Next, there's a <devbook> tag: the entire document is enclosed within a <devbook> </devbook> pair. Its self attribute must point to the relative path of the document from the root node; in the example above, the path is appendices/devbook-guide/. An exception is the root node itself, which has <devbook root="true"> instead.

Next, there is a <chapter> tag. Every document must have exactly one chapter. Its <title> is used to set the title for the entire document.

Of course, all elements must be closed, so the document ends with:

</chapter>
</devbook>

Sections and subsections

Once the initial tags have been specified, you're ready to start adding the structural elements of the document. Chapters are divided into sections; each section can hold zero or more subsections, which in turn can contain zero or more subsubsections. Section, subsection and subsubsection elements must have a title. Here's an example section with a single subsection containing one paragraph:

Minimal DevBook example

<section>
<title>This is my section</title>
<subsection>
<title>This is subsection one of my section</title>
<body>

<p>
This is the actual text content of my subsection.
</p>

</body>
</subsection>
</section>
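Putting the preamble, this section and the closing tags together gives a complete document. The following is a minimal sketch of such a file; it reuses the self path from the preamble example purely for illustration, and the blank line before <section> follows the coding style described later in this guide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<devbook self="appendices/devbook-guide/">
<chapter>
<title>Gentoo DevBook XML guide</title>

<section>
<title>This is my section</title>
<subsection>
<title>This is subsection one of my section</title>
<body>

<p>
This is the actual text content of my subsection.
</p>

</body>
</subsection>
</section>
</chapter>
</devbook>
```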

Above, I set the section title by adding a child <title> element to the <section> element. Then, I created a subsection by adding a <subsection> element. If you look inside the <subsection> element, you'll see that it has two child elements — a <title> and a <body>. While the <title> is nothing new, the <body> is — it contains the actual text content of this particular subsection. We'll look at the tags that are allowed inside a <body> element in a bit.

Including sub-documents

The manual is organized as a tree. Each directory contains one document, which can include multiple sub-documents using the <include href="foo/"/> tag. Note that the trailing slash in the href value is mandatory.
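As a sketch, a parent document in the appendices/ directory could pull in the two sub-documents mentioned in this guide (appendices/devbook-guide/ and appendices/todo-list/) as shown below. The chapter title "Appendices" is hypothetical, and placing the <include> elements at the end of the chapter, with href values relative to the including document's directory, is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<devbook self="appendices/">
<chapter>
<title>Appendices</title>

<section>
<title>Contents</title>
<body>

<contents/>
</body>
</section>
<!-- Assumed placement: includes after the sections; note the trailing slashes -->
<include href="devbook-guide/"/>
<include href="todo-list/"/>
</chapter>
</devbook>
```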

A table of contents can be generated with <contents/>. Typically, this would be the only element in its own section body, as in the following example:

<section>
<title>Contents</title>
<body>
<contents/>
</body>
</section>

An example <body>

Now, it's time to learn how to mark up actual content. Here's the XML code for an example <body> element:

Example of a body element

<p>
This is a paragraph. <c>/etc/passwd</c> is a file.
<uri>https://www.gentoo.org/</uri> is my favorite website.
Type <c>ls</c> if you feel like it. I <e>really</e> want to go to sleep now.
</p>

<pre>
This is text output or code.
# this is user input
</pre>

<codesample lang="sgml">
Make HTML/XML easier to read by using selective emphasis:
&lt;foo&gt;bar&lt;/foo&gt;
</codesample>

<note>
This is a note.
</note>

<important>
This is important.
</important>

<warning>
This is a warning.
</warning>

<todo>
Text inside a <c>todo</c> element will appear in the
<uri link="::appendices/todo-list/"/>.
</todo>

Now, here's how the <body> element above is rendered:

This is a paragraph. /etc/passwd is a file. https://www.gentoo.org/ is my favorite website. Type ls if you feel like it. I really want to go to sleep now.

This is text output or code.
# this is user input
Make HTML/XML easier to read by using selective emphasis:
<foo>bar</foo>

Body elements

We introduced a lot of new tags in the previous section; here's what you need to know. The <p> (paragraph), <pre> (preformatted block), <codesample> (code block), <note>, <important>, <warning> and <todo> tags can all contain one or more lines of text. Besides the <figure>, <table>, <ul>, <ol> and <dl> elements (which we'll cover in just a bit), these are the only tags that should appear immediately inside a <body> element. Also, these tags should not be nested: for example, don't put a <note> element inside a <p> element. As you might guess, the <pre> and <codesample> elements preserve their whitespace exactly, making them well suited for code excerpts. Both <pre> and <codesample> can have a caption attribute:

Named <pre>

<pre caption="Output of uptime">
# uptime
16:50:47 up 164 days,  2:06,  5 users,  load average: 0.23, 0.20, 0.25
</pre>

Code samples and colour-coding

The <pre> tag does not support any syntax highlighting. When you need syntax highlighting, use the <codesample> tag along with a lang attribute — usually you want this to be set to "ebuild" to syntax highlight ebuild code snippets. Currently, the following languages are supported:

  • c
  • ebuild
  • make
  • m4
  • sgml

Sample <codesample lang="c"> block:

#include <stdio.h>

int main(void)
{
	/* This is a comment */
	printf("Hello, world!\n");
	return 0;
}
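In the XML source, a C sample like the one above goes inside a <codesample lang="c"> element. Because its content is XML character data, the < in the #include line must be written as &lt; (and, following the &lt;foo&gt; example earlier, > is written as &gt;), while whitespace such as the tab indentation is kept as-is. A sketch of the source:

```xml
<codesample lang="c">
#include &lt;stdio.h&gt;

int main(void)
{
	/* This is a comment */
	printf("Hello, world!\n");
	return 0;
}
</codesample>
```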

You can also specify numbering="lines" to enable line numbering, as in the following example:

01: # Copyright 1999-2021 Gentoo Authors
02: # Distributed under the terms of the GNU General Public License v2
03: 
04: EAPI=7
05: 
06: DESCRIPTION="MicroGnuEmacs, a port from the BSDs"
07: HOMEPAGE="https://homepage.boetes.org/software/mg/"
08: SRC_URI="https://github.com/hboetes/${PN}/archive/${PV}.tar.gz -> ${P}.tar.gz"
09: 
10: LICENSE="public-domain"
11: SLOT="0"
12: KEYWORDS="alpha amd64 arm hppa ppc ~ppc64 sparc x86"
13: 
14: RDEPEND="sys-libs/ncurses:0=
15: 	>=dev-libs/libbsd-0.7.0"
16: DEPEND="${RDEPEND}"
17: BDEPEND="virtual/pkgconfig"
18: 
19: src_install() {
20: 	dobin mg
21: 	doman mg.1
22: 	dodoc README tutorial
23: }
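Line numbering is requested in the opening tag, together with the lang attribute. The following sketch shows the markup for an excerpt of the ebuild above (only its src_install function), not the full listing:

```xml
<codesample lang="ebuild" numbering="lines">
src_install() {
	dobin mg
	doman mg.1
	dodoc README tutorial
}
</codesample>
```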

Figures

Here's how to insert a figure into a document: <figure link="mygfx.png" short="my picture" caption="my favorite picture of all time"/>. The link attribute points to the actual graphic image, the short attribute specifies a short description (currently used for the image's HTML alt attribute), and the caption attribute specifies the caption. Not too difficult :) We also support the standard HTML-style <img src="foo.gif"/> tag for adding images without captions, borders, etc.

Tables

DevBook XML supports a simplified table syntax similar to that of HTML. To start a table, use a <table> tag, and start each row with a <tr> tag. For the actual table data, however, the HTML <td> tag is not supported; instead, use <th> for a header cell and <ti> for a normal informational cell. You can use a <th> anywhere you can use a <ti>; there's no requirement that <th> elements appear only in the first row.

Both table headers (<th>) and table items (<ti>) also accept the colspan and rowspan attributes, to span their content across columns, rows or both.

Furthermore, table cells (<ti> and <th>) can be left-aligned, right-aligned or centered with the align attribute.

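+----------------------------------------------------+
| This title spans 4 columns                         |
+--------------+-------------+-----------+-----------+
| This title   | Item A1     | Item A2   | Item A3   |
| spans 6 rows +-------------+-----------+-----------+
|              | Item B1     | Blocky 2x2 title      |
|              +-------------+                       |
|              | Item C1     |                       |
|              +-------------+-----------+-----------+
|              | Item D1..D3                         |
|              +-------------+-----------------------+
|              | Item E1..F1 | Item E2..E3           |
|              |             +-----------------------+
|              |             | Item F2..F3           |
+--------------+-------------+-----------------------+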
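The table above could be written roughly as follows. This is a reconstruction from the rendered output, not the original source; the align attributes are added only to illustrate the syntax, and their values are assumed to follow HTML (left, center, right).

```xml
<table>
<tr>
  <th colspan="4">This title spans 4 columns</th>
</tr>
<tr>
  <th rowspan="6">This title spans 6 rows</th>
  <ti>Item A1</ti>
  <ti>Item A2</ti>
  <ti>Item A3</ti>
</tr>
<tr>
  <ti>Item B1</ti>
  <th colspan="2" rowspan="2">Blocky 2x2 title</th>
</tr>
<tr>
  <ti>Item C1</ti>
</tr>
<tr>
  <ti colspan="3" align="center">Item D1..D3</ti>
</tr>
<tr>
  <ti rowspan="2">Item E1..F1</ti>
  <ti colspan="2" align="right">Item E2..E3</ti>
</tr>
<tr>
  <ti colspan="2" align="right">Item F2..F3</ti>
</tr>
</table>
```

Note how each <tr> only lists the cells that start in that row: cells covered by an earlier rowspan are simply omitted.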

Lists

To create ordered or unordered lists, simply use the XHTML-style <ol>, <ul> and <li> tags. Lists may only appear inside the <body> and <li> tags, which means that you can have lists inside lists. Don't forget that you are writing XML: unlike in HTML, you must close all tags, including list items.

Definition lists (<dl>) are also supported. Please note that the definition term tag (<dt>) does not accept any block-level tags, such as paragraphs or admonitions. A definition list consists of a <dl> (definition list) tag containing <dt> (definition term) and <dd> (definition data) tags.

The following example, copied from w3.org, shows that lists may also be nested and that different list types may be used together:

The ingredients:
  • 100 g flour
  • 10 g sugar
  • 1 cup water
  • 2 eggs
  • salt, pepper
The procedure:
  1. Mix dry ingredients thoroughly.
  2. Pour in wet ingredients.
  3. Mix for 10 minutes.
  4. Bake for one hour at 300 degrees.
Notes:
The recipe may be improved by adding raisins.
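A sketch of DevBook XML markup that would produce this nested list. It is reconstructed from the rendered output and assumes, as the rendering suggests, that <ul> and <ol> are accepted inside <dd>:

```xml
<dl>
  <dt>The ingredients:</dt>
  <dd>
    <ul>
      <li>100 g flour</li>
      <li>10 g sugar</li>
      <li>1 cup water</li>
      <li>2 eggs</li>
      <li>salt, pepper</li>
    </ul>
  </dd>
  <dt>The procedure:</dt>
  <dd>
    <ol>
      <li>Mix dry ingredients thoroughly.</li>
      <li>Pour in wet ingredients.</li>
      <li>Mix for 10 minutes.</li>
      <li>Bake for one hour at 300 degrees.</li>
    </ol>
  </dd>
  <dt>Notes:</dt>
  <dd>The recipe may be improved by adding raisins.</dd>
</dl>
```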

Inline elements

<c>, <b>, <e>, <sub> and <sup>

The <c> element is used to mark up a command or user input. Think of <c> as a way to alert the reader to something that they can type in that will perform some kind of action. For example, all the XML tags displayed in this document are enclosed in a <c> element because they represent something that the user could type in that is not a path. By using <c> elements, you'll help your readers quickly identify commands that they need to type in. Also, because <c> elements are already offset from regular text, it is rarely necessary to surround user input with double-quotes. For example, don't refer to a "<c>" element like I did in this sentence. Avoiding the use of unnecessary double-quotes makes a document more readable — and adorable!

As you might have guessed, <b> is used to boldface some text.

<e> is used to apply emphasis to a word or phrase; for example: I really should use semicolons more often. As you can see, this text is offset from the regular paragraph type for emphasis. This helps to give your prose more punch!

The <sub> and <sup> elements are used to specify subscript and superscript.
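A short paragraph combining these inline elements might look like this; the sample text is made up for illustration:

```xml
<p>
Type <c>ls</c> to list the files. This is <b>bold</b>, this is
<e>emphasized</e>, water is H<sub>2</sub>O and 2<sup>10</sup> is 1024.
</p>
```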

<uri>

The <uri> tag is used to point to files/locations on the Internet. It has two forms — the first can be used when you want to have the actual URI displayed in the body text, such as this link to https://www.gentoo.org/. To create this link, I typed <uri>https://www.gentoo.org/</uri>. The alternate form is when you want to associate a URI with some other text — for example, the Gentoo Linux website. To create this link, I typed <uri link="https://www.gentoo.org/">the Gentoo Linux website</uri>.

Please avoid the "click here" syndrome, as recommended by the W3C.

Intra-document references

DevBook XML makes it really easy to reference other parts of the document using hyperlinks. You can create a link pointing to another chapter, like Ebuild file format, by typing <uri link="::ebuild-writing/file-format/">Ebuild file format</uri>, i.e. two colons followed by the relative path from the root node. To refer to a section in another chapter, like First ebuild, type <uri link="::quickstart/#First ebuild">First ebuild</uri>.

If the title of the link target (a chapter, section, etc.) is to be used as the link text, an empty <uri> element can be used. In fact, I could have written the two examples above in a more compact form: <uri link="::ebuild-writing/file-format/"/> and <uri link="::quickstart/#First ebuild"/> render as Ebuild file format and First ebuild, respectively.

Coding style

Since all Gentoo documentation is a joint effort, and several people will most likely change existing documents, a coding style is needed. The coding style has two parts. The first one concerns internal coding: how the XML tags are placed. The second one concerns the content: how not to confuse the reader.

Both parts are described next.

Internal coding style

Newlines must be placed immediately after every DevBook XML tag (both opening and closing), except for: <title>, <th>, <ti>, <li>, <dt>, <dd>, <b>, <c>, <e>, <d/>, <uri>.

Blank lines must be placed immediately after every <body> (opening tag only) and before every <section>, <p>, <pre>, <codesample>, <figure>, <table>, <ul>, <ol>, <dl>, <note>, <important> and <warning> (opening tags only). An exception to this rule applies to tags that are located within list items or table cells.
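A minimal sketch of a section laid out according to these newline and blank-line rules; the title and text are placeholders:

```xml
<section>
<title>Example section</title>
<body>

<p>
A paragraph comes first.
</p>

<pre>
# uptime
</pre>

<note>
A blank line also precedes this note.
</note>

</body>
</section>
```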

Word-wrapping must be applied at 80 characters, except inside <pre> and <codesample>. You may only deviate from this rule when there is no other choice (for instance, when a URL exceeds the maximum number of characters). In that case, the line must be wrapped at the first whitespace that follows. You should try to keep the rendered content of <pre> and <codesample> elements within 80 columns to help console users.

Indentation must not be used, except for XML constructs whose parent tag is <tr> (within <table>), <ul>, <ol> or <dl>. Where indentation is used, it must be two spaces per indentation level: no tabs and no more spaces. Moreover, tabs are not allowed anywhere in DevBook XML documents (again, except inside <pre> and <codesample>).

In case word-wrapping happens in <ti>, <th>, <li> or <dd> constructs, indentation must be used for the content.

An example for indentation is:

Indentation example

<table>
<tr>
  <th>Foo</th>
  <th>Bar</th>
</tr>
<tr>
  <ti>This is an example for indentation</ti>
  <ti>
    In case text cannot be shown within an 80-character wide line, you
    must use indentation if the parent tag allows it
  </ti>
</tr>
</table>

<ul>
  <li>First option</li>
  <li>Second option</li>
</ul>

Opening tags with a single attribute may not be split between lines. For example, don't put a newline between <uri and its link attribute. Break the line before the <uri> tag instead.
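For instance, when a sentence containing a link does not fit on one line, the break goes before <uri>, so that the tag and its link attribute stay together. The sentence below is made up for illustration:

```xml
<p>
More information about the distribution can be found on the
<uri link="https://www.gentoo.org/">Gentoo Linux website</uri>.
</p>
```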

There must be no spaces between the attribute name, the "=" sign and the attribute value. For example:

Attributes

Wrong  :     <uri link = "https://www.gentoo.org/">
Correct:     <uri link="https://www.gentoo.org/">

Dashes used as in-sentence punctuation — like this — should be written as a <d/> tag surrounded by spaces. It would be difficult to distinguish a Unicode em-dash from a hyphen when editing the source using a fixed-width font.
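In the source, the sentence above might be written as follows (note that the literal tag name inside <c> is escaped):

```xml
<p>
Dashes used as in-sentence punctuation <d/> like this <d/> should be
written as a <c>&lt;d/&gt;</c> tag surrounded by spaces.
</p>
```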

External coding style

Inside tables (<table>) and listings (<ul>, <ol> and <dl>), periods (".") should not be used unless an item contains multiple sentences. In that case, every sentence should end with a period (or other punctuation mark).

Every sentence, including those inside tables and listings, should start with a capital letter.

Periods and capital letters

<ul>
  <li>No period</li>
  <li>With period. Multiple sentences, remember?</li>
</ul>

Titles should use sentence case, i.e. their first word should start with a capital letter, and all other words (except proper nouns) should be in lower case.
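Using titles from this guide as examples, the rule looks like this; proper nouns such as Gentoo and DevBook keep their capitals:

```xml
<!-- Wrong: -->
<title>Code Samples And Colour-Coding</title>
<!-- Correct: -->
<title>Code samples and colour-coding</title>
<title>Gentoo DevBook XML guide</title>
```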

Try to use <uri> with the link attribute as much as possible. In other words, the Gentoo website is preferred over https://www.gentoo.org/.