
The Brokenness of Structured Text

by Stewart "ThunderChicken" Stremler last modified 2005-05-18 07:50

Structured Text Misfeatures.

There is a lot of discussion in some circles about the bewildering proliferation of various "structured text" schemes. It's generally acknowledged that HTML is excessively complex for most people most of the time when all they really want to do is apply some really basic markup to their text. The Wiki world caught on to this early and devised a really simple markup for wiki pages, and it's obviously a good thing, but nobody can agree on which features are necessary and how they should be implemented.

Zope uses something called "Structured Text" that is, in my opinion, more broken than most. It spends a lot of effort solving the wrong sorts of problems. At the end, I'll try to distill some opinions about what a "good" solution should include.

Let's begin with some of what is done right.

Underscore-text-underscore, as in _like so_, results in an underline.

Likewise, asterisk-text-asterisk, as in *like this*, results in italics (emphasis). By the standard (UseNet) conventions this would be read as bold, with slash-text-slash for italics. Here, doubled asterisks around the text (**like this**) give bold instead. So the usual effects are all there; they're just not achieved through the usual conventions.
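To make these rules concrete, here is a minimal Python sketch of the three emphasis conventions as described above. It's an illustration under my own assumptions (regular expressions, HTML output tags, no escaping of HTML characters), not Zope's actual implementation.

```python
import re

# Order matters: **strong** is handled before *em*, so a pair of doubled
# asterisks is not consumed as two single-asterisk spans.
_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"_(.+?)_"), r"<u>\1</u>"),
]

def inline_markup(text: str) -> str:
    """Apply the emphasis rules, in order, to one paragraph of text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

A rule this simple also turns snake_case_name into "snake<u>case</u>name", which is exactly the kind of surprise a writer doesn't want.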

Blank lines set off paragraphs. This is one of the more important whitespace features. Further, line breaks within a paragraph do nothing strange or annoying: the lines are all smushed back into one paragraph. Those of us who are in the habit of hitting return at the end of each line, instead of just typing and letting line-wrap take care of it, are not unduly discomfited (as we are in some systems).
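Here is a short Python sketch of that paragraph rule. It assumes plain text input and ignores every other markup rule. It splits the text on blank lines, including lines that hold only whitespace, then joins each paragraph's lines with single spaces.

```python
import re

def split_paragraphs(source: str) -> list[str]:
    """Split on blank lines; join the lines inside each paragraph with spaces."""
    blocks = re.split(r"\n\s*\n", source.strip())
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
        if block.strip()  # skip empty input rather than returning [""]
    ]
```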

So, now on to what isn't quite so nice.

First, the use of single-quotes to delimit "quoted" text. Writers commonly use single-quotes (apostrophes) to quote something, so this counts as a misfeature. It takes a common device away from the writer (can you say "scare quotes"?) without an obvious recourse. (It isn't even consistent: in some situations, phrases set off by single-quotes come through untouched. I haven't quite figured out how that works just yet.)

Second, there's the matter of brackets. Any number of brackets around a single word makes it a link, but brackets around a phrase do not. Thus, something like [[link]] becomes a link, while [no link] does not. Most of the other wiki-text schemes realized the importance of providing a consistent mechanism. The ones I have used treat a doubled bracket as an escape for the leading bracket, so the user can put square brackets around single words without inadvertently creating a link.
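For comparison, here is a Python sketch of the consistent mechanism described above. A bracketed single word becomes a link, a bracketed phrase is left alone, and a doubled "[[" is always an escape for one literal bracket. This is the convention from other wiki schemes, not how Structured Text behaves. The href scheme, which just uses the word itself, is a made-up assumption.

```python
import re

# Either an escaped bracket ("[[") or a single bracketed word.
_TOKEN = re.compile(r"\[\[|\[(\w+)\]")

def render_links(text: str) -> str:
    def replace(match: re.Match) -> str:
        if match.group(0) == "[[":
            return "["  # escape: emit one literal bracket, never a link
        word = match.group(1)
        return f'<a href="{word}">{word}</a>'  # hypothetical href scheme
    return _TOKEN.sub(replace, text)
```

With this rule, "[[literal]" always prints as "[literal]", so the writer can bracket a single word without creating a link.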

Readers who see a link should be expected to click on it. Inadvertent links only serve to confuse and annoy.

Third, there's the double-dash issue. I find ASCII text much easier to read when dash-dash has a space on each side. Without the spaces, it doesn't indicate the break properly (it looks like a hyphenation typo), and the reader (me) pauses to puzzle out the meaning. With only a leading or trailing space, it looks unbalanced, and again the reader is left to puzzle out what the asymmetric usage implies. And if there are spaces on both sides of the dash-dash, the Structured Text convention does something rather funky (and inserts a newline in the process).

Fourth, unordered bullets. Unordered bullets are standard (dash, lowercase-Oh, and asterisk, at the start of a line), although it might be nice if different signifiers would indicate different sorts of bullets. If I'm mixing asterisks and dashes, it is probably because I want the bullets to be distinguishable.

Sub-bullets are indicated by indentation, but the sub-bullets are identical to the bullets above them. There is no visual indication for the reader, aside from a few pixels. If the items are more than short phrases (that is, if they contain actual information), the reader is SOL and will have to work unduly hard to avoid getting lost. Readers are apt to get lost in complicated, content-heavy lists even when each level has its own bullet; identical bullets only make it worse.
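Here is a Python sketch of what distinguishable bullets could look like. It gives each marker the writer typed its own CSS list-style-type. The particular mapping and the flat, unnested list are arbitrary assumptions for illustration. Nested levels could use the same idea.

```python
# Hypothetical mapping from the writer's bullet character to a CSS
# list-style-type, so mixed markers stay visually distinct.
_STYLE = {"*": "disc", "o": "circle", "-": "square"}

def render_bullets(lines: list[str]) -> str:
    """Render a flat bulleted list, keeping each item's marker distinct."""
    items = []
    for line in lines:
        marker, _, text = line.strip().partition(" ")
        style = _STYLE.get(marker)
        if style is None:
            raise ValueError(f"not a bullet item: {line!r}")
        items.append(f'<li style="list-style-type: {style}">{text}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```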

Fifth, ordered bullets. Ordered bullets are indicated by a number at the start of the line (paragraph), with an optional period. However, the actual number is ignored and replaced with the item's position in the list. This behavior is broken enough to make using any digit other than 0 senseless. Just start your ordered list with "0." and have every element use "0." as the bullet. Assigning numbers would only be confusing, and would lead you to refer to "point three", which might well have the bullet "4." in front of it.

If you're going to use numbers to indicate ordered bullets, respect the numbers. If you're not, choose something else. Using # as a bullet would have been an ideal choice, as it indicates the bullet will be replaced with a number (but it isn't specific), and it follows the pattern laid out by the unordered bullets. You still wouldn't get to choose where your bullets started counting from, but you wouldn't expect to either.
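Here is a Python sketch of the rule I'm arguing for. The item syntax is my own assumption. An explicit number is kept, using HTML's value attribute on the list item. A "#" item just takes the next number.

```python
import re

# Assumed item syntax: "12. text", "12 text", or "# text" at the start of a line.
_ITEM = re.compile(r"^(?:(\d+)\.?|#)\s+(.*)$")

def render_ordered(lines: list[str]) -> str:
    """Render one ordered list, keeping any number the writer typed."""
    parts = ["<ol>"]
    for line in lines:
        match = _ITEM.match(line.strip())
        if match is None:
            raise ValueError(f"not an ordered-list item: {line!r}")
        number, text = match.groups()
        # An explicit number becomes the item's value; a "#" item gets no
        # value attribute, so the browser keeps counting from the previous one.
        attr = f' value="{number}"' if number is not None else ""
        parts.append(f"  <li{attr}>{text}</li>")
    parts.append("</ol>")
    return "\n".join(parts)
```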

In conclusion, it's my opinion that the "Structured Text" format is hopelessly broken. It does not constitute a significant advance over the many other wiki-formats overall, although it does make minor advances in limited areas.

So what constitutes a "good" solution?

  • Do not surprise the writer.

    The writer should be able to write basic text without having to worry that it will result in very strange output. Age-old standard traditions should be respected when possible (-, *, and o for bullets; bracketing _s for underlining; bracketing *s for bold; bracketing /s for italics; etc.).

    The writer should be able to use standard means of expression - scare quotes, em-dashes, and suchlike - without having to adopt a totally alien style of writing. We already have that alien style of writing: it's called HTML.

  • Do not provide features that might confuse the reader.

    Do not discard information provided by the writer that the reader might need to avoid confusion. Do not generate output that obscures the intent of the writer.

    (Yes, this item is about brackets and bullets.)

  • Provide escape mechanisms.

    The writer should be able to display any sequence of ASCII characters in the final product. It is insane to provide a mechanism that is less flexible than straight ASCII text. The escape mechanisms should be simple, clear, and consistent. UNIX adopted the backslash as an escape character; a lot of the wiki-languages double up special characters (such as brackets) to escape them.

    It doesn't really matter which mechanism is used, but one should be provided.

  • Strive for simplicity.

    A structured text format should strive to exceed ASCII in ability, not rival HTML in complexity. The reason we want these sorts of mini-markup languages is so that we can quickly (thus, wiki) and easily create content by just typing.

    A good rule of thumb is that you should be able to print out the raw source, and hand it to someone totally unfamiliar with the format, and they should be able to read and appreciate the content without getting confused by the markup.
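An escape mechanism doesn't need much. Here is a Python sketch of the UNIX-style backslash approach from the escape-mechanisms item, with only the *emphasis* rule implemented. It matches escapes and markup in a single pass, so an escaped asterisk can never start or end an emphasized span. A real format would add its other rules to the same pattern. The function name and output tags are assumptions.

```python
import re

# One pass over the text: either an escaped character (backslash plus any
# character except newline) or an *emphasized* span.
_TOKEN = re.compile(r"\\(.)|\*([^*]+)\*")

def render(text: str) -> str:
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            return match.group(1)  # escaped: emit the character literally
        return f"<em>{match.group(2)}</em>"
    return _TOKEN.sub(replace, text)
```

A literal backslash is written as two backslashes, so every ASCII sequence can still be expressed.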


I've learned a couple more "features" that I need to incorporate into this document. One is that Structured Text uses (a subset of?) raw HTML as an "escape" mechanism; another is that some features can be negated.
