(c) 2002 Toomas Karmo. You are welcome to distribute copies of this essay either in a paper printing not exceeding 10 copies or electronically, without asking my explicit permission, provided you do not modify or omit any of the contents, including this paragraph. Please contact me if you wish to use this material for other purposes, such as magazine publication. Network location of authoritative file version: http://www.interlog.com/~verbum/, literary-pages section. Revision history: 20021026T162844Z/version 0001.0000.

Web Development:
Current Good Practices

Five good practices need to be promoted now, if we Web developers are to unleash the full power of our technology:

Formal validation of HTML files begins with the machine-directed declaration of an HTML standard at the beginning of each file, explicitly telling servers and browsers not only that the given file is an HTML document, but also to which Document Type Definition (DTD) the document is intended to conform. At present (late in 2002), it suffices to choose XHTML 1.0 Strict - a standard that is essentially HTML 4.01 Strict reformulated in XML, element-for-element identical apart from its prefatory lines and XML's stricter rules of syntax. Specifying HTML 4.01 Transitional, or some earlier HTML, such as HTML 3.2, is not quite ideal, since the World Wide Web Consortium (W3C) has for good reasons deprecated some of the 1990s tags, including the once-ubiquitous <FONT>, and is moving toward abandoning them altogether. While intelligently written XHTML 1.0 Strict is backward-compatible, yielding acceptable results even in older browsers, the old HTML versions are not guaranteed to remain forever compatible with browsers tailored to future (X)HTML standards.

This means, in particular, that if we are generating our HTML with page-coding software, rather than with an all-purpose editor such as Unix emacs or Unix vi, we need to ensure correct configuration of the software: the software must write an incantation like <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> (with some further prefatory material specifying an XML namespace).
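
For concreteness, here is a minimal skeleton of such a file, including the namespace-specifying prefatory material; the title and body text are placeholders of my own, not anything W3C prescribes:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <!-- the xmlns attribute on html is the XML-namespace declaration -->
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Placeholder title</title>
      </head>
      <body>
        <p>Placeholder paragraph.</p>
      </body>
    </html>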

In striving to write strictly correct HTML code, we need a book explaining the full theory of the tag structures. The job is done well by the fourth edition of Chuck Musciano and Bill Kennedy's HTML & XHTML: The Definitive Guide (O'Reilly, 2000).

With our HTML duly coded, we need to check validity. If we have uploaded our page to some public server, we can supply the URL to the online validity-checker at W3C, using self-evident links from http://www.w3c.org. Alternatively, if we are running the Opera browser - available in both payware and no-functions-lost freeware versions from http://www.opera.com - we can right-click in the displayed page (under Linux, with the third mouse button) and then select the menu items "Frame -> Validate source...". The latter strategy has the advantage of working even if we are viewing the page on our own local workstation, before uploading it to a public server. In both cases, we shall either be told that the page conforms to our declared DTD or be told which specific tags are nonconformant.

A check of well-known pages on the Web reveals astonishing quantities of illegal HTML. A pleasant occasional five-minute recreation for the publishing professional, and one which benefits the wider community, is the checking of well-known pages with a validating tool. If, as is almost always the case, we find errors, we can send a quick, friendly mail to the webmaster, perhaps a little pointedly praising the efforts of W3C in bringing law and order to the Internet!

Further, we do well to place the W3C valid-HTML badge at the foot of our page, hyperlinked to the Consortium's validating parser. Displaying the badge declares openly that our page is legally tagged, and so makes it evident to everyone that our design ideas are worth cloning. Moreover, making the badge a hyperlink to the validating parser means that we ourselves, and the commercial client buying pages from us, and the properly skeptical member of the surfing public, have only to click on the badge to test future uploaded versions of the page for validity. (There are pretty well bound to be future versions, if only because text gets revised as fine details in the underlying journalistic message change. Murphy's Law being what it is, we may well expect that we, or the people to whom we have sold our services as an Internet consultant, or the unknown third parties who have cloned our design ideas, will someday misplace an HTML tag in the act of revising, and so will inadvertently upload an illegally tagged, seemingly good-looking, page to the server.)
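
One common form of the badge markup - I hedge on the exact icon address, since W3C may rearrange its files, but the referer-style validator URL shown here is the one the Consortium suggests for this purpose - runs roughly as follows:

    <p>
      <a href="http://validator.w3.org/check?uri=referer">
        <img src="http://www.w3.org/Icons/valid-xhtml10"
             alt="[Valid XHTML 1.0!]" height="31" width="88" />
      </a>
    </p>

The check?uri=referer query string asks the validator to examine whatever page the reader has just clicked away from, so the same markup keeps working, unchanged, as the page itself gets revised.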

Separation of content from presentation primarily involves creating an entire HTML page as content, without presentational markup, and using a separate Cascading Style Sheet file to guide the browser in applying appropriate fonts, background colours, and similar typographic decoration. Here again, an O'Reilly paperback from the year 2000, this time Eric A. Meyer's Cascading Style Sheets: The Definitive Guide, gives a thorough briefing on the underlying theory. When the job is done correctly, the HTML file itself has no presentational elements at all - not even the <TABLE> tag so heavily used by 1990s designers, in texts without true tables, to shoehorn page elements into the right spots. (At any rate, my own design philosophy eschews tables. Others may feel differently. I take this element in my philosophy from CSS evangelist Dominique Hazaël-Massieux, who supplies a do-it-without-any-tables tutorial at http://www.w3.org/2002/03/csslayout-howto.)
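
A schematic illustration, with file names, class name, and colour values that are merely my own placeholders: the HTML carries nothing but structure and content, and the typographic decisions live in the separate stylesheet.

    <!-- article.html: structure and content only -->
    <h1>Autumn pruning of apple trees</h1>
    <p class="standfirst">Prune in the dormant season, not in the
       full vigour of summer.</p>

    /* article.css: presentation only */
    h1            { color: #336699; font-family: Georgia, serif; }
    p.standfirst  { font-style: italic; font-size: 110%; }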

Admittedly, many browsers do not support CSS adequately. My reading and experiments suggest that it is best not to expect the Netscape 4.x family to cope with CSS. We can make Netscape 4.x unaware that CSS styling has been applied, while drawing the CSS to the attention of Internet Explorer 4.0 and above, Opera, Mozilla, Netscape 6.x, Netscape 7.x, and the like: Netscape 4.x is conveniently blind to the invocation <style type='text/css'> @import url('foobar.css'); </style>. If we want to be more sophisticated (I haven't been there yet, but perhaps I ought to go), we can reveal selected aspects of CSS for the benefit of Netscape 4.x, while concealing others. Details are available from http://www.ericmeyeroncss.com/bonus/trick-hide.html.
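
A sketch of that more sophisticated arrangement, which I have not myself deployed, and in which the file names are purely illustrative: a modest stylesheet is offered to every browser through an ordinary link element, while the ambitious rules sit behind an @import that Netscape 4.x cannot see.

    <link rel="stylesheet" type="text/css" href="basic.css" />
    <style type="text/css">
      /* Netscape 4.x ignores @import; only newer browsers fetch this file */
      @import url('full.css');
    </style>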

Further, both my reading and my experiments indicate serious problems with at least the earliest of the Internet Explorer 5.x family. But here the problems are not so severe as to preclude applying CSS. It is enough to be wary, paying the painful price of suppressing display of bullets in unordered lists, and bracing oneself for the possible need to render paragraphs without indention.

The ultimate response to the diverse insectarium of browser bugs is the incorporation of a JavaScript sniffer in one's HTML code, so as to detect the manufacturer and version number of each server-interrogating browser, and to serve out one or another CSS stylesheet as appropriate in the given case. (It's not, alas, an approach I know anything about.) In particular, one could, while serving substantial CSS to Explorer 5.x, serve not only bulleted lists, but traditional fine book typography, to all and only the CSS-aware browsers identifying themselves to the sniffer as superior to Explorer 5.x. Fine typography might include the following: no extra interparagraph whitespace; the first line of each paragraph indented by a single em, except where the paragraph opens a section; the first line of each section-opening paragraph set flush left; some or all section openings embellished with a large initial cap, the rest of that line being further embellished by rendering in small caps. (Maybe I'll get that far next year with these pages, maybe not.)
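
By way of a sketch - assuming, as nothing in my own pages yet guarantees, that section headings are tagged h2 - the fine-typography stylesheet served to the better browsers might contain rules such as these:

    p {
      margin: 0;                 /* no extra interparagraph whitespace    */
      text-indent: 1em;          /* first line indented by a single em    */
    }
    h2 + p {
      text-indent: 0;            /* section-opening paragraph flush left  */
    }
    h2 + p:first-letter {
      float: left;               /* large initial cap on section openings */
      font-size: 200%;
    }
    h2 + p:first-line {
      font-variant: small-caps;  /* rest of that opening line in small caps */
    }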

CSS, once we get it to work, delivers an immediate benefit, long before we need its well-known advantages in text-to-speech browsing and handheld-device (PDA) display. Since CSS allows presentation directives to reside in a file separate from all our HTML files, we can adjust such a detail as headline colour, in a two-hundred-page site with a thousand headlines, by changing a single stylesheet line. That, in turn, makes us bolder in experimenting with the look of our pages, and so is in the long run liable to improve our professional taste in typography.

Not only the HTML code in a page, but also the accompanying CSS, needs validation. In this case, too, a validating tool, with a validity-proclaiming badge, is available at http://www.w3c.org.

Primarily, then, separating content from presentation means separating plain HTML from CSS. Secondarily, it means designing even the plain HTML to meet the needs of all readers - so as to accommodate not only those using text-to-speech browsers (certainly the blind; possibly, too, the eventual purchasers of some "Web Walkman"), but also, more subtly, those using the emerging low-graphics wireless text-display devices, such as networked PDAs or Web-connected cell phones.

In the best of all possible worlds, we Web developers would run some such tool as the IBM Home Page Reader (HPR) synthesized-voice browser, for checking that our pages meet even the most exacting requirements of text-only publication. Failing that, we can at least check our work in the text-only visual browser lynx. Here's how lynx looks on my own (Mandrake Linux) workstation:

[Screenshot of the lynx browser]

Reproduced here is a small piece of one screen from my four-screen environment, showing at this particular instant in the workflow the bottom edge of a (time-server-synchronized) clock, the bottom edge of the system-monitoring xosview tool, the left edge of a home-made address book maintained with the plain-text vi editor (why waste time on address-book bloatware?), a portion of an xpdf viewer, two minuscule xterm chunks, a tiny chunk of the navy-blue desktop surface, and pretty well all of lynx.

In this particular case, lynx has been launched in a tiny-font xterm, with the command-line invocation wea, so as to bring up the Toronto weather forecast at or very near the Universal Coordinated Time 20021026T153417Z. (wea is my own shorthand, strictly a "bash shell alias", implemented with the .bashrc configuration-file line alias 'wea=lynx -term=vt100 http://weatheroffice.ec.gc.ca/foo/bar' for an appropriate foo/bar. The shorthand is useful when we want the weather forecast in a hurry, and do not wish to take the several seconds needed for mousing to, and activating, a weather bookmark in a conventional browser such as Netscape or Opera.)

If we do not have lynx installed on our own workstation, we can find a public-access client by giving Google the search string public lynx.

If you are reading this present page in a conventional browser with image downloading enabled, you will find a pair of adjacent clickable icons at the bottom of the page, leading to two different validation engines at W3C. Following my own exhortation to make pages display cleanly in text-only browsers, I have chosen to code the HTML for the two icons with alt attributes on the img tags, with square brackets in the attribute text: alt="[Valid XHTML 1.0!]" for the first image, and alt="[Valid CSS!]" for the second. This pair of decisions has the consequence that a text-only browser displays [Valid XHTML 1.0!] [Valid CSS!] - or, conceivably, in the worst possible case, the no-intervening-whitespace [Valid XHTML 1.0!][Valid CSS!] - but with no risk of displaying the repellent Valid XHTML 1.0!Valid CSS!. In addition to being effective separators, the square brackets reinforce the suggestion of clickability already incorporated into the standard lynx hyperlink look-and-feel.

The idea of using square brackets comes from an excellent essay on the alt= attribute by computing-for-physics specialist A.J. Flavell of Glasgow University. (The essay was originally published at http://ppewww.ph.gla.ac.uk/%7Eflavell/alt/alt-text.html, was subsequently published in an authorized reprint at http://www.htmlhelp.com, and was also misappropriated elsewhere - in at least one case with an assertion of copyright! - by third parties on the Web.) Flavell stresses that alt= is normally to be used not to supply a textual description of an image, but, rather, to supply text which in a lynx-like environment serves the same function as the corresponding image serves in a conventional browser.

Among Flavell's examples of what can go wrong when his precept is ignored are the lynx-style rendering "Small red bullet Response to Terrorism" for a page at the American Embassy in Belgrade (what was needed was not alt="Small red bullet", but alt="*") and the splendidly sinister Britishism "Our Classroom and Staff fancy horizontal rule" (what was needed was not alt="fancy horizontal rule", but alt="________").

In many cases, an image is purely decorative, and so needs to be dropped without any comment at all in lynx. In such a case, the correct code is not, as it were, alt="decorative image of a fancy open book on an oak table", but merely alt="". (Why not leave out alt= altogether? That minimalist tactic, apart from being illegal in at least XHTML 1.0 Strict, may cause a text-only browser to mark the place of the image with a string such as IMG. Such a string will lead some readers to worry about the possible loss of editorially significant visual content.)
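
For a hypothetical ornament, then (the file name is of course only illustrative):

    <!-- purely decorative: the empty alt lets lynx drop the image silently -->
    <img src="ornament-open-book.png" alt="" width="120" height="60" />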

Manufacturers of conventional browsers tend to render alt= as mouseover text, in the style of "Tooltips" in a Microsoft application. This is contrary to the actual W3C intent of the alt= construction. (It is, on the other hand, an eminently appropriate use of the attribute title=, permitted with the img tag in at least the HTML 4.01 and XHTML 1.0 family of standards.)
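
A fragment showing the intended division of labour (the file name and wording are mine, purely for illustration): alt= carries the text that does the image's job in a text-only browser, while title= carries the advisory, tooltip-style text.

    <img src="redball.gif" alt="*" title="Section marker" />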

Not only the alt= text on img tags, but also tabular matter, used to present problems for nonstandard browsers. Such matter is for two reasons less problematic nowadays: on the one hand, current Web development best practice (arguably) uses CSS in place of <TABLE> to shoehorn the images on an elaborate page into the places required by the full-dress visual presentation; and, on the other hand, lynx (as one browser, at least, in the text-only family) has (at least) a partial understanding of <TABLE>. Still, problems may remain. A useful resource, updated as recently as Universal Coordinated Time 20020517T121011Z, is A.J. Flavell's "TABLES on non-TABLE Browsers" (http://ppewww.ph.gla.ac.uk/~flavell/www/tablejob.html).
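
A minimal sketch of the CSS-instead-of-TABLE approach to layout, with id names and measurements that are merely illustrative:

    <div id="nav">...navigation links...</div>
    <div id="main">...principal text and images...</div>

    /* in the stylesheet: a left-hand column without any TABLE markup */
    #nav  { float: left; width: 12em; }
    #main { margin-left: 14em; }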

Adoption of disciplined information architectures for collections of Web pages makes it easy for the surfer to find information. In the most interesting phase of my 1990s career as a part-time Web project manager, I worked hard with two talented designers, Tim Jancelewicz and David McCarthy, on the design of a technical site for a Government of Greenland agency. How could field engineers be sure to lay their hands quickly on mineral-blasting permits and similar agency documents? How could the site create a uniform surfing experience for readers of English, Greenlandic, and Danish, even given that the content in the three languages was not always identical-save-for-translation? After much thought, we evolved some principles (I will here disregard our special ideas for accommodating three languages) which I use today on my own http://www.interlog.com/~verbum/.

Tight integration of Web and press publishing consists in allowing authors to write just once, and yet to publish both to the PDF viewer (or to the PostScript viewer, or to the offset press) and to the HTML browser.

I first explored Web-and-press integration in 1999, coding much of a 700-page Estonian-language book draft in SGML. In those days, the preferred markup language was SGML-based DocBook, and OpenJade the preferred tool for converting DocBook into the finished presentation. A tiny Linux bash script conveniently drove my Jade and a couple of pieces of auxiliary open-source software, generating, from the DocBook source code, a set of interlinked HTML pages on the one hand, and on the other hand a PostScript file ready for a PDF distiller.

The same technology has been used by document engineers in Paris, at least in 2001 or 2002, to produce the "User" and "Reference" manuals for Mandrake Linux.

Underlying the creation of a presentation for Web or press from DocBook source code via Jade is a stylesheet file in Document Style Semantics and Specification Language (DSSSL). Awkwardly, however, DSSSL syntax is alien to SGML, deriving instead from the 1960s artificial-intelligence language LISP, and therefore ultimately from the 1930s "lambda calculus" of American mathematical logician Alonzo Church. With the advent in the late 1990s of the machine-streamlined SGML subset XML, with the reformulation of DocBook markup in XML, and with the rise of XSL Formatting Objects (XSL-FO), a new Web-and-press integration technology has emerged.

Under this new technology, people continue to author in DocBook (admittedly, now in its XML, not its SGML, flavour). In place of Jade, however, they now run so-called XSL Transformations (XSLT) against the DocBook source file, with some such tool as the Xalan XSLT processor. One invocation of XSLT generates a set of XHTML pages. Another invocation generates what is essentially high-level printing-press markup, in XSL-FO. Finally, an invocation of some such tool as the Apache XML Project's open-source FOP generates PDF, ready for shipping to the lithographic-plate crew at the press house, from the XSL-FO. It's a technology I have not used yet, but which I look forward to exploring over the coming months.
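
A DocBook XML source file - the single input from which both the XHTML-generating and the FO-generating transformations work - might begin like this, with placeholder titles and text of my own:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
    <article>
      <title>Placeholder document title</title>
      <sect1>
        <title>Placeholder section title</title>
        <para>Placeholder paragraph: written once, published
          both to the press and to the Web.</para>
      </sect1>
    </article>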

At the moment, I've just begun investigating an XML alternative to the DocBook DTD, the Text Encoding Initiative (TEI; http://www.tei-c.org/). TEI started in 1987 in SGML, but has since diversified into XML. Whereas DocBook is the appropriate DTD framework for computer manuals and similar technical documents, TEI is suitable for the humanities. (TEI has applied its accumulated wisdom even to the more challenging problems in humanities scholarship, such as the analysis of medieval manuscripts. Gratifyingly, among the four universities guiding TEI is Oxford, a longstanding citadel of lexicography and philology.) Like DocBook, TEI is evidently an appropriate archiving technology for documents that get rendered via XSLT into printing-press and HTML publications.

Participation in the public Web-accessibility initiatives begins with an awareness of the work of the W3C (http://www.w3c.org). W3C was founded in 1994 by Tim Berners-Lee, the physicist who, while at CERN, devised the HTML-powered Web, HTML itself being an application of SGML. Today, W3C has more than 450 member organizations. The universities in the guiding-and-facilitating core include the Massachusetts Institute of Technology. It is W3C that promulgates the formal standards, such as XHTML 1.0 Strict, against which we validate our code.

Not only, however, does W3C construct formal Web engineering standards: the Consortium has now also launched the Web Accessibility Initiative (WAI). WAI promulgates a set of Web Content Accessibility Guidelines (WCAG), with prioritized quality-control "checkpoints". WAI and WCAG meet the concerns of Section 508 in the Rehabilitation Act (USA) of 1973, and so supply an appropriate framework for those Web developers seeking government contracts. (Here in Canada, for instance, the Treasury Board has issued the directive, "All GoC Web sites must comply with W3C Priority 1 and Priority 2 checkpoints . . . " It is no doubt thanks to that directive that the Environment Canada weather report proves readable when I hastily launch lynx in a Linux xterm by typing the command wea.)

As W3C supplies badges for declaring one's compliance with XHTML and CSS standards, so also it supplies a badge for asserting compliance with WAI-WCAG. Although I have not yet subjected my site to the WAI-WCAG checkpoint list, I hope to do so.

Perhaps second only in importance to WAI-WCAG is the grassroots "Web Standards Project", at http://www.webstandards.org/. Here we find an impressive combination of impassioned advocacy with the same clearheaded websmithing praxis that is the hallmark of http://www.w3c.org/ - pages implemented in CSS, with elegant typography, and maintained with fine engineering. (The site's "Colophon" page gives the essentials of the engineering, in terms which show how much I, for one, need to learn, most notably about the Web-maintenance potential of Perl:

The site is written in XHTML Strict with Cascading Style Sheets (CSS). Several Perl scripts were used to create the site's directory structure and do the dirty work of copying templates and includes to their proper places before we could begin to populate the site with content. . . . The site is served by the Apache Web server, on a handmade Pentium III/400 running Linux. . . . We use the Concurrent Versions System, as well as various other CVS clients, to allow the geographically disparate members of the group to edit and manage the site.)

Among the achievements of the Web Standards Project is the creation, in 2001, of a Macromedia Dreamweaver Taskforce. Thanks in part to the Taskforce, the Macromedia Dreamweaver MX which was released to the Web-developer community in May of 2002 is substantially more standards-aware than its predecessors.

Also eminently worth mentioning is the Web Design Group (WDG), which has for some years promoted the creation of "non-browser specific, non-resolution specific" pages. The WDG site, http://www.htmlhelp.com/, offers tools, including a link-checker, and a CSS validator that proves somewhat more user-friendly than the corresponding tool at W3C.

A lone crusader for Web usability and standards compliance, but now with allies publishing in 30 languages, from Afrikaans to Vietnamese, is the young American Web developer Cari D. Burstein, at http://www.anybrowser.org/.

Among the books on conservative design philosophy is the deeply thoughtful Web Style Guide, created by specialists in medical publishing at Yale. The hoary first edition (tailored to Netscape 2.x and Explorer 2.x, but nevertheless of continued utility on many points - especially, perhaps, on the eternally vexed question of browser-window widths) is still available at http://info.med.yale.edu/caim/manual/contents.html. The second edition (which I have not seen) can now be had from Amazon.

Finally, we remark that on the topic of Web accessibility, as on most topics, the volunteer editors at "Directory Mozilla", or the Open Directory Project, have amassed annotated bibliographies. (ODP can be browsed either as a downstream resource, through the Google "Directory" facility, or directly, at http://www.dmoz.org/.) At or very near Universal Coordinated Time 20021026T152837Z, two categories to check were Top: Computers: Software: Internet: Clients: WWW: Browsers: Accessibility and Top: Computers: Internet: Web Design and Development: Web Usability.

If you share my view that good Web engineering matters, then do get in touch. The most efficient means of communication is an e-mail to verbum@interlog.com, with a subject header incorporating the phrase "HTML good practices".