XHTML Details

XHTML is an official recommendation of the W3C, published in 2000 and revised in August 2002. The most recently revised official copy is always at http://www.w3.org/TR/xhtml1/, and this document discusses XHTML 1.0. XHTML is the technology that brings the power of XML to the web and represents the current standard for how web documents are intended to be written. As with all Internet standards, there is no formal enforcement of this standard. Rather, it is enforced informally by the collective will of all those who participate in the Internet, in order to ensure the interoperability and accessibility that have established it as the economic and social force that it is.

What is XHTML?

Simply put, XHTML is the HTML 4.01 standard, updated and slightly reformulated to fit the syntax and semantics of XML document construction. In large part, the differences are covered by the standard, which reads something like "XHTML is HTML 4.01 with the deprecated tags removed". This document focuses on the XHTML 1.0 Strict standard; however, XHTML 1.0 defines two others, XHTML 1.0 Transitional and XHTML 1.0 Frameset. These exist to provide backward support for older browsers and to enable frames, respectively. Frames are so strongly deprecated that an entirely separate variant was established for documents that use them, rather than pollute the designated backward-compatible variant (Transitional).
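For reference, a document selects one of the three variants through its document type declaration. The listing below shows the three declarations side by side, using the public identifiers and DTD locations given in the XHTML 1.0 recommendation. It is a comparison listing rather than a single file: a real document contains exactly one of these declarations.

```xml
<!-- XHTML 1.0 Strict: no deprecated or frame markup -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- XHTML 1.0 Transitional: Strict plus the deprecated markup -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- XHTML 1.0 Frameset: for documents that divide the window into frames -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```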

The existence of backward-compatible versions of XHTML that permit deprecated and outright wrong features (e.g. bgcolor and frameset) is not an invitation to create new content with them. Rather, they exist to help existing sites and services transition to the new technology. Therefore, unless you are editing an existing web page too complicated to simply rewrite, or targeting a closed, captive audience with administrative policies that preclude viewing XHTML, there is no legitimate reason to use the Transitional or Frameset versions of the standard.

Document basics

A minimal legal XHTML document contains the XML declaration, the document type declaration, and only the required tags. Strictly speaking, the XML declaration may be omitted when the encoding is UTF-8 or UTF-16, but the standard strongly encourages including it. This file is shown below. If displayed in a browser, a blank page would appear with only the title in the title bar, or whatever mechanism the browser uses to display the title.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>A minimal XHTML file</title>
</head>
<body>
</body>
</html>

The <body> portion of the document would be populated with the actual content of the page, but as the XHTML standard shows, there is no required content for a page, only a title.
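As a sketch of how the minimal file grows into a real page, the version below adds a heading and a paragraph to the body. The title and the text are placeholders invented for illustration; everything else matches the minimal file.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>A small XHTML page</title>
</head>
<body>
  <!-- In the Strict variant, text in the body sits inside block-level tags -->
  <h1>Hello</h1>
  <p>This paragraph is the only content of the page.</p>
</body>
</html>
```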

Elements

An XHTML document, like XML documents in general, consists of a series of nested elements. (XML reserves the word "entity" for something else: named references such as &amp;.) An element is the most basic object in an XHTML document and consists of a start-tag, an end-tag, all of the content between them, and the attributes in the start-tag.

Tags

XHTML tags consist of a name (composed of alphanumeric characters and a select few punctuation characters) and wrap other content. The name appears in a start-tag and an end-tag: the start-tag is <tag> and the end-tag is </tag>. Everything between a matched pair of tags is the content of that tag. Tags may have other tags nested within them, but they must not overlap: an inner tag must be closed before the tag that contains it.
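The fragment below is a small illustration of matched and nested tags, meant to be placed inside the <body> of a document. It also shows two XHTML rules worth knowing: tag names are written in lowercase, and a tag with no content (such as <br>) is closed in place with a trailing slash. The sample text is invented.

```xml
<!-- Correct: <em> opens and closes inside <p> -->
<p>This paragraph contains <em>emphasized</em> text.</p>

<!-- A tag with no content is written in the empty form -->
<p>First line<br />second line</p>

<!-- Incorrect (not well formed): the tags overlap
     <p>This is <em>wrong.</p></em>
-->
```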

Attributes

Elements also have attributes, which appear in the start-tag as a series of key-value pairs. The syntax is key="value", and the name of a key can be any valid XML name (again, alphanumeric characters and a few punctuation characters).
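A short illustration of the key="value" syntax follows, again as a fragment for the <body>. In XHTML every attribute value must be quoted, and a value can never be left out. The id value html4-link is a made-up name used only for this example.

```xml
<p>
  <!-- One attribute: href is the key, the URL is the value -->
  <a href="http://www.w3.org/TR/xhtml1/">XHTML 1.0 Standard</a>

  <!-- Several attributes, separated by whitespace -->
  <a href="http://www.w3.org/TR/html4/" title="HTML 4.01 Standard"
     id="html4-link">HTML 4.01</a>
</p>

<!-- Incorrect in XHTML: the value is not quoted
     <a href=http://www.w3.org/TR/html4/>HTML 4.01</a>
-->
```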

References

These documents define what XHTML is and how it operates. Reading them is a required part of understanding XHTML.

HTML 4.01 Standard
http://www.w3.org/TR/html4/
XHTML 1.0 Standard
http://www.w3.org/TR/xhtml1/

Common Attributes

There are several attributes which are common to nearly every tag in the XHTML DTD.

id     document-wide unique id
class  space-separated list of classes
style  associated style info
title  advisory title/amplification
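As a minimal sketch, the paragraph below carries all four of these attributes at once. The id, the class names, and the text are hypothetical; the style value is ordinary inline CSS.

```xml
<p id="intro" class="summary highlighted" style="color: navy;"
   title="Introductory paragraph">
  This paragraph has a unique id, belongs to two classes,
  carries inline style information, and has an advisory title.
</p>
```

Most browsers show the title value as a tooltip, though the standard does not require any particular presentation.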

Internationalization (%i18n;)

lang      language code (backwards compatible)
xml:lang  language code (as per XML 1.0 spec)
dir       direction for weak/neutral text
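The fragment below sketches how these attributes might be used together. The XHTML 1.0 compatibility guidelines suggest giving both lang and xml:lang the same value so that HTML and XML processors agree; dir takes either ltr or rtl. The sample sentences are invented.

```xml
<!-- A French paragraph; both language attributes carry the same code -->
<p lang="fr" xml:lang="fr">Bonjour tout le monde.</p>

<!-- A Hebrew paragraph, laid out right to left -->
<p lang="he" xml:lang="he" dir="rtl">שלום עולם</p>
```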

Scripting event attributes (%events;)

onclick      a pointer button was clicked
ondblclick   a pointer button was double clicked
onmousedown  a pointer button was pressed down
onmouseup    a pointer button was released
onmouseover  a pointer was moved onto the element
onmousemove  a pointer was moved within the element
onmouseout   a pointer was moved away from the element
onkeypress   a key was pressed and released
onkeydown    a key was pressed down
onkeyup      a key was released
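The value of an event attribute is a piece of script that the browser runs when the event occurs. The sketch below uses three of the attributes listed above and assumes a browser with JavaScript enabled; the messages and the color are arbitrary choices for illustration.

```xml
<p onclick="alert('The paragraph was clicked.');"
   onmousedown="this.style.color = 'red';"
   onmouseup="this.style.color = '';">
  Press and release the mouse button over this paragraph.
</p>
```

Note that the script uses single quotes for its own strings, because the attribute value as a whole is already wrapped in double quotes.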

Structural Tags

The structural tags are generally invisible to the end user, with the notable exception of <title>, and serve to guide the web browser in its overall interpretation of the page.

Tag      Description               Attributes
<body>   document body             all common attributes plus: onload, onunload
<head>   document head             %i18n;, id, profile
<html>   document root element     %i18n;, id, xmlns
<meta>   generic meta information  %i18n;, id, http-equiv, name, content, scheme
<title>  document title            %i18n;, id
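To show the structural tags working together, here is a sketch of a complete document that uses all five of them. The description text, the title, and the onload message are placeholders, and the onload script assumes a browser with JavaScript enabled.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- http-equiv stands in for an HTTP header; name/content is free-form -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta name="description" content="An example page" />
  <title>Structural tags example</title>
</head>
<body onload="alert('Page loaded.');">
  <p>Only this paragraph and the title are visible to the reader.</p>
</body>
</html>
```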

Inline Elements

The content of an inline tag is (generally) rendered without changing the layout of the text around it. A simple example of an inline tag is the anchor (<a>) tag. When the clickable text for an anchor is rendered, it is rendered in line with the text around it. Inline elements exist in contrast to block elements.

Tag        Description
<span>     generic language/style container
<a>        anchor
<abbr>     abbreviated form (e.g., WWW, HTTP, etc.)
<acronym>  an acronym
<b>        bold text style
<bdo>      I18N BiDi override
<big>      large text style
<br>       forced line break
<cite>     citation
<code>     computer code fragment
<dfn>      instance definition
<em>       emphasis
<i>        italic text style
<kbd>      text to be entered by the user
<q>        short inline quotation
<samp>     sample program output, scripts, etc.
<strong>   strong emphasis
<sub>      subscript
<sup>      superscript
<tt>       teletype or monospaced text style
<var>      instance of a variable or program argument
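The paragraph below is a small sketch that mixes several inline tags in running text. All of the text flows as one paragraph; only <br /> forces a new line. The wording is invented for illustration.

```xml
<p>
  The <abbr title="World Wide Web Consortium">W3C</abbr> publishes
  <cite>XHTML 1.0</cite>. Water is written H<sub>2</sub>O and an area
  in m<sup>2</sup>.<br />
  Type <kbd>ls</kbd> and the program prints <samp>index.html</samp>;
  the <code>href</code> attribute is <strong>required</strong> here,
  and <em>emphasis</em> or a <q>short quotation</q> stays in line.
</p>
```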

Block Elements

Block tags are (generally) rendered by starting a new line and adding a little space above and below the contents of the tag. The simplest example of a block-level tag is the paragraph (<p>) tag. Each paragraph is rendered as its own block of content. Block elements exist in contrast to inline elements.

Tag           Description
<div>         generic language/style container
<address>     information on author
<blockquote>  long quotation
<body>        document body
<dd>          definition description
<dl>          definition list
<dt>          definition term
<h1>          heading
<h2>          heading
<h3>          heading
<h4>          heading
<h5>          heading
<h6>          heading
<hr>          horizontal rule
<object>      generic embedded object
<ol>          ordered list
<p>           paragraph
<pre>         preformatted text
<ul>          unordered list
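The fragment below is a sketch of several block tags stacked one after another, as they would appear inside <body>. Each one starts on its own line when rendered. The text is invented, and the <li> tag used for list items is covered in the full element list later in this document.

```xml
<h1>Page heading</h1>
<p>An introductory paragraph.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<dl>
  <dt>XHTML</dt>
  <dd>HTML reformulated as XML.</dd>
</dl>
<blockquote>
  <!-- In the Strict variant a blockquote holds block tags, not bare text -->
  <p>A long quotation goes inside a block of its own.</p>
</blockquote>
<pre>preformatted   text
keeps its   spacing</pre>
<hr />
<address>Written by the page author.</address>
```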

Embedded Media

Embedded media consist of external content that the user agent incorporates into the XHTML presentation. The most common type of embedded media is an image. Through various browser extensions, the actual nature of embedded media can vary from novel forms of image compression, to fully interactive multimedia content, to full embedded applications.

<img>     embedded image
<hr>      horizontal rule
<object>  generic embedded object
<param>   named property value
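A minimal sketch of embedded media follows. The file names, dimensions, and the autoplay parameter are hypothetical: which <param> names mean anything depends entirely on the browser extension that handles the object. In XHTML the src and alt attributes of <img> are both required, and the content inside <object> serves as a fallback when the object cannot be displayed.

```xml
<div>
  <img src="logo.png" alt="Company logo" width="120" height="60" />
</div>
<div>
  <object data="clip.mpg" type="video/mpeg" width="320" height="240">
    <!-- Hypothetical parameter, passed to whatever handles the object -->
    <param name="autoplay" value="false" />
    <p>Fallback text shown when the video cannot be displayed.</p>
  </object>
</div>
```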

Linking and Imagemaps

<a>     anchor
<area>  client-side image map area
<map>   client-side image map
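The sketch below shows how these three tags relate: an <a> links ordinary text, while <map> and <area> turn regions of an image into links. The image, the map name navmap, and the target pages are all hypothetical. The usemap value on the image points at the map by its id.

```xml
<p><a href="home.html">Home</a></p>
<div>
  <img src="navbar.png" alt="Navigation bar" width="200" height="50"
       usemap="#navmap" />
  <map id="navmap" name="navmap">
    <!-- coords for a rectangle are left, top, right, bottom in pixels -->
    <area shape="rect" coords="0,0,100,50" href="home.html" alt="Home" />
    <area shape="rect" coords="100,0,200,50" href="contact.html" alt="Contact" />
  </map>
</div>
```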

Form Elements

Tag         Description
<form>      interactive form
<label>     form field label text
<input>     form control
<select>    option selector
<optgroup>  option group
<option>    selectable choice
<textarea>  multi-line text field
<fieldset>  form control group
<legend>    fieldset legend
<button>    push button
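As an illustration, the form below uses every tag in the table once. The action URL, field names, and labels are invented. Note selected="selected": XHTML does not allow an attribute to appear without a value, so boolean attributes repeat their own name.

```xml
<form action="/subscribe" method="post">
  <fieldset>
    <legend>Newsletter</legend>
    <p>
      <label for="email">E-mail address</label>
      <input type="text" id="email" name="email" />
    </p>
    <p>
      <label for="format">Format</label>
      <select id="format" name="format">
        <optgroup label="Text formats">
          <option value="plain" selected="selected">Plain text</option>
          <option value="html">HTML</option>
        </optgroup>
      </select>
    </p>
    <p>
      <label for="comments">Comments</label>
      <textarea id="comments" name="comments" rows="4" cols="40"></textarea>
    </p>
    <p><button type="submit">Subscribe</button></p>
  </fieldset>
</form>
```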

Deprecated Elements

The following elements have been deprecated. Deprecation eases existing users of a standard into a new version by offering a transition period in which old, formerly conforming material still conforms while new use is discouraged. These elements do not appear in the XHTML 1.0 Strict DTD, and you should not use them in new content without a full understanding of the ramifications.

Tag         Description                     Replacement technology
<applet>    Java applet                     <object>
<basefont>  base font size                  CSS
<center>    shorthand for DIV align=center  CSS + <div>
<dir>       directory list                  CSS
<font>      local change to font            CSS + <span>
<isindex>   single line prompt              <form>
<menu>      menu list                       CSS
<s>         strike-through text style       CSS + <span>
<strike>    strike-through text             CSS + <span>
<u>         underlined text style           CSS + <span>
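To make the "CSS + <div>" and "CSS + <span>" replacements concrete, here is one possible rewrite of a line that used <center>, <font>, and <u>. The inline style attribute is used only to keep the sketch short; a stylesheet with classes would work equally well.

```xml
<!-- Deprecated: presentation expressed with <center>, <font> and <u>
     <center><font color="red"><u>Important notice</u></font></center>
-->

<!-- Strict replacement: <div> and <span> styled with CSS -->
<div style="text-align: center;">
  <span style="color: red; text-decoration: underline;">Important notice</span>
</div>
```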

All Valid Elements

Tag           Description
<a>           anchor
<abbr>        abbreviated form (e.g., WWW, HTTP, etc.)
<acronym>     an acronym
<address>     information on author
<area>        client-side image map area
<b>           bold text style
<base>        document base URI
<bdo>         I18N BiDi override
<big>         large text style
<blockquote>  long quotation
<body>        document body
<br>          forced line break
<button>      push button
<caption>     table caption
<cite>        citation
<code>        computer code fragment
<col>         table column
<colgroup>    table column group
<dd>          definition description
<del>         deleted text
<dfn>         instance definition
<div>         generic language/style container
<dl>          definition list
<dt>          definition term
<em>          emphasis
<fieldset>    form control group
<form>        interactive form
<frame>       subwindow (Frameset DTD only)
<frameset>    window subdivision (Frameset DTD only)
<h1>          heading
<h2>          heading
<h3>          heading
<h4>          heading
<h5>          heading
<h6>          heading
<head>        document head
<hr>          horizontal rule
<html>        document root element
<i>           italic text style
<iframe>      inline subwindow (Transitional and Frameset DTDs only)
<img>         embedded image
<input>       form control
<ins>         inserted text
<kbd>         text to be entered by the user
<label>       form field label text
<legend>      fieldset legend
<li>          list item
<link>        a media-independent link
<map>         client-side image map
<meta>        generic meta information
<noframes>    alternate content container for non frame-based rendering (Frameset DTD only)
<noscript>    alternate content container for non script-based rendering
<object>      generic embedded object
<ol>          ordered list
<optgroup>    option group
<option>      selectable choice
<p>           paragraph
<param>       named property value
<pre>         preformatted text
<q>           short inline quotation
<samp>        sample program output, scripts, etc.
<script>      script statements
<select>      option selector
<small>       small text style
<span>        generic language/style container
<strong>      strong emphasis
<style>       style info
<sub>         subscript
<sup>         superscript
<table>       table
<tbody>       table body
<td>          table data cell
<textarea>    multi-line text field
<tfoot>       table footer
<th>          table header cell
<thead>       table header
<title>       document title
<tr>          table row
<tt>          teletype or monospaced text style
<ul>          unordered list
<var>         instance of a variable or program argument
