XHTML Tutorial

XHTML stands for eXtensible HyperText Markup Language.

XHTML Overview

XHTML is a reformulation of the widely used HyperText Markup Language (HTML) as an application of the eXtensible Markup Language (XML).

XHTML is similar to HTML in many ways, but it is stricter and cleaner.

Here are the most important points to remember when creating a new XHTML document or converting an existing HTML document to XHTML:

  • An XHTML document must have a DOCTYPE declaration at the top of the document.
  • All XHTML tag and attribute names must be written in lower case.
  • All the tags must be nested properly.
  • End tags are required for non-empty elements.
  • The start tag of an empty element must end with />.
  • All the attribute values must be quoted.
  • Attribute minimization is forbidden.

Each of these rules is explained in detail in the section Differences Between HTML and XHTML.


Why XHTML?

Because XHTML documents must be well-formed, your website is more likely to be compatible with current and future web browsers and to be rendered consistently. Well-formed markup also makes your website easier to maintain, convert and format in the long run.

XHTML combines the strengths of HTML and XML. XHTML pages can therefore be parsed by any XML-enabled device, unlike HTML, which requires a lenient, HTML-specific parser.

Web developers and user agent designers are constantly discovering new ways to express their ideas through new markup. In XML, it is relatively easy to introduce new elements or additional element attributes. The XHTML family is designed to accommodate these extensions through XHTML modules. These modules will permit the combination of existing and new feature sets when developing content and when designing new user agents.


Creating an XHTML Document

An XHTML document must meet the following basic requirements.

  • The root element of the document must be <html>.
  • The root element of the document must contain an xmlns declaration for the XHTML namespace. The namespace for XHTML is defined to be http://www.w3.org/1999/xhtml.
  • There must be a DOCTYPE declaration in the document prior to the root element.
  • An XML declaration should be included at the top of the document.

Here is an example of an XHTML document.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>XHTML Document</title>
</head>
<body>
    <p>This is an example of an XHTML document.</p>
</body>
</html>
 

Note: An XML declaration is not required in all XML documents; however, you are strongly encouraged to use one in every XHTML document.
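
One way to check that a document like the example above meets these requirements is to hand it to an XML parser. The following is a minimal sketch that uses Python's standard-library xml.etree.ElementTree module as a stand-in for any XML processor. The helper names root_is_xhtml_html and page_title are hypothetical, and the check covers well-formedness and the root namespace only; it does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The example document from this tutorial, as UTF-8 bytes.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>XHTML Document</title>
</head>
<body>
    <p>This is an example of an XHTML document.</p>
</body>
</html>
"""


def root_is_xhtml_html(document):
    """Return True if the root element is <html> in the XHTML namespace."""
    root = ET.fromstring(document)  # raises ET.ParseError if not well-formed
    return root.tag == "{" + XHTML_NS + "}html"


def page_title(document):
    """Return the text of <head>/<title>, looked up by namespace."""
    root = ET.fromstring(document)
    title = root.find("x:head/x:title", {"x": XHTML_NS})
    return None if title is None else title.text


print(root_is_xhtml_html(DOCUMENT))  # True
print(page_title(DOCUMENT))          # XHTML Document
```

Because the elements live in the XHTML namespace, they have to be looked up with that namespace; a plain search for "head/title" would find nothing.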


Differences Between HTML and XHTML

The following sections summarize the differences between HTML and XHTML.

All Tag Names and Attribute Names Must Be Written in Lowercase

In HTML, tag and attribute names can be written in uppercase or lowercase characters:

INCORRECT: uppercase elements

<P>Here is an <strong>important</strong> word in a paragraph.</P>

In XHTML, all tag names and attribute names must be written in lowercase. This difference is necessary because XML is case-sensitive; for example, <p> and <P> are different tags.

CORRECT: lowercase elements

<p>Here is an <strong>important</strong> word in a paragraph.</p>
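
To see case sensitivity in practice, the following minimal sketch feeds markup to Python's standard-library xml.etree.ElementTree parser, used here only as a stand-in for any XML processor. The helper name parses_as_xml is hypothetical. Note that the parser checks well-formedness only: a consistently uppercase <P>...</P> is well-formed XML, but it is not the p element that XHTML defines.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Matching case in start and end tag: well-formed.
print(parses_as_xml("<p>text</p>"))  # True

# Mixed case: to an XML parser, <P> is not closed by </p>.
print(parses_as_xml("<P>text</p>"))  # False

# <P> and <p> are two different element names.
print(ET.fromstring("<P/>").tag == ET.fromstring("<p/>").tag)  # False
```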

Elements Must Be Nested Properly; There Must Be No Overlapping

In XHTML, all elements must be nested properly. If an element's start tag is placed inside another element, its end tag must be placed inside that same element.

Thus, you can't write:

INCORRECT: overlapping elements

<p>Here is an emphasized <em>paragraph</p>.</em>

Instead, this must be written as:

CORRECT: nested elements

<p>Here is an emphasized <em>paragraph</em>.</p>
 

Tip: Overlapping elements are also illegal in HTML. Always close elements properly so that the markup is valid.
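
An XML parser rejects overlapping elements outright and can report where the problem was detected. The sketch below uses Python's standard-library xml.etree.ElementTree as an example parser. The helper name nesting_error is hypothetical, and the exact position reported may vary between parsers.

```python
import xml.etree.ElementTree as ET


def nesting_error(markup):
    """Return None if the markup is well-formed, otherwise the
    (line, column) position at which the parser gave up."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return exc.position
    return None


overlapping = "<p>Here is an emphasized <em>paragraph</p>.</em>"
nested = "<p>Here is an emphasized <em>paragraph</em>.</p>"

print(nesting_error(overlapping))  # a (line, column) tuple
print(nesting_error(nested))       # None
```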

End Tags Are Required for Non-empty Elements

In HTML, the end tag of certain elements, such as the paragraph element, may be omitted:

INCORRECT: unterminated elements

<p>This is a paragraph
<p>This is another paragraph

XHTML does not allow end tags to be omitted.

CORRECT: terminated elements

<p>This is a paragraph</p>
<p>This is another paragraph</p>
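
The difference shows up as soon as the paragraphs are handed to an XML parser. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree. The helper name count_paragraphs is hypothetical, and the fragment is wrapped in a div only because an XML document needs a single root element.

```python
import xml.etree.ElementTree as ET


def count_paragraphs(fragment):
    """Wrap the fragment in a <div> root and count its <p> children.
    Return None if the result is not well-formed XML."""
    try:
        root = ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return None
    return len(root.findall("p"))


unterminated = "<p>This is a paragraph\n<p>This is another paragraph"
terminated = "<p>This is a paragraph</p>\n<p>This is another paragraph</p>"

print(count_paragraphs(unterminated))  # None
print(count_paragraphs(terminated))    # 2
```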

Empty Elements Must End With />

In HTML, empty elements are written like this:

INCORRECT: unterminated empty elements

A break: <br>
A horizontal rule: <hr>
An image: <img src="smiley.png" alt="Smiley">

In XHTML, the start tag of an empty element must end with />:

CORRECT: terminated empty elements

A break: <br />
A horizontal rule: <hr />
An image: <img src="smiley.png" alt="Smiley" />
 

Note: Include a space before the trailing /> of empty elements, for example <br />, <hr /> and <img src="smiley.png" alt="Smiley" />, to ensure backward compatibility with older browsers.
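
When markup is generated by a program rather than typed by hand, the XML serializer normally takes care of the empty-element syntax. As an illustration, Python's standard-library xml.etree.ElementTree writes empty elements in exactly this form, including the space before the slash. The helper name serialize_empty is hypothetical; other XML libraries may omit the space.

```python
import xml.etree.ElementTree as ET


def serialize_empty(name, attributes=None):
    """Build an element with no content and return its serialized form."""
    element = ET.Element(name, attributes or {})
    return ET.tostring(element, encoding="unicode")


print(serialize_empty("br"))  # <br />
print(serialize_empty("hr"))  # <hr />

# Attribute order in the output can depend on the Python version.
print(serialize_empty("img", {"src": "smiley.png", "alt": "Smiley"}))
```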

Attribute Values Must Always Be Quoted

In HTML you can sometimes omit the quotes, as in:

INCORRECT: unquoted attribute values

<td rowspan=2>

In XHTML, all attribute values must be enclosed in quotation marks, even those that appear to be numeric.

CORRECT: quoted attribute values

<td rowspan="2">
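
If attribute values come from a program, quoting them by hand is error-prone, because characters such as & and < inside a value also need to be escaped. The following minimal sketch relies on quoteattr from Python's standard-library xml.sax.saxutils module, which adds the quotation marks and escapes such characters. The helper name attribute is hypothetical, and the "Fish & Chips" value is made up for illustration.

```python
from xml.sax.saxutils import quoteattr


def attribute(name, value):
    """Return name="value" with the value quoted and escaped for XML."""
    return name + "=" + quoteattr(str(value))


# Numeric values are converted to text and quoted like any other value.
print("<td " + attribute("rowspan", 2) + ">")  # <td rowspan="2">

# Special characters inside the value are escaped.
print(attribute("alt", "Fish & Chips"))  # alt="Fish &amp; Chips"
```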

Attribute Minimization Is Forbidden

XHTML does not support attribute minimization. Attribute-value pairs must be written in full.

Attribute names such as selected and checked cannot occur in elements without their value being specified. Thus, you can't write:

INCORRECT: minimized attributes

<option selected>Car</option>

Instead, write the attribute-value pair in full:

CORRECT: unminimized attributes

<option selected="selected">Car</option>
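
When converting existing HTML, minimized attributes can be expanded mechanically. The following rough sketch uses HTMLParser from Python's standard-library html.parser module, which reports a minimized attribute with the value None and lowercases tag and attribute names. The class StartTagRewriter and the function expand_minimized are hypothetical names, and the sketch rewrites start tags of non-empty elements only; it is not a complete HTML-to-XHTML converter.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class StartTagRewriter(HTMLParser):
    """Collect each start tag, rewritten with full attribute-value pairs."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # minimized attribute, e.g. "selected"
                value = name   # expand to selected="selected"
            parts.append(name + "=" + quoteattr(value))
        self.tags.append("<" + " ".join(parts) + ">")


def expand_minimized(html):
    """Return the rewritten start tags found in an HTML string."""
    rewriter = StartTagRewriter()
    rewriter.feed(html)
    rewriter.close()
    return rewriter.tags


print(expand_minimized("<option selected>Car</option>"))
# ['<option selected="selected">']
```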

Scripts and Styles Should Be Placed Inside a CDATA Section

In HTML, script and style elements can contain characters such as < or & without any special handling.

In XHTML, the script and style elements are declared as having #PCDATA content. As a result, < and & are treated as the start of markup, and entity references such as &lt; and &amp; are resolved by the XML processor to < and & respectively. An unescaped < or & in a script or style sheet can therefore break parsing or cause rendering problems in web browsers.

Wrapping the content of the script or style element in a CDATA marked section avoids the expansion of these entities. However, because the document may also be processed by HTML parsers, which do not recognize CDATA markers, the markers are usually commented out, as in this JavaScript example:

<script type="text/javascript">
//<![CDATA[
document.write("<, &, >");
//]]>
</script>

Or this CSS example:

<style type="text/css">
/*<![CDATA[*/
body { background: url("sky.jpg?width=500&height=300") no-repeat; }
/*]]>*/
</style>
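
The effect of the CDATA section can be observed with an XML parser. The following minimal sketch uses Python's standard-library xml.etree.ElementTree as an example XML processor. The helper name script_body is hypothetical. With the CDATA markers, the script text reaches the application unchanged; without them, the < inside the script makes the document ill-formed.

```python
import xml.etree.ElementTree as ET

WITH_CDATA = """<script type="text/javascript">
//<![CDATA[
document.write("<, &, >");
//]]>
</script>"""

WITHOUT_CDATA = """<script type="text/javascript">
document.write("<, &, >");
</script>"""


def script_body(markup):
    """Return the text content of the element, or None if the
    markup is not well-formed XML."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


# The CDATA markers are consumed by the parser; the characters
# between them are passed through as literal text.
print('document.write("<, &, >");' in script_body(WITH_CDATA))  # True

# Without CDATA, the "<" is read as the start of a tag.
print(script_body(WITHOUT_CDATA))  # None
```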
 