XHTML Tutorial, Part 1

Introduction

This topic gets into some of the nuts and bolts of creating web pages. We're not going to use Dreamweaver yet, at least not for its WYSIWYG interface. We're going to start at the very beginning, creating web pages from scratch. For those of you with less web experience (or no web experience) this may seem like a scary process, but it's not so bad. Root canals are much worse. I promise...

Note: this will be a relatively easy topic for those of you who have created web content before. But don't assume that you already know everything that this topic is going to cover, especially if you have not used XHTML much before.

What is the Internet?

The internet is a collection of computers linked together in a network. A computer in Northern Virginia can be a part of the internet. So can a computer in India, or Australia, or Taiwan, or Lithuania, or Greenland... or anywhere else in the world. The internet links these computers together and lets them know where all the other computers are. There are billions of computers connected this way.

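How do all of these computers know where the other computers are? The answer is that specialized computers keep track of this information. Routers pass traffic along from one network to the next, and Domain Name System (DNS) servers translate names such as www.gmu.edu into the numeric addresses that computers use. The high-capacity links and routers that carry most of the traffic are sometimes referred to as the "backbone of the internet." There is no single master list of every computer on the internet. The information is spread across many machines, and each one holds only part of it.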

The internet was invented by researchers at MIT (Massachusetts Institute of Technology) in conjunction with projects for the United States military (through ARPA, the Advanced Research Projects Agency) in the 1960s. Some of the people responsible include J.C.R. Licklider (who called it a "Galactic Network"), Ivan Sutherland, Bob Taylor, Lawrence G. Roberts, Leonard Kleinrock, Thomas Merrill, and others. The concept was actually independently invented by several groups of researchers at about the same time. The first large scale wide area network became known as ARPANET, because of its connection with ARPA.

The idea behind it was to create a decentralized network that could not only link computers together, but also survive catastrophic circumstances, such as a nuclear attack or massive natural disaster. If some of the computers in the internet are disabled for any reason, there are many other computers that are still linked together. Even if some of the computers that make up the backbone of the internet are disabled, there are still other computers with this same information distributed throughout the world. People (and the military) would still be able to share information over the internet. It is a form of communication, and a way of sharing data around the world. This type of network could be especially important in terms of national security, which was the original purpose. Nowadays the internet is used for much more than military or pure academic purposes, but that's how the internet first began.

What I have said here is of course a very simplified version of the history of the internet. If the topic interests you, you could always search for "history of the internet" in your browser, and you'll find a long list of resources.

What is the World Wide Web?

Even though most people use the terms "internet" and "world wide web" as if they meant the same thing, they aren't quite the same thing. The internet is the network. The World Wide Web is a system for sharing information over that network. The distinction is not extremely important, but the internet and the World Wide Web are technically two different things.

The World Wide Web uses documents formatted in XHTML (or HTML) to share information. XHTML documents are just text, with a few extra pieces of information that provide formatting (in paragraphs, headings, tables, etc.) and the ability to link to other XHTML documents. In fact, the linking capability of XHTML is where XHTML gets its name. XHTML stands for eXtensible HyperText Markup Language. "Hypertext" is the name they gave to linked text.

Note: HTML means the same thing, but without the "eXtensible" part. HTML was invented first. XHTML was invented later and has a few advantages over HTML. In this class we're learning XHTML rather than HTML because XHTML is the newer, more current version of HTML.

Tim Berners-Lee developed the original HTML language and the first "browser," which was nothing more than a software program that could interpret the HTML language in a way that would be useful to users. The browser interpreted links from one "page" to another and allowed users to click on the link to go to the other document.

XHTML and "Web Pages"

So what is a web page? When you type in www dot whatever, what happens? How do you get the information? How did it get there to begin with? The short answer is that a web page is nothing but text. Sometimes that text includes references to images, and sometimes the text has been programmed to do things other than just sit there, but a web page is still just text. This is an over-simplification, but it's still true (or at least it's true enough for our purposes).

The text in web pages has some extra invisible "tags" or "elements" or "markup" (all of these terms are used by developers) that give the text some structure and formatting. For example, if you want to designate a chunk of text as a paragraph, you have to enclose it in paragraph tags, like this:

<p>This is a (very short) paragraph.</p>

You put a <p> at the beginning of the paragraph to say that this is where the paragraph begins, and you put a </p> at the end of the paragraph to say that this is where the paragraph ends. Any tag that begins with a forward slash is a closing tag. Here are other examples:

Table of elements
Type of Element | Opening | Closing
An HTML document | <html> | </html>
The "head" of an HTML document (this is where you put the title and other "meta" information) | <head> | </head>
Title (the "name" of your web page) | <title> | </title>
The body (visible text) of an HTML document | <body> | </body>
Paragraph | <p> | </p>
Heading level 1 | <h1> | </h1>
Heading level 2 | <h2> | </h2>
Heading level 3 | <h3> | </h3>
Heading level 4 | <h4> | </h4>
Heading level 5 | <h5> | </h5>
Heading level 6 | <h6> | </h6>
Unordered list (bulleted list) | <ul> | </ul>
Ordered list (numbered list) | <ol> | </ol>
List items (the text of a single item in either a bulleted or numbered list) | <li> | </li>
Blockquote (a chunk of text quoting somebody else) | <blockquote> | </blockquote>
Strong emphasis (for important text; this is rendered as bold in browsers) | <strong> | </strong>
Emphasis (this is rendered as italic in browsers) | <em> | </em>
Anchor (used to create links or link destinations) | <a> | </a>
Table | <table> | </table>
Table row | <tr> | </tr>
Table data cell | <td> | </td>
Table header cell | <th> | </th>

In fact, by knowing these simple elements, you can already create a web page. Open up a text editor such as Notepad (in Windows: Start > All Programs > Accessories > Notepad) or download a more sophisticated text editor like PSPad or TextPad (for Windows) or TextWrangler (for Macs) or another similar program.

Note: Don't use Word or any other word processor. All of the extra features in these programs will just get in the way.

Your First Web Page

You can create a web page in less than a minute. Are you ready? Let's do it.

Copy and paste this text right into your text editor:

<html>
  <head>
    <title>My first web page</title>
  </head>

  <body>
    <h1>This is my first web page</h1>
    <p>I'm so excited to be creating my very first web page.
       I'm so proud of myself. I wish my friends could see me now. 
       My mom would be so proud of me. Here's a list of things I like:</p>
    <ul>
      <li>Ice cream</li>
      <li>Cute bunny rabbits</li>
      <li>Myself</li>
    </ul>
    <p>My dad always said:</p>
      <blockquote>
        <p>If you can't say something nice, say something mean nicely.</p>
      </blockquote> 
    <p>Ok. That's enough bragging about myself. I think it's 
       time to end this web page and move on.</p>
  </body>
</html>

Now save the document. Make sure that you save it as an HTML document. In Notepad, you'll have to specify "All Files" from the "Save as type" option in the "Save As" dialogue box. You'll then need to give the file a .htm or .html extension. For example, you could save the file as "webpage.htm". Make sure you save the file to a place that you can find easily, like your desktop.

So if you've done this step correctly, you now have a file sitting on your desktop (or somewhere else handy) called "webpage.htm" (or whatever you called your file). Good. Now double click on that file. It should open up your new web page in your default browser program. It should look something like this:

This is my first web page

I'm so excited to be creating my very first web page. I'm so proud of myself. I wish my friends could see me now. My mom would be so proud of me. Here's a list of things I like:

  • Ice cream
  • Cute bunny rabbits
  • Myself

My dad always said:

If you can't say something nice, say something mean nicely.

Ok. That's enough bragging about myself. I think it's time to end this web page and move on.

If it looks like that, then congratulations! You've created your first web page!

(And no, my dad didn't always say that.)

Now, some of you have already created other web pages before. I know that, but let's all keep with the spirit of offering excited words of encouragement to people who have never done this before.

Ok. That's a good beginning. One question that you may be asking yourself at this point is "can everybody now see my exciting new web page?" The answer is no. Only you can see it... and maybe anyone else in the room who happens to be looking over your shoulder at this moment, but this web page is just for you to see. It's your little secret. So now you're wondering, "how do I put this page on the web? Isn't that what the web is all about... sharing things with the whole wide world?" The answer is yes, that's what the web is all about, but your page isn't on the web. It's on your desktop. In order to put the page on the web, you have to, well, put it there. Don't worry about doing that at the moment. We'll get into that later. For now, just be satisfied knowing that you've created a document that could go on the web if you wanted it to.

Elements and Tags: Open then close; Start then Finish

One important aspect of XHTML is that every "element" or "tag" must have both an opening and a closing part; a start and a finish. Think of them as parentheses. You can't have just an opening parenthesis. You need to have a closing parenthesis in order to balance the group. Look at these examples of English grammar:

Correct: I like fruit (especially raspberries and mangoes).

Incorrect: I like fruit (especially raspberries and mangoes.

The same thing is true for XHTML elements. You can't just start a paragraph, for example. You also have to end it at some point.

Correct: <p>This is my paragraph.</p>

Incorrect: <p>This is my paragraph.

You might think of this structure like a sandwich (one of my previous students came up with this metaphor). You wouldn't put bread on only the top of the sandwich. You'd also want bread on the bottom of the sandwich (at least in the type of sandwich that we're trying to build here). "XHTML sandwiches" can get quite complex, with lots of layers of bread and other elements sandwiched inside each other. For example, the <html> tags surround everything. Then there are the <head> and <body> tags. Within the <body> tags you have <p> tags, which may contain other tags. Here is our original "first" web page with all of the content stripped out, leaving only the tags:

<html>
  <head>
    <title></title>
  </head>
  <body>
    <h1></h1>
    <p></p>
    <ul>
      <li></li>
      <li></li>
      <li></li>
    </ul>
    <p></p>
      <blockquote>
        <p></p>
      </blockquote> 
    <p></p>
  </body>
</html>

You can see that all elements have both a beginning and an end tag. You can also see that some elements are embedded inside of other elements. All of the list items (<li></li>) are embedded in between the "unordered list" tags (<ul> and </ul>).

With the early versions of HTML, you could get away with just opening tags in some instances. They weren't as strict in HTML as they are in XHTML, but there are good reasons to be strict. For now I'll just say "that's the way it is" and tell you that you need to have both opening and closing tags.
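To give you a rough idea of the difference, here is a small sketch (the shopping list is made up for illustration). The first fragment is the looser style that older HTML tolerated, where the closing </p> and </li> tags could be left off. The second fragment is the same content written the way XHTML expects it.

```html
<!-- Loose, old-style HTML: the closing </p> and </li> tags are left off -->
<p>Things to buy:
<ul>
  <li>Bread
  <li>Cheese
</ul>

<!-- XHTML: every element that is opened is also closed -->
<p>Things to buy:</p>
<ul>
  <li>Bread</li>
  <li>Cheese</li>
</ul>
```

Browsers will usually display both versions the same way, but only the second one follows the XHTML rules.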

Self-Closing Elements (including images)

After that explanation about opening and closing tags, I need to explain that there are some tags that can close themselves. They act slightly differently from the elements we've been talking about up until this point. These other elements don't require separate closing tags. They have the closing tags built into them. How is that possible? Well, if you remember what I said about the forward slash, that's a clue. The forward slash is the way to close your elements. In the case of image elements, it looks like this:

<img src="someimage.jpg" />

See the forward slash at the end? That means that the tag is done. It's over. It's finished. You won't have a </img> tag at the end. In some ways it would make sense to have that extra closing tag, but that's not the way they do it. Why? Because there's no information to go in between the tags. All of the information you need is already in the tag itself. The "img" stands for image. The "src" stands for source. The src attribute explains which image to use. In this case, the image is called someimage.jpg. You are using HTML markup to say "go get the someimage.jpg image because I want to insert it here in my web page."

There are other elements similar to the <img> element in that they don't require closing tags. Here are the most common:

Element | The "tag"
Image | <img />
Break (go to a new line, but stay in the same paragraph) | <br />
Horizontal rule (a horizontal line separating sections of a document) | <hr />

Go ahead and insert some of these elements into your web page, then save the document again and view it in your browser again (refresh the browser or double click on the file again).
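If you're not sure where to start, here is a small sketch that you could paste anywhere between the <body> tags of your practice page. The text is just filler that I made up; the point is to see what <br /> and <hr /> do.

```html
<!-- The <br /> starts a new line but keeps both lines in one paragraph -->
<p>Roses are red,<br />
Violets are blue.</p>

<!-- The <hr /> draws a horizontal line between the two paragraphs -->
<hr />

<p>This paragraph sits below the horizontal rule.</p>
```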

Important Note! When inserting images, the image must be in the same folder as the HTML file itself, or else you must provide a longer path to the image. If your practice web page is on your desktop, try putting the image on your desktop first. That's the easy way. That way both your HTML file and your image are in the same folder. The markup will look something like this:

<img src="someimage.jpg" />

If you're feeling a little adventurous, try creating a new folder on your desktop and then moving your image into that folder. You'll have to change your HTML file to take this into account. Let's say that you named your folder "images". That's a sensible name. Your <img> tag must now look like this:

<img src="images/someimage.jpg" />

This tells the browser to look for the image in the images folder. You could get really fancy and place the image inside of a folder inside of a folder inside of a folder, or something like that. In that case you'll just have to keep on specifying which folders to look in. Something like this:

<img src="images/folder2/folder3/someimage.jpg" />

That will work, as long as the names of the folders match the names in your src attribute.

In real life, you'd also want to include the dimensions of your image. If you know the dimensions of your image, type them in:

<img src="someimage.jpg" height="200" width="210" />

The height and width are measured in pixels. You can find out the height and width of your image inside of your graphics program. If you don't know the height and width, or if you don't know how to use your graphics program, don't worry about it for now. Just make up a height and width. You'll probably guess wrong, and the graphic will be either squished or stretched. That's ok for now.

Alt text

Here's an important point that we can't forget: alt text. Alt text is alternative text. This is what a blind person will hear when listening to the web page being read out loud by their screen reader (we'll learn more about screen readers later). The alt text should describe the image in a short word or phrase that is meaningful to a person who can't see the image. We'll get into the finer points of alt text later. For now, just put something in there.

<img src="someimage.jpg" height="200" width="210" alt="Five parrots attack a helpless cute bunny" /> 

There. Now our image element is complete.

Attributes vs. Elements (or "tags")

Most of what I've been talking about up to this point has been XHTML "elements" or "tags" as they're sometimes called. Examples include <html>, <head>, <body>, <p>, etc. I've also introduced you to a few "attributes." These include the image source (src), the height and width, and the alternative text (alt). Attributes describe some part of the element that they're inside of. The src attribute describes where to find the image. The alt attribute describes the image for someone who can't see it.

You don't need to open and close attributes, unless you count the quotation marks around them (alt="products"). There is no such thing as a "src tag" or an "alt tag," even though you may hear some people refer to them that way. They aren't tags. They are attributes. Attributes can't survive on their own. They must be contained within an element/tag in order to mean anything.

Here's a common beginner's mistake to avoid:

Correct: <img src="someimage.jpg" />

Incorrect: <imgsrc="someimage.jpg" />

Notice that the element and the attribute have a space in between them in the correct example. In the incorrect example, someone has in effect invented a new element/tag called "imgsrc". Browsers won't know what to do with the strange new element. They'll probably ignore it, and it won't work the way you wanted it to.
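Here is a short sketch of how attributes fit inside an element. Each attribute is written as name="value", and the attributes are separated from the element name and from each other by spaces. In XHTML the names are lowercase and the values always go inside quotation marks. The alt text here is made up for the example.

```html
<!-- One element (img) with three attributes, each written as name="value" -->
<img src="someimage.jpg" alt="A parrot" width="210" />

<!-- The same attributes in a different order mean exactly the same thing -->
<img width="210" alt="A parrot" src="someimage.jpg" />
```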

Links

Links are fun. Links are what make the web webby. They are the way that everything on the web is connected and interconnected. When you click on a link, you leave one spot on the web and you go to another. Sometimes the links take you to a different spot on the same page, and sometimes they take you to a different page somewhere in the middle of China. It can happen.

Here's how you create a link:

<a href="http://www.gmu.edu/">George Mason University</a>

When creating links, you have to know where the link destination is (the "href", which stands for "hypertext reference") and what text you'll have as the visible clickable link text. In our example, the words "George Mason University" will be clickable, like this:

George Mason University

Most anything on a web page can be made into a link, as long as it's either text or graphics. To turn our graphic into a link, we would wrap it inside of a link element, like this:

<a href="awebpage.htm">
 <img src="someimage.jpg" height="200" width="210" alt="Five parrots attack a helpless cute bunny" />
</a>

Notice that the <a> tags surround the image. That means that everything inside of the <a> tags will be a clickable link.
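I mentioned that a link can also take you to a different spot on the same page. Here is a minimal sketch of how that works. The destination gets an id attribute, and the link's href is a # sign followed by that id. The id "contact" and the text are made up for this example.

```html
<!-- The link: the href is a # followed by the id of the destination -->
<p><a href="#contact">Jump to the contact section</a></p>

<!-- ...the rest of the page goes here... -->

<!-- The destination: any element with a matching id attribute -->
<h2 id="contact">Contact</h2>
<p>You made it to the contact section.</p>
```

Each id can only be used once per page, so the browser always knows exactly where to jump.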

Folders and Default Pages

Sometimes we want to link to files on our own web site that are in a different folder. Let's pretend that our web site is organized like this: the top level holds the home page (index.html) along with folders such as about_us and products. The about_us folder holds index.html, me.html, and you.html. The products folder holds its own index.html and a mousetraps folder, which also has an index.html.

The index.html file at the top level is the home page of the web site. When we type the web address of this site (let's pretend the site's address is http://www.mysite.com/), we could either type http://www.mysite.com/ or http://www.mysite.com/index.html. Either one would work. That's because the web site has already been set up so that all files named index.html will be the default file of their respective folders. The index.html file on the top level is the home page for the whole site, but notice that most of the folders have their own index.html file. You could sort of say that each index.html file is the "home page" of its folder. For example, you could type http://www.mysite.com/about_us, but you could also type http://www.mysite.com/about_us/index.html. Both of these links will go to the same place.

However, it is important to note that not all web sites are set up in exactly the same way. Sometimes the default "home page" is index.htm (without the "L" at the end). Sometimes it is default.html, or index.php, or index.jsp, or some other name. The truth is that it could be called bananas_are_yellow.q7v or any other outlandish name if someone decided to do that on the server. For the most part, though, index.html or index.htm are the most common. Only the web server administrator has control over the name of the default file. If you're not the web server administrator, you'll have to accept the way that it was set up.

Links to files in the same folder

Linking to files in the same folder is easy. You just type the name of the file in the link like this:

<a href="me.html">It's all about me</a>  

The above example would apply if we were linking to the me.html file from you.html or from index.html within the same folder. It would not work if we were linking from any of the other files.

Links to lower level folders

Now let's create a link from our site's home page to the default page (index.html) in the products folder. Our link would look like this:

<a href="products/">Products</a>

Notice the slash at the end of the word "products" in the link. The slash indicates that this is a folder. Or we could write it like this:

<a href="products/index.html">Products</a>

Usually it is better to use the first method, because it's shorter.

If we want to link down deeper into the "mousetraps" folder, our link would look like this:

<a href="products/mousetraps/">Mousetraps</a>

or this:

<a href="products/mousetraps/index.html">Mousetraps</a>

Again, the preferred method is the shorter one.

However, if you are NOT linking to the default file, you MUST include the file name. For example, if we want to link to the me.html file, we must include me.html in the link, like this:

<a href="about_us/me.html">It's all about me</a> 

Links to higher level folders

Now let's say that we want to create a link in the reverse direction. We're on the Products page and we want to link to the home page. Our link could look like this:

<a href="../">Home page</a>

or this:

<a href="../index.html">Home page</a> 

The dots before the slash mean that we're going up a level to the folder at a higher level. If we need to go up more than one level, we just add another ../ to the link. For example, if we are on the index.html page in the mousetraps folder (which is inside the products folder) and want to link up two levels to the site's home page, our link would look like this:

<a href="../../">Home page</a> 

or this:

<a href="../../index.html">Home page</a>  

As usual, the shorter method is preferred.
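You can also combine the two directions in one link: go up with ../ and then back down into a different folder. The sketch below is a small navigation list, and I'm assuming that it sits in the index.html file inside the products folder and that the about_us folder sits next to the products folder at the top level of the site.

```html
<!-- This fragment is assumed to live in products/index.html -->
<ul>
  <!-- Up one level to the site's home page -->
  <li><a href="../">Home page</a></li>
  <!-- Down one level into the mousetraps folder -->
  <li><a href="mousetraps/">Mousetraps</a></li>
  <!-- Up one level, then down into the about_us folder -->
  <li><a href="../about_us/me.html">It's all about me</a></li>
</ul>
```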

Links to External Web Sites

When linking to someone else's web site, you must include the full web address, including the http:// at the beginning, as shown below:

<a href="http://www.gmu.edu">GMU</a> (CORRECT)

It's not enough to say:

<a href="www.gmu.edu">GMU</a> (WRONG)

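The above bad example simply won't work. Without the http:// at the beginning, the browser treats www.gmu.edu as the name of a file in the current folder of your own site and goes looking for it there.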

Relative vs. Absolute links

Linking to external web sites (as explained above) is one way of creating "absolute" links. You can also create absolute links to other pages on your own web site. To use our fictional example of http://www.mysite.com, I could write all of my links so that they include http://www.mysite.com/ in the link, but it's usually better to use links that are relative to the site structure. Relative links are shorter, and they keep working if the site moves to a different web address.

However, there is more than one type of relative link. The links that we looked at previously (links to files in the same folder, to files in lower level folders, or in higher level folders) are examples of links that are relative to the current file. You can also create links that are relative to the home page. To do this, start your link with a slash and write the path of the file relative to the home page. For example, to link to the me.html page, we would write this:

<a href="/about_us/me.html">It's all about me</a>

The above link—written exactly as it is now—would work from any of the files on the web site. We could write this link on the home page, or on the you.html page, or any other page.
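To compare the different kinds of links side by side, here is a sketch of three links that all lead to the same me.html page. I'm assuming that they are written in the index.html file inside the products folder on our pretend http://www.mysite.com/ site.

```html
<!-- Absolute: the full address, including http:// -->
<a href="http://www.mysite.com/about_us/me.html">It's all about me</a>

<!-- Relative to the current file (assumed here to be products/index.html) -->
<a href="../about_us/me.html">It's all about me</a>

<!-- Relative to the home page: starts with a slash -->
<a href="/about_us/me.html">It's all about me</a>
```

Only the middle link would need to change if you copied it into a file in a different folder.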

The biggest caveat with this approach is that it won't work quite in the way that you might expect it to if you are on a shared server, such as the GMU portfolios site or the GMU Cluster Account site. The home page for both of these sites is not your home page. It's the home page of these servers. On these servers, your home page is already one folder deep within the site. For example, your site might be located at http://portfolios.gmu.edu/yourname/ or http://mason.gmu.edu/~yourname/. The "yourname" folder is your folder, and your home page is located there, but the home page for the server is located at http://portfolios.gmu.edu/ or http://mason.gmu.edu/.

If you choose to write links that are relative to the site, you will need to take this structure into account. A link to your home page that is relative to the site would look like this:

<a href="/yourname/">My home page</a>

The last thing that I'll say is that Dreamweaver and other tools make it easy to create links, by automating the process to some extent, but it still helps to understand the different types of links in order to understand what Dreamweaver is doing. You want to make sure that Dreamweaver does what you want it to do.

Tables

It's kind of a pain to create tables by hand. It's easier to do it inside of a tool like Dreamweaver that does most of the work for you, but we're not going to do that yet. We'll get there eventually, but not in this lesson. You'll understand much better what Dreamweaver is doing if you take the time to learn the markup first.

Tables have rows and columns. They are a grid of information. A simple table in HTML looks like this:

<table>
  <tr>
    <th>Color</th>
    <th>Size</th>
  </tr>
  <tr>
    <td>Red</td>
    <td>Large</td>
  </tr>
  <tr>
    <td>Blue</td>
    <td>Small</td>
  </tr>
</table>

Remember that <tr> means table row, <th> means table header, and <td> means table data cell. The table will look something like this in the browser:

[Screen shot of the table, showing two columns and three rows]

And there you have it. A simple data table. Look back at the markup and compare it to the visible table. You'll notice that we designate table rows, but there is no markup that designates a table column, at least not explicitly. The columns are designated as either header cells or data cells within rows. The first cell within a row is the first column, and the second cell is the second column, and so on. You could have fun with this table by adding more and more data cells to each row. Just remember to make sure that you have the same number of cells per row, or else you're going to have some odd-looking tables that won't display correctly.
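As a sketch of what adding a column looks like, here is the same table with a third cell added to every row. The "Price" column and its values are made up for the example. Notice that each row, including the header row, gets exactly one new cell.

```html
<table>
  <tr>
    <th>Color</th>
    <th>Size</th>
    <th>Price</th>
  </tr>
  <tr>
    <td>Red</td>
    <td>Large</td>
    <td>$10</td>
  </tr>
  <tr>
    <td>Blue</td>
    <td>Small</td>
    <td>$8</td>
  </tr>
</table>
```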

Document <head>

I'm not going to get into great detail about the document <head> in this lesson, but be aware that you should always include a <title> in every HTML document. The title will show up in the top of the browser window, and it lets people know what your page is about. It's especially important for people who use screen readers. The screen reader will read the text out loud to them and let them know immediately what the page is about. It's the first thing that the screen reader reads.

<html>
  <head>
    <title>All about parrots and cute bunny rabbits</title>
  </head>
(etc.)

The top of a document, including the head, is where you say what type of document you're creating and what language it's written in. A more complete beginning to a document will look something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>All about parrots and cute bunny rabbits</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta name="keywords" content="parrots, bunnies, rabbits, bunny rabbits, cute" />
  <meta name="description" content="Everything you ever wanted to know about parrots and bunny rabbits (especially cute ones) that you were always afraid to ask" />
</head>

Yikes! What does all that mean? Don't get too frightened. Look closely. Yeah, some of it's a mess if you don't understand it, but some of it is sort of self-explanatory. For example, lang="en" means that the language is English. The xml:lang="en" means the same thing. It's redundant, and in some ways is not necessary, but it's there for a reason. Just trust me on that one.

The "meta" elements tell us "meta" information about the page. Meta information is "information about information." Maybe that's confusing, but the concept is rather simple when you look at what the <meta> elements are doing. These particular elements are just specifying keywords for the page and giving it a brief description. Notice that <meta> elements are self-closing in the same way that <img> elements and <hr /> elements are.
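Putting the pieces together, here is a minimal sketch of a complete XHTML 1.0 Strict document, from the DOCTYPE down to the closing </html> tag. The heading and paragraph in the body are just placeholder text that I made up, and I've left out the keywords and description to keep it short.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- Every document needs a title -->
    <title>All about parrots and cute bunny rabbits</title>
    <!-- Tells the browser which character encoding the file uses -->
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  </head>
  <body>
    <h1>All about parrots and cute bunny rabbits</h1>
    <p>The visible content of the page goes here.</p>
  </body>
</html>
```

You could save this as a new .html file and use it as the starting point for any page you create from now on.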

Block elements vs Inline Elements

Some of the elements we have talked about are considered "block" elements, while others are considered "inline" elements. Block elements are things like paragraphs, tables, block quotes, bulleted lists, numbered lists, and list items. All of these things are chunks of content that form a rectangle, or block. They all start on a new line. Inline elements, in contrast, do not start a new line. Links and images fall into this category. In the example below, I have applied a blue background to a paragraph (a block element). Notice that the background stretches to the edge of the text borders of this page and that the paragraph is on its own line. It is not a part of the paragraph above it. I have also applied a yellow background to a link (an inline element). Notice that the background does not stretch across the full width of the text area. The dimensions of the link are confined to the small area that the text occupies and the link is on the same line as the paragraph in which it resides.

This is a paragraph with a link to GMU.
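If you'd like to try this yourself, here is one way to sketch it. I'm using the style attribute to set the background colors, which is something we haven't covered yet, and the exact colors are just a guess at what works well. Treat it as a quick experiment rather than the way you'd normally add color to a page.

```html
<!-- Block element: the blue background spans the full width of the text area -->
<p style="background-color: lightblue;">This is a paragraph with
  <!-- Inline element: the yellow background covers only the link text -->
  <a href="http://www.gmu.edu/" style="background-color: yellow;">a link to GMU</a>.</p>
```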

Here are a couple more inline elements worth knowing:

Element | Opening | Closing | The default visual effect in the browser
Strong emphasis | <strong> | </strong> | Bold text
Emphasis | <em> | </em> | Italicized text

<p><strong>Note:</strong> Don't forget to tie your shoes,
even if you don't <em>want</em> to tie your shoes.</p>

Don't Overlap Elements

Another thing to keep in mind with the opening and closing of elements is that you can't overlap them. You can embed them inside each other in many cases, but you can't overlap them.

Correct: <p>Words with both <em><strong>bold and italics</strong></em></p>

Incorrect: <p>Words with both <em><strong>bold and italics</em></strong></p>

The incorrect example is sort of like putting the sandwich meat on the bottom of the sandwich, beneath the bread. You shouldn't do that. It doesn't work very well with sandwiches, and it doesn't work very well with XHTML. Elements can contain other elements, but they can't cross over each other.
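One trick that helps is to indent each element inside the element that contains it. In this sketch (the content is borrowed from the link example earlier), you can see that the last element to be opened is always the first one to be closed.

```html
<ul>
  <li>
    <a href="http://www.gmu.edu/">
      <strong>George Mason University</strong>
    </a>
  </li>
</ul>
```

If the closing tags don't line up with their opening tags in reverse order, something is overlapping.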

More information

I found a tutorial on the web that is quite helpful in explaining XHTML for newbies. It also has some good information for people who think they know XHTML. I'll bet that most of you will learn something from this tutorial. Here it is:

http://www.sitepoint.com/article/xhtml-web-design-beginners-2/2


Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 License.