Meta Tags

This page could also have the headline “YAMD – Yet Another MetaTag Description”.

There are thousands of pages dealing with “Meta Tags”. Remember: Meta Tags are the small blocks of information intended for search engines et al. They sit in the header of an HTML file, and visitors usually never see them.

Most guides to Meta Tags only mention description, keywords, title and at most the robots entry. In my opinion there are some other quite interesting tags for the proper identification and description of a homepage.

The (in my humble opinion) best explanation for Meta Tags can be found on the homepage of SELFHTML. Information about the robots.txt and the Robots Meta Tag can also be found on the net.

The use of description, title and robots is normally enough to ensure a proper listing in search engines. Always remember that Meta Tags are not some kind of voodoo. The most important thing about your website is the actual content. No Meta Tag can replace a good page with proper text. Meta Tags are only a supplement that specifies the contents of a page a little more precisely, in a defined and machine-readable manner. Keep this in mind the next time you receive an advertising mail for so-called Meta Tag and search engine optimizers.

Standard Tags

<title>about 60 characters maximum</title>

The “title” defines the headline of an HTML page. The maximum of about 60 characters is only a guideline: no rule defines a maximum length for this tag. Unfortunately, search engines usually cut or ignore a title that is too long, and the title often decides whether a user clicks on your page in a long list of search results. So keep this voluntary limit in mind when you write the title of your HTML page.

 

<meta name="description" content="between 200 and 250 characters" />

The Meta Tag “description” contains (guess what) a description of the page. It should describe the content briefly and objectively. “Best and most freakin’ hot page on this god damn planet” is an example of a description that is neither detailed nor formal. Search engines often show the title and description in their search results, so it is a good idea to write some short and useful prose about the contents of the page.
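
If you want to check your pages against these two guideline values automatically, a small script is enough. The following Python sketch is only an illustration: the limits (at most 60 characters for the title, 200 to 250 for the description) are the rules of thumb from this page, not values defined by HTML or by any search engine, and the function name check_lengths is made up for this example.

```python
# Guideline values from this page: rules of thumb, not limits defined
# by the HTML specification or by any search engine.
TITLE_MAX = 60
DESCRIPTION_MIN, DESCRIPTION_MAX = 200, 250


def check_lengths(title: str, description: str) -> list[str]:
    """Return human-readable warnings for a title/description pair."""
    warnings = []
    if len(title) > TITLE_MAX:
        warnings.append(
            f"title has {len(title)} characters (guideline: at most {TITLE_MAX})"
        )
    if not DESCRIPTION_MIN <= len(description) <= DESCRIPTION_MAX:
        warnings.append(
            f"description has {len(description)} characters "
            f"(guideline: {DESCRIPTION_MIN} to {DESCRIPTION_MAX})"
        )
    return warnings


if __name__ == "__main__":
    for warning in check_lengths("Meta Tags", "A short description."):
        print(warning)
```

An empty list means both texts are within the guidelines; anything else is a hint, not an error.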

 

<meta name="keywords" content="between 200 and 250 characters" />

The Meta Tag “keywords” is almost obsolete. Search engines once gathered their wisdom almost exclusively from the keywords of a page. As this opened the gates to spammers and led to all kinds of abuse, most crawlers nowadays ignore the keywords completely. What matters is the entire content of the page.

Nevertheless “keywords” are still a good idea if used in their original sense. Pick about 20 words that describe the contents of the page. Make sure that these words make sense and relate to the rest of the page. Some search engines regard endless repetition of words (“sex, sex, sex”) as spam and then ignore the page completely.
There is no standard for how the keywords should be strung together. The most common practice is to separate the words with commas (“keyword1, keyword2, keyword3”).
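
Since there is no standard, a tool reading the tag has to guess the separator. The Python sketch below assumes the common comma convention: it splits the value into single keywords and reports words that are repeated, which makes it easy to check a page against the “about 20 words, no endless repetition” advice. The function names are hypothetical, and I do not know whether any particular search engine counts repetitions this way.

```python
from collections import Counter


def parse_keywords(content: str) -> list[str]:
    """Split a comma-separated keywords value into single keywords.

    Surrounding whitespace is trimmed and empty entries (for example from
    a trailing comma) are dropped.
    """
    return [word.strip() for word in content.split(",") if word.strip()]


def repeated_keywords(content: str) -> dict[str, int]:
    """Return every keyword that occurs more than once, ignoring case."""
    counts = Counter(word.lower() for word in parse_keywords(content))
    return {word: count for word, count in counts.items() if count > 1}


if __name__ == "__main__":
    value = "keyword1, keyword2, keyword3"
    print(parse_keywords(value), len(parse_keywords(value)))
    print(repeated_keywords("sex, sex, sex, html"))
```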

 

<meta http-equiv="Content-Type" 
content="text/html; charset=iso-8859-1" />

The “content-type” is usually predefined by most HTML editors. This example means that the document has the format “text/html” (= MIME type) and is encoded in “iso-8859-1”. This should be correct for most English (or, more generally, Western European) web pages.
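
The value of this tag is an ordinary MIME Content-Type header, the same one a web server can send over HTTP. As a small illustration (the function name is my own), Python's standard library can split it into its two parts:

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (MIME type, charset).

    The standard library's e-mail header parser understands the same
    "type/subtype; charset=..." syntax. Both parts come back lower-cased;
    the charset is None if the value does not contain one.
    """
    header = Message()
    header["Content-Type"] = value
    return header.get_content_type(), header.get_content_charset()


if __name__ == "__main__":
    print(parse_content_type("text/html; charset=iso-8859-1"))
```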

 

<meta name="robots" content="INDEX,FOLLOW" />

The Robots Tag is used as a kind of processing instruction for search engines (robots). Each search engine may offer its own specialized subcommands or settings, but all of them should understand the statements “INDEX” or “NOINDEX”, “FOLLOW” or “NOFOLLOW” and of course “ALL” or “NONE”.
In general you should not rely on the obedience of robots: nothing technically forces a robot to accept commands from the HTML pages it is indexing. It has, however, become a common agreement that every reputable robot respects the Robots Tag (and the robots.txt). If a page must really stay out of search engines, use other means of protection (for example authentication or blocking via .htaccess) to ensure that it is not added to the search engine index.
On the other hand the command “INDEX” is no guarantee for the inclusion of your page into search indices. The decision for or against the inclusion lies exclusively in the hands of the indexing algorithm. The Robots Tag is just a humble way to ask an engine to do something.

  • “INDEX” asks the search robot to include the page in the search index. “NOINDEX” asks it not to include the page in the index, or to remove it if it is already there.
  • “FOLLOW” says that the robot should follow all links in the document (of course this is only possible if they are available in HTML format). “NOFOLLOW” in turn means that the robot should not follow any link contained in the current HTML file. This is useful, for example, if you only wish to include your start page in the index and not the subsequent pages. The robots.txt is another good way to guide a robot through the public and non-public areas of your site.
  • In combination the statement “INDEX, FOLLOW” therefore means that the robot should index the current page and follow all links to subsequent pages.
  • Instead of “INDEX, FOLLOW” it would also be possible to use “ALL”. The complete negation would be “NONE”.
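
To make the combinations above concrete, here is a Python sketch of how a well-behaved robot might interpret the value of the Robots Tag. It is my own assumption, not the logic of any real search engine: it only knows the six statements listed above, ignores engine-specific extensions, treats a missing statement as permission (index and follow), and lets a restrictive statement win if the tag contradicts itself. The function name is hypothetical.

```python
def parse_robots(content: str) -> dict[str, bool]:
    """Interpret a Robots Tag value such as "INDEX,FOLLOW".

    Assumptions of this sketch: only INDEX, NOINDEX, FOLLOW, NOFOLLOW,
    ALL and NONE are recognized, anything else is ignored, the absence of
    a statement means "allowed", and a restrictive statement wins over a
    permissive one.
    """
    index, follow = True, True
    for token in content.split(","):
        statement = token.strip().upper()
        if statement == "NOINDEX":
            index = False
        elif statement == "NOFOLLOW":
            follow = False
        elif statement == "NONE":
            index, follow = False, False
        # "INDEX", "FOLLOW" and "ALL" keep the permissive defaults.
    return {"index": index, "follow": follow}


if __name__ == "__main__":
    for value in ("INDEX,FOLLOW", "ALL", "NOINDEX, FOLLOW", "NONE"):
        print(value, parse_robots(value))
```

Matching is case-insensitive, so “noindex,nofollow” and “NOINDEX, NOFOLLOW” give the same result.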

 

<meta name="revisit-after" content="10 days" />

Another humble request to the search engine. This tag asks robots to return every ten days for new indexing.

To be honest: I do not think that any of the large search engines pays attention to this tag. At least I was not able to find any causal connection between the “revisit-after” tag and the actual activity of a robot. Only the early beta version of MSN-Bot seemed to act exactly as it was told. All other engines define the intervals for revisits based on their internal algorithms. For example, Google visits your page more often if it has a higher PageRank. Smaller search engines obviously wish to save their bandwidth and revisit the indexed pages only once a year.

Nevertheless it might be a good idea to define something reasonable in the “revisit-after” tag. At least it should have no negative impact.

Additional information about the document

Besides the basic tags shown above, it is also possible to add some machine-readable information about the document to the Meta header of an HTML document. Unfortunately most search engines ignore these tags. Maybe this will change if they are used more often.

<meta name="date" content="2002-03-15" />

This tag gives the creation date of the current page in the format YYYY-MM-DD.

It is also possible to specify an exact timestamp, including the time zone, in the RFC 822 format (e.g. <meta name="date" content="Sun, 7 Oct 2001 14:56:02 +0200" />). The short PHP snippet <?php echo date("r"); ?> is a handy way to write the current date and time into the document.
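
For pages generated with something other than PHP, the same timestamp is easy to produce elsewhere. A Python sketch (the function name is hypothetical): email.utils.format_datetime from the standard library writes the date in the RFC 2822 format, the successor of RFC 822, and pads the day with a zero (“07 Oct”), which both formats allow.

```python
from datetime import datetime
from email.utils import format_datetime


def date_meta_tags(moment: datetime) -> tuple[str, str]:
    """Return the short and the RFC 822 style "date" meta tag.

    `moment` must be timezone-aware, otherwise no real time zone offset
    can be written.
    """
    if moment.tzinfo is None:
        raise ValueError("moment needs a time zone")
    short = moment.strftime("%Y-%m-%d")  # YYYY-MM-DD
    full = format_datetime(moment)  # e.g. "Sun, 07 Oct 2001 14:56:02 +0200"
    return (
        f'<meta name="date" content="{short}" />',
        f'<meta name="date" content="{full}" />',
    )


if __name__ == "__main__":
    # Current date and time in the local time zone, similar to PHP's date("r").
    for tag in date_meta_tags(datetime.now().astimezone()):
        print(tag)
```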

 

<meta http-equiv="Reply-to" content="mailadresse@somewhere.net" />

E-mail address of the author or the person responsible for this page.

 

<meta name="copyright" content="Stefan Plogmann" />

Copyright owner of the contents of the current page (this was once shown with every search result of the German search engine Fireball).

 

<meta name="author" content="John Smith, Springfield" />

Name of the author of the current document. Should be written in common citation form with name (or company name) and place.

 

<meta name="page-topic" content="Internet, Homepage" />

The tag “page-topic” would be a very good idea for classifying the current page for large web catalogues like Yahoo. Unfortunately this again would open the gates to spammers. To be honest: I do not know if any robot currently uses this tag to categorize the internet. Fireball once did, but this was some years ago.

Furthermore, every catalogue has different categories and identifiers. Therefore this tag is usually useless, and you can save your time and spend more effort on the contents of your page.

 

<meta name="Audience" content="Webmaster" />

This tag defines the target audience of the web page. It suffers from the same problems as “page-topic” and is therefore most certainly useless.

 

<meta http-equiv="pics-label" content='(pics-1.1
"http://www.icra.org/ratingsv02.html" l gen true for
"http://www.plogmann.net" r (cz 1 lz 1 nz 1 oz 1 vz 1)
"http://www.rsac.org/ratingsv01.html" l gen true for
"http://www.plogmann.net" r (n 0 s 0 v 0 l 0))' />

The PICS-Label is a very old and successful initiative to classify the contents of a web page. Web authors fill in an online questionnaire describing the content of their site, simply in terms of what is and isn’t present (e.g. pornography, violence). ICRA, the organisation that issues these PICS-based labels, then generates a Content Label which the author adds to the Meta header of each page.

Microsoft Internet Explorer supports PICS labels and therefore allows parents to block certain categories of pages. Of course this only works if the authors comply with the instructions and rules of the label. Nevertheless it is a very good idea, and it is very easy to support.

 

<meta content="TRUE" name="MSSmartTagsPreventParsing" />
<meta http-equiv="imagetoolbar" content="no" />

These two tags are only for the Microsoft Internet Explorer.

The first statement suppresses the automatic insertion of so-called SmartTags into the page when it is viewed with Internet Explorer. SmartTags were never actually included in any final version of Internet Explorer: after the feature appeared in early beta versions of IE 6.0, vehement protests forced Microsoft to remove it. So this tag is probably not needed any more.

The second statement suppresses the so called Image Toolbar of Microsoft Internet Explorer. You can see this toolbar if you move your mouse over larger images on a web page. This feature may be useful from time to time. It is however very disturbing on pages relying on the heavy usage of images for design purposes.

Links in Meta header

Everyone is speaking of the semantic web and the importance of better structuring the internet. Newer concepts like RDF may be much more flexible and detailed. Nevertheless plain and simple HTML also offers features that add real comfort and information for users and search engines alike.

LINK-tags are some of the oldest features of HTML. I do not mean normal links in the form <a href="page.html">linktext</a>. It is possible to add links to the Meta header that describe the hierarchical position or logical relation of a document. For example, it is possible to link to pages that are superordinate to the current document.

Unfortunately most browsers do not support Metalinks and therefore should not claim to be HTML compliant. Only Opera offers a visualization of embedded LINK-tags in a separate toolbar. Once you have used this feature, maybe even in combination with mouse gestures, you will not want to miss it.

SELFHTML again offers the best manual for the implementation of LINK-tags. Nevertheless I will give a short overview of the most useful points. Users of Opera can also use my homepage or my picture gallery to test the usage of Metalinks. The browser should show a small toolbar above the content window with several buttons (Home, Search, Up, etc.).

Google supports Metalinks and uses them to gather more links for the crawling engine. Unfortunately I do not know if the links are used to analyse the hierarchy within a website.
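
How does a browser like Opera (or a crawler) get at these links? Roughly, it reads the LINK-tags from the header and resolves their addresses relative to the current page. The following Python sketch, built only on the standard library, does just that; the class name and the page URL www.example.com are hypothetical, and it does not reproduce the actual code of Opera or Google.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelCollector(HTMLParser):
    """Collect <link rel="..." href="..."> elements of a page.

    Relative addresses are resolved against the page URL. Schemes such as
    mailto: or javascript: are left unchanged by urljoin.
    """

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case and also
        # calls this method for XHTML-style empty tags like <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if not rel or not href:
            return
        address = urljoin(self.page_url, href)
        # rel may contain several space-separated values ("SHORTCUT ICON").
        for value in rel.lower().split():
            self.links.setdefault(value, []).append(address)


if __name__ == "__main__":
    head = """
    <link rel="start" href="/" />
    <link rel="up" href="../index.php" />
    <link rel="next" href="page14.html" />
    <link rel="author" href="mailto:webmaster@somewhere.com" />
    """
    collector = LinkRelCollector("http://www.example.com/gallery/topic/page13.html")
    collector.feed(head)
    for rel, addresses in collector.links.items():
        print(rel, addresses)
```

Keeping a list per relation matters because a page may contain several links with the same “rel”, for example two “author” links.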

 

<link rel="SHORTCUT ICON" href="/favicon.ico" />

This link defines where a small icon is stored that the web browser can use when saving a bookmark of the page. These icons are small bitmap images stored in the Windows icon format (ICO), with a resolution of 16×16 pixels, 8-bit colour depth and the file extension “.ico”.

Internet Explorer (version 6) ignores the link to the shortcut icon. It assumes that the bookmark icon can be found in the root directory of the respective web server. For example, IE would look for the icon of this page under the URL http://www.plogmann.net/favicon.ico.
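
The lookup just described fits into a few lines. A Python sketch under the assumptions from the text above: a browser that honours the link uses its href, while one that ignores it (as Internet Explorer 6 does) goes straight to the root directory. The function name, its parameters and the example URL are made up for illustration.

```python
from typing import Optional
from urllib.parse import urljoin


def favicon_url(page_url: str, icon_href: Optional[str] = None,
                honours_link: bool = True) -> str:
    """Return the address where a browser looks for the bookmark icon.

    icon_href is the href of the page's <link rel="SHORTCUT ICON">, if any.
    A browser that ignores this link (like Internet Explorer 6 as described
    above) always falls back to /favicon.ico in the server's root.
    """
    if honours_link and icon_href:
        return urljoin(page_url, icon_href)
    return urljoin(page_url, "/favicon.ico")


if __name__ == "__main__":
    page = "http://www.example.com/gallery/page13.html"
    print(favicon_url(page, "icons/gallery.ico"))
    print(favicon_url(page, "icons/gallery.ico", honours_link=False))
```

If the icon is stored as /favicon.ico in the root directory anyway, both kinds of browser end up at the same file.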

 

<link rel="start" href="/" />

Link to the home or start page of a web site.

 

<link rel="glossary" href="/glossary.php" 
title="Glossary of this website" />

Refers to a page with a glossary. For example this could be useful for a page containing a large number of specialized foreign words, abbreviations or names.

It is always possible to add a title to a Metalink. This title is indexed by Google and can be shown in the search results if the page was not completely indexed. Opera has no support for titles in this context.

 

<link rel="copyright" href="/copyright.php" />

Links to a page with information about the copyright of this site. This could also be used to link to legal notices and terms of use.

 

<link rel="author" href="/author.php" />
<link rel="author" href="mailto:webmaster@somewhere.com" />

Links to a URL with information about the author of the current page. It is also possible to use a “mailto:”-link instead of a page address.

 

<link rel="search" href="/search/" />

Links to a page offering search services.

 

<link rel="up" href="../index.php" />

This tag links to a document that is hierarchically positioned above the current document. It can be used to express the logical structure of a website that is organized as a tree hierarchy.

 

<link rel="first" href="page01.html" />
<link rel="last" href="page345.html" />
<link rel="previous" href="page12.html" />
<link rel="next" href="page14.html" />

The tags “first”, “last”, “previous” and “next” can be used for navigation within a hierarchy level. This works very well in picture galleries or articles with multiple pages.
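
In a gallery with hundreds of pages nobody wants to write these tags by hand. The following Python sketch generates them for one page; it assumes numbered file names like page01.html as in the example above (the naming scheme and the function name are hypothetical and would have to match the real site).

```python
def pagination_links(current: int, total: int,
                     pattern: str = "page{:02d}.html") -> list[str]:
    """Build the first/last/previous/next LINK-tags for page `current` of `total`.

    The default file-name pattern is a hypothetical naming scheme that
    matches the example above; "previous" is left out on the first page
    and "next" on the last page.
    """
    if not 1 <= current <= total:
        raise ValueError("current must lie between 1 and total")
    tags = [
        f'<link rel="first" href="{pattern.format(1)}" />',
        f'<link rel="last" href="{pattern.format(total)}" />',
    ]
    if current > 1:
        tags.append(f'<link rel="previous" href="{pattern.format(current - 1)}" />')
    if current < total:
        tags.append(f'<link rel="next" href="{pattern.format(current + 1)}" />')
    return tags


if __name__ == "__main__":
    print("\n".join(pagination_links(13, 345)))
```

For page 13 of 345 this reproduces the four tags shown above.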

Again I would like to refer to my picture gallery, where you can test this kind of navigation with Opera. Mouse gestures are very handy here: for example, you can go to the next page if you hold down the left mouse button and click the right mouse button.

 

<link rel="index" href="index.html" />
<link rel="contents" href="index.html" />

The difference between “index” and “contents” is not clear to me. Personally I use “index” for a list of all topics in my gallery and “contents” for an overview of all pictures in a specific topic. Maybe “contents” could also link to something like a sitemap.

 

<link rel="help" href="/help/" title="HELP!" />
<link rel="help" href="javascript:open('help.html', 'HELP',
'width=500,height=450,left=10,top=10,scrollbars=yes')" />

Links to a page with help information. It is also possible to use JavaScript commands (here to open a popup window) in Metalinks.