Sitemaps
Publish date 24/04/2008
Sitemaps are XML files that inform search engines about the pages on your site.
A sitemap can carry extra information about each page: when it was last updated, how often it changes, and how important it is relative to the other pages on your site.
This allows search engines to better crawl your site and index it.
The file
Usually it is named sitemap.xml and placed in the root directory. (You can give it another name and point to it from robots.txt.)
The file must be written in UTF-8 and cannot be bigger than 10MB (10 485 760 bytes) or contain more than 50 000 URLs. You can compress it with gzip to save bandwidth, but the uncompressed file still cannot be bigger than 10MB.
If you want to list more pages than that, you can split them over several sitemap files.
See the Sitemap Index section below for details.
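The limits above can be checked before uploading. The following is a minimal sketch, not an official validator: the function name check_sitemap_limits is made up for this example, and counting the literal text <url> is only a rough way of counting entries.

```python
import gzip

MAX_BYTES = 10 * 1024 * 1024  # 10 485 760 bytes, measured uncompressed
MAX_URLS = 50_000


def check_sitemap_limits(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the limits are met."""
    # gzip files start with the magic bytes 0x1f 0x8b; the size limit
    # applies to the uncompressed content, so unpack first.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)

    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"too large: {len(data)} bytes")

    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
        return problems

    # Rough count: every entry opens with a literal <url> tag.
    url_count = text.count("<url>")
    if url_count > MAX_URLS:
        problems.append(f"too many URLs: {url_count}")
    return problems
```

Typical use would be to read the file in binary mode and pass the bytes in, for example check_sitemap_limits(open("sitemap.xml", "rb").read()).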
The format
Below is a basic example of a sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2008-03-24</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/products.html</loc>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.example.com/detail.php?category=2&amp;ln=54</loc>
<lastmod>2008-03-10T12:00:00+00:00</lastmod>
<priority>0.4</priority>
</url>
</urlset>
As with all XML files, data values must use entity escape codes for the following special characters.
| Character | Escaped code |
|---|---|
| Ampersand & | &amp; |
| Single quote ' | &apos; |
| Double quote " | &quot; |
| Greater than > | &gt; |
| Less than < | &lt; |
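If the sitemap is generated by a script, the escaping can be left to a library. The sketch below uses escape from Python's standard xml.sax.saxutils module; the helper name escape_for_sitemap is an assumption made for this example.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > on its own; the two quote characters
# are passed in as extra replacements.
QUOTES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(value: str) -> str:
    """Entity-escape a value before writing it into a sitemap tag."""
    return escape(value, QUOTES)


print(escape_for_sitemap("http://www.example.com/detail.php?category=2&ln=54"))
# prints http://www.example.com/detail.php?category=2&amp;ln=54
```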
The available tags are described below:
| Tag | Status | Description |
|---|---|---|
| <urlset> | required | Encapsulates the file and references the protocol standard. |
| <url> | required | Parent tag for each URL entry. The remaining tags are children of this tag. |
| <loc> | required | URL of the page. Must begin with the protocol (http://) and end with a trailing slash if your web server requires it. |
| <lastmod> | optional | Date of the last modification of this page (or file). This should be in W3C Datetime format. |
| <changefreq> | optional | How frequently this page is likely to change. Valid values are: always, hourly, daily, weekly, monthly, yearly, never. Use always for documents that change each time they are accessed, and never for archived URLs. |
| <priority> | optional | The priority of this URL relative to other URLs on your site, between 0.0 and 1.0. Use 1.0 for your most important pages. |
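The tags above map directly onto a small generator. This is a sketch under stated assumptions rather than a reference implementation: build_sitemap and the dictionary keys are names chosen for this example, and it relies on Python 3.8 or newer for the xml_declaration argument. ElementTree escapes special characters in the text for you.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[dict]) -> bytes:
    """Build a UTF-8 encoded sitemap from a list of page dictionaries."""
    urlset = ET.Element("urlset", xmlns=NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; the other tags are written only when given.
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2008-03-24",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.example.com/detail.php?category=2&ln=54",
     "priority": 0.4},
])
```

The result is a bytes object that can be written straight to disk in binary mode.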
Sitemap Index
There is a limit on how many pages you can list in one sitemap file: 50 000 URLs or 10MB (10 485 760 bytes).
You can use multiple files instead, but then you need to tell search engines where to find them. This is what the sitemap index file is for.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2008-02-14T18:31:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/example/sitemap2.xml</loc>
<lastmod>2008-03-20</lastmod>
</sitemap>
</sitemapindex>
It must also be UTF-8 encoded, and cannot list more than 1000 sitemaps or be larger than 10MB.
You can only specify sitemaps on the same site, not on other sites.
| Tag | Status | Description |
|---|---|---|
| <sitemapindex> | required | Encapsulates information about all the sitemaps in the file. |
| <sitemap> | required | Encapsulates information about one individual sitemap in the file. |
| <loc> | required | Location of the sitemap file. |
| <lastmod> | optional | Identifies when the sitemap file was last modified. |
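Splitting a long URL list and writing the matching index can be sketched as follows. The helper names chunk_urls and build_index and the sitemapN.xml naming scheme are assumptions made for this example, and the 120 000 page URLs are made-up sample data.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit for a sitemap


def chunk_urls(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split the URL list into groups that each fit in one sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls: list[str]) -> bytes:
    """Build a sitemap index that points at the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Sample data: 120 000 pages need three sitemap files.
urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
chunks = chunk_urls(urls)
names = [f"http://www.example.com/sitemap{n}.xml"
         for n in range(1, len(chunks) + 1)]
index_xml = build_index(names)
```

Each chunk would then be written to its own sitemap file under the name listed in the index.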
Let it be known
Once you have uploaded the file to your server, you need to make sure that search engine crawlers can find it.
robots.txt
The easiest way is to add it to your robots.txt file.
Simply add the following line:
Sitemap: sitemaplocation
Make sure you give the full URL of the sitemap, for example http://www.example.com/sitemap.xml.
You can add multiple lines if you have multiple sitemap files.
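To double-check what a crawler would see, the Sitemap lines can be read back out of a robots.txt file. This is a small sketch; the function name sitemap_urls_from_robots is made up for this example, and the field name is matched without regard to case.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs from every 'Sitemap:' line in a robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the "http://" in the URL survives.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


example = """User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap1.xml.gz
Sitemap: http://www.example.com/example/sitemap2.xml
"""
print(sitemap_urls_from_robots(example))
```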
Submitting
You can also submit it directly to the search engine.
Submitting via an HTTP request
This is for the more tech-savvy. You can submit the sitemap directly to a search engine using an HTTP request. This can be done with wget, curl or any other program. A successful request will return an HTTP 200 response code.
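The same request can be made from a script. The sketch below assumes the ping format described at sitemaps.org, where the URL-encoded sitemap location is passed as the sitemap parameter of a /ping address, and it assumes the search engine in question offers such an endpoint. The host searchengine.example is a placeholder, and both function names are made up for this example.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_ping_url(search_engine: str, sitemap_url: str) -> str:
    """Build the ping address; the sitemap URL must be URL-encoded."""
    base = search_engine.rstrip("/")
    return f"{base}/ping?sitemap={quote(sitemap_url, safe='')}"


def submit_sitemap(search_engine: str, sitemap_url: str) -> bool:
    """Send the ping and report whether the answer was HTTP 200.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx answers,
    so callers may want to catch that.
    """
    with urlopen(build_ping_url(search_engine, sitemap_url)) as response:
        return response.status == 200


# Placeholder host, shown only to illustrate the resulting address.
print(build_ping_url("http://searchengine.example",
                     "http://www.example.com/sitemap.xml"))
```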
A complete reference can be found at www.sitemaps.org