XML and HTML sitemap file in SEO
What Is a Sitemap?
A sitemap is a file placed on your website that helps Google and other search engines crawl the site more effectively and understand its structure. Sitemaps also tell search engines which pages on your site are most important. Think of it as a binder of your website's content.
Why Is a Sitemap Important?
The purpose of a sitemap is to let search engines crawl your website intelligently, which can help you improve your rankings in search results and obtain more relevant traffic.
If the website's pages are correctly linked, the search engine web crawlers can usually discover most of the pages without a sitemap. However, the sitemap will give valuable information about which pages are important.
Sitemap format
The two main types of sitemaps are HTML and XML.
HTML Sitemap
HTML sitemaps are mainly created for users: they give an overview of the structure of the website and help visitors navigate through all the subpages. An HTML sitemap is an HTML page on which all subpages of the website are listed.
XML Sitemap
The format of the XML Sitemap consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.
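As a minimal sketch of the entity-escaping rule, the Python snippet below uses the standard library's `xml.sax.saxutils.escape`. The helper name `escape_sitemap_value` is hypothetical and chosen only for illustration. It escapes the ampersand, angle brackets, and both quote characters.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added here as extra entities.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_sitemap_value(value: str) -> str:
    """Entity-escape a data value before placing it inside a Sitemap element."""
    return escape(value, EXTRA_ENTITIES)


raw_url = "http://www.example.com/shop?item=100&page=2"
print(escape_sitemap_value(raw_url))
# http://www.example.com/shop?item=100&amp;page=2
```

If the file is produced with an XML library rather than by string concatenation, the library normally performs this escaping itself, so the helper is mainly useful for hand-built templates.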
The Sitemap must:
- Begin with an opening <urlset> tag and end with a closing </urlset> tag.
- Specify the namespace (protocol standard) within the <urlset> tag.
- Include a <url> entry for each URL, as a parent XML tag.
- Include a <loc> child entry for each <url> parent tag.
Sample XML Sitemap
The sample sitemap contains two URLs: the first with all the optional elements and the second with only the mandatory <loc> element.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.example.com/blog/java</loc>
<lastmod>2020-02-13</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/shop?item=100&amp;page=2</loc>
</url>
</urlset>
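A file like the sample can also be generated programmatically. The following is a rough sketch using Python's standard `xml.etree.ElementTree` module. The function name `build_sitemap` and the dictionary layout of the entries are assumptions made for this example, not part of the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a <urlset> document from a list of dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional (hypothetical input layout used for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree entity-escapes text content, so "&" is written as "&amp;".
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com/blog/java", "lastmod": "2020-02-13",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.com/shop?item=100&page=2"},
])
print(sitemap_xml.decode("utf-8"))
```

The result is returned as UTF-8 encoded bytes, which matches the encoding requirement, and can be written directly to a file opened in binary mode.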
<loc> tag
Note: This field is mandatory.
URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash if your web server requires it. This value must be less than 2,048 characters.
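The constraints on this value can be checked before a URL is written to the file. The sketch below is one possible check in Python. The helper `is_valid_loc` is hypothetical, it assumes that only `http` and `https` are acceptable protocols, and it measures the length of the raw string as given, which is an interpretation of the limit rather than something stated above.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # the value must be shorter than this


def is_valid_loc(url: str) -> bool:
    """Return True if the URL starts with a protocol and is short enough."""
    parts = urlsplit(url)
    has_protocol = parts.scheme in ("http", "https")  # assumption: web URLs only
    return has_protocol and bool(parts.netloc) and len(url) < MAX_LOC_LENGTH


print(is_valid_loc("https://www.example.com/blog/java"))  # True
print(is_valid_loc("www.example.com/blog/java"))          # False
```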
<lastmod> tag
Note: This field is optional.
The date of the last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion if desired and use YYYY-MM-DD.
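To illustrate the two common forms of this value, here is a small Python sketch that formats either a plain date or a timezone-aware datetime. The function name `format_lastmod` is hypothetical, and rejecting naive datetimes is a design choice of this example, made so that the time zone designator is always present when a time is given.

```python
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Return a W3C Datetime string for a date or a timezone-aware datetime."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("datetime must be timezone-aware")
        return value.isoformat(timespec="seconds")
    return value.isoformat()


print(format_lastmod(date(2020, 2, 13)))
# 2020-02-13
print(format_lastmod(datetime(2020, 2, 13, 11, 1, 17, tzinfo=timezone.utc)))
# 2020-02-13T11:01:17+00:00
```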
<changefreq> tag
Note: This field is optional.
How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are:
- always
- hourly
- daily
- weekly
- monthly
- yearly
- never
<priority> tag
Note: This field is optional.
The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites—it only lets the search engines know which pages you deem most important for the crawlers.
The default priority of a page is 0.5.
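The rules for the two optional tags above can be expressed as a short validation step. This is only a sketch: `check_optional_fields` is a hypothetical helper, and raising `ValueError` on bad input is an assumption about how a generator script might want to react.

```python
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
DEFAULT_PRIORITY = 0.5


def check_optional_fields(changefreq=None, priority=None):
    """Validate changefreq and priority, filling in the default priority."""
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        raise ValueError(f"invalid changefreq: {changefreq!r}")
    if priority is None:
        priority = DEFAULT_PRIORITY
    if not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority out of range: {priority!r}")
    return changefreq, priority


print(check_optional_fields("monthly", 0.8))  # ('monthly', 0.8)
print(check_optional_fields())                # (None, 0.5)
```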
XML Index Sitemap
You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). If you would like, you may compress your Sitemap files using gzip to reduce your bandwidth requirement; however, the sitemap file once uncompressed must be no larger than 50MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.
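One way to respect the URL limit is to split the full list of URLs into batches before writing the files. The Python sketch below shows the idea. `chunk_urls` is a hypothetical helper, and it only enforces the 50,000 URL limit, so the 50MB size limit would still need a separate check on the generated files.

```python
from itertools import islice

MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs, one list per Sitemap file."""
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


all_urls = [f"https://www.example.com/page/{i}" for i in range(120_000)]
print([len(chunk) for chunk in chunk_urls(all_urls)])
# [50000, 50000, 20000]
```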
The Sitemap index file must:
- Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
- Include a <sitemap> entry for each Sitemap as a parent XML tag.
- Include a <loc> child entry for each <sitemap> parent tag.
The optional <lastmod> tag is also available for Sitemap index files.
Sample XML Index Sitemap
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap_post.xml</loc>
<lastmod>2020-02-13T11:01:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2020-02-13</lastmod>
</sitemap>
</sitemapindex>
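An index file like the sample can be produced in the same way as a regular Sitemap. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `build_sitemap_index` and the `(loc, lastmod)` tuple layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Build a <sitemapindex> document from (loc, lastmod) tuples.

    lastmod may be None, because the tag is optional.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap_post.xml", "2020-02-13T11:01:17+00:00"),
    ("http://www.example.com/sitemap2.xml.gz", None),
])
print(index_xml.decode("utf-8"))
```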
<loc> tag
Note: This field is mandatory.
The tag identifies the location of the Sitemap.
This location can be a Sitemap, an Atom file, RSS file or a simple text file.
<lastmod> tag
Note: This field is optional.
Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in W3C Datetime format.
Conclusion
It's best to have both types of sitemaps, but if you decide to use only one, the XML type is the one that matters for better ranking. There are many tools that can help you create an XML sitemap.
Sitemaps are essential components of technical SEO and, along with other factors, form the foundation of your website. They are crucial for any website and require special attention. You have to optimize the infrastructure of your website to make sure it is crawled and indexed correctly by the search engine robots. This way, you have a better chance of ranking higher and therefore obtaining more relevant traffic.
References
- W3C Datetime format
- Sitemap protocol