Guiding search engines to index your webpage or website the way you want can be a challenging task. Robots.txt, for instance, does not tell search engines whether to index a site's content; it only manages crawler access. You cannot use robots.txt to control indexation. For indexation control, you need robots meta tags.
The x-robots-tag HTTP header and the robots meta tag tell crawlers how to index and serve the content on a website.
WHAT IS A ROBOTS META TAG?
Simply put, a robots meta tag is a piece of HTML code that gives search engine robots their directives for a page. It allows you to control whether a webpage is indexed and how information from the page is displayed in search results. The tag is placed in the <head> section of a webpage.
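For example, a page carrying the tag below asks search engines to keep it out of their index entirely:

<meta name="robots" content="noindex">

Other directives, covered next, can be used in place of "noindex".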
Robots meta tags are one of the two types of robots meta directives. They live inside a page's HTML, while the other type, the x-robots-tag mentioned earlier, is sent by the web server as an HTTP header. Both the meta robots tag and the x-robots-tag carry the same indexing directives, such as "noindex" and "nofollow". The difference lies in how those directives are relayed to crawlers.
Meta directives give crawlers instructions on how to crawl and index whatever is found on a webpage. But they are suggestions, not commands: well-behaved crawlers honor them, while malicious or poorly written bots may ignore them entirely.
The directives that search engine crawlers follow when used in meta tags are listed below. Note that directive values are not case-sensitive, that some crawlers support only a subset of them, and that different crawlers may interpret the same directive slightly differently.
Indexing control directives:
Noindex: instructs a search engine not to index a page, meaning the page will not show in search engine results.
Index: the default behavior. It does the reverse of noindex, instructing a search engine to index the webpage.
Nofollow: tells crawlers not to follow any of the links on a webpage. Note that the linked URLs can still be indexed: if they are linked from elsewhere (backlinks), it is very much possible for them to end up in the index.
Follow: instructs crawlers to follow all links on the page and pass link equity through to the pages they point to.
None: shorthand for using the noindex and nofollow directives together.
Noimageindex: instructs crawlers not to index any images on a webpage.
Noarchive: does not allow Google to display a cached version of the webpage on a SERP.
Notranslate: prevents Google from offering a translated version of the webpage on a SERP.
Nocache: performs the same function as noarchive, but it is used only by Internet Explorer and Firefox.
Nosnippet: instructs a search engine not to show a snippet (such as the meta description) of the webpage on a SERP.
Noodp: prevents search engines from using a page's DMOZ description as the SERP meta description for the webpage. DMOZ closed in 2017 and the tag has since been retired, so it is now obsolete.
All: the default value. It serves the same function as using index and follow together.
Unavailable_after: instructs search engines not to index a webpage after a given date; see the example below.
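As a sketch of that last directive, a page that should drop out of Google's results after a set date could carry the tag below (Google documented unavailable_after as taking an RFC 850 formatted date; the date shown here is only an illustration):

<meta name="googlebot" content="unavailable_after: 25-Aug-2025 15:00:00 EST">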
ROBOTS META DIRECTIVES: META ROBOTS TAG AND X-ROBOTS-TAG
The two main types of meta directives, as mentioned earlier, are the meta robots tag and the x-robots-tag. The parameters that work in a meta robots tag will also work when specified in an x-robots-tag.
But how are directives delivered through the meta robots tag and the x-robots-tag?
Meta Robots Tag:
Popularly known as robots tags or meta robots, these are part of a webpage's HTML code and appear as elements in the page's <head> section.
For instance, below is a sample of the code:
<meta name="robots" content="[DIRECTIVE]">
In this code, <meta name="robots" content="[DIRECTIVE]"> is the standard form that addresses all crawlers, but you can target a specific crawler by removing "robots" and exchanging it for the name of a user-agent. Suppose you want to send a directive only to Googlebot; the code below is a sample of what you would use:
<meta name="googlebot" content="[DIRECTIVE]">
Notice that the "robots" in the earlier code has been replaced with "googlebot", the name of the user agent you want to address. Swapping in a user agent name is all it takes.
But what if you wish to use multiple directives on a page? As long as the directives are aimed at the same user agent, they can be combined in a single meta tag by simply separating them with commas. An example is given below:
<meta name="robots" content="noimageindex, nofollow, nosnippet">
The meta tag above instructs robots to do three things:
To not index any image displayed on the webpage
To not follow any of the links on the webpage
To not show a snippet of the webpage when it appears on a SERP
If, however, you want to apply different directives for different user agents, you will need a separate tag for each bot, as in the example below.
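For instance, the pair of tags below would keep a page out of Google's index while only telling Bing's crawler (user-agent name bingbot) not to follow the page's links:

<meta name="googlebot" content="noindex">
<meta name="bingbot" content="nofollow">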
X-ROBOTS-TAG.
The difference between the meta robots tag and the x-robots-tag is the level of control each offers. The meta robots tag controls indexation at the page level, while the x-robots-tag is sent as part of the HTTP header of a response. That lets it control the indexation of a whole webpage as well as of specific resources belonging to that page.
The x-robots-tag can carry out all the same indexation directives as the meta robots tag, but it is more flexible, and this has proven important in some cases: it can be combined with regular expressions, applied to non-HTML files, and used to set rules at a global (site-wide) level.
To use it, you need access to your website's server configuration files (such as .htaccess on Apache) or to the server-side code that sends response headers (a .php file, for example). From there, you can add the header with whichever directives you need, as in the sketch below.
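As a minimal sketch, assuming an Apache server with the mod_headers module enabled, the .htaccess rules below would keep every PDF on the site out of the index:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>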
You might need to use the x-robots-tag in cases such as these:
To control the indexing of content that is not HTML, such as video or Flash files.
To block the indexing of a specific resource on a webpage, such as an image or a video, rather than the entire page.
To control the indexing of a webpage when you cannot access its HTML, or when the site's header cannot be edited per page (a global header, for instance).
To add conditional indexing rules. For instance, should a user's profile be indexed only once they have commented a certain number of times? See the sketch after this list.
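A hypothetical PHP sketch of that last case, assuming an invented $user object with a commentCount() method; PHP's built-in header() function must run before any page output is sent:

<?php
// Keep the profile page out of the index until the user
// has left more than ten comments.
if ($user->commentCount() <= 10) {
    header('X-Robots-Tag: noindex');
}
?>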
SUMMARY:
Meta directives are only discovered when a URL is crawled. This implies that if crawling is blocked by a robots.txt file, the meta directives on that URL will never be found and will effectively be ignored.
In most cases, a meta robots tag with "noindex, follow" is a better way to keep a page out of the index than blocking it with a robots.txt file.
Even so, stubborn crawlers will likely overlook and bypass these directives, so this method offers no guarantee of security, especially for information you wish to keep private. For genuinely private webpages, add password protection instead so viewers cannot access them at all. Finally, there is no need to use the meta robots tag and the x-robots-tag on the same webpage.