
Robots.txt: how to use it correctly and help SEO

In this article about robots.txt, we will discuss the benefits of using it correctly for Googlebot and the bots of other search engines.


The user-agent called Googlebot is undoubtedly the best known, but there are other user-agents that perform practically the same function: they browse the internet through links found on website pages, looking for new links and content to be indexed and displayed in search results. However, you can choose not to allow some of your pages to be displayed in search results (login pages, for example).

What is robots.txt?

As the name suggests, robots.txt is a file in text format (.txt) used to make life easier for search engine robots, allowing us to tell them which pages or folders of a website they can and cannot access.

What is a user-agent?

A user-agent identifies the type of crawler that a crawl rule applies to. Some crawlers have more than one token, as shown in Google's own table; for a rule to be applied, only one token needs to match the crawler. The list is not complete, but it includes many of the crawlers that may access your site.
See a list of some user-agents here.
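For example, a crawler obeys the group whose User-agent token matches it and ignores the other groups. In the illustrative sketch below (the directory names are made up), Googlebot follows only its own group, so it may still crawl /search/ even though all other robots may not:

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /private/
Disallow: /search/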

Allow

The Allow directive tells robots which directories or pages they can access, allowing their content to be crawled, indexed, and added to search engine indexes.
It should be used to allow access to the site's directories ("/"), or when you have blocked access to a directory through Disallow but need to index one or more files or subdirectories within the blocked directory.

See the example below:

Disallow: /files
Allow: /files/image
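
Keep in mind that Allow and Disallow lines only take effect inside a group started by a User-agent line; rules that appear before any User-agent are ignored by Google's parser. A minimal complete version of the example above would be:

User-agent: *
Disallow: /files
Allow: /files/image

With this group, a robot may crawl /files/image and its contents, but nothing else under /files.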

Disallow

The Disallow directive guides search robots on which directories or pages should not be crawled and included in the search index. You can test rules like these programmatically, as shown in the sketch after the examples below.

See the example below:

  • Disallow: /rss – tells robots not to crawl folders or files whose paths start with /rss;
  • Disallow: /user/ – tells robots not to crawl the content inside the /user/ folder;
  • Disallow: /readme.html – tells robots not to crawl the page readme.html.
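
If you want to check how such rules behave before publishing them, Python's standard library ships a robots.txt parser. The sketch below is illustrative: site.com is a placeholder domain, and the rules are the examples above. Note that Python's parser applies rules in file order with simple prefix matching, while Google uses the most specific match; for the plain Disallow rules here, the results agree.

import urllib.robotparser

# The example Disallow rules from above, inside a generic group
rules = [
    "User-agent: *",
    "Disallow: /rss",
    "Disallow: /user/",
    "Disallow: /readme.html",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# site.com is a placeholder; each URL is checked against the rules
for url in [
    "https://site.com/rss-feed.xml",    # path starts with /rss -> blocked
    "https://site.com/user/profile",    # inside /user/ -> blocked
    "https://site.com/readme.html",     # the exact page -> blocked
    "https://site.com/blog/article-1",  # no matching rule -> allowed
]:
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")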

Sitemap in robots.txt

Another function of robots.txt is to indicate the path and file name of the site's sitemap.xml.

Here's an example of how to enter them:

Sitemap: https://site.com/sitemap-index.xml
Sitemap: https://site.com/page-sitemap.xml
Sitemap: https://site.com/post-sitemap.xml
Sitemap: https://site.com/product-sitemap.xml
Sitemap: https://site.com/user-sitemap.xml

According to WordPress, a secure robots.txt would look like this:

User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp

Example WordPress robots.txt

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /wp-admin/$
Disallow: */trackback/$
Disallow: /comments/feed*
Disallow: /wp-login.php?*
Allow: /*.js*
Allow: /*.css*
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/admin-ajax.php?action=*
Allow: /wp-content/uploads/*

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/$
Disallow: */trackback/$
Disallow: /comments/feed*
Disallow: /wp-login.php?*
Allow: /*.js*
Allow: /*.css*
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/admin-ajax.php?action=*
Allow: /wp-content/uploads/*

Sitemap: https://site.com/sitemap-index.xml
Sitemap: https://site.com/page-sitemap.xml
Sitemap: https://site.com/post-sitemap.xml
Sitemap: https://site.com/product-sitemap.xml
Sitemap: https://site.com/user-sitemap.xml

Google Search Console is Google's own tool for use by webmasters. It offers greater control over how your sitemaps work, showing errors and the adjustments to be made.

It is essential to tell Google the location of your sitemaps in the robots.txt file.

In the case of WordPress, it is not advisable to use plugins to create robots.txt.

Here is a link to test your robots.txt file and submit it in Google Search Console.
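
You can also do a quick check with Python's standard library before (or after) validating in Search Console. This is a minimal sketch; https://site.com is a placeholder for your own domain:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://site.com/robots.txt")
rp.read()  # downloads and parses the live robots.txt

# Check a URL exactly as Googlebot would request it
print(rp.can_fetch("Googlebot", "https://site.com/wp-admin/admin-ajax.php"))

# Lists the Sitemap: entries found in the file (Python 3.8+), or None
print(rp.site_maps())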
