A robots.txt file, put very simply, is a text file that you create and upload to your website to tell the search engine crawlers that visit your site which parts of it they should and shouldn't crawl.
By default, most major search engines will crawl and index any page of your website they can find, unless something specifically tells the crawler not to. Search engines generally aim to index as much quality information as possible, so they assume they can crawl and index anything they find unless you tell them otherwise.
This is where a robots.txt file can come in handy, by giving the crawlers a bit more direction on which content to avoid. Common uses for a robots.txt file include keeping crawlers out of back-end areas of your site (like admin pages) and blocking a specific file or folder that you don't want crawled. Keep in mind that robots.txt is publicly readable and only asks crawlers to stay away, so truly sensitive pages should also be protected with a login. Truth is, adding or updating your robots.txt file is a great way to do some general indexing cleanup for your website, and by steering crawlers away from duplicate or low-value pages it may even help your website's rankings along the way.
To start, simply create a text file called robots.txt and place it in the root folder of your website.
Example: https://wordjack.com/robots.txt
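Since crawlers only look for the file at the root of your site, it can help to see how that address is worked out from any page URL. Here is a minimal Python sketch; the function name robots_txt_url and the sample blog URL are made up for illustration and are not part of any standard tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt,
    # and any query string or fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only to show the result.
print(robots_txt_url("https://wordjack.com/blog/some-post/?page=2"))
# prints https://wordjack.com/robots.txt
```

If the file lives anywhere else, such as inside a subfolder, crawlers will simply not find it.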
Once the file has been uploaded, you will need to go back in, edit it, and add some instructions that tell the crawlers which pages, files, and folders you want blocked from being crawled. Like most code, these instructions must be written in a specific format in order to work properly. There is a wide range of directives that can be used in a robots.txt file, but some of the more common ones include:
1. Blocking your entire site from being indexed (commonly used while a site is still in development).
User-agent: *
Disallow: /
Note: You will want to remove this rule from your robots.txt file once your site is finished and you want it crawled and indexed.
2. Blocking a specific folder of your website from being indexed (commonly used for blocking entire admin/back-end/theme folders of a website).
User-agent: *
Disallow: /folder-name-here/
3. Blocking a specific file of your website from being indexed (commonly used for blocking specific images/media from being added to the crawlers’ index).
User-agent: *
Disallow: /folder-name-here/file-name-here.jpg
4. Blocking a specific page of your website from being indexed (commonly used for blocking duplicate content or unwanted pages of your website).
User-agent: *
Disallow: /folder-name-here/page-name-here.htm
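Before uploading rules like the ones above, you may want to check that they block what you expect. The sketch below uses Python's built-in urllib.robotparser module to test a few of them. The helper is_allowed, the crawler name ExampleBot, and the example.com host are assumptions made for illustration, and real search engines may interpret some rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `path` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


# The sample rules from the list above, using the same placeholder names.
BLOCK_SITE = "User-agent: *\nDisallow: /"
BLOCK_FOLDER = "User-agent: *\nDisallow: /folder-name-here/"
BLOCK_FILE = "User-agent: *\nDisallow: /folder-name-here/file-name-here.jpg"

print(is_allowed(BLOCK_SITE, "/any-page.htm"))                        # False
print(is_allowed(BLOCK_FOLDER, "/folder-name-here/anything.htm"))     # False
print(is_allowed(BLOCK_FOLDER, "/another-folder/page.htm"))           # True
print(is_allowed(BLOCK_FILE, "/folder-name-here/file-name-here.jpg")) # False
print(is_allowed(BLOCK_FILE, "/folder-name-here/other-file.jpg"))     # True
```

Note that a folder rule blocks everything whose path starts with that folder name, while a file or page rule only blocks paths that begin with that exact file name.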
To learn more about robots.txt files and how to use them to start cleaning up your website’s unwanted indexed pages, visit robotstxt.org or check out this cool Robots.txt Generator from SEO Book, which makes creating a robots.txt file for your website a breeze.
—
WordJack Media provides a wide range of website design and online marketing solutions to clients throughout Canada and the US, including Collingwood ON, Ottawa ON, Barrie ON, Miami FL, Lakeland FL, Orlando FL, Charlotte NC, Hickory NC, Asheville NC and more. Contact WordJack Media today for more information about how we can help your business win on the web!