Learn How To Manage Your Soft 404 Pages

Soft 404s – A detailed study! Have you ever clicked a link on a web page only to receive a message that the page doesn't exist? That is a 404 error, or 'page not found'. The tricky thing is that sometimes search engines aren't told about the 404 and think it's a real page. We call this a soft 404. Warning: we are about to get really techie on you here!

The '404' or 'Not Found' error message is a standard HTTP response code indicating that the server could not find the file that was requested. A 404 and a soft 404 both indicate the same thing: the requested file is not present on the web server. But a soft 404 returns the server status code 200 (OK), which tells a search engine that the page exists (which it doesn't). So we need to learn how to tell search engines that soft 404 pages are actually 404 pages (broken links) and should not be crawled as real pages.

The 'Soft 404s' report in Google Webmaster Tools is one of Google's latest updates. It gives you more insight into what the robots crawling your web pages actually encounter. Why is this important? Search engines pay close attention to the server status code of each web page. If many of your pages return a 404 status code, the site can look poorly maintained, which may count against it in search engine rankings. We usually try hard to find and fix the broken links on a website, but in some cases they are hard to find.

Soft 404s – What Does This Mean?

In short, broken links that are labeled as 200 (OK – server status code) by web servers are called soft 404s.
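To make the definition concrete, here is a minimal Python sketch that labels a response from its status code and page text. The function name `classify` and the phrase list `NOT_FOUND_HINTS` are assumptions made for illustration, and matching on page text is only a rough heuristic, not how any search engine is known to detect soft 404s.

```python
# Phrases that often appear on error pages. This list is an assumption
# for illustration; adjust it to the wording of your own custom 404 page.
NOT_FOUND_HINTS = ("page not found", "does not exist", "no longer available")


def classify(status_code: int, body: str) -> str:
    """Label a response as 'hard 404', 'soft 404', 'ok' or 'other'."""
    if status_code == 404:
        # The server is honest about the missing page.
        return "hard 404"
    if status_code == 200:
        text = body.lower()
        # A 200 (OK) response whose content reads like an error page.
        if any(hint in text for hint in NOT_FOUND_HINTS):
            return "soft 404"
        return "ok"
    # Redirects, server errors and so on are outside this sketch.
    return "other"
```

For example, `classify(200, "<h1>Page Not Found</h1>")` is labeled a soft 404, while the same body served with a 404 status code is an ordinary (hard) 404.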

How can a web server report broken links as 200 (OK)?
An incorrectly configured custom 404 page makes the web server report every broken link (404) on your website as 200 (OK).

Correct Setup – ErrorDocument 404 /unknown-page.php (a relative, local path is used to define the custom 404 page)

Incorrect Setup – ErrorDocument 404 https://www.techwyse.com/unknown-page.php (an absolute path, i.e. a full URL, is used to define the custom 404 page)

If the error document setup uses an absolute path (a full URL) for your custom 404 page, the Apache web server answers a broken link with a redirect to that page instead of a 404, and the page itself then loads with a 200 (OK) server status code. Tools like Xenu or GPablo (or other broken link checkers) will not be able to find the broken link in that case.
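One way to check a site for this problem is to request a URL that cannot possibly exist and look at the final status code. The sketch below assumes only the Python standard library; `fetch_status` and `serves_soft_404` are hypothetical helper names, and the random `.html` path is just one way to build a URL that is very unlikely to exist.

```python
import urllib.error
import urllib.request
import uuid
from typing import Callable


def fetch_status(url: str) -> int:
    """Return the final HTTP status code for url.

    urlopen follows redirects, so a redirect to a custom error page
    that loads normally is reported here as 200.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return err.code


def serves_soft_404(base_url: str,
                    fetch: Callable[[str], int] = fetch_status) -> bool:
    """Return True if a URL that should not exist still ends in a 200."""
    # Random path: assumed not to match any real page on the site.
    probe = base_url.rstrip("/") + "/" + uuid.uuid4().hex + ".html"
    return fetch(probe) == 200
```

Calling `serves_soft_404("https://www.example.com")` (a placeholder address) returns `True` when the server answers the made-up URL with 200 (OK), which suggests the custom 404 page is misconfigured. The `fetch` parameter exists so the check can be tried with a stand-in function instead of a real network request. Network failures are not handled in this sketch.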

In this case, you cannot identify the broken links on your website. Search engines will also treat each of them as an existing page and crawl it. This may lead to duplicate content problems if the content served at the broken URL matches other pages on your website.

Google has started showing these soft 404s in Google Webmaster Tools.

Technically, Google should report only pages that do not exist yet return a 200 (OK) server status code. At the moment, however, Google Webmaster Tools also lists some genuine 404 pages in the soft 404s list, which Google will have to correct.

The following actions are required for each soft 404 listed in Google Webmaster Tools for your website.

1) Page contains the correct content and properly returns a 200 response - Not actually a soft 404 and no action required

2) Page returning 404 status response - Not actually a soft 404 and no action required

3) Page doesn’t exist but returns a 200 response code - 301 redirect to a more accurate URL
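The three cases above can be written as a small decision function. This is only a sketch: the name `triage` and the returned strings are made up for illustration, and whether a page "exists" is something you have to decide yourself by looking at the URL.

```python
def triage(page_exists: bool, status_code: int) -> str:
    """Map one URL from the soft 404 list to the action it needs."""
    if page_exists and status_code == 200:
        # Case 1: a real page that was listed by mistake.
        return "no action: valid page"
    if status_code == 404:
        # Case 2: already a proper 404, not a soft 404.
        return "no action: already a real 404"
    if not page_exists and status_code == 200:
        # Case 3: a genuine soft 404.
        return "fix: 301 redirect to a more accurate URL"
    # Anything else is outside the three cases listed above.
    return "review manually"
```

For instance, `triage(False, 200)` returns the 301 redirect action, while `triage(True, 200)` and `triage(False, 404)` both need no action.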

Once every URL in the list falls under one of these conditions and has been handled, the soft 404s on your site are taken care of. Search engine robots are always interested in crawling strong, content-driven web pages. Make sure they visit only your valid pages and not the unwanted ones.

7 Comments

  • Thanks for explaining this in detail Elan.

  • Thank you so much for this article – at last I understand what has been happening on my sites and using your correction of relative path did the trick……

  • @Dan: Sites that are developed in .net or other development platforms don't have the chances of getting 200 (server status code) for broken links and thats why i didn't mention here.

  • Elan, informative read about soft 404. Thanks much for detailing it vividly. Now I clearly got the logic behind the concept. Thanks for that clarification and comments.

  • Ah thats such a great insight Elan. I thought that all pages that do not exist are considered as 404 by Google. But to know about the 'status code' is really a revelation. I think the research you have done is great.
    Does Webmaster consider or differentiate the soft 404 and 404 pages seperately and show them too?

  • This one is worth noting! I just found one in my site. Good initiative from google to update it in webmaster tools

  • What about sites made in .Net?  It seems everyone assumes PHP is the ONLY development platform out there 😀
