Monday, July 29, 2019

Google Open Source Robots.txt Parser

For those interested in seeing how their website's components are indexed (or not), Google's decision to open source its robots.txt parser is an amazing bit of news. Webmasters have struggled to understand robots.txt files for many years. The challenge was not so much how to write and declare the directives in the files; it was to fully comprehend what actions each search engine would take. While there is a single de facto standard, the Robots Exclusion Protocol (REP), the handling of corner cases was ambiguous, such as when a webmaster's text editor inserted BOM (byte order mark) characters into a robots.txt file.
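To make the BOM corner case concrete, here is a minimal Python sketch of why a leading byte order mark can trip up a naive reader. It is not Google's implementation; the helper names `strip_bom` and `first_directive` are hypothetical and exist only for this illustration, and the sample file contents are made up.

```python
# Illustration only: how a UTF-8 BOM can hide the first directive
# of a robots.txt file from a naive line-based reader.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom(raw: bytes) -> bytes:
    """Return raw with a leading UTF-8 byte order mark removed, if present."""
    return raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw


def first_directive(raw: bytes) -> str:
    """Return the lowercased key of the first non-blank, non-comment line."""
    for line in raw.decode("utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            return line.split(":", 1)[0].strip().lower()
    return ""


# A hypothetical robots.txt saved by an editor that prepends a BOM.
raw = UTF8_BOM + b"User-agent: *\nDisallow: /private/\n"

naive_key = first_directive(raw)             # BOM is still glued to the key
clean_key = first_directive(strip_bom(raw))  # key is recognizable again

print(repr(naive_key))
print(repr(clean_key))
```

In the naive case the key comes back with an invisible U+FEFF character in front of it, so a simple comparison against "user-agent" fails and the whole group could be ignored. Whether a given crawler tolerates this is exactly the kind of ambiguity a published parser helps settle.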

On July 1, 2019, Google announced they are spearheading the effort to make the REP an internet standard. Thank you, Google. We applaud your moves thunderously!

Webstation has set up a public GitHub repository where you can download a snapshot of the full C++ library. It is released under an Apache license. Full instructions on how to build and run the library are included.

Read the full article at

