SEOlytics Crawler

SEOlyticsCrawler is the crawler that SEOlytics uses to crawl the public web. Our goal is to make the internet more transparent by crawling the web and collecting public data for analysis. We offer an extensive free version of our tool (see below), so that even webmasters without a budget can access the data.

How SEOlyticsCrawler interacts with your website

SEOlytics uses a web crawler that identifies itself as “SEOlyticsCrawler” to collect public web data. When we crawl your site, we adhere to a “politeness policy” so that we don’t impact the performance of your site.

Unlike search engine crawlers, which seek to access and store all of a website’s content, SEOlyticsCrawler is usually restricted to selected information on individual pages and/or to assessing the link structure of individual pages. All information is collected sporadically from publicly available sources to which the webmaster knowingly intended to give public access.

To assess how SEOlyticsCrawler interacts with your site, look for the following user agent string when evaluating statistics or web server logs:

Mozilla/5.0 (compatible; SEOlyticsCrawler/3.0; +http://crawler.seolytics.net/)
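
For illustration, a short Python script along these lines can count the crawler’s requests in an access log. The log path and the combined log format are assumptions; adjust them to your own server setup.

# Count requests from SEOlyticsCrawler in a web server access log.
# The path /var/log/nginx/access.log and the combined log format are
# assumptions for this sketch; adapt them to your configuration.
from collections import Counter

hits = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "SEOlyticsCrawler" in line:
            # In the combined log format the request line is the first
            # quoted field, e.g. 'GET /page.html HTTP/1.1'.
            try:
                request = line.split('"')[1]
                path = request.split()[1]
            except IndexError:
                continue
            hits[path] += 1

for path, count in hits.most_common(10):
    print(f"{count:6d}  {path}")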

What is SEOlytics?

SEOlytics is a marketing analytics company that builds analytics software for online marketers and conducts market research. Some of our analyses require us to crawl websites. It is important to understand that we acquire much of the information we publish from a combination of publicly available sources, third-party data suppliers and other sources.

How can I identify the crawler and protect content?

It is practically impossible to keep web content secret without taking further action. The moment someone places a link to your site, it will be found by search engines and eventually by other crawlers as well. Our crawler always uses this user agent in the request header when querying a website:

Mozilla/5.0 (compatible; SEOlyticsCrawler/3.0; +http://crawler.seolytics.net/)

The crawler respects the Robots Exclusion Standard and parses the content of the robots.txt file. Please make sure the crawler can actually retrieve the robots.txt file itself; if it can’t, it will assume (as is industry practice) that it is okay to visit your site.
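
The following Python sketch models that fallback in simplified form; it is not our actual implementation, and the helper name, timeout and example URLs are only illustrative.

# Illustrative sketch of the industry-practice fallback: if robots.txt
# cannot be retrieved at all, the site is treated as crawlable.
import urllib.request
from urllib.robotparser import RobotFileParser

def is_allowed(site, path, agent="SEOlyticsCrawler"):
    robots_url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(robots_url, timeout=10) as response:
            robots_txt = response.read().decode("utf-8", errors="replace")
    except OSError:  # includes urllib.error.URLError
        return True  # robots.txt unreachable: assume crawling is allowed
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, site.rstrip("/") + path)

print(is_allowed("https://example.com", "/some-page.html"))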

If you feel the need to protect content from the crawler, you can add rules like this to your robots.txt:

User-agent: SEOlyticsCrawler
Disallow: /secret-sauce.html
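
If you want to double-check such a rule before deploying it, Python’s standard urllib.robotparser evaluates it the same way a standards-compliant crawler would. The robots.txt content below is just the example rule from above; the URLs are placeholders.

# Verify that the example rule blocks SEOlyticsCrawler from
# /secret-sauce.html while leaving other pages crawlable.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: SEOlyticsCrawler
Disallow: /secret-sauce.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("SEOlyticsCrawler", "https://example.com/secret-sauce.html"))  # False
print(parser.can_fetch("SEOlyticsCrawler", "https://example.com/public-page.html"))   # True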

Collection and processing of data

Data is processed and protected in accordance with legal regulations and “netiquette”. As noted above, when we crawl your site we adhere to a “politeness policy” so that we don’t impact its performance.

We do this by dynamically controlling (throttling) the number of URLs that we crawl in a given period. If we determine that your site is responding slowly or that there are network issues, we extend the crawl intervals. This policy protects your site from excessive load.
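
As a rough illustration of this kind of adaptive throttling, a crawler might adjust its request interval based on observed response times. The thresholds, delays and fetch helper below are invented for the sketch and do not reflect our real configuration.

# Simplified illustration of response-time based throttling.
import time
import urllib.request

def fetch(url, timeout=15):
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    return body, time.monotonic() - start

def crawl(urls, base_delay=1.0, max_delay=30.0):
    delay = base_delay
    for url in urls:
        try:
            body, elapsed = fetch(url)
            # ... process body here ...
            if elapsed > 2.0:
                delay = min(delay * 2, max_delay)   # slow response: extend the interval
            else:
                delay = max(base_delay, delay / 2)  # healthy response: speed back up
        except OSError:
            delay = min(delay * 2, max_delay)       # network issue: back off
        time.sleep(delay)                           # wait before the next request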

In addition, you can easily slow down the crawler even further yourself by adding a Crawl-Delay directive to your robots.txt file:

User-Agent: SEOlyticsCrawler
Crawl-Delay: 1

Crawl-Delay should be an integer; it specifies the number of seconds to wait between requests. Using a high Crawl-Delay further minimizes the impact on your site. The Crawl-Delay parameter is also honored if it is set for the * wildcard user agent.
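
For example, this is roughly how a crawler can read the directive with Python’s urllib.robotparser, which also falls back to a Crawl-Delay set under the * wildcard. The robots.txt content and delay values below are only illustrative.

# Read the Crawl-Delay for SEOlyticsCrawler; urllib.robotparser falls back
# to the * wildcard value if there is no crawler-specific entry.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-Agent: *
Crawl-Delay: 5

User-Agent: SEOlyticsCrawler
Crawl-Delay: 1
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.crawl_delay("SEOlyticsCrawler"))  # 1  (crawler-specific entry)
print(parser.crawl_delay("SomeOtherBot"))      # 5  (falls back to the * entry)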

Where to get access to the free SEOlytics Starter

To provide all webmasters with free data for gaining online marketing insights, SEOlytics offers a Starter version. Our free tool is no simple trial! You can create your own account here:

Report problems

We are constantly striving to improve the quality of our crawler and welcome any feedback. If you have further questions or feedback, please contact crawler@seolytics.net.