AbotX

A powerful C# web crawler that makes advanced crawling features easy to use. AbotX builds upon the open-source Abot C# Web Crawler by providing a set of wrappers and extensions.

  • Crawl multiple sites concurrently
  • Pause/resume live crawls
  • Render JavaScript before processing
  • Simplified pluggability/extensibility
  • Avoid getting blocked by sites
  • Automatically tune speed/concurrency

Parallel Crawler Engine

A single crawler instance can crawl one site quickly. However, if you have to crawl 10,000 sites quickly, you need the ParallelCrawlerEngine. It lets you crawl a configurable number of sites concurrently to maximize throughput.
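The idea of capping how many sites are crawled at once can be illustrated with a minimal sketch. This is not the ParallelCrawlerEngine API; it is a standalone illustration built on .NET's SemaphoreSlim, and the names BoundedSiteRunner, maxConcurrentSites, and crawlSite are hypothetical stand-ins chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of AbotX: shows bounded concurrency in general.
public static class BoundedSiteRunner
{
    // Runs crawlSite once per site, with at most maxConcurrentSites running together.
    // crawlSite is an assumed placeholder for whatever crawls a single site.
    public static async Task RunAsync(
        IEnumerable<Uri> sites,
        int maxConcurrentSites,
        Func<Uri, Task> crawlSite)
    {
        if (maxConcurrentSites < 1)
            throw new ArgumentOutOfRangeException(nameof(maxConcurrentSites));

        using var gate = new SemaphoreSlim(maxConcurrentSites);

        var tasks = sites.Select(async site =>
        {
            await gate.WaitAsync();          // wait until a slot is free
            try
            {
                await crawlSite(site);
            }
            finally
            {
                gate.Release();              // free the slot even if the crawl fails
            }
        }).ToList();                         // ToList forces every task to start

        await Task.WhenAll(tasks);
    }
}
```

Calling `await BoundedSiteRunner.RunAsync(sites, 10, CrawlOneSiteAsync)` would, under these assumptions, work through the whole list while keeping no more than ten crawls in flight; `CrawlOneSiteAsync` is again a placeholder for your own per-site logic.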

See Tutorial »

Easy Override

Easy Override lets you plug in your own implementation of any key interface through an easy-to-use wrapper object that handles nested dependencies for you, no matter how deep.

See Tutorial »

Pause And Resume

There may be times when you need to temporarily pause a crawl to clear disk space on the machine or run a resource intensive utility. No matter the reason, you can confidently Pause and Resume the crawler and it will continue on like nothing happened.

See Tutorial »

JavaScript Rendering

Many web pages today use JavaScript to build the final rendered page. Most web crawlers do not execute that JavaScript and instead process only the raw HTML sent back by the server. Use this feature to render JavaScript before processing.

See Tutorial »

Auto Throttling

Most websites you crawl cannot or will not handle the load of a web crawler. Auto Throttling automatically slows the crawl when the website being crawled shows signs of stress or is unwilling to respond at the current rate of HTTP requests.
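To make the throttling idea concrete, here is a minimal sketch of one way a crawler could back off when a site looks stressed. It is not AbotX's implementation or API. The class name AdaptiveDelay, the choice of stress signals (HTTP 429, HTTP 503, or a slow response), and the doubling and 10% recovery factors are all assumptions made for this example.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of AbotX: adjusts the pause between requests.
public sealed class AdaptiveDelay
{
    private readonly TimeSpan _min;
    private readonly TimeSpan _max;
    private readonly TimeSpan _slowThreshold;

    // Delay to wait before sending the next request to the site.
    public TimeSpan Current { get; private set; }

    public AdaptiveDelay(TimeSpan min, TimeSpan max, TimeSpan slowThreshold)
    {
        if (min <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(min));
        if (max < min) throw new ArgumentOutOfRangeException(nameof(max));
        _min = min;
        _max = max;
        _slowThreshold = slowThreshold;
        Current = min;
    }

    // Call once per response. Assumed stress signals: 429, 503, or a slow reply.
    public void Record(HttpStatusCode status, TimeSpan elapsed)
    {
        bool stressed = (int)status == 429
            || status == HttpStatusCode.ServiceUnavailable
            || elapsed > _slowThreshold;

        // Double the delay under stress; otherwise shrink it by 10%.
        long ticks = stressed ? Current.Ticks * 2 : Current.Ticks * 9 / 10;

        // Keep the delay inside the configured [min, max] range.
        ticks = Math.Min(Math.Max(ticks, _min.Ticks), _max.Ticks);
        Current = TimeSpan.FromTicks(ticks);
    }
}
```

Backing off quickly and recovering slowly is a deliberate choice in this sketch: it favors the health of the crawled site over raw crawl speed.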

See Tutorial »

Auto Tuning

It's difficult to predict what your machine can handle when the sites you crawl and process all require different levels of machine resources. Auto Tuning automatically monitors the host machine's resource usage and adjusts the crawl speed and concurrency to maximize throughput without overloading the machine.

See Tutorial »
