Auto Tuning

It's difficult to predict what your machine can handle when the sites you crawl or process each require different levels of machine resources. Auto tuning monitors the host machine's resource usage and adjusts crawl speed and concurrency to maximize throughput without overloading the host.

Example Usage

var config = new CrawlConfigurationX
{
    AutoTuning = new AutoTuningConfig
    {
        IsEnabled = true,
        CpuThresholdHigh = 85,              //default
        CpuThresholdMed = 65,               //default
        MinAdjustmentWaitTimeInSecs = 30    //default
    },
    Accelerator = new AcceleratorConfig
    {
        ConcurrentSiteCrawlsIncrement = 2,      //default
        ConcurrentRequestIncrement = 2,         //default
        DelayDecrementInMilliseconds = 2000,    //default
        MinDelayInMilliseconds = 0,             //default
    },
    Decelerator = new DeceleratorConfig
    {
        ConcurrentSiteCrawlsDecrement = 2,      //default
        ConcurrentRequestDecrement = 2,         //default
        DelayIncrementInMilliseconds = 2000,    //default
        MaxDelayInMilliseconds = 15000,         //default
        ConcurrentSiteCrawlsMin = 1,            //default
        ConcurrentRequestMin = 1                //default
    },
    MaxRetryCount = 3,
};

// An object initializer cannot read from the variable it is initializing,
// so these caps are set after config has been constructed.
config.Accelerator.ConcurrentSiteCrawlsMax = config.MaxConcurrentSiteCrawls;   //default is 0
config.Accelerator.ConcurrentRequestMax = config.MaxConcurrentThreads;         //default is 0

Using CrawlerX (single instance of a crawler)

var crawler = new CrawlerX(config);
await crawler.CrawlAsync(new Uri(url));   // await so the crawl finishes before continuing

Using ParallelCrawlerEngine (multiple instances of crawlers)

var crawlEngine = new ParallelCrawlerEngine(config);

Configure how sensitive the tuning triggers are

| Name | Description | Used By |
| --- | --- | --- |
| config.AutoTuning.IsEnabled | Whether to enable the AutoTuning feature. | CrawlerX, ParallelCrawlerEngine |
| config.AutoTuning.CpuThresholdHigh | The average CPU percentage threshold for considering the host to be under high stress. | CrawlerX, ParallelCrawlerEngine |
| config.AutoTuning.CpuThresholdMed | The average CPU percentage threshold for considering the host to be under medium stress. | CrawlerX, ParallelCrawlerEngine |
| config.AutoTuning.MinAdjustmentWaitTimeInSecs | The minimum number of seconds to wait after the last tuning action before checking or adjusting again. This gives the last adjustment a chance to take effect. | CrawlerX, ParallelCrawlerEngine |
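
The settings above do not spell out exactly how the two thresholds map to actions. One plausible reading is sketched below, assuming a threshold counts as crossed once the average CPU reaches it, and that no adjustment is considered until MinAdjustmentWaitTimeInSecs has elapsed since the previous one. `StressLevel` and `TuningGate` are hypothetical names for illustration and are not AbotX types.

```csharp
using System;

// Hypothetical types for illustration only; they are not part of AbotX.
public enum StressLevel { Low, Medium, High }

public sealed class TuningGate
{
    private readonly int _cpuThresholdHigh;
    private readonly int _cpuThresholdMed;
    private readonly TimeSpan _minWait;
    private DateTime? _lastAdjustmentUtc;

    // Defaults mirror the documented AutoTuning defaults.
    public TuningGate(int cpuThresholdHigh = 85, int cpuThresholdMed = 65, int minAdjustmentWaitTimeInSecs = 30)
    {
        _cpuThresholdHigh = cpuThresholdHigh;
        _cpuThresholdMed = cpuThresholdMed;
        _minWait = TimeSpan.FromSeconds(minAdjustmentWaitTimeInSecs);
    }

    // Assumption: a threshold is crossed when the average reaches it (>=).
    public StressLevel Classify(double avgCpuPercent)
    {
        if (avgCpuPercent >= _cpuThresholdHigh) return StressLevel.High;
        if (avgCpuPercent >= _cpuThresholdMed) return StressLevel.Medium;
        return StressLevel.Low;
    }

    // True when no adjustment has been recorded yet, or the minimum wait has passed.
    public bool CanAdjust(DateTime nowUtc) =>
        _lastAdjustmentUtc == null || nowUtc - _lastAdjustmentUtc.Value >= _minWait;

    // Remember when the last adjustment happened so later checks can honor the wait.
    public void RecordAdjustment(DateTime nowUtc) => _lastAdjustmentUtc = nowUtc;
}
```

With the defaults, an average of 90 classifies as High, 70 as Medium, and 40 as Low. After `RecordAdjustment(t)`, `CanAdjust` stays false until 30 seconds after `t`.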

Configure how aggressively to speed up

| Name | Description | Used By |
| --- | --- | --- |
| config.Accelerator.ConcurrentSiteCrawlsIncrement | The amount to add to MaxConcurrentSiteCrawls on each call to the SpeedUp() method. This deals with site crawl concurrency, NOT the number of concurrent HTTP requests to a single site crawl. | ParallelCrawlerEngine |
| config.Accelerator.ConcurrentRequestIncrement | The amount to add to MaxConcurrentThreads on each call to the SpeedUp() method. This deals with the number of concurrent HTTP requests for a single crawl. | CrawlerX |
| config.Accelerator.DelayDecrementInMilliseconds | If there is a configured (manual or programmatically determined) delay between requests to a site, this is the number of milliseconds to remove from that value on every call to the SpeedUp() method. | CrawlerX |
| config.Accelerator.MinDelayInMilliseconds | If there is a configured (manual or programmatically determined) delay between requests to a site, this is the minimum number of milliseconds to delay, no matter how many times SpeedUp() is called. | CrawlerX |
| config.Accelerator.ConcurrentSiteCrawlsMax | The maximum number of concurrent site crawls to allow, no matter how many times SpeedUp() is called. | ParallelCrawlerEngine |
| config.Accelerator.ConcurrentRequestMax | The maximum number of concurrent HTTP requests to a single site, no matter how many times SpeedUp() is called. | CrawlerX |
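
To make the increment and limit settings concrete, here is a minimal sketch of what one speed-up step could look like for a single crawl. It assumes a step simply adds the increment, subtracts the delay decrement, and clamps both at their limits. `CrawlSpeedState` and `AcceleratorSketch` are hypothetical names, not AbotX types, and the cap of 10 concurrent requests is an illustrative value (the documented default is 0, whose meaning is not stated here).

```csharp
using System;

// Hypothetical stand-in for the values a speed-up step adjusts; not an AbotX type.
public sealed class CrawlSpeedState
{
    public int ConcurrentRequests { get; set; }
    public int DelayInMilliseconds { get; set; }
}

public static class AcceleratorSketch
{
    // Applies one speed-up step and clamps the results at the configured limits.
    // Parameter names mirror the Accelerator settings; concurrentRequestMax = 10 is illustrative.
    public static void SpeedUp(
        CrawlSpeedState state,
        int concurrentRequestIncrement = 2,
        int concurrentRequestMax = 10,
        int delayDecrementInMilliseconds = 2000,
        int minDelayInMilliseconds = 0)
    {
        // Raise concurrency, but never above the maximum.
        state.ConcurrentRequests = Math.Min(
            state.ConcurrentRequests + concurrentRequestIncrement,
            concurrentRequestMax);

        // Shorten the delay, but never below the minimum.
        state.DelayInMilliseconds = Math.Max(
            state.DelayInMilliseconds - delayDecrementInMilliseconds,
            minDelayInMilliseconds);
    }
}
```

Starting from 4 concurrent requests and a 5000 ms delay, successive steps under these assumptions give (6, 3000), (8, 1000), and (10, 0). Any further steps leave the state at (10, 0) because both limits have been reached.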

Configure how aggressively to slow down

| Name | Description | Used By |
| --- | --- | --- |
| config.Decelerator.ConcurrentSiteCrawlsDecrement | The amount to subtract from MaxConcurrentSiteCrawls on each call to the SlowDown() method. This deals with site crawl concurrency, NOT the number of concurrent HTTP requests to a single site crawl. | ParallelCrawlerEngine |
| config.Decelerator.ConcurrentRequestDecrement | The amount to subtract from MaxConcurrentThreads on each call to the SlowDown() method. This deals with the number of concurrent HTTP requests for a single crawl. | CrawlerX |
| config.Decelerator.DelayIncrementInMilliseconds | If there is a configured (manual or programmatically determined) delay between requests to a site, this is the number of milliseconds to add to that value on every call to the SlowDown() method. | CrawlerX |
| config.Decelerator.MaxDelayInMilliseconds | The maximum value the delay can reach. | CrawlerX |
| config.Decelerator.ConcurrentSiteCrawlsMin | The minimum number of concurrent site crawls to allow, no matter how many times SlowDown() is called. | ParallelCrawlerEngine |
| config.Decelerator.ConcurrentRequestMin | The minimum number of concurrent HTTP requests to a single site, no matter how many times SlowDown() is called. | CrawlerX |
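
The slow-down settings can be read the same way in reverse. The sketch below assumes each slow-down step adds the delay increment and subtracts the request decrement, clamping at MaxDelayInMilliseconds and ConcurrentRequestMin. `DeceleratorSketch` and its methods are hypothetical helpers, not part of AbotX.

```csharp
using System;

// Hypothetical helpers for illustration only; they are not part of AbotX.
public static class DeceleratorSketch
{
    // Delay after one slow-down step, capped at the maximum delay.
    public static int NextDelay(
        int currentDelayInMilliseconds,
        int delayIncrementInMilliseconds = 2000,
        int maxDelayInMilliseconds = 15000) =>
        Math.Min(currentDelayInMilliseconds + delayIncrementInMilliseconds, maxDelayInMilliseconds);

    // Concurrent requests after one slow-down step, floored at the minimum.
    public static int NextConcurrentRequests(
        int currentConcurrentRequests,
        int concurrentRequestDecrement = 2,
        int concurrentRequestMin = 1) =>
        Math.Max(currentConcurrentRequests - concurrentRequestDecrement, concurrentRequestMin);
}
```

With the documented defaults and under these assumptions, a delay that starts at 0 ms reaches 14000 ms after seven steps and is capped at 15000 ms on the eighth. Concurrency that starts at 10 requests falls to 8, 6, 4, 2, and then stops at the floor of 1.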

Like other Abot/AbotX features, you can configure this through code or XML. See the full AbotX XML config examples for the XML form.

