JavaScript Rendering

Many web pages on the internet today use JavaScript to create the final page rendering. Most web crawlers do not render the JavaScript; they just process the raw HTML sent back by the server. Use this feature to render the JavaScript before the page is processed.
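At its simplest, the feature is a single flag on CrawlConfigurationX. The minimal sketch below shows only that flag; the sections that follow cover the settings that should accompany it.

var config = new CrawlConfigurationX
{
    IsJavascriptRenderingEnabled = true    //Render the javascript before the page is processed
};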

Performance Considerations

Rendering JavaScript is a much slower operation than just requesting the page source. The browser has to make the initial request to the web server for the page source, and then it must request, wait for, and load all of the external resources. Care must be taken in how you configure AbotX when this feature is enabled. A modern machine with an Intel i7 processor and 8+ GB of RAM could crawl 30-50 sites concurrently, with each of those crawls spawning 10+ threads. However, if JavaScript rendering is enabled, that same configuration would overwhelm the host machine.
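For contrast, the kind of aggressive configuration described above might look like the following sketch. The numbers are illustrative only; a setup like this can be reasonable when rendering is disabled, but would overwhelm the same machine once IsJavascriptRenderingEnabled is set to true.

//Illustrative only: plausible without javascript rendering, far too aggressive with it enabled
var aggressiveConfig = new CrawlConfigurationX
{
    IsJavascriptRenderingEnabled = false,
    MaxConcurrentSiteCrawls = 30,    //Many sites crawled in parallel
    MaxConcurrentThreads = 10        //Many threads per site crawl
};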

The recommended configuration for AbotX/Abot is to crawl only a single site at a time and to set the number of concurrent crawl threads to the number of logical processors on the host machine. See the example configuration below.

Safe Configuration

The following is an example of how to configure Abot/AbotX to run with JavaScript rendering enabled on a modern host machine that has an Intel i7-4710MQ processor (4 cores, 8 logical processors) and at least 16 GB of RAM. This machine should be able to handle this configuration under normal circumstances.

var config = new CrawlConfigurationX
{
    IsJavascriptRenderingEnabled = true,
    JavascriptRenderingWaitTimeInMilliseconds = 3000,  //How long to wait for the js to process
    MaxConcurrentSiteCrawls = 1,                       //Only crawl a single site at a time
    MaxConcurrentThreads = 8,                          //Logical processor count to avoid cpu thrashing
};
var crawler = new CrawlerX(config);

//Optional: decide per page whether the javascript should be rendered
crawler.ShouldRenderPageJavascript((crawledPage, crawlContext) =>
{
    if(crawledPage.Uri.AbsoluteUri.Contains("ghost"))
        return new CrawlDecision {Allow = false, Reason = "scared to render ghost javascript"};

    return new CrawlDecision { Allow = true };
});
var crawlerTask = crawler.CrawlAsync(new Uri("http://blahblahblah.com"));
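The lines below are a sketch of how the crawl might be consumed. They assume Abot's standard PageCrawlCompleted event, the ErrorOccurred/ErrorException members on the returned CrawlResult, and that crawledPage.Content.Text holds the post-render html when this feature is enabled.

//Sketch only: in practice wire the handler up before calling CrawlAsync so no pages are missed
crawler.PageCrawlCompleted += (sender, e) =>
{
    //With rendering enabled the post-javascript html should be available in Content.Text
    var renderedHtml = e.CrawledPage.Content.Text;
    Console.WriteLine($"Crawled {e.CrawledPage.Uri} - {renderedHtml?.Length ?? 0} chars");
};

var result = await crawlerTask;
if (result.ErrorOccurred)
    Console.WriteLine($"Crawl failed: {result.ErrorException?.Message}");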
