What is the Smart and Simple Web Crawler?
* A smart, easy-to-use framework for crawling a web site
* Integrated Lucene support
* The framework is simple to integrate into your own applications
* The crawler can start from a single link or from a list of links
* Two crawling models are available:
o Max Iterations: Crawls a web site up to a limited number of links. A fast model with a small memory footprint and low CPU usage.
o Max Depth: A simple graph model parser that does not record incoming and outgoing links. As fast as the Max Iterations model.
* Accept filter interface to limit the links to be crawled
* Core accept filters available: ServerFilter, BeginningPathFilter and RegularExpressionFilter
* Accept filters can be combined with AND, OR and NOT
* Pluggable HTTP connection libraries: HttpClient (default) and HTMLParser (optional)
* Custom listeners can be added to the parsing process, or before and after a page is loaded
* The framework is not a GUI-based tool for mirroring a website and browsing it offline!
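
The filter combination and the two crawl limits described above can be illustrated with a minimal, self-contained sketch. It does not use the framework's own classes: the class CrawlSketch, its helper methods and the in-memory link graph are all hypothetical, the filters are plain java.util.function.Predicate instances, and their matching rules (host equality, path prefix, full regular-expression match) are assumptions about what filters with those names might do. No pages are downloaded.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names are taken from the framework's API.
class CrawlSketch {

    // Accepts links whose host equals the given server name (assumed semantics).
    static Predicate<String> serverFilter(String host) {
        return link -> host.equalsIgnoreCase(URI.create(link).getHost());
    }

    // Accepts links whose path starts with the given prefix (assumed semantics).
    static Predicate<String> beginningPathFilter(String prefix) {
        return link -> {
            String path = URI.create(link).getPath();
            return path != null && path.startsWith(prefix);
        };
    }

    // Accepts links that match the regular expression as a whole (assumed semantics).
    static Predicate<String> regexFilter(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return link -> pattern.matcher(link).matches();
    }

    // Breadth-first walk over an in-memory link graph. It stops after
    // maxDepth levels below the start link or after maxLinks visited links,
    // whichever limit is reached first.
    static List<String> crawl(Map<String, List<String>> links, String start,
                              Predicate<String> accept, int maxDepth, int maxLinks) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        List<String> level = new ArrayList<>();
        if (accept.test(start)) {
            level.add(start);
            seen.add(start);
        }
        for (int depth = 0; depth <= maxDepth && !level.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String page : level) {
                if (visited.size() >= maxLinks) {
                    return visited;
                }
                visited.add(page);
                for (String out : links.getOrDefault(page, Collections.emptyList())) {
                    // Queue a link only if the filter accepts it and it is new.
                    if (accept.test(out) && seen.add(out)) {
                        next.add(out);
                    }
                }
            }
            level = next;
        }
        return visited;
    }

    // A tiny made-up site used in place of real HTTP requests.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> links = new HashMap<>();
        links.put("http://example.org/docs/", Arrays.asList(
                "http://example.org/docs/a.html",
                "http://example.org/docs/private/x.html",
                "http://other.example/docs/b.html"));
        links.put("http://example.org/docs/a.html",
                Arrays.asList("http://example.org/docs/c.html"));
        links.put("http://example.org/docs/c.html",
                Arrays.asList("http://example.org/docs/d.html"));
        return links;
    }

    public static void main(String[] args) {
        // server AND path prefix AND NOT private
        Predicate<String> accept = serverFilter("example.org")
                .and(beginningPathFilter("/docs/"))
                .and(regexFilter(".*/private/.*").negate());
        String start = "http://example.org/docs/";
        System.out.println(crawl(sampleGraph(), start, accept, 1, 10)); // depth-limited
        System.out.println(crawl(sampleGraph(), start, accept, 10, 3)); // link-limited
    }
}
```

In this sketch the first call is bounded by depth and the second by the number of visited links, which loosely mirrors the Max Depth and Max Iterations models; how the framework itself implements them is not shown here.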