A specification-compliant robots.txt parser with wildcard (*) matching support.
samclarke, almost 2 years ago

Very straightforward, event-driven web crawler. Features a flexible queue interface and a basic cache mechanism with an extensible backend.
simplecrawler, over 2 years ago

JavaScript module that detects bots, crawlers, and spiders via the user-agent string.
mahovich, over 2 years ago

ECMAScript parser that produces a Shift format AST.
shapesecurity, over 2 years ago

Crawler is a ready-to-use web spider with support for proxies, asynchronous requests, rate limiting, configurable request pools, jQuery, and HTTP/2.
bda-research, 6 months ago