The fast, flexible & elegant library for parsing and manipulating HTML and XML.
A specification-compliant robots.txt parser with wildcard (*) matching support.
A library to easily scrape metadata from an article on the web using Open Graph, JSON-LD, regular HTML metadata, and a series of fallbacks.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs with headless Chrome and Puppeteer, among other tools.