Welcome to Grab’s documentation!
Useful Links
Source code: https://github.com/lorien/grab
Documentation: https://grab.readthedocs.io/en/latest/
Russian Web Scraping Chat Group: https://t.me/grablab_ru
English Web Scraping Chat Group: https://t.me/grablab
What is Grab?
Grab is a Python framework for building web scrapers. With Grab you can build scrapers of varying complexity, from simple 5-line scripts to complex asynchronous website crawlers that process millions of web pages. Grab provides an API for performing network requests and for handling the received content, for example by interacting with the DOM tree of an HTML document.
There are two main parts in the Grab library:
1) The single request/response API, which lets you build a network request, perform it, and work with the received content. The API is built on top of the urllib3 and lxml libraries.
2) The Spider API for building asynchronous web crawlers. You write classes that define handlers for each type of network request. Each handler can spawn new network requests. Network requests are processed concurrently with a pool of asynchronous network sockets.
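The handler-based crawling model described in the second item can be sketched with the Python standard library alone. This is a conceptual illustration, not Grab's actual API: the names `Request`, `SketchCrawler`, `handle_page`, `fake_fetch`, `crawl`, and the in-memory `FAKE_PAGES` site are all hypothetical, and the network is simulated so the example runs offline.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical in-memory "site": each URL maps to the links found on that page.
FAKE_PAGES = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c"],
    "/c": [],
}


@dataclass(frozen=True)
class Request:
    kind: str  # selects which handler processes the response
    url: str


async def fake_fetch(url):
    """Stand-in for a network request; returns the links on the page."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return FAKE_PAGES[url]


class SketchCrawler:
    """Toy crawler with one handler method per request kind."""

    def __init__(self, concurrency=2):
        self.concurrency = concurrency
        self.seen = set()
        self.visited = []

    def handle_page(self, request, links):
        # A handler records its result and may spawn new requests.
        self.visited.append(request.url)
        for link in links:
            if link not in self.seen:
                self.seen.add(link)
                yield Request("page", link)

    async def _worker(self, queue):
        while True:
            request = await queue.get()
            try:
                links = await fake_fetch(request.url)
                handler = getattr(self, f"handle_{request.kind}")
                for new_request in handler(request, links):
                    queue.put_nowait(new_request)
            finally:
                queue.task_done()

    async def run(self, start):
        queue = asyncio.Queue()
        self.seen.add(start.url)
        queue.put_nowait(start)
        # A fixed pool of workers processes requests concurrently.
        workers = [
            asyncio.create_task(self._worker(queue))
            for _ in range(self.concurrency)
        ]
        await queue.join()  # wait until every queued request is handled
        for worker in workers:
            worker.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
        return self.visited


def crawl(start_url="/", concurrency=2):
    crawler = SketchCrawler(concurrency)
    return asyncio.run(crawler.run(Request("page", start_url)))


if __name__ == "__main__":
    print(sorted(crawl()))
```

The worker pool pulls requests from a shared queue, dispatches each response to the handler named after the request kind, and enqueues whatever new requests the handler yields. The crawl ends when the queue is drained. Refer to the Grab::Spider User Manual for the real class and method names.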
Table of Contents
Grab User Manual
- Grab Installation
- Testing Grab Framework
- Grab Quickstart
- Request Methods
- Setting up the Grab Request
- Grab Settings
- Debugging
- Work with HTTP Headers
- Redirect Handling
- Form Processing
- Network Errors Handling
- HTML Document Encoding
- Cookie Support
- Proxy Server Support
- Searching the response body
- Work With Network Response
- Network Transport
Grab::Spider User Manual
Grab::Spider is a framework for building well-structured asynchronous website crawlers.
API Reference
Using the API Reference you can get an overview of what modules, classes, and methods exist, what they do, what they return, and what parameters they accept.