`grab`¶

Submodules¶

Package Contents¶

Classes¶

`HttpClient`	Abstract base class for generic types.
`Document`	Network response.
`Grab`	Abstract base class for generic types.
`HttpRequest`

Functions¶

request(→ grab.document.Document)

Attributes¶

DataNotFound

class grab.HttpClient(transport: None | BaseTransport[RequestT, ResponseT] | type[BaseTransport[RequestT, ResponseT]] = None)[source]¶

Bases: grab.base.BaseClient[grab.request.HttpRequest, grab.document.Document]

Abstract base class for generic types.

A generic type is typically declared by inheriting from this class parameterized with one or more type variables. For example, a generic mapping type might be defined as:

class Mapping(Generic[KT, VT]):
    def __getitem__(self, key: KT) -> VT:
        ...
    # Etc.

This class can then be used as follows:

def lookup_name(mapping: Mapping[KT, VT], key: KT, default: VT) -> VT:
    try:
        return mapping[key]
    except KeyError:
        return default

document_class: type[grab.document.Document]¶

extension¶

request_class¶

default_transport_class¶

request(req: None | str | grab.request.HttpRequest = None, **request_kwargs: Any) → grab.document.Document[source]¶

process_request_result(req: grab.request.HttpRequest) → grab.document.Document[source]¶: Process result of real request performed via transport extension.

grab.request(url: None | str | grab.request.HttpRequest = None, client: None | HttpClient | type[HttpClient] = None, **request_kwargs: Any) → grab.document.Document[source]¶

Bases: grab.base.BaseResponse

Network response.

property status: None | int¶

property json: Any¶: Return response body deserialized into JSON object.

property pyquery: Any¶: Return pyquery handler.

property body: bytes¶

property tree: lxml.etree._Element¶: Return DOM tree of the document built with HTML DOM builder.

property form: lxml.html.FormElement¶

Return default document’s form.

If form was not selected manually then select the form which has the biggest number of input elements.

The form value is just an lxml.html form element.

Example:

g.request('some URL')
# Choose form automatically
print(g.form)

# And now choose form manually
g.choose_form(1)
print(g.form)
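
The automatic choice described above (pick the form with the most input elements) can be illustrated without Grab or lxml. The following is only a sketch built on the standard library's html.parser; the names FormInputCounter and pick_default_form are hypothetical, and counting only <input> tags is an assumption about what "input elements" means here.

```python
from html.parser import HTMLParser


class FormInputCounter(HTMLParser):
    """Count <input> elements inside each <form> of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = []  # one entry per form, in document order
        self._inside_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.counts.append(0)
            self._inside_form = True
        elif tag == "input" and self._inside_form:
            self.counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside_form = False


def pick_default_form(html: str) -> int:
    """Return the index of the form with the most input elements."""
    counter = FormInputCounter()
    counter.feed(html)
    if not counter.counts:
        raise LookupError("document contains no forms")
    # On a tie, max() keeps the first form in document order.
    return max(range(len(counter.counts)), key=counter.counts.__getitem__)


HTML = """
<form id="search"><input name="q"></form>
<form id="signup">
  <input name="email"><input name="password"><input type="submit">
</form>
"""
print(pick_default_form(HTML))  # → 1 (the signup form has three inputs)
```

The returned index uses the same zero-based numbering as the number argument of choose_form.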

__slots__ = ('document_type', 'code', 'head', 'headers', 'url', 'cookies', 'encoding', '_bytes_body',...¶

__call__(query: str) → selection.SelectorList[lxml.etree._Element][source]¶

select(*args: Any, **kwargs: Any) → selection.SelectorList[lxml.etree._Element][source]¶

process_encoding(encoding: None | str = None) → str[source]¶

Process explicitly defined encoding or auto-detect it.

If an encoding is explicitly defined, ensure it is a valid encoding that Python can handle. If no encoding is specified, auto-detect it.

Raises unicodec.InvalidEncodingName if explicitly set encoding is invalid.
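
The validate-or-detect flow can be sketched with the standard library alone. This is not Grab's implementation: it uses codecs.lookup in place of the unicodec package, raises ValueError instead of unicodec.InvalidEncodingName, and substitutes a fixed fallback for real auto-detection. The function name normalize_encoding is hypothetical.

```python
import codecs


def normalize_encoding(encoding=None, fallback="utf-8"):
    """Validate an explicit encoding name, or fall back to a default."""
    if encoding is None:
        # Real auto-detection would inspect headers and the body; the
        # fixed fallback is only a placeholder for that step.
        return fallback
    try:
        # codecs.lookup raises LookupError for names Python does not know.
        return codecs.lookup(encoding).name
    except LookupError as ex:
        raise ValueError(f"invalid encoding name: {encoding!r}") from ex


print(normalize_encoding("UTF8"))  # → utf-8
print(normalize_encoding())        # → utf-8
```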

copy() → Document[source]¶

save(path: str) → None[source]¶: Save response body to file.

url_details() → urllib.parse.SplitResult[source]¶: Return result of urlsplit function applied to response url.

query_param(key: str) → str[source]¶: Return value of parameter in query string.
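
For reference, this is what the underlying standard-library calls return for a sample URL. The URL is made up, and using parse_qs to read a single parameter is an assumption about how query_param behaves, not a statement of its implementation.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://example.com/search?q=grab&page=2#results"

# urlsplit returns a SplitResult, the type url_details() is documented to return.
parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path)

# parse_qs maps each parameter name to a list of values.
query = parse_qs(parts.query)
print(query["q"][0])
print(query["page"][0])
```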

browse() → None[source]¶: Save response in temporary file and open it in GUI browser.

__getstate__() → collections.abc.Mapping[str, Any][source]¶: Reset cached lxml objects which could not be pickled.

__setstate__(state: collections.abc.Mapping[str, Any]) → None[source]¶
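
The pickling hooks can be sketched on a toy class. CachedDoc and its attributes are hypothetical, and a lambda stands in for the lxml objects because it is likewise not picklable. The point is only the pattern: reset the cache in __getstate__ and restore the remaining slots in __setstate__.

```python
import pickle


class CachedDoc:
    """Toy document that caches a non-picklable parsed object."""

    __slots__ = ("body", "_cached_tree")

    def __init__(self, body: bytes) -> None:
        self.body = body
        self._cached_tree = None

    @property
    def tree(self):
        if self._cached_tree is None:
            # A lambda stands in for an lxml tree: neither can be pickled.
            self._cached_tree = lambda: self.body.upper()
        return self._cached_tree

    def __getstate__(self):
        state = {name: getattr(self, name) for name in self.__slots__}
        state["_cached_tree"] = None  # drop the cache before pickling
        return state

    def __setstate__(self, state):
        for name, value in state.items():
            setattr(self, name, value)


doc = CachedDoc(b"<html></html>")
doc.tree  # populate the cache
restored = pickle.loads(pickle.dumps(doc))
print(restored.body, restored._cached_tree)
```

After the round trip the cache is empty and is rebuilt on the next access to tree.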

text_search(anchor: str | bytes) → bool[source]¶

Search the substring in response body.

Parameters

anchor – substring to search for; pass a str to search in response.unicode_body(), or bytes to search in response.body

If substring is found return True else False.

text_assert(anchor: str | bytes) → None[source]¶: If anchor is not found then raise DataNotFound exception.

text_assert_any(anchors: collections.abc.Sequence[str | bytes]) → None[source]¶: If no anchors were found then raise DataNotFound exception.
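
The three text_* methods share one idea: search, then optionally raise. A minimal sketch follows, assuming the search target is chosen by the type of anchor; the free functions, the encoding argument, and the local DataNotFound class are stand-ins so the snippet runs without Grab.

```python
class DataNotFound(Exception):
    """Stand-in for grab.DataNotFound so the sketch runs without Grab."""


def text_search(body: bytes, anchor, encoding="utf-8") -> bool:
    # str anchors are matched against the decoded body,
    # bytes anchors against the raw body.
    if isinstance(anchor, str):
        return anchor in body.decode(encoding)
    return anchor in body


def text_assert_any(body: bytes, anchors) -> None:
    if not any(text_search(body, anchor) for anchor in anchors):
        raise DataNotFound(f"none of {list(anchors)!r} found")


body = "<title>Привет</title>".encode("utf-8")
print(text_search(body, "Привет"))
print(text_search(body, b"<title>"))
text_assert_any(body, ["missing", b"</title>"])  # passes: one anchor matches
```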

rex_text(regexp: str | bytes | re.Pattern[str] | re.Pattern[bytes], flags: int = 0, default: Any = UNDEFINED) → Any[source]¶: Return content of first matching group of regexp found in response body.

rex_search(regexp: str | bytes | re.Pattern[str] | re.Pattern[bytes], flags: int = 0, default: Any = UNDEFINED) → Any[source]¶

Search the regular expression in response body.

Return found match object or None

rex_assert(rex: str | bytes | re.Pattern[str] | re.Pattern[bytes]) → None[source]¶: Raise DataNotFound exception if rex expression is not found.
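
The default argument of rex_text and rex_search uses an UNDEFINED sentinel, which makes it possible to tell "no default given" apart from "default is None". The sketch below shows that pattern with the re module. Raising DataNotFound when no default is supplied is an assumption, and UNDEFINED and DataNotFound are local stand-ins rather than the library's objects.

```python
import re

UNDEFINED = object()  # stand-in sentinel: "caller passed no default"


class DataNotFound(Exception):
    """Stand-in for grab.DataNotFound so the sketch runs without Grab."""


def rex_text(text, regexp, flags=0, default=UNDEFINED):
    """Return the first group of the first match of regexp in text."""
    match = re.search(regexp, text, flags)
    if match is None:
        if default is UNDEFINED:
            raise DataNotFound(f"regexp {regexp!r} not found")
        return default
    return match.group(1)


page = "<span id='price'>42 USD</span>"
print(rex_text(page, r"(\d+) USD"))                 # → 42
print(rex_text(page, r"(\d+) EUR", default=None))   # → None
```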

get_body_chunk() → bytes[source]¶

unicode_body() → str[source]¶: Return response body as unicode string.

classmethod wrap_io(inp: bytes | str) → io.StringIO | io.BytesIO[source]¶

classmethod _build_dom(content: bytes | str, mode: str, encoding: str) → lxml.etree._Element[source]¶

build_html_tree() → lxml.etree._Element[source]¶

build_xml_tree() → lxml.etree._Element[source]¶

choose_form(number: None | int = None, xpath: None | str = None, name: None | str = None, **kwargs: Any) → None[source]¶

Set the default form.

Parameters

number – number of form (starting from zero)
id – value of “id” attribute
name – value of “name” attribute
xpath – XPath query

Raises

DataNotFound if form not found
GrabMisuseError if method is called without parameters

Selected form will be available via form attribute of Grab instance. All form methods will work with default form.

Examples:

# Select second form
g.choose_form(1)

# Select by id
g.choose_form(id="register")

# Select by name
g.choose_form(name="signup")

# Select by xpath
g.choose_form(xpath='//form[contains(@action, "/submit")]')

get_cached_form() → lxml.html.FormElement[source]¶

Get form which has been already selected.

Returns None if form has not been selected yet.

This method exists mainly for testing, so that tests do not trigger pylint warnings about accessing a protected attribute.

set_input(name: str, value: Any) → None[source]¶

Set the value of form element by its name attribute.

Parameters

name – name of element
value – value which should be set to element

To check/uncheck the checkbox pass boolean value.

Example:

g.set_input('sex', 'male')

# Check the checkbox
g.set_input('accept', True)

set_input_by_id(_id: str, value: Any) → None[source]¶

Set the value of form element by its id attribute.

Parameters

_id – id of element
value – value which should be set to element

set_input_by_number(number: int, value: Any) → None[source]¶

Set the value of form element by its number in the form.

Parameters

number – number of element
value – value which should be set to element

set_input_by_xpath(xpath: str, value: Any) → None[source]¶

Set the value of form element by xpath.

Parameters

xpath – xpath path
value – value which should be set to element

process_extra_post(post_items: list[tuple[str, Any]], extra_post_items: collections.abc.Sequence[tuple[str, Any]]) → list[tuple[str, Any]][source]¶

clean_submit_controls(post: collections.abc.MutableMapping[str, Any], submit_name: None | str) → None[source]¶

get_form_request(submit_name: None | str = None, url: None | str = None, extra_post: None | collections.abc.Mapping[str, Any] | collections.abc.Sequence[tuple[str, Any]] = None, remove_from_post: None | collections.abc.Sequence[str] = None) → FormRequestParams[source]¶

Submit default form.

Parameters

submit_name – name of button which should be “clicked” to submit form
url – explicitly specify form action url
extra_post – (dict or list of pairs) additional form data which will override data automatically extracted from the form.
remove_from_post – list of keys to remove from the submitted data

Following input elements are automatically processed:

input[type="hidden"] - default value
select: value of last option
radio - ???
checkbox - ???

Multipart forms are correctly recognized by grab library.
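
How extra_post and remove_from_post combine with the extracted form data can be sketched with plain dictionaries. The helper build_post is hypothetical, and applying the override before the removal is an assumption. A dictionary also cannot represent repeated field names, which a real form submission may need.

```python
def build_post(form_fields, extra_post=None, remove_from_post=None):
    """Merge extracted form fields with overrides, then drop unwanted keys."""
    post = dict(form_fields)
    if extra_post:
        # dict() accepts both a mapping and a sequence of (key, value) pairs.
        post.update(dict(extra_post))
    for key in remove_from_post or ():
        post.pop(key, None)
    return post


fields = {"csrf": "abc123", "email": "", "newsletter": "on"}
post = build_post(
    fields,
    extra_post=[("email", "user@example.com")],
    remove_from_post=["newsletter"],
)
print(post)  # → {'csrf': 'abc123', 'email': 'user@example.com'}
```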

build_fields_to_remove(fields: collections.abc.Mapping[str, Any], form_inputs: collections.abc.Sequence[lxml.html.HtmlElement]) → set[str][source]¶

process_form_fields(fields: collections.abc.MutableMapping[str, Any]) → None[source]¶

form_fields() → collections.abc.MutableMapping[str, lxml.html.HtmlElement][source]¶

Return fields of default form.

Fill some fields with reasonable values.

choose_form_by_element(xpath: str) → None[source]¶

grab.DataNotFound[source]¶

exception grab.GrabError[source]¶

Bases: Exception

All custom Grab exceptions should be subclasses of this class.

exception grab.GrabMisuseError[source]¶

Bases: GrabError

Indicates incorrect usage of grab API.

exception grab.GrabNetworkError(*args: Any, **kwargs: Any)[source]¶

Bases: OriginalExceptionGrabError

Raised in case of a network error.

exception grab.GrabTimeoutError(*args: Any, **kwargs: Any)[source]¶

Bases: GrabNetworkError

Raised when the configured timeout for the request is exceeded.
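
Because GrabTimeoutError derives from GrabNetworkError, handlers should list the more specific class first. The sketch below defines stand-in classes that mirror the documented hierarchy so it runs without Grab installed; that OriginalExceptionGrabError derives from GrabError is an assumption, and classify is a hypothetical helper.

```python
class GrabError(Exception):
    """Stand-in for grab.GrabError."""


class OriginalExceptionGrabError(GrabError):
    """Stand-in; assumed to derive from GrabError."""


class GrabNetworkError(OriginalExceptionGrabError):
    """Stand-in for grab.GrabNetworkError."""


class GrabTimeoutError(GrabNetworkError):
    """Stand-in for grab.GrabTimeoutError."""


def classify(exc: Exception) -> str:
    """Show which handler catches a given exception."""
    try:
        raise exc
    except GrabTimeoutError:
        return "timeout"   # most specific handler first
    except GrabNetworkError:
        return "network"
    except GrabError:
        return "other grab error"


print(classify(GrabTimeoutError("read timed out")))   # → timeout
print(classify(GrabNetworkError("connection reset"))) # → network
```

With the library installed, the same handler order would apply to the classes imported from grab.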

class grab.Grab(transport: None | BaseTransport[RequestT, ResponseT] | type[BaseTransport[RequestT, ResponseT]] = None)[source]¶

Bases: grab.client.HttpClient

cookies¶

Bases: grab.base.BaseRequest

init_keys¶

get_full_url() → str[source]¶

_process_timeout_param(value: None | float | grab.util.timeout.Timeout) → grab.util.timeout.Timeout[source]¶

compile_request_data() → CompiledRequestData[source]¶
