Module grab.document

The Document class represents the result of a network request made with a Grab instance.

class grab.document.Document(grab=None)[source]
Document (in most cases it is a network response, i.e. the result of a network request).
browse()[source]

Save the response to a temporary file and open it in a GUI browser.

choose_form(number=None, xpath=None, name=None, **kwargs)[source]

Set the default form.

Parameters:
  • number – number of form (starting from zero)
  • id – value of “id” attribute
  • name – value of “name” attribute
  • xpath – XPath query
Raises:
  • DataNotFound if the form is not found
  • GrabMisuseError if the method is called without parameters

The selected form is available via the form attribute of the Grab instance. All form methods work with the default form.

Examples:

# Select second form
g.choose_form(1)

# Select by id
g.choose_form(id="register")

# Select by name
g.choose_form(name="signup")

# Select by xpath
g.choose_form(xpath='//form[contains(@action, "/submit")]')
copy(new_grab=None)[source]

Clone the Response object.

detect_charset()[source]

Detect charset of the response.

Try the following methods:

  • meta[name="Http-Equiv"]
  • XML declaration
  • HTTP Content-Type header

Ignore unknown charsets.

Use utf-8 as fallback charset.
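The detection steps listed above can be sketched with the standard library alone. This is an illustration, not the library's code: the function name sketch_detect_charset and the regular expressions are hypothetical, and the sketch assumes the first method that yields a known charset wins, since the description lists the methods without stating their precedence.

```python
import codecs
import re

# Hypothetical patterns; the library's own expressions may differ.
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.I)
XML_RE = re.compile(
    rb'^\s*<\?xml[^>]+encoding\s*=\s*["\']([\w-]+)["\']', re.I)
HEADER_RE = re.compile(r'charset\s*=\s*["\']?([\w-]+)', re.I)


def sketch_detect_charset(body, content_type=None, fallback='utf-8'):
    """Return the first known charset found in the meta tag, the XML
    declaration or the Content-Type header, else the fallback."""
    candidates = []
    for rex in (META_RE, XML_RE):
        match = rex.search(body)
        if match:
            candidates.append(match.group(1).decode('ascii'))
    if content_type:
        match = HEADER_RE.search(content_type)
        if match:
            candidates.append(match.group(1))
    for name in candidates:
        try:
            codecs.lookup(name)  # raises LookupError for unknown charsets
        except LookupError:
            continue  # ignore unknown charsets
        return name.lower()
    return fallback
```

Validating each candidate with codecs.lookup is one way to "ignore unknown charsets": a name Python cannot decode is skipped instead of breaking later decoding.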

form

This attribute points to the default form.

If no form was selected manually, the form with the largest number of input elements is selected.

The form value is just an lxml.html form element.

Example:

g.go('some URL')
# Choose form automatically
print(g.form)

# And now choose form manually
g.choose_form(1)
print(g.form)
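
The automatic choice described above (the form with the most input elements) can be illustrated without lxml. The following is only a sketch: FormCounter and sketch_default_form_number are hypothetical names, and treating input, select and textarea as "input elements" is an assumption.

```python
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Count input-like elements inside each <form> (hypothetical helper)."""

    # Assumption: these tags are what counts as an "input element".
    INPUT_TAGS = {'input', 'select', 'textarea'}

    def __init__(self):
        super().__init__()
        self.counts = []      # one counter per form, in document order
        self.in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == 'form':
            self.counts.append(0)
            self.in_form = True
        elif self.in_form and tag in self.INPUT_TAGS:
            self.counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def sketch_default_form_number(html):
    """Return the zero-based number of the form with the most inputs,
    or None if the page has no forms."""
    parser = FormCounter()
    parser.feed(html)
    if not parser.counts:
        return None
    return max(range(len(parser.counts)), key=parser.counts.__getitem__)
```

The returned number uses the same zero-based numbering as the number argument of choose_form.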
form_fields()[source]

Return the fields of the default form.

Fill some fields with reasonable values.

get_form_request(submit_name=None, url=None, extra_post=None, remove_from_post=None)[source]

Submit default form.

Parameters:
  • submit_name – name of button which should be “clicked” to submit form
  • url – explicitly specify form action url
  • extra_post – (dict or list of pairs) additional form data which will override data automatically extracted from the form.
  • remove_from_post – list of keys to remove from the submitted data

The following input elements are processed automatically:

  • input[type="hidden"] - default value
  • select: value of last option
  • radio - ???
  • checkbox - ???

Multipart forms are recognized correctly by the Grab library.
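
The roles of extra_post and remove_from_post can be pictured with plain dictionaries. This is a minimal sketch under stated assumptions: sketch_build_post is a hypothetical helper, form_fields stands for the data extracted from the form, and applying the removals after the overrides is a guess about the order.

```python
def sketch_build_post(form_fields, extra_post=None, remove_from_post=None):
    """Combine extracted form data with overrides and removals."""
    post = dict(form_fields)           # copy, leave the input untouched
    if extra_post:
        # dict() accepts both a dict and a list of (key, value) pairs;
        # repeated keys in a list of pairs collapse to the last value here.
        post.update(dict(extra_post))
    for key in remove_from_post or ():
        post.pop(key, None)            # missing keys are ignored silently
    return post
```

For example, sketch_build_post({'login': '', 'remember': 'on'}, extra_post=[('login', 'bob')], remove_from_post=['remember']) yields {'login': 'bob'}.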

json

Return response body deserialized into JSON object.

parse(charset=None, headers=None)[source]

Parse headers.

This method is called after Grab instance performs network request.

pyquery

Returns pyquery handler.

query_param(key)[source]

Return value of parameter in query string.
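
A rough standard-library equivalent of this lookup is shown below. The name sketch_query_param is hypothetical, and returning the first value and raising KeyError for a missing key are assumptions, not documented behavior.

```python
from urllib.parse import parse_qs, urlsplit


def sketch_query_param(url, key):
    """Return the first value of `key` in the query string of `url`."""
    query = parse_qs(urlsplit(url).query)
    return query[key][0]  # KeyError if the parameter is absent
```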

rex_assert(rex, byte=False)[source]

If rex expression is not found then raise DataNotFound exception.

Search the regular expression in response body.

Parameters:
  • byte – if False, the search is performed in response.unicode_body(); otherwise the rex is searched in response.body.

Note: if you use the default non-byte mode, do not forget to build your regular expression with the re.U flag.

Return found match object or None

rex_text(regexp, flags=0, byte=False, default=<object object>)[source]

Search regular expression in response body and return content of first matching group.

Parameters:
  • byte – if False, the search is performed in response.unicode_body(); otherwise the rex is searched in response.body.
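
The byte switch and the default argument can be sketched with the re module. Everything here is illustrative: sketch_rex_text is a hypothetical function, the body is decoded as UTF-8 for simplicity, and raising LookupError (standing in for DataNotFound) when nothing matches and no default is given is an assumption.

```python
import re

_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def sketch_rex_text(body, regexp, flags=0, byte=False, default=_MISSING):
    """Search `regexp` in `body` (bytes) and return the first group."""
    # Byte mode searches the raw bytes; otherwise search the decoded text.
    haystack = body if byte else body.decode('utf-8')
    match = re.search(regexp, haystack, flags)
    if match:
        return match.group(1)
    if default is _MISSING:
        # Assumption: the real method raises DataNotFound at this point.
        raise LookupError('regexp not found: %r' % (regexp,))
    return default
```

The pattern must match the mode: a str pattern for the default mode and a bytes pattern when byte=True.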

save(path)[source]

Save response body to file.

save_hash(location, basedir, ext=None)[source]

Save the response body to a file whose path is built from a hash. This lowers the number of files per directory.

Parameters:
  • location – URL of file or something else. It is used to build the SHA1 hash.
  • basedir – base directory to save the file. Note that file will not be saved directly to this directory but to some sub-directory of basedir
  • ext – extension which should be appended to file name. The dot is inserted automatically between filename and extension.
Returns:

path to saved file relative to basedir

Example:

>>> url = 'http://yandex.ru/logo.png'
>>> g.go(url)
>>> g.response.save_hash(url, 'some_dir', ext='png')
'e8/dc/f2918108788296df1facadc975d32b361a6a.png'
# the file was saved to $PWD/some_dir/e8/dc/...

TODO: replace basedir with two options, root and save_to, and return save_to + path.
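
The path layout in the example above (two characters, two characters, then the remaining 36 characters of a 40-character SHA1 hex digest) can be reproduced as follows. This is a sketch inferred from that example output; sketch_hashed_path and sketch_save_hash are hypothetical names, and encoding the location as UTF-8 before hashing is an assumption.

```python
import hashlib
import os
import tempfile


def sketch_hashed_path(location, ext=None):
    """Build a relative path from the SHA1 hash of `location`."""
    digest = hashlib.sha1(location.encode('utf-8')).hexdigest()
    rel_path = '%s/%s/%s' % (digest[:2], digest[2:4], digest[4:])
    if ext is not None:
        rel_path = '%s.%s' % (rel_path, ext)  # the dot is inserted here
    return rel_path


def sketch_save_hash(body, location, basedir, ext=None):
    """Write `body` (bytes) under `basedir`; return the relative path."""
    rel_path = sketch_hashed_path(location, ext)
    full_path = os.path.join(basedir, rel_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, 'wb') as out:
        out.write(body)
    return rel_path


# Usage: save some bytes into a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = sketch_save_hash(b'data', 'http://yandex.ru/logo.png',
                             demo_dir, ext='png')
```

With two levels of two hex characters each, files are spread over up to 256 × 256 sub-directories, which keeps any single directory small.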

set_input(name, value)[source]

Set the value of form element by its name attribute.

Parameters:
  • name – name of element
  • value – value which should be set to element

To check/uncheck the checkbox pass boolean value.

Example:

g.set_input('sex', 'male')

# Check the checkbox
g.set_input('accept', True)
set_input_by_id(_id, value)[source]

Set the value of form element by its id attribute.

Parameters:
  • _id – id of element
  • value – value which should be set to element
set_input_by_number(number, value)[source]

Set the value of form element by its number in the form

Parameters:
  • number – number of element
  • value – value which should be set to element
set_input_by_xpath(xpath, value)[source]

Set the value of form element by xpath

Parameters:
  • xpath – XPath query
  • value – value which should be set to element
text_assert(anchor, byte=False)[source]

If anchor is not found then raise DataNotFound exception.

text_assert_any(anchors, byte=False)[source]

If no anchors were found then raise DataNotFound exception.

Search the substring in response body.

Parameters:
  • anchor – string to search
  • byte – if False then anchor should be the unicode string, and search will be performed in response.unicode_body() else anchor should be the byte-string and search will be performed in response.body

Return True if the substring is found, otherwise False.
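
The search and the "any anchor" assertion can be sketched as two small functions. The names sketch_text_search and sketch_text_assert_any are hypothetical, the body is decoded as UTF-8 for simplicity, and LookupError stands in for DataNotFound.

```python
def sketch_text_search(body, anchor, byte=False):
    """Return True if `anchor` occurs in `body` (bytes), else False."""
    # In byte mode the anchor must be bytes; otherwise it must be str.
    haystack = body if byte else body.decode('utf-8')
    return anchor in haystack


def sketch_text_assert_any(body, anchors, byte=False):
    """Raise LookupError unless at least one anchor occurs in `body`."""
    if not any(sketch_text_search(body, a, byte) for a in anchors):
        raise LookupError('none of the anchors found: %r' % (anchors,))
```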

tree

Return DOM tree of the document built with HTML DOM builder.

unicode_body(ignore_errors=True, fix_special_entities=True)[source]

Return response body as unicode string.
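
A minimal sketch of the decoding step, assuming ignore_errors maps to the error handler passed to bytes.decode. The function name sketch_unicode_body and the charset parameter are hypothetical, and fix_special_entities is left out because its behavior is not described here.

```python
def sketch_unicode_body(body, charset='utf-8', ignore_errors=True):
    """Decode `body` (bytes), optionally dropping undecodable bytes."""
    errors = 'ignore' if ignore_errors else 'strict'
    return body.decode(charset, errors)
```

With ignore_errors=False the same call raises UnicodeDecodeError on invalid input instead of silently dropping bytes.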

url_details()[source]

Return result of urlsplit function applied to response url.
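
For reference, this is what the standard urlsplit function returns for a sample URL (the URL itself is made up for the illustration):

```python
from urllib.parse import urlsplit

parts = urlsplit('http://example.com:8080/path/page.html?q=1#top')

# The result is a named tuple with five components.
scheme, netloc, path, query, fragment = parts

# Extra attributes are derived from netloc.
host = parts.hostname
port = parts.port
```

Here netloc is 'example.com:8080', while hostname and port expose its two halves separately.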

xml_tree

Return DOM-tree of the document built with XML DOM builder.

grab.document.read_bom(data)[source]

Read the byte order mark in the text, if present, and return the encoding represented by the BOM and the BOM.

If no BOM can be detected, (None, None) is returned.
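
The described contract (a pair of encoding and BOM, or (None, None)) can be sketched with the BOM constants from the codecs module. sketch_read_bom is a hypothetical name, and the set of encodings in the table is an assumption; the real function may recognize a different set.

```python
import codecs

# Assumed BOM table; order matters only if one BOM is a prefix of another.
_BOM_TABLE = [
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
]


def sketch_read_bom(data):
    """Return (encoding, bom) for the BOM at the start of `data`,
    or (None, None) if no known BOM is present."""
    for bom, encoding in _BOM_TABLE:
        if data.startswith(bom):
            return encoding, bom
    return None, None
```

Returning the BOM itself lets the caller strip it with data[len(bom):] before decoding.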