Module grab.document
The Document class represents the result of a network request made with a Grab instance.
- class grab.document.Document(grab=None)
- Document (in most cases it is a network response, i.e. the result of a network request)
- choose_form(number=None, xpath=None, name=None, **kwargs)
Set the default form.
- Parameters:
number – index of the form (starting from zero)
id – value of the “id” attribute (passed as a keyword argument)
name – value of the “name” attribute
xpath – XPath query
- Raises:
DataNotFound – if the form is not found
GrabMisuseError – if the method is called without parameters
The selected form will be available via the form attribute of the Grab instance. All form methods will work with the default form.
Examples:
# Select second form
g.choose_form(1)
# Select by id
g.choose_form(id="register")
# Select by name
g.choose_form(name="signup")
# Select by xpath
g.choose_form(xpath='//form[contains(@action, "/submit")]')
- property form
This attribute points to the default form.
If no form was selected manually, the form with the largest number of input elements is selected.
The form value is just an lxml.html form element.
Example:
g.go('some URL')
# Choose form automatically
print(g.form)
# And now choose form manually
g.choose_form(1)
print(g.form)
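The automatic choice described above can be illustrated with a standard-library sketch. It is not Grab's code (Grab works on an lxml tree); the helper `pick_default_form` is hypothetical, and counting only `<input>` tags (not `<select>` or `<textarea>`) is an assumption based on the wording "input elements".

```python
from html.parser import HTMLParser


class _FormInputCounter(HTMLParser):
    """Count <input> elements inside each <form> (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.counts = []  # one input count per form, in document order
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.counts.append(0)
            self._in_form = True
        elif tag == "input" and self._in_form:
            self.counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def pick_default_form(html):
    """Hypothetical helper: return the index of the form with the most inputs."""
    parser = _FormInputCounter()
    parser.feed(html)
    parser.close()
    if not parser.counts:
        return None
    # max() keeps the first index on ties, i.e. the earliest such form
    return max(range(len(parser.counts)), key=parser.counts.__getitem__)


page = """
<form id="search"><input name="q"></form>
<form id="register">
  <input name="login"><input name="password"><input type="hidden" name="token">
</form>
"""
print(pick_default_form(page))  # prints 1
```

The returned index is the same kind of number that choose_form(number) accepts, so the manual and automatic paths can be compared directly.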
- get_form_request(submit_name=None, url=None, extra_post=None, remove_from_post=None)
Submit the default form.
- Parameters:
submit_name – name of the button which should be “clicked” to submit the form
url – explicitly specify the form action URL
extra_post – (dict or list of pairs) additional form data which will override data automatically extracted from the form.
remove_from_post – list of keys to remove from the submitted data
The following input elements are processed automatically:
input[type="hidden"] - default value
select - value of last option
radio - ???
checkbox - ???
Multipart forms are correctly recognized by the Grab library.
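How extra_post and remove_from_post could modify the automatically extracted fields can be sketched with plain dictionaries. This is an illustration under assumptions, not Grab's implementation: the helper `build_post_data` is hypothetical, and it models the extracted form fields as an ordinary dict.

```python
def build_post_data(extracted, extra_post=None, remove_from_post=None):
    """Hypothetical helper: merge overrides into extracted form fields."""
    data = dict(extracted)  # copy so the caller's dict is left untouched
    if extra_post:
        # extra_post may be a dict or a list of (key, value) pairs
        items = extra_post.items() if isinstance(extra_post, dict) else extra_post
        for key, value in items:
            data[key] = value  # override the automatically extracted value
    for key in remove_from_post or ():
        data.pop(key, None)  # ignore keys that are not present
    return data


fields = {"token": "abc123", "login": "", "country": "ru"}
post = build_post_data(
    fields,
    extra_post=[("login", "bob"), ("password", "secret")],
    remove_from_post=["country"],
)
print(post)  # {'token': 'abc123', 'login': 'bob', 'password': 'secret'}
```

Collapsing everything into a dict means this sketch cannot represent repeated field names; a real form submission may need a list of pairs for that.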
- property json
Return the response body deserialized from JSON into a Python object.
- parse(charset=None, headers=None)
Parse headers.
This method is called after the Grab instance performs a network request.
- property pyquery
Return the pyquery handler.
- rex_assert(rex, byte=False)
If the regular expression is not found, raise the DataNotFound exception.
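A minimal sketch of this assertion, assuming the raw body is available as bytes together with its charset and that `rex` is a compiled pattern. DataNotFound is defined locally as a stand-in for Grab's exception, and this free-standing `rex_assert` function is an illustration, not the method itself.

```python
import re


class DataNotFound(Exception):
    """Local stand-in for Grab's DataNotFound exception."""


def rex_assert(body, rex, charset="utf-8", byte=False):
    """Raise DataNotFound if the compiled regex `rex` does not match the body."""
    # Non-byte mode searches decoded text; byte mode searches raw bytes
    target = body if byte else body.decode(charset, "ignore")
    if rex.search(target) is None:
        raise DataNotFound("regular expression not found")


body = "<title>Добро пожаловать</title>".encode("utf-8")
rex_assert(body, re.compile(r"<title>(\w+)", re.U))  # passes silently
try:
    rex_assert(body, re.compile(rb"<h1>"), byte=True)
except DataNotFound as exc:
    print("assertion failed:", exc)
```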
- rex_search(regexp, flags=0, byte=False, default=<object object>)
Search for the regular expression in the response body.
- Parameters:
byte – if False, the search is performed in response.unicode_body(); otherwise the regular expression is searched in response.body.
Note: if you use the default non-byte mode, do not forget to build your regular expression with the re.U flag.
Return the match object, or None if nothing is found.
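The difference between the two modes can be shown with the standard re module. The sketch assumes the raw body is bytes in a known charset; this standalone `rex_search` is an illustration, and decoding with errors ignored is an assumption about how the unicode body is produced.

```python
import re


def rex_search(body, regexp, flags=0, byte=False, charset="utf-8"):
    """Standalone sketch: search a raw response body in unicode or byte mode."""
    if byte:
        # Byte mode: both the pattern and the body must be bytes
        return re.search(regexp, body, flags)
    # Non-byte mode: a str pattern is matched against the decoded text
    return re.search(regexp, body.decode(charset, "ignore"), flags)


body = "Цена: 250 руб.".encode("cp1251")

match = rex_search(body, r"Цена:\s*(\d+)", flags=re.U, charset="cp1251")
print(match.group(1))  # prints 250

raw = rex_search(body, rb"(\d+)", byte=True)
print(raw.group(1))  # prints b'250'
```

In Python 3, str patterns are Unicode-aware by default, so re.U is redundant there; the note about re.U matters mainly for Python 2 code.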
- rex_text(regexp, flags=0, byte=False, default=<object object>)
Search for the regular expression in the response body and return the content of the first matching group.
- Parameters:
byte – if False, the search is performed in response.unicode_body(); otherwise the regular expression is searched in response.body.
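The signature uses a sentinel object as the default for default, a common Python idiom for "no default supplied". The sketch below assumes that rex_text raises when nothing matches and no default is given, and returns default otherwise; that behaviour, the local DataNotFound stand-in, and returning the group text as-is (without any post-processing the real method may do) are all assumptions.

```python
import re

NULL = object()  # sentinel meaning "no default supplied"


class DataNotFound(Exception):
    """Local stand-in for Grab's DataNotFound exception."""


def rex_text(text, regexp, flags=0, default=NULL):
    """Return the first group of the first match, or `default` if supplied."""
    match = re.search(regexp, text, flags)
    if match is None:
        if default is NULL:
            raise DataNotFound("regular expression not found")
        return default
    return match.group(1)


html = '<a href="/login">Sign in</a>'
print(rex_text(html, r'href="([^"]+)"'))  # prints /login
print(rex_text(html, r'id="(\w+)"', default="n/a"))  # prints n/a
```

A sentinel is used instead of None so that callers can still pass default=None explicitly.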
- save_hash(location, basedir, ext=None)
Save the response body into a file whose path is built from a hash. This keeps the number of files per directory low.
- Parameters:
location – URL of file or something else. It is used to build the SHA1 hash.
basedir – base directory to save the file. Note that the file will not be saved directly to this directory but to a sub-directory of basedir.
ext – extension which should be appended to file name. The dot is inserted automatically between filename and extension.
- Returns:
path to the saved file, relative to basedir
Example:
>>> url = 'http://yandex.ru/logo.png'
>>> g.go(url)
>>> g.response.save_hash(url, 'some_dir', ext='png')
'e8/dc/f2918108788296df1facadc975d32b361a6a.png'
# the file was saved to $PWD/some_dir/e8/dc/...
TODO: replace basedir with two options, root and save_to, and return save_to + path.
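The directory layout in the example (2 hex characters, then 2 more, then the remaining 36 of a 40-character SHA1 digest) can be reproduced with hashlib. This standalone `save_hash` takes the body as an explicit argument and hashes the location encoded as UTF-8; both are assumptions, so digests may differ from the ones Grab produces.

```python
import hashlib
import os
import tempfile


def save_hash(body, location, basedir, ext=None):
    """Standalone sketch: save `body` under a path derived from SHA1(location)."""
    digest = hashlib.sha1(location.encode("utf-8")).hexdigest()  # 40 hex chars
    # Split as in the documented example: 2 chars / 2 chars / remaining 36
    rel_path = os.path.join(digest[:2], digest[2:4], digest[4:])
    if ext:
        rel_path += "." + ext  # the dot is inserted automatically
    full_path = os.path.join(basedir, rel_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as fh:
        fh.write(body)
    return rel_path  # relative to basedir, like the documented return value


with tempfile.TemporaryDirectory() as tmp:
    path = save_hash(b"\x89PNG...", "http://yandex.ru/logo.png", tmp, ext="png")
    print(path)
    print(os.path.exists(os.path.join(tmp, path)))  # prints True
```

Two levels of 256 sub-directories each spread files over up to 65,536 directories, which is what keeps any single directory small.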
- set_input(name, value)
Set the value of a form element by its name attribute.
- Parameters:
name – name of element
value – value which should be set to element
To check or uncheck a checkbox, pass a boolean value.
Example:
g.set_input('sex', 'male')
# Check the checkbox
g.set_input('accept', True)
- set_input_by_id(_id, value)
Set the value of a form element by its id attribute.
- Parameters:
_id – id of element
value – value which should be set to element
- set_input_by_number(number, value)
Set the value of a form element by its number in the form.
- Parameters:
number – number of element
value – value which should be set to element
- set_input_by_xpath(xpath, value)
Set the value of a form element by XPath.
- Parameters:
xpath – XPath query
value – value which should be set to element
- text_assert_any(anchors, byte=False)
If none of the anchors is found, raise the DataNotFound exception.
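A sketch of the "any" semantics: the check passes as soon as one anchor occurs. It assumes the raw body is bytes with a known charset, and DataNotFound is a local stand-in for Grab's exception; the function itself is a standalone illustration.

```python
class DataNotFound(Exception):
    """Local stand-in for Grab's DataNotFound exception."""


def text_assert_any(body, anchors, byte=False, charset="utf-8"):
    """Raise DataNotFound unless at least one anchor occurs in the body."""
    # Non-byte mode compares str anchors with decoded text; byte mode uses bytes
    text = body if byte else body.decode(charset, "ignore")
    if not any(anchor in text for anchor in anchors):
        raise DataNotFound("none of the anchors were found: %r" % (anchors,))


body = "<p>Welcome back, Bob</p>".encode("utf-8")
text_assert_any(body, ["Welcome", "Logout"])  # passes: "Welcome" is present
try:
    text_assert_any(body, ["Error", "Forbidden"])
except DataNotFound as exc:
    print(exc)
```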
- text_search(anchor, byte=False)
Search for the substring in the response body.
- Parameters:
anchor – string to search
byte – if False, anchor should be a unicode string and the search is performed in response.unicode_body(); otherwise anchor should be a byte string and the search is performed in response.body.
Return True if the substring is found, otherwise False.
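The byte/unicode distinction matters most for non-UTF-8 pages. In this standalone sketch (charset passed in explicitly, type checks added for clarity; both are assumptions), a UTF-8 encoded anchor is not found in a cp1251 body in byte mode, while the same text is found in unicode mode.

```python
def text_search(body, anchor, byte=False, charset="utf-8"):
    """Return True if `anchor` occurs in the body, else False."""
    if byte:
        # Byte mode: anchor must be bytes and is matched against raw bytes
        if not isinstance(anchor, bytes):
            raise TypeError("byte mode requires a bytes anchor")
        return anchor in body
    # Unicode mode: anchor must be str and is matched against decoded text
    if not isinstance(anchor, str):
        raise TypeError("unicode mode requires a str anchor")
    return anchor in body.decode(charset, "ignore")


body = "Привет, мир".encode("cp1251")
print(text_search(body, "мир", charset="cp1251"))  # prints True
print(text_search(body, "мир".encode("utf-8"), byte=True))  # prints False
```

In byte mode the anchor has to be encoded with the same charset as the page, e.g. "мир".encode("cp1251") here.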
- property tree
Return the DOM tree of the document, built with the HTML DOM builder.
- unicode_body(ignore_errors=True, fix_special_entities=<object object>)
Return response body as unicode string.
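The ignore_errors flag presumably decides whether undecodable bytes are dropped or cause an error. A sketch under that assumption, mapping it onto the errors argument of bytes.decode; here the charset is simply passed in as a parameter, and fix_special_entities is not modelled.

```python
def unicode_body(body, charset="utf-8", ignore_errors=True):
    """Standalone sketch: decode raw body bytes, optionally dropping bad bytes."""
    errors = "ignore" if ignore_errors else "strict"
    return body.decode(charset, errors)


raw = b"caf\xc3\xa9 \xff"  # valid UTF-8 "café " followed by one invalid byte
print(repr(unicode_body(raw)))  # prints 'café '
try:
    unicode_body(raw, ignore_errors=False)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)
```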
- property xml_tree
Return the DOM tree of the document, built with the XML DOM builder.