Module grab.document
The Document class represents the result of a network request made with a Grab instance.
-
class grab.document.Document(grab=None)
Document (in most cases a network response, i.e. the result of a network request).
-
choose_form(number=None, xpath=None, name=None, **kwargs)
Set the default form.
Parameters: - number – number of form (starting from zero)
- id – value of “id” attribute
- name – value of “name” attribute
- xpath – XPath query
Raises: DataNotFound if the form is not found
Raises: GrabMisuseError if the method is called without parameters
The selected form is available via the form attribute of the Grab instance. All form methods work with the default form.
Examples:
    # Select second form
    g.choose_form(1)
    # Select by id
    g.choose_form(id="register")
    # Select by name
    g.choose_form(name="signup")
    # Select by xpath
    g.choose_form(xpath='//form[contains(@action, "/submit")]')
-
detect_charset()
Detect the charset of the response.
Try the following methods:
- meta[name="Http-Equiv"]
- XML declaration
- HTTP Content-Type header
Unknown charsets are ignored.
If no charset is found, utf-8 is used as the fallback.
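The detection steps above can be sketched with the standard library alone. This is a simplified illustration, not grab's implementation: detect_charset_sketch and its body/content_type arguments are hypothetical, the candidates are tried in the order listed (an assumption), the regular expressions only cover common markup, and the meta check accepts any charset attribute inside a meta tag. codecs.lookup is used to decide whether a charset is "known".

```python
import codecs
import re

# Simplified patterns for the three charset sources (real-world HTML is messier).
META_RE = re.compile(rb'<meta[^>]+charset=["\']?([-\w.:]+)', re.I)
XML_RE = re.compile(rb'<\?xml[^>]+encoding=["\']([-\w.:]+)["\']', re.I)
HEADER_RE = re.compile(r'charset=["\']?([-\w.:]+)', re.I)


def is_known(charset):
    """Return True if Python's codec registry recognizes the charset name."""
    try:
        codecs.lookup(charset)
        return True
    except LookupError:
        return False


def detect_charset_sketch(body, content_type=None):
    """Pick a charset from a meta tag, the XML declaration, then the HTTP header."""
    candidates = []
    match = META_RE.search(body)
    if match:
        candidates.append(match.group(1).decode('ascii', 'ignore'))
    match = XML_RE.search(body)
    if match:
        candidates.append(match.group(1).decode('ascii', 'ignore'))
    if content_type:
        match = HEADER_RE.search(content_type)
        if match:
            candidates.append(match.group(1))
    for charset in candidates:
        if is_known(charset):  # unknown charsets are skipped
            return charset.lower()
    return 'utf-8'  # fallback when nothing usable was found
```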
-
form
This attribute points to the default form.
If no form has been selected manually, the form with the largest number of input elements is selected.
The value is an lxml.html form element.
Example:
    g.go('some URL')
    # Choose form automatically
    print(g.form)
    # And now choose form manually
    g.choose_form(1)
    print(g.form)
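A minimal sketch of the automatic choice described above, using html.parser instead of lxml so it runs without extra packages. FormInputCounter and pick_default_form are hypothetical names; the sketch counts only input tags, assumes forms are not nested, and returns a zero-based index like the number argument of choose_form. Keeping the first form on a tie is an assumption about tie-breaking.

```python
from html.parser import HTMLParser


class FormInputCounter(HTMLParser):
    """Count <input> elements inside each <form> of an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = []      # one counter per form, in document order
        self.in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == 'form':
            self.counts.append(0)
            self.in_form = True
        elif tag == 'input' and self.in_form:
            self.counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def pick_default_form(html):
    """Return the zero-based index of the form with the most inputs, or None."""
    parser = FormInputCounter()
    parser.feed(html)
    parser.close()
    if not parser.counts:
        return None
    # max() returns the first index with the highest count, so ties keep the earlier form
    return max(range(len(parser.counts)), key=parser.counts.__getitem__)
```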
-
get_form_request(submit_name=None, url=None, extra_post=None, remove_from_post=None)
Prepare the data needed to submit the default form.
Parameters: - submit_name – name of the button that should be “clicked” to submit the form
- url – explicitly specified form action URL
- extra_post – (dict or list of pairs) additional form data that overrides the data automatically extracted from the form
- remove_from_post – list of keys to remove from the submitted data
The following input elements are processed automatically:
- input[type="hidden"] - default value
- select - value of the last option
- radio - ???
- checkbox - ???
Multipart forms are recognized correctly by the grab library.
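How extra_post and remove_from_post combine with the fields extracted from the form can be sketched as a plain dictionary merge. merge_form_data is a hypothetical helper, not part of grab; it assumes the extracted fields are already a name-to-value dict (so repeated field names are not modelled) and applies removals after overrides.

```python
def merge_form_data(extracted, extra_post=None, remove_from_post=None):
    """Combine fields extracted from a form with caller-supplied overrides.

    extracted: dict of field name -> value taken from the form.
    extra_post: dict or list of (name, value) pairs overriding extracted values.
    remove_from_post: iterable of field names to drop from the result.
    """
    data = dict(extracted)  # copy so the caller's dict is not modified
    if extra_post:
        pairs = extra_post.items() if isinstance(extra_post, dict) else extra_post
        for name, value in pairs:
            data[name] = value
    # Removal runs last, so it wins over an override of the same key.
    for name in remove_from_post or ():
        data.pop(name, None)
    return data
```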
-
json
Return the response body deserialized from JSON.
-
parse(charset=None, headers=None)
Parse headers.
This method is called after the Grab instance performs a network request.
-
pyquery
Return a pyquery handler for the document.
-
rex_assert(rex, byte=False)
Raise DataNotFound if the regular expression rex is not found in the response body.
-
rex_search(regexp, flags=0, byte=False, default=<object object>)
Search for the regular expression in the response body.
Parameters: byte – if False, the search is performed in response.unicode_body(); otherwise it is performed in response.body.
Note: in the default non-byte mode, do not forget to build your regular expression with the re.U flag.
Returns: the match object, or None if nothing is found.
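The effect of the byte flag can be illustrated with the re module. In this sketch rex_search_sketch is a hypothetical function, and the raw body bytes plus an explicit charset stand in for response.body and response.unicode_body(). It returns the match object or None as described above; the default argument is not modelled. On Python 3, str patterns already use Unicode matching, so the re.U note matters mainly for Python 2 code.

```python
import re


def rex_search_sketch(body, charset, regexp, flags=0, byte=False):
    """Search regexp in the raw body (byte=True) or in the decoded body.

    body is the raw response as bytes; charset is used to decode it in
    non-byte mode. Returns the match object, or None if nothing matches.
    """
    if isinstance(regexp, (str, bytes)):
        # Compile plain patterns; already compiled ones are used as-is.
        regexp = re.compile(regexp, flags)
    target = body if byte else body.decode(charset)
    return regexp.search(target)
```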
-
rex_text(regexp, flags=0, byte=False, default=<object object>)
Search for the regular expression in the response body and return the content of the first matching group.
Parameters: byte – if False, the search is performed in response.unicode_body(); otherwise it is performed in response.body.
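What distinguishes rex_text is that only the first group is returned. A hypothetical rex_text_sketch, taking raw bytes and a charset in place of the response object, might look like this. Returning None when nothing matches is an assumption, since the description above does not say what happens in that case, and any post-processing the library may apply to the group text is not modelled.

```python
import re


def rex_text_sketch(body, charset, regexp, flags=0, byte=False):
    """Return group 1 of the first match, or None if nothing matches.

    body is the raw response as bytes; in non-byte mode it is decoded with charset.
    """
    target = body if byte else body.decode(charset)
    match = re.search(regexp, target, flags)
    return match.group(1) if match else None
```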
-
save_hash(location, basedir, ext=None)
Save the response body to a file whose path is built from a hash. This keeps the number of files per directory low.
Parameters: - location – URL of the file, or any other identifier. It is used to build the SHA1 hash.
- basedir – base directory to save the file in. Note that the file is not saved directly in this directory but in a sub-directory of basedir.
- ext – extension to append to the file name. The dot between the file name and the extension is inserted automatically.
Returns: path to the saved file, relative to basedir
Example:
    >>> url = 'http://yandex.ru/logo.png'
    >>> g.go(url)
    >>> g.response.save_hash(url, 'some_dir', ext='png')
    'e8/dc/f2918108788296df1facadc975d32b361a6a.png'
    # the file was saved to $PWD/some_dir/e8/dc/...
TODO: replace basedir with two options, root and save_to, and return save_to + path.
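The directory layout visible in the example output (two levels of two hex characters, then the remaining 36 characters of the 40-character SHA1 digest) can be sketched with hashlib. This assumes the hash is computed over location encoded as UTF-8; hashed_relpath and save_body_hashed are hypothetical helpers and are not guaranteed to produce exactly the same paths as grab.

```python
import hashlib
import os


def hashed_relpath(location, ext=None):
    """Build a two-level relative path from the SHA1 hex digest of location."""
    if isinstance(location, str):
        location = location.encode('utf-8')  # assumption: UTF-8 before hashing
    digest = hashlib.sha1(location).hexdigest()  # 40 hex characters
    name = digest[4:]
    if ext:
        name = '%s.%s' % (name, ext)  # the dot is inserted automatically
    return '/'.join((digest[:2], digest[2:4], name))


def save_body_hashed(body, location, basedir, ext=None):
    """Write body under basedir at the hashed path; return the relative path."""
    rel_path = hashed_relpath(location, ext)
    full_path = os.path.join(basedir, *rel_path.split('/'))
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, 'wb') as fh:
        fh.write(body)
    return rel_path
```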
-
set_input(name, value)
Set the value of a form element by its name attribute.
Parameters: - name – name of the element
- value – value to set on the element
To check or uncheck a checkbox, pass a boolean value.
Example:
    g.set_input('sex', 'male')
    # Check the checkbox
    g.set_input('accept', True)
-
set_input_by_id(_id, value)
Set the value of a form element by its id attribute.
Parameters: - _id – id of the element
- value – value to set on the element
-
set_input_by_number(number, value)
Set the value of a form element by its number in the form.
Parameters: - number – number of the element
- value – value to set on the element
-
set_input_by_xpath(xpath, value)
Set the value of a form element selected by an XPath expression.
Parameters: - xpath – XPath expression that selects the element
- value – value to set on the element
-
text_assert_any(anchors, byte=False)
Raise DataNotFound if none of the anchors is found in the response body.
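A sketch of the "any anchor" check on already-decoded text. DataNotFound here is a local stand-in class defined only for this example, text_assert_any_sketch is a hypothetical name, and the byte mode is left out.

```python
class DataNotFound(Exception):
    """Local stand-in for grab's DataNotFound exception (this sketch only)."""


def text_assert_any_sketch(text, anchors):
    """Raise DataNotFound unless at least one of the anchors occurs in text."""
    if not any(anchor in text for anchor in anchors):
        raise DataNotFound('None of the anchors was found: %r' % (anchors,))
```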
-
text_search(anchor, byte=False)
Search for a substring in the response body.
Parameters: - anchor – string to search for
- byte – if False, anchor must be a unicode string and the search is performed in response.unicode_body(); otherwise anchor must be a byte string and the search is performed in response.body
Returns True if the substring is found, otherwise False.
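The byte flag changes both where the search happens and what type the anchor must have. In this hypothetical text_search_sketch, raw body bytes and an explicit charset stand in for the response object. Mixing types (a str anchor with byte=True, or a bytes anchor with byte=False) makes Python's in operator raise TypeError, which mirrors the type requirement described above.

```python
def text_search_sketch(body, charset, anchor, byte=False):
    """Return True if anchor occurs in the raw body (byte=True) or the decoded body."""
    if byte:
        # anchor must be bytes here; a str anchor raises TypeError.
        return anchor in body
    # anchor must be str here; a bytes anchor raises TypeError.
    return anchor in body.decode(charset)
```

Note that in byte mode the anchor must be encoded the same way as the body: searching for the UTF-8 bytes of "café" in a Latin-1 body returns False even though the text is present.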
-
tree
Return the DOM tree of the document, built with the HTML DOM builder.
-
unicode_body(ignore_errors=True, fix_special_entities=True)
Return the response body as a unicode string.
-
xml_tree
Return the DOM tree of the document, built with the XML DOM builder.