Network Transport
A network transport is the component that uses a well-known third-party network package to send network requests and receive the responses. At the moment Grab supports only one network library: urllib3. You can access the transport object through the Grab.transport attribute, although in most cases you do not need direct access to it.
Urllib3 transport
This transport can also be used in a gevent environment. urllib3 uses native Python sockets, which can be patched by gevent.monkey.patch_all.
import gevent
import gevent.monkey
from grab import Grab
import time


def worker():
    g = Grab(transport='urllib3')
    # Request the document that is served with 1 second delay
    g.request('http://httpbin.org/delay/1')
    return g.doc.json['headers']['User-Agent']


started = time.time()
# Patch the standard library so that socket operations yield to gevent
gevent.monkey.patch_all()
pool = []
for _ in range(10):
    pool.append(gevent.spawn(worker))
for th in pool:
    th.join()
    # This check assumes the request was sent with the User-Agent 'Medved'
    assert th.value == 'Medved'
# The total time would be less than 2 seconds
# unless you have VERY bad internet connection
assert (time.time() - started) < 2
Use your own transport
You can implement your own transport class and use it. Just pass your transport class to the transport option.
Here is a minimal example that builds a Grab transport powered by wget.
import email.message
from contextlib import contextmanager
from subprocess import check_output

from grab import Grab
from grab.base_transport import BaseTransport
from grab.document import Document


class WgetTransport(BaseTransport):
    def reset(self):
        pass

    def process_config(self, grab_config, cookies):
        self._request_url = grab_config["url"]

    def request(self):
        out = check_output(["/usr/bin/wget", "-O", "-", self._request_url])
        self._response_body = out

    def prepare_response(self, grab_config, *, document_class=Document):
        return document_class(
            grab_config=grab_config,
            body=self._response_body,
            headers=email.message.Message(),
        )

    @contextmanager
    def wrap_transport_error(self):
        yield


g = Grab(transport=WgetTransport)
g.request("https://github.com")
print(g.doc("//title").text())
assert "github" in g.doc("//title").text().lower()
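The wrap_transport_error hook in the wget example simply yields, so any failure of the wget process surfaces as a raw subprocess exception. As a rough sketch of what a fuller hook might do, the context manager below translates the errors that check_output can raise into a single exception type. The names WgetTransportError and wrap_wget_errors are hypothetical and exist only for this illustration; which exception class Grab expects from this hook is not stated here, so treat that choice as an assumption.

```python
from contextlib import contextmanager
from subprocess import CalledProcessError


class WgetTransportError(Exception):
    """Hypothetical error type for this sketch; not part of Grab."""


@contextmanager
def wrap_wget_errors():
    """Translate subprocess failures into one transport-level error."""
    try:
        yield
    except FileNotFoundError as ex:
        # The wget binary does not exist at the configured path
        raise WgetTransportError("wget executable not found") from ex
    except CalledProcessError as ex:
        # wget exited with a non-zero status
        raise WgetTransportError(
            f"wget exited with status {ex.returncode}"
        ) from ex


# Simulate the exception check_output raises on a non-zero exit status
try:
    with wrap_wget_errors():
        raise CalledProcessError(8, ["/usr/bin/wget"])
except WgetTransportError as ex:
    message = str(ex)
    print(message)
```

Inside a transport class the same try/except body could be placed in the wrap_transport_error method, so that code calling the transport only has to handle one error type.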