您好, 欢迎来到 !    登录 | 注册 | | 设为首页 | 收藏本站

如何编写一个DownloadHandler以便于通过socksipy发出请求的scrapy?

如何编写一个DownloadHandler以便于通过socksipy发出请求的scrapy?

做了之后pip install txsocksx,我需要更换scrapyScrapyAgent使用txsocksx.http.soCKS5Agent

我只是复制代码HTTP11DownloadHandler,并ScrapyAgentscrapy/core/downloader/handlers/http.py,子类他们写了这样的代码

class TorProxyDownloadHandler(HTTP11DownloadHandler):

    def download_request(self, request, spider):
        """Return a deferred for the HTTP download"""
        agent = ScrapyTorAgent(contextFactory=self._contextFactory, pool=self._pool)
        return agent.download_request(request)


class ScrapyTorAgent(ScrapyAgent):
    def _get_agent(self, request, timeout):
        bindaddress = request.Meta.get('bindaddress') or self._bindAddress
        proxy = request.Meta.get('proxy')
        if proxy:
            _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
            scheme = _parse(request.url)[0]
            omitConnectTunnel = proxyParams.find('noconnect') >= 0
            if  scheme == 'https' and not omitConnectTunnel:
                proxyConf = (proxyHost, proxyPort,
                             request.headers.get('Proxy-Authorization', None))
                return self._TunnelingAgent(reactor, proxyConf,
                    contextFactory=self._contextFactory, connectTimeout=timeout,
                    bindAddress=bindaddress, pool=self._pool)
            else:
                _, _, host, port, proxyParams = _parse(request.url)
                proxyEndpoint = TCP4ClientEndpoint(reactor, proxyHost, proxyPort,
                    timeout=timeout, bindAddress=bindaddress)
                agent = SOCKS5Agent(reactor, proxyEndpoint=proxyEndpoint)
                return agent

        return self._Agent(reactor, contextFactory=self._contextFactory,
            connectTimeout=timeout, bindAddress=bindaddress, pool=self._pool)

在settings.py中,需要执行以下操作:

DOWNLOAD_HANDLERS = {
    'http': 'crawler.http.TorProxyDownloadHandler'
}

现在通过诸如Tor之类的袜子代理与Scrapy进行代理。

其他 2022/1/1 18:22:13 有409人围观

撰写回答


你尚未登录,登录后可以

和开发者交流问题的细节

关注并接收问题和回答的更新提醒

参与内容的编辑和改进,让解决方法与时俱进

请先登录

推荐问题


联系我
置顶