Scrapy Study Notes
## Environment setup

```shell
conda create -n scrapy-309 python=3.9
conda activate scrapy-309
pip install scrapy==2.6.3 Twisted==22.10.0 urllib3==1.26.18 parsel==1.7.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

## Create a project

```shell
scrapy startproject baidu_spider
scrapy genspider baidu https://www.baidu.com
```

## Run the spider

```shell
scrapy crawl baidu
```

Note: the argument is the spider's `name` attribute, not the file name.

## Comment out the domain whitelist

```python
# allowed_domains = ["www.winshangdata.com"]
```

## Annotate the parameter type

Import `HtmlResponse` and add the `**kwargs` parameter:

```python
from scrapy.http import HtmlResponse

def parse(self, response: HtmlResponse, **kwargs):
    pass
```

## Disable robots.txt compliance

In `settings.py`, change `ROBOTSTXT_OBEY` from `True` to `False`:

```python
ROBOTSTXT_OBEY = False
```

## Change the User-Agent

In `settings.py`, set `DEFAULT_REQUEST_HEADERS`:

```python
DEFAULT_REQUEST_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36",
}
```

## Complete spider example

```python
import scrapy
from scrapy.http import HtmlResponse


class WinshangSpider(scrapy.Spider):
    name = "winshang"
    # allowed_domains = ["www.winshangdata.com"]
    start_urls = ["http://www.winshangdata.com/projectList"]

    def parse(self, response: HtmlResponse, **kwargs):
        print("Response object:", response.text)


if __name__ == "__main__":
    # Run the spider from inside the IDE instead of the command line
    from scrapy import cmdline
    cmdline.execute("scrapy crawl winshang --nolog".split())
```