
zou@zou-VirtualBox:~/qsbk$ tree
.
├── items.py
├── qsbk
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── qsbk_spider.py
└── scrapy.cfg

-------------------------
vi items.py

from scrapy.item import Item, Field

class TutorialItem(Item):
    # define the fields for your item here like:
    # name = Field()
    pass

class Qsbk(Item):
    title = Field()
    link = Field()
    desc = Field()

-------------------------
vi qsbk/spiders/qsbk_spider.py

from scrapy.spider import Spider

class QsbkSpider(Spider):
    name = "qsbk"
    allowed_domains = ["qiushibaike.com"]
    start_urls = ["http://www.qiushibaike.com"]

    def parse(self, response):
        filename = response
        open(filename, 'wb').write(response.body)

-------------------------
Then I ran scrapy shell www.qiushibaike.com. My plan was to fetch the page first and then use XPath to pick out its child nodes (that is, some of the content). That approach should be fine, right? But when I run scrapy shell www.qiushibaike.com, the page content never shows up. The error output:

zou@zou-VirtualBox:~/qsbk$ scrapy shell http://www.qiushibaike.com
/home/zou/qsbk/qsbk/spiders/qsbk_spider.py:1: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
  from scrapy.spider import Spider
2015-12-21 00:18:30 [scrapy] INFO: Scrapy 1.0.3 started (bot: qsbk)
2015-12-21 00:18:30 [scrapy] INFO: Optional features available: ssl, http11
2015-12-21 00:18:30 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'qsbk.spiders', 'SPIDER_MODULES': ['qsbk.spiders'], 'LOGSTATS_INTERVAL': 0, 'BOT_NAME': 'qsbk'}
2015-12-21 00:18:30 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2015-12-21 00:18:30 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-12-21 00:18:30 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-12-21 00:18:30 [scrapy] INFO: Enabled item pipelines: 
2015-12-21 00:18:30 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-12-21 00:18:30 [scrapy] INFO: Spider opened
2015-12-21 00:18:30 [scrapy] DEBUG: Retrying
