您当前位置:首页 > 深入 Python > HTTP Web 服务 > 处理重定向 | << >> | ||||
深入 Python从 Python 新手到专家 |
您可以使用不同类型的自定义 URL 处理程序来支持永久和临时重定向。
首先,让我们看看为什么首先需要重定向处理程序。
>>> import urllib2, httplib >>> httplib.HTTPConnection.debuglevel = 1>>> request = urllib2.Request( ... 'http://diveintomark.org/redir/example301.xml')
>>> opener = urllib2.build_opener() >>> f = opener.open(request) connect: (diveintomark.org, 80) send: ' GET /redir/example301.xml HTTP/1.0 Host: diveintomark.org User-agent: Python-urllib/2.1 ' reply: 'HTTP/1.1 301 Moved Permanently\r\n'
header: Date: Thu, 15 Apr 2004 22:06:25 GMT header: Server: Apache/2.0.49 (Debian GNU/Linux) header: Location: http://diveintomark.org/xml/atom.xml
header: Content-Length: 338 header: Connection: close header: Content-Type: text/html; charset=iso-8859-1 connect: (diveintomark.org, 80) send: ' GET /xml/atom.xml HTTP/1.0
Host: diveintomark.org User-agent: Python-urllib/2.1 ' reply: 'HTTP/1.1 200 OK\r\n' header: Date: Thu, 15 Apr 2004 22:06:25 GMT header: Server: Apache/2.0.49 (Debian GNU/Linux) header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT header: ETag: "e842a-3e53-55d97640" header: Accept-Ranges: bytes header: Content-Length: 15955 header: Connection: close header: Content-Type: application/atom+xml >>> f.url
'http://diveintomark.org/xml/atom.xml' >>> f.headers.dict {'content-length': '15955', 'accept-ranges': 'bytes', 'server': 'Apache/2.0.49 (Debian GNU/Linux)', 'last-modified': 'Thu, 15 Apr 2004 19:45:21 GMT', 'connection': 'close', 'etag': '"e842a-3e53-55d97640"', 'date': 'Thu, 15 Apr 2004 22:06:25 GMT', 'content-type': 'application/atom+xml'} >>> f.status Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: addinfourl instance has no attribute 'status'
这并非最佳选择,但很容易修复。当 urllib2 遇到 301 或 302 时,它的行为并不完全符合您的期望,所以让我们覆盖它的行为。怎么做?使用自定义 URL 处理程序,就像您处理 304 代码一样。
此类在 openanything.py 中定义。
class SmartRedirectHandler(urllib2.HTTPRedirectHandler):def http_error_301(self, req, fp, code, msg, headers): result = urllib2.HTTPRedirectHandler.http_error_301(
self, req, fp, code, msg, headers) result.status = code
return result def http_error_302(self, req, fp, code, msg, headers):
result = urllib2.HTTPRedirectHandler.http_error_302( self, req, fp, code, msg, headers) result.status = code return result
那么这给我们带来了什么?您现在可以使用自定义重定向处理程序构建 URL 打开程序,它仍然会自动跟随重定向,但现在它还会公开重定向状态码。
>>> request = urllib2.Request('http://diveintomark.org/redir/example301.xml') >>> import openanything, httplib >>> httplib.HTTPConnection.debuglevel = 1 >>> opener = urllib2.build_opener( ... openanything.SmartRedirectHandler())>>> f = opener.open(request) connect: (diveintomark.org, 80) send: 'GET /redir/example301.xml HTTP/1.0 Host: diveintomark.org User-agent: Python-urllib/2.1 ' reply: 'HTTP/1.1 301 Moved Permanently\r\n'
header: Date: Thu, 15 Apr 2004 22:13:21 GMT header: Server: Apache/2.0.49 (Debian GNU/Linux) header: Location: http://diveintomark.org/xml/atom.xml header: Content-Length: 338 header: Connection: close header: Content-Type: text/html; charset=iso-8859-1 connect: (diveintomark.org, 80) send: ' GET /xml/atom.xml HTTP/1.0 Host: diveintomark.org User-agent: Python-urllib/2.1 ' reply: 'HTTP/1.1 200 OK\r\n' header: Date: Thu, 15 Apr 2004 22:13:21 GMT header: Server: Apache/2.0.49 (Debian GNU/Linux) header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT header: ETag: "e842a-3e53-55d97640" header: Accept-Ranges: bytes header: Content-Length: 15955 header: Connection: close header: Content-Type: application/atom+xml >>> f.status
301 >>> f.url 'http://diveintomark.org/xml/atom.xml'
相同的重定向处理程序还可以告诉您不应该更新您的地址簿。
>>> request = urllib2.Request( ... 'http://diveintomark.org/redir/example302.xml')>>> f = opener.open(request) connect: (diveintomark.org, 80) send: ' GET /redir/example302.xml HTTP/1.0 Host: diveintomark.org User-agent: Python-urllib/2.1 ' reply: 'HTTP/1.1 302 Found\r\n'
header: Date: Thu, 15 Apr 2004 22:18:21 GMT header: Server: Apache/2.0.49 (Debian GNU/Linux) header: Location: http://diveintomark.org/xml/atom.xml header: Content-Length: 314 header: Connection: close header: Content-Type: text/html; charset=iso-8859-1 connect: (diveintomark.org, 80) send: ' GET /xml/atom.xml HTTP/1.0
Host: diveintomark.org User-agent: Python-urllib/2.1 ' reply: 'HTTP/1.1 200 OK\r\n' header: Date: Thu, 15 Apr 2004 22:18:21 GMT header: Server: Apache/2.0.49 (Debian GNU/Linux) header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT header: ETag: "e842a-3e53-55d97640" header: Accept-Ranges: bytes header: Content-Length: 15955 header: Connection: close header: Content-Type: application/atom+xml >>> f.status
302 >>> f.url http://diveintomark.org/xml/atom.xml
<< 处理 Last-Modified 和 ETag |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
处理压缩数据 >> |