What is Requests

Requests is an HTTP library written in Python on top of urllib3 and released under the Apache2 License. If you read the previous article on the urllib library, you will have noticed that urllib is quite inconvenient to use. Requests is far more convenient and saves a great deal of work; once you have used it, you will rarely want to go back to urllib. In short, Requests is one of the simplest and most usable HTTP libraries for Python, and it is the recommended choice for writing crawlers.

A default Python installation does not include the requests module; install it separately with pip (pip install requests).

Requests features in detail

An overall demonstration

import requests

response = requests.get("https://www.baidu.com")
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
print(response.content)
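print(response.content.decode("utf-8"))

As you can see, the response object is very convenient to work with. One thing to watch out for: on many sites, reading response.text directly yields garbled characters. In that case use response.content, which returns the raw bytes, and convert it with decode("utf-8"); this avoids the garbled output from response.text.

After a request is sent, Requests makes an educated guess about the response encoding based on the HTTP headers, and response.text uses that guessed encoding. When the Content-Type header declares a text type without a charset, the guess falls back to ISO-8859-1, which is why UTF-8 pages can come out garbled. You can find out which encoding Requests used, and change it, through the response.encoding attribute. For example:

response = requests.get("http://www.baidu.com")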
response.encoding = "utf-8"
print(response.text)

Either approach, response.content.decode("utf-8") or response.encoding = "utf-8", avoids the garbled text.

Request methods

Requests provides a function for each HTTP request method:

import requests
requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")
requests.options("http://httpbin.org/get")

Sending requests

Basic GET request

import requests

response = requests.get("http://httpbin.org/get")
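print(response.text)

GET request with parameters, example 1

import requests

response = requests.get("http://httpbin.org/get?name=zhaofan&age=23")
print(response.text)

To pass data in the URL query string, you would normally write it by hand in the form httpbin.org/get?key=val. Requests also accepts these parameters as a dictionary through the params keyword argument, as in the following example:

import requests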
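data = {
    "name": "zhaofan",
    "age": 22,
}
response = requests.get("http://httpbin.org/get", params=data)
print(response.url)
print(response.text)

Both approaches produce the same kind of request: passing a dictionary through params builds the query string of the URL for you. Note that with the dictionary form, any key whose value is None is not added to the URL.

Parsing JSON

import requests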
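import json

response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))

The output shows that the built-in response.json() essentially runs json.loads() on the response body; both give the same result.

Getting binary data

As mentioned above, response.content returns the data as bytes. The same attribute can be used to download images and video files.

Adding headers

Just as with the urllib module covered earlier, you can customize the request headers. For example, requesting the Zhihu site directly with requests does not work by default:

import requests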
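response = requests.get("https://www.zhihu.com")
print(response.text)

This returns an error, because Zhihu expects browser-like header information. Enter chrome://version in Google Chrome to see your user agent, then add that user agent to the request headers:

import requests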
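headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
response = requests.get("https://www.zhihu.com", headers=headers)
print(response.text)

With the header in place, Zhihu can be accessed normally.

Basic POST request

To send form data with a POST request, pass a data argument, which can be built as a dictionary. This makes sending POST requests very convenient:

import requests

data = {
    "name": "zhaofan",
    "age": 23,
}
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)

As with GET requests, you can also pass a dictionary through the headers parameter when sending a POST request.

Responses

The response object exposes many attributes. For example:

import requests

response = requests.get("http://www.baidu.com")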
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history), response.history)

Checking status codes

Requests also ships with a built-in status code lookup object. Its main contents are:

# Informational.
100: ("continue",),
101: ("switching_protocols",),
102: ("processing",),
103: ("checkpoint",),
122: ("uri_too_long", "request_uri_too_long"),
200: ("ok", "okay", "all_ok", "all_okay", "all_good", "\\o/", "✓"),
201: ("created",),
202: ("accepted",),
203: ("non_authoritative_info", "non_authoritative_information"),
204: ("no_content",),
205: ("reset_content", "reset"),
206: ("partial_content", "partial"),
207: ("multi_status", "multiple_status", "multi_stati", "multiple_stati"),
208: ("already_reported",),
226: ("im_used",),

# Redirection.
300: ("multiple_choices",),
301: ("moved_permanently", "moved", "\\o-"),
302: ("found",),
303: ("see_other", "other"),
304: ("not_modified",),
305: ("use_proxy",),
306: ("switch_proxy",),
307: ("temporary_redirect", "temporary_moved", "temporary"),
308: ("permanent_redirect", "resume_incomplete", "resume",),  # These 2 to be removed in 3.0

# Client Error.
400: ("bad_request", "bad"),
401: ("unauthorized",),
402: ("payment_required", "payment"),
403: ("forbidden",),
404: ("not_found", "-o-"),
405: ("method_not_allowed", "not_allowed"),
406: ("not_acceptable",),
407: ("proxy_authentication_required", "proxy_auth", "proxy_authentication"),
408: ("request_timeout", "timeout"),
409: ("conflict",),
410: ("gone",),
411: ("length_required",),
412: ("precondition_failed", "precondition"),
413: ("request_entity_too_large",),
414: ("request_uri_too_large",),
415: ("unsupported_media_type", "unsupported_media", "media_type"),
416: ("requested_range_not_satisfiable", "requested_range", "range_not_satisfiable"),
417: ("expectation_failed",),
418: ("im_a_teapot", "teapot", "i_am_a_teapot"),
421: ("misdirected_request",),
422: ("unprocessable_entity", "unprocessable"),
423: ("locked",),
424: ("failed_dependency", "dependency"),
425: ("unordered_collection", "unordered"),
426: ("upgrade_required", "upgrade"),
428: ("precondition_required", "precondition"),
429: ("too_many_requests", "too_many"),
431: ("header_fields_too_large", "fields_too_large"),
444: ("no_response", "none"),
449: ("retry_with", "retry"),
450: ("blocked_by_windows_parental_controls", "parental_controls"),
451: ("unavailable_for_legal_reasons", "legal_reasons"),
499: ("client_closed_request",),

# Server Error.
500: ("internal_server_error", "server_error", "/o\\", "✗"),
501: ("not_implemented",),
502: ("bad_gateway",),
503: ("service_unavailable", "unavailable"),
504: ("gateway_timeout",),
505: ("http_version_not_supported", "http_version"),
506: ("variant_also_negotiates",),
507: ("insufficient_storage",),
509: ("bandwidth_limit_exceeded", "bandwidth"),
510: ("not_extended",),
511: ("network_authentication_required", "network_auth", "network_authentication"),

You can test it with the example below, although comparing against the numeric status code directly is usually more convenient.

import requests

response = requests.get("http://www.baidu.com")
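if response.status_code == requests.codes.ok:
    print("Request succeeded")

Advanced usage of requests

File upload

File upload works like the other parameters: build a dictionary and pass it through the files parameter.

import requests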
files = {"files": open("git.jpeg", "rb")}
response = requests.post("http://httpbin.org/post", files=files)
print(response.text)

Getting cookies

import requests

response = requests.get("http://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + "=" + value)

Session persistence

One use of cookies is to simulate a login and keep a session alive across requests.

import requests
s = requests.Session()
s.get("http://httpbin.org/cookies/set/number/123456")
response = s.get("http://httpbin.org/cookies")
print(response.text)

This is the correct way to do it. The following version is wrong:

import requests

requests.get("http://httpbin.org/cookies/set/number/123456")
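response = requests.get("http://httpbin.org/cookies")
print(response.text)

Here the two requests.get calls are independent of each other, so the cookie set by the first call is not sent with the second. The correct version creates a Session object and sends both requests through it, so the cookie carries over.

Certificate verification

Many sites are now served over HTTPS, which brings certificate verification into play.

import requests

response = requests.get("https://www.12306.cn")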
print(response.status_code)

The certificate of the 12306 site was not trusted by default, so the request above fails with an SSL error. You can avoid the error by passing verify=False. The page then loads, but Requests prints a warning:

InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

The warning can be suppressed as follows:

import requests
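from requests.packages import urllib3

urllib3.disable_warnings()
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)

The warning is no longer shown. You can also supply a certificate path: cert takes a client-side certificate, and verify accepts the path to a CA bundle.

Proxy settings

import requests

proxies = {
    "http": "http://127.0.0.1:9999",
    "https": "http://127.0.0.1:8888",
}
response = requests.get("https://www.baidu.com", proxies=proxies)
print(response.text)

If the proxy requires a username and password, change the dictionary to:

proxies = {"http": "http://user:password@127.0.0.1:9999"}

If your proxy uses SOCKS, first run pip install "requests[socks]", then use:

proxies = {"http": "socks5://127.0.0.1:9999", "https": "socks5://127.0.0.1:8888"}

Timeout settings

The timeout parameter sets how long to wait, in seconds, before giving up on a request.

Authentication

For sites that require authentication, use the requests.auth module:

import requests
from requests.auth import HTTPBasicAuth

response = requests.get("http://120.27.34.24:9001/", auth=HTTPBasicAuth("user", "123"))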
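print(response.status_code)

There is also a shorthand: pass a (username, password) tuple, which Requests treats as HTTP Basic authentication.

import requests

response = requests.get("http://120.27.34.24:9001/", auth=("user", "123"))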
print(response.status_code)

Exception handling

The exceptions raised by requests are documented in detail at http://www.python-requests.org/en/master/api/#exceptions. All of them are defined in requests.exceptions.

The source code shows the following hierarchy:

RequestException inherits from IOError.
HTTPError, ConnectionError and Timeout inherit from RequestException.
ProxyError and SSLError inherit from ConnectionError.
ReadTimeout inherits from Timeout.

These are only some commonly used exceptions; the full hierarchy is at http://cn.python-requests.org/zh_CN/latest/_modules/requests/exceptions.html#RequestException

A simple demonstration:

import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
except ConnectionError:
    print("connection Error")
except RequestException:
    print("error")

In testing, the first exception caught is the timeout. If you disconnect the network, ConnectionError is caught instead. Anything not caught by the earlier clauses is still caught by the final RequestException clause.

Reposted from: https://www.cnblogs.com/shuai1991/p/10814937.html
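
The timeout setting and the exception hierarchy described above are often combined into one small helper. The following is only a minimal sketch, not part of the original article: fetch_status and its get parameter are hypothetical names chosen for illustration, and get exists purely so the network call can be swapped out when trying the function offline. The except clauses run from the most specific class to the base class, mirroring the inheritance order listed earlier.

```python
import requests
from requests.exceptions import ConnectionError, ReadTimeout, RequestException


def fetch_status(url, timeout=3.0, get=requests.get):
    """Return (status_code, None) on success, or (None, reason) on failure.

    fetch_status and the get parameter are illustrative names, not part of
    the requests API.
    """
    try:
        # timeout is in seconds; when it is omitted, requests applies no time limit.
        response = get(url, timeout=timeout)
        return response.status_code, None
    except ReadTimeout:
        # The connection was made, but the server did not answer in time.
        return None, "timeout"
    except ConnectionError:
        # DNS failure, refused connection, proxy/SSL problems or a connect timeout.
        return None, "connection error"
    except RequestException:
        # Base class of the hierarchy: anything else requests raises.
        return None, "error"
```

Calling fetch_status("http://httpbin.org/get", timeout=5) would return a tuple such as a status code paired with None when the request succeeds. Returning a reason string instead of printing keeps the decision about logging or retrying with the caller.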