requests is an HTTP library licensed under Apache2 and written in Python.
It is much more concise than the urllib2 module.
requests supports persistent HTTP connections and connection pooling, cookies for keeping sessions, file uploads, automatic decoding of response content, and automatic encoding of internationalized URLs and POST data.
It is a high-level wrapper over Python's built-in modules, which makes issuing network requests from Python far more human-friendly; with requests you can easily do anything a browser can do.
It is modern, international and friendly.
requests automatically uses persistent connections (keep-alive).
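As a quick preview of that keep-alive behaviour (the Session object is covered in more detail in section 9, and the URL here is just an example echo service), this minimal sketch reuses one pooled connection for both requests:

import requests

# Requests made through one Session object reuse the underlying TCP connection (keep-alive)
with requests.Session() as s:
    r1 = s.get('http://httpbin.org/get')   # the first request opens the connection
    r2 = s.get('http://httpbin.org/get')   # subsequent requests reuse it
    print(r1.status_code, r2.status_code)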
1) Import the module
import requests
2) Sending requests concisely
Sample code: Get a web page (personal github)
import requests
r = requests.get('https://github.com/Ranxf') #The most basic get request without parameters
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd':'python'}) #Get request with parameters
The other HTTP methods can be used in the same way:
requests.get('https://github.com/timeline.json')    # GET request
requests.post('http://httpbin.org/post')             # POST request
requests.put('http://httpbin.org/put')               # PUT request
requests.delete('http://httpbin.org/delete')         # DELETE request
requests.head('http://httpbin.org/get')              # HEAD request
requests.options('http://httpbin.org/get')           # OPTIONS request
3) Passing parameters in the URL
>>> url_params = {'key': 'value'}   # Parameters are passed as a dictionary; if a value is None, that key is not added to the URL
>>> r = requests.get('your url', params=url_params)
>>> print(r.url)
your url?key=value
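A short illustration of this behaviour (httpbin.org simply echoes the query string back, so it is convenient for testing): keys whose value is None are dropped, and a list value produces repeated keys.

import requests

url_params = {'key1': 'value1', 'key2': None, 'key3': ['v3a', 'v3b']}
r = requests.get('http://httpbin.org/get', params=url_params)
print(r.url)   # e.g. http://httpbin.org/get?key1=value1&key3=v3a&key3=v3b  (key2 is omitted)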
4) Response content
r.encoding              # Get the current encoding
r.encoding = 'utf-8'    # Set the encoding
r.text                  # The response body as a string, decoded according to the character encoding taken from the response headers
r.content               # The response body as bytes (binary); gzip and deflate transfer encodings are decoded automatically
r.headers               # The response headers, stored in a dictionary-like object whose keys are case-insensitive; use r.headers.get(key) to get None for a missing key
r.status_code           # The response status code
r.raw                   # The raw response body (the underlying urllib3 response object); read it with r.raw.read(), and pass stream=True in the request to use it
r.ok                    # True if the status code is less than 400, i.e. the request succeeded
#* Special methods *#
r.json()                # Requests' built-in JSON decoder; the response body must be valid JSON, otherwise a parsing exception is raised
r.raise_for_status()    # Raise an exception for a failed request (a 4xx or 5xx response)
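A small sketch that puts several of these attributes together (any JSON-returning endpoint works; httpbin.org/get is used here only as an example):

import requests

r = requests.get('http://httpbin.org/get')
r.raise_for_status()                  # raise an exception if the request failed (4xx/5xx)
print(r.status_code, r.ok)            # 200 True
print(r.encoding)                     # encoding taken from the response headers (may be None)
print(r.headers.get('Content-Type'))  # case-insensitive header lookup
try:
    print(r.json()['url'])            # works because httpbin returns JSON
except ValueError:
    print('Response body was not valid JSON')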
Post a JSON request:
import requests
import json

r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))
print(r.json())
5) Custom headers and cookie information
header = {'user-agent': 'my-app/0.0.1'}
cookie = {'key': 'value'}
r = requests.get('your url', headers=header, cookies=cookie)   # the same keyword arguments work for requests.post()

data = {'some': 'data'}
headers = {'content-type': 'application/json',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
r = requests.post('https://api.github.com/some/endpoint', data=data, headers=headers)
print(r.text)
6) Response status code
After calling a requests method, a Response object is returned, which stores the content of the server's response, such as the r.text and r.status_code used in the examples above.
Getting the response body as text: when you access r.text, the encoding detected for the response is used for decoding, and you can change r.encoding to make r.text decode with a custom encoding.
r = requests.get('http://www.itwhy.org')
print(r.text, '\n{}\n'.format('*' * 79), r.encoding)
r.encoding = 'GBK'
print(r.text, '\n{}\n'.format('*' * 79), r.encoding)
Sample code:
import requests

r = requests.get('https://github.com/Ranxf')         # The most basic GET request without parameters
print(r.status_code)                                  # Get the return status
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})   # GET request with parameters
print(r1.url)
print(r1.text)                                        # Print the decoded response data
Run result:
/usr/bin/python3.5 /home/rxf/python3_1000/1000/python3_server/python3_requests/demo1.py
200
http://dict.baidu.com/s?wd=python
…………
Process finished with exit code 0
r.status_code   # If it is not 200, r.raise_for_status() can be used to raise an exception
7) Response attributes
r.headers            # The response headers as a dictionary-like object
r.request.headers    # The headers that were sent to the server
r.cookies            # The response cookies
r.history            # The redirect history; pass allow_redirects=False in the request to prevent redirects
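For example (a sketch; github.com is used only because it redirects HTTP to HTTPS, which makes the redirect history easy to see):

import requests

r = requests.get('http://github.com/')
print(r.history)        # e.g. [<Response [301]>] - the redirect chain
print(r.url)            # final URL after redirection

r = requests.get('http://github.com/', allow_redirects=False)
print(r.status_code)    # e.g. 301 - the redirect is returned instead of being followed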
8) Timeout
r = requests.get('url', timeout=1)   # Set the timeout in seconds; it limits connecting and waiting between bytes, not the total download time
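The timeout can also be given as a (connect, read) tuple, and a timed-out request raises an exception that can be caught; a minimal sketch (the URL is just an example):

import requests

try:
    r = requests.get('http://github.com/', timeout=(3.05, 27))   # 3.05 s to connect, 27 s between bytes
    print(r.status_code)
except requests.exceptions.Timeout:
    print('The request timed out')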
9) Session object, able to maintain certain parameters across requests
s = requests.Session()
s.auth =('auth','passwd')
s.headers ={'key':'value'}
r = s.get('url')
r1 = s.get('url1')
10) Proxies
proxies = {'http': 'http://proxy_ip1:port', 'https': 'http://proxy_ip2:port'}   # proxy URLs (placeholders)
requests.get('url', proxies=proxies)
Summary:
# HTTP request type
# get type
r = requests.get('https://github.com/timeline.json')
# post type
r = requests.post("http://m.ctrip.com/post")
# put type
r = requests.put("http://m.ctrip.com/put")
# delete type
r = requests.delete("http://m.ctrip.com/delete")
# head type
r = requests.head("http://m.ctrip.com/head")
# options type
r = requests.options("http://m.ctrip.com/get")
# Get response content
print(r.content)   # The response body as bytes; non-ASCII text such as Chinese appears as byte escapes
print(r.text)      # The response body as decoded text
# URL passing parameters
payload ={'keyword':'Hong Kong','salecityid':'2'}
r = requests.get("http://m.ctrip.com/webapp/tourvisa/visa_list", params=payload)
print(r.url) #Example is http://m.ctrip.com/webapp/tourvisa/visa_list?salecityid=2&keyword=Hong Kong
# Obtain/Modify web page encoding
r = requests.get('https://github.com/timeline.json')
print (r.encoding)
# json processing
r = requests.get('https://github.com/timeline.json')
print(r.json())   # Parse the response body as JSON (no separate json import is needed for this)
# Custom request header
url ='http://m.ctrip.com'
headers ={'User-Agent':'Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'}
r = requests.post(url, headers=headers)
print (r.request.headers)
# Complex post request
url ='http://m.ctrip.com'
payload ={'some':'data'}
r = requests.post(url, data=json.dumps(payload))   # To send JSON rather than form data, serialize the dict with json.dumps first (requires import json)
# post multi-part encoded file
url ='http://m.ctrip.com'
files ={'file':open('report.xls','rb')}
r = requests.post(url, files=files)
# Response status code
r = requests.get('http://m.ctrip.com')
print(r.status_code)
# Response header
r = requests.get('http://m.ctrip.com')
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('content-type'))   # Two ways to access part of the response headers
# Cookies
url ='http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name'] #Read cookies
url ='http://m.ctrip.com/cookies'
cookies =dict(cookies_are='working')
r = requests.get(url, cookies=cookies) #Send cookies
# Set timeout
r = requests.get('http://m.ctrip.com', timeout=0.001)
# Set access proxy
proxies ={"http":"http://10.10.1.10:3128","https":"http://10.10.1.100:4444",}
r = requests.get('http://m.ctrip.com', proxies=proxies)
# If the proxy requires a username and password, use this form:
proxies ={"http":"http://user:[email protected]:3128/",}
# 1. Example without parameters
import requests

ret = requests.get('https://github.com/timeline.json')
print(ret.url)
print(ret.text)

# 2. Example with parameters
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
print(ret.url)
print(ret.text)
# 1. Basic POST example
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)

# 2. Example sending request headers and data
import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """
Parameter list
Request parameters
def param_method_url():
    # requests.request(method='get', url='http://127.0.0.1:8000/test/')
    # requests.request(method='post', url='http://127.0.0.1:8000/test/')
    pass


def param_param():
    # params can be:
    # - a dictionary
    # - a string
    # - bytes (ASCII-encodable content only)

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params={'k1': 'v1', 'k2': 'Utility bill'})

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params="k1=v1&k2=Utility bill&k3=v3&k3=vv3")

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

    # Error (in the original example the byte string contained non-ASCII Chinese characters):
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=Utility bill&k3=v3&k3=vv3", encoding='utf8'))
    pass
def param_data():
    # data can be:
    # - a dictionary
    # - a string
    # - bytes
    # - a file object

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data={'k1': 'v1', 'k2': 'Utility bill'})

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1; k2=v2; k3=v3; k3=v4"
    #                  )

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1;k2=v2;k3=v3;k3=v4",
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data=open('data_file.py', mode='r', encoding='utf-8'),   # The file content is: k1=v1;k2=v2;k3=v3;k3=v4
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )
    pass
def param_json():
    # The json argument is serialized to a string with json.dumps(...)
    # and sent in the request body; the Content-Type header is set to 'application/json'
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': 'Utility bill'})


def param_headers():
    # Send request headers to the server
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': 'Utility bill'},
                     headers={'Content-Type': 'application/x-www-form-urlencoded'})


def param_cookies():
    # Send cookies to the server
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies={'cook1': 'value1'},)

    # A CookieJar can also be used (the dictionary form is a convenience wrapper around it)
    from http.cookiejar import CookieJar
    from http.cookiejar import Cookie

    obj = CookieJar()
    obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                          discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                          port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False))
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies=obj)
def param_files():
    # Send a file
    # file_dict = {
    #     'f1': open('readme', 'rb')
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send a file with a custom file name
    # file_dict = {
    #     'f1': ('test.txt', open('readme', 'rb'))
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send string content as a file, with a custom file name
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send string content as a file, with a custom file name, content type and extra headers
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)
    pass
def param_auth():
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
    print(ret.text)

    # ret = requests.get('http://192.168.1.1',
    #                    auth=HTTPBasicAuth('admin', 'admin'))
    # ret.encoding = 'gbk'
    # print(ret.text)

    # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
    # print(ret)
def param_timeout():
    # ret = requests.get('http://google.com/', timeout=1)
    # print(ret)

    # ret = requests.get('http://google.com/', timeout=(5, 1))
    # print(ret)
    pass


def param_allow_redirects():
    ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
    print(ret.text)
def param_proxies():
    # proxies = {
    #     "http": "61.172.249.96:80",
    #     "https": "http://61.185.219.126:3128",
    # }
    # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
    # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
    # print(ret.headers)

    # from requests.auth import HTTPProxyAuth
    #
    # proxyDict = {
    #     'http': '77.75.105.165',
    #     'https': '77.75.105.165'
    # }
    # auth = HTTPProxyAuth('username', 'mypassword')
    #
    # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
    # print(r.text)
    pass
def param_stream():
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    print(ret.content)
    ret.close()

    # from contextlib import closing
    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # The response is processed here.
    #     for i in r.iter_content():
    #         print(i)
def requests_session():
    import requests

    session = requests.Session()

    ### 1. First request any page to get a cookie
    i1 = session.get(url="http://dig.chouti.com/help/service")

    ### 2. The user logs in, carrying the previous cookie; the backend authorizes the gpsd value in the cookie
    i2 = session.post(
        url="http://dig.chouti.com/login",
        data={'phone': "8615131255089", 'password': "xxxxxx", 'oneMonth': ""})

    i3 = session.post(
        url="http://dig.chouti.com/link/vote?linksId=8589623",)
    print(i3.text)
#! /usr/bin/python3
import requests
import json


class url_request():
    def __init__(self):
        """ init """


if __name__ == '__main__':
    headers = {'Content-Type': 'application/json'}
    payload = {'CountryName': 'China', 'ProvinceName': 'Sichuan Province', 'L1CityName': 'chengdu',
               'L2CityName': 'yibing', 'TownName': '', 'Longitude': '107.33393', 'Latitude': '33.157131',
               'Language': 'CN'}
    r = requests.post("http://www.xxxxxx.com/CityLocation/json/LBSLocateCity", headers=headers, data=payload)
    data = r.json()
    if r.status_code != 200:
        print('LBSLocateCity API Error ' + str(r.status_code))
    print(data['CityEntities'][0]['CityID'])   # Print the value of a key in the returned JSON
    print(data['ResponseStatus']['Ack'])
    print(json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False))   # Pretty-print the JSON; ensure_ascii must be False, otherwise Chinese is displayed as unicode escapes
#! /usr/bin/python3
import requests


class url_request():
    def __init__(self):
        """ init """


if __name__ == '__main__':
    headers = {'Content-type': 'text/xml'}
    XML = '<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><Request xmlns="http://tempuri.org/"><jme><JobClassFullName>WeChatJSTicket.JobWS.Job.JobRefreshTicket,WeChatJSTicket.JobWS</JobClassFullName><Action>RUN</Action><Param>1</Param><HostIP>127.0.0.1</HostIP><JobInfo>1</JobInfo><NeedParallel>false</NeedParallel></jme></Request></soap:Body></soap:Envelope>'
    url = 'http://jobws.push.mobile.xxxxxxxx.com/RefreshWeiXInTokenJob/RefreshService.asmx'
    r = requests.post(url=url, headers=headers, data=XML)
    data = r.text
    print(data)
import requests

URL = 'http://ip.taobao.com/service/getIpInfo.php'   # Taobao IP address library API
try:
    r = requests.get(URL, params={'ip': '8.8.8.8'}, timeout=1)
    r.raise_for_status()   # If the response status code indicates failure, proactively raise an exception
except requests.RequestException as e:
    print(e)
else:
    result = r.json()
    print(type(result), result, sep='\n')
With the requests module you can also upload files, and the file type is handled automatically:
import requests

url = 'http://127.0.0.1:8080/upload'
files = {'file': open('/home/rxf/test.jpg', 'rb')}
# files = {'file': ('report.jpg', open('/home/lyb/sjzl.mpg', 'rb'))}   # Explicitly set the file name
r = requests.post(url, files=files)
print(r.text)
requests is even more convenient: you can upload a string as a file:
import requests

url = 'http://127.0.0.1:8080/upload'
files = {'file': ('test.txt', b'Hello Requests.')}   # The file name must be set explicitly
r = requests.post(url, files=files)
print(r.text)
Basic authentication (HTTP Basic Auth)
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user','passwd'))
# r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=('user','passwd')) #Shorthand
print(r.json())
Another very popular form of HTTP authentication is digest authentication, and Requests supports it out of the box:
from requests.auth import HTTPDigestAuth
requests.get(URL, auth=HTTPDigestAuth('user', 'pass'))
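A runnable sketch against httpbin's digest-auth test endpoint (the credentials 'user'/'pass' are part of the test URL itself):

import requests
from requests.auth import HTTPDigestAuth

r = requests.get('https://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
print(r.status_code)   # 200 when the digest handshake succeeds
print(r.json())        # httpbin reports the authenticated user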
If a response contains some cookies, you can quickly access them:
import requests
r = requests.get('http://www.google.com.hk/')
print(r.cookies['NID'])
print(tuple(r.cookies))
To send your cookies to the server, you can use the cookies parameter:
import requests
url ='http://httpbin.org/cookies'
cookies ={'testCookies_1':'Hello_Python3','testCookies_2':'Hello_Requests'}
# In Cookie version 0, special characters such as spaces, square brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, @, colons and semicolons cannot be used in the cookie content.
r = requests.get(url, cookies=cookies)
print(r.json())
Session objects let you keep certain parameters across requests. Most usefully, cookies are kept between all requests issued by the same Session instance, and this is handled automatically.
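A minimal sketch of that behaviour first (httpbin.org is used only because it echoes cookies back): a cookie set in one request is automatically sent with the next request from the same Session.

import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')   # the server sets a cookie
r = s.get('http://httpbin.org/cookies')                           # the Session sends it back automatically
print(r.json())   # {'cookies': {'sessioncookie': '123456789'}}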
Here is a real example: a sign-in script for the Kuaipan cloud drive:
import requests
headers ={'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8','Accept-Encoding':'gzip, deflate, compress','Accept-Language':'en-us;q=0.5,en;q=0.3','Cache-Control':'max-age=0','Connection':'keep-alive','User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
s = requests.Session()
s.headers.update(headers)
# s.auth =('superuser','123')
s.get('https://www.kuaipan.cn/account_login.htm')
_URL = 'http://www.kuaipan.cn/index.php'
s.post(_URL, params={'ac': 'account', 'op': 'login'},
       data={'username': '****@foxmail.com', 'userpwd': '********', 'isajax': 'yes'})
r = s.get(_URL, params={'ac': 'zone', 'op': 'taskdetail'})
print(r.json())
s.get(_URL, params={'ac': 'common', 'op': 'usersign'})
This is a basic file saving operation, but there are a few noteworthy issues:
Install the requests package by running pip install requests on the command line. Many people recommend requests, although the built-in urllib.request can also fetch web page source code.
Set the encoding parameter of open() to utf-8, otherwise the saved file will be garbled.
Printing the fetched content directly in cmd raises various encoding errors, so save it to a file to view it instead.
Using with open(...) is the better style, because it releases the resource automatically once the operation is finished.
#! /usr/bin/python3
import requests

'''Example: fetch the source code of a web page with the requests module and save it to a file'''
html = requests.get("http://www.baidu.com")
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(html.text)

'''Example: read a txt file one line at a time and save it to another txt file'''
ff = open('testt.txt', 'w', encoding='utf-8')
with open('test.txt', encoding="utf-8") as f:
    for line in f:
        ff.write(line)
ff.close()
Because printing the lines on the command line raises encoding errors for Chinese text, the file is read line by line and saved to another file to verify that reading works correctly. (Pay attention to the encoding when opening the files.)
#! /usr/bin/env python
# -*- coding:utf-8 -*-
import requests
# ############## method one##############
"""
# ## 1、 First log in to any page to get the cookie
i1 = requests.get(url="http://dig.chouti.com/help/service")
i1_cookies = i1.cookies.get_dict()
# ## 2、 The user logs in, carries the last cookie, and the background authorizes the gpsd in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",
    data={'phone': "8615131255089", 'password': "xxooxxoo", 'oneMonth': ""},
    cookies=i1_cookies
)
# ## 3. Like (just need to carry the authorized gpsd)
gpsd = i1_cookies['gpsd']
i3 = requests.post(
    url="http://dig.chouti.com/link/vote?linksId=8589523",
    cookies={'gpsd': gpsd})
print(i3.text)
"""
# ############## Way two##############
"""
import requests
session = requests.Session()
i1 = session.get(url="http://dig.chouti.com/help/service")
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={'phone': "8615131255089", 'password': "xxooxxoo", 'oneMonth': ""})
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589523")
print(i3.text)
"""
#! /usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
# ############## method one##############
#
# # 1. Visit the landing page to get authenticity_token
# i1 = requests.get('https://github.com/login')
# soup1 =BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name':'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Carry the authenticity_token, username, password and other information, and send the login verification
# form_data = {
#     "authenticity_token": authenticity_token,
#     "utf8": "",
#     "commit": "Sign in",
#     "login": "[email protected]",
#     'password': 'xxoo'
# }
#
# i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = requests.get('https://github.com/settings/repositories', cookies=c1)
#
# soup3 =BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "project:%s(%s); project path:%s" % (project_tag.get('href'), size_tag.string, project_tag.string,)
#         print(temp)
# ############## Way two##############
# session = requests.Session()
# # 1. Visit the landing page to get authenticity_token
# i1 = session.get('https://github.com/login')
# soup1 =BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name':'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Carry the authenticity_token, username, password and other information, and send the login verification
# form_data = {
#     "authenticity_token": authenticity_token,
#     "utf8": "",
#     "commit": "Sign in",
#     "login": "[email protected]",
#     'password': 'xxoo'
# }
#
# i2 = session.post('https://github.com/session', data=form_data)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = session.get('https://github.com/settings/repositories')
#
# soup3 =BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "project:%s(%s); project path:%s" % (project_tag.get('href'), size_tag.string, project_tag.string,)
#         print(temp)
#! /usr/bin/env python
# -*- coding:utf-8 -*-
import time
import requests
from bs4 import BeautifulSoup
session = requests.Session()
i1 = session.get(
    url='https://www.zhihu.com/#signin',
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36', })
soup1 = BeautifulSoup(i1.text, 'lxml')
xsrf_tag = soup1.find(name='input', attrs={'name':'_xsrf'})
xsrf = xsrf_tag.get('value')
current_time = time.time()
i2 = session.get(
    url='https://www.zhihu.com/captcha.gif',
    params={'r': current_time, 'type': 'login'},
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36', })
with open('zhihu.gif', 'wb') as f:
    f.write(i2.content)
captcha = input('Please open the zhihu.gif file, view it and enter the verification code:')
form_data = {"_xsrf": xsrf, 'password': 'xxooxxoo', "captcha": captcha, 'email': '[email protected]'}
i3 = session.post(
    url='https://www.zhihu.com/login/email',
    data=form_data,
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36', })
i4 = session.get(
    url='https://www.zhihu.com/settings/profile',
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36', })
soup4 = BeautifulSoup(i4.text, 'lxml')
tag = soup4.find(id='rename-section')
nick_name = tag.find('span',class_='name').string
print(nick_name)
#! /usr/bin/env python
# -*- coding:utf-8 -*-
import re
import json
import base64
import rsa
import requests
def js_encrypt(text):
    b64der = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCp0wHYbg/NOPO3nzMD3dndwS0MccuMeXCHgVlGOoYyFwLdS24Im2e7YyhB0wrUsyYf0/nhzCzBK8ZC9eCWqd0aHbdgOQT6CuFQBMjbyGYvlVYU2ZP7kG9Ft6YV6oc9ambuO7nPZh+bvXH0zDKfi02prknrScAKC0XhadTHT3Al0QIDAQAB'
    der = base64.standard_b64decode(b64der)
    pk = rsa.PublicKey.load_pkcs1_openssl_der(der)
    v1 = rsa.encrypt(bytes(text, 'utf8'), pk)
    value = base64.encodebytes(v1).replace(b'\n', b'')
    value = value.decode('utf8')
    return value
session = requests.Session()
i1 = session.get('https://passport.cnblogs.com/user/signin')
rep = re.compile("'VerificationToken': '(.*)'")
v = re.search(rep, i1.text)
verification_token = v.group(1)
form_data = {'input1': js_encrypt('wptawy'), 'input2': js_encrypt('asdfasdf'), 'remember': False}
i2 = session.post(url='https://passport.cnblogs.com/user/signin',
                  data=json.dumps(form_data),
                  headers={'Content-Type': 'application/json; charset=UTF-8',
                           'X-Requested-With': 'XMLHttpRequest',
                           'VerificationToken': verification_token})
i3 = session.get(url='https://i.cnblogs.com/EditDiary.aspx')
print(i3.text)
#! /usr/bin/env python
# -*- coding:utf-8 -*-
import re
import requests
# Step 1: Visit the landing page,Get X_Anti_Forge_Token,X_Anti_Forge_Code
# 1、 Request url:https://passport.lagou.com/login/login.html
# 2、 Request method:GET
# 3、 Request header:
# User-agent
r1 = requests.get('https://passport.lagou.com/login/login.html',
headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',},)
X_Anti_Forge_Token = re.findall("X_Anti_Forge_Token = '(.*?)'", r1.text, re.S)[0]
X_Anti_Forge_Code = re.findall("X_Anti_Forge_Code = '(.*?)'", r1.text, re.S)[0]
print(X_Anti_Forge_Token, X_Anti_Forge_Code)
# print(r1.cookies.get_dict())
# Step 2: Log in
# 1、 Request url:https://passport.lagou.com/login/login.json
# 2、 Request method:POST
# 3、 Request header:
# cookie
# User-agent
# Referer:https://passport.lagou.com/login/login.html
# X-Anit-Forge-Code:53165984
# X-Anit-Forge-Token:3b6a2f62-80f0-428b-8efb-ef72fc100d78
# X-Requested-With:XMLHttpRequest
# 4、 Request body:
# isValidate:true
# username:15131252215
# password:ab18d270d7126ea65915c50288c22c0d
# request_form_verifyCode:''
# submit:''
r2 = requests.post('https://passport.lagou.com/login/login.json',
                   headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
                            'Referer': 'https://passport.lagou.com/login/login.html',
                            'X-Anit-Forge-Code': X_Anti_Forge_Code,
                            'X-Anit-Forge-Token': X_Anti_Forge_Token,
                            'X-Requested-With': 'XMLHttpRequest'},
                   data={"isValidate": True, 'username': '15131255089', 'password': 'ab18d270d7126ea65915c50288c22c0d',
                         'request_form_verifyCode': '', 'submit': ''},
                   cookies=r1.cookies.get_dict())
print(r2.text)