Simple use of BeautifulSoup. BeautifulSoup is a third-party library popular with web-scraping beginners: the API is simple and the resulting code is easy to read.
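As a minimal sketch of the API (the HTML snippet below is made up purely for illustration, not taken from the weather site):

from bs4 import BeautifulSoup

# A made-up snippet just to show parsing and find_all
html = '<table><tr><td>date</td></tr><tr><td>2011-01-01</td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')
for tr in soup.find_all('tr'):
    # get_text(strip=True) collapses each row to its visible text
    print(tr.get_text(strip=True))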
The scraping code is wrapped in a function, so crawling several pages is just a matter of calling that function once per page (a loop version is sketched after the script).
import requests
from bs4 import BeautifulSoup
# pandas, used here to save the data; another staple library
import pandas as pd
# Retrieve the data
# Fetch the page source for one month and parse it into a data frame
def get_data(url):
    resp = requests.get(url)
    # utf-8 does not work here; the site serves gbk-encoded pages
    html = resp.content.decode('gbk')
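    # (Sketch, not part of the original: if the encoding were unknown, requests
    # can guess it, e.g. resp.encoding = resp.apparent_encoding; html = resp.text)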
    # Parse the raw html
    # html.parser is the built-in parser; it can be slower than alternatives such as lxml
    soup = BeautifulSoup(html, 'html.parser')
    # Use the find_all function to collect all tr tags (the table rows)
    tr_list = soup.find_all('tr')
    # Three lists to receive the data
    dates, conditions, temp = [], [], []
    # Skip the header row, then pull the text out of each data row
    for data in tr_list[1:]:
        sub_data = data.text.split()
        dates.append(sub_data[0])
        conditions.append(''.join(sub_data[1:3]))
        temp.append(''.join(sub_data[3:6]))
    # Build an empty data frame to hold the results
    _data = pd.DataFrame()
    _data['date'] = dates
    _data['weather'] = conditions
    _data['temperature'] = temp
    # Return the data
    return _data
data1=get_data('http://www.tianqihoubao.com/lishi/beijing/month/201101.html')
data2=get_data('http://www.tianqihoubao.com/lishi/beijing/month/201102.html')
data3=get_data('http://www.tianqihoubao.com/lishi/beijing/month/201103.html')
# Connect the three data frames through concat and reset the index
df=pd.concat([data1,data2,data3]).reset_index(drop=True)
# Data preprocessing
# Split the temperature column on '/' into high and low
temp_split = df['temperature'].str.split('/', expand=True)
df['max temperature'] = temp_split[0]
df['min temperature'] = temp_split[1]
# Use map to strip the ℃ sign and convert each value to an int, to ease later analysis
df['max temperature'] = df['max temperature'].map(lambda x: int(x.replace('℃', '')))
df['min temperature'] = df['min temperature'].map(lambda x: int(x.replace('℃', '')))
# Save to csv
df.to_csv('./python/Crawling weather data/beijing.csv', index=False, encoding='utf-8')
# Read the file back when it is needed
pd.read_csv('./python/Crawling weather data/beijing.csv')
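If more months are wanted, the three explicit calls above can be folded into a loop. A minimal sketch, assuming the site keeps the same yyyymm URL pattern (the month list and the one-second delay are illustrative choices, not from the original):

import time

frames = []
for month in ['201101', '201102', '201103']:
    url = f'http://www.tianqihoubao.com/lishi/beijing/month/{month}.html'
    frames.append(get_data(url))
    time.sleep(1)  # pause between requests so the server is not hammered
df = pd.concat(frames).reset_index(drop=True)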
All the crawler write-ups here are practical projects, with no standalone theory. The reasoning: basic crawler theory goes stale quickly, grinding through textbooks feels laborious, many of their example projects no longer work, and some crawlers are still written for Python 2, so learning by building may be the best way forward.