Author: Zhu Xiaowu
Source: Bump Data
Hello everyone, I'm Brother J from the second exchange group.
This time I want to draw on my real estate work and use Python to analyze data for the city of Guangzhou, in the hope of offering you some ideas for analysis.
Why analyze the real estate market? The real estate industry is strongly regional. From a real estate company's perspective, the choice of city determines, to a large extent, whether an investment succeeds or fails, so researching and judging a city's market is very important. A few years ago, for example, the same funds invested in Nanjing and in Changsha produced hugely different returns.
So, how should we analyze the real estate market?
From a data analysis perspective, I have organized my thinking as follows. A city's real estate market analysis should cover **the urban economy, related policies, the land market and the real estate market**. The urban economy reflects a city's economic strength and potential, and can be broken down into indicators such as per capita GDP and GDP per unit area, per capita fiscal revenue and fiscal revenue per unit area, the size of the high-net-worth population, net population inflow, the tertiary industry's share, industry complementarity, dependence on real estate investment, and city friendliness. Government policy also has a huge impact on the real estate market; the most relevant policies are **financial policy, population policy, land policy and home purchase policy**. Finally, there is the analysis of the city's **land market and real estate market**, which is the core of the whole analysis.
Next, I will combine Python and take Guangzhou as an example to try to analyze the land market and real estate market in Guangzhou. The analysis of the urban economy and related policies will be described in a future article.
The land market consists of a primary market and a secondary market. The primary market is the market for the transfer of land use rights: the state, through designated government departments, takes urban state-owned land, or rural collective land that has been expropriated and converted to state-owned land, and transfers it to land users. The land transferred can be raw land, or developed land that has reached the "seven connections and one leveling" standard (utilities connected and the site leveled). The secondary market is the resale of land after the initial transfer: land users bring land use rights that meet the requirements for trading into circulation. Due to space limitations, this article analyzes data from the primary land market only.
Land market data is generally published by the local Public Resource Exchange Center, but often only the current week's or month's data is available. We can therefore turn to a specialized land website for transaction data, for example tudinet.com:
The site has a simple structure and a simple URL pattern for paging, and the data can be parsed with XPath. Due to space limitations, the full crawler is not reproduced here; only the core code is provided.
import random
import time

import requests
from lxml import etree


def main():
    for page in range(1, 46):  # Set the number of pages here
        url = 'https://www.tudinet.com/market-213-0-0-0/list-o1ctime-pg{}.html'.format(page)
        print(url)
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
        response = requests.request("GET", url, headers=headers)
        if response.status_code == 200:
            html = response.content.decode('utf-8')
            print("Extracting page " + str(page))
            time.sleep(random.uniform(1, 2))  # Pause between requests
            print("-" * 80)
            parse = etree.HTML(html)  # Parse webpage
            items = parse.xpath('.//div[@class="land-l-cont"]/dl')
            parse_page(items)  # Field extraction; not shown in this article
            if len(items) < 10:  # A short page means the last page was reached
                print('Acquired complete')
                break


if __name__ == '__main__':
    time.sleep(random.uniform(1, 2))
    main()
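The stop condition in the crawler above (quit once a page returns fewer rows than a full page) can be separated from the network code. The following is a minimal sketch, not the author's code: `crawl_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the HTTP request and XPath parsing so that the paging logic can be run offline.

```python
from typing import Callable, List


def crawl_pages(fetch_items: Callable[[int], List[str]],
                max_pages: int,
                page_size: int) -> List[str]:
    """Collect items page by page, stopping after the first short page."""
    collected: List[str] = []
    for page in range(1, max_pages + 1):
        items = fetch_items(page)
        collected.extend(items)
        # A page with fewer rows than a full page is treated as the last one.
        if len(items) < page_size:
            break
    return collected


def fake_fetch(page: int) -> List[str]:
    """Hypothetical stand-in for request + parsing: 25 rows, 10 per page."""
    rows = ["row{}".format(i) for i in range(25)]
    start = (page - 1) * 10
    return rows[start:start + 10]


result = crawl_pages(fake_fetch, max_pages=45, page_size=10)
print(len(result))
```

Passing the fetch step in as a function keeps the loop testable without touching the real site; in the real crawler that function would wrap the request and the XPath query.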
Running the crawler yields 1,238 land records for Guangzhou. The following is some of the data after simple cleaning:
From 2011 to 2020, land that failed at auction or otherwise went unsold accounted for about half of Guangzhou's listed land; only 49.71% of the land was sold, so the overall transaction rate is not high. The main reasons for failed deals were the absence of interested bidders and bids that did not reach the reserve price.
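The transaction rate is simply the share of listings whose status is "sold". Here is a minimal sketch, assuming each record carries a status string; the function name `transaction_rate`, the status labels and the sample list are all made up for illustration and are not the crawled data.

```python
from collections import Counter
from typing import List


def transaction_rate(statuses: List[str], sold_label: str = "sold") -> float:
    """Return the percentage of listings whose status equals sold_label."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return counts[sold_label] / total * 100


# Hypothetical status labels; the wording on the real site may differ.
sample_statuses = ["sold", "failed auction", "sold", "not sold", "sold", "not sold"]

rate = transaction_rate(sample_statuses)
print("Sold: {:.2f}% of {} listings".format(rate, len(sample_statuses)))
```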
From 2011 to 2016, Guangzhou sold relatively little land through bidding, auction and listing; in 2016, the planned construction area of transacted land was only 773,000 square meters. From 2017 the transaction scale climbed toward a peak, and in 2018 the planned construction area of transacted land reached 16.35 million square meters.
Looking at monthly land transactions, Guangzhou's land auction market was relatively quiet in the first half of 2019 and returned to normal after mid-year. It heated up at the end of 2019, with 21 and 38 parcels sold in November and December 2019, respectively.
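Monthly figures like these come from grouping transactions by year and month. A minimal sketch using only the standard library follows; the dates are invented for illustration and `monthly_counts` is a hypothetical name, so the counts do not match the article's data.

```python
from collections import Counter
from datetime import date

# Made-up transaction dates, for illustration only.
deal_dates = [
    date(2019, 3, 14),
    date(2019, 11, 5),
    date(2019, 11, 28),
    date(2019, 12, 2),
    date(2019, 12, 19),
    date(2019, 12, 30),
]

# Key each transaction by "YYYY-MM" and count parcels per month.
monthly_counts = Counter(d.strftime("%Y-%m") for d in deal_dates)

for month in sorted(monthly_counts):
    print(month, monthly_counts[month])
```

The "YYYY-MM" key sorts correctly as plain text, which is why a simple `sorted` call is enough to print the months in order.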
Over the past ten years, Guangzhou's land transactions have mainly been industrial land, other land and residential land, with industrial land accounting for 41.19%. This is consistent with Guangzhou's well-developed industrial sector.
By district, Nansha and Panyu have seen a certain amount of land transacted every year, while Yuexiu and Tianhe have seen less. Since 2020, the land market in Nansha District has been hot, with a transacted area far larger than in Guangzhou's other districts.
Real estate market analysis mainly covers the new home and second-hand home transaction markets. Because real estate listing platforms generally carry far more second-hand homes than new homes, this article uses Guangzhou second-hand housing transaction data to analyze the real estate market.
To obtain more complete and realistic data, this article uses Python to collect the latest Guangzhou second-hand housing transaction data published by Fang Tianxia (fang.com).
The Fang Tianxia crawler is also fairly simple, with logic similar to that of a crawler for Beike Zhaofang. The only difficulty is that, after traversing one district, it has to jump to the next. The core code is given below:
import random
import time

import requests
from lxml import etree


def main():
    # District codes: Zengcheng a080; Panyu a078; Nansha a084; Huadu a0639; Baiyun a076; Haizhu a074;
    # Yuexiu a072; Liwan a071; Tianhe a073; Conghua a079; Huangpu a075
    district_list = ['a084', 'a078', 'a080', 'a0639', 'a076', 'a074', 'a072', 'a071', 'a073', 'a079', 'a075']
    for district in district_list:
        for page in range(1, 101):  # Set the number of pages here
            url = 'https://gz.esf.fang.com/chengjiao-{0}/i3{1}/'.format(district, page)
            print(url)
            headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
            response = requests.request("GET", url, headers=headers)
            if response.status_code == 200:
                html = response.content.decode('utf-8')
                print("Extracting district " + district + ", page " + str(page))
                time.sleep(random.uniform(1, 2))  # Pause between requests
                print("-" * 80)
                parse = etree.HTML(html)  # Parse webpage
                items = parse.xpath('.//div[@name="div_houselist"]/dl')
                parse_page(items)  # Field extraction; not shown in this article
                if len(items) < 30:  # Short page: move on to the next district
                    print('Acquired complete')
                    break


if __name__ == '__main__':
    time.sleep(random.uniform(1, 2))
    main()
After running for a few minutes, the code extracts 22,170 Guangzhou second-hand housing transaction records. After simple cleaning, some of the data looks like this:
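The cleaning step itself is not shown in the article. One plausible part of it is turning scraped strings into numbers, and the sketch below illustrates that idea under stated assumptions: the raw formats (`"89.5㎡"`, `"35,000元/㎡"`, `"--"`) and the helper name `to_number` are hypothetical, not taken from the actual pages.

```python
import re
from typing import Optional


def to_number(text: str) -> Optional[float]:
    """Return the first number found in text as a float, or None if absent."""
    cleaned = text.replace(",", "")  # Drop thousands separators
    match = re.search(r"\d+(?:\.\d+)?", cleaned)
    return float(match.group()) if match else None


# Hypothetical raw rows of (area, unit price) as scraped strings.
raw_rows = [("89.5㎡", "35,000元/㎡"), ("120㎡", "--")]

clean_rows = [(to_number(area), to_number(price)) for area, price in raw_rows]
print(clean_rows)
```

Returning `None` for unparseable values keeps missing prices visible, so they can be dropped or filled deliberately later rather than silently becoming zero.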
Looking at the volume and price trends of second-hand housing in Guangzhou in recent years, prices rose from 2015, and in 2018 the average price of second-hand housing reached 35,000 yuan/㎡. Prices fell in 2019, but transaction volume hit its peak of recent years, with 8,940 units sold over the year.
From January to June 2020, the average price of second-hand housing in Guangzhou was basically the same as in 2019. In terms of volume, only 70 second-hand homes were sold in February because of the epidemic. From March, as the epidemic was gradually brought under control, the market improved, and 1,337 second-hand homes were sold in June.
By district, the highest average second-hand prices from January to June 2020 were in Yuexiu District and Tianhe District, at 46,767.52 yuan/㎡ and 46,433.89 yuan/㎡ respectively. Conghua District had the lowest, at only 12,190.67 yuan/㎡.
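District averages of this kind are a group-and-average over the transaction records. A minimal standard-library sketch follows; the records are invented round numbers chosen only to show the mechanics, and the names `records`, `avg_price` and `ranking` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up (district, unit price in yuan/㎡) pairs, not the article's data.
records = [
    ("Yuexiu", 47000.0),
    ("Yuexiu", 46000.0),
    ("Tianhe", 46000.0),
    ("Conghua", 12000.0),
    ("Conghua", 12400.0),
]

# Collect the prices of each district, then average them.
prices_by_district = defaultdict(list)
for district, price in records:
    prices_by_district[district].append(price)

avg_price = {d: mean(p) for d, p in prices_by_district.items()}

# Rank districts from most to least expensive.
ranking = sorted(avg_price.items(), key=lambda kv: kv[1], reverse=True)
for district, price in ranking:
    print("{}: {:.2f} yuan/㎡".format(district, price))
```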
By residential community, the one with the most second-hand transactions in Guangzhou from January to June 2020 was Jinxiu Tianlun Garden in Zengcheng District, with 78 transactions at an average price of 18,565.40 yuan/㎡.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline
sns.set_style('white') #Set the graphic background style to white
df = pd.read_excel(r"D:\data\Real estate data analysis\Guangzhou second-hand housing.xlsx")  # Raw string keeps the backslashes literal
df = df[['room','hall','area(㎡)','Number of layers','Transaction price(yuan/㎡)']] #Select the desired column
df.rename(columns={'room':'room','hall':'hall','area(㎡)':'area','Number of layers':'floor','Transaction price(yuan/㎡)':'price'}, inplace=True)
fig,axes=plt.subplots(1,2,figsize=(12,5))
sns.regplot(x='room',y='price',data=df,color='r',marker='+',ax=axes[0])
sns.regplot(x='hall',y='price',data=df,color='g',marker='*',ax=axes[1])
From the regression plots of Guangzhou's second-hand housing data, we found that the number of rooms and the floor area are not related to the price per square meter. The floor variable and the price appear to have a strong positive correlation, but this is actually driven by three outliers; without them, the two are not correlated.
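How three outliers can manufacture a strong correlation is easy to reproduce on toy data. The sketch below computes the Pearson correlation coefficient by hand, with and without three extreme points; all numbers are invented for illustration, and `pearson`, `floors` and `prices` are hypothetical names rather than the article's variables.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up bulk of the data: price shows no trend across floors.
floors = [5, 10, 15, 20, 25, 30]
prices = [30000, 31000, 29000, 29000, 31000, 30000]

# Three made-up extreme points, far from the bulk on both axes.
outlier_floors = [60, 65, 70]
outlier_prices = [80000, 85000, 90000]

r_bulk = pearson(floors, prices)
r_all = pearson(floors + outlier_floors, prices + outlier_prices)

print("without outliers: {:.3f}".format(r_bulk))
print("with outliers:    {:.3f}".format(r_all))
```

Comparing the coefficient before and after removing suspicious points is a quick check on whether a fitted regression line reflects the bulk of the data or only a handful of records.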
On the land market side, Guangzhou's land transaction rate has been below 50% over the past 10 years. The market has picked up in recent years; Nansha and Panyu districts in particular have had steady transactions and still have development potential.
On the real estate market side, second-hand housing prices in Guangzhou have changed little since 2019, staying around 30,000 yuan/㎡. Second-hand transactions were hit during the epidemic; as some real estate companies cut prices in exchange for volume and the epidemic was gradually brought under control, transactions gradually recovered. Prices in the city center remain high, while Conghua and Zengcheng, in the north of Guangzhou, have lower prices and still have room to grow.
This data analysis is for learning and research purposes only, and its conclusions are for reference only.
The author knows very little about the real estate industry, and the description may be imperfect, so please bear with any shortcomings.