Target site: Baidu Maps
The technique here is simply the developer API that Baidu Maps provides. There is no brute-force scraping involved, and overall it is fairly standard. The only odd thing is that whenever I call the interface, my computer's network connection drops. It looks like the office firewall is stepping in, though it shouldn't, since this is not an illegal operation.
The next step is to choose the verification method. At first I chose the IP whitelist, but then I wondered: if my IP changes, wouldn't it stop working? So I went with the sn verification method instead.
What you need to keep from this step is your developer ak and sk. I have removed my secret key from the code below.
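Since the keys are stripped from the script anyway, one option is to keep them out of the source file entirely. The sketch below reads them from environment variables; the names BAIDU_MAP_AK and BAIDU_MAP_SK are made up for this example and are not something Baidu defines.

```python
import os

def load_credentials(env=os.environ):
    """Read ak and sk from environment variables (the variable names are assumptions)."""
    ak = env.get("BAIDU_MAP_AK")
    sk = env.get("BAIDU_MAP_SK")
    if not ak or not sk:
        raise RuntimeError("Set BAIDU_MAP_AK and BAIDU_MAP_SK before running")
    return ak, sk
```

Calling `ak, sk = load_credentials()` at the top of the script would then replace the two hard-coded strings.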
4. Make a request
The request revolves around this URL:
http://api.map.baidu.com/geocoding/v3/?address=No. 10, Shangdi Tenth Street, Haidian District, Beijing&output=json&ak=Your ak&callback=showLocation //GET request
The part the crawler needs to change from one request to the next is the address parameter.
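An address usually contains Chinese characters or spaces, so it has to be percent-encoded before it goes into the URL. Here is a minimal sketch of that step using urllib.parse.quote with the same safe-character set as the script in this post; build_query is a helper name invented for the example and "your_ak" is a placeholder.

```python
import urllib.parse

# Characters the encoder leaves untouched (same set the script in this post uses)
SAFE = "/:=&?#+!$,;'@()*[]"

def build_query(address, ak):
    """Return the path and query string for one geocoding request."""
    return "/geocoding/v3/?address={}&output=json&ak={}".format(address, ak)

query = build_query("北京", "your_ak")
encoded = urllib.parse.quote(query, safe=SAFE)
print(encoded)
# /geocoding/v3/?address=%E5%8C%97%E4%BA%AC&output=json&ak=your_ak
```

Only the address value changes here; the separators such as ?, = and & survive because they are listed in SAFE.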
Note: The current interface document is for V3.0. Since June 18, 2019, new users can no longer use V2.0 and earlier versions. Existing users can still use V2.0 and earlier versions to request reverse geocoding services. To ensure a good user experience, Baidu recommends migrating to V3.0 as soon as possible.
# -*- coding: utf-8 -*-
import urllib.request, urllib.parse, urllib.error
import json
import hashlib
import csv

# The output format is JSON
output = 'json'
# ak obtained from the developer platform
ak = '*****************'
# sk obtained from the developer platform
sk = '******************'
# Target locations; these could also be imported from an external file
a = ['Beijing', 'Capital Medical University', 'Temple of Heaven Hospital', 'Tiantongyuan', 'Texas', 'Hangzhou', 'Shanghai', 'Peking University', 'Tianjin']

# Open the output file; the with block closes it even if a request fails
with open('./python/Crawl geographic coordinates/data.csv', 'w', newline='', encoding="utf-8") as csv_obj:
    writer = csv.writer(csv_obj)
    # Write the header row
    writer.writerow(["position", "lng", "lat"])
    # Crawl
    for i in a:
        queryStr = '/geocoding/v3/?address={}&output={}&ak={}'.format(i, output, ak)
        # Percent-encode; safe lists the characters that are left as they are
        encodedStr = urllib.parse.quote(queryStr, safe="/:=&?#+!$,;'@()*[]")
        # Append the sk
        rawStr = encodedStr + sk
        # Calculate the sn value used to call the Baidu interface
        # (see the official documentation for the details)
        sn = hashlib.md5(urllib.parse.quote_plus(rawStr).encode("utf8")).hexdigest()
        # Assemble the URL
        url = "http://api.map.baidu.com" + encodedStr + "&sn=" + sn
        # Send the request and decode the response
        with urllib.request.urlopen(url) as req:
            res = req.read().decode()
        # JSON to dictionary
        temp = json.loads(res)
        # A non-zero status means the request failed (for example, a bad sn); skip it
        if temp.get('status') != 0:
            print(i, temp)
            continue
        # Extract longitude and latitude
        lng, lat = temp['result']['location']['lng'], temp['result']['location']['lat']
        # Write one row to the CSV file
        writer.writerow([i, lng, lat])
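For reuse, the signing and parsing steps can be pulled out into two small functions that are easy to check without touching the network. This is only a sketch of the same steps as the script above: build_signed_url and extract_location are names made up here, and the status check assumes the response carries a status field where 0 means success.

```python
import hashlib
import json
import urllib.parse

SAFE = "/:=&?#+!$,;'@()*[]"
HOST = "http://api.map.baidu.com"

def build_signed_url(address, ak, sk):
    """Build a geocoding URL that carries the sn signature."""
    query = "/geocoding/v3/?address={}&output=json&ak={}".format(address, ak)
    encoded = urllib.parse.quote(query, safe=SAFE)
    # sn is the MD5 hex digest of quote_plus(encoded query + sk)
    raw = urllib.parse.quote_plus(encoded + sk)
    sn = hashlib.md5(raw.encode("utf8")).hexdigest()
    return HOST + encoded + "&sn=" + sn

def extract_location(body):
    """Return (lng, lat) from a JSON response body, or None if the call failed."""
    data = json.loads(body)
    # Assumption: status 0 means success, anything else is an error
    if data.get("status") != 0:
        return None
    location = data["result"]["location"]
    return location["lng"], location["lat"]
```

With these, the loop body reduces to building the URL, fetching it, and passing the decoded text to extract_location.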
The small-scale run works without problems, which lays the groundwork for the later runs on large samples.
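For a large sample, and given how often my network drops, it probably makes sense to pause between calls and retry failed ones. Below is a rough sketch; geocode_all and its fetch argument are hypothetical (fetch stands for whatever function performs one request), and the delay and retry values are guesses rather than Baidu's actual quota.

```python
import time

def geocode_all(addresses, fetch, delay=0.2, retries=3):
    """Call fetch(address) for each address, pausing between calls and retrying on network errors."""
    results = {}
    for address in addresses:
        for attempt in range(retries):
            try:
                results[address] = fetch(address)
                break
            except OSError:
                # urllib.error.URLError is a subclass of OSError, so dropped
                # connections land here; wait a little longer on each attempt
                time.sleep(delay * (attempt + 1))
        else:
            # Every attempt failed; record the gap instead of stopping the run
            results[address] = None
        time.sleep(delay)
    return results
```

Addresses that still fail end up as None, so they can be collected and rerun later.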
Big Daddy Baidu is so nice. Otherwise you would have to enter the addresses one by one.
As for updates, I can't post daily right now; I will update whenever I can.
love&peace