Python: crawling geographic coordinates

Crawling geographic coordinates#

Overview##

Destination URL: Baidu Map
The technique here is simply the developer API that Baidu Maps itself provides, so there is no brute-force scraping involved; on the whole it is quite standard. The only odd thing is that my computer's network kept dropping whenever I called the interface. It looks like the office firewall was interfering, though it shouldn't be — this is not an illegal operation.

Process##

  1. Apply for a Baidu account
    (Omitted.)
  2. Apply to become a Baidu developer
    (Omitted.)
  3. Get the Baidu secret key
    Enter the developer console, select application management, and create an application. The application name can be anything; choose the type according to your needs (I chose the server type here). Tick the services you want from the list below — the most important ones are geocoding and reverse geocoding.

The next step is to choose the verification method. At first I chose the IP whitelist, but then it occurred to me that it would stop working if my IP changed, so I adopted the SN verification method instead.

What you need to keep from this step is your developer AK and SK. I have redacted this secret key in the code section below.
  4. Make a request

The request revolves around this URL; the only part the crawler needs to change between requests is the address parameter:

http://api.map.baidu.com/geocoding/v3/?address=No. 10, Shangdi Tenth Street, Haidian District, Beijing&output=json&ak=Your ak&callback=showLocation // GET request

Note: the current interface document is V3.0. Since 2019-06-18, new users can no longer use V2.0 and earlier versions; old users can still request the reverse geocoding service with V2.0 and earlier. To ensure a good experience, Baidu recommends migrating to V3.0 as soon as possible.
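Putting the SN scheme and the URL together: per Baidu's documented signing algorithm, the path-plus-query string is percent-encoded, the SK is appended, the whole string is `quote_plus`-encoded, and its MD5 hex digest becomes the `sn` parameter. A minimal sketch (the `signed_url` helper name and the AK/SK values are my own placeholders):

```python
import hashlib
from urllib.parse import quote, quote_plus

def signed_url(address, ak, sk):
    """Build a signed geocoding URL following Baidu's documented sn algorithm."""
    # Path plus query string, with the address percent-encoded
    query_str = '/geocoding/v3/?address={}&output=json&ak={}'.format(quote(address), ak)
    # Append the SK, quote_plus the whole string, then take the MD5 hex digest
    sn = hashlib.md5(quote_plus(query_str + sk).encode('utf8')).hexdigest()
    return 'http://api.map.baidu.com{}&sn={}'.format(query_str, sn)
```

Note that the SN must be computed over the already-encoded query string, so the same encoding has to be applied before signing and before sending.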

Code section##

# -*- coding: utf-8 -*-
import urllib.request, urllib.parse, urllib.error
import json
import hashlib
import csv

# The output format is json
output = 'json'
# AK obtained from the developer platform
ak = '*****************'
# SK obtained from the developer platform
sk = '******************'
# Target locations; this list could also be imported from an external source
a = ['Beijing', 'Capital Medical University', 'Temple of Heaven Hospital', 'Tiantongyuan', 'Texas', 'Hangzhou', 'Shanghai', 'Peking University', 'Tianjin']
# Open the save location
csv_obj = open('./python/Crawl geographic coordinates/data.csv', 'w', newline='', encoding='utf-8')
writer = csv.writer(csv_obj)
# Write the header row
writer.writerow(['position', 'lng', 'lat'])
# Crawl
for i in a:
    queryStr = '/geocoding/v3/?address={}&output={}&ak={}'.format(i, output, ak)
    # Percent-encode; safe lists the characters that are left untouched
    encodedStr = urllib.parse.quote(queryStr, safe="/:=&?#+!$,;'@()*[]")
    # Append the SK
    rawStr = encodedStr + sk
    # Compute the SN value used to sign calls to the Baidu interface
    # (see the official documentation for the algorithm)
    sn = hashlib.md5(urllib.parse.quote_plus(rawStr).encode('utf8')).hexdigest()
    # Assemble the URL
    url = urllib.parse.quote('http://api.map.baidu.com' + queryStr + '&sn=' + sn, safe="/:=&?#+!$,;'@()*[]")
    # Send the request
    req = urllib.request.urlopen(url)
    # Decode the response body
    res = req.read().decode()
    # Parse the json into a dictionary
    temp = json.loads(res)
    # Extract longitude and latitude
    lng, lat = temp['result']['location']['lng'], temp['result']['location']['lat']
    # Write to the csv file
    writer.writerow([i, lng, lat])
# Close the csv file
csv_obj.close()
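The loop above assumes every request succeeds. Baidu's geocoding response carries a status field (0 on success), so a safer variant checks it before indexing into result. A sketch, assuming the JSON shape shown above (the `extract_location` helper name is my own):

```python
import json

def extract_location(body):
    """Return (lng, lat) from a geocoding response body, or None on failure."""
    data = json.loads(body)
    # status 0 means success; any other value is an error code
    if data.get('status') != 0:
        return None
    loc = data['result']['location']
    return loc['lng'], loc['lat']
```

With this in place, a failed lookup writes nothing instead of crashing the whole run with a KeyError.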

Result##

There were no problems in the small-scale run, and the subsequent large-sample calculations are ready to go.
Baidu is really generous here — otherwise the coordinates would have to be entered one by one.
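For the large-sample run, the hard-coded list `a` could be replaced by addresses read from a file. A sketch, assuming a plain-text file with one address per line (the `load_addresses` helper name and the file layout are my own assumptions):

```python
def load_addresses(path):
    """Read one address per line from a UTF-8 text file, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]
```

The returned list can then be dropped straight into the crawl loop in place of `a`.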

Concluding remarks##

As for updates: daily posts aren't possible right now, so updates will come as and when.

love&peace
