From location name to latitude and longitude
When combining different datasets based on their geolocation it is, no surprise here, vital to know the exact location of each record. But what if you run into data without this crucial piece of information? As long as you have something that describes a location, you might be able to retrieve its coordinates with the API of Google Maps or OpenStreetMap. Sounds difficult? Don't be fooled: it is surprisingly easy (and, within the free usage limits of both services, free).
For this example we use data retrieved from https://www.dbb-wolf.de/Wolfsvorkommen/territorien/entwicklung-der-rudel, which describes the locations of wolf territories in Germany.
import pandas as pd
import time
import googlemaps
import geopy
# import the data
df = pd.read_csv("data/wolvendata_dbbw.csv", encoding="UTF-8", delimiter=";")[['StateName', 'AreaName']].drop_duplicates().reset_index(drop=True)
df['maps_search'] = df.AreaName + ', ' + df.StateName + ', Germany'
df.head()
| | StateName | AreaName | maps_search |
---|---|---|---|
0 | Bayern | Allgäuer Alpen | Allgäuer Alpen, Bayern, Germany |
1 | Sachsen-Anhalt | Altengrabow | Altengrabow, Sachsen-Anhalt, Germany |
2 | Sachsen-Anhalt | Annaburger Heide | Annaburger Heide, Sachsen-Anhalt, Germany |
3 | Bayern | Altmühltal | Altmühltal, Bayern, Germany |
4 | Sachsen-Anhalt | Altmärkische Höhe | Altmärkische Höhe, Sachsen-Anhalt, Germany |
As you can see, we have the name of each area and the state it lies in, but not its latitude and longitude. That is what we will try to retrieve, using both Google Maps and OpenStreetMap.
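With plain string concatenation (`df.AreaName + ', ' + ...`), a single missing AreaName or StateName turns the whole search string into NaN, and that row can no longer be geocoded. If your copy of the CSV has such gaps, a small helper that skips missing parts is one option. `build_query` is a hypothetical helper of our own; the "area, state, country" order simply mirrors the search strings used above.

```python
import math
from typing import Optional


def build_query(area: Optional[str], state: Optional[str], country: str = "Germany") -> str:
    """Join the non-empty location parts into one search string, most specific first."""
    parts = []
    for part in (area, state, country):
        # skip None and NaN (pandas' marker for a missing value)
        if part is None or (isinstance(part, float) and math.isnan(part)):
            continue
        text = str(part).strip()
        if text:  # skip blank strings as well
            parts.append(text)
    return ", ".join(parts)
```

In the notebook this could replace the concatenation with something like `df['maps_search'] = [build_query(a, s) for a, s in zip(df.AreaName, df.StateName)]`.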
Google Maps
To call the Google Maps API, we first need to create an API key and enable the Places API for the project that key belongs to; Google's Maps Platform documentation explains how to do this. Make sure you don't share this key with others.
# the key used originally has been disabled; add your own key to rerun the script
googlekey = '<google api key>'
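One way to keep the key out of the notebook (and out of anything you share) is to read it from an environment variable. A minimal sketch, assuming you export the key under the hypothetical name `GOOGLE_MAPS_API_KEY` before starting Python:

```python
import os


def load_api_key(var_name: str = "GOOGLE_MAPS_API_KEY") -> str:
    """Read the API key from an environment variable instead of hardcoding it."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # fail early with a clear message instead of sending an empty key to the API
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

With that in place, `googlekey = load_api_key()` replaces the hardcoded placeholder.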
Now we can initialize a Google Maps client with our API key and call gmaps.places for each record in our dataframe, passing it the value of the "maps_search" column, which contains the area, state and country we are looking for. gmaps.places runs a Places text search and can return several candidate places, from which we can pick the one we think fits best. There is also gmaps.geocode, which queries the Geocoding API instead; it often returns just one match, and that match might not be the one that best fits our needs.
gmaps = googlemaps.Client(key=googlekey)
df['gmaps_places_result'] = df['maps_search'].apply(gmaps.places)
The result for the first record looks like this:
df['gmaps_places_result'][0]
{'html_attributions': [], 'results': [{'formatted_address': 'Allgäu Alps, 83661 Lenggries, Germany', 'geometry': {'location': {'lat': 47.5124089, 'lng': 11.4057938}, 'viewport': {'northeast': {'lat': 47.5199448, 'lng': 11.4218012}, 'southwest': {'lat': 47.5048719, 'lng': 11.3897864}}}, 'icon': 'https://maps.gstatic.com/mapfiles/place_api/icons/v1/png_71/geocode-71.png', 'icon_background_color': '#7B9EB0', 'icon_mask_base_uri': 'https://maps.gstatic.com/mapfiles/place_api/icons/v2/generic_pinlet', 'name': 'Allgäu Alps', 'photos': [{'height': 480, 'html_attributions': ['<a href="https://maps.google.com/maps/contrib/116485343700311600047">Stoyan Tzvetansky</a>'], 'photo_reference': 'ATplDJZ7WVQgyTc8Zq51UmD4lhDBX3qcpb2bmisfD5KUQhtJYQzaIO4GacA_S2qFV_9ndLI0vn3zEyeqo01gC3SBOnNjAiF4WgKABCaNdWRKZAsaKK4GU-ji0N-MN1s3cYhIJgFZJN2YSVQtsamQZ7YexEUZ1B3udO66Lvr3UBB4vkMIO1bi', 'width': 720}], 'place_id': 'ChIJI-93SeybnEcRI29jTPNhXG0', 'rating': 4.5, 'reference': 'ChIJI-93SeybnEcRI29jTPNhXG0', 'types': ['natural_feature', 'establishment'], 'user_ratings_total': 38}], 'status': 'OK'}
For the sake of simplicity, we use the first result for each record.
# get the first result out of the query result returned by the Google Maps API
df['firstresult'] = df['gmaps_places_result'].apply(lambda x: x['results'][0] if isinstance(x, dict) and 'results' in x and len(x['results']) > 0 else None)
# retrieve the lat and the lon of this first result
df['lat_google'] = df['firstresult'].apply(lambda x: x.get('geometry', {}).get('location', {}).get('lat') if isinstance(x, dict) else None)
df['lon_google'] = df['firstresult'].apply(lambda x: x.get('geometry', {}).get('location', {}).get('lng') if isinstance(x, dict) else None)
df.drop(columns=['firstresult', 'gmaps_places_result'], inplace=True)
df.head()
| | StateName | AreaName | maps_search | lat_google | lon_google |
---|---|---|---|---|---|
0 | Bayern | Allgäuer Alpen | Allgäuer Alpen, Bayern, Germany | 47.512409 | 11.405794 |
1 | Sachsen-Anhalt | Altengrabow | Altengrabow, Sachsen-Anhalt, Germany | 52.208155 | 12.195526 |
2 | Sachsen-Anhalt | Annaburger Heide | Annaburger Heide, Sachsen-Anhalt, Germany | 51.688333 | 13.077222 |
3 | Bayern | Altmühltal | Altmühltal, Bayern, Germany | 49.031489 | 10.804621 |
4 | Sachsen-Anhalt | Altmärkische Höhe | Altmärkische Höhe, Sachsen-Anhalt, Germany | 52.833331 | 11.604819 |
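Always taking the first candidate is the simplest choice, but because gmaps.places can return several candidates, a slightly smarter rule is possible. The sketch below is a hypothetical heuristic of our own, not part of the googlemaps library: it prefers the first candidate whose `formatted_address` mentions every required term and otherwise falls back to the first candidate. It only assumes the response shape shown above (a dict with a `results` list).

```python
from typing import Optional, Sequence


def pick_result(response: Optional[dict], required_terms: Sequence[str]) -> Optional[dict]:
    """Pick a candidate from a Places-style response (a dict with a 'results' list).

    Returns the first candidate whose 'formatted_address' contains every term in
    `required_terms` (case-insensitive); otherwise the first candidate, or None
    when there are no candidates at all.
    """
    results = response.get("results", []) if isinstance(response, dict) else []
    for candidate in results:
        address = candidate.get("formatted_address", "").lower()
        if all(term.lower() in address for term in required_terms):
            return candidate
    return results[0] if results else None
```

It would take the place of the first-result step, e.g. `df['firstresult'] = df['gmaps_places_result'].apply(lambda r: pick_result(r, ['Germany']))`. Requiring the state name as well is likely to fall back often: the English address in the example above, 'Allgäu Alps, 83661 Lenggries, Germany', does not mention Bayern.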
OpenStreetMap
Next we will try exactly the same with OpenStreetMap, partly because we were curious which service gives better results in our specific case, and partly because OpenStreetMap is easier to use: its geocoding service, Nominatim, does not require an API key. Its usage policy does ask you to identify your application with a User-Agent (we use our OpenStreetMap username; you can create an account at www.openstreetmap.org) and to send at most one request per second.
openstreetmap_username = '<openstreetmap username>'
# initialize the api to openstreetmap (Nominatim); user_agent identifies our application
geolocator = geopy.Nominatim(user_agent=openstreetmap_username)
# create a function to retrieve the lat and lon of an address
def get_latlon(address: str):
    # wait before every request, so we stay below Nominatim's limit of one request per second
    time.sleep(1)
    location = geolocator.geocode(address)
    if location is not None:
        return [location.latitude, location.longitude]
    return [None, None]
# retrieve the lat and lon of each record
df["latlon"] = df['maps_search'].apply(get_latlon)
df.head()
| | StateName | AreaName | maps_search | lat_google | lon_google | latlon |
---|---|---|---|---|---|---|
0 | Bayern | Allgäuer Alpen | Allgäuer Alpen, Bayern, Germany | 47.512409 | 11.405794 | [47.4472501, 10.314918249152736] |
1 | Sachsen-Anhalt | Altengrabow | Altengrabow, Sachsen-Anhalt, Germany | 52.208155 | 12.195526 | [52.2021689, 12.1795198] |
2 | Sachsen-Anhalt | Annaburger Heide | Annaburger Heide, Sachsen-Anhalt, Germany | 51.688333 | 13.077222 | [51.70973275, 13.118867562875781] |
3 | Bayern | Altmühltal | Altmühltal, Bayern, Germany | 49.031489 | 10.804621 | [49.0353396, 10.829200123790446] |
4 | Sachsen-Anhalt | Altmärkische Höhe | Altmärkische Höhe, Sachsen-Anhalt, Germany | 52.833331 | 11.604819 | [52.831484950000004, 11.567036018056264] |
# split the [lat, lon] lists into two separate numeric columns, keeping the original index
df = df.join(pd.DataFrame(df['latlon'].tolist(), index=df.index, columns=['lat_osm', 'lon_osm']).astype(float))
df.drop(columns=['latlon'], inplace=True)
df.head()
| | StateName | AreaName | maps_search | lat_google | lon_google | lat_osm | lon_osm |
---|---|---|---|---|---|---|---|
0 | Bayern | Allgäuer Alpen | Allgäuer Alpen, Bayern, Germany | 47.512409 | 11.405794 | 47.447250 | 10.314918 |
1 | Sachsen-Anhalt | Altengrabow | Altengrabow, Sachsen-Anhalt, Germany | 52.208155 | 12.195526 | 52.202169 | 12.179520 |
2 | Sachsen-Anhalt | Annaburger Heide | Annaburger Heide, Sachsen-Anhalt, Germany | 51.688333 | 13.077222 | 51.709733 | 13.118868 |
3 | Bayern | Altmühltal | Altmühltal, Bayern, Germany | 49.031489 | 10.804621 | 49.035340 | 10.829200 |
4 | Sachsen-Anhalt | Altmärkische Höhe | Altmärkische Höhe, Sachsen-Anhalt, Germany | 52.833331 | 11.604819 | 52.831485 | 11.567036 |
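Waiting one second per request keeps us within Nominatim's usage policy, but every rerun of the cell pays that second again for each query. A wrapper that combines throttling with an in-memory cache is one way to avoid repeating identical requests within a session. This is a sketch of our own, not part of geopy: `throttled_cached` and `min_interval` are hypothetical names, and geopy itself ships a `RateLimiter` class in `geopy.extra.rate_limiter` if you only need the throttling part.

```python
import time
from functools import lru_cache
from typing import Any, Callable


def throttled_cached(func: Callable[[str], Any], min_interval: float = 1.0) -> Callable[[str], Any]:
    """Wrap a single-argument geocoding function so that repeated queries are
    answered from a cache and real calls start at least `min_interval` seconds apart."""
    last_call = [float("-inf")]  # start time of the previous real call (mutable cell)

    @lru_cache(maxsize=None)  # cache hits skip both the wait and the real call
    def wrapper(query: str) -> Any:
        wait = min_interval - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(query)

    return wrapper
```

For example, `geocode = throttled_cached(geolocator.geocode)` could be called inside get_latlon instead of `geolocator.geocode`. The cache lives in memory only, so it helps when a cell is rerun in the same session but not after a kernel restart.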
Process the results
After retrieving the latitude and longitude in both ways, it's up to us to compare these locations and decide which source we prefer. We did this by plotting all locations on a map and comparing them. You can find a way to plot geodata on a map using Python in our blog post on that subject, in this same series.
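Besides comparing the points visually, the disagreement between the two sources can be quantified per record as the great-circle distance between the Google and OpenStreetMap coordinates. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper of our own, not a function of either API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Assuming all four coordinate columns are numeric and rows without a result have been dropped, `df.apply(lambda r: haversine_km(r.lat_google, r.lon_google, r.lat_osm, r.lon_osm), axis=1)` gives one distance per record. For the rows shown above, Altengrabow differs by about 1.3 km between the two services, while the two points for Allgäuer Alpen are more than 80 km apart, so for some records the choice of service matters a lot.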