From location name to latitude and longitude

When you combine datasets based on their geolocation, it is, no surprise here, vital to know the exact location of each record. But what if you run into data without this piece of information? As long as you have something that describes a location, such as a place name, you may be able to retrieve its coordinates with the Google Maps API or with OpenStreetMap's geocoding service. Sounds difficult? Don't be fooled: it is surprisingly easy (and, within the free usage limits of both services, free).

For this example we use data about the location of wolves in Germany, retrieved from https://www.dbb-wolf.de/Wolfsvorkommen/territorien/entwicklung-der-rudel.

In [1]:
import pandas as pd
import time
import googlemaps
import geopy
In [2]:
# import the data and keep one row per unique (state, area) combination
df = (
    pd.read_csv("data/wolvendata_dbbw.csv", encoding="UTF-8", delimiter=";")
    [['StateName', 'AreaName']]
    .drop_duplicates()
    .reset_index(drop=True)
)

# build the search string: area, state, country
df['maps_search'] = df.AreaName + ', ' + df.StateName + ', Germany'

df.head()
Out[2]:
   StateName       AreaName           maps_search
0  Bayern          Allgäuer Alpen     Allgäuer Alpen, Bayern, Germany
1  Sachsen-Anhalt  Altengrabow        Altengrabow, Sachsen-Anhalt, Germany
2  Sachsen-Anhalt  Annaburger Heide   Annaburger Heide, Sachsen-Anhalt, Germany
3  Bayern          Altmühltal         Altmühltal, Bayern, Germany
4  Sachsen-Anhalt  Altmärkische Höhe  Altmärkische Höhe, Sachsen-Anhalt, Germany

As you can see, we have the name of each area and the state it lies in, but no latitude and longitude. That is what we will try to retrieve, using both Google Maps and OpenStreetMap.

Google Maps

To call the Google Maps API, we first need to create an API key with the Places API enabled, since that is the service we query below. You can read more about how to do this in the Google Maps Platform documentation. Make sure you don't share this key with others.

In [3]:
# the original key has been disabled; add your own key to rerun this notebook
googlekey = '<google api key>'
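Rather than pasting the key into the notebook, you could read it from an environment variable, which makes it harder to share it by accident (for example when publishing the notebook). A minimal sketch, assuming you exported the key under the hypothetical name `GOOGLE_MAPS_API_KEY`:

```python
import os

def load_google_key(var_name: str = "GOOGLE_MAPS_API_KEY") -> str:
    """Read the Google Maps API key from an environment variable.

    GOOGLE_MAPS_API_KEY is a hypothetical name; use whatever name you exported.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Google Maps API key")
    return key

# googlekey = load_google_key()
```

This way the notebook itself never contains the secret, and rerunning it on another machine only requires setting the variable there.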

Now we can initialize a Google Maps client with our API key and call gmaps.places for each record in our dataframe, passing it the value of our "maps_search" column, which contains the area, state and country we are looking for. gmaps.places performs a text search and can return several candidates, from which we can pick the one we think fits best. There is also gmaps.geocode, but the Geocoding API is aimed mainly at addresses and often returns just a single match, which might not be the one that best fits our needs.

In [4]:
gmaps = googlemaps.Client(key=googlekey)
# one Places text search (one API call) per record
df['gmaps_places_result'] = df['maps_search'].apply(gmaps.places)

The raw result for the first record looks like this:

In [5]:
df['gmaps_places_result'][0]
Out[5]:
{'html_attributions': [],
 'results': [{'formatted_address': 'Allgäu Alps, 83661 Lenggries, Germany',
   'geometry': {'location': {'lat': 47.5124089, 'lng': 11.4057938},
    'viewport': {'northeast': {'lat': 47.5199448, 'lng': 11.4218012},
     'southwest': {'lat': 47.5048719, 'lng': 11.3897864}}},
   'icon': 'https://maps.gstatic.com/mapfiles/place_api/icons/v1/png_71/geocode-71.png',
   'icon_background_color': '#7B9EB0',
   'icon_mask_base_uri': 'https://maps.gstatic.com/mapfiles/place_api/icons/v2/generic_pinlet',
   'name': 'Allgäu Alps',
   'photos': [{'height': 480,
     'html_attributions': ['<a href="https://maps.google.com/maps/contrib/116485343700311600047">Stoyan Tzvetansky</a>'],
     'photo_reference': 'ATplDJZ7WVQgyTc8Zq51UmD4lhDBX3qcpb2bmisfD5KUQhtJYQzaIO4GacA_S2qFV_9ndLI0vn3zEyeqo01gC3SBOnNjAiF4WgKABCaNdWRKZAsaKK4GU-ji0N-MN1s3cYhIJgFZJN2YSVQtsamQZ7YexEUZ1B3udO66Lvr3UBB4vkMIO1bi',
     'width': 720}],
   'place_id': 'ChIJI-93SeybnEcRI29jTPNhXG0',
   'rating': 4.5,
   'reference': 'ChIJI-93SeybnEcRI29jTPNhXG0',
   'types': ['natural_feature', 'establishment'],
   'user_ratings_total': 38}],
 'status': 'OK'}
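Because gmaps.places can return several candidates, it can help to list them side by side before deciding which one to keep. A small sketch that works on a response dictionary like the one shown above; the helper name `list_candidates` is our own, not part of the googlemaps library:

```python
def list_candidates(places_result):
    """Return (name, formatted_address, lat, lng) for every candidate in a Places response dict."""
    if not isinstance(places_result, dict):
        return []
    candidates = []
    for item in places_result.get("results", []):
        location = item.get("geometry", {}).get("location", {})
        candidates.append((
            item.get("name"),
            item.get("formatted_address"),
            location.get("lat"),
            location.get("lng"),
        ))
    return candidates

# e.g. list_candidates(df['gmaps_places_result'][0])
```

Printing these tuples for a few areas makes it easier to spot queries where the first hit is not the natural feature you were after.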

To keep this example simple, we use the first result for each record.

In [6]:
# get the location of the first result returned by the Google Maps API (None if there is none)
def first_location(result):
    if isinstance(result, dict) and result.get('results'):
        return result['results'][0].get('geometry', {}).get('location')
    return None

locations = df['gmaps_places_result'].apply(first_location)

# retrieve the lat and the lon of this first result
df['lat_google'] = locations.apply(lambda loc: loc.get('lat') if loc else None)
df['lon_google'] = locations.apply(lambda loc: loc.get('lng') if loc else None)

df.drop(columns=['gmaps_places_result'], inplace=True)

df.head()
Out[6]:
   StateName       AreaName           maps_search                                 lat_google  lon_google
0  Bayern          Allgäuer Alpen     Allgäuer Alpen, Bayern, Germany             47.512409   11.405794
1  Sachsen-Anhalt  Altengrabow        Altengrabow, Sachsen-Anhalt, Germany        52.208155   12.195526
2  Sachsen-Anhalt  Annaburger Heide   Annaburger Heide, Sachsen-Anhalt, Germany   51.688333   13.077222
3  Bayern          Altmühltal         Altmühltal, Bayern, Germany                 49.031489   10.804621
4  Sachsen-Anhalt  Altmärkische Höhe  Altmärkische Höhe, Sachsen-Anhalt, Germany  52.833331   11.604819

OpenStreetMap

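Next we try exactly the same with OpenStreetMap, partly because we were curious which service gives better results in our specific case, and partly because OpenStreetMap is easier to use: its geocoding service, Nominatim, needs no API key. Its usage policy asks you to identify your application with a user agent and to send at most one request per second. We use our OpenStreetMap username as the user agent; you can create an account on the website of OpenStreetMap: www.openstreetmap.org.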

In [7]:
openstreetmap_username = '<openstreetmap username>'
In [8]:
# initialize the api to openstreetmap
geolocator = geopy.Nominatim(user_agent=openstreetmap_username)

# create a function to retrieve the lat and lon of an address
def get_latlon(address: str):
    # make sure we don't exceed the maximum of one API call per second
    time.sleep(1)
    location = geolocator.geocode(address)
    if location is None:
        # no match found for this address
        return [None, None]
    # the raw Nominatim response holds lat and lon as strings
    return [location.raw[k] for k in ['lat', 'lon']]
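Instead of a fixed time.sleep(1), you can throttle calls so that they start at least a given interval apart, which wastes no time when the geocoder itself is slow to answer. A minimal, generic sketch; `throttled` is a hypothetical helper of our own, and geopy also ships a ready-made `RateLimiter` in `geopy.extra.rate_limiter` that does this (plus retries) for you:

```python
import time
from functools import wraps

def throttled(min_interval: float):
    """Decorator ensuring at least `min_interval` seconds between the starts of consecutive calls."""
    def decorator(func):
        last_call = None  # monotonic start time of the previous call

        @wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                wait = min_interval - (time.monotonic() - last_call)
                if wait > 0:
                    time.sleep(wait)
            last_call = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Usage with the geolocator from above (at most 1 request per second for Nominatim):
# geocode_throttled = throttled(1.0)(geolocator.geocode)
```

If you call geocode_throttled inside get_latlon, the separate time.sleep(1) there is no longer needed.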
In [9]:
# retrieve the lat and lon of each record
df["latlon"] = df['maps_search'].apply(get_latlon)
In [10]:
df.head()
Out[10]:
   StateName       AreaName           maps_search                                 lat_google  lon_google  latlon
0  Bayern          Allgäuer Alpen     Allgäuer Alpen, Bayern, Germany             47.512409   11.405794   [47.4472501, 10.314918249152736]
1  Sachsen-Anhalt  Altengrabow        Altengrabow, Sachsen-Anhalt, Germany        52.208155   12.195526   [52.2021689, 12.1795198]
2  Sachsen-Anhalt  Annaburger Heide   Annaburger Heide, Sachsen-Anhalt, Germany   51.688333   13.077222   [51.70973275, 13.118867562875781]
3  Bayern          Altmühltal         Altmühltal, Bayern, Germany                 49.031489   10.804621   [49.0353396, 10.829200123790446]
4  Sachsen-Anhalt  Altmärkische Höhe  Altmärkische Höhe, Sachsen-Anhalt, Germany  52.833331   11.604819   [52.831484950000004, 11.567036018056264]
In [11]:
# split the lat and the lon in two separate columns
df = pd.merge(df.reset_index(), pd.DataFrame(df['latlon'].values.tolist(), columns=['lat_osm', 'lon_osm']).reset_index(), on='index')
df.drop(columns=['latlon'], inplace=True)

df.head()
Out[11]:
   index  StateName       AreaName           maps_search                                 lat_google  lon_google  lat_osm             lon_osm
0  0      Bayern          Allgäuer Alpen     Allgäuer Alpen, Bayern, Germany             47.512409   11.405794   47.4472501          10.314918249152736
1  1      Sachsen-Anhalt  Altengrabow        Altengrabow, Sachsen-Anhalt, Germany        52.208155   12.195526   52.2021689          12.1795198
2  2      Sachsen-Anhalt  Annaburger Heide   Annaburger Heide, Sachsen-Anhalt, Germany   51.688333   13.077222   51.70973275         13.118867562875781
3  3      Bayern          Altmühltal         Altmühltal, Bayern, Germany                 49.031489   10.804621   49.0353396          10.829200123790446
4  4      Sachsen-Anhalt  Altmärkische Höhe  Altmärkische Höhe, Sachsen-Anhalt, Germany  52.833331   11.604819   52.831484950000004  11.567036018056264
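Nominatim's raw response contains the coordinates as strings, which is likely why lat_osm and lon_osm are printed without the uniform six-decimal formatting of the Google columns. Before comparing or plotting them, you will probably want to convert them to floats. A minimal sketch with pd.to_numeric; the helper name `coords_to_float` is our own:

```python
import pandas as pd

def coords_to_float(frame: pd.DataFrame, columns=("lat_osm", "lon_osm")) -> pd.DataFrame:
    """Return a copy of `frame` with the given coordinate columns converted to floats.

    Values that cannot be parsed (e.g. None for areas Nominatim did not find) become NaN.
    """
    out = frame.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

# df = coords_to_float(df)
```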

Process the results

After retrieving the lat and lon in both ways, it is up to us to compare these locations and decide which solution we prefer here. We did this by plotting all locations on a map and comparing them with each other. You can read how to plot geodata on a map using Python in our blog post on that subject, part of this same series.
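If you want a quick numeric check before (or alongside) the map, one option is to compute, for each record, the great-circle distance between the Google and the OpenStreetMap coordinate and look at the records where the two services disagree most. A sketch using the haversine formula with a mean Earth radius of 6371 km; it assumes all four coordinate columns already hold floats (for example after pd.to_numeric), and the function name is our own:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Accepts scalars, lists, arrays or pandas Series of equal length.
    """
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# df['distance_km'] = haversine_km(df['lat_google'], df['lon_google'], df['lat_osm'], df['lon_osm'])
# df.sort_values('distance_km', ascending=False).head()
```

Records with a large distance are the ones worth inspecting on the map first; a small distance only means both services agree, not that either is correct.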