How to geocode your data with R

In my first post I would like to introduce you to one of the easiest ways to geocode your data. Geocoding is one of the most essential operations in spatial analysis. It allows us to transform a description of a location, generally an address, postcode or similar, into a pair of coordinates (longitude and latitude, or X and Y). Then we can map our data.

[Figure: world map with the geocoded towns plotted as dots]

There is an increasingly wide range of alternatives for geocoding datasets, from websites to programming tools. Here I will show you how to do it using R. First, download and install R (I recommend using RStudio, a friendlier desktop IDE). Second, set up your working environment. Then load your data. For this tutorial I have scraped and cleaned a CSV dataset called fires, containing the 100 Spanish towns with the most fires from 2001 to 2013, taken from the España en Llamas website. You can download it here. In addition, I have created a GitHub repository with all the material covered in this post.

Now it is time for the fun part. To geocode our data, we are going to use the geocode() function from the ggmap package, which uses the Google Maps API. First install the required libraries and then call the function like this:
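A minimal sketch of this step, not necessarily the exact code used for the post. It assumes the town names live in a character vector, and it appends the country to each name so the geocoder has more context. The helper names `build_queries` and `geocode_towns` are made up for illustration. Current ggmap releases also expect a Google API key to be registered before geocode() is called.

```r
# Run once: install.packages("ggmap")

# Build one query string per town. Appending the country is an assumption of
# this sketch; it helps the geocoder tell apart towns that share a name.
build_queries <- function(towns, country = "Spain") {
  paste(towns, country, sep = ", ")
}

# Geocode a character vector of towns and return a plain data frame with
# lon and lat columns. The `geocoder` argument defaults to ggmap::geocode
# and exists only so the function can be tried offline with a stand-in.
geocode_towns <- function(towns, geocoder = ggmap::geocode) {
  coords <- geocoder(build_queries(towns))
  data.frame(lon = coords$lon, lat = coords$lat)
}
```

With ggmap installed and a key registered through `ggmap::register_google(key = "YOUR_API_KEY")`, something like `lonlat <- geocode_towns(fires$town)` should return one row of coordinates per town. Here `town` is a hypothetical column name; use whatever your CSV calls it. geocode() accepts a whole vector of addresses, so no explicit loop is needed.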

The geocoding process should take a few minutes to complete. After that, it is time to merge the lonlat dataset which you have just generated with the original dataset. To do this, we must create an id column in both datasets to match the rows:
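The merge could look roughly like the sketch below. The two small data frames are toy stand-ins with placeholder values, and `n_fires` is a hypothetical column name. The sketch also assumes the geocoder returned one row per input address, in the same order, so a running row number can serve as the shared key.

```r
# Toy stand-ins for the two tables (placeholder values, not real data).
fires  <- data.frame(town    = c("Town A", "Town B", "Town C"),
                     n_fires = c(30, 20, 10))
lonlat <- data.frame(lon = c(-3.7, -0.9, 2.1),
                     lat = c(40.4, 41.6, 41.4))

# Give both tables the same id column: the row number.
fires$id  <- seq_len(nrow(fires))
lonlat$id <- seq_len(nrow(lonlat))

# Join on id; the result holds the original columns plus lon and lat.
fires_geo <- merge(fires, lonlat, by = "id")
print(fires_geo)
```

Since the rows are already aligned, `cbind(fires, lonlat)` would give the same columns. The explicit id makes the match visible and keeps working if either table gets reordered later.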

We could stop here, but it is highly advisable to explore your data and look for outliers. For this task, we plot our towns as dots over a simple world map (see figure above). Any point outside the Spanish borders will be considered an outlier, so we will need to correct our data again.
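As a complement to the visual check, outliers can also be flagged numerically. The sketch below uses a rough bounding box around Spain (Balearic and Canary Islands included) instead of the real borders. The box limits are approximate values chosen for this example and the coordinates are toy placeholders. A box also covers parts of neighbouring countries, so this is only a coarse first filter.

```r
# Toy coordinates (placeholders): two points inside the box, one far outside.
fires_geo <- data.frame(
  town = c("Town A", "Town B", "Town C"),
  lon  = c(-3.7, -0.9, -74.0),
  lat  = c(40.4, 41.6, 4.7)
)

# Approximate bounding box for Spain, islands included (assumed values).
lon_range <- c(-18.5, 4.5)
lat_range <- c(27.5, 44.0)

# A point is an outlier if it falls outside the box on either axis.
fires_geo$outlier <-
  fires_geo$lon < lon_range[1] | fires_geo$lon > lon_range[2] |
  fires_geo$lat < lat_range[1] | fires_geo$lat > lat_range[2]

# Towns that need a second look.
print(fires_geo[fires_geo$outlier, ])

# Quick visual check: outliers in red, the rest in black, box outlined.
plot(fires_geo$lon, fires_geo$lat,
     col = ifelse(fires_geo$outlier, "red", "black"),
     pch = 19, xlab = "Longitude", ylab = "Latitude")
rect(lon_range[1], lat_range[1], lon_range[2], lat_range[2], border = "grey40")
```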

As you can see, there are plenty of outliers. In the next post I will explain how to select these rebel dots and correct them.

GIS Analyst. Working at @CARTO and @ongawa4d.

2 Comments

  1. Pingback: How to geocode your data with r... and CartoDB! Ramiro Aznar

  2. naiyu

    Is latlon a data frame?

    If I want to geocode the data and put the resulting coordinates in my original CSV as columns,
    do I use:
    data$latlon = geocode(data$address)

    Is it going to create latlon in one column?

    And does the geocode function automatically loop through a list of addresses?

    also
