How to geocode and visualize your data with CartoDB

In the last post we geocoded our data with R, but the geocoding process did not work entirely right: some points were badly geolocated. In this post, I will explain how to subset these outliers, correct their coordinates and, finally, visualize the entire dataset using both R and CartoDB.

As always, set your working directory and load your data. You can download the fires_lonlat csv dataset here. I have also uploaded a GitHub repository for this tutorial.

Now it is time to subset those outliers. You could do it easily with the select tool of any desktop GIS software, but here I will take a longer and more interesting route. Along the way you will pick up a few useful tips, functions and applications for GIS programming. First, we have to figure out what the extreme geographic points of the study area are. Wikipedia can help us with this. Secondly, we have to identify the datapoints that fall outside these extreme points. Finally, we must subset them from the original dataset.
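The three steps can be sketched in base R as follows. This is only a minimal illustration: the toy data frame, its column names (`town_name`, `lon`, `lat`) and the bounding box values are assumptions, the latter chosen to roughly match peninsular Spain, which is where the clusters in Galicia and Asturias point to.

```r
# Toy stand-in for the fires_lonlat dataset; names and values are invented.
fires <- data.frame(
  town_name = c("A", "B", "C", "D"),
  lon = c(-8.5, -6.1, -70.2, 2.3),
  lat = c(42.9, 43.3, 18.5, 48.9)
)

# Step 1: extreme points in decimal degrees (approximate, assumed values).
west <- -9.30
east <- 3.33
south <- 36.00
north <- 43.80

# Step 2: flag points outside the box, or with no coordinates at all.
outside <- is.na(fires$lon) | is.na(fires$lat) |
  fires$lon < west | fires$lon > east |
  fires$lat < south | fires$lat > north

# Step 3: subset the outliers from the original dataset.
outliers <- fires[outside, ]
inliers <- fires[!outside, ]
```

With this toy data, towns C and D end up in `outliers`. A bounding box is a coarse filter: a badly geocoded point that still falls inside the box will not be caught.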

It is worth pointing out the usefulness of the char2dms() function from the sp package. It converts a character string into a degrees, minutes and seconds (DMS) object.
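As a rough sketch of how char2dms() can be used here, the snippet below converts a few coordinate strings into decimal degrees. The strings are approximate extreme points of peninsular Spain and are only illustrative; the variable names are my own. Latitudes and longitudes are converted in separate calls, since each DMS object describes one axis.

```r
library(sp)

# Approximate extreme points written as strings (illustrative values).
# "d", "'" and "\"" are the default degree, minute and second separators.
lat_chr <- c("43d47'38\"N", "36d00'15\"N")  # north, south
lon_chr <- c("9d17'46\"W", "3d19'20\"E")    # west, east

# char2dms() builds DMS objects; as.numeric() turns them into decimal degrees.
lat_dd <- as.numeric(char2dms(lat_chr))
lon_dd <- as.numeric(char2dms(lon_chr))
```

Western and southern values come out negative, so the results can be compared directly with the decimal coordinates in the dataset. If your strings use other separators, such as the degree sign, pass them through the `chd`, `chm` and `chs` arguments.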

We have our outliers dataset. Now it is time to correct these rebel datapoints. Normally, geocoding algorithms fail to get the right coordinates because there is more than one place on Earth with the same name. This looks like our case. In order to avoid it, we will try CartoDB’s georeferencing tool. First, load and connect the outliersNew csv dataset. Then click on the “the_geom” column to launch the georeferencing application and select the “City Names” option. Finally, choose “town_name”, “region” and “country” in the three tabs, as shown in the following image.


Sadly, CartoDB has georeferenced only half of our points; the other half we will have to edit manually. To export this new dataset, we have to split the the_geom column into two separate ones, one for the latitude values and one for the longitude values. First, create a new number-type column for latitude and another for longitude. Second, enter the following SQL query in the CartoDB Editor:

Apply the query and export the new dataset as outliersCDB.csv. Load it again in R and remove the unneeded columns that CartoDB has added:
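A minimal sketch of that clean-up could look like this. The toy data frame stands in for `read.csv("outliersCDB.csv")`, and the list of CartoDB-generated column names is an assumption: check `names()` on your own export before dropping anything.

```r
# Toy stand-in for the exported file; all values are invented placeholders.
outliersCDB <- data.frame(
  cartodb_id = 1:2,
  the_geom = c("<wkb>", NA),
  town_name = c("A", "B"),
  region = c("R1", "R2"),
  country = c("Spain", "Spain"),
  latitude = c(43.1, NA),
  longitude = c(-7.2, NA),
  fires = c(3, 5),
  created_at = c("<timestamp>", "<timestamp>"),
  updated_at = c("<timestamp>", "<timestamp>")
)

# Columns that CartoDB typically adds to a table (assumed list).
cartodb_cols <- c("cartodb_id", "the_geom", "the_geom_webmercator",
                  "created_at", "updated_at")

# Keep every column that is not in that list.
keep <- !(names(outliersCDB) %in% cartodb_cols)
outliersCDB <- outliersCDB[, keep, drop = FALSE]
```

Filtering by name rather than by position keeps the code working even if the export changes the column order.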

As you remember, we still need to find 5 pairs of coordinates. For this purpose, we use the Latitude Longitude Finder. Write down the values and enter them using these simple lines of code:
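One way to enter the values, sketched with invented data: the town names and coordinates below are placeholders, not the real ones, and the `latitude` and `longitude` column names are assumed to match the columns created in CartoDB.

```r
# Toy stand-in for outliersCDB: town A was georeferenced, B and C were not.
outliersCDB <- data.frame(
  town_name = c("A", "B", "C"),
  latitude = c(43.1, NA, NA),
  longitude = c(-7.2, NA, NA),
  stringsAsFactors = FALSE
)

# Coordinates written down by hand (placeholders, not real locations).
manual <- data.frame(
  town_name = c("B", "C"),
  latitude = c(42.5, 43.4),
  longitude = c(-8.1, -5.9),
  stringsAsFactors = FALSE
)

# Find the row of each manually located town and fill in its coordinates.
idx <- match(manual$town_name, outliersCDB$town_name)
stopifnot(!anyNA(idx))  # every manual town must exist in the dataset
outliersCDB$latitude[idx] <- manual$latitude
outliersCDB$longitude[idx] <- manual$longitude
```

Matching by name instead of by row number avoids writing a coordinate into the wrong row if the dataset gets reordered.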

Finally, the last step before visualizing consists of replacing the wrong coordinates in the fires_lonlat dataset with the right ones from the outliersCDB dataset:
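The replacement can be sketched like this, again with invented data. I assume here that `town_name` identifies each row uniquely in both datasets and that the columns are called `lon`/`lat` in one and `longitude`/`latitude` in the other; adapt the names to your own files.

```r
# Toy stand-ins for both datasets; all values are invented.
fires_lonlat <- data.frame(
  town_name = c("A", "B", "C", "D"),
  lon = c(-8.5, -70.2, -6.1, 120.0),
  lat = c(42.9, 18.5, 43.3, 14.6),
  fires = c(10, 3, 7, 1)
)
outliersCDB <- data.frame(
  town_name = c("B", "D"),
  longitude = c(-7.9, -6.8),
  latitude = c(42.3, 43.2)
)

# For each town in fires_lonlat, look up its row in outliersCDB (NA if absent).
idx <- match(fires_lonlat$town_name, outliersCDB$town_name)
fix <- !is.na(idx)

# Overwrite the coordinates only for the towns that were corrected.
fires_lonlat$lon[fix] <- outliersCDB$longitude[idx[fix]]
fires_lonlat$lat[fix] <- outliersCDB$latitude[idx[fix]]
```

If two towns share a name, match on a combination of columns (for instance town and region) instead, since `match()` only returns the first hit.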

Time to map! First, with R. Use the symbols() function to draw circles proportional to the number of fires per town.
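A minimal sketch of such a plot with base R's symbols() is shown below. The data are invented, and scaling the radius by the square root of the count is my own choice so that the circle area, rather than the radius, grows with the number of fires.

```r
# Toy data: three towns with invented coordinates and fire counts.
fires <- data.frame(
  lon = c(-8.5, -7.9, -6.1),
  lat = c(42.9, 42.3, 43.3),
  fires = c(10, 40, 90)
)

# Radius proportional to sqrt(count), so circle area is proportional to count.
radius <- sqrt(fires$fires / pi)

# inches = 0.25 rescales the symbols so the largest one measures 0.25 inches.
symbols(fires$lon, fires$lat, circles = radius, inches = 0.25,
        fg = "darkred", bg = rgb(1, 0, 0, 0.4),
        xlab = "Longitude", ylab = "Latitude",
        main = "Fires per town")
```

To draw the circles on top of an existing base map, call symbols() with `add = TRUE` after plotting the map.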

Nice. But give CartoDB a try too. You will discover that it has more elegant ways to visualize your data. Once more, load and connect your dataset. Then, create a map. Now it is up to you how to display the data. As the dataset consists of fires and is arranged in clear clusters, it seems a good idea to make a heatmap. So go to the wizard tab, select heatmap and change the color gradient with the CartoCSS editor according to your preferences. These are mine:

The result is shown above. You can clearly see the two clusters I was talking about: one in Galicia and the other in Asturias. Time to make decisions. Any comments?


GIS Analyst. Working at @CARTO and @ongawa4d.
