Top Beers in 2016.

After completing my previous post on food I wanted to work on something which I have started to explore recently,craft beer. A friend of mine introduced me to a beer club membership prior to which I never knew anything beyond the Corona’s . Then began the collection and here it is, what have so far.

This is an image

So when I started searching for the data google lead me to Beer Advocate. Below is how the raw html table from the website looked like.

This is an image

I have used R to scrape the table from the website using R. The library I am using here to scrape is Rvest. Below is the code on how to get the data.


                    # Enter the url below
                    url <- ""

                      beer <- url %>%
                      html() %>%

                      ## to get xpath for a table ,right click on the table,inspect,
                      ## go to the table tag ,right click again and go to copy xpath .. phew ...
                      ## not clear click here for <a href="">more details</a>

                      html_nodes(xpath = '//*[@id="extendedInfo"]/a[1]') %>%
                      beer <- beer[[1]]

                        file = "topus250.csv",
                        quote = TRUE,
                        sep = ",",
                        row.names = FALSE

Now that I got the scraped data and address parameter as the name of the brewing company , it looks something like this.

This is an image

The next step here is to get the address geocoded which would help me plot this on a map . For this I have used the library ggmap.


                 # Read in the CSV data and store it in a variable
                  origAddress <- read.csv("topus250.csv", stringsAsFactors = FALSE)

                  # Initialize the data frame
                  geocoded <- data.frame(stringsAsFactors = FALSE)

                  # Loop through the addresses to get the latitude and longitude of each address
                  # and add it to the origAddress data frame in new columns lat and lon

                  for(i in 1:nrow(origAddress))
                    # Print("Working...")
                    result <- geocode(origAddress$Address[i], output = "latlona", source = "google")
                    origAddress$lon[i] <- as.numeric(result[1])
                    origAddress$lat[i] <- as.numeric(result[2])
                    origAddress$geoAddress[i] <- as.character(result[3])
                    origAddress$state[i] <- as.character(result[3])

                  # Save the output as csv to the working directory
                  write.csv(result, file = geocoded.csv)

Now I got the data cleaned, gecoded and ready to plot it on the map. Another task …another library. Here I have used the leaflet library to add the basemap,plot the points , add clusters and markers to it. All it took was a couple of lines in R !! As a continuation to this project.



    lf <-
      read.csv("beer_lat_long.csv", stringsAsFactors = FALSE) # Brings in the file 'ctlist.csv'

    map <-
      leaflet(lf) %>% addTiles('http://{s}{z}/{x}/{y}.png',
                               attribution = 'Map tiles by <a href="">Stamen Design</a>, <a href="">CC BY 3.0</a> &mdash; Map data &copy; <a href="">OpenStreetMap</a>')

    map %>% setView(-95.712891, 37.09024, zoom = 5)

    #add cluster

    map %>% addMarkers(
      popup = paste(
        "Beer Name:",
      clusterOptions = markerClusterOptions()

Sadly we dont see many breweries in the top list from Texas, but then that gives me more the reasons to travel and visit many breweries. As I was doing this project I did realize that when I want to show the leaflet map using R in an html page this converts all the data loaded and that makes the html file heavy to load. So going on next I would to try do some more advanced visualization and representation of the same data using leaflet library and html.