Recently I visited Austin, and many of my friends had mentioned the variety of food options there.
So my wife and I decided to search for places to eat on the Foursquare app. Filtering the standard
search by high ratings, we ended up at pretty good places, and Foursquare alerted us to check-ins whenever we
reached a place. After the trip I wanted to see how many people check in using this app and how the check-ins
correlate with the ratings.
The first step was to get the data, so I started playing around with the Foursquare API
and working out the URL parameters for each category (food, places to see, etc.).
The authentication process for the Foursquare API was a bit tricky, but with my google-fu
(and a special mention to the GIS tribe) I was able to get going. Below is how you
get the client ID and client secret when you create a new app.
![This is an image](createfsqapi.png)
The idea was to do this for many places across the country, so I decided to use R to scrape and clean the data. You can find the code here.
```r
library(RJSONIO)
library(RCurl)

options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

# Zip code / lat-long data obtained from
# http://notebook.gaslampmedia.com/download-zip-code-latitude-longitude-city-state-county-csv/
ll = read.csv("zip_codes_states.csv", sep = ",", header = TRUE)

clientid = "ENTER YOUR CLIENT ID"
clientsecret = "ENTER YOUR CLIENT SECRET"

venue_name = c()
venue_lat = c()
venue_long = c()
venue_city = c()
venue_state = c()
venue_country = c()
venue_checkins = c()
venue_hasMenu = c()
venue_rating = c()

# Replace a NULL field with NA so the vectors all stay the same length
orNA = function(x) ifelse(is.null(x), NA, x)

# Go through the lat/longs in the csv and get the data
for (i in 1:dim(ll)[1]) {
  lat = ll$latitude[i]
  long = ll$longitude[i]
  # Build the query and parse the JSON response
  query = paste("https://api.foursquare.com/v2/venues/explore?client_id=", clientid,
                "&client_secret=", clientsecret, "&ll=", lat, ",", long,
                "&query=food&v=20170131", sep = "")
  result = getURL(query)
  data <- fromJSON(result)
  # For each result, save a bunch of fields; you can tweak this to your liking
  if (length(data$response$groups[[1]]$items) > 0) {
    for (r in 1:length(data$response$groups[[1]]$items)) {
      tmp = data$response$groups[[1]]$items[[r]]$venue
      venue_name = c(venue_name, orNA(tmp$name))
      venue_lat = c(venue_lat, orNA(tmp$location$lat))
      venue_long = c(venue_long, orNA(tmp$location$lng))
      venue_city = c(venue_city, orNA(tmp$location$city))
      venue_state = c(venue_state, orNA(tmp$location$state))
      venue_country = c(venue_country, orNA(tmp$location$country))
      venue_checkins = c(venue_checkins, orNA(tmp$stats$checkinsCount))
      venue_hasMenu = c(venue_hasMenu, orNA(tmp$hasMenu))
      venue_rating = c(venue_rating, orNA(tmp$rating))
    }
  }
}

# Save the raw output
save(venue_name, venue_lat, venue_long, venue_city, venue_state, venue_country,
     venue_checkins, venue_hasMenu, venue_rating, file = "venuesResult.RData")

# Put the results into a data frame
data = data.frame(name = venue_name, latitude = venue_lat, longitude = venue_long,
                  city = venue_city, state = venue_state,
                  checkins = venue_checkins, rating = venue_rating)
# Remove the duplicate results
dsub = subset(data, !duplicated(data))
# Export to csv, which can be used for the next step
write.csv(dsub, file = "Austin_Foursquare.csv", row.names = FALSE)
```
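Before handing the CSV over to the visualization step, a quick way to check how check-ins relate to ratings is to compute their correlation in R. Below is a minimal sketch; the data frame here is a hypothetical stand-in with the same columns as the exported Austin_Foursquare.csv, not real Foursquare results:

```r
# Hypothetical stand-in for the exported Austin_Foursquare.csv
dsub = data.frame(
  name = c("Taco Spot", "BBQ Joint", "Food Truck"),
  checkins = c(1200, 5400, 300),
  rating = c(8.1, 9.3, 7.4)
)

# Pearson correlation between check-in counts and ratings
cor(dsub$checkins, dsub$rating)
```

On the real export you would replace the toy data frame with `read.csv("Austin_Foursquare.csv")`.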
Once this was done, the next step was to figure out how to visualize the data. Since I have been trying my hand at d3.js, I
used the cleaned CSV output from R to display how check-ins and ratings vary for these places using a bubble chart.