R Leaflet Tutorial

Your own web map with R Leaflet


Overview

We’ve just created a whole bunch of data and so far it is just a bunch of shapefiles (or geojson files – depending on what you exported) and .csv files. We haven’t even had the chance to look at what we’ve made! While there are a dozen tools we could use to visualize and interact with our data (e.g. Tilemill, Mapbox Studio, CartoDB, Tableau, etc), we are going to stick with R for now and let it “gently” introduce us to the fundamentals of a web map.

Web Map Fundamentals

A number of great people out there have created an overview of web map fundamentals, let’s take a look and learn how it all works:

Alan McConchie & Beth Schechter! – anatomy of a web map

Maptime ATX – web map fundamentals

Joey’s Hello Web Maps Intro

Enter: R Leaflet

So if we know that to make a web map generally is composed of:

  1. map tiles (most of the time)leaflet_tiles
  1. geodatageodata
  2. html/css/javascript (your webstack)htmlcss.png

And we also know that:

  1. we know little or nothing about html/css/javascript
  2. we haven’t made any map tiles

THEN HOW THE HELL ARE WE SUPPOSED TO MAKE OUR OWN WEB MAP???

Well, turns out that others have also been in your same predicament and have developed a library to bring web mapping to R programmers.

Wait a second… so we can make a web map, without coding any html, css, or javascript?

YES! sort of. Some clever people got together and wrote a library in R that takes a very famous and awesome javascript library (yes there are libraries in javascript, and every other language out there!) called “Leaflet.js” and allows you to write R code and export a fully functional web map, with tiles and geodata drawn right on top!

Basically what happens is:

  1. you load up the “leaflet” library for R
  2. you write R code that then gets translated into html/css/javascript
  3. your translated code is written into an “.html” file which includes:
    • your html skeleton
    • the leaflet.js library and leaflet css style
    • your javascript which was translated from your R code

Our First Example

With 5 lines of code, we’re going to make our first interactive web map!

leaflet_test.png

Run these lines of code and create your first interactive map with R!

# install the leaflet library
install.packages("leaflet")

# add the leaflet library to your script
library(leaflet)
    
# initiate the leaflet instance and store it to a variable
m = leaflet()
    
# we want to add map tiles so we use the addTiles() function - the default is openstreetmap
m = addTiles(m)
    
# we can add markers by using the addMarkers() function
m = addMarkers(m, lng=-123.256168, lat=49.266063, popup="T")
    
# we can "run"/compile the map, by running the printing it
m

Now if we export the map and save as webpage…:

leaflet_webpage.png

The convention for naming .html files is “index”, therefore, let’s name our file: index.html

leaflet_html

CRAZY – with just 4 lines of code, you added a pin to a map that now works on the interwebz! Let’s scale this up and use our dataset and see what we can come up with!


Working with our 3-1-1 Data


hex_van

So let’s shift gears and work with our 3-1-1 data. We now we have a bunch of points for the month of January and hexagonal grids with the call densities. How do we begin to interact with the data?

What are the main interactions we are going to work with?:

  1. pop-up details (aka tool tips) on mouse click
  • reveal specific case type
    • reveal counts
  1. toggle layers on and off

** Let’s get started with a rather verbose first example**

Version 1: Working with Point Data

point_van.png

# reading in a csv file - our data_filter.csv from the 311 Tutorial
# this string tells my R where to find data_filter.csv on my laptop.
# you will need to change it to tell your R where to find your own
# file! 
filename = "/Users/andrasszeitz/Desktop/GEOB_472/data_filter.csv"
data_filter = read.csv(filename, header = TRUE)

# if you are not able to read in your own data, use this as a backup
# NOTE - you need to have the `curl` package loaded for this to work
data_filter = read.csv(curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails-geo-filtered.csv'), header=TRUE)

We’re going to use the subset() function to get store our data into separate variables

# subset out your data_filter:
df_graffiti_noise = subset(data_filter, cid == 1)
df_street_traffic_maint = subset(data_filter, cid == 2)
df_garbage_recycling_organics = subset(data_filter, cid == 3)
df_water = subset(data_filter, cid == 4)
df_animal_vegetation = subset(data_filter, cid == 5)
df_other = subset(data_filter, cid == 0)

Great, now each of our data categories lives in its own variable name.

Initiate Base Map Tiles

Now we’re going to initiate our leaflet map:

# initiate leaflet
m = leaflet()

# add openstreetmap tiles (default)
m = addTiles(m)

# you can now see that our maptiles are rendered
m

Define Symbology for Point Layers

Now we are going to define some colors using leaflet’s special colorFactor() function:

colorFactors = colorFactor(c('red', 'orange', 'purple', 'blue', 'pink', 'brown'),
                           domain = data_filter$cid)

What the colorFactor function does is take our list of colors and “maps” them to the domain that we defined. For example, when we apply the colorFactor() function to our data, it will color a point “red”, if the “cid” in the data is equal to “0”, “orange” if the “cid” is equal to 1, etc.

The next step is to add our points to the map. We can do so using the addCircleMarkers() function:

m = addCircleMarkers(m, 
                     lng = df_graffiti_noise$lon_offset, # we feed the longitude coordinates 
                     lat = df_graffiti_noise$lat_offset,
                     popup = df_graffiti_noise$Case_Type, 
                     radius = 2, 
                     stroke = FALSE, 
                     fillOpacity = 0.75, 
                     color = colorFactors(df_graffiti_noise$cid),
                     group = "1 - graffiti & noise"
                     )

Here’s what each argument is saying:

  • lng: we add our markers to our m
  • lat:we feed the latitude coordinates
  • popup:each feature will show the Case_Type on click
  • radius: we set the circle radius size
  • stroke: we set stroke to false
  • fillOpacity: we set a fill opacity
  • color: the colorFactors() function is applying the color based on the value of the “cid”
  • group:we name the group

Now repeat this function across the other layers:

m = addCircleMarkers(m, 
                     lng = df_street_traffic_maint$lon_offset, lat=df_street_traffic_maint$lat_offset, 
                     popup = df_street_traffic_maint$Case_Type, 
                     radius = 2, 
                     stroke = FALSE, fillOpacity = 0.75,
                     color = colorFactors(df_street_traffic_maint$cid),
                     group = "2 - street & traffic & maintenance")
m = addCircleMarkers(m, 
                     lng = df_garbage_recycling_organics$lon_offset, lat=df_garbage_recycling_organics$lat_offset, 
                     popup = df_garbage_recycling_organics$Case_Type, 
                     radius = 2, 
                     stroke = FALSE, fillOpacity = 0.75,
                     color = colorFactors(df_garbage_recycling_organics$cid),
                     group = "3 - garbage related")
m = addCircleMarkers(m, 
                     lng = df_water$lon_offset, lat=df_water$lat_offset, 
                     popup = df_water$Case_Type, 
                     radius = 2, 
                     stroke = FALSE, fillOpacity = 0.75,
                     color = colorFactors(df_water$cid),
                     group = "4 - water related")
m = addCircleMarkers(m, 
                     lng = df_animal_vegetation$lon_offset, lat=df_animal_vegetation$lat_offset, 
                     popup = df_animal_vegetation$Case_Type, 
                     radius = 2,
                     stroke = FALSE, fillOpacity = 0.75,
                     color = colorFactors(df_animal_vegetation$cid),
                     group = "5 - animals & vegetation")
m = addCircleMarkers(m, 
                     lng = df_other$lon_offset, lat=df_other$lat_offset, 
                     popup = df_other$Case_Type, 
                     radius = 2,
                     stroke = FALSE, fillOpacity = 0.75,
                     color = colorFactors(df_other$cid),
                     group = "0 - other")

Other Options for Base Map Tiles

Now we’re going to add some additional map tiles by using the addTiles() and the addProviderTiles() functions:

m = addTiles(m, group = "OSM (default)") 
m = addProviderTiles(m,"Stamen.Toner", group = "Toner")
m = addProviderTiles(m, "Stamen.TonerLite", group = "Toner Lite")

We can now add in our layer toggle control using the addLayersControl() function – notice baseGroups are our tileLayers and the overlayGroups are our data layers:

m = addLayersControl(m,
                     baseGroups = c("Toner Lite","Toner"),
                     overlayGroups = c("1 - graffiti & noise", "2 - street & traffic & maintenance",
                                       "3 - garbage related","4 - water related", "5 - animals & vegetation",
                                       "0 - other")
)

# make the map
m

Version 2: Optimizing version 1

point_van2.png

We already subset our data from before:

# subset out your data_filter:
df_graffiti_noise = subset(data_filter, cid == 1)
df_street_traffic_maint = subset(data_filter, cid == 2)
df_garbage_recycling_organics = subset(data_filter, cid == 3)
df_water = subset(data_filter, cid == 4)
df_animal_vegetation = subset(data_filter, cid == 5)
df_other = subset(data_filter, cid == 0)

Here’ we are:

  1. adding those variables to a list so we can loop through them
  2. creating a list of our layers that we want the toggle name to be
  3. looping though the list
# Now we're going to put those variables into a **list** so we can loop through them:
data_filterlist = list(df_graffiti_noise = subset(data_filter, cid == 1),
                df_street_traffic_maint = subset(data_filter, cid == 2),
                df_garbage_recycling_organics = subset(data_filter, cid == 3),
                df_water = subset(data_filter, cid == 4),
                df_animal_vegetation = subset(data_filter, cid == 5),
                df_other = subset(data_filter, cid == 0))

# Remember we also had these groups associated with each variable? Let's put them in a list too:
layerlist = c("1 - graffiti & noise", "2 - street & traffic & maintenance",
              "3 - garbage related","4 - water related", "5 - animals & vegetation",
              "0 - other")

# We can keep that same color variable:
colorFactors = colorFactor(c('red', 'orange', 'purple', 'blue', 'pink', 'brown'), domain=data_filter$cid)

# Now we have our loop - each time through the loop, it is adding our markers to the map object:
for (i in 1:length(data_filterlist)){
  m = addCircleMarkers(m, 
                        lng=data_filterlist[[i]]$lon_offset,
                        lat=data_filterlist[[i]]$lat_offset, 
                        popup=data_filterlist[[i]]$Case_Type, 
                        radius=2,
                        stroke = FALSE, 
                        fillOpacity = 0.75,
                        color = colorFactors(data_filterlist[[i]]$cid),
                        group = layerlist[i]
              )
}

Now this time we flip “overlayGroups” and “baseGroups” so that we can get the radiobutton functionality – that way only 1 category of calls are shown:

m = addTiles(m, "Stamen.TonerLite", group = "Toner Lite") 
m = addLayersControl(m,
                      overlayGroups = c("Toner Lite"),
                      baseGroups = layerlist
                      )
m

Version 3: Working with Polygons

hex_van2.png

Remember we creates that hexagonal grid to aggregate our data to? Well, let’s work with it to get a general impression of the call density across the city:

If you haven’t already, you should have the rgdal and the GISTools libraries loaded:

library(rgdal)
library(GISTools)

Now we need to read in our hexgrid we aggregted our data to:

# define the filepath
hex_1401_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson'

# read in the geojson
hex_1401 = readOGR(hex_1401_fn, "OGRGeoJSON")
## OGR data source with driver: GeoJSON 
## Source: "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson", layer: "OGRGeoJSON"
## with 1418 features
## It has 5 fields

Now initiate a new map object but this time with the Stamen Toner lite style:

m = leaflet()
m = addProviderTiles(m, "Stamen.TonerLite", group = "Toner Lite")

Like any choropleth map, we need to set a color scale. We can do so by using the colorNumeric() function which is part of the R leaflet package.

We use the “Greens” color and set the “domain” to the column called “data” in our geojson file.

Here’s the exciting stuff. Let’s add our polygon to the map:

# Create a continuous palette function
pal = colorNumeric(
  palette = "Greens",
  domain = hex_1401$data
)

# add the polygons to the map
m = addPolygons(m, 
                data = hex_1401,
                stroke = FALSE, 
                smoothFactor = 0.2, 
                fillOpacity = 1,
                color = ~pal(hex_1401$data),
                popup = paste("Number of calls: ", hex_1401$data, sep="")
                )

let’s break this down:

  • m: this is our map object
  • data: this is our spatialPolygonsDataframe
  • smoothFactor: adding some smooth operator
  • fillOpacity: changing how transparent object fill is
  • color: we let our colorNumeric() function handle the color
  • popup: we set popup on click to our call density

And what is a choropleth map without a legend? Let’s add one using the addLegend() function:

m = addLegend(m, "bottomright", pal = pal, values = hex_1401$data,
              title = "3-1-1 call density",
              labFormat = labelFormat(prefix = " "),
              opacity = 0.75
)

m

Version 3: Synthesis

hex_van3.png

Now that we’ve got the two pieces of our map – toggleable & clickable point layers and hex grid – let’s put them together.

# initiate leaflet map layer
m = leaflet()
m = addProviderTiles(m, "Stamen.TonerLite", group = "Toner Lite") 

# --- hex grid --- #
# store the file name of the geojson hex grid
hex_1401_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson'

# read in the data
hex_1401 = readOGR(hex_1401_fn, "OGRGeoJSON")
## OGR data source with driver: GeoJSON 
## Source: "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m_1401_counts.geojson", layer: "OGRGeoJSON"
## with 1418 features
## It has 5 fields
# Create a continuous palette function
pal = colorNumeric(
  palette = "Greens",
  domain = hex_1401$data
)

# add hex grid
m = addPolygons(m, 
                data = hex_1401,
                stroke = FALSE, 
                smoothFactor = 0.2, 
                fillOpacity = 1,
                color = ~pal(hex_1401$data),
                popup= paste("Number of calls: ",hex_1401$data, sep=""),
                group = "hex"
)

# add legend
m = addLegend(m, "bottomright", pal = pal, values = hex_1401$data,
              title = "3-1-1 call density",
              labFormat = labelFormat(prefix = " "),
              opacity = 0.75
)

# --- points data --- #

# subset out your data:
df_graffiti_noise = subset(data_filter, cid == 1)
df_street_traffic_maint = subset(data_filter, cid == 2)
df_garbage_recycling_organics = subset(data_filter, cid == 3)
df_water = subset(data_filter, cid == 4)
df_animal_vegetation = subset(data_filter, cid == 5)
df_other = subset(data_filter, cid == 0)

data_filterlist = list(df_graffiti_noise = subset(data_filter, cid == 1),
                df_street_traffic_maint = subset(data_filter, cid == 2),
                df_garbage_recycling_organics = subset(data_filter, cid == 3),
                df_water = subset(data_filter, cid == 4),
                df_animal_vegetation = subset(data_filter, cid == 5),
                df_other = subset(data_filter, cid == 0))
layerlist = c("1 - graffiti & noise", "2 - street & traffic & maintenance",
              "3 - garbage related","4 - water related", "5 - animals & vegetation",
              "0 - other")


colorFactors = colorFactor(c('red', 'orange', 'purple', 'blue', 'pink', 'brown'), domain=data_filter$cid)
for (i in 1:length(data_filterlist)){
  m = addCircleMarkers(m, 
                       lng=data_filterlist[[i]]$lon_offset, lat=data_filterlist[[i]]$lat_offset, 
                       popup=data_filterlist[[i]]$Case_Type, 
                       radius=2,
                       stroke = FALSE, fillOpacity = 0.75,
                       color = colorFactors(data_filterlist[[i]]$cid),
                       group = layerlist[i]
  )

}


m = addLayersControl(m,
                     overlayGroups = c("Toner Lite", "hex"),
                     baseGroups = layerlist
)

# show map
m

Congratulations you just made your first super awesome and fully functional web map using R!

Now, play with your maps and try to identify weaknesses and strengths of your map. Are the colors appropriate? could they be better? what’s missing? REFINE YOUR STUFF


Review:


In this tutorial we covered a lot of ground. We’re neither masters of R nor data viz experts, however we got the chance to go through the entire data viz pipeline:

  • acquire: pulled 3-1-1 data from github
  • parse: manipulated table data and geocoded the data
  • filter: removed points that fell outside of the bounding box of Vancouver
  • mine: classified the data & added jitter
  • represent: web mapping
  • refine: web mapping
  • interact: web mapping
Advertisements

R – Vancouver 311 Tutorial

Created by: Joey K Lee

Edited by: Andras Szeitz


Project: What are Vancouverites Complaining About?

Brief:

The City of Vancouver releases a dataset of the 3-1-1 phone calls – the general hotline regarding maintenance issues in the city. Currently there is no tool to visualize and access the data. How can citizens engage the city for these matters if there’s no way to work with the data? In the name of civic “hacking”, the project brief is to develop a project that:

  1. can parse and handle the BIG data being delivered by the city.
  2. shows the 3-1-1 data and potentially highlights insights derived from the data in an accessible web application.

Overview:

We will use this project to go through Ben Fry’s Data Visualization Pipeline. The final script generated by going through the steps of the data viz pipeline will be a good foundation to use for the upcoming R Assignment.


Process: Exploring the 3-1-1 data

Data viz is hard and in the end comes down to a lot of experimentation and exploration. This script attempts to showcase how the data viz pipeline is done in practice and how it is far from a linear process, but rather a very interactive and dynamic process.

setup:

Let’s being our script with a nice commented header:

######################################################
# Vancouver 3-1-1: Data Processing Script
# Date:
# By: 
# Desc: 
######################################################

We noticed how libraries can help us to read in geographic data and even help us make new scales. Since this is a bigger project, we’re going to need the help of some more libraries:


# ------------------------------------------------------------------ #
# ---------------------- Install Libraries ------------------------- #
# ------------------------------------------------------------------ #
install.packages("GISTools")
install.packages("RJSONIO")
install.packages("rgdal")
install.packages("RCurl")
install.packages("curl")
  
# Unused Libraries:
# install.packages("ggmap")
# library(ggmap)

You will notice we have some unused libraries – these are some that I started out using in the beginning, but decided to not use. I kept them here just for future reference. NOTE: ggmap was previously used for it’s geocode() function – but with google’s 2500 api call limit, it wasn’t enough for the 10,000+ geocoding events we would need for our project.

Now that you’ve installed the libraries now Let’s load up our libraries to make all those new functions available to us:

# ------------------------------------------------------------------ #
# ----------------------- Load Libararies -------------------------- #
# ------------------------------------------------------------------ #
library(GISTools)
library(RJSONIO)
library(rgdal)
library(RCurl)
library(curl)

Acquire

We use the curl() function to make http/https requests from the web to get data and used our read.csv() function to read our table in to R.

# ------------------------------------------------------------------ #
# ---------------------------- Acquire ----------------------------- #
# ------------------------------------------------------------------ #
# access from the interwebz using "curl"
fname = curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails.csv')

# Read data as csv
data = read.csv(fname, header=T)

# inspect your data
print(head(data))
##   Year Month Day Hour Minute           Department
## 1 2014     1   1    7     21       CSG - Licenses
## 2 2014     1   1    7     45    ENG - Solid Waste
## 3 2014     1   1    9     27    ENG - Solid Waste
## 4 2014     1   1   10      6    ENG - Solid Waste
## 5 2014     1   1   10     15 PRB - Administration
## 6 2014     1   1   10     17        ENG - Streets
##                            Division
## 1                    Animal Control
## 2                        Sanitation
## 3                        Sanitation
## 4                        Sanitation
## 5    General Park Board Information
## 6 Traffic and Electrical Operations
##                                          Case_Type Hundred_Block
## 1                          Dead Animal Pickup Case  Intersection
## 2                            Missed Garbage Pickup          22##
## 3 Abandoned Garbage Pickup - City Property & Parks          10##
## 4                            Missed Garbage Pickup          27##
## 5                               PRB_Park Ranger SR          11##
## 6                                    Sign - Repair          8600
##                      Street_Name    Local_Area
## 1 E 64TH AV and PRINCE ALBERT ST        Sunset
## 2                SW MARINE DRIVE    Kerrisdale
## 3                      W 12TH AV      Fairview
## 4                      W 16TH AV Arbutus Ridge
## 5                   W CORDOVA ST      Downtown
## 6            - 8699 GRANVILLE ST       Marpole

Parse

Upon inspecting our data, we notice we have the addresses, but the city has put int “#’s” to help with the anonymity of the callers. Furthermore, we notice that we don’t have any lat/lon coordinates to work with to turn our 3-1-1 calls into a something spatial. How can we develop a solution for this?

First let’s sub out the “#” with “0”:

# ------------------------------------------------- #
# -------------- Parse: Geocoder ------------------- #
# ------------------------------------------------- #
# change intersection to 00's
data$h_block = gsub("#", "0", data$Hundred_Block)
print(head(data$h_block))
## [1] "Intersection" "2200"         "1000"         "2700"        
## [5] "1100"         "8600"

Next let’s concatenate the newly created “h_block” column with the Street_Name column, and a string that specifices that all of the calls are from Vancouver, BC:

# Join the strings from each column together & add "Vancouver, BC":
data$full_address = paste(data$h_block, 
                          paste(data$Street_Name,
                                "Vancouver, BC",
                                sep=", "),
                          sep=" ")

We also notice that the city has put in the word “intersection” for those calls that refer to an intersection. Let’s take those out so as to make our geocoding parsing potentially easier:

# removing "Intersection " from the full_address entries
data$full_address = gsub("Intersection ", "", data$full_address)
print(head(data$full_address))
## [1] "E 64TH AV and PRINCE ALBERT ST, Vancouver, BC"
## [2] "2200 SW MARINE DRIVE, Vancouver, BC"           
## [3] "1000 W 12TH AV, Vancouver, BC"                 
## [4] "2700 W 16TH AV, Vancouver, BC"                 
## [5] "1100 W CORDOVA ST, Vancouver, BC"              
## [6] "8600  - 8699 GRANVILLE ST, Vancouver, BC"

Now, we need to convert the street addresses into latitude & longitude coordinates so that we have a spatial reference to where these calls occurred. We will use the BC Government’s geocoding API, which will take our street addresses and return lat/long coordinates. But, the whole dataset has just over 10,000 rows (each one is an individual call to 311) and geocoding the entire dataset takes about 1 hour.  We will reduce the size of the dataset so that we can actually geocode the addresses in a realistic amount of time during the tutorial. But for your assignment, you will have to geocode the entire dataset!

# creating a random sequence of 1000 numbers between 1 and the length
# of the dataset. These numbers will be the rows we retain from the
# full dataset in order to more quickly geocode during the tutorial
set.seed(1)
keepers = sample(seq(from = 1, to = 10195, by = 1), size = 1000, replace = TRUE)

# creating a new variable to hold the full data set, then trimming the 
# original dataset to have only the 1000 random rows
full_data = data
data = data[keepers,] 

But before we can geocode the street addresses, we need to create a function to do the task! The BC government’s geocoding API works by taking a URL that has a street address as a search term, processing it, and returning a web page with the latitude and longitude of the street address (plus a bunch of other information). The code block below defines a function in R, and after we define the function, we can use it and R to process all of our street addresses in one go!

# a function taking a full address string, formatting it, and making
# a call to the BC government's geocoding API
bc_geocode = function(search){
  # return a warning message if input is not a character string
  if(!is.character(search)){stop("'search' must be a character string")}
  
  # formatting characters that need to be escaped in the URL, ie:
  # substituting spaces ' ' for '%20'.
  search = RCurl::curlEscape(search)
  
  # first portion of the API call URL
  base_url = "http://apps.gov.bc.ca/pub/geocoder/addresses.json?addressString="

  # constant end of the API call URL
  url_tail = "&locationDescriptor=any&maxResults=1&interpolation=adaptive&echo=true&setBack=0&outputSRS=4326&minScore=1&provinceCode=BC"
  
  # combining the URL segments into one string
  final_url = paste0(base_url, search, url_tail)
  
  # making the call to the geocoding API by getting the response from the URL
  response = RCurl::getURL(final_url)
  
  # parsing the JSON response into an R list
  response_parsed = RJSONIO::fromJSON(response)
  
  # if there are coordinates in the response, assign them to `geocoords`
  if(length(response_parsed$features[[1]]$geometry[[3]]) > 0){
    geocoords = list(lon = response_parsed$features[[1]]$geometry[[3]][1],
                      lat = response_parsed$features[[1]]$geometry[[3]][2])
  }else{
    geocoords = NA
  }
  
  # returns the `geocoords` object
  return(geocoords)
}

Now the function is defined. Let’s use it to geocode the 1000 rows in the dataset that we retained.

# Geocode the events - we use the BC Government's geocoding API
# Create an empty vector for lat and lon coordinates
lat = c() 
lon = c()

# loop through the addresses
for(i in 1:length(data$full_address)){
  # store the address at index "i" as a character
  address = data$full_address[i]
  # append the latitude of the geocoded address to the lat vector
  lat = c(lat, bc_geocode(address)$lat)
  # append the longitude of the geocoded address to the lon vector
  lon = c(lon, bc_geocode(address)$lon)
  # at each iteration through the loop, print the coordinates - takes about 20 min.
  print(paste("#", i, ", ", lat[i], lon[i], sep = ","))
}

After a major computational task (e.g. geocoding) you probably want to write your file out so that you keep a copy of your data at this step. In side the quotes for the variable ofile, put in the path to where you want your data stored.

# add the lat lon coordinates to the dataframe
data$lat = lat
data$lon = lon

# after geocoding, it's a good idea to write your file out!
# you will need to modify the directory where you want R to write
# your data!
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo.csv'
ofile = "/Users/andrasszeitz/Desktop/GEOB_472/geocoded2.csv" 
write.csv(data, ofile)

Mine

Creating Categories or Classification

We won’t be doing any heavy analysis on the data, but we will try to tease out some categories to make the data more intuitive for exploring. The first thing we’ll do is to see what are the types of unique cases that are being called in:

# ------------------------------------------------- #
# --------------------- Mine ---------------------- #
# ------------------------------------------------- #
# --- Examine the unique cases --- #

# examine how the cases are grouped - are these intuitive?
unique(data$Department)
##  [1] CSG - Licenses                          
##  [2] ENG - Solid Waste                       
##  [3] PRB - Administration                    
##  [4] ENG - Streets                           
##  [5] ENG - Transportation                    
##  [6] ENG - Water & Sewer                     
##  [7] PRB - Planning and Operations           
##  [8] Business Planning & Services            
##  [9] Public Safety - Fire                    
## [10] CSG - Inspections                       
## [11] CSG - Licenses & Inspections            
## [12] Interdepartmental Initiatives           
## [13] ENG - Office of the Deputy City Engineer
## [14] City Administration                     
## 14 Levels: Business Planning & Services ... Public Safety - Fire
unique(data$Division)
##  [1] Animal Control                                    
##  [2] Sanitation                                        
##  [3] General Park Board Information                    
##  [4] Traffic and Electrical Operations                 
##  [5] Neighbourhood Parking and Transportation          
##  [6] Traffic and Data Management                       
##  [7] Street Activities - Integrated Graffiti Management
##  [8] Street Activities - Streets Furniture             
##  [9] Water Operations                                  
## [10] Streets Design - P & D - Accessibility            
## [11] Street Trees                                      
## [12] 311 Contact Centre                                
## [13] Sewers Operations                                 
## [14] Prevention                                        
## [15] Electrical                                        
## [16] Property Use                                      
## [17] Administration Branch                             
## [18] License Office                                    
## [19] Streets Operations                                
## [20] Solid Waste Management                            
## [21] Parking Ops & Enforcement                         
## [22] Vegetation                                        
## [23] Building                                          
## [24] Snow Angel Program                                
## [25] Plumbing and Gas                                  
## [26] Kent Construction Supplies and Services           
## [27] Elections                                         
## [28] Communications                                    
## [29] Street Activities - Streets Horticulture          
## [30] Sewers and Drainage Design                        
## [31] Sewer Separation                                  
## [32] Water Design                                      
## [33] Transfer and Landfill Operations                  
## [34] Active Transportation                             
## 34 Levels: 311 Contact Centre ... Water Operations
# examine the types of cases - can we make new groups that are more useful?
# Print each unique case on a new line for easier inspection
for (i in 1:length(unique(data$Case_Type))){
  print(unique(data$Case_Type)[i], max.levels=0)
}
## [1] Dead Animal Pickup Case
## [1] Missed Garbage Pickup
## [1] Abandoned Garbage Pickup - City Property & Parks
## [1] PRB_Park Ranger SR
## [1] Sign - Repair
## [1] Animal Control General Inquiry Case
## [1] Lost Pets Case
## [1] Residential Parking Requests
## [1] Missed Yard Trimmings and Food Scraps Pickup
## [1] Traffic & Pedestrian Signal - New
## [1] Graffiti Removal - City Property
## [1] Poster/Sign Removal Request
## [1] Street Furniture Repair and Maintenance Request
## [1] Water Leaks/Breaks
## [1] Street Cleaning & Debris Pickup
## [1] Wheelchair Curb/Ramp Request
## [1] Recycling Bag Request
## [1] Animal Complaint - Non-Emergency Case
## [1] Street Light - Out
## [1] Missed Recycling Pickup
## [1] Missed Apartment Recycling Pickup
## [1] Street Tree Work Request SR
## [1] Citizen Feedback
## [1] Cart - Green (Yard Trimmings and Food Scraps)
## [1] Water Service Turn On/Off Request
## [1] Sewer Pipe Inquiries
## [1] Fire Reinspection Request for Firehall
## [1] Street Litter Can Request
## [1] Electrical Inspection Cancellation Case
## [1] Cart - Garbage
## [1] Fire Reinspection Request for Inspector
## [1] Collection Calendar Mail-Out Request
## [1] PUI Noise Complaint Case
## [1] Building Plans Information Request
## [1] Licence Payment Request Case
## [1] Streets - General Issues
## [1] Street Light - Pole Repair
## [1] Water Service Locate Request
## [1] Recycling Box Request
## [1] Gone Out of Business Case
## [1] Street - Surface Water Flooding
## [1] Blue Box and Leaf Removal Guide Mail-Out Request
## [1] Illegal Dumping/Abandoned Garbage Pickup
## [1] Sidewalk - Repair
## [1] Abandoned Vehicle Request
## [1] Cart - Apartment Recycling
## [1] Water Work Site Complaint
## [1] Parks Litter Can or Cart Request
## [1] Water Hydrant Issue
## [1] Secondary Suite Information Request
## [1] Vegetation Maintenance SR
## [1] Street - Repair
## [1] Building Inspection Cancellation Case
## [1] PUI Noise General Inquiry Case
## [1] Snow Angel Program - Individual Volunteer
## [1] Street Light - New/Relocation
## [1] FPB_General Inquiry Case
## [1] PUI General Inquiry Case
## [1] Pothole - Repair
## [1] Trees and Vegetation Encroachment - City Property
## [1] Holding Stray Case
## [1] Green Bin Program - Feedback and General Inquiry
## [1] Traffic Calming Request
## [1] Traffic & Pedestrian Signal - Modify
## [1] Catch Basin Issues
## [1] Snow & Ice Removal - City Property
## [1] Plumbing and Gas Inspection Cancellation Case
## [1] Street Light - Flat Glass Fixture Request
## [1] General Information Request SR
## [1] Traffic Sign - Modify
## [1] Sewer Manhole Issues
## [1] Snow and Ice Removal - Sidewalk Bylaw Violation
## [1] Water General Inquiry
## [1] Sewer General Inquiries
## [1] Curbside Sign - New
## [1] Occupancy Permit Information Request
## [1] Graffiti Removal - External Organization
## [1] Water General Work Request
## [1] Pavement Marking - Repair
## [1] Election General Concerns
## [1] Home Safety Check Request Case
## [1] Apartment Recycling - Registration Request
## [1] Street and Traffic Light - Utility Damage
## [1] Sewer Odour Complaints
## [1] Horticulture Inquiry on Right-of-Way
## [1] Traffic Sign - New
## [1] Sewer Design General Inquiries
## [1] Homelessness/Transient Issue
## [1] Banner Request
## [1] Sewer Separation Inspection Cancellation Case
## [1] Boulevard Maintenance Issues
## [1] Parking Meter Requests
## [1] Bridges & Structures - Repair
## [1] Street Sign - New
## [1] Animal Cremation Case
## [1] Dead Skunk Pickup
## [1] Water Pressure or No Water Issue
## [1] Water Conservation Violation
## [1] Truck Violation
## [1] Water Meter Issue
## [1] Flag Request
## [1] Pavement Markings Request - New/Modify
## [1] Snow and Ice Removal - Sidewalk Bylaw Inquiry
## [1] Transfer Station & Recycling - General Inquiries
## [1] Curbside Sign - Modify
## [1] Crosswalk Marking - New
## [1] Chafer Beetle Feedback
## [1] Water Damage To City Water System
## [1] Traffic Count Request
## [1] Sewer Utility Damage
## [1] Bicycle Route Map Request

Whoa! That’s a heck of a lot of cases. Our role of data visualization people is to take this mess and try to make sense of it so that we can represent it in a manner that is more intuitive and possibly delightful. There are a bunch of ways that we can organize these cases into specific classes, so I invite you to think about how we might best organize these. For now, I’ve come up with 6 classes that bin these cases together. Does this make sense? Should we change it?:

# --- graffiti and noise --- #
Graffiti Removal - City Property
Graffiti Removal - External Organization
PUI Noise Complaint Case
PUI Noise General Inquiry Case

# --- street surface & maintenance --- #
Street Furniture Repair and Maintenance Request
Street Cleaning & Debris Pickup
Street Light - Out
Street Tree Work Request SR
Street Litter Can Request
Streets - General Issues
Street Light - Pole Repair
Street - Surface Water Flooding
Street - Repair
Street Light - New/Relocation
Street Light - Flat Glass Fixture Request
Street and Traffic Light - Utility Damage
Street Sign - New
Crosswalk Marking - New
Boulevard Maintenance Issues
Bicycle Route Map Request
Sidewalk - Repair
Pothole - Repair
Pavement Markings Request - New/Modify
Pavement Marking - Repair
Sewer Pipe Inquiries
Sewer Manhole Issues
Sewer General Inquiries
Sewer Design General Inquiries
Sewer Separation Inspection Cancellation Case
Sewer Utility Damage
Sewer Odour Complaints
Plumbing and Gas Inspection Cancellation Case
Snow Angel Program - Individual Volunteer
Snow & Ice Removal - City Property
Snow and Ice Removal - Sidewalk Bylaw Violation
Snow and Ice Removal - Sidewalk Bylaw Inquiry
Traffic & Pedestrian Signal - New
Traffic Calming Request
Traffic & Pedestrian Signal - Modify
Traffic Sign - Modify
Street and Traffic Light - Utility Damage
Traffic Sign - New
Traffic Count Request
Truck Violation
Residential Parking Requests
Parking Meter Requests
Abandoned Vehicle Request

# --- garbage, recycling & organics --- #
Missed Garbage Pickup
Abandoned Garbage Pickup - City Property & Parks
Cart - Garbage
Illegal Dumping/Abandoned Garbage Pickup
Parks Litter Can or Cart Request
Recycling Bag Request
Missed Recycling Pickup
Missed Apartment Recycling Pickup
Recycling Box Request
Cart - Apartment Recycling
Apartment Recycling - Registration Request
Transfer Station & Recycling - General Inquiries
Blue Box and Leaf Removal Guide Mail-Out Request
Missed Yard Trimmings and Food Scraps Pickup
Cart - Green (Yard Trimmings and Food Scraps)
Green Bin Program - Feedback and General Inquiry
Collection Calendar Mail-Out Request

# --- water related issues --- #
Water Leaks/Breaks
Water Service Turn On/Off Request
Water Service Locate Request
Street - Surface Water Flooding
Water Work Site Complaint
Water Hydrant Issue
Water General Inquiry
Water General Work Request
Water Pressure or No Water Issue
Water Conservation Violation
Water Meter Issue
Water Damage To City Water System
Catch Basin Issues

# --- animal and vegetation related --- #
Dead Animal Pickup Case
Animal Control General Inquiry Case
Animal Complaint - Non-Emergency Case
Animal Cremation Case
Dead Skunk Pickup
Lost Pets Case
Holding Stray Case
Chafer Beetle Feedback
Vegetation Maintenance SR
Trees and Vegetation Encroachment - City Property
Horticulture Inquiry on Right-of-Way

# --- other --- #
Poster/Sign Removal Request
Sign - Repair
Curbside Sign - New
Curbside Sign - Modify
Banner Request
Fire Reinspection Request for Firehall
Fire Reinspection Request for Inspector
Citizen Feedback
Wheelchair Curb/Ramp Request
Wheelchair
PRB_Park Ranger SR
Building Plans Information Request
Building Inspection Cancellation Case
Licence Payment Request Case
Gone Out of Business Case
FPB_General Inquiry Case
PUI General Inquiry Case
Electrical Inspection Cancellation Case
Bridges & Structures - Repair
Secondary Suite Information Request
General Information Request SR
Election General Concerns
Occupancy Permit Information Request
Home Safety Check Request Case
Flag Request
Homelessness/Transient Issue

In the end our 6 classes are:

  1. graffiti and noise
  2. street surface & maintenance
  3. animal and vegetation related
  4. water related
  5. garbage, recycling & organics related
  6. other

Now that we have our classes, we can put them into vectors:

# graffiti and noise
graffiti_noise = c('Graffiti Removal - City Property',
                    'Graffiti Removal - External Organization',
                    'PUI Noise Complaint Case',
                    'PUI Noise General Inquiry Case')

# street surface & maintenance
street_traffic_maint = c('Street Furniture Repair and Maintenance Request',
                          'Street Cleaning & Debris Pickup',
                          'Street Light - Out',
                          'Street Tree Work Request SR',
                          'Street Litter Can Request',
                          'Streets - General Issues',
                          'Street Light - Pole Repair',
                          'Street - Surface Water Flooding',
                          'Street - Repair', 
                          'Street Light - New/Relocation',
                          'Street Light - Flat Glass Fixture Request',
                          'Street and Traffic Light - Utility Damage',
                          'Street Sign - New',
                          'Crosswalk Marking - New',
                          'Boulevard Maintenance Issues',
                          'Bicycle Route Map Request',
                          'Sidewalk - Repair',
                          'Pothole - Repair',
                          'Pavement Markings Request - New/Modify',
                          'Pavement Marking - Repair',
                          'Sewer Pipe Inquiries',
                          'Sewer Manhole Issues',
                          'Sewer General Inquiries',
                          'Sewer Design General Inquiries',
                          'Sewer Separation Inspection Cancellation Case',
                          'Sewer Utility Damage',
                          'Sewer Odour Complaints',
                          'Plumbing and Gas Inspection Cancellation Case',
                          'Snow Angel Program - Individual Volunteer',
                          'Snow & Ice Removal - City Property',
                          'Snow and Ice Removal - Sidewalk Bylaw Violation',
                          'Snow and Ice Removal - Sidewalk Bylaw Inquiry',
                          'Traffic & Pedestrian Signal - New',
                          'Traffic Calming Request',
                          'Traffic & Pedestrian Signal - Modify',
                          'Traffic Sign - Modify',
                          'Street and Traffic Light - Utility Damage',
                          'Traffic Sign - New',
                          'Traffic Count Request','Truck Violation',
                          'Residential Parking Requests',
                          'Parking Meter Requests',
                          'Abandoned Vehicle Request')

# garbage, recycling & organics related
garbage_recycling_organics = c('Missed Garbage Pickup',
                                'Abandoned Garbage Pickup - City Property & Parks',
                                'Cart - Garbage',
                                'Illegal Dumping/Abandoned Garbage Pickup',
                                'Parks Litter Can or Cart Request',
                                'Recycling Bag Request',
                                'Missed Recycling Pickup',
                                'Missed Apartment Recycling Pickup',
                                'Recycling Box Request',
                                'Cart - Apartment Recycling',
                                'Apartment Recycling - Registration Request',
                                'Transfer Station & Recycling - General Inquiries',
                                'Blue Box and Leaf Removal Guide Mail-Out Request',
                                'Missed Yard Trimmings and Food Scraps Pickup',
                                'Cart - Green (Yard Trimmings and Food Scraps)',
                                'Green Bin Program - Feedback and General Inquiry',
                                'Collection Calendar Mail-Out Request')

# water related
water = c('Water Leaks/Breaks',
           'Water Service Turn On/Off Request',
           'Water Service Locate Request',
           'Street - Surface Water Flooding',
           'Water Work Site Complaint',
           'Water Hydrant Issue', 'Water General Inquiry',
           'Water General Work Request',
           'Water Pressure or No Water Issue',
           'Water Conservation Violation',
           'Water Meter Issue',
           'Water Damage To City Water System',
           'Catch Basin Issues')

# animal and vegetation related
animal_vegetation = c('Dead Animal Pickup Case',
                       'Animal Control General Inquiry Case',
                       'Animal Complaint - Non-Emergency Case',
                       'Animal Cremation Case',
                       'Dead Skunk Pickup','Lost Pets Case',
                       'Holding Stray Case',
                       'Chafer Beetle Feedback',
                       'Vegetation Maintenance SR',
                       'Trees and Vegetation Encroachment - City Property',
                       'Horticulture Inquiry on Right-of-Way')

# other
other = c('Poster/Sign Removal Request',
           'Sign - Repair',
           'Curbside Sign - New',
           'Curbside Sign - Modify',
           'Banner Request',
           'Fire Reinspection Request for Firehall',
           'Fire Reinspection Request for Inspector',
           'Citizen Feedback',
           'Wheelchair Curb/Ramp Request','Wheelchair',
           'PRB_Park Ranger SR',
           'Building Plans Information Request',
           'Building Inspection Cancellation Case',
           'Licence Payment Request Case',
           'Gone Out of Business Case',
           'FPB_General Inquiry Case',
           'PUI General Inquiry Case',
           'Electrical Inspection Cancellation Case',
           'Bridges & Structures - Repair',
           'Secondary Suite Information Request',
           'General Information Request SR',
           'Election General Concerns',
           'Occupancy Permit Information Request',
           'Home Safety Check Request Case',
           'Flag Request',
           'Homelessness/Transient Issue')

Assigning Categories a Unique Id (CID)

We can then run through the lists and give each of the classes and cid so that it is easier to identify and represent them later:

# give class id numbers:
data$cid = 9999
for(i in 1:length(data$Case_Type)){
  if(data$Case_Type[i] %in% graffiti_noise){
    data$cid[i] = 1    
  }else if(data$Case_Type[i] %in% street_traffic_maint){
    data$cid[i] = 2   
  }else if(data$Case_Type[i] %in% garbage_recycling_organics){
    data$cid[i] = 3   
  }else if(data$Case_Type[i] %in% water){
    data$cid[i] = 4   
  }else if(data$Case_Type[i] %in% animal_vegetation){
    data$cid[i] = 5   
  }else{
    data$cid[i] = 0   
  }
}

Great, so we now have our data binned into specific groups in a way that seems to make sense. However, if do a little poking around at our data, we notice that since our addresses are aggregated a lot of times to the same address point, our lat/lon coordinates come out the same. How can we deal with this spatial overlap?

Jitter: Offsetting Overlapping Points

One way to do this is to add in some “jitter” to each point if it happens to have the same coordinates.

# --- handle overlapping points --- #
# Set offset for points in same location:
data$lat_offset = data$lat
data$lon_offset = data$lon

# Run loop - if value overlaps, offset it by a random number
for(i in 1:length(data$lat)){
  if ( (data$lat_offset[i] %in% data$lat_offset) && (data$lon_offset[i] %in% data$lon_offset)){
    data$lat_offset[i] = data$lat_offset[i] + runif(1, 0.0001, 0.0005)
    data$lon_offset[i] = data$lon_offset[i] + runif(1, 0.0001, 0.0005)
  } 
}

To derive some more insight into the data, how about looking at the top calls:

# --- what are the top calls? --- #
# get a frequency distribution of the calls:
top_calls = data.frame(table(data$Case_Type))
top_calls = top_calls[order(top_calls$Freq), ]
print(top_calls)
##                                                  Var1 Freq
## 7                                      Banner Request    1
## 18                             Chafer Beetle Feedback    1
## 30                                       Flag Request    1
## 38                     Home Safety Check Request Case    1
## 67                             Sewer Odour Complaints    1
## 70                               Sewer Utility Damage    1
## 74      Snow and Ice Removal - Sidewalk Bylaw Inquiry    1
## 76          Snow Angel Program - Individual Volunteer    1
## 98                                    Truck Violation    1
## 100                      Water Conservation Violation    1
## 8                           Bicycle Route Map Request    2
## 11                      Bridges & Structures - Repair    2
## 21                            Crosswalk Marking - New    2
## 69      Sewer Separation Inspection Cancellation Case    2
## 95                                 Traffic Sign - New    2
## 101                 Water Damage To City Water System    2
## 26                          Election General Concerns    3
## 48               Occupancy Permit Information Request    3
## 52             Pavement Markings Request - New/Modify    3
## 63                Secondary Suite Information Request    3
## 64                     Sewer Design General Inquiries    3
## 96   Transfer Station & Recycling - General Inquiries    3
## 49                             Parking Meter Requests    4
## 75    Snow and Ice Removal - Sidewalk Bylaw Violation    4
## 82          Street Light - Flat Glass Fixture Request    4
## 93                              Traffic Count Request    4
## 106                                 Water Meter Issue    4
## 5                               Animal Cremation Case    5
## 17                                 Catch Basin Issues    5
## 22                             Curbside Sign - Modify    5
## 29            Fire Reinspection Request for Inspector    5
## 51                          Pavement Marking - Repair    5
## 87                                  Street Sign - New    5
## 25                                  Dead Skunk Pickup    6
## 111                      Wheelchair Curb/Ramp Request    6
## 59                     PUI Noise General Inquiry Case    7
## 94                              Traffic Sign - Modify    7
## 40               Horticulture Inquiry on Right-of-Way    8
## 107                  Water Pressure or No Water Issue    8
## 6          Apartment Recycling - Registration Request    9
## 79          Street and Traffic Light - Utility Damage    9
## 102                             Water General Inquiry   10
## 66                               Sewer Manhole Issues   12
## 110                         Water Work Site Complaint   12
## 10                       Boulevard Maintenance Issues   13
## 23                                Curbside Sign - New   13
## 32                     General Information Request SR   13
## 83                      Street Light - New/Relocation   13
## 9    Blue Box and Leaf Removal Guide Mail-Out Request   14
## 36   Green Bin Program - Feedback and General Inquiry   14
## 99                          Vegetation Maintenance SR   14
## 39                       Homelessness/Transient Issue   15
## 108                      Water Service Locate Request   17
## 53      Plumbing and Gas Inspection Cancellation Case   18
## 27            Electrical Inspection Cancellation Case   19
## 103                        Water General Work Request   19
## 33                          Gone Out of Business Case   24
## 92                            Traffic Calming Request   25
## 97  Trees and Vegetation Encroachment - City Property   28
## 65                            Sewer General Inquiries   29
## 31                           FPB_General Inquiry Case   30
## 54                        Poster/Sign Removal Request   30
## 90               Traffic & Pedestrian Signal - Modify   30
## 37                                 Holding Stray Case   31
## 12              Building Inspection Cancellation Case   32
## 73                 Snow & Ice Removal - City Property   32
## 57                           PUI General Inquiry Case   33
## 91                  Traffic & Pedestrian Signal - New   33
## 104                               Water Hydrant Issue   39
## 81    Street Furniture Repair and Maintenance Request   40
## 86                          Street Litter Can Request   44
## 85                         Street Light - Pole Repair   45
## 68                               Sewer Pipe Inquiries   50
## 35           Graffiti Removal - External Organization   55
## 109                 Water Service Turn On/Off Request   55
## 43                                     Lost Pets Case   56
## 56                                 PRB_Park Ranger SR   64
## 44                  Missed Apartment Recycling Pickup   65
## 13                 Building Plans Information Request   66
## 20               Collection Calendar Mail-Out Request   67
## 50                   Parks Litter Can or Cart Request   70
## 71                                  Sidewalk - Repair   71
## 72                                      Sign - Repair   77
## 62                       Residential Parking Requests   80
## 24                            Dead Animal Pickup Case   82
## 4                 Animal Control General Inquiry Case   87
## 77                                    Street - Repair   87
## 58                           PUI Noise Complaint Case  102
## 89                           Streets - General Issues  105
## 28             Fire Reinspection Request for Firehall  119
## 34                   Graffiti Removal - City Property  140
## 55                                   Pothole - Repair  142
## 105                                Water Leaks/Breaks  149
## 78                    Street - Surface Water Flooding  164
## 61                              Recycling Box Request  165
## 3               Animal Complaint - Non-Emergency Case  168
## 2                           Abandoned Vehicle Request  175
## 14                         Cart - Apartment Recycling  181
## 41           Illegal Dumping/Abandoned Garbage Pickup  190
## 42                       Licence Payment Request Case  201
## 80                    Street Cleaning & Debris Pickup  266
## 88                        Street Tree Work Request SR  267
## 60                              Recycling Bag Request  325
## 46                            Missed Recycling Pickup  366
## 16      Cart - Green (Yard Trimmings and Food Scraps)  371
## 19                                   Citizen Feedback  513
## 15                                     Cart - Garbage  689
## 1    Abandoned Garbage Pickup - City Property & Parks  721
## 84                                 Street Light - Out  761
## 45                              Missed Garbage Pickup  763
## 47       Missed Yard Trimmings and Food Scraps Pickup 1229

Filter

Removing Outliers

So we’ve added in some new coordinates and fiddled with them a bit to help with the representation. At this point, it is a good idea to filter out any extraneous points or points that fall outside of our area of interest – in this case Vancouver. Geocoders aren’t perfect and so we should be wary to include any false positives.

First, let’s subset out data that is either an NA or falls outside of Vancouver’s bounds:

# ------------------------------------------------------------------ #
# ---------------------------- Filter ------------------------------ #
# ------------------------------------------------------------------ #
# Subset only the data if the coordinates are within our bounds or if 
# it is not a NA
data_filter = subset(data, (lat <= 49.313162) & (lat >= 49.199554) & 
                    (lon <= -123.019028) & (lon >= -123.271371) & is.na(lon) == FALSE )
# plot the data
plot(data_filter$lon, data_filter$lat)
# If everthing looks good, we might also write out our file:
# Andras' laptop (Mac): '/Users/andrasszeitz/Desktop/GEOB_472/data/311-geo-filtered.csv'
# Andras' computer (Windows): 'C:/Users/aszeitz/Desktop/GEOB_472/data/311-geo-filtered.csv'
ofile_filtered = '/Users/andrasszeitz/Desktop/GEOB_472/Data/311-geo-filtered.csv' 
write.csv(data_filter, ofile_filtered)

NOTE: we write another file out here so we can save our “final dataset”

With a filtered dataset, we can now write our file out to a shapefile and a geojson (if you want to open it later in another GIS).

I highly recommend that you look into learning more about all the different geographic data types like shapefiles and geojson. The last thing you want to do is show up to a job interview and not understand how they work, what the differences are, what are their advantages and disadvantages, etc. For more check out this link, or this link, or this link

Saving Dataset

# --- Convert Data to Shapefile --- #
# store coordinates in dataframe
coords_311 = data.frame(data_filter$lon_offset, data_filter$lat_offset)

# create spatialPointsDataFrame
data_shp = SpatialPointsDataFrame(coords = coords_311, data = data_filter)

# set the projection to wgs84
projection_wgs84 = CRS("+proj=longlat +datum=WGS84")
proj4string(data_shp) = projection_wgs84

# set an output folder for our geojson files:
geofolder = "/Users/andrasszeitz/Desktop/GEOB_472/"

# Join the folderpath to each file name:
opoints_shp = paste(geofolder, "calls_", "1401", ".shp", sep = "")
print(opoints_shp) # print this out to see where your file will go & what it will be called
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/calls_1401.shp"
opoints_geojson = paste(geofolder, "calls_", "1401", ".geojson", sep = "") 
print(opoints_geojson) # print this out to see where your file will go & what it will be called
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/calls_1401.geojson"
# write the file to a shp
writeOGR(data_shp, opoints_shp, layer = "data_shp", driver = "ESRI Shapefile",
         check_exists = FALSE)
## Warning in writeOGR(check_exists = FALSE, data_shp, opoints_shp, layer =
## "data_shp", : Field names abbreviated for ESRI Shapefile driver
# write file to geojson
writeOGR(data_shp, opoints_geojson, layer = "data_shp", driver = "GeoJSON",
         check_exists = FALSE)

We have all the raw data, but remember that saying “overview first, details on demand”? Our brains simply can’t understand the sheer number of points on the map. How about using some way of aggregating the points to a grid? Turns our hexagonal grids are quite good for conveying density of data points. I’ve created a hexagonal grid in QGIS at a resolution of about 250m x 280m:

Creating Hexagon Grid

hex_van

First let’s read in the grid and reproject it from UTM zone 10 north to WGS84:

# --- aggregate to a grid --- #
# ref: http://www.inside-r.org/packages/cran/GISTools/docs/poly.counts
# set the file name - combine the shpfolder with the name of the grid
grid_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson'

# read in the hex grid setting the projection to utm10n
hexgrid = readOGR(grid_fn, 'OGRGeoJSON')
## OGR data source with driver: GeoJSON 
## Source: "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson", layer: "OGRGeoJSON"
## with 5104 features
## It has 4 fields
# transform the projection to wgs84 to match the point file and store
#  it to a new variable (see variable: projection_wgs84)
hexgrid_wgs84 = spTransform(hexgrid, projection_wgs84)

Assigning Values to Hexagons

Next, let’s use the poly.counts() function to count the number of points in each grid cell:

# Use the poly.counts() function to count the number of occurrences of calls per grid cell
grid_cnt = poly.counts(data_shp, hexgrid_wgs84)

Finally, write out the data to a shapefile and geojson – we will use the geojson later, but just for good practice:

# define the output names:
ohex_shp = paste(geofolder, "hexgrid_250m_", "1401", "_cnts",
                  ".shp", sep = "")
print(ohex_shp)
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/hexgrid_250m_1401_cnts.sh
ohex_geojson = paste(geofolder, "hexgrid_250m_","1401","_cnts",".geojson") 
print(ohex_geojson)
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/hexgrid_250m_1401_cnts.geojson"
# write the file to a shp
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_shp, 
         layer = "hexgrid_wgs84", driver = "ESRI Shapefile")

# write file to geojson
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_geojson, 
         layer = "hexgrid_wgs84", driver = "GeoJSON")

Technically, we have enough data now to start putting together our web app which visualizes the 3-1-1 data. The next step will be to work with the representation and using the data we produced to filter the visuals with user interactions.

Full Script:

If all is well, this should run in it’s entirety – remember you need to change the folder paths!!!:

#####################################################################
# Vancouver 3-1-1: Data Processing Script
# Date:
# By: 
# Desc: 
#####################################################################

# ------------------------------------------------------------------ #
# ---------------------- Install Libraries ------------------------- #
# ------------------------------------------------------------------ #
# uncomment these lines if these libraries have not yet been installed
# install.packages("GISTools")
# install.packages("RJSONIO")
# install.packages("rgdal")
# install.packages("RCurl")
# install.packages("curl")

# ------------------------------------------------------------------ #
# ----------------------- Load Libararies -------------------------- #
# ------------------------------------------------------------------ #
library(GISTools)
library(RJSONIO)
library(rgdal)
library(RCurl)
library(curl)

# ------------------------------------------------------------------ #
# ---------------------------- Acquire ----------------------------- #
# ------------------------------------------------------------------ #
# access from the interwebz using "curl"
fname = curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails.csv')

# Read data as csv
data = read.csv(fname, header=T)

# inspect your data
print(head(data))

# ------------------------------------------------------------------ #
# ------------------------ Geocoding Function ---------------------- #
# ------------------------------------------------------------------ #

# a function taking a full address string, formatting it, and making
# a call to the BC government's geocoding API
bc_geocode = function(search){
  # return a warning message if input is not a character string
  if(!is.character(search)){stop("'search' must be a character string")}
  
  # formatting characters that need to be escaped in the URL, ie:
  # substituting spaces ' ' for '%20'.
  search = RCurl::curlEscape(search)
  
  # first portion of the API call URL
  base_url = "http://apps.gov.bc.ca/pub/geocoder/addresses.json?addressString="

  # constant end of the API call URL
  url_tail = "&locationDescriptor=any&maxResults=1&interpolation=adaptive&echo=true&setBack=0&outputSRS=4326&minScore=1&provinceCode=BC"
  
  # combining the URL segments into one string
  final_url = paste0(base_url, search, url_tail)
  
  # making the call to the geocoding API by getting the response from the URL
  response = RCurl::getURL(final_url)
  
  # parsing the JSON response into an R list
  response_parsed = RJSONIO::fromJSON(response)
  
  # if there are coordinates in the response, assign them to `geocoords`
  if(length(response_parsed$features[[1]]$geometry[[3]]) > 0){
    geocoords = list(lon = response_parsed$features[[1]]$geometry[[3]][1],
                      lat = response_parsed$features[[1]]$geometry[[3]][2])
  }else{
    geocoords = NA
  }
  
  # returns the `geocoords` object
  return(geocoords)
}

# ------------------------------------------------------------------ #
# ---------------------- Parse: Geocoder --------------------------- #
# ------------------------------------------------------------------ #
# change intersection to 00's
data$h_block = gsub("#", "0", data$Hundred_Block)
print(head(data$h_block))

# Join the strings from each column together & add "Vancouver, BC":
data$full_address = paste(data$h_block, 
                          paste(data$Street_Name,
                                "Vancouver, BC",
                                sep=", "),
                          sep=" ")

# removing "Intersection " from the full_address entries
data$full_address = gsub("Intersection ", "", data$full_address)
print(head(data$full_address))

# creating a random sequence of 1000 numbers between 1 and the length
# of the dataset. These numbers will be the rows we retain from the
# full dataset in order to more quickly geocode during the tutorial
set.seed(1)
keepers = sample(seq(from = 1, to = 10195, by = 1), size = 1000, replace = TRUE)

# creating a new variable to hold the full data set, then trimming the 
# original dataset to have only the 1000 random rows
full_data = data
data = data[keepers, ]

# Geocode the events - we use the BC Government's geocoding API
# Create an empty vector for lat and lon coordinates
lat = c() 
lon = c()
# loop through the addresses
for(i in 1:length(data$full_address)){
  # store the address at index "i" as a character
  address = data$full_address[i]
  # append the latitude of the geocoded address to the lat vector
  lat = c(lat, bc_geocode(address)$lat)
  # append the longitude of the geocoded address to the lon vector
  lon = c(lon, bc_geocode(address)$lon)
  # at each iteration through the loop, print the coordinates - takes about 20 min.
  print(paste("#", i, ", ", lat[i], lon[i], sep = ","))
}

# add the lat lon coordinates to the dataframe
data$lat = lat
data$lon = lon

# after geocoding, it's a good idea to write your file out!
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo.csv'
ofile = "/Users/andrasszeitz/Desktop/GEOB_472/geocoded2.csv" 
write.csv(data, ofile)

# ------------------------------------------------------------------ #
# ----------------------------- Mine ------------------------------- #
# ------------------------------------------------------------------ #
# --- Examine the unique cases --- #

# examine how the cases are grouped - are these intuitive?
unique(data$Department)
unique(data$Division)

# examine the types of cases - can we make new groups that are more useful?
# Print each unique case on a new line for easier inspection
for (i in 1:length(unique(data$Case_Type))){
  print(unique(data$Case_Type)[i], max.levels=0)
}

# Determine classes to group case types:

# graffiti and noise
graffiti_noise = c('Graffiti Removal - City Property',
                    'Graffiti Removal - External Organization',
                    'PUI Noise Complaint Case',
                    'PUI Noise General Inquiry Case')

# street surface & maintenance
street_traffic_maint = c('Street Furniture Repair and Maintenance Request',
                          'Street Cleaning & Debris Pickup',
                          'Street Light - Out',
                          'Street Tree Work Request SR',
                          'Street Litter Can Request',
                          'Streets - General Issues',
                          'Street Light - Pole Repair',
                          'Street - Surface Water Flooding',
                          'Street - Repair', 
                          'Street Light - New/Relocation',
                          'Street Light - Flat Glass Fixture Request',
                          'Street and Traffic Light - Utility Damage',
                          'Street Sign - New',
                          'Crosswalk Marking - New',
                          'Boulevard Maintenance Issues',
                          'Bicycle Route Map Request',
                          'Sidewalk - Repair',
                          'Pothole - Repair',
                          'Pavement Markings Request - New/Modify',
                          'Pavement Marking - Repair',
                          'Sewer Pipe Inquiries',
                          'Sewer Manhole Issues',
                          'Sewer General Inquiries',
                          'Sewer Design General Inquiries',
                          'Sewer Separation Inspection Cancellation Case',
                          'Sewer Utility Damage',
                          'Sewer Odour Complaints',
                          'Plumbing and Gas Inspection Cancellation Case',
                          'Snow Angel Program - Individual Volunteer',
                          'Snow & Ice Removal - City Property',
                          'Snow and Ice Removal - Sidewalk Bylaw Violation',
                          'Snow and Ice Removal - Sidewalk Bylaw Inquiry',
                          'Traffic & Pedestrian Signal - New',
                          'Traffic Calming Request',
                          'Traffic & Pedestrian Signal - Modify',
                          'Traffic Sign - Modify',
                          'Street and Traffic Light - Utility Damage',
                          'Traffic Sign - New',
                          'Traffic Count Request','Truck Violation',
                          'Residential Parking Requests',
                          'Parking Meter Requests',
                          'Abandoned Vehicle Request')

# garbage, recycling & organics related
garbage_recycling_organics = c('Missed Garbage Pickup',
                                'Abandoned Garbage Pickup - City Property & Parks',
                                'Cart - Garbage',
                                'Illegal Dumping/Abandoned Garbage Pickup',
                                'Parks Litter Can or Cart Request',
                                'Recycling Bag Request',
                                'Missed Recycling Pickup',
                                'Missed Apartment Recycling Pickup',
                                'Recycling Box Request',
                                'Cart - Apartment Recycling',
                                'Apartment Recycling - Registration Request',
                                'Transfer Station & Recycling - General Inquiries',
                                'Blue Box and Leaf Removal Guide Mail-Out Request',
                                'Missed Yard Trimmings and Food Scraps Pickup',
                                'Cart - Green (Yard Trimmings and Food Scraps)',
                                'Green Bin Program - Feedback and General Inquiry',
                                'Collection Calendar Mail-Out Request')

# water related
water = c('Water Leaks/Breaks',
           'Water Service Turn On/Off Request',
           'Water Service Locate Request',
           'Street - Surface Water Flooding',
           'Water Work Site Complaint',
           'Water Hydrant Issue', 'Water General Inquiry',
           'Water General Work Request',
           'Water Pressure or No Water Issue',
           'Water Conservation Violation',
           'Water Meter Issue',
           'Water Damage To City Water System',
           'Catch Basin Issues')

# animal and vegetation related
animal_vegetation = c('Dead Animal Pickup Case',
                       'Animal Control General Inquiry Case',
                       'Animal Complaint - Non-Emergency Case',
                       'Animal Cremation Case',
                       'Dead Skunk Pickup','Lost Pets Case',
                       'Holding Stray Case',
                       'Chafer Beetle Feedback',
                       'Vegetation Maintenance SR',
                       'Trees and Vegetation Encroachment - City Property',
                       'Horticulture Inquiry on Right-of-Way')

# other
other = c('Poster/Sign Removal Request',
           'Sign - Repair',
           'Curbside Sign - New',
           'Curbside Sign - Modify',
           'Banner Request',
           'Fire Reinspection Request for Firehall',
           'Fire Reinspection Request for Inspector',
           'Citizen Feedback',
           'Wheelchair Curb/Ramp Request','Wheelchair',
           'PRB_Park Ranger SR',
           'Building Plans Information Request',
           'Building Inspection Cancellation Case',
           'Licence Payment Request Case',
           'Gone Out of Business Case',
           'FPB_General Inquiry Case',
           'PUI General Inquiry Case',
           'Electrical Inspection Cancellation Case',
           'Bridges & Structures - Repair',
           'Secondary Suite Information Request',
           'General Information Request SR',
           'Election General Concerns',
           'Occupancy Permit Information Request',
           'Home Safety Check Request Case',
           'Flag Request',
           'Homelessness/Transient Issue')

# give class id numbers:
data$cid = 9999
for(i in 1:length(data$Case_Type)){
  if(data$Case_Type[i] %in% graffiti_noise){
    data$cid[i] = 1    
  }else if(data$Case_Type[i] %in% street_traffic_maint){
    data$cid[i] = 2   
  }else if(data$Case_Type[i] %in% garbage_recycling_organics){
    data$cid[i] = 3   
  }else if(data$Case_Type[i] %in% water){
    data$cid[i] = 4   
  }else if(data$Case_Type[i] %in% animal_vegetation){
    data$cid[i] = 5   
  }else{
    data$cid[i] = 0   
  }
}

# --- handle overlapping points --- #
# Set offset for points in same location:
data$lat_offset = data$lat
data$lon_offset = data$lon

# Run loop - if value overlaps, offset it by a random number
for(i in 1:length(data$lat)){
  if ( (data$lat_offset[i] %in% data$lat_offset) && (data$lon_offset[i] %in% data$lon_offset)){
    data$lat_offset[i] = data$lat_offset[i] + runif(1, 0.0001, 0.0005)
    data$lon_offset[i] = data$lon_offset[i] + runif(1, 0.0001, 0.0005)
  } 
}

# --- what are the top calls? --- #
# get a frequency distribution of the calls:
top_calls = data.frame(table(data$Case_Type))
top_calls = top_calls[order(top_calls$Freq), ]
print(top_calls)

# ------------------------------------------------------------------ #
# ---------------------------- Filter ------------------------------ #
# ------------------------------------------------------------------ #
# Subset only the data if the coordinates are within our bounds or if it is not a NA
data_filter = subset(data, (lat <= 49.313162) & (lat >= 49.199554) & 
                    (lon <= -123.019028) & (lon >= -123.271371) & is.na(lon) == FALSE )

# plot the data
plot(data_filter$lon, data_filter$lat)

# If everthing looks good, we might also write out our file:
ofile_filtered = '/Users/andrasszeitz/Desktop/GEOB_472/filtered.csv'
write.csv(data_filter, ofile_filtered)

# --- Convert Data to Shapefile --- #
# store coordinates in dataframe
coords_311 = data.frame(data_filter$lon_offset, data_filter$lat_offset)

# create spatialPointsDataFrame
data_shp = SpatialPointsDataFrame(coords = coords_311, data = data_filter)

# set the projection to wgs84
projection_wgs84 = CRS("+proj=longlat +datum=WGS84")
proj4string(data_shp) = projection_wgs84

# set an output folder for our geojson files:
geofolder = "/Users/andrasszeitz/Desktop/GEOB_472/"

# Join the folderpath to each file name:
opoints_shp = paste(geofolder, "calls_", "1401", ".shp", sep = "")
print(opoints_shp) # print this out to see where your file will go & what it will be called
opoints_geojson = paste(geofolder, "calls_", "1401", ".geojson", sep = "") 
print(opoints_geojson) # print this out to see where your file will go & what it will be called

# write the file to a shp
writeOGR(data_shp, opoints_shp, layer = "data_shp", driver = "ESRI Shapefile",
         check_exists = FALSE)

# write file to geojson
writeOGR(data_shp, opoints_geojson, layer = "data_shp", driver = "GeoJSON",
         check_exists = FALSE)

# --- aggregate to a grid --- #
# ref: http://www.inside-r.org/packages/cran/GISTools/docs/poly.counts
# set the file name - combine the shpfolder with the name of the grid
# grid_fn = paste(shpfolder,'hgrid_100m.shp', sep="")
grid_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson'

# read in the hex grid setting the projection to utm10n
hexgrid = readOGR(grid_fn, 'OGRGeoJSON')

# transform the projection to wgs84 to match the point file and store
# it to a new variable (see variable: projection_wgs84)
hexgrid_wgs84 = spTransform(hexgrid, projection_wgs84)

# Use the poly.counts() function to count the number of occurences of calls per grid cell
grid_cnt = poly.counts(data_shp, hexgrid_wgs84)

# create a data frame of the counts
grid_total_counts = data.frame(grid_cnt)

# set grid_total_counts dataframe to the hexgrid data
hexgrid_wgs84$data = grid_total_counts$grid_cnt

# remove all the grids without any calls
hexgrid_wgs84 = subset(hexgrid_wgs84, grid_cnt > 0)

# define the output names:
ohex_shp = paste(geofolder, "hexgrid_250m_", "1401", "_cnts",
                  ".shp", sep = "")
print(ohex_shp)

ohex_geojson = paste(geofolder, "hexgrid_250m_", "1401", 
                      "_cnts", ".geojson", sep = "") 
print(ohex_geojson)

# write the file to a shp
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_shp, 
         layer = "hexgrid_wgs84", driver = "ESRI Shapefile")

# write file to geojson
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_geojson, 
         layer = "hexgrid_wgs84", driver = "GeoJSON")

Creating a full dataset:

We completed the exploration for 1 month, but what about the remaining 11 months? What is the value added of having the other temporal data? For the future, you can explore the full year’s worth of data by running this script on each of the datasets. Don’t forget to change the filenames for each year – if you don’t you’ll write over your data and have to run all the code again 😉

R Tutorial

Created by: Joey K Lee

Edited by: Andras Szeitz

About

This is an intro to the R programming language which we will use throughout the rest of the course and hopefully you’ll use when you leave UBC and start your lives in research, industry, and beyond!

Requirements:

  • First: download and install R – make sure you download the right one for your system.
    • For Windows users, you can choose between 32-bit and 64-bit versions of R, you need to choose which one is right for your computer. If you’re really unsure, then choose the 32-bit, BUT if you don’t mind google searching “32-bit or 64-bit windows for “
    • SUPER IMPORTANT for Windows: We’ve noticed some issues with R and the packages if you install R and R studio to to the “C:\Program Files” which is the default option.
      • Instead try to install R and RStudio on your “C:\” drive. If you have already installed R to your “C:\Program Files” and have been noticing problems, uninstall R and Rstudio and reinstall them in your “C:\” drive.
  • Second: download and install R studio (make sure to install R first!)

Outline

Why are we learning R?

Why are we using R after spending all that time learning Processing? Well, first I would go back to the fact that programming languages are like materials, each with a distinct qualities, benefits, and limitations. In this way, it doesn’t make sense to say one language is better or worse than another, rather that they are just different. So why learn R? Here’s why I chose to introduce R in this course:

  • High Level Programming Language: R is what we consider a high level programming language. With a few lines of code, you can do a lot of stuff which is super nice when we’re learning.
  • Easy to setup and has a nice Integrated Development Environment (IDE): R is nice because it sort of lives within itself – it is easy to download, install, and setup, it is cross platform (it can be used in Windows, Mac OS, etc), and has a nice IDE called RStudio which we will use to help us write our code.
  • Large user community: Like Processing, R has a HUGE user community that has grown to include domains of all sorts. This means also if you have a specific question that most likely someone has already found the answer and posted it online. The New York Times Data team is using R (among other things), for example!
  • Super handy packages: Because of the large user community people have developed packages for doing all sorts of cool things. Want to make an interactive map? Bam! There’s an R package for that. Want to make a hillshade of a DEM in R? There’s a package for that too! You’ll start to see that packages will be your best friend, so use your favorite search engine and find the right packages for you!
  • Nice plots: R kicks ass for plotting – and this is a data visualization class after all. With one line of code you can already start looking at your data, add a few more and you already have something pretty to look at. As a way of exploring and displaying your data, R is a nice way to go.
  • Local expertise: Its always handy to have people around to help you if you get stuck. There are plenty of people teaching R at UBC and also some expert grad students and faculty in Geography.
  • Open source mapping: When we dive further into R we’ll start to see that we can start to do some really fancy geoprocessing in R – Basically your own, free, GIS 🙂 When you leave UBC and no longer have access to proprietary software, you’ll be happy to know that you have the tools to continue mapping away.

Getting Started

Throughout this workshop, try your best to make connections between what you learned in the intro to Processing to what R has to offer. There are some fundamental differences you will begin to notice, but you’ll really quickly begin to see the advantages (and disadvantages) of using R.


R Studio IDE:

This is the R studio IDE. We will use R studio as our environment for running our code, making our scripts, and viewing our plots. It’s sort of like our Processing PDE, with the major difference being that R is an interpreted language meaning we can run our code line-by-line versus Processing which is compiled which means Processing runs all of the code at once. You’ll see why this is advantages for working with and exploring data. Anyways, because of this key difference, R studio is setup to help us explore our data line-by-line.

rstudio

Your typical R studio IDE will look like the above and includes:

  • Script window: A place to put your script (your entire code)
  • R console: every line of code your run from your script will be run, evaluated and printed here.
  • Global Environment: This will show all the variables you declare and assign. The fancy thing about R studio is that it will tell you what kind of variable it is and even let your preview your data that you read in!
  • Plot, Packages, Help, & more: Your plots will be shown here. Also you will be able to find packages and access R’s help documentation.

Keeping Track:

  • Print everything – check that things are working and see how you change and manipulate data with functions and operations.
  • Plot everything – again, check that things are working and see how you change and manipulate data with functions and operations.
  • Comment your code – it will make it easier for you to remember what you were trying to do, and make it easier for others to read and understand your code

Quick Syntax review:

Mathematical operators in R

While most of the mathematical operators are the same as in Processing, there’s a few differences. Let’s check out those mathematical operators in R again.

Operator        Description
+               addition
-               subtraction
*               multiplication
/               division
^ or **         exponentiation
x %% y          modulus (x mod y) 5%%2 is 1
x %/% y         integer division 5%/%2 is 2

Logical Operators in R

Operator        Description
<               less than
<=              less than or equal to   >               greater than
>=              greater than or equal to
==              exactly equal to
!=              not equal to
!x              Not x
x | y           x OR y
x & y           x AND y
isTRUE(x)       test if 'x' is TRUE

Assignment Operators and Subsetting in R

Operator        Description
<-              assigns a value to a name
=               assigns a value to a name
[]              extracts a part of an object
$               extracts a part of a vector

Gettin’ down and di-R-ty:

After learning about the fundamental building blocks of programming (and the data visualization pipeline), the goal now is to apply our understanding by taking code and data and producing visual output.

Instead of learning how to code R from the bottom up (which we did with Processing), we will go from the top down, meaning: we are going to learn by running examples and in the process train ourselves to deconstruct the patterns and functions we pick up along the way.

But before we get into some examples, let’s do a quick tour of running code in R, some R syntax and data types!


Your First R commands: “Hello World”

Let’s start by just running a few commands in the R Console. To run a command in the console window, simply:

  • type your statement and
  • press [enter].

In the console window, let’s write 3 statements and run them. Here’s the result of running the 3 examples below.

r-intro

  • Addition
    5 + 3
    
  • Logical Operation
    100 < 200
    
  • String
    "Oops I did it again"
    

Running code from the script window

Now that you’ve seen how R reads, evaluates, and prints the statements you feed it line-by-line, .

When running code line-by-line in the script window you:

  1. Highlight the lines you want to run:

highlight.png

  1. then “run the code” by pressing at the same time:
    • On Mac: [cmd] + [enter]
    • On Windows: [ctrl] + [enter]

The results will then be “printed” to the console:

script-print.png

Now, to familiarize yourself with running code line-by-line from the script window, let’s write another 3 statements, but this time insert a comment between each of the statements to neatly organize your code.

NOTE: In R, comments are made by using the hash symbol (#)

Try writing 3 examples of your own in the script window and seeing what is printed to the console:

  • Commenting:
    # --- Gettin' Down and di-R-ty: an intro to R --- #
    # remember anything after the comment symbol is not evaluated
    
  • Multiplication
    # Multiplication
    6 * 5
    
  • Exponentiation
    # Exponents
    5**2
    
  • String
    # Strings
    "Hello World"
    

FYI, we can also use the print() function to print statements for us like this:

# --- heres another way to print in R --- #
print("Hello World!")
print(5*10)
print(10-3)

Run this and see that it return the same thing! Pretty neat stuff.

NOTICE SOMETHING? WE DON’T NEED TO USE SEMI-COLONS (;) IN R! This means that R doesn’t use semi-colons to signify the end of a statement in contrast to Processing. Rather R uses line breaks to interpret each statement. Excited? Sad? Indifferent? In any case, just remember you don’t need semi-colons to end statements in R. Are you beginning to see how programming languages all have their own flavor?

Now that you see how to run code line-by-line, let’s dive further into some more exciting stuff.

Help! Information!

If you are trying to use a function and you aren’t getting the output you are expecting, or maybe it’s straight-up not working, RStudio has very useful help documentation for all the functions and packages you might use! To get the information about what parameters a function is expecting, and in what format, just type ?’function’ in the console. ie:

# Getting help on mean:
?mean

untitled


Variables, Data, & Plotting

Before we take a leap into a real-world example, it’s worth noting a few fundamental things about R variables, data, and plotting.

R Variables:

Unlike in Processing which is strongly typed, R is loosely typed meaning that we don’t need to specify the type of data that we are assigning to each variable – notice we don’t have to specify “int” or “float” or “String” – R knows that type of data we are assigning!:

# I'm an integer variable:
x <- 12
print(x)

# I'm a float variable:
y <- 16.5
print(y)

# I'm a string variable:
z <- "Hello"
print(z)

Fundamental R Data Types:

Here are a number of fundamental R data objects that over time you will come to fully understand what they are, what makes them special, and when and where you would use them. For the 4 major data objects below, all we need to now is that:

Vectors (pretty much just a 1-D Array):

A vector in R is basically a collection of elements of the same data type (e.g numeric, character, boolean.) – a 1-D array. We create a vector in R by using the c() function. The c() is short for “concatenate” which means “paste together or chain together in a series”.

Here’s 2 examples of a vectors:

# Here's some data of months of the year and the precipitation in Vancouver
months <- c(1, 2, 3, 4, 5, 6, 7, 8, 9 , 10, 11, 12)
precipitation <- c(186, 94, 118, 85, 60, 59, 38, 39, 48, 126, 183, 177)

Lists:

An R list is a collection of elements (similar to a vector) BUT the key difference is that a list can contain different data types. Lists can be created by using the list() function and filling in values to be stored in the list.

Here’s an example of a list of all the same values:

# Hey I'm a list:
my_list <- list(3, 6, 9, 12,15)

Here’s a list of lists of mixed data types:

# ---- a list ---- #
my_fav_things <- list(pi, 42, "Vancouver")

As you can see, depending on your data, this could be useful way of organizing and structuring your data. Looking kind of like Javascript Object Notation (JSON) with those keys and values, eh?

Dataframes:

A dataframe is essentially data in a tabular form (think an excel spreadsheet) – it is composed of rows and columns of equal length. Each column is a vector ( aka 1-D array), meaning it contains all of the same data type (except for the header row) and is recursive, meaning that we can apply functions and mathematical operations on each column. Holy guacamole, does this mean R is really great at processing tabular data like excel sheets, csv files, and maybe even shapefiles? Yes! That is exactly right!

Very generally, dataframes are composed of vectors, which are like each column. OH! That means we can take our month and precipitation data and create a data frame?! yes! All we have to do is use the data.frame() function:

A dataframe of the rain data:

# Here's some data of months of the year and the precipitation in Vancouver
months <- c(1, 2, 3, 4, 5, 6, 7, 8, 9 , 10, 11, 12)
precipitation <- c(186, 94, 118, 85, 60, 59, 38, 39, 48, 126, 183, 177)

# rain dataframe
rain_data <- data.frame(months, precipitation)
print(rain_data)

This gives us a nice tabular dataset 🙂

precip-df

We use R’s selector operator ($) to select each column:

# select the precpitation column and calculate the annual average
mean_precip <- mean( rain_data$precipitation )
print(mean_precip)

We see that the mean precipitation in Vancouver is 101.0833 mm.

Plotting!

Ok, so now we are familiar with some of the basic data objects.  But just looking at a vector of numbers or a dataframe doesn’t make it easy to see what information the data holds. The great thing about R is how easily we can make plots to visualize our data! Here, we’ll plot the monthly precipitation data as a scatterplot:

# A simple scatterplot:
plot(x = months, y = precipitation)

precip-p.png

Or how about a line plot:

# a line plot:
plot(x = months, y = precipitation, type = "l")

precip-l.png

Or how about both lines and points:

# scatterplot with lines - "b" stands for both:
plot(x = months, y = precipitation, type = "b")

precip-b.png
Or how about a barplot:

# plot it as a bar plot
barplot(height = precipitation, names.arg = months)

precip-bar

Say, we wanted to plot the annual average on top of the monthly values, we could do something like this:

# create the plot:
plot(x = rain_data$month, y = rain_data$precipitation, type = "b" )

# add the mean line:
abline(h = mean_precip)

precip-mean.png

How easy and great is that? Sure sure, you’re probably thinking “This is lame, I could do this in Excel” – but what if you had to make these graphs for every year in Vancouver since 1965? You could write a loop to automate that in R instead of clicking over and over again in Excel… AND if you needed to change a color or an axis label, you could just run your script again… just sayin’.

*NOTE: We usually won’t be making our own dataframes from scratch, but rather reading them in the form of delimited text files like .csv, .xls, .tsv, .dat, etc. *

and finally, Matrices:

Think of matrices as a raster or bitmap image. While the x and y dimensions can be different (e.g. like a photograph) all the columns must be the same data type.

As a quick example, let’s use one of R’s preloaded datasets – a matrix (e.g. raster) of Auckland’s Maunga Whau Volcano:

# store the volcano data to a variable:
cano <- volcano
print(cano)

Whoa! we get a bunch of print outs that essentially show this structure/numbers which represent the topography of this volcano:
cano-ex

Now let’s plot it, but instead of using plot() let’s use the image() function:

# image() fuction to plot the cano
image(cano)

Wow! Such colors!

cano-img.png

What other goodies can we plot? How about some contours?

# contour lines of matrix
contour(cano)

cano-contours.png

And believe it or not, enter the 3rd dimension:

# Get some perspective
persp(cano, expand = 0.3, phi = 35, theta = 10)

cano-persp.png


Reading Dataframes from files:

We just saw how to make a dataframe using the data.frame() function using a set of vectors. Now let’s read in data from a file as a dataframe using R’s read.csv() function:

# --- reading in data from a csv file --- #
# store the filename & path to a variable
fileName <- 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/rain2014.csv'

# pass the filename variable to the "read.csv()" function
# use "header = TRUE" if there's a header
rain_csv <- read.csv(fileName, header = TRUE)

If we look at the data again, we see that it looks exactly like the dataframe we created earlier:

precip-df

Now, we can do exactly the same steps as before EXCEPT we need to change the data name from rain_data to rain_csv:

# select the precpitation column and calculate the annual average
mean_precip <- mean(rain_data$precipitation)
print(mean_precip)

rain-csv-plot.png


R Packages (aka libraries)

R packages (aka libraries) may be one of the most useful things about R. Libraries are basically bundled up scripts that people (software developers, researchers, designers, artists, etc) have written to help take complicated tasks or computations, wrapped up in simple(r) to use functions to make programming easier, more fun, and easy to read.

In this way, you can say that Processing is a language and also a “library” because it makes functions available (e.g. ellipse() and rect()) which would take 10 to 20 lines of code in Java and puts them into an easy to use function.

In R, we have countless numbers of libraries to help us to tasks. Remember earlier I mentioned that there are R packages that can help you do geoprocessing and even make your own interactive web map? Well, now we’re going to introduce a package that allows us to read in shapefiles into R so that we can use it as a basemap for our crime plot.

NOTE: It may seem like R libraries are just doing magic behind the scenes, but they are simply making more functions available to you to use that aren’t already included in the base R library – think of it as an actually library, you go there to get books that you don’t have at home 😉

SO to start, we need to install the package we want to use. After doing a few google searches, I found an R package called “GISTools” that allows us to read shapefiles into R, make choropleth maps, do geoprocessing, etc – essentially your own, free and open source GIS.

installing and adding a library

We install packages in R by using the install.packages() function:

# ---------- Using our first R package to display a shapefile! ---------- #
# install the maptools library
install.packages("GISTools")
install.packages('scales')

NOTE: If you get an error like “cannot write to lib” or something on windows — you need to change the read/write permissions on that folder. You can do so by navigating to that folder > right click > properties > check the box that says “read/write”.

After we install our R package, we need to import it to our script. We do so using the library() function:

# import the GISTools functions by calling the library() function
library(GISTools)
library(scales)

Now that we have our library imported, we have can read in some shapefiles that are conveniently sitting in our data folder. Similar to how we read in our .csv file, we just have to include the path name to each of our .geojson files.

# --- read in shps --- #
# Building shp
fname_buildings <- "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/example/buildings.geojson"
buildings <- readOGR(fname_buildings, 'OGRGeoJSON')
plot(buildings) # plot to inspect

# Roads shp
fname_roads <- "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/example/roads.geojson"
roads <- readOGR(fname_roads, 'OGRGeoJSON')
plot(roads) # plot to inspect

# Co2 points
fname_co2 <- "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/example/co2.geojson"
co2 <- readOGR(fname_co2, 'OGRGeoJSON')
plot(co2)

By plotting the data, we can see what we have! Now, let’s combine the plots together to make a very basic map:
ex1.png

# Plot the buildings
plot(buildings, 
     col = "#808080", 
     border = F,
     bg = "#FFFFFF",
     main = "CO2 Mixing Ratios Downtown Vancouver")

ex2

# Add the roads to the plot with "add = TRUE"
plot(roads,
     col = "#000000",
     lwd = 1,
     add = TRUE)

ex3.png

# Add the CO2 observation points to the plot with "add = TRUE"
vals = rescale(co2@data$co2, c(0.5, 5))
plot(co2,
     col = "#FF6600",
     pch = 20,
     alpha = 0.5,
     cex = vals,
     add = TRUE)

We can even add a north arrow in R (or later in Illustrator):

# We can even add a north arrow:
north.arrow(xb = -123.144035, yb = 49.273001, len = 0.0005)

And a scale bar:

# scale bar:
map.scale(xc = -123.151295, 49.271729, len = 0.01,
          ndivs = 2, subdiv = 3, units = "test")

Now you’re more than primed to construct and deconstruct our first real-world R project!