R – Vancouver 311 Tutorial

Created by: Joey K Lee

Edited by: Andras Szeitz


Project: What are Vancouverites Complaining About?

Brief:

The City of Vancouver releases a dataset of the 3-1-1 phone calls – the general hotline regarding maintenance issues in the city. Currently there is no tool to visualize and access the data. How can citizens engage the city for these matters if there’s no way to work with the data? In the name of civic “hacking”, the project brief is to develop a project that:

  1. can parse and handle the BIG data being delivered by the city.
  2. shows the 3-1-1 data and potentially highlights insights derived from the data in an accessible web application.

Overview:

We will use this project to go through Ben Fry’s Data Visualization Pipeline. The final script generated by going through the steps of the data viz pipeline will be a good foundation to use for the upcoming R Assignment.


Process: Exploring the 3-1-1 data

Data viz is hard and in the end comes down to a lot of experimentation and exploration. This script attempts to showcase how the data viz pipeline is done in practice and how it is far from a linear process, but rather a very interactive and dynamic process.

setup:

Let’s begin our script with a nice commented header:

######################################################
# Vancouver 3-1-1: Data Processing Script
# Date:
# By: 
# Desc: 
######################################################

We noticed how libraries can help us to read in geographic data and even help us make new scales. Since this is a bigger project, we’re going to need the help of some more libraries:


# ------------------------------------------------------------------ #
# ---------------------- Install Libraries ------------------------- #
# ------------------------------------------------------------------ #
install.packages("GISTools")
install.packages("RJSONIO")
install.packages("rgdal")
install.packages("RCurl")
install.packages("curl")
  
# Unused Libraries:
# install.packages("ggmap")
# library(ggmap)

You will notice we have some unused libraries – these are some that I started out using in the beginning, but decided not to use. I kept them here just for future reference. NOTE: ggmap was previously used for its geocode() function, but with Google’s limit of 2500 API calls, it wasn’t enough for the 10,000+ geocoding events we would need for our project.

Now that you’ve installed the libraries, let’s load them to make all those new functions available to us:

# ------------------------------------------------------------------ #
# ----------------------- Load Libraries --------------------------- #
# ------------------------------------------------------------------ #
library(GISTools)
library(RJSONIO)
library(rgdal)
library(RCurl)
library(curl)
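
Running install.packages() every time the script is sourced re-downloads packages you already have. As a minimal sketch (the helper name missing_packages is made up for this example and is not part of the original script), you could first check which packages are actually missing:

```r
# Hypothetical helper (not part of the original script): return the names of
# the packages in `pkgs` that cannot be found on this machine.
missing_packages = function(pkgs){
  installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# the packages used in this tutorial
needed = c("GISTools", "RJSONIO", "rgdal", "RCurl", "curl")
to_install = missing_packages(needed)
print(to_install)

# Uncomment to install only what is missing:
# if(length(to_install) > 0){ install.packages(to_install) }
```

The install line is left commented out so that the check itself has no side effects beyond loading namespaces.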

Acquire

We use the curl() function to make an http/https request to fetch the data from the web, and then use the read.csv() function to read the table into R.

# ------------------------------------------------------------------ #
# ---------------------------- Acquire ----------------------------- #
# ------------------------------------------------------------------ #
# access from the interwebz using "curl"
fname = curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails.csv')

# Read data as csv
data = read.csv(fname, header=T)

# inspect your data
print(head(data))
##   Year Month Day Hour Minute           Department
## 1 2014     1   1    7     21       CSG - Licenses
## 2 2014     1   1    7     45    ENG - Solid Waste
## 3 2014     1   1    9     27    ENG - Solid Waste
## 4 2014     1   1   10      6    ENG - Solid Waste
## 5 2014     1   1   10     15 PRB - Administration
## 6 2014     1   1   10     17        ENG - Streets
##                            Division
## 1                    Animal Control
## 2                        Sanitation
## 3                        Sanitation
## 4                        Sanitation
## 5    General Park Board Information
## 6 Traffic and Electrical Operations
##                                          Case_Type Hundred_Block
## 1                          Dead Animal Pickup Case  Intersection
## 2                            Missed Garbage Pickup          22##
## 3 Abandoned Garbage Pickup - City Property & Parks          10##
## 4                            Missed Garbage Pickup          27##
## 5                               PRB_Park Ranger SR          11##
## 6                                    Sign - Repair          8600
##                      Street_Name    Local_Area
## 1 E 64TH AV and PRINCE ALBERT ST        Sunset
## 2                SW MARINE DRIVE    Kerrisdale
## 3                      W 12TH AV      Fairview
## 4                      W 16TH AV Arbutus Ridge
## 5                   W CORDOVA ST      Downtown
## 6            - 8699 GRANVILLE ST       Marpole
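
Besides head(), a few base R calls give a quick structural summary of a table. The sketch below uses a tiny stand-in table (the first three rows shown above, trimmed to a few columns and typed in by hand) rather than the real download, so the name toy and its contents are assumptions for illustration only:

```r
# A tiny stand-in for the 3-1-1 table, in the same layout as the real file
csv_text = "Year,Month,Day,Hundred_Block,Street_Name,Local_Area
2014,1,1,Intersection,E 64TH AV and PRINCE ALBERT ST,Sunset
2014,1,1,22##,SW MARINE DRIVE,Kerrisdale
2014,1,1,10##,W 12TH AV,Fairview"

toy = read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# number of rows and columns
print(dim(toy))
# column names and types
str(toy)
# are there any coordinate columns we could map with?
has_coords = any(c("lat", "lon") %in% names(toy))
print(has_coords)
```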

Parse

Upon inspecting our data, we notice we have the addresses, but the city has put in “#”s to help with the anonymity of the callers. Furthermore, we notice that we don’t have any lat/lon coordinates to work with to turn our 3-1-1 calls into something spatial. How can we develop a solution for this?

First let’s sub out the “#” with “0”:

# ------------------------------------------------- #
# ---------------- Parse: Geocoder ---------------- #
# ------------------------------------------------- #
# change the anonymizing "#" characters to 0's
data$h_block = gsub("#", "0", data$Hundred_Block)
print(head(data$h_block))
## [1] "Intersection" "2200"         "1000"         "2700"        
## [5] "1100"         "8600"

Next let’s concatenate the newly created “h_block” column with the Street_Name column, and a string that specifies that all of the calls are from Vancouver, BC:

# Join the strings from each column together & add "Vancouver, BC":
data$full_address = paste(data$h_block, 
                          paste(data$Street_Name,
                                "Vancouver, BC",
                                sep=", "),
                          sep=" ")

We also notice that the city has put in the word “Intersection” for those calls that refer to an intersection. Let’s take those out to make parsing the addresses for geocoding potentially easier:

# removing "Intersection " from the full_address entries
data$full_address = gsub("Intersection ", "", data$full_address)
print(head(data$full_address))
## [1] "E 64TH AV and PRINCE ALBERT ST, Vancouver, BC"
## [2] "2200 SW MARINE DRIVE, Vancouver, BC"           
## [3] "1000 W 12TH AV, Vancouver, BC"                 
## [4] "2700 W 16TH AV, Vancouver, BC"                 
## [5] "1100 W CORDOVA ST, Vancouver, BC"              
## [6] "8600  - 8699 GRANVILLE ST, Vancouver, BC"
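
The three parsing steps above can also be collected into one function, which makes them easy to try out on a handful of values. This is only a sketch: the function name build_full_address is made up for this example, and it uses sub() with an anchored pattern so that only a leading “Intersection ” is removed.

```r
# Hypothetical wrapper around the three parsing steps
build_full_address = function(hundred_block, street_name){
  # replace the anonymizing "#" characters with zeros
  h_block = gsub("#", "0", hundred_block, fixed = TRUE)
  # join block, street name, and city into one string
  full = paste(h_block,
               paste(street_name, "Vancouver, BC", sep = ", "),
               sep = " ")
  # drop the leading "Intersection " marker
  sub("^Intersection ", "", full)
}

addresses = build_full_address(
  hundred_block = c("Intersection", "22##", "10##"),
  street_name   = c("E 64TH AV and PRINCE ALBERT ST", "SW MARINE DRIVE", "W 12TH AV")
)
print(addresses)
```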

Now, we need to convert the street addresses into latitude & longitude coordinates so that we have a spatial reference to where these calls occurred. We will use the BC Government’s geocoding API, which will take our street addresses and return lat/long coordinates. But, the whole dataset has just over 10,000 rows (each one is an individual call to 311) and geocoding the entire dataset takes about 1 hour.  We will reduce the size of the dataset so that we can actually geocode the addresses in a realistic amount of time during the tutorial. But for your assignment, you will have to geocode the entire dataset!

# creating a random sample of 1000 row numbers between 1 and the number
# of rows in the dataset. These numbers will be the rows we retain from the
# full dataset in order to more quickly geocode during the tutorial.
# sample() draws without replacement by default, so no row is picked twice
set.seed(1)
keepers = sample(nrow(data), size = 1000)

# creating a new variable to hold the full data set, then trimming the 
# original dataset to have only the 1000 random rows
full_data = data
data = data[keepers,] 
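
One thing to watch when subsetting is whether the same row can be picked more than once: sample() with replace = TRUE can return a row number twice, while the default (replace = FALSE) cannot. A minimal sketch on a made-up 50-row table (the names toy, keep and subset_rows are assumptions for illustration):

```r
# toy stand-in for the full dataset: 50 numbered rows
toy = data.frame(id = 1:50)

set.seed(1)
# sample() without replacement (the default) never picks a row twice
keep = sample(nrow(toy), size = 10)
subset_rows = toy[keep, , drop = FALSE]

# anyDuplicated() returns 0 when every sampled row number is distinct
print(anyDuplicated(keep))
print(nrow(subset_rows))
```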

But before we can geocode the street addresses, we need to create a function to do the task! The BC government’s geocoding API works by taking a URL that has a street address as a search term, processing it, and returning a JSON response with the latitude and longitude of the street address (plus a bunch of other information). The code block below defines a function in R, and after we define the function, we can use it and R to process all of our street addresses in one go!

# a function taking a full address string, formatting it, and making
# a call to the BC government's geocoding API
bc_geocode = function(search){
  # stop with an error if the input is not a character string
  if(!is.character(search)){stop("'search' must be a character string")}
  
  # formatting characters that need to be escaped in the URL, ie:
  # substituting spaces ' ' for '%20'.
  search = RCurl::curlEscape(search)
  
  # first portion of the API call URL
  base_url = "http://apps.gov.bc.ca/pub/geocoder/addresses.json?addressString="

  # constant end of the API call URL
  url_tail = "&locationDescriptor=any&maxResults=1&interpolation=adaptive&echo=true&setBack=0&outputSRS=4326&minScore=1&provinceCode=BC"
  
  # combining the URL segments into one string
  final_url = paste0(base_url, search, url_tail)
  
  # making the call to the geocoding API by getting the response from the URL
  response = RCurl::getURL(final_url)
  
  # parsing the JSON response into an R list
  response_parsed = RJSONIO::fromJSON(response)
  
  # if there are coordinates in the response, assign them to `geocoords`
  if(length(response_parsed$features[[1]]$geometry[[3]]) > 0){
    geocoords = list(lon = response_parsed$features[[1]]$geometry[[3]][1],
                      lat = response_parsed$features[[1]]$geometry[[3]][2])
  }else{
    geocoords = NA
  }
  
  # returns the `geocoords` object
  return(geocoords)
}
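
To see what the function actually sends to the API, it can help to build the URL without making the request. The sketch below is an illustration under assumptions: the helper name build_geocode_url is made up, and it uses utils::URLencode() from base R in place of RCurl::curlEscape() so that it runs without extra packages.

```r
# Hypothetical helper: build the request URL without sending it
build_geocode_url = function(search){
  base_url = "http://apps.gov.bc.ca/pub/geocoder/addresses.json?addressString="
  url_tail = "&locationDescriptor=any&maxResults=1&interpolation=adaptive&echo=true&setBack=0&outputSRS=4326&minScore=1&provinceCode=BC"
  # percent-encode spaces, commas, and other reserved characters
  # (URLencode is a stand-in for curlEscape, chosen for this sketch only)
  paste0(base_url, utils::URLencode(search, reserved = TRUE), url_tail)
}

url = build_geocode_url("1000 W 12TH AV, Vancouver, BC")
print(url)
```

Pasting the printed URL into a browser is a quick way to look at the raw response before parsing it in R.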

Now the function is defined. Let’s use it to geocode the 1000 rows in the dataset that we retained.

# Geocode the events - we use the BC Government's geocoding API
# Pre-allocate vectors for the lat and lon coordinates (NA = not geocoded)
lat = rep(NA_real_, nrow(data))
lon = rep(NA_real_, nrow(data))

# loop through the addresses
for(i in seq_along(data$full_address)){
  # store the address at index "i" as a character
  address = data$full_address[i]
  # call the API once per address; bc_geocode() returns NA if nothing is found
  coords = bc_geocode(address)
  if(is.list(coords)){
    # store the latitude and longitude of the geocoded address
    lat[i] = coords$lat
    lon[i] = coords$lon
  }
  # print progress at each iteration (geocoding all rows takes a while)
  print(paste0("#", i, ": ", lat[i], ", ", lon[i]))
}
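
A long geocoding run can be interrupted by a single failed request. One possible safeguard is to wrap the call in tryCatch() so that a failure becomes an NA instead of stopping the loop. The sketch below uses a stand-in geocoder (fake_geocode) so it runs without network access; both fake_geocode and safe_geocode are hypothetical names, not part of the original script.

```r
# Stand-in geocoder for illustration only: returns fixed coordinates,
# NA for an unknown address, and raises an error for an empty string.
fake_geocode = function(search){
  if(search == ""){ stop("empty address") }
  if(search == "nowhere"){ return(NA) }
  list(lon = -123.1, lat = 49.25)
}

# Hypothetical wrapper: never fails, always returns a list with lon and lat
safe_geocode = function(search, geocoder){
  result = tryCatch(geocoder(search), error = function(e) NA)
  if(is.list(result)){ result } else { list(lon = NA_real_, lat = NA_real_) }
}

addresses = c("1000 W 12TH AV, Vancouver, BC", "nowhere", "")
lat = rep(NA_real_, length(addresses))
lon = rep(NA_real_, length(addresses))
for(i in seq_along(addresses)){
  coords = safe_geocode(addresses[i], fake_geocode)
  lat[i] = coords$lat
  lon[i] = coords$lon
}
print(data.frame(address = addresses, lat = lat, lon = lon))
```

In the real script you would pass bc_geocode in place of fake_geocode.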

After a major computational task (e.g. geocoding) you probably want to write your file out so that you keep a copy of your data at this step. Inside the quotes for the variable ofile, put in the path to where you want your data stored.

# add the lat lon coordinates to the dataframe
data$lat = lat
data$lon = lon

# after geocoding, it's a good idea to write your file out!
# you will need to modify the directory where you want R to write
# your data!
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo.csv'
ofile = "/Users/andrasszeitz/Desktop/GEOB_472/geocoded2.csv" 
write.csv(data, ofile)
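
It is worth confirming that the file you wrote can be read back in unchanged. Here is a minimal round-trip sketch on a made-up two-row table, written to a temporary file so nothing on your machine is overwritten (the names toy and check are assumptions for illustration). It passes row.names = FALSE so that the row numbers are not saved as an extra column:

```r
# toy stand-in for the geocoded data
toy = data.frame(Case_Type = c("Pothole - Repair", "Street Light - Out"),
                 lat = c(49.25, 49.28),
                 lon = c(-123.10, -123.12),
                 stringsAsFactors = FALSE)

# write to a temporary file, then read it straight back in
ofile = tempfile(fileext = ".csv")
write.csv(toy, ofile, row.names = FALSE)
check = read.csv(ofile, header = TRUE, stringsAsFactors = FALSE)

# TRUE if the table survived the round trip
print(all.equal(toy, check))
```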

Mine

Creating Categories or Classification

We won’t be doing any heavy analysis on the data, but we will try to tease out some categories to make the data more intuitive for exploring. The first thing we’ll do is to see which unique types of cases are being called in:

# ------------------------------------------------- #
# --------------------- Mine ---------------------- #
# ------------------------------------------------- #
# --- Examine the unique cases --- #

# examine how the cases are grouped - are these intuitive?
unique(data$Department)
##  [1] CSG - Licenses                          
##  [2] ENG - Solid Waste                       
##  [3] PRB - Administration                    
##  [4] ENG - Streets                           
##  [5] ENG - Transportation                    
##  [6] ENG - Water & Sewer                     
##  [7] PRB - Planning and Operations           
##  [8] Business Planning & Services            
##  [9] Public Safety - Fire                    
## [10] CSG - Inspections                       
## [11] CSG - Licenses & Inspections            
## [12] Interdepartmental Initiatives           
## [13] ENG - Office of the Deputy City Engineer
## [14] City Administration                     
## 14 Levels: Business Planning & Services ... Public Safety - Fire
unique(data$Division)
##  [1] Animal Control                                    
##  [2] Sanitation                                        
##  [3] General Park Board Information                    
##  [4] Traffic and Electrical Operations                 
##  [5] Neighbourhood Parking and Transportation          
##  [6] Traffic and Data Management                       
##  [7] Street Activities - Integrated Graffiti Management
##  [8] Street Activities - Streets Furniture             
##  [9] Water Operations                                  
## [10] Streets Design - P & D - Accessibility            
## [11] Street Trees                                      
## [12] 311 Contact Centre                                
## [13] Sewers Operations                                 
## [14] Prevention                                        
## [15] Electrical                                        
## [16] Property Use                                      
## [17] Administration Branch                             
## [18] License Office                                    
## [19] Streets Operations                                
## [20] Solid Waste Management                            
## [21] Parking Ops & Enforcement                         
## [22] Vegetation                                        
## [23] Building                                          
## [24] Snow Angel Program                                
## [25] Plumbing and Gas                                  
## [26] Kent Construction Supplies and Services           
## [27] Elections                                         
## [28] Communications                                    
## [29] Street Activities - Streets Horticulture          
## [30] Sewers and Drainage Design                        
## [31] Sewer Separation                                  
## [32] Water Design                                      
## [33] Transfer and Landfill Operations                  
## [34] Active Transportation                             
## 34 Levels: 311 Contact Centre ... Water Operations
# examine the types of cases - can we make new groups that are more useful?
# Print each unique case on a new line for easier inspection
for (i in 1:length(unique(data$Case_Type))){
  print(unique(data$Case_Type)[i], max.levels=0)
}
## [1] Dead Animal Pickup Case
## [1] Missed Garbage Pickup
## [1] Abandoned Garbage Pickup - City Property & Parks
## [1] PRB_Park Ranger SR
## [1] Sign - Repair
## [1] Animal Control General Inquiry Case
## [1] Lost Pets Case
## [1] Residential Parking Requests
## [1] Missed Yard Trimmings and Food Scraps Pickup
## [1] Traffic & Pedestrian Signal - New
## [1] Graffiti Removal - City Property
## [1] Poster/Sign Removal Request
## [1] Street Furniture Repair and Maintenance Request
## [1] Water Leaks/Breaks
## [1] Street Cleaning & Debris Pickup
## [1] Wheelchair Curb/Ramp Request
## [1] Recycling Bag Request
## [1] Animal Complaint - Non-Emergency Case
## [1] Street Light - Out
## [1] Missed Recycling Pickup
## [1] Missed Apartment Recycling Pickup
## [1] Street Tree Work Request SR
## [1] Citizen Feedback
## [1] Cart - Green (Yard Trimmings and Food Scraps)
## [1] Water Service Turn On/Off Request
## [1] Sewer Pipe Inquiries
## [1] Fire Reinspection Request for Firehall
## [1] Street Litter Can Request
## [1] Electrical Inspection Cancellation Case
## [1] Cart - Garbage
## [1] Fire Reinspection Request for Inspector
## [1] Collection Calendar Mail-Out Request
## [1] PUI Noise Complaint Case
## [1] Building Plans Information Request
## [1] Licence Payment Request Case
## [1] Streets - General Issues
## [1] Street Light - Pole Repair
## [1] Water Service Locate Request
## [1] Recycling Box Request
## [1] Gone Out of Business Case
## [1] Street - Surface Water Flooding
## [1] Blue Box and Leaf Removal Guide Mail-Out Request
## [1] Illegal Dumping/Abandoned Garbage Pickup
## [1] Sidewalk - Repair
## [1] Abandoned Vehicle Request
## [1] Cart - Apartment Recycling
## [1] Water Work Site Complaint
## [1] Parks Litter Can or Cart Request
## [1] Water Hydrant Issue
## [1] Secondary Suite Information Request
## [1] Vegetation Maintenance SR
## [1] Street - Repair
## [1] Building Inspection Cancellation Case
## [1] PUI Noise General Inquiry Case
## [1] Snow Angel Program - Individual Volunteer
## [1] Street Light - New/Relocation
## [1] FPB_General Inquiry Case
## [1] PUI General Inquiry Case
## [1] Pothole - Repair
## [1] Trees and Vegetation Encroachment - City Property
## [1] Holding Stray Case
## [1] Green Bin Program - Feedback and General Inquiry
## [1] Traffic Calming Request
## [1] Traffic & Pedestrian Signal - Modify
## [1] Catch Basin Issues
## [1] Snow & Ice Removal - City Property
## [1] Plumbing and Gas Inspection Cancellation Case
## [1] Street Light - Flat Glass Fixture Request
## [1] General Information Request SR
## [1] Traffic Sign - Modify
## [1] Sewer Manhole Issues
## [1] Snow and Ice Removal - Sidewalk Bylaw Violation
## [1] Water General Inquiry
## [1] Sewer General Inquiries
## [1] Curbside Sign - New
## [1] Occupancy Permit Information Request
## [1] Graffiti Removal - External Organization
## [1] Water General Work Request
## [1] Pavement Marking - Repair
## [1] Election General Concerns
## [1] Home Safety Check Request Case
## [1] Apartment Recycling - Registration Request
## [1] Street and Traffic Light - Utility Damage
## [1] Sewer Odour Complaints
## [1] Horticulture Inquiry on Right-of-Way
## [1] Traffic Sign - New
## [1] Sewer Design General Inquiries
## [1] Homelessness/Transient Issue
## [1] Banner Request
## [1] Sewer Separation Inspection Cancellation Case
## [1] Boulevard Maintenance Issues
## [1] Parking Meter Requests
## [1] Bridges & Structures - Repair
## [1] Street Sign - New
## [1] Animal Cremation Case
## [1] Dead Skunk Pickup
## [1] Water Pressure or No Water Issue
## [1] Water Conservation Violation
## [1] Truck Violation
## [1] Water Meter Issue
## [1] Flag Request
## [1] Pavement Markings Request - New/Modify
## [1] Snow and Ice Removal - Sidewalk Bylaw Inquiry
## [1] Transfer Station & Recycling - General Inquiries
## [1] Curbside Sign - Modify
## [1] Crosswalk Marking - New
## [1] Chafer Beetle Feedback
## [1] Water Damage To City Water System
## [1] Traffic Count Request
## [1] Sewer Utility Damage
## [1] Bicycle Route Map Request

Whoa! That’s a heck of a lot of cases. Our role as data visualization people is to take this mess and try to make sense of it so that we can represent it in a manner that is more intuitive and possibly delightful. There are a bunch of ways that we can organize these cases into specific classes, so I invite you to think about how we might best organize these. For now, I’ve come up with 6 classes that bin these cases together. Does this make sense? Should we change it?

# --- graffiti and noise --- #
Graffiti Removal - City Property
Graffiti Removal - External Organization
PUI Noise Complaint Case
PUI Noise General Inquiry Case

# --- street surface & maintenance --- #
Street Furniture Repair and Maintenance Request
Street Cleaning & Debris Pickup
Street Light - Out
Street Tree Work Request SR
Street Litter Can Request
Streets - General Issues
Street Light - Pole Repair
Street - Surface Water Flooding
Street - Repair
Street Light - New/Relocation
Street Light - Flat Glass Fixture Request
Street and Traffic Light - Utility Damage
Street Sign - New
Crosswalk Marking - New
Boulevard Maintenance Issues
Bicycle Route Map Request
Sidewalk - Repair
Pothole - Repair
Pavement Markings Request - New/Modify
Pavement Marking - Repair
Sewer Pipe Inquiries
Sewer Manhole Issues
Sewer General Inquiries
Sewer Design General Inquiries
Sewer Separation Inspection Cancellation Case
Sewer Utility Damage
Sewer Odour Complaints
Plumbing and Gas Inspection Cancellation Case
Snow Angel Program - Individual Volunteer
Snow & Ice Removal - City Property
Snow and Ice Removal - Sidewalk Bylaw Violation
Snow and Ice Removal - Sidewalk Bylaw Inquiry
Traffic & Pedestrian Signal - New
Traffic Calming Request
Traffic & Pedestrian Signal - Modify
Traffic Sign - Modify
Traffic Sign - New
Traffic Count Request
Truck Violation
Residential Parking Requests
Parking Meter Requests
Abandoned Vehicle Request

# --- garbage, recycling & organics --- #
Missed Garbage Pickup
Abandoned Garbage Pickup - City Property & Parks
Cart - Garbage
Illegal Dumping/Abandoned Garbage Pickup
Parks Litter Can or Cart Request
Recycling Bag Request
Missed Recycling Pickup
Missed Apartment Recycling Pickup
Recycling Box Request
Cart - Apartment Recycling
Apartment Recycling - Registration Request
Transfer Station & Recycling - General Inquiries
Blue Box and Leaf Removal Guide Mail-Out Request
Missed Yard Trimmings and Food Scraps Pickup
Cart - Green (Yard Trimmings and Food Scraps)
Green Bin Program - Feedback and General Inquiry
Collection Calendar Mail-Out Request

# --- water related issues --- #
Water Leaks/Breaks
Water Service Turn On/Off Request
Water Service Locate Request
Street - Surface Water Flooding
Water Work Site Complaint
Water Hydrant Issue
Water General Inquiry
Water General Work Request
Water Pressure or No Water Issue
Water Conservation Violation
Water Meter Issue
Water Damage To City Water System
Catch Basin Issues

# --- animal and vegetation related --- #
Dead Animal Pickup Case
Animal Control General Inquiry Case
Animal Complaint - Non-Emergency Case
Animal Cremation Case
Dead Skunk Pickup
Lost Pets Case
Holding Stray Case
Chafer Beetle Feedback
Vegetation Maintenance SR
Trees and Vegetation Encroachment - City Property
Horticulture Inquiry on Right-of-Way

# --- other --- #
Poster/Sign Removal Request
Sign - Repair
Curbside Sign - New
Curbside Sign - Modify
Banner Request
Fire Reinspection Request for Firehall
Fire Reinspection Request for Inspector
Citizen Feedback
Wheelchair Curb/Ramp Request
PRB_Park Ranger SR
Building Plans Information Request
Building Inspection Cancellation Case
Licence Payment Request Case
Gone Out of Business Case
FPB_General Inquiry Case
PUI General Inquiry Case
Electrical Inspection Cancellation Case
Bridges & Structures - Repair
Secondary Suite Information Request
General Information Request SR
Election General Concerns
Occupancy Permit Information Request
Home Safety Check Request Case
Flag Request
Homelessness/Transient Issue

In the end our 6 classes are (numbered in the order used for the class ids assigned later, where “other” gets the id 0):

  1. graffiti and noise
  2. street surface & maintenance
  3. garbage, recycling & organics related
  4. water related
  5. animal and vegetation related
  6. other

Now that we have our classes, we can put them into vectors:

# graffiti and noise
graffiti_noise = c('Graffiti Removal - City Property',
                    'Graffiti Removal - External Organization',
                    'PUI Noise Complaint Case',
                    'PUI Noise General Inquiry Case')

# street surface & maintenance
street_traffic_maint = c('Street Furniture Repair and Maintenance Request',
                          'Street Cleaning & Debris Pickup',
                          'Street Light - Out',
                          'Street Tree Work Request SR',
                          'Street Litter Can Request',
                          'Streets - General Issues',
                          'Street Light - Pole Repair',
                          'Street - Surface Water Flooding',
                          'Street - Repair', 
                          'Street Light - New/Relocation',
                          'Street Light - Flat Glass Fixture Request',
                          'Street and Traffic Light - Utility Damage',
                          'Street Sign - New',
                          'Crosswalk Marking - New',
                          'Boulevard Maintenance Issues',
                          'Bicycle Route Map Request',
                          'Sidewalk - Repair',
                          'Pothole - Repair',
                          'Pavement Markings Request - New/Modify',
                          'Pavement Marking - Repair',
                          'Sewer Pipe Inquiries',
                          'Sewer Manhole Issues',
                          'Sewer General Inquiries',
                          'Sewer Design General Inquiries',
                          'Sewer Separation Inspection Cancellation Case',
                          'Sewer Utility Damage',
                          'Sewer Odour Complaints',
                          'Plumbing and Gas Inspection Cancellation Case',
                          'Snow Angel Program - Individual Volunteer',
                          'Snow & Ice Removal - City Property',
                          'Snow and Ice Removal - Sidewalk Bylaw Violation',
                          'Snow and Ice Removal - Sidewalk Bylaw Inquiry',
                          'Traffic & Pedestrian Signal - New',
                          'Traffic Calming Request',
                          'Traffic & Pedestrian Signal - Modify',
                          'Traffic Sign - Modify',
                          'Traffic Sign - New',
                          'Traffic Count Request','Truck Violation',
                          'Residential Parking Requests',
                          'Parking Meter Requests',
                          'Abandoned Vehicle Request')

# garbage, recycling & organics related
garbage_recycling_organics = c('Missed Garbage Pickup',
                                'Abandoned Garbage Pickup - City Property & Parks',
                                'Cart - Garbage',
                                'Illegal Dumping/Abandoned Garbage Pickup',
                                'Parks Litter Can or Cart Request',
                                'Recycling Bag Request',
                                'Missed Recycling Pickup',
                                'Missed Apartment Recycling Pickup',
                                'Recycling Box Request',
                                'Cart - Apartment Recycling',
                                'Apartment Recycling - Registration Request',
                                'Transfer Station & Recycling - General Inquiries',
                                'Blue Box and Leaf Removal Guide Mail-Out Request',
                                'Missed Yard Trimmings and Food Scraps Pickup',
                                'Cart - Green (Yard Trimmings and Food Scraps)',
                                'Green Bin Program - Feedback and General Inquiry',
                                'Collection Calendar Mail-Out Request')

# water related
water = c('Water Leaks/Breaks',
           'Water Service Turn On/Off Request',
           'Water Service Locate Request',
           'Street - Surface Water Flooding',
           'Water Work Site Complaint',
           'Water Hydrant Issue', 'Water General Inquiry',
           'Water General Work Request',
           'Water Pressure or No Water Issue',
           'Water Conservation Violation',
           'Water Meter Issue',
           'Water Damage To City Water System',
           'Catch Basin Issues')

# animal and vegetation related
animal_vegetation = c('Dead Animal Pickup Case',
                       'Animal Control General Inquiry Case',
                       'Animal Complaint - Non-Emergency Case',
                       'Animal Cremation Case',
                       'Dead Skunk Pickup','Lost Pets Case',
                       'Holding Stray Case',
                       'Chafer Beetle Feedback',
                       'Vegetation Maintenance SR',
                       'Trees and Vegetation Encroachment - City Property',
                       'Horticulture Inquiry on Right-of-Way')

# other
other = c('Poster/Sign Removal Request',
           'Sign - Repair',
           'Curbside Sign - New',
           'Curbside Sign - Modify',
           'Banner Request',
           'Fire Reinspection Request for Firehall',
           'Fire Reinspection Request for Inspector',
           'Citizen Feedback',
           'Wheelchair Curb/Ramp Request',
           'PRB_Park Ranger SR',
           'Building Plans Information Request',
           'Building Inspection Cancellation Case',
           'Licence Payment Request Case',
           'Gone Out of Business Case',
           'FPB_General Inquiry Case',
           'PUI General Inquiry Case',
           'Electrical Inspection Cancellation Case',
           'Bridges & Structures - Repair',
           'Secondary Suite Information Request',
           'General Information Request SR',
           'Election General Concerns',
           'Occupancy Permit Information Request',
           'Home Safety Check Request Case',
           'Flag Request',
           'Homelessness/Transient Issue')
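
With this many hand-typed entries it is easy to list a case type twice or to leave one out. A quick check, sketched here on shortened toy versions of the vectors (find_repeats, toy_classes and observed are made-up names for illustration), is to look for entries that appear more than once and for observed case types that no class covers:

```r
# Hypothetical helper: list entries that appear more than once across classes
find_repeats = function(classes){
  all_entries = unlist(classes, use.names = FALSE)
  unique(all_entries[duplicated(all_entries)])
}

# toy classes for illustration (shortened versions of the vectors above)
toy_classes = list(
  street = c("Street - Repair", "Street - Surface Water Flooding"),
  water  = c("Water Leaks/Breaks", "Street - Surface Water Flooding"),
  other  = c("Flag Request")
)
repeats = find_repeats(toy_classes)
print(repeats)

# observed case types that are not listed in any class
observed = c("Street - Repair", "Flag Request", "Dead Skunk Pickup")
uncovered = setdiff(observed, unlist(toy_classes))
print(uncovered)
```

To run the same check on the real vectors, pass them in as a list, for example list(graffiti_noise, street_traffic_maint, garbage_recycling_organics, water, animal_vegetation, other).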

Assigning Categories a Unique Id (CID)

We can then run through the lists and give each of the classes a cid so that it is easier to identify and represent them later:

# give class id numbers:
data$cid = 9999
for(i in 1:length(data$Case_Type)){
  if(data$Case_Type[i] %in% graffiti_noise){
    data$cid[i] = 1    
  }else if(data$Case_Type[i] %in% street_traffic_maint){
    data$cid[i] = 2   
  }else if(data$Case_Type[i] %in% garbage_recycling_organics){
    data$cid[i] = 3   
  }else if(data$Case_Type[i] %in% water){
    data$cid[i] = 4   
  }else if(data$Case_Type[i] %in% animal_vegetation){
    data$cid[i] = 5   
  }else{
    data$cid[i] = 0   
  }
}
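
The same assignment can be written without looping over every row, since %in% works on whole vectors. This is a sketch of one alternative, not a replacement for the loop above; assign_cid and the shortened toy classes are made up for illustration.

```r
# Hypothetical helper: `classes` is a list of character vectors in cid order
assign_cid = function(case_type, classes){
  # anything not found in a class keeps the "other" id of 0
  cid = rep(0, length(case_type))
  # go through the classes from last to first so that, when a case type is
  # listed in two classes, the earlier class wins (as in the if/else chain)
  for(k in rev(seq_along(classes))){
    cid[case_type %in% classes[[k]]] = k
  }
  cid
}

# shortened toy classes, for illustration only
toy_classes = list(
  c("Graffiti Removal - City Property"),
  c("Pothole - Repair", "Street - Surface Water Flooding"),
  c("Missed Garbage Pickup"),
  c("Water Leaks/Breaks", "Street - Surface Water Flooding"),
  c("Dead Skunk Pickup")
)
cases = c("Pothole - Repair", "Street - Surface Water Flooding",
          "Water Leaks/Breaks", "Flag Request", "Dead Skunk Pickup")
cids = assign_cid(cases, toy_classes)
print(cids)
```

Note that “Street - Surface Water Flooding” is listed in two of the toy classes and ends up in the earlier one, which mirrors how the if/else chain resolves it.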

Great, so we now have our data binned into specific groups in a way that seems to make sense. However, if we do a little poking around in our data, we notice that since our addresses are often aggregated to the same address point, our lat/lon coordinates come out the same. How can we deal with this spatial overlap?
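
To get a feel for how much overlap there is, you can count the rows whose lat/lon pair also occurs in another row. A minimal sketch on made-up coordinates (toy, key and shared are assumed names); the important detail is that lat and lon are compared together as a pair, not one column at a time:

```r
# toy coordinates: rows 1 and 2 share a location, row 5 was not geocoded
toy = data.frame(lat = c(49.25, 49.25, 49.28, 49.25, NA),
                 lon = c(-123.10, -123.10, -123.12, -123.11, NA))

# paste each pair into one key so that lat and lon are compared together
key = paste(toy$lat, toy$lon)
# TRUE for every geocoded row whose pair occurs at least twice
shared = key %in% key[duplicated(key)] & !is.na(toy$lat)
print(shared)
print(sum(shared))
```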

Jitter: Offsetting Overlapping Points

One way to do this is to add in some “jitter” to each point if it happens to have the same coordinates.

# --- handle overlapping points --- #
# Set offset for points in same location:
data$lat_offset = data$lat
data$lon_offset = data$lon

# Flag rows whose lat/lon pair also occurs in another row
# (rows that were not geocoded, i.e. NA, are left alone)
coords = data[, c("lat", "lon")]
overlaps = duplicated(coords) | duplicated(coords, fromLast = TRUE)
overlaps = overlaps & !is.na(data$lat) & !is.na(data$lon)

# Offset each overlapping point by a small random number
n_overlaps = sum(overlaps)
data$lat_offset[overlaps] = data$lat[overlaps] + runif(n_overlaps, 0.0001, 0.0005)
data$lon_offset[overlaps] = data$lon[overlaps] + runif(n_overlaps, 0.0001, 0.0005)

To derive some more insight into the data, how about looking at the top calls:

# --- what are the top calls? --- #
# get a frequency distribution of the calls:
top_calls = data.frame(table(data$Case_Type))
top_calls = top_calls[order(top_calls$Freq), ]
print(top_calls)
##                                                  Var1 Freq
## 7                                      Banner Request    1
## 18                             Chafer Beetle Feedback    1
## 30                                       Flag Request    1
## 38                     Home Safety Check Request Case    1
## 67                             Sewer Odour Complaints    1
## 70                               Sewer Utility Damage    1
## 74      Snow and Ice Removal - Sidewalk Bylaw Inquiry    1
## 76          Snow Angel Program - Individual Volunteer    1
## 98                                    Truck Violation    1
## 100                      Water Conservation Violation    1
## 8                           Bicycle Route Map Request    2
## 11                      Bridges & Structures - Repair    2
## 21                            Crosswalk Marking - New    2
## 69      Sewer Separation Inspection Cancellation Case    2
## 95                                 Traffic Sign - New    2
## 101                 Water Damage To City Water System    2
## 26                          Election General Concerns    3
## 48               Occupancy Permit Information Request    3
## 52             Pavement Markings Request - New/Modify    3
## 63                Secondary Suite Information Request    3
## 64                     Sewer Design General Inquiries    3
## 96   Transfer Station & Recycling - General Inquiries    3
## 49                             Parking Meter Requests    4
## 75    Snow and Ice Removal - Sidewalk Bylaw Violation    4
## 82          Street Light - Flat Glass Fixture Request    4
## 93                              Traffic Count Request    4
## 106                                 Water Meter Issue    4
## 5                               Animal Cremation Case    5
## 17                                 Catch Basin Issues    5
## 22                             Curbside Sign - Modify    5
## 29            Fire Reinspection Request for Inspector    5
## 51                          Pavement Marking - Repair    5
## 87                                  Street Sign - New    5
## 25                                  Dead Skunk Pickup    6
## 111                      Wheelchair Curb/Ramp Request    6
## 59                     PUI Noise General Inquiry Case    7
## 94                              Traffic Sign - Modify    7
## 40               Horticulture Inquiry on Right-of-Way    8
## 107                  Water Pressure or No Water Issue    8
## 6          Apartment Recycling - Registration Request    9
## 79          Street and Traffic Light - Utility Damage    9
## 102                             Water General Inquiry   10
## 66                               Sewer Manhole Issues   12
## 110                         Water Work Site Complaint   12
## 10                       Boulevard Maintenance Issues   13
## 23                                Curbside Sign - New   13
## 32                     General Information Request SR   13
## 83                      Street Light - New/Relocation   13
## 9    Blue Box and Leaf Removal Guide Mail-Out Request   14
## 36   Green Bin Program - Feedback and General Inquiry   14
## 99                          Vegetation Maintenance SR   14
## 39                       Homelessness/Transient Issue   15
## 108                      Water Service Locate Request   17
## 53      Plumbing and Gas Inspection Cancellation Case   18
## 27            Electrical Inspection Cancellation Case   19
## 103                        Water General Work Request   19
## 33                          Gone Out of Business Case   24
## 92                            Traffic Calming Request   25
## 97  Trees and Vegetation Encroachment - City Property   28
## 65                            Sewer General Inquiries   29
## 31                           FPB_General Inquiry Case   30
## 54                        Poster/Sign Removal Request   30
## 90               Traffic & Pedestrian Signal - Modify   30
## 37                                 Holding Stray Case   31
## 12              Building Inspection Cancellation Case   32
## 73                 Snow & Ice Removal - City Property   32
## 57                           PUI General Inquiry Case   33
## 91                  Traffic & Pedestrian Signal - New   33
## 104                               Water Hydrant Issue   39
## 81    Street Furniture Repair and Maintenance Request   40
## 86                          Street Litter Can Request   44
## 85                         Street Light - Pole Repair   45
## 68                               Sewer Pipe Inquiries   50
## 35           Graffiti Removal - External Organization   55
## 109                 Water Service Turn On/Off Request   55
## 43                                     Lost Pets Case   56
## 56                                 PRB_Park Ranger SR   64
## 44                  Missed Apartment Recycling Pickup   65
## 13                 Building Plans Information Request   66
## 20               Collection Calendar Mail-Out Request   67
## 50                   Parks Litter Can or Cart Request   70
## 71                                  Sidewalk - Repair   71
## 72                                      Sign - Repair   77
## 62                       Residential Parking Requests   80
## 24                            Dead Animal Pickup Case   82
## 4                 Animal Control General Inquiry Case   87
## 77                                    Street - Repair   87
## 58                           PUI Noise Complaint Case  102
## 89                           Streets - General Issues  105
## 28             Fire Reinspection Request for Firehall  119
## 34                   Graffiti Removal - City Property  140
## 55                                   Pothole - Repair  142
## 105                                Water Leaks/Breaks  149
## 78                    Street - Surface Water Flooding  164
## 61                              Recycling Box Request  165
## 3               Animal Complaint - Non-Emergency Case  168
## 2                           Abandoned Vehicle Request  175
## 14                         Cart - Apartment Recycling  181
## 41           Illegal Dumping/Abandoned Garbage Pickup  190
## 42                       Licence Payment Request Case  201
## 80                    Street Cleaning & Debris Pickup  266
## 88                        Street Tree Work Request SR  267
## 60                              Recycling Bag Request  325
## 46                            Missed Recycling Pickup  366
## 16      Cart - Green (Yard Trimmings and Food Scraps)  371
## 19                                   Citizen Feedback  513
## 15                                     Cart - Garbage  689
## 1    Abandoned Garbage Pickup - City Property & Parks  721
## 84                                 Street Light - Out  761
## 45                              Missed Garbage Pickup  763
## 47       Missed Yard Trimmings and Food Scraps Pickup 1229
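If you would rather see only the most frequent calls at the top, you can sort in descending order and keep the first few rows. This is a sketch on a made-up vector of case types, not the real column:

```r
# Hypothetical case types standing in for data$Case_Type
case_type = c('Missed Garbage Pickup', 'Street Light - Out',
              'Missed Garbage Pickup', 'Pothole - Repair',
              'Street Light - Out', 'Missed Garbage Pickup')

# Frequency table as a data frame (columns case_type and Freq)
top_calls = data.frame(table(case_type))

# Sort from most to least frequent and keep the first two rows
top_calls = top_calls[order(top_calls$Freq, decreasing = TRUE), ]
print(head(top_calls, 2))
```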

Filter

Removing Outliers

So we’ve added some new coordinates and fiddled with them a bit to help with the representation. At this point, it is a good idea to filter out any extraneous points or points that fall outside of our area of interest, in this case Vancouver. Geocoders aren’t perfect, so we should be wary of including any false positives.

First, let’s drop any rows whose coordinates are NA or fall outside of Vancouver’s bounds:

# ------------------------------------------------------------------ #
# ---------------------------- Filter ------------------------------ #
# ------------------------------------------------------------------ #
# Keep only the rows whose coordinates fall within our bounds and 
# are not NA
data_filter = subset(data, (lat <= 49.313162) & (lat >= 49.199554) & 
                    (lon <= -123.019028) & (lon >= -123.271371) & !is.na(lon) )
# plot the data
plot(data_filter$lon, data_filter$lat)
# If everything looks good, we might also write out our file:
# Andras' laptop (Mac): '/Users/andrasszeitz/Desktop/GEOB_472/data/311-geo-filtered.csv'
# Andras' computer (Windows): 'C:/Users/aszeitz/Desktop/GEOB_472/data/311-geo-filtered.csv'
ofile_filtered = '/Users/andrasszeitz/Desktop/GEOB_472/Data/311-geo-filtered.csv' 
write.csv(data_filter, ofile_filtered)

NOTE: we write another file out here so we can save our “final dataset”
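The filter logic can be checked on a few made-up points before running it on the real data. In this sketch the bounding box values are the ones used above, while `in_bounds()` and the `pts` data frame are hypothetical names introduced for illustration:

```r
# Bounding box used in the tutorial's filter step
bbox = list(lat_min = 49.199554, lat_max = 49.313162,
            lon_min = -123.271371, lon_max = -123.019028)

# Hypothetical helper: TRUE when a point is non-missing and inside the box
in_bounds = function(lat, lon, box){
  !is.na(lat) & !is.na(lon) &
    lat >= box$lat_min & lat <= box$lat_max &
    lon >= box$lon_min & lon <= box$lon_max
}

# Toy points: downtown Vancouver, a failed geocode, and Victoria, BC
pts = data.frame(lat = c(49.2827, NA, 48.4284),
                 lon = c(-123.1207, NA, -123.3656))

keep = in_bounds(pts$lat, pts$lon, bbox)
print(table(keep))
pts_filter = pts[keep, ]
```

Only the downtown point survives: the failed geocode is dropped because it is NA, and Victoria is dropped because it lies south and west of the box.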

With a filtered dataset, we can now write our file out to a shapefile and a geojson (if you want to open it later in another GIS).

I highly recommend that you learn more about the different geographic data types, such as shapefiles and geojson. The last thing you want is to show up to a job interview without understanding how they work, how they differ, and what their advantages and disadvantages are. For more check out this link, or this link, or this link
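As a small taste of the difference: a geojson file is plain text you can read and write by hand, with each point's coordinates stored in longitude, latitude order, whereas a shapefile is a set of binary files (.shp, .shx, .dbf) whose attribute field names are limited to 10 characters. The sketch below builds one GeoJSON point feature as a string using base R only; the record and its values are made up for illustration.

```r
# Hypothetical single record, just for illustration
lon = -123.1207
lat = 49.2827
case_type = "Pothole - Repair"

# Build one GeoJSON Feature by hand; note the coordinates are [lon, lat]
feature = sprintf(
  '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [%.4f, %.4f]}, "properties": {"Case_Type": "%s"}}',
  lon, lat, case_type)

cat(feature, "\n")
```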

Saving Dataset

# --- Convert Data to Shapefile --- #
# store coordinates in dataframe
coords_311 = data.frame(data_filter$lon_offset, data_filter$lat_offset)

# create spatialPointsDataFrame
data_shp = SpatialPointsDataFrame(coords = coords_311, data = data_filter)

# set the projection to wgs84
projection_wgs84 = CRS("+proj=longlat +datum=WGS84")
proj4string(data_shp) = projection_wgs84

# set an output folder for our shapefile and geojson files (the folder must already exist):
geofolder = "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/"

# Join the folderpath to each file name:
opoints_shp = paste(geofolder, "calls_", "1401", ".shp", sep = "")
print(opoints_shp) # print this out to see where your file will go & what it will be called
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/calls_1401.shp"
opoints_geojson = paste(geofolder, "calls_", "1401", ".geojson", sep = "") 
print(opoints_geojson) # print this out to see where your file will go & what it will be called
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/calls_1401.geojson"
# write the file to a shp
writeOGR(data_shp, opoints_shp, layer = "data_shp", driver = "ESRI Shapefile",
         check_exists = FALSE)
## Warning in writeOGR(check_exists = FALSE, data_shp, opoints_shp, layer =
## "data_shp", : Field names abbreviated for ESRI Shapefile driver
# write file to geojson
writeOGR(data_shp, opoints_geojson, layer = "data_shp", driver = "GeoJSON",
         check_exists = FALSE)

We have all the raw data, but remember the saying “overview first, details on demand”? Our brains simply can’t make sense of the sheer number of points on the map. How about aggregating the points to a grid? It turns out hexagonal grids are quite good at conveying the density of data points. I’ve created a hexagonal grid in QGIS at a resolution of about 250m x 280m:

Creating Hexagon Grid

[Image: hex_van]

First let’s read in the grid and reproject it from UTM zone 10 north to WGS84:

# --- aggregate to a grid --- #
# ref: http://www.inside-r.org/packages/cran/GISTools/docs/poly.counts
# set the file name - here the grid is read straight from a URL on GitHub
grid_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson'

# read in the hex grid (the file is stored in UTM zone 10 north)
hexgrid = readOGR(grid_fn, 'OGRGeoJSON')
## OGR data source with driver: GeoJSON 
## Source: "https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson", layer: "OGRGeoJSON"
## with 5104 features
## It has 4 fields
# transform the projection to wgs84 to match the point file and store
#  it to a new variable (see variable: projection_wgs84)
hexgrid_wgs84 = spTransform(hexgrid, projection_wgs84)

Assigning Values to Hexagons

Next, let’s use the poly.counts() function to count the number of points in each grid cell:

# Use the poly.counts() function to count the number of occurrences of calls per grid cell
grid_cnt = poly.counts(data_shp, hexgrid_wgs84)
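Conceptually, poly.counts() returns one number per polygon: how many points fall inside it. The sketch below shows the same idea in base R on a plain rectangular grid. It uses neither hexagons nor GISTools, and all points and cell edges are made up:

```r
# Hypothetical points inside a 2 x 2 box
pts = data.frame(x = c(0.2, 0.4, 1.5, 1.7, 1.9),
                 y = c(0.3, 0.6, 0.1, 0.2, 1.8))

# Cell edges: two columns and two rows of 1 x 1 cells
x_edges = c(0, 1, 2)
y_edges = c(0, 1, 2)

# Index of the column and row each point falls in
col_id = findInterval(pts$x, x_edges)
row_id = findInterval(pts$y, y_edges)

# Count points per cell; empty cells get a count of 0
cell_cnt = table(factor(row_id, levels = 1:2), factor(col_id, levels = 1:2))
print(cell_cnt)

# Keep only the cells that received at least one point
cells = as.data.frame(cell_cnt)
names(cells) = c("row", "col", "count")
cells = cells[cells$count > 0, ]
print(cells)
```

The full script at the end of this tutorial does the equivalent with the hexagons: it stores the counts in a `data` column on the grid and removes the cells without any calls.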

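Note that the full script below also attaches the counts to the grid and removes the cells without any calls before this step. Finally, write the grid out to a shapefile and a geojson; we will use the geojson later, and the shapefile is just for good practice: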

# define the output names:
ohex_shp = paste(geofolder, "hexgrid_250m_", "1401", "_cnts",
                  ".shp", sep = "")
print(ohex_shp)
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/hexgrid_250m_1401_cnts.sh
ohex_geojson = paste(geofolder, "hexgrid_250m_", "1401", "_cnts", ".geojson", sep = "") 
print(ohex_geojson)
## [1] "/Users/andrasszeitz/Desktop/GEOB_472/data/geo/hexgrid_250m_1401_cnts.geojson"
# write the file to a shp
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_shp, 
         layer = "hexgrid_wgs84", driver = "ESRI Shapefile")

# write file to geojson
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_geojson, 
         layer = "hexgrid_wgs84", driver = "GeoJSON")

Technically, we now have enough data to start putting together our web app that visualizes the 3-1-1 data. The next step will be to work on the representation and to use the data we produced to filter the visuals through user interactions.

Full Script:

If all is well, this should run in its entirety (remember that you need to change the folder paths!):

#####################################################################
# Vancouver 3-1-1: Data Processing Script
# Date:
# By: 
# Desc: 
#####################################################################

# ------------------------------------------------------------------ #
# ---------------------- Install Libraries ------------------------- #
# ------------------------------------------------------------------ #
# uncomment these lines if these libraries have not yet been installed
# install.packages("GISTools")
# install.packages("RJSONIO")
# install.packages("rgdal")
# install.packages("RCurl")
# install.packages("curl")

# ------------------------------------------------------------------ #
# ----------------------- Load Libraries --------------------------- #
# ------------------------------------------------------------------ #
library(GISTools)
library(RJSONIO)
library(rgdal)
library(RCurl)
library(curl)

# ------------------------------------------------------------------ #
# ---------------------------- Acquire ----------------------------- #
# ------------------------------------------------------------------ #
# access from the interwebz using "curl"
fname = curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails.csv')

# Read data as csv
data = read.csv(fname, header=T)

# inspect your data
print(head(data))

# ------------------------------------------------------------------ #
# ------------------------ Geocoding Function ---------------------- #
# ------------------------------------------------------------------ #

# a function taking a full address string, formatting it, and making
# a call to the BC government's geocoding API
bc_geocode = function(search){
  # stop with an error if input is not a character string
  if(!is.character(search)){stop("'search' must be a character string")}
  
  # formatting characters that need to be escaped in the URL, ie:
  # substituting spaces ' ' for '%20'.
  search = RCurl::curlEscape(search)
  
  # first portion of the API call URL
  base_url = "http://apps.gov.bc.ca/pub/geocoder/addresses.json?addressString="

  # constant end of the API call URL
  url_tail = "&locationDescriptor=any&maxResults=1&interpolation=adaptive&echo=true&setBack=0&outputSRS=4326&minScore=1&provinceCode=BC"
  
  # combining the URL segments into one string
  final_url = paste0(base_url, search, url_tail)
  
  # making the call to the geocoding API by getting the response from the URL
  response = RCurl::getURL(final_url)
  
  # parsing the JSON response into an R list
  response_parsed = RJSONIO::fromJSON(response)
  
  # if there are coordinates in the response, assign them to `geocoords`
  if(length(response_parsed$features[[1]]$geometry[[3]]) > 0){
    geocoords = list(lon = response_parsed$features[[1]]$geometry[[3]][1],
                      lat = response_parsed$features[[1]]$geometry[[3]][2])
  }else{
    geocoords = NA
  }
  
  # returns the `geocoords` object
  return(geocoords)
}

# ------------------------------------------------------------------ #
# ---------------------- Parse: Geocoder --------------------------- #
# ------------------------------------------------------------------ #
# change intersection to 00's
data$h_block = gsub("#", "0", data$Hundred_Block)
print(head(data$h_block))

# Join the strings from each column together & add "Vancouver, BC":
data$full_address = paste(data$h_block, 
                          paste(data$Street_Name,
                                "Vancouver, BC",
                                sep=", "),
                          sep=" ")

# removing "Intersection " from the full_address entries
data$full_address = gsub("Intersection ", "", data$full_address)
print(head(data$full_address))

# creating a random sequence of 1000 numbers between 1 and the length
# of the dataset. These numbers will be the rows we retain from the
# full dataset in order to more quickly geocode during the tutorial
# (note: replace = TRUE means the same row can be picked more than once)
set.seed(1)
keepers = sample(seq(from = 1, to = nrow(data), by = 1), size = 1000, replace = TRUE)

# creating a new variable to hold the full data set, then trimming the 
# original dataset to have only the 1000 random rows
full_data = data
data = data[keepers, ]

# Geocode the events - we use the BC Government's geocoding API
# Create an empty vector for lat and lon coordinates
lat = c() 
lon = c()
# loop through the addresses
for(i in 1:length(data$full_address)){
  # store the address at index "i" as a character
  address = data$full_address[i]
  # geocode the address once and reuse the result for both coordinates
  geocoords = bc_geocode(address)
  # bc_geocode() returns NA when no coordinates are found, so store NA for both
  if(is.list(geocoords)){
    lat = c(lat, geocoords$lat)
    lon = c(lon, geocoords$lon)
  }else{
    lat = c(lat, NA)
    lon = c(lon, NA)
  }
  # at each iteration through the loop, print the coordinates - takes about 20 min.
  print(paste0("#", i, ": ", lat[i], ", ", lon[i]))
}

# add the lat lon coordinates to the dataframe
data$lat = lat
data$lon = lon

# after geocoding, it's a good idea to write your file out!
# joey's computer (mac): '/Users/Jozo/Projects/Github-local/Workshop/aloha-r/data/calls_2014/201401CaseLocationsDetails-geo.csv'
# sally's computer (windows): 'c:\\Sally\\Documents\\van311-project\\201401CaseLocationsDetails-geo.csv'
ofile = "/Users/andrasszeitz/Desktop/GEOB_472/geocoded2.csv" 
write.csv(data, ofile)

# ------------------------------------------------------------------ #
# ----------------------------- Mine ------------------------------- #
# ------------------------------------------------------------------ #
# --- Examine the unique cases --- #

# examine how the cases are grouped - are these intuitive?
unique(data$Department)
unique(data$Division)

# examine the types of cases - can we make new groups that are more useful?
# Print each unique case on a new line for easier inspection
for (i in 1:length(unique(data$Case_Type))){
  print(unique(data$Case_Type)[i], max.levels=0)
}

# Determine classes to group case types:

# graffiti and noise
graffiti_noise = c('Graffiti Removal - City Property',
                    'Graffiti Removal - External Organization',
                    'PUI Noise Complaint Case',
                    'PUI Noise General Inquiry Case')

# street surface & maintenance
street_traffic_maint = c('Street Furniture Repair and Maintenance Request',
                          'Street Cleaning & Debris Pickup',
                          'Street Light - Out',
                          'Street Tree Work Request SR',
                          'Street Litter Can Request',
                          'Streets - General Issues',
                          'Street Light - Pole Repair',
                          'Street - Surface Water Flooding',
                          'Street - Repair', 
                          'Street Light - New/Relocation',
                          'Street Light - Flat Glass Fixture Request',
                          'Street and Traffic Light - Utility Damage',
                          'Street Sign - New',
                          'Crosswalk Marking - New',
                          'Boulevard Maintenance Issues',
                          'Bicycle Route Map Request',
                          'Sidewalk - Repair',
                          'Pothole - Repair',
                          'Pavement Markings Request - New/Modify',
                          'Pavement Marking - Repair',
                          'Sewer Pipe Inquiries',
                          'Sewer Manhole Issues',
                          'Sewer General Inquiries',
                          'Sewer Design General Inquiries',
                          'Sewer Separation Inspection Cancellation Case',
                          'Sewer Utility Damage',
                          'Sewer Odour Complaints',
                          'Plumbing and Gas Inspection Cancellation Case',
                          'Snow Angel Program - Individual Volunteer',
                          'Snow & Ice Removal - City Property',
                          'Snow and Ice Removal - Sidewalk Bylaw Violation',
                          'Snow and Ice Removal - Sidewalk Bylaw Inquiry',
                          'Traffic & Pedestrian Signal - New',
                          'Traffic Calming Request',
                          'Traffic & Pedestrian Signal - Modify',
                          'Traffic Sign - Modify',
                          'Street and Traffic Light - Utility Damage',
                          'Traffic Sign - New',
                          'Traffic Count Request','Truck Violation',
                          'Residential Parking Requests',
                          'Parking Meter Requests',
                          'Abandoned Vehicle Request')

# garbage, recycling & organics related
garbage_recycling_organics = c('Missed Garbage Pickup',
                                'Abandoned Garbage Pickup - City Property & Parks',
                                'Cart - Garbage',
                                'Illegal Dumping/Abandoned Garbage Pickup',
                                'Parks Litter Can or Cart Request',
                                'Recycling Bag Request',
                                'Missed Recycling Pickup',
                                'Missed Apartment Recycling Pickup',
                                'Recycling Box Request',
                                'Cart - Apartment Recycling',
                                'Apartment Recycling - Registration Request',
                                'Transfer Station & Recycling - General Inquiries',
                                'Blue Box and Leaf Removal Guide Mail-Out Request',
                                'Missed Yard Trimmings and Food Scraps Pickup',
                                'Cart - Green (Yard Trimmings and Food Scraps)',
                                'Green Bin Program - Feedback and General Inquiry',
                                'Collection Calendar Mail-Out Request')

# water related
water = c('Water Leaks/Breaks',
           'Water Service Turn On/Off Request',
           'Water Service Locate Request',
           'Street - Surface Water Flooding',
           'Water Work Site Complaint',
           'Water Hydrant Issue', 'Water General Inquiry',
           'Water General Work Request',
           'Water Pressure or No Water Issue',
           'Water Conservation Violation',
           'Water Meter Issue',
           'Water Damage To City Water System',
           'Catch Basin Issues')

# animal and vegetation related
animal_vegetation = c('Dead Animal Pickup Case',
                       'Animal Control General Inquiry Case',
                       'Animal Complaint - Non-Emergency Case',
                       'Animal Cremation Case',
                       'Dead Skunk Pickup','Lost Pets Case',
                       'Holding Stray Case',
                       'Chafer Beetle Feedback',
                       'Vegetation Maintenance SR',
                       'Trees and Vegetation Encroachment - City Property',
                       'Horticulture Inquiry on Right-of-Way')

# other
other = c('Poster/Sign Removal Request',
           'Sign - Repair',
           'Curbside Sign - New',
           'Curbside Sign - Modify',
           'Banner Request',
           'Fire Reinspection Request for Firehall',
           'Fire Reinspection Request for Inspector',
           'Citizen Feedback',
           'Wheelchair Curb/Ramp Request','Wheelchair',
           'PRB_Park Ranger SR',
           'Building Plans Information Request',
           'Building Inspection Cancellation Case',
           'Licence Payment Request Case',
           'Gone Out of Business Case',
           'FPB_General Inquiry Case',
           'PUI General Inquiry Case',
           'Electrical Inspection Cancellation Case',
           'Bridges & Structures - Repair',
           'Secondary Suite Information Request',
           'General Information Request SR',
           'Election General Concerns',
           'Occupancy Permit Information Request',
           'Home Safety Check Request Case',
           'Flag Request',
           'Homelessness/Transient Issue')

# give class id numbers:
data$cid = 9999
for(i in 1:length(data$Case_Type)){
  if(data$Case_Type[i] %in% graffiti_noise){
    data$cid[i] = 1    
  }else if(data$Case_Type[i] %in% street_traffic_maint){
    data$cid[i] = 2   
  }else if(data$Case_Type[i] %in% garbage_recycling_organics){
    data$cid[i] = 3   
  }else if(data$Case_Type[i] %in% water){
    data$cid[i] = 4   
  }else if(data$Case_Type[i] %in% animal_vegetation){
    data$cid[i] = 5   
  }else{
    data$cid[i] = 0   
  }
}

# --- handle overlapping points --- #
# Set offset for points in same location:
data$lat_offset = data$lat
data$lon_offset = data$lon

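# Run loop - if a point shares its lat/lon pair with another point, offset it by a random number
# (note: testing a value with %in% against its own column is always TRUE, so we flag duplicated pairs instead)
coord_pairs = data[, c("lat", "lon")]
overlaps = duplicated(coord_pairs) | duplicated(coord_pairs, fromLast = TRUE)
for(i in 1:length(data$lat)){
  if (overlaps[i]){
    data$lat_offset[i] = data$lat_offset[i] + runif(1, 0.0001, 0.0005)
    data$lon_offset[i] = data$lon_offset[i] + runif(1, 0.0001, 0.0005)
  } 
}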

# --- what are the top calls? --- #
# get a frequency distribution of the calls:
top_calls = data.frame(table(data$Case_Type))
top_calls = top_calls[order(top_calls$Freq), ]
print(top_calls)

# ------------------------------------------------------------------ #
# ---------------------------- Filter ------------------------------ #
# ------------------------------------------------------------------ #
# Keep only the rows whose coordinates fall within our bounds and are not NA
data_filter = subset(data, (lat <= 49.313162) & (lat >= 49.199554) & 
                    (lon <= -123.019028) & (lon >= -123.271371) & !is.na(lon) )

# plot the data
plot(data_filter$lon, data_filter$lat)

# If everything looks good, we might also write out our file:
ofile_filtered = '/Users/andrasszeitz/Desktop/GEOB_472/filtered.csv'
write.csv(data_filter, ofile_filtered)

# --- Convert Data to Shapefile --- #
# store coordinates in dataframe
coords_311 = data.frame(data_filter$lon_offset, data_filter$lat_offset)

# create spatialPointsDataFrame
data_shp = SpatialPointsDataFrame(coords = coords_311, data = data_filter)

# set the projection to wgs84
projection_wgs84 = CRS("+proj=longlat +datum=WGS84")
proj4string(data_shp) = projection_wgs84

# set an output folder for our geojson files:
geofolder = "/Users/andrasszeitz/Desktop/GEOB_472/"

# Join the folderpath to each file name:
opoints_shp = paste(geofolder, "calls_", "1401", ".shp", sep = "")
print(opoints_shp) # print this out to see where your file will go & what it will be called
opoints_geojson = paste(geofolder, "calls_", "1401", ".geojson", sep = "") 
print(opoints_geojson) # print this out to see where your file will go & what it will be called

# write the file to a shp
writeOGR(data_shp, opoints_shp, layer = "data_shp", driver = "ESRI Shapefile",
         check_exists = FALSE)

# write file to geojson
writeOGR(data_shp, opoints_geojson, layer = "data_shp", driver = "GeoJSON",
         check_exists = FALSE)

# --- aggregate to a grid --- #
# ref: http://www.inside-r.org/packages/cran/GISTools/docs/poly.counts
# set the file name - here the grid is read straight from a URL on GitHub
grid_fn = 'https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/geo/hgrid_250m.geojson'

# read in the hex grid (the file is stored in UTM zone 10 north)
hexgrid = readOGR(grid_fn, 'OGRGeoJSON')

# transform the projection to wgs84 to match the point file and store
# it to a new variable (see variable: projection_wgs84)
hexgrid_wgs84 = spTransform(hexgrid, projection_wgs84)

# Use the poly.counts() function to count the number of occurrences of calls per grid cell
grid_cnt = poly.counts(data_shp, hexgrid_wgs84)

# create a data frame of the counts
grid_total_counts = data.frame(grid_cnt)

# set grid_total_counts dataframe to the hexgrid data
hexgrid_wgs84$data = grid_total_counts$grid_cnt

# remove all the grids without any calls
hexgrid_wgs84 = subset(hexgrid_wgs84, grid_cnt > 0)

# define the output names:
ohex_shp = paste(geofolder, "hexgrid_250m_", "1401", "_cnts",
                  ".shp", sep = "")
print(ohex_shp)

ohex_geojson = paste(geofolder, "hexgrid_250m_", "1401", 
                      "_cnts", ".geojson", sep = "") 
print(ohex_geojson)

# write the file to a shp
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_shp, 
         layer = "hexgrid_wgs84", driver = "ESRI Shapefile")

# write file to geojson
writeOGR(check_exists = FALSE, hexgrid_wgs84, ohex_geojson, 
         layer = "hexgrid_wgs84", driver = "GeoJSON")

Creating a full dataset:

We completed the exploration for 1 month, but what about the remaining 11 months? What value is added by having the other temporal data? In the future, you can explore the full year’s worth of data by running this script on each of the monthly datasets. Don’t forget to change the filenames for each month: if you don’t, you’ll overwrite your data and have to run all the code again 😉
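One way to keep the output names apart is to build them from a month code, in the same style as the "1401" used in the script. This is only a sketch: the input file names assume that the other months follow the same naming pattern as the January file, and the output folder is a placeholder.

```r
# Month codes in the same style as the "1401" used above: 1401 ... 1412
month_codes = sprintf("14%02d", 1:12)

# Assumed input names, following the pattern of the January file
in_files = sprintf("2014%02dCaseLocationsDetails.csv", 1:12)

# Hypothetical output folder; replace with your own path
geofolder = "path/to/output/"

# One distinct output name per month, so nothing gets overwritten
out_geojson = paste(geofolder, "calls_", month_codes, ".geojson", sep = "")

print(data.frame(in_files, out_geojson))
```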


One thought on “R – Vancouver 311 Tutorial”

  1. For those of you who successfully geocoded your data during the last tutorial and were able to save it on your computer, please read it back into R using the read.csv() function.

    I, for example, will run:

    file_loc <- "/Users/andrasszeitz/Desktop/GEOB_472/geocoded_311.csv"
    data <- read.csv(file_loc, header = TRUE)

    For those of you who were unable to geocode your data, didn't save it, or are having trouble loading it from your computer into R, run this code to download a geocoded .csv from Joey's github website:

    library(curl)
    file_loc <- curl('https://raw.githubusercontent.com/joeyklee/aloha-r/master/data/calls_2014/201401CaseLocationsDetails-geo-filtered.csv')
    data <- read.csv(file_loc, header = TRUE)
