Generating GeoJSON File for Toronto

In the IBM Data Science Capstone Project, we have to import the postal codes of Toronto's neighbourhoods in order to fetch information with the Foursquare API. A GeoJSON file of Toronto is available from Toronto Open Data, but it is divided by neighbourhood, not by postal code. We therefore have to build our own GeoJSON file from a shapefile published by Statistics Canada.

1. Preparation

1.1 Data Sources

  1. List of postal codes of Toronto: This Wikipedia page lists all the neighbourhoods in Toronto, together with their postal codes and boroughs.
  2. Boundary file: You can download the shapefile from https://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/bound-limit-2016-eng.cfm. Please note that the 2016 Census boundary files portray the full extent of the geographic areas, including the coastal water area of Canada.

Make sure you select the right shapefile:

  • format: ArcGIS ® (.shp) file
  • Geographic area or water feature: Forward Sortation Area
  • Digital Boundary File
[Image: Select the right boundary file]

1.2 Data Cleaning

Let us first import the Python packages we need.

# library to handle data in a vectorized manner
import numpy as np

# library for data analysis
import pandas as pd

# library to handle JSON files
import json
import requests

Next, let us clean the data to get the postal codes of Toronto's FSAs (Forward Sortation Areas).

# Read wikipedia page
postal_codes = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M',header=0)[0]

# Clean dataframe: drop rows whose borough is 'Not assigned'
postal_codes = postal_codes[~postal_codes['Borough'].str.contains('Not assigned', na=False)]

# Reset index for dataframe
postal_codes = postal_codes.reset_index(drop=True)
postal_codes.head()

1.3 GIS software

We need GIS software to open the shapefile (.shp). Fortunately, Mac users can download QGIS, a free and open-source geographic information system. With it you can create, edit, visualise, analyse and publish geospatial information on Windows, Mac, Linux, BSD and mobile devices.

2. Read Shapefile

We now have a valid list of postal codes for analysis, the shapefile of Canada, and GIS software to read it. Let’s open the shapefile of Canada.
[Image: shapefile]

2.1 Add Layer

After opening QGIS, we first open the shapefile by adding it as a vector layer.
[Image: vector layer]
[Image: choose shapefile]
Then we can take a closer look at our file.

2.2 Filter Layer

We only need the FSAs of Toronto in our GeoJSON file, so we have to filter the layer. Select Layer - Filter, or simply use the filter shortcut: command + f.

The query builder shows the three fields in the file: “CFSAUID”, “PRUID” and “PRNAME”. In the Values box we can press ‘all’ to see the values of each field.

Now we can filter the layer with the postal codes of Toronto by providing a specific filter expression:

"CFSAUID" IN (postal codes list)
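
Typing the list of codes by hand is error-prone, so the expression can be generated instead. The following is a minimal sketch, not part of the original workflow: the helper name `build_fsa_filter` is hypothetical, and it assumes the codes are available as a plain list of strings (for example, the postal code column of `postal_codes` converted with `.tolist()`).

```python
def build_fsa_filter(codes, field="CFSAUID"):
    """Return a filter expression matching any of the given FSA codes.

    `build_fsa_filter` is a hypothetical helper for illustration only.
    """
    # Normalise, drop duplicates, and sort so the expression is reproducible.
    unique_codes = sorted({str(code).strip().upper() for code in codes})
    # Field names go in double quotes, string values in single quotes.
    quoted = ", ".join(f"'{code}'" for code in unique_codes)
    return f'"{field}" IN ({quoted})'


print(build_fsa_filter(["M3A", "M4A", "m5a", "M3A"]))
# → "CFSAUID" IN ('M3A', 'M4A', 'M5A')
```

The printed string can be pasted directly into the query builder.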

Let’s test the filter to see if it is working.

96 rows. Looks good.

2.3 Generating GeoJSON File

Now let us save this layer in GeoJSON format via Layer - Save as. Please set the coordinate reference system to WGS84, since we will use the folium package to create a map from the GeoJSON file.

Voila. Here is our GeoJSON file of Toronto. I have uploaded the GeoJSON file to GitHub, so feel free to download it. Let us now try to create a population choropleth map of Toronto.
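
Before mapping, it can be worth checking that the exported file contains the FSAs we expect. The sketch below is an illustration under assumptions: the helper name `fsa_ids` is hypothetical, and the one-feature `sample` (with made-up coordinates) stands in for the real file so that the snippet runs on its own.

```python
import json


def fsa_ids(geojson, key="CFSAUID"):
    """Return the sorted FSA codes found in a GeoJSON FeatureCollection."""
    return sorted(feature["properties"][key] for feature in geojson["features"])


# A tiny stand-in for the exported file; the coordinates are invented.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"CFSAUID": "M5A", "PRUID": "35", "PRNAME": "Ontario"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-79.37, 43.65], [-79.35, 43.65],
                                   [-79.35, 43.66], [-79.37, 43.65]]]}}
  ]
}
""")

print(fsa_ids(sample))  # → ['M5A']
```

With the real file, load it with `json.load` and compare `len(fsa_ids(...))` against the row count reported by the QGIS filter test.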

3. Choropleth Map of Toronto (Bonus Part)

3.1 Obtain Population information

Statistics Canada provides population counts by FSA. Let us first do some data cleaning.

# fetch url 
url_pop = 'https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/CompFile.cfm?Lang=Eng&T=1201&OFT=FULLCSV'

# read into dataframe
pop = pd.read_csv(url_pop)

# drop irrelevant columns and make sure we are looking at Ontario
pop = pop[['Geographic code','Geographic name','Province or territory','Population, 2016']]
pop = pop[pop['Province or territory'] == "Ontario"].copy()

# change population column type to float (astype returns a new Series, so assign it back)
pop['Population, 2016'] = pop['Population, 2016'].astype(float)

3.2 Create population choropleth map of Toronto

Ok. We have our population data as well as the GeoJSON file. Let’s map it.

# map rendering library
import folium

# read geojson file
with open('/toronto_m.geojson') as data:
    t_geo = json.load(data)

# create the base map, centred on Toronto
a = folium.Map(location=[43.6534817, -79.3839347], tiles='cartodbpositron', zoom_start=11)

# add the choropleth layer, matching 'Geographic code' in pop to CFSAUID in the GeoJSON
# (folium.Choropleth replaces the older Map.choropleth method, which recent folium releases no longer provide)
folium.Choropleth(
    geo_data=t_geo,
    data=pop,
    columns=['Geographic code', 'Population, 2016'],
    key_on='feature.properties.CFSAUID',
    fill_color='Blues',
    fill_opacity=0.9,
    line_opacity=0.1,
    legend_name='Population of Toronto by FSAs',
).add_to(a)
a
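
The choropleth joins the two datasets on the FSA code, so an FSA that appears in only one of them will not get a population-based colour. A minimal sketch of a check for such mismatches follows; the helper name `unmatched_keys` is hypothetical and the sample codes are illustrative only.

```python
def unmatched_keys(geo_ids, data_ids):
    """Report ids that appear in only one of the two datasets."""
    geo, data = set(geo_ids), set(data_ids)
    return {
        "missing_data": sorted(geo - data),   # polygons with no data row
        "missing_shape": sorted(data - geo),  # data rows with no polygon
    }


result = unmatched_keys(["M4A", "M5A", "M5B"], ["M5A", "M5B", "K1A"])
print(result)
# → {'missing_data': ['M4A'], 'missing_shape': ['K1A']}
```

With the real data, the first argument would be the CFSAUID values from `t_geo['features']` and the second would be `pop['Geographic code']`. Because `pop` covers all of Ontario, a long `missing_shape` list is expected; it is `missing_data` that points to unshaded areas on the map.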

Here is the map. Thank you for reading this.
[Image: choropleth map]