The Battle of the Neighbourhoods - Applied Data Science Capstone Project
Table of contents
- 1. Introduction
- 2. Data Preparation
- 3. Methodology
- 4. Analysis
- 5. Discussion
- 6. Conclusion
- 7. References
1. Introduction
1.1 Background
The Chinese Canadian community in the Greater Toronto Area (GTA) was first established around 1877. According to Statistics Canada's 2016 Census of Population, there are 631,050 people of Chinese origin in the Greater Toronto Area, which makes it the second-largest Chinese community in North America after New York City. Food and restaurants have always been a vital part of the progress of immigrant communities, not only maintaining links to homeland culture but also allowing immigrants to share this culture with the host community.
1.2 Problem
It is well known that there are countless Chinese restaurants throughout Toronto. However, Szechuan cuisine, which is extremely popular all over China, is still a newcomer in Toronto. In this capstone project I will try to identify several areas suitable for a new Szechuan restaurant in the Greater Toronto Area, Ontario.
In the following sections I will use the knowledge and skills learned from the IBM Data Science course to locate promising neighbourhoods for a Szechuan restaurant.
1.3 Interest
Realtors, financial services companies, immigrant services agencies, and immigrant investors coming to Canada would be very interested in finding new hotspots for investment.
2. Data Preparation
2.1 Data Sources
The data sources used in this analysis are:
- List of postal codes of Canada: this Wikipedia page is used to obtain all the neighbourhoods in Toronto, including postal code and borough.
- Coordinates of neighbourhoods: this CSV file was downloaded from the Coursera IBM Data Science course page.
- Restaurant details, including category and location within Toronto neighbourhoods, obtained using the Foursquare API.
- Toronto GeoJSON file: City of Toronto Open Data provides the boundaries of City of Toronto neighbourhoods. I used this GeoJSON file to create a choropleth map of Toronto.
- Immigration and ethnocultural diversity statistics: City of Toronto Open Data and the 2016 Census of Population. These two resources are extremely helpful for this analysis.
- City center of Toronto: I use Toronto City Hall as the city center. Its geographical coordinates are obtained using GeoPy.
- Business Improvement Areas: A Business Improvement Area (BIA) is an association of commercial property owners and tenants within a defined area who work in partnership with the City to create thriving, competitive, and safe business areas that attract shoppers, diners, tourists, and new businesses. A GeoJSON file of the BIA boundaries is used in this analysis.
2.2 Data Cleaning
List of data obtained and cleaned:
- List of neighbourhoods in Toronto (postal codes starting with ‘M’)
- Geographical coordinates of neighbourhoods in Toronto
- Number of Chinese restaurants and Szechuan restaurants in Toronto (using the Foursquare API)
- Toronto GeoJSON file
- Coordinates of the city center of Toronto
- GeoJSON of the Chinatown area of Toronto
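The first two cleaning steps in the list can be sketched as follows. This is a minimal, dependency-free illustration and not the code used in the project: the column names (`PostalCode`, `Borough`, `Neighbourhood`, `Latitude`, `Longitude`), the function name and the sample rows with approximate coordinates are all assumptions made for the example.

```python
# Toy rows standing in for the scraped postal-code table (illustrative values only).
postal_rows = [
    {"PostalCode": "M5T", "Borough": "Downtown Toronto",
     "Neighbourhood": "Kensington Market, Chinatown, Grange Park"},
    {"PostalCode": "M2N", "Borough": "North York", "Neighbourhood": "Willowdale"},
    {"PostalCode": "K1A", "Borough": "Ottawa", "Neighbourhood": "Government of Canada"},
]

# Toy rows standing in for the coordinates CSV (approximate, illustrative values).
coord_rows = [
    {"PostalCode": "M5T", "Latitude": "43.6532", "Longitude": "-79.4000"},
    {"PostalCode": "M2N", "Latitude": "43.7701", "Longitude": "-79.4085"},
]


def clean_neighbourhoods(postal_rows, coord_rows):
    """Keep postal codes starting with 'M' and attach their coordinates."""
    # Index coordinates by postal code so each lookup is a dictionary access.
    coords = {
        row["PostalCode"].strip().upper(): (float(row["Latitude"]), float(row["Longitude"]))
        for row in coord_rows
    }
    cleaned = []
    for row in postal_rows:
        code = row["PostalCode"].strip().upper()
        if not code.startswith("M"):
            continue  # not a Toronto postal code
        if code not in coords:
            continue  # no coordinates available for this code
        lat, lon = coords[code]
        cleaned.append({
            "PostalCode": code,
            "Borough": row["Borough"],
            "Neighbourhood": row["Neighbourhood"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return cleaned


cleaned = clean_neighbourhoods(postal_rows, coord_rows)
for row in cleaned:
    print(row["PostalCode"], row["Borough"], row["Latitude"], row["Longitude"])
```

Rows that lack coordinates are dropped here for simplicity; whether to drop or flag them is a design choice that depends on how complete the coordinate file is.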
3. Methodology
In this section I will try to find patterns among existing Chinese and Szechuan restaurants in Toronto and spot potential venues for opening new Szechuan restaurants. Toronto, with 140 neighbourhoods, is the most populous city in Canada and the fourth most populous city in North America. However, according to the Toronto Neighbourhood Population Profiles, the estimated downtown population of Toronto in 2015 was between 242,845 and 245,830, compared to 2,956,024 for the City of Toronto as a whole. Opening a business inside the downtown area may therefore call for a different approach from opening one outside it. In addition, immigration and ethnocultural diversity in Toronto, for which this project focuses on the distribution of the East and Southeast Asian population, will affect the locations and numbers of our target restaurants. In the following sections we expect to see quite different strategies for local Chinese and Szechuan restaurants.
Now we have our data:
- Toronto GeoJSON file and Toronto Neighbourhood Profiles for a choropleth map based on the population density of East and Southeast Asians
- Number and locations of Chinese restaurants in Toronto
- Number and locations of Szechuan restaurants in Toronto
First, I will visualize the number and locations of Chinese restaurants in each neighbourhood, inside and outside the Toronto downtown area.
Then I will look for patterns in the locations of Chinese restaurants inside and outside the downtown area, using heatmaps.
Last, I will use density-based clustering to find patterns among Chinese and Szechuan restaurants and identify certain cluster groups as candidate neighbourhoods for opening Szechuan restaurants.
Most traditional clustering techniques, such as k-means, hierarchical and fuzzy clustering, can be used to group data without supervision. However, when applied to tasks with arbitrarily shaped clusters, or clusters within clusters, these techniques might not achieve good results: elements in the same cluster might not share enough similarity, or the performance may be poor. In contrast, density-based clustering locates regions of high density that are separated from one another by regions of low density. Density, in this context, is defined as the number of points within a specified radius. In this section, the main focus will be on manipulating the data and the parameters of DBSCAN and observing the resulting clustering.
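To make the density idea concrete, here is a minimal pure-Python sketch of DBSCAN over latitude/longitude pairs. It is an illustration under stated assumptions, not the implementation used in this project (which is not shown in the text): the haversine distance, the parameter names `eps_km` and `min_samples`, and the sample coordinates are all chosen for the example. As in common implementations, a point counts itself among its neighbours.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps_km of point i, including i itself.
        return [j for j in range(len(points))
                if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # not dense enough: noise for now
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # previously noise, now a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Illustrative coordinates: two tight groups and one isolated point.
points = [
    (43.6530, -79.3980), (43.6532, -79.3985), (43.6528, -79.3978),
    (43.7700, -79.4100), (43.7703, -79.4105), (43.7698, -79.4097),
    (43.7000, -79.3000),
]
labels = dbscan(points, eps_km=0.2, min_samples=3)
print(labels)
```

The neighbour search here is quadratic in the number of points, which is acceptable for a few hundred restaurants; a library implementation with a spatial index would be the usual choice for larger data.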
4. Analysis
4.1 Visualize the number of Chinese restaurants in Toronto (neighbourhoods)
Here we can create a bar chart to better analyze the data.
We can see from the bar chart that most Chinese restaurants are in Kensington Market, Chinatown and Grange Park. All of these are in the downtown area, which makes sense since it is the commercial and business center of the city.
4.2 Visualize the number of Chinese restaurants in Toronto (boroughs)
We can also create a bar chart to show the number of Chinese restaurants in each borough.
However, when we turn to the borough analysis, there is a different story. The high density of Chinese restaurants downtown makes sense, but why does the number of Chinese restaurants in the North York area stand out?
4.3 Create a choropleth map of Toronto
To find the reasons for the high number of Chinese restaurants in North York, let us create a choropleth map of Toronto and see whether there is a positive correlation between a high population density of East and Southeast Asians and the number of Chinese restaurants.
According to Toronto Open Data, the North York neighbourhoods consist of wards 6, 8, 15, 16, 17 and 18, which have a high density of residents of East and Southeast Asian origin. From the map we can conclude that there is a positive correlation between a high population density of East and Southeast Asians and the number of Chinese restaurants. This is our first conclusion.
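The conclusion above is drawn visually from the map. One way to put a number on such a relationship is a Pearson correlation coefficient between the population share and the restaurant count per area. The sketch below is only an illustration: the function name and both data series are made up for the example and are not figures from this analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical per-area values, for illustration only.
asian_population_share = [5.0, 12.0, 20.0, 35.0, 48.0]   # percent
chinese_restaurant_count = [1, 3, 4, 9, 12]

r = pearson_r(asian_population_share, chinese_restaurant_count)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to +1 would support the positive relationship read off the map, although correlation on its own does not show that one quantity drives the other.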
4.4 Create a Map of Szechuan Restaurants
It seems that most of the Szechuan restaurants tend to cluster with Chinese restaurants in downtown areas. No Szechuan restaurants are found in neighbourhoods outside downtown Toronto. Let us create a heatmap to visualize this better.
4.5 Create a heatmap for Chinese and Szechuan restaurants
4.6 Create a map for Toronto Business Improvement Area
This heatmap shows that most of the Szechuan restaurants are in Kensington Market and Chinatown. Let us create a BIA (Business Improvement Area) map to see whether it backs up this conclusion.
Disclaimer: The BIA layer represents the active BIAs in the City of Toronto that have been enacted by Council. Each BIA has been defined by a by-law and is represented by a Board of Management. The layer is updated as BIAs are created, amended or deleted by Council.
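Beyond drawing the BIA layer on a map, one can test programmatically whether a restaurant falls inside a BIA polygon. The following is a minimal ray-casting sketch for GeoJSON `Polygon` and `MultiPolygon` geometries, whose coordinates are ordered as [longitude, latitude]. The function names, the feature and its `name` property are hypothetical; the real BIA file may use different property names.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_polygon(lon, lat, polygon):
    """A GeoJSON polygon is an outer ring followed by optional holes."""
    outer, *holes = polygon
    if not point_in_ring(lon, lat, outer):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in holes)


def point_in_feature(lon, lat, feature):
    """True if the point lies inside a Polygon or MultiPolygon feature."""
    geometry = feature["geometry"]
    if geometry["type"] == "Polygon":
        return point_in_polygon(lon, lat, geometry["coordinates"])
    if geometry["type"] == "MultiPolygon":
        return any(point_in_polygon(lon, lat, poly)
                   for poly in geometry["coordinates"])
    return False  # other geometry types are ignored in this sketch


# Hypothetical rectangular BIA, for illustration only.
example_bia = {
    "type": "Feature",
    "properties": {"name": "Example BIA"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-79.41, 43.65], [-79.39, 43.65], [-79.39, 43.66],
            [-79.41, 43.66], [-79.41, 43.65],
        ]],
    },
}

print(point_in_feature(-79.40, 43.655, example_bia))  # point inside the rectangle
print(point_in_feature(-79.38, 43.655, example_bia))  # point east of the rectangle
```

Treating longitude and latitude as plane coordinates is a reasonable approximation at the scale of a city district; points lying exactly on a boundary are not handled specially.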
Now we can draw several conclusions from the maps above:
- Half of the Szechuan restaurants (6 out of 12) are in Chinatown. In this area, Szechuan restaurants tend to cluster with each other.
- Most of the Szechuan restaurants outside Chinatown are in neighbourhoods with no other Szechuan restaurants nearby. However, they are still within clusters of Chinese restaurants.
- There are several candidate locations for Szechuan restaurants that have no Szechuan restaurants nearby and are all within clusters of Chinese restaurants.
4.7 Find optimal locations for New Szechuan Restaurants using DBSCAN
Now let’s continue searching for optimal locations for Szechuan restaurants. In this section I will use density-based clustering to locate candidate clusters for opening Szechuan restaurants.
5. Discussion
Now we have our candidate clusters for opening new Szechuan restaurants. To find the optimal clusters, I set the following criteria for this analysis:
- the restaurant should be in a geographic cluster of Chinese restaurants;
- the restaurant should be in a neighbourhood with no Szechuan restaurants nearby;
- the restaurant should be located in a BIA area;
- the restaurant should be located in a neighbourhood with a high East and Southeast Asian population density.
Let us take a look at the clusters, from cluster label 0 to label 12. Outliers, which have label -1, are not included in the analysis.
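The four criteria can be written down as a simple checklist applied to a per-cluster summary. The sketch below is only an illustration: the `ClusterSummary` fields, the function name and the two sample clusters are assumptions, and it reports which criteria are met rather than reproducing the final ratings, which also involve judgement.

```python
from dataclasses import dataclass


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster summary used for the checklist."""
    label: int
    in_chinese_cluster: bool
    szechuan_count: int
    in_bia: bool
    asian_density: str  # "High", "Moderate" or "Low"


def criteria_met(cluster):
    """Map each of the four criteria to True or False for one cluster."""
    return {
        "in_chinese_restaurant_cluster": cluster.in_chinese_cluster,
        "no_szechuan_nearby": cluster.szechuan_count == 0,
        "in_bia": cluster.in_bia,
        "high_asian_density": cluster.asian_density == "High",
    }


# Illustrative clusters, not taken from the project's data files.
candidates = [
    ClusterSummary(label=100, in_chinese_cluster=True, szechuan_count=0,
                   in_bia=True, asian_density="High"),
    ClusterSummary(label=101, in_chinese_cluster=True, szechuan_count=6,
                   in_bia=True, asian_density="High"),
]

for cluster in candidates:
    checks = criteria_met(cluster)
    passed = sum(checks.values())
    failed = [name for name, ok in checks.items() if not ok]
    print(f"cluster {cluster.label}: {passed}/4 criteria met, failed: {failed}")
```

Keeping the individual checks visible, instead of collapsing them into one score, makes it easier to see why a cluster falls short.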
Cluster 0: Potential.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) In Downtown BIA area;
4) High population density of East and Southeast Asian origins.
Cluster 1: Potential.
1) In Chinese Restaurants Clusters;
2) Szechuan Restaurant in Cluster;
3) In Downtown BIA area;
4) High population density of East and Southeast Asian origins.
Cluster 2: Not Recommended.
1) Out of Chinese Restaurants Clusters (potential outlier);
2) No Szechuan Restaurant in Cluster;
3) In Downtown BIA area;
4) High population density of East and Southeast Asian origins.
Cluster 3: Not Recommended.
1) In Chinese Restaurants Clusters;
2) FOUR Szechuan Restaurants in Cluster;
3) In Downtown BIA area;
4) High population density of East and Southeast Asian origins.
Cluster 4: Fair.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) Out of Downtown BIA area;
4) Moderate population density of East and Southeast Asian origins (Near Chinatown).
Cluster 5: Potential.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) In Downtown BIA area;
4) Low population density of East and Southeast Asian origins (Near Chinatown).
Cluster 6: Potential.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) In Downtown BIA area;
4) Low population density of East and Southeast Asian origins (Near Chinatown).
Cluster 7: Not Recommended.
1) In Chinese Restaurants Clusters;
2) SIX Szechuan Restaurants in Cluster;
3) In Downtown BIA area;
4) High population density of East and Southeast Asian origins (Chinatown).
Cluster 8: Fair.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) Not in Downtown BIA area;
4) Moderate population density of East and Southeast Asian origins.
Cluster 9: Fair.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) Not in Downtown BIA area;
4) Moderate population density of East and Southeast Asian origins.
Cluster 10: Fair.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) Not in Downtown BIA area;
4) High population density of East and Southeast Asian origins.
Cluster 11: Fair.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) Not in Downtown BIA area;
4) High population density of East and Southeast Asian origins.
Cluster 12: Fair.
1) In Chinese Restaurants Clusters;
2) No Szechuan Restaurant in Cluster;
3) In Downtown BIA area;
4) Low population density of East and Southeast Asian origins.
6. Conclusion
Finally, we have spotted 4 clusters with high potential for opening a new business. They are marked with blue star labels.
This concludes our analysis. We have spotted 4 cluster areas with high potential for opening Szechuan restaurants, based on having Chinese restaurants nearby, no Szechuan restaurant in the cluster, a high population density of residents of East and Southeast Asian origin, and a location within the Downtown Toronto Business Improvement Area.
Please note that this analysis is only a starting point for finding and defining business patterns in the restaurant and food service industry. Besides location, many other factors should be taken into consideration, such as anticipated sales volume, accessibility to potential customers, rent, and security.
7. References
[1] Vieregge, M., Lin, J. J., Drakopoulos, R., & Bruggmann, C. (2009). Immigrants Perception of Ethnic Restaurants: The Case of Asian Immigrants Perception of Chinese Restaurants in Switzerland. Tourism Culture & Communication, 9(1), 49–64. doi: 10.3727/109830409787556684
[2] Finding Optimal Locations of New Stores, by IBM
[3] Generating GeoJSON File for Toronto FSAs, by Amy Gordon
[4] Housing Sales Prices & Venues Data Analysis of Istanbul, by Sercan Yıldız
[5] Visualizing Geospatial Data in Python Using Folium, by Aly Sivji
[6] Folium documentation 0.11.0