Opening a New African Restaurant in London
Coursera Capstone
IBM Applied Data Science Capstone
By: Robbin Dilles
June 2020
Introduction
For a city as diverse as London, it can be surprisingly difficult to find good African dining experiences. This project tries to find the best spots within London for opening a new restaurant serving African cuisine. The restaurant will mainly target African (Black) customers, so the search for a location focuses on the demographics of London.
The restaurant itself could be anything from Senegalese to Cameroonian, Nigerian, South African, Ghanaian etc., or of course a combination of them.
Business Problem
The objective of this capstone project is to analyse and select the best locations in the city of London to open a new African restaurant. Using data science methodology and machine learning techniques such as clustering, this project aims to answer the business question: if a chef or branch manager is looking to open a new African restaurant in the city of London, where would you recommend that they open it?
Target Audience of this project
This project is particularly useful to chefs, cooks, branch managers, entrepreneurs and investors looking to open or invest in new restaurants in London. It is timely, as the city currently has an undersupply of African restaurants.
Data
The following data will be used to solve the problem:
· A list of neighbourhoods in London.
· Data of demographics in London.
· Latitude and longitude coordinates of those neighbourhoods. Required to plot the map and get venue data.
· Venue data, particularly related to restaurants. We will use this data to perform clustering on the neighbourhoods.
Sources of data and methods to extract them
This Wikipedia page (https://en.wikipedia.org/wiki/List_of_areas_of_London) contains a list of neighbourhoods in London. We will use web scraping techniques to extract the data from the page, with the help of the Python requests and BeautifulSoup packages. Then we will get the latitude and longitude coordinates of the neighbourhoods using the Python Geocoder package. After that, we will use the Foursquare API to get the venue data for those neighbourhoods. Foursquare has one of the largest databases of places, with 105+ million entries, and is used by over 125,000 developers. The Foursquare API provides many categories of venue data; we are particularly interested in the restaurant categories, as they help us solve the business problem put forward.
This project makes use of many data science skills, from web scraping (Wikipedia), working with an API (Foursquare), data cleaning and data wrangling, to machine learning (k-means clustering) and map visualization (Folium). The next section, Methodology, discusses the steps taken in this project, the data analysis performed and the machine learning technique used.
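As a rough illustration of the Foursquare step, the sketch below builds a request URL for the legacy v2 "explore" endpoint and pulls the venue fields out of a response. It is a minimal sketch under stated assumptions, not the project's actual code: the helper names `build_explore_url` and `extract_venues`, the placeholder credentials, the version date and the hand-written sample payload are all hypothetical, and the radius and limit are the values given in the Methodology. No network call is made, and the endpoint may have changed since 2020.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 "explore" endpoint (assumed; check current docs).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20200601", radius=2000, limit=100):
    """Return the request URL for venues around one neighbourhood."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (placeholder value)
        "ll": f"{lat},{lng}",  # latitude,longitude of the neighbourhood
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def extract_venues(payload):
    """Collect (name, category, lat, lng) tuples from an explore response."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # Fall back to a placeholder when a venue has no category.
            categories = venue.get("categories") or [{"name": "Unknown"}]
            rows.append((
                venue["name"],
                categories[0]["name"],
                venue["location"]["lat"],
                venue["location"]["lng"],
            ))
    return rows


# Hand-written payload trimmed to the fields used above (not real data).
sample_payload = {
    "response": {
        "groups": [{
            "items": [{
                "venue": {
                    "name": "Sample Grill",
                    "categories": [{"name": "African Restaurant"}],
                    "location": {"lat": 51.5, "lng": -0.1},
                }
            }]
        }]
    }
}

# Approximate central London coordinates, used purely for illustration.
url = build_explore_url(51.5074, -0.1278, "YOUR_ID", "YOUR_SECRET")
venues = extract_venues(sample_payload)
print(url)
print(venues)
```

In the real workflow the URL would be fetched once per neighbourhood (for example with the requests package) and the parsed JSON passed to the extraction step.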
Methodology
First, we need the list of neighbourhoods in the city of London. Fortunately, the list is available on the Wikipedia page (https://en.wikipedia.org/wiki/List_of_areas_of_London). We will scrape it using the Python requests and BeautifulSoup packages. However, this is just a list of names. To use the Foursquare API we also need geographical coordinates, so we will use the Geocoder package to convert each address into latitude and longitude. After gathering the data, we will load it into a pandas DataFrame and visualize the neighbourhoods on a map using the Folium package. This serves as a sanity check that the coordinates returned by Geocoder are correctly plotted in the city of London.
Next, we will use the Foursquare API to get the top 100 venues within a radius of 2000 metres of each neighbourhood. This requires registering a Foursquare Developer Account to obtain the Foursquare ID and Foursquare secret key. We then make API calls to Foursquare, passing in the coordinates of the neighbourhoods in a Python loop. Foursquare returns the venue data in JSON format, from which we extract the venue name, venue category, venue latitude and longitude. With this data, we can check how many venues were returned for each neighbourhood and how many unique categories they contain.
Then, we will analyse each neighbourhood by grouping the rows by neighbourhood and taking the mean of the frequency of occurrence of each venue category. This also prepares the data for clustering. Since we are analysing the “Restaurant” data, we will filter on “Restaurant” as the venue category for the neighbourhoods.
Lastly, we will cluster the data using k-means. The k-means algorithm identifies k centroids and allocates every data point to the nearest one, while keeping the within-cluster variance (the spread of points around each centroid) as small as possible. It is one of the simplest and most popular unsupervised machine learning algorithms and is well suited to the problem in this project.
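The last two steps, per-neighbourhood category frequencies followed by k-means, can be sketched as follows. This is a dependency-free illustration, not the project's code: the project would more likely use pandas for the grouping and scikit-learn for the clustering, whereas here both are written out in plain Python so the mechanics are visible. The function names, the neighbourhood labels ("Area A" to "Area D") and the venue rows are all made up for the example.

```python
from collections import Counter, defaultdict
import random


def category_frequencies(venues):
    """Map each neighbourhood to the share of its venues in every category.

    `venues` is a list of (neighbourhood, category) pairs. The result is
    equivalent to one-hot encoding the category and averaging per
    neighbourhood.
    """
    counts = defaultdict(Counter)
    for neighbourhood, category in venues:
        counts[neighbourhood][category] += 1
    categories = sorted({category for _, category in venues})
    table = {}
    for neighbourhood, counter in counts.items():
        total = sum(counter.values())
        table[neighbourhood] = [counter[c] / total for c in categories]
    return categories, table


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Made-up venue rows: two pub-heavy areas and two restaurant-heavy areas.
venues = [
    ("Area A", "Pub"), ("Area A", "Pub"), ("Area A", "Pub"),
    ("Area A", "Restaurant"),
    ("Area B", "Pub"), ("Area B", "Pub"),
    ("Area C", "Restaurant"), ("Area C", "Restaurant"),
    ("Area C", "Restaurant"), ("Area C", "Pub"),
    ("Area D", "Restaurant"),
]

categories, table = category_frequencies(venues)
names = list(table)
labels, centroids = kmeans([table[name] for name in names], k=2)
cluster_of = dict(zip(names, labels))
print(categories)
print(cluster_of)
```

The cluster numbers themselves are arbitrary; what matters is which neighbourhoods end up sharing a label. On real data the value of k and the set of categories fed into the clustering would need to be chosen with more care.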
Results
Some important highlights of the 5 clusters:
- Drinking establishments like pubs, cafes and coffee shops are popular in the South East London area.
- Among restaurants, Italian restaurants look to be the most popular, especially in Southwark and Lambeth.
- Considering that Lewisham has the highest concentration of African residents in the South East area, it is surprising how rarely restaurants appear among its top 5 venues.
- In all clusters, it is easy to see a predominance of pubs.
Discussion and Conclusion
We find 2 clusters that look like the most viable for establishing an African restaurant. Their proximity to other amenities and to the station is of high importance. These 2 clusters do not have top restaurants that could rival a new restaurant if it were established. The proximity to much-needed resources is also favourable, as Lewisham and Lambeth are not far from Peckham.
Ultimately, this project would have had better results if it had been possible to analyse all neighbourhoods, to obtain more data on crime in the area, traffic access, and store and warehouse proximity, and to explore more venues with the Foursquare API.
Of course, ratings and feedback for the current establishments within the clusters would provide more insight as well.