This assignment is due at the end of the day on February 3, 2023.
We will work with data from the Toronto Police Service Public Safety Data Portal, specifically the data on bicycle thefts.
The primary question you will answer with these data is: When and where are bikes most likely to be reported stolen in Toronto?
Download the data from here.
Your assignment should be completed in R Markdown.
Given the formulated question from the assignment description, you will now conduct EDA Checklist items 2-4.
First, read in the data. Update the missing data identifiers to NA. Check for import issues (dimensions, headers, footers, variable names and variable types). Check for any data issues (import issues, missing values, data errors), particularly in the key variables we are analyzing. Make sure you write up a summary of all of your findings.
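As a starting point, the import checks above might look like the sketch below. It uses a small inline CSV in place of the downloaded file, and the column names and missing-data codes ("", "NSA", "Unknown") are assumptions made for illustration; inspect the real file to see which identifiers it actually uses.

```r
# Toy CSV standing in for the downloaded file. Column names and the
# missing-data codes below are hypothetical, not taken from the portal.
csv_text <- "event_id,occ_date,occ_hour,neighbourhood,bike_type,bike_colour,cost
1,2021-07-04,14,Annex,MT,BLK,500
2,2021-01-15,NSA,NSA,RG,Unknown,
3,2021-10-30,23,Riverdale,UN,RED,250"

# na.strings converts every listed identifier to NA during import
thefts <- read.csv(text = csv_text,
                   na.strings = c("", "NA", "NSA", "Unknown"),
                   stringsAsFactors = FALSE)

dim(thefts)             # dimensions: rows and columns
str(thefts)             # variable names and types
head(thefts)            # header rows
tail(thefts)            # footer rows
colSums(is.na(thefts))  # missing values per column
```

With the real data, replace `text = csv_text` with the path to the downloaded file.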
Clean the data – keep only necessary data columns and change the names of the key variables so that they are easier to identify. Change the type of key variables from string to factor as appropriate. Identify any outlier reports, and justify how you handle them.
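One possible shape for the cleaning step is sketched below in base R. The raw column names, the new names, and the 1.5 x IQR rule for flagging outliers are all assumptions chosen for illustration; you may well justify a different outlier rule for your own analysis.

```r
# Toy data with hypothetical raw column names
raw <- data.frame(
  OCC_DATE      = c("2021-07-04", "2021-01-15", "2021-10-30", "2021-07-20"),
  OCC_HOUR      = c(14, 3, 23, 9),
  NEIGHBOURHOOD = c("Annex", "Annex", "Riverdale", "Riverdale"),
  BIKE_TYPE     = c("MT", "RG", "UN", "MT"),
  BIKE_COST     = c(500, 650, 250, 120000),
  UNUSED        = 1:4,
  stringsAsFactors = FALSE
)

# Keep only the needed columns and give them shorter names (new = old)
keep <- c(date = "OCC_DATE", hour = "OCC_HOUR",
          neighbourhood = "NEIGHBOURHOOD",
          bike_type = "BIKE_TYPE", cost = "BIKE_COST")
clean <- raw[, keep]
names(clean) <- names(keep)

# Convert types: strings to Date and factor where appropriate
clean$date          <- as.Date(clean$date)
clean$neighbourhood <- factor(clean$neighbourhood)
clean$bike_type     <- factor(clean$bike_type)

# Flag (rather than silently drop) unusually high costs using 1.5 x IQR
q     <- quantile(clean$cost, c(0.25, 0.75), na.rm = TRUE)
upper <- unname(q[2] + 1.5 * diff(q))
clean$cost_outlier <- clean$cost > upper

clean[clean$cost_outlier, ]  # inspect flagged reports before deciding
```

Flagging first keeps the decision visible, so the write-up can explain why each flagged report was kept, corrected, or removed.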
Explore the main question of interest. Calculate summary statistics for each season (e.g. means, medians, variances) and then conduct some basic analyses that enable you to compare across winter/spring/summer/fall (e.g. t-test, linear regression). Be sure to show the results and write up explanations of what you observe in these data.
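The seasonal comparison could be sketched as follows. The data here are simulated daily theft counts, not the real data, and both the outcome (thefts per day) and the season definition (meteorological seasons by month) are assumptions; choose and justify your own.

```r
set.seed(1)

# Simulated daily counts standing in for counts derived from the real data
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
m     <- as.integer(format(dates, "%m"))

# Assumed season definition: Dec-Feb winter, Mar-May spring, and so on
season <- factor(
  ifelse(m %in% c(12, 1, 2), "winter",
  ifelse(m %in% 3:5,         "spring",
  ifelse(m %in% 6:8,         "summer", "fall"))),
  levels = c("winter", "spring", "summer", "fall")
)

daily <- data.frame(
  date     = dates,
  season   = season,
  n_thefts = rpois(length(dates), lambda = ifelse(season == "summer", 15, 5))
)

# Summary statistics for each season
season_stats <- aggregate(
  n_thefts ~ season, data = daily,
  FUN = function(x) c(mean = mean(x), median = median(x), var = var(x))
)
print(season_stats)

# Two-sample t-test comparing winter and summer
two <- droplevels(subset(daily, season %in% c("winter", "summer")))
print(t.test(n_thefts ~ season, data = two))

# Linear regression comparing all four seasons (winter is the reference)
fit <- lm(n_thefts ~ season, data = daily)
print(summary(fit))
```

Because these are counts, a Poisson model via `glm()` may be a better fit than `lm()`; the linear model is shown only because the assignment names it as an example.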
Create exploratory plots (boxplots, histograms, time series) showing which neighbourhoods bikes were reported stolen from and at what time of day.
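A minimal sketch of the three plot types, using base R graphics and simulated data, is given below. The column names and neighbourhood labels are hypothetical; ggplot2 would work equally well.

```r
set.seed(2)
n <- 200

# Simulated reports standing in for the cleaned data
thefts <- data.frame(
  date          = as.Date("2021-01-01") + sample(0:364, n, replace = TRUE),
  hour          = sample(0:23, n, replace = TRUE),
  neighbourhood = factor(sample(c("Annex", "Riverdale", "Kensington"),
                                n, replace = TRUE))
)

# Histogram: one bar per hour of the day
hist(thefts$hour, breaks = seq(-0.5, 23.5, by = 1),
     main = "Reported thefts by hour of day", xlab = "Hour (0-23)")

# Boxplot: distribution of theft hour within each neighbourhood
boxplot(hour ~ neighbourhood, data = thefts,
        main = "Theft hour by neighbourhood",
        xlab = "Neighbourhood", ylab = "Hour (0-23)")

# Time series: number of reports per month
monthly <- table(format(thefts$date, "%Y-%m"))
plot(as.integer(monthly), type = "b", xaxt = "n",
     main = "Reported thefts per month", xlab = "", ylab = "Reports")
axis(1, at = seq_along(monthly), labels = names(monthly), las = 2)
```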
Create a basic map in leaflet() that shows the locations of all stolen unicycles (and their colours).
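The map could be started along the lines of the sketch below, which assumes the leaflet package is installed. The coordinates are made-up points in Toronto, and the column names, the "UN" code for unicycles, and the colour codes are assumptions; check how the real data encode bike type and colour.

```r
library(leaflet)

# Toy data; every value here is hypothetical
thefts <- data.frame(
  bike_type   = c("UN", "MT", "UN"),
  bike_colour = c("RED", "BLK", "BLU"),
  lat         = c(43.653, 43.667, 43.700),
  long        = c(-79.383, -79.400, -79.420),
  stringsAsFactors = FALSE
)

# Keep only the unicycles (assuming "UN" is the unicycle code)
unicycles <- subset(thefts, bike_type == "UN")

# Translate the data's colour codes into colours leaflet can draw
css <- c(RED = "red", BLK = "black", BLU = "blue")
unicycles$marker_col <- unname(css[unicycles$bike_colour])

m <- leaflet(unicycles) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~long, lat = ~lat,
                   color = ~marker_col,   # marker drawn in the bike's colour
                   label = ~bike_colour,  # colour code shown on hover
                   radius = 6, fillOpacity = 0.8)
m
```

Colour codes missing from `css` come out as NA, so extend the lookup (or add a fallback colour) once you have seen the codes in the real data.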