Due Date

This assignment is due at the end of the day on February 3, 2023.

Learning Goals

Assignment Description

We will work with data from the Toronto Police Service Public Safety Data Portal, specifically the data on bicycle thefts.

The primary question you will answer with these data is: When and where are bikes most likely to be reported stolen in Toronto?

Download the data from here.

Your assignment should be completed in R Markdown.

Steps

Given the formulated question from the assignment description, you will now conduct EDA Checklist items 2-4.

  1. First, read in the data. Recode the missing-data identifiers as NA. Check for import issues (dimensions, headers, footers, variable names, and variable types). Check for data issues (missing values, data errors), particularly in the key variables we are analyzing. Write up a summary of all your findings.

  2. Clean the data – keep only necessary data columns and change the names of the key variables so that they are easier to identify. Change the type of key variables from string to factor as appropriate. Identify any outlier reports, and justify how you handle them.

  3. Explore the main question of interest. Calculate summary statistics for each season (e.g. means, medians, variances) and then conduct some basic analyses that enable you to compare across winter/spring/summer/fall (e.g. t-test, linear regression). Be sure to show the results and write up explanations of what you observe in these data.

  4. Create exploratory plots (boxplots, histograms, time series) showing the neighbourhoods and times of day at which bikes were reported stolen.

  5. Create a basic map with leaflet() that shows the locations of all stolen unicycles (and their colours).
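
As a rough illustration of steps 1 to 3, the sketch below reads a tiny made-up CSV, recodes assumed missing-data codes to NA, keeps and renames a few columns, and summarises reports by season. Every column name (OCC_DATE, OCC_HOUR, NEIGHBOURHOOD, BIKE_TYPE, BIKE_COLOUR, BIKE_COST), every missing-data code ("UNKNOWN", "NSA"), and every value is a hypothetical stand-in; check the real file before relying on any of them. The month-to-season mapping is also just one possible choice.

```r
# Toy stand-in for the downloaded CSV; all column names and values are hypothetical.
csv_text <- "OCC_DATE,OCC_HOUR,NEIGHBOURHOOD,BIKE_TYPE,BIKE_COLOUR,BIKE_COST
2022-01-15,8,Annex,MT,BLK,500
2022-04-02,17,Annex,RG,UNKNOWN,
2022-07-20,23,Waterfront,UN,RED,150
2022-07-21,14,Waterfront,MT,,900
2022-10-05,9,Annex,RG,BLU,300
2022-12-30,19,Waterfront,MT,NSA,650"

# Step 1: read the data, mapping the (assumed) missing-data codes to NA.
# With the real file, replace `text = csv_text` with the path to the download.
thefts <- read.csv(
  text = csv_text,
  na.strings = c("", "NA", "UNKNOWN", "NSA"),
  stringsAsFactors = FALSE
)

# Check for import issues: dimensions, variable names and types, missing values.
dim(thefts)
str(thefts)
colSums(is.na(thefts))

# Step 2: keep only the columns needed and give them shorter names.
keep <- c("OCC_DATE", "OCC_HOUR", "NEIGHBOURHOOD", "BIKE_TYPE", "BIKE_COLOUR")
thefts <- thefts[, keep]
names(thefts) <- c("date", "hour", "neighbourhood", "bike_type", "colour")

# Convert strings to more useful types.
thefts$date <- as.Date(thefts$date)
thefts$neighbourhood <- factor(thefts$neighbourhood)
thefts$bike_type <- factor(thefts$bike_type)

# Derive a season from the month (Dec-Feb = winter, and so on; an assumption).
season_of_month <- c("winter", "winter", "spring", "spring", "spring", "summer",
                     "summer", "summer", "fall", "fall", "fall", "winter")
month_num <- as.integer(format(thefts$date, "%m"))
thefts$season <- factor(season_of_month[month_num],
                        levels = c("winter", "spring", "summer", "fall"))

# Step 3: simple per-season summaries.
season_counts <- table(thefts$season)                 # number of reports per season
hour_by_season <- aggregate(hour ~ season, data = thefts, FUN = median)
season_counts
hour_by_season
```

With the real data, the same pattern extends to other summaries (means, variances) by changing `FUN`, and the per-season table can feed into whichever comparison you choose to justify in your write-up.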
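
Steps 4 and 5 might look something like the following sketch, which uses base R graphics for the exploratory plots and the leaflet package for the map. The data frame is a hypothetical cleaned dataset: the column names, the coordinates, and the "UN" code for unicycles are all assumptions made for illustration, not a description of the real file. The map is only built if leaflet is installed.

```r
# Toy cleaned data; all names, codes, coordinates, and values are hypothetical.
thefts <- data.frame(
  hour = c(8, 17, 23, 14, 9, 19),
  neighbourhood = factor(c("Annex", "Annex", "Waterfront",
                           "Waterfront", "Annex", "Waterfront")),
  bike_type = c("MT", "RG", "UN", "MT", "UN", "MT"),
  colour = c("BLK", NA, "RED", "BLU", "GRN", "BLK"),
  lat = c(43.670, 43.668, 43.640, 43.641, 43.672, 43.639),
  lon = c(-79.404, -79.406, -79.380, -79.378, -79.401, -79.382)
)

# Step 4: number of reports per neighbourhood, largest first.
hood_counts <- sort(table(thefts$neighbourhood), decreasing = TRUE)
barplot(hood_counts, las = 2, ylab = "Reported thefts")

# Distribution of reports over the hour of day (one bin per hour, 0 to 23).
hist(thefts$hour,
     breaks = seq(0, 24, by = 1), right = FALSE,
     xlab = "Hour of day", main = "Reported thefts by hour")

# Step 5: keep only unicycles ("UN" is an assumed code) and map them.
unicycles <- subset(thefts, bike_type == "UN")

if (requireNamespace("leaflet", quietly = TRUE)) {
  theft_map <- leaflet::leaflet(data = unicycles)
  theft_map <- leaflet::addTiles(theft_map)            # default OpenStreetMap tiles
  theft_map <- leaflet::addCircleMarkers(
    theft_map,
    lng = ~lon, lat = ~lat,
    popup = ~colour                                    # click a marker to see the colour
  )
  # In an R Markdown chunk, evaluating `theft_map` renders the interactive map.
}
```

Showing the colour in a popup is only one option; colouring the markers themselves would need a mapping from the dataset's colour codes to actual colours, which depends on how those codes appear in the real data.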