JSC370: Data Science II
Winter 2026 · University of Toronto

Where and When
- Instructor: Meredith Franklin
- Email: meredith.franklin@utoronto.ca, please put “JSC370” in the subject line.
- Teaching Assistants: Johnny Meng and Kevin Yang
- Time: Mondays (Lecture) and Wednesdays (Lab), 1-3pm
- Location: MS 3278 (Mondays), HS 108 (Wednesdays)
- Office hours: TBD
- Course Forum: Piazza
Course Description
This course serves as the second in a series of courses on data science. We will focus on the acquisition and analysis of real-life data. Students will learn the toolsets needed to 1) create workable and reproducible data by accessing, scraping, sampling and cleaning data; 2) conduct exploratory data analysis and data visualizations; 3) apply statistical and machine learning tools to learn from data; 4) conduct computing on remote systems. Python, VS Code, Quarto, and GitHub will be used.
Weekly Course Schedule
| Week | Dates | Topics / Weekly Activities | Due (end of day): Labs Wed, HW Sun |
|---|---|---|---|
| Week 1 | January 5 (lecture), January 7 (lab); not in person this week only! | Introduction to Data Science tools: Python, Quarto, VS Code | Lab 1 |
| Week 2 | January 12 (lecture), January 14 (lab) | Version Control & Reproducible Research, Git/GitHub | Lab 2 |
| Week 3 | January 19 (lecture), January 21 (lab) | Exploratory Data Analysis & Data Viz 1 | Lab 3 |
| Week 4 | January 26 (lecture), January 28 (lab) | Data Viz 2 & ML 1 | HW1, Lab 4 |
| Week 5 | February 2 (lecture), February 4 (lab) | Regular expressions; data scraping; using APIs | Lab 5 |
| Week 6 | February 9 (lecture), February 11 (lab) | Text mining | HW2, Lab 6 |
| Week 7 | February 16 | Reading Week | |
| Week 8 | February 23 (lecture), February 25 (lab) | ML 2 (trees, random forests, boosting) | HW3, Lab 8 |
| Week 9 | March 2 (lecture), March 4 (lab) | ML 3 (model evaluation and interpretation) | Lab 9 |
| Week 10 | March 9 (lecture), March 11 (lab) | Parallel computing, high performance computing | Midterm, Lab 10 |
| Week 11 | March 16 (lecture), March 18 (lab) | Parallel computing, high performance computing | Lab 11 |
| Week 12 | March 23 (lecture), March 25 (lab) | Interactive visualization & effective data communication | HW4, Lab 12 |
| Week 13 | March 30 (lecture), April 1 (drop-in lab) | Building Website with Interactive Apps | HW5 due with Final project |
| Week 15 | April 26 | Final Project | |
Grading Breakdown
| Task | % of Grade |
|---|---|
| Labs (including attendance and guest speaker reflections) | 15 |
| Homework (5) | 25 |
| Midterm report | 25 |
| Final project | 35 |
Homework: There will be five homework assignments over the semester. Students may discuss the problems with one another; however, each student must submit an individual solution, and copying will not be tolerated. All homework must be completed in Quarto (.qmd) using Python code chunks and submitted through the course GitHub Classroom. Late assignments will be penalized 10% per day past the due date.
Midterm Project: A mid-semester report describing the dataset you will use for the final project. The report presents exploratory data analysis, visualizations, and summaries of the data.
Final Project: Apply the concepts learned in the course to analyze a dataset that you have chosen. Create and deploy a GitHub website with interactive components.
Labs: Lab attendance and participation are required and count toward the overall lab grade. Each lab assignment is handed in at the end of the lab (or by the end of the lab day if more time is needed). The lowest lab grade will be dropped when calculating your final grade.
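As an informal illustration of how the weights and policies above combine, here is a minimal Python sketch. It is not an official grade calculator. The function names are hypothetical, and it assumes that the late penalty removes 10% of the earned score per day (floored at zero), that all homework assignments are weighted equally, and that every score is on a 0-100 scale.

```python
# Weights taken from the grading table (percent of the final grade).
WEIGHTS = {"labs": 15, "homework": 25, "midterm": 25, "final": 35}


def apply_late_penalty(score, days_late, percent_per_day=10):
    """ASSUMPTION: lose 10% of the earned score per day late, floored at zero."""
    remaining_percent = max(0, 100 - percent_per_day * days_late)
    return score * remaining_percent / 100


def lab_average(lab_scores):
    """Average lab score after dropping the single lowest lab."""
    scores = sorted(lab_scores)
    kept = scores[1:] if len(scores) > 1 else scores
    return sum(kept) / len(kept)


def course_grade(lab_scores, homework_scores, midterm, final):
    """Weighted total out of 100. ASSUMPTION: homework is weighted equally."""
    components = {
        "labs": lab_average(lab_scores),
        "homework": sum(homework_scores) / len(homework_scores),
        "midterm": midterm,
        "final": final,
    }
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


# Example with made-up scores: the lab score of 50 is dropped.
print(apply_late_penalty(80, days_late=2))                    # 64.0
print(course_grade([100, 50, 100], [80] * 5, midterm=90, final=70))  # 82.0
```

The instructor's actual calculation may differ, for example in how lab attendance and guest speaker reflections enter the lab grade.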
Readings (Not Required)
- Python for Data Analysis (3e), 2023. Wes McKinney.
- Python Data Science Handbook, Jake VanderPlas.
- The Turing Way: Reproducible, ethical, and collaborative data science.
- Pro Git (online book), Scott Chacon and Ben Straub.
- An Introduction to Statistical Learning, 2023. James, Witten, Hastie, Tibshirani, and Taylor.
Resources
Helpers and Templates
Guides
Tools
- Visual Studio Code
- Python
- Core libraries:
- Data access & wrangling:
- Publishing / reproducibility:
Data
Many of these websites provide APIs and/or bulk downloads.
Canadian Data
- Government of Canada Open Data (Open Government Portal)
- Statistics Canada (Census + surveys)
- Statistics Canada Web Data Service (API)
- Canada GIS Data
- University of Toronto Library (Geospatial data guides)
- City of Toronto Open Data
- Toronto Police Service Open Data
- Ontario Data Catalogue
- Public Health Ontario Open Data
- British Columbia Data Catalogue
Environmental and Climate Data
- US EPA Air Quality Data
- US EPA AQS Data API (Air Quality System)
- NOAA National Centers for Environmental Information (NCEI)
- NOAA NCEI Access Data Service (API)
- North American Regional Climate Change Assessment Program (NARCCAP)
- Natural Resources Canada (Geospatial data/tools)
- Coastal wave / buoy data (CDIP, UCSD)
- Great Lakes Bathymetry (NOAA)
- US Energy Information Administration (EIA)
Social Networks and Platforms