Lab 12 - Interactive Visualization

Download lab `.qmd` here

Learning Goals

Read in and process Starbucks data.
Create interactive visualizations of different types using plotly.express and plotly.graph_objects.
Customize hoverinfo and other plot features.
Create a Choropleth map.
Build a simple interactive dashboard with Dash.

Lab Description

We will work with two Starbucks datasets: one of global store locations and one with nutritional data for their food and drink items. US population and state-abbreviation tables are also used for the maps. We will also do some text analysis of the menu item names.

Deliverables

Upload an HTML file to Quercus and make sure the figures remain interactive. Set `embed-resources: true` (under `format: html`) in your YAML header so the plotting JavaScript is bundled into the single HTML file.

Steps

0. Install and load libraries

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from collections import Counter
import re

1. Read in the data

There are 4 datasets to read in: Starbucks locations, Starbucks nutrition, US population by state, and US state abbreviations. All of them are available in the lab folder.

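# Starbucks store locations (worldwide)
sb_locs = pd.read_csv("starbucks-locations.csv")

# Menu nutrition facts; strip stray whitespace from the column names
sb_nutr = pd.read_csv("starbucks-menu-nutrition.csv")
sb_nutr.columns = sb_nutr.columns.str.strip()

# US population by state
usa_pop = pd.read_csv("us_state_pop.csv")

# US state names and their abbreviations
usa_states = pd.read_csv("states.csv")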

2. Look at the data

Inspect each dataset to look at variable names and ensure it was imported correctly.

# YOUR CODE HERE
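A minimal sketch of the usual first checks, shown on a small invented DataFrame so it runs without the lab files (the column names are illustrative, not the real headers). In the lab, call the same methods on `sb_locs`, `sb_nutr`, `usa_pop`, and `usa_states`.

```python
import pandas as pd

# Invented stand-in for one of the lab datasets; note the stray space in a
# header, the kind of problem that sb_nutr.columns.str.strip() cleans up.
df = pd.DataFrame({
    "Item": ["Latte", "Scone", "Bagel"],
    " Calories": [190, 420, None],
})
df.columns = df.columns.str.strip()  # remove leading/trailing whitespace

print(df.shape)          # (rows, columns)
print(list(df.columns))  # variable names after cleaning
print(df.dtypes)         # numeric columns should not be 'object'
print(df.head())         # first few rows
print(df.isna().sum())   # missing values per column
```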

3. Format and merge the data

Subset Starbucks location data to the US.
Create counts of Starbucks stores by state.
Merge population and state abbreviations with the store count by state.
Inspect the range of values for each variable.

# Subset to US
sb_usa = sb_locs[sb_locs["Country"] == ___]

# Count stores by state
sb_locs_state = (
    sb_usa
    .groupby(___)
    .size()
    .reset_index(name="n_stores")
)

# Merge population with state abbreviations
usa_pop_abbr = usa_pop.merge(___, left_on=___, right_on=___, how="left")

# Merge store counts with population data
sb_locs_state = sb_locs_state.merge(___, left_on=___, right_on=___, how="left")
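The four steps can be sketched end to end on invented toy tables. The column names `state`, `State`, `Abbreviation`, and the two-letter codes in `State/Province` are assumptions for illustration only; check the real headers from step 2 before filling in the blanks. A `how="left"` merge keeps every store-count row, so a state that fails to match shows up as `NaN`, which is what the last check looks for.

```python
import pandas as pd

# Hypothetical toy versions of the inputs; the real column names may differ.
locs = pd.DataFrame({
    "Country": ["US", "US", "US", "CA"],
    "State/Province": ["WA", "WA", "OR", "BC"],
})
pop = pd.DataFrame({"state": ["Washington", "Oregon"],
                    "population": [7_700_000, 4_200_000]})
states = pd.DataFrame({"State": ["Washington", "Oregon"],
                       "Abbreviation": ["WA", "OR"]})

# 1. Keep US stores only
us = locs[locs["Country"] == "US"]

# 2. One row per state with its number of stores (groupby sorts the keys)
counts = us.groupby("State/Province").size().reset_index(name="n_stores")

# 3. Attach abbreviations to the population table (full name -> code)
pop_abbr = pop.merge(states, left_on="state", right_on="State", how="left")

# 4. Attach population to the store counts via the two-letter code
counts = counts.merge(pop_abbr, left_on="State/Province",
                      right_on="Abbreviation", how="left")

# 5. Check value ranges and look for unmatched rows (NaN after a left merge)
print(counts.describe())
print(counts[counts["population"].isna()])
```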

4. Use `plotly.express` for EDA

Answer the following questions:

4a) Is the number of Starbucks stores in a state proportional to its population? (scatterplot) Add hover data for state, population, and number of stores, and label the points with State/Province.

# YOUR CODE HERE

4a) Answer:

4b) Is the calorie distribution of Starbucks menu items different for drinks and food? (histogram) Add barmode="overlay" and choose an appropriate number of bins with nbins.

# YOUR CODE HERE

4b) Answer:

4c) What are the 20 most common words in the names of Starbucks food items? Make a horizontal bar plot sorted by count using orientation="h".

# Words of two letters or fewer are already dropped by len(w) > 2
stop_words = {"the", "and", "with", "a", "an", "of", "in", "or", "no", "on"}

# Word counts across ALL menu items (food and drinks); reused in section 5
word_counts = Counter(
    w for name in sb_nutr["Item"].dropna()
    for w in re.findall(r"[a-zA-Z]+", name.lower())
    if len(w) > 2 and w not in stop_words
)

top20_words = [w for w, _ in word_counts.most_common(20)]

# Top 20 words for Food items only
food_items = sb_nutr.loc[sb_nutr["Category"] == "Food", "Item"].dropna()
food_counts = Counter(
    w for name in food_items
    for w in re.findall(r"[a-zA-Z]+", name.lower())
    if len(w) > 2 and w not in stop_words
)
top20_food = pd.DataFrame(food_counts.most_common(20), columns=["word", "count"])

# YOUR CODE HERE

4c) Answer:

4d) Create a scatterplot of calories against carbohydrates. Color the points by category (Food or Beverage) and add trendline="ols" to fit a line for each category (this requires the statsmodels package). Is there a relationship, and do food items or beverages tend to have more calories?

# YOUR CODE HERE
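With `color=` set, `trendline="ols"` fits a separate ordinary least-squares line, Calories ≈ a + b · carbs, for each category. As a hedged sketch of what those lines represent, the same per-group slope and intercept can be computed with NumPy's `polyfit` (degree 1). The data are invented so the answer is known exactly, and `Drink`/`Food` are placeholder labels.

```python
import numpy as np
import pandas as pd

# Invented data: calories = 4 * carbs, plus 100 for the "Food" group
df = pd.DataFrame({
    "Carb. (g)": [10, 20, 30, 10, 20, 30],
    "Calories": [40, 80, 120, 140, 180, 220],
    "Category": ["Drink"] * 3 + ["Food"] * 3,
})

fits = {}
for cat, grp in df.groupby("Category"):
    # A degree-1 polyfit is an OLS line; it returns [slope, intercept]
    slope, intercept = np.polyfit(grp["Carb. (g)"], grp["Calories"], deg=1)
    fits[cat] = (slope, intercept)
    print(f"{cat}: Calories ~ {intercept:.1f} + {slope:.2f} * carbs")

# Which category tends to have more calories? Compare group summaries.
print(df.groupby("Category")["Calories"].agg(["mean", "median"]))
```

In the lab itself, `px.get_trendline_results(fig)` returns the fitted statsmodels results behind each trendline, so the slopes can be read off directly.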

4d) Answer:

5. `plotly` Boxplots

Create a boxplot of calorie content grouped by the top 10 item words.
Which top word is associated with the most calories?

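# Find top 10 words and tag each item
top10_words = [w for w, _ in word_counts.most_common(10)]

def find_top_word(item_name):
    # Match whole words only: a plain substring test would tag "steak" as "tea".
    # Words are checked in frequency order, so each item gets the most common
    # top word it contains.
    tokens = set(re.findall(r"[a-zA-Z]+", item_name.lower()))
    for w in top10_words:
        if w in tokens:
            return w
    return None

sb_nutr["top_word"] = sb_nutr["Item"].apply(find_top_word)
sb_top = sb_nutr.dropna(subset=["top_word"]).copy()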

# YOUR CODE HERE

5) Answer:

6. `plotly` 3D Scatterplot

Create a 3D scatterplot of Calories, Carbs, and Protein for items containing the top 10 words.
Do you see any patterns (clusters or trends)?

fig = px.scatter_3d(
    sb_top,
    x="Calories",
    y="Carb. (g)",
    z="Protein (g)",
    color="Category",
    hover_data=["Item", "top_word"],
    title="3D Scatter: Calories, Carbs, and Protein for Top-Word Items",
    template="plotly_white",
    opacity=0.7,
)
fig.update_layout(
    scene=dict(
        xaxis_title="Calories",
        yaxis_title="Carbohydrates (g)",
        zaxis_title="Protein (g)",
    )
)
fig.show()

6) Answer:

7. Choropleth Map

Create two choropleth maps: one for the number of stores per state and one for population by state. Add custom hover text and display the maps separately.
Describe any differences.

# Create hover text
sb_locs_state["hover"] = (
    "State: " + sb_locs_state["State"].astype(str) + "<br>" +
    "Stores: " + sb_locs_state["n_stores"].astype(str) + "<br>" +
    "Population: " + sb_locs_state["population"].astype(str)
)

# Map 1: Stores per state
fig1 = go.Figure(go.Choropleth(
    locations=___,
    z=___,
    locationmode=___,
    text=___,
    hoverinfo=___,
    colorscale=___,
    colorbar_title=___,
))
fig1.update_layout(
    title=___,
    geo=dict(scope="usa", projection_type="albers usa", showlakes=True),
    height=450,
)
fig1.show()

# Map 2: Population per state
fig2 = go.Figure(go.Choropleth(
    locations=___,
    z=___,
    locationmode=___,
    text=___,
    hoverinfo=___,
    colorscale=___,
    colorbar_title=___,
))
fig2.update_layout(
    title=___,
    geo=dict(scope="usa", projection_type="albers usa", showlakes=True),
    height=450,
)
fig2.show()

7) Answer:

8. Dash Dashboard

Build a simple interactive Dash dashboard for the Starbucks data. The code below provides a starting structure; save it as a standalone Python script (starbucks_dash.py) rather than running it inside the .qmd.

The dashboard should include:

- A dropdown to select a nutritional variable (Calories, Fat, Carbs, Protein, etc.)
- A bar chart showing the mean value of that variable for the top 10 menu item words
- A scatter plot of Calories vs. the selected variable, colored by Food/Beverage category

Fill in each `...` placeholder (marked # YOUR CODE HERE).

# Save this as starbucks_dash.py and run with: python3 starbucks_dash.py

from dash import Dash, dcc, html, Input, Output
import plotly.express as px
import pandas as pd
from collections import Counter
import re

# Data preparation
sb_nutr = pd.read_csv("starbucks-menu-nutrition.csv")
sb_nutr.columns = sb_nutr.columns.str.strip()

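# Tokenize item names and find top 10 words (same filtering as section 5)
stop_words = {"the", "and", "with", "a", "an", "of", "in", "or", "no", "on"}
words = Counter(
    w for name in sb_nutr["Item"].dropna()
    for w in re.findall(r"[a-zA-Z]+", name.lower())
    if len(w) > 2 and w not in stop_words
)
top10 = [w for w, _ in words.most_common(10)]

def find_word(item_name):
    # Whole-word match, checked in frequency order (avoids "tea" matching "steak")
    tokens = set(re.findall(r"[a-zA-Z]+", item_name.lower()))
    for w in top10:
        if w in tokens:
            return w
    return None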

sb_nutr["top_word"] = sb_nutr["Item"].apply(find_word)
sb_top = sb_nutr.dropna(subset=["top_word"]).copy()

nutr_vars = ["Calories", "Fat (g)", "Carb. (g)", "Fiber (g)", "Protein (g)"]

# App layout
app = Dash(__name__)

app.layout = html.Div([
    html.H1("Starbucks Menu Nutrition Explorer",
            style={"textAlign": "center", "fontFamily": "Arial"}),

    html.Div([
        html.Label("Select nutritional variable:", style={"fontWeight": "bold"}),
        dcc.Dropdown(
            id="nutr-dd",
            options=[{"label": v, "value": v} for v in nutr_vars],
            value="Calories",
            clearable=False,
            style={"width": "300px"}
        ),
    ], style={"padding": "20px 40px", "fontFamily": "Arial"}),

    html.Div([
        dcc.Graph(id="bar-chart", style={"width": "50%", "display": "inline-block"}),
        dcc.Graph(id="scatter-chart", style={"width": "50%", "display": "inline-block"}),
    ]),
], style={"maxWidth": "1200px", "margin": "auto"})

# Callbacks
@app.callback(
    Output("bar-chart", "figure"),
    Output("scatter-chart", "figure"),
    Input("nutr-dd", "value"),
)
def update(selected_var):
    bar_df = (
        sb_top.groupby("top_word")[selected_var]
        .mean()
        .reset_index()
        .sort_values(..., ascending=False)  # YOUR CODE HERE
    )
    bar_fig = px.bar(
        bar_df, x=..., y=...,  # YOUR CODE HERE
        title=f"Mean {selected_var} by Top Menu Word",
        labels={"top_word": "Menu Word"},
        template="plotly_white",
        color="top_word",
        color_discrete_sequence=px.colors.qualitative.Set2,
    )
    bar_fig.update_layout(height=420, showlegend=False)

    scatter_fig = px.scatter(
        sb_top, x=..., y=...,  # YOUR CODE HERE
        color="Category",
        hover_data=["Item", "top_word"],
        title=f"Calories vs. {selected_var}",
        template="plotly_white",
        opacity=0.7,
    )
    scatter_fig.update_layout(height=420, hovermode="closest")

    return ..., ...  # YOUR CODE HERE


if __name__ == "__main__":
    app.run(debug=True)

To run: Copy the code above into a file called starbucks_dash.py in the same folder as your data, then run python3 starbucks_dash.py in your terminal. Open http://127.0.0.1:8050 in your browser.


Download data: starbucks-locations.csv, starbucks-menu-nutrition.csv, us_state_pop.csv, states.csv