Pandas Practice¶

In this lesson, we will practice working with DataFrames.

In [1]:
import pandas as pd
import doctest

Our Dataset!¶

Run the cell below to see the DataFrame you'll be working with today. It is a dataset about the competing bakers at the Great Seattle Bake Off.

In [2]:
great_seattle_bakeoff = pd.DataFrame({
    "BakerID": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111],
    "Name": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani", "Vatsal", "Laura", "Suh Young"],
    "DessertBaked": ["red velvet cake", "basque cheesecake", "baumkuchen", "pound cake", "german chocolate cake", "victoria sponge cake", "birthday cake", "carrot cake", "matcha cake", "tres leches", "fruit cake"],
    "FlavorScore": [60, 92, 78, 40, 38, 73, 50, 59, 75, 99, 0],
    "PresentationScore": [95, 80, 88, 92, 98, 100, 98, 60, 77, 100, 100],
    "CityOfOrigin": ["Paris", "New York City", "Florence", "Seattle", "Seattle", "Paris", "New York City", "Seattle", "Paris", "New York City", "Florence"],
    "StartedBaking": [2006, 2007, 2010, 2020, 2020, 2015, 2019, 2013, 2018, 2018, 1325]
})

great_seattle_bakeoff
Out[2]:
BakerID Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking
0 101 Arona red velvet cake 60 95 Paris 2006
1 102 Hannah basque cheesecake 92 80 New York City 2007
2 103 Renusree baumkuchen 78 88 Florence 2010
3 104 Arpan pound cake 40 92 Seattle 2020
4 105 Mia german chocolate cake 38 98 Seattle 2020
5 106 Asmi victoria sponge cake 73 100 Paris 2015
6 107 Alyssa birthday cake 50 98 New York City 2019
7 108 Vani carrot cake 59 60 Seattle 2013
8 109 Vatsal matcha cake 75 77 Paris 2018
9 110 Laura tres leches 99 100 New York City 2018
10 111 Suh Young fruit cake 0 100 Florence 1325

Individual Activity¶

Display all the column names in the cell below.

In [5]:
great_seattle_bakeoff.columns
Out[5]:
BakerID                          101
Name                           Arona
DessertBaked         red velvet cake
FlavorScore                       60
PresentationScore                 95
CityOfOrigin                   Paris
StartedBaking                   2006
Name: 0, dtype: object

Then, select the bakers who recieved more than 70 points on their cake's flavor.

In [7]:
great_seattle_bakeoff.loc[great_seattle_bakeoff['FlavorScore'] > 70]
Out[7]:
BakerID Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking
1 102 Hannah basque cheesecake 92 80 New York City 2007
2 103 Renusree baumkuchen 78 88 Florence 2010
5 106 Asmi victoria sponge cake 73 100 Paris 2015
8 109 Vatsal matcha cake 75 77 Paris 2018
9 110 Laura tres leches 99 100 New York City 2018

Group Activity¶

Add a new column called CreativityScore with the values [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]

In [ ]:
great_seattle_bakeoff['CreativityScore'] = [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]
great_seattle_bakeoff

Now, calculate the mean of the FlavorScore, PresentationScore, and CreativityScore column, and store that value under a new column, TotalScore.

In [ ]:
great_seattle_bakeoff['TotalScore'] = great_seattle_bakeoff[["FlavorScore", 
                                                             "PresentationScore", 
                                                             "CreativityScore"]].mean(axis=1)
great_seattle_bakeoff

Next, change the index of the DataFrame to the BakerID and find what Baker 105's CityOfOrigin is. What about their StartedBaking year?

In [ ]:
great_seattle_bakeoff = great_seattle_bakeoff.set_index("BakerID")
great_seattle_bakeoff
In [ ]:
# Who is Baker 105?
great_seattle_bakeoff.loc[[105]][["CityOfOrigin", "StartedBaking"]]

Whole Class Activity¶

Write a function advancing_bakers that advances bakers into "Advanced" if their TotalScore is 85 or above, and "Eliminated" otherwise. Apply this function to create a new column CompetitionStatus.

In [3]:
def advancing_bakers(data):
    """
    Returns the bakers who will advance to the next round of the Great Seattle Bake Off.
    Parameters:
    - DataFrame: data of a round of the Great Seattle Bake Off
    
    >>> advancing_bakers(great_seattle_bakeoff).loc[101]["CompetitionStatus"]
    'Advanced'
    """
    advanced = data["TotalScore"] >= 85
    eliminated = data["TotalScore"] < 85
    data.loc[advanced, "CompetitionStatus"] = "Advanced"
    data.loc[eliminated, "CompetitionStatus"] = "Eliminated"
    return data

doctest.run_docstring_examples(advancing_bakers, globals())