Pandas Practice¶
In this lesson, we will practice working with DataFrames.
import pandas as pd
import doctest
Our Dataset!¶
Run the cell below to see the DataFrame you'll be working with today. It is a dataset about the competing bakers at the Great Seattle Bake Off.
great_seattle_bakeoff = pd.DataFrame({
"BakerID": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111],
"Name": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani", "Vatsal", "Laura", "Suh Young"],
"DessertBaked": ["red velvet cake", "basque cheesecake", "baumkuchen", "pound cake", "german chocolate cake", "victoria sponge cake", "birthday cake", "carrot cake", "matcha cake", "tres leches", "fruit cake"],
"FlavorScore": [60, 92, 78, 40, 38, 73, 50, 59, 75, 99, 0],
"PresentationScore": [95, 80, 88, 92, 98, 100, 98, 60, 77, 100, 100],
"CityOfOrigin": ["Paris", "New York City", "Florence", "Seattle", "Seattle", "Paris", "New York City", "Seattle", "Paris", "New York City", "Florence"],
"StartedBaking": [2006, 2007, 2010, 2020, 2020, 2015, 2019, 2013, 2018, 2018, 1325]
})
great_seattle_bakeoff
BakerID | Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | |
---|---|---|---|---|---|---|---|
0 | 101 | Arona | red velvet cake | 60 | 95 | Paris | 2006 |
1 | 102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 |
2 | 103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 |
3 | 104 | Arpan | pound cake | 40 | 92 | Seattle | 2020 |
4 | 105 | Mia | german chocolate cake | 38 | 98 | Seattle | 2020 |
5 | 106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 |
6 | 107 | Alyssa | birthday cake | 50 | 98 | New York City | 2019 |
7 | 108 | Vani | carrot cake | 59 | 60 | Seattle | 2013 |
8 | 109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 |
9 | 110 | Laura | tres leches | 99 | 100 | New York City | 2018 |
10 | 111 | Suh Young | fruit cake | 0 | 100 | Florence | 1325 |
Individual Activity¶
Display all the column names in the cell below.
great_seattle_bakeoff.columns
BakerID 101 Name Arona DessertBaked red velvet cake FlavorScore 60 PresentationScore 95 CityOfOrigin Paris StartedBaking 2006 Name: 0, dtype: object
Then, select the bakers who recieved more than 70 points on their cake's flavor.
great_seattle_bakeoff.loc[great_seattle_bakeoff['FlavorScore'] > 70]
BakerID | Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | |
---|---|---|---|---|---|---|---|
1 | 102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 |
2 | 103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 |
5 | 106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 |
8 | 109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 |
9 | 110 | Laura | tres leches | 99 | 100 | New York City | 2018 |
Group Activity¶
Add a new column called CreativityScore
with the values [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]
great_seattle_bakeoff['CreativityScore'] = [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]
great_seattle_bakeoff
Now, calculate the mean of the FlavorScore
, PresentationScore
, and CreativityScore
column, and store that value under a new column, TotalScore
.
great_seattle_bakeoff['TotalScore'] = great_seattle_bakeoff[["FlavorScore",
"PresentationScore",
"CreativityScore"]].mean(axis=1)
great_seattle_bakeoff
Next, change the index of the DataFrame to the BakerID
and find what Baker 105's CityOfOrigin
is. What about their StartedBaking
year?
great_seattle_bakeoff = great_seattle_bakeoff.set_index("BakerID")
great_seattle_bakeoff
# Who is Baker 105?
great_seattle_bakeoff.loc[[105]][["CityOfOrigin", "StartedBaking"]]
Whole Class Activity¶
Write a function advancing_bakers
that advances bakers into "Advanced" if their TotalScore
is 85 or above, and "Eliminated" otherwise. Apply this function to create a new column CompetitionStatus
.
def advancing_bakers(data):
"""
Returns the bakers who will advance to the next round of the Great Seattle Bake Off.
Parameters:
- DataFrame: data of a round of the Great Seattle Bake Off
>>> advancing_bakers(great_seattle_bakeoff).loc[101]["CompetitionStatus"]
'Advanced'
"""
advanced = data["TotalScore"] >= 85
eliminated = data["TotalScore"] < 85
data.loc[advanced, "CompetitionStatus"] = "Advanced"
data.loc[eliminated, "CompetitionStatus"] = "Eliminated"
return data
doctest.run_docstring_examples(advancing_bakers, globals())