I am just trying to get some data and re-arrange it. Here is my dataset showing foods and the scores they received in different years. What I want to do is find the foods which had the lowest and highest scores on average and track their scores across the years.
Here is my code so far:
#import
menu = pd.read_csv("Food.csv")
#Get average rank
avg_ranking = menu.groupby(['Food']).Score.mean().round()
#Convert to DataFrame
data2 = pd.DataFrame(avg_ranking.reset_index(name = "Group"))
#Display the highest and lowest ranking
filter_food = data2[['Food', 'Group']][(data2['Group'] == avg_ranking.max()) | (data2['Group'] == avg_ranking.min())]
#This is where I have been playing around with getting the individual min and max food names:
filter_food_max = data2[['Food']][(data2['Group'] == data2.Group.max())]
Max = filter_food_max.Food
filter_food_min = data2[['Food']][(data2['Group'] == data2.Group.min())]
Min = filter_food_min.Food
The next part is where I am a little stuck (please note - I do want to practice using the group-by, so it'll be super awesome if that is kept somewhere): I'd need to display the max and min foods from the original dataset that would show all the columns - Food, year, Score. This is what I have tried, but it doesn't work:
menu[menu.Food == Max & menu.Food == Min]
Basically I want it to display something like the below in a dataframe, so I can plot some graphs (i.e. I want to then make a line plot which would display the years on the x-axis, scores on the y-axis and plot the lowest scoring food and the top scoring food:
If you guys know any other ways of doing this (using the group-by function), please let me know!
Any help would be appreciated
CodePudding user response:
You can select first and last rows per year
by Series.duplicated
with invert mask and chain by |
for bitwise OR
, filter in boolean indexing
:
df1 = df[~df['year'].duplicated() | ~df['year'].duplicated(keep='last')]
Solution with groupby:
df1 = df.groupby('year').agg(['first','last']).stack(1).droplevel(1).reset_index()
If need minimal and maximal per year
s:
df = df.sort_values(['year','food'])
df2 = df[~df['year'].duplicated() | ~df['year'].duplicated(keep='last')]
Solution with groupby:
df2 = df.loc[df.groupby('year')['Score'].agg(['idxmax','idxmin']).stack()]