I'm trying to compute the median of one DataFrame column, grouped by the values in another column.
I have a DataFrame with 'zipcode' and 'price' columns. I want to find the median 'price' for each 'zipcode' and report it in a new column. When I run the code below as is, the new column holds the median of the whole dataset on every row, but I want each row to show the median for its own zip code. What am I missing?
'''
import pandas as pd

d = {'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507],
     'price': [15000, 14000, 13000, 78000, 3000, 4000, 500]}
df = pd.DataFrame(data=d)
medians = df.groupby(['zipcode','price'])['price'].transform('median')
df['median'] = df['price'].median()
df
'''
CodePudding user response:
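You should group by zip code only. Grouping by both 'zipcode' and 'price' puts each distinct price in its own group, so the median of each group is just that price, and df['price'].median() returns a single number for the whole column, which is why every row gets the same value. Use transform so that the per-zip-code median is broadcast back to every row: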
df['median_cal'] = df.groupby('zipcode')['price'].transform('median')
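
For reference, here is a minimal, self-contained sketch of that fix applied to the sample data from the question. The name `median_by_zip` is my own illustrative choice and is not part of the original post. It also shows the plain `.median()` aggregation, which may be handier if you only need one row per zip code rather than a new column.

```python
import pandas as pd

d = {'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507],
     'price': [15000, 14000, 13000, 78000, 3000, 4000, 500]}
df = pd.DataFrame(data=d)

# transform('median') computes the median within each zipcode group and
# returns a Series aligned with df's index, so it can be assigned as a column.
df['median_cal'] = df.groupby('zipcode')['price'].transform('median')

# Alternative (hypothetical name): one median per zip code, indexed by zipcode.
median_by_zip = df.groupby('zipcode')['price'].median()

print(df)
print(median_by_zip)
```

With this data, the rows for 99516 should get 14500 (the midpoint of 14000 and 15000) and the rows for 89507 should get 3000.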