Find the median of a column based on criteria within dataframe and insert this as a new column

Time:11-06

I'm trying to look in a dataframe, and find the median of data within a column based on another column.

I have a dataframe with 'zipcode' data and 'price' data. I want to find the median of 'price' for each 'zipcode' and report it in a new column. When I run the program as is, I get a column that reports the median of the whole dataset, but I want the new column to report the median price of each row's zip code. What is the piece I am missing?

'''

import pandas as pd

d = {'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507],
     'price': [15000, 14000, 13000, 78000, 3000, 4000, 500]}
df = pd.DataFrame(data=d)

medians = df.groupby(['zipcode','price'])['price'].transform('median')

df['median'] = df['price'].median()
df 

'''

CodePudding user response:

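You should group by 'zipcode' only. Grouping by both 'zipcode' and 'price' puts each distinct price in its own group, so the median of each group is just that price. Also, df['price'].median() returns a single number for the whole column, which is why every row shows the same value. Assign the result of transform instead: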

df['median_cal'] = df.groupby('zipcode')['price'].transform('median')
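
For reference, here is a minimal runnable sketch that applies this to the sample data from the question. The 'overall_median' column is not part of the original post; it is added here only to contrast the whole-column median with the per-zipcode one.

```python
import pandas as pd

# Sample data copied from the question
d = {'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507],
     'price': [15000, 14000, 13000, 78000, 3000, 4000, 500]}
df = pd.DataFrame(data=d)

# What the question's code did: one scalar broadcast to every row
# ('overall_median' is a hypothetical column name used for illustration)
df['overall_median'] = df['price'].median()

# Per-group median: transform returns a Series aligned with df's index,
# so each row receives the median price of its own zipcode
df['median_cal'] = df.groupby('zipcode')['price'].transform('median')

print(df)
```

With this data, the rows for 99516 should get 14500 (the midpoint of 14000 and 15000) and the rows for 89507 should get 3000, while 'overall_median' is 13000 on every row.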