pandas: get mean and median of derived value?

Time:10-21

Sorry, this probably isn't a very good title, but I'm not sure how else to explain it.

I have a dataframe of houses in different towns:

import pandas as pd

# One row per house
data = [
  ['Oxford', 2016, True],
  ['Oxford', 2016, True],
  ['Oxford', 2018, False],
  ['Cambridge', 2016, False],
  ['Cambridge', 2016, True],
  ['Brighton', 2019, True],
]
df = pd.DataFrame(data, columns=['town', 'year_built', 'is_detached'])

I want to get the mean and median number of houses per town.

How can I do this?

I know how to get the mean (hackily):

len(df) / len(df.town.value_counts())

But I don't know how to get the median.

CodePudding user response:

Use value_counts to get the number of houses per town, then agg with 'mean' and 'median':

df['town'].value_counts().agg(['mean', 'median'])

Output (on pandas 2.0 and later, the last line should read Name: count rather than Name: town):

mean      2.0
median    2.0
Name: town, dtype: float64
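
If it helps, here is a minimal sketch of an equivalent route using groupby, which makes the intermediate "houses per town" counts explicit before aggregating. The variable names (houses_per_town, stats, mean_houses, median_houses) are only illustrative and not from the original post:

```python
import pandas as pd

# Same sample data as in the question: one row per house
data = [
    ['Oxford', 2016, True],
    ['Oxford', 2016, True],
    ['Oxford', 2018, False],
    ['Cambridge', 2016, False],
    ['Cambridge', 2016, True],
    ['Brighton', 2019, True],
]
df = pd.DataFrame(data, columns=['town', 'year_built', 'is_detached'])

# Count rows (houses) in each town; same counts as df['town'].value_counts()
houses_per_town = df.groupby('town').size()

# Aggregate the per-town counts
stats = houses_per_town.agg(['mean', 'median'])
mean_houses = stats['mean']
median_houses = stats['median']

print(houses_per_town)
print(stats)
```

With this sample the counts are 3, 2 and 1, so both the mean and the median come out as 2.0. On a real dataset the two will usually differ, which is where having the median matters.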