Given the following table:
quarter town flat_type median_rent
2011 CLEMENTI 4-RM 2250
2011 CLEMENTI 4-RM 2300
2011 CLEMENTI 4-RM 2300
2011 CLEMENTI 4-RM 2400
The median average rent for 4-RM in 2011 is (2250 2300 2300 2400)/4 = 2312.5.
My current code:
test = df.groupby(["town","quarter"], as_index=True)
print(test['median_rent'].mean())
output:
town quarter
CLEMENTI 2011 5.625575e 14
2012 5.875638e 14
2013 6.500675e 14
2014 6.875675e 14
2015 6.625650e 14
2016 6.500650e 14
2017 6.250613e 14
2018 5.750575e 14
2019 7.667467e 10
2020 4.800470e 18
2021 6.000625e 14
Name: median_rent, dtype: float64
This is not what is expeced. What am I doing wrong?
CodePudding user response:
Judging by the sample code that you have written, you actually need to group by flat_type and quarter, not town and quarter. Just try
test = df.groupby(["flat_type", "quarter"], as_index=True)
and see whether it is what you were expecting.
CodePudding user response:
the groupby object should be used either to split the groupBy key and grouped df, or to apply actions directly to
example of applying mean directly to the groupby
import pandas as pd
df = pd.read_csv('tmp.csv')
print(df)
test = df.groupby(["town","quarter"], as_index=True)['median_rent'].mean()
print(test)
tmp.csv
quarter,town,flat_type,median_rent
2011,CLEMENTI,4-RM,2250
2011,CLEMENTI,4-RM,2300
2011,CLEMENTI,4-RM,2300
2011,CLEMENTI,4-RM,2400
2012,CLEMENT,4-RM,2250
2012,CLEMENT,4-RM,2300
2011,CLEMENT,4-RM,2300
2011,CLEMENT,4-RM,2400
output
quarter town flat_type median_rent
0 2011 CLEMENTI 4-RM 2250
1 2011 CLEMENTI 4-RM 2300
2 2011 CLEMENTI 4-RM 2300
3 2011 CLEMENTI 4-RM 2400
4 2012 CLEMENT 4-RM 2250
5 2012 CLEMENT 4-RM 2300
6 2011 CLEMENT 4-RM 2300
7 2011 CLEMENT 4-RM 2400
town quarter
CLEMENT 2011 2350.0
2012 2275.0
CLEMENTI 2011 2312.5
Name: median_rent, dtype: float64