I have a dataframe of houses, one row per house, that looks like this:
data = [
['Oxford', 2016, True],
['Oxford', 2016, True],
['Oxford', 2018, False],
['Cambridge', 2016, False],
['Cambridge', 2016, True]
]
df = pd.DataFrame(data, columns=['town', 'year', 'is_detached'])
town year is_detached
0 Oxford 2016 True
1 Oxford 2016 True
2 Oxford 2018 False
3 Cambridge 2016 False
4 Cambridge 2016 True
And I want to end up with a table that looks like this:
town total_houses_2016 total_houses_2018 is_detached_2016 is_detached_2018
0 Oxford 2 1 2 0
1 Cambridge 2 0 1 0
Currently I'm doing two separate groupby calls, and then joining them together:
by_town_totals = df.groupby([df.town, df.year])\
.size()\
.reset_index()\
.pivot(index=["town"], columns="year", values=0).fillna(0)\
.add_prefix('total_houses_')
by_town_detached = df.groupby([df.town, df.year])\
.is_detached.sum().reset_index()\
.pivot(index=["town"], columns="year", values="is_detached").fillna(0)\
.add_prefix('is_detached_')
by_town = pd.concat([by_town_totals, by_town_detached], axis=1).reset_index()
Is there a way I could do this with a single groupby?
CodePudding user response:
data = [
['Oxford', 2016, True],
['Oxford', 2016, True],
['Oxford', 2018, False],
['Cambridge', 2016, False],
['Cambridge', 2016, True]
]
df = pd.DataFrame(data, columns=['town', 'year', 'is_detached'])
data = df.groupby(['town', 'year']).agg({'is_detached': 'sum', 'year': 'count'}).rename(columns={'year': 'total_houses'}).reset_index().pivot(index='town', columns='year', values=['is_detached', 'total_houses']).fillna(0)
data.columns = ['_'.join(map(str, col)).strip() for col in data.columns.values]
data.reset_index()[['town','total_houses_2016','total_houses_2018','is_detached_2016','is_detached_2018']]
CodePudding user response:
df.year = df.year.astype(str)
df = df.pivot_table(index='town',
columns='year',
values='is_detached',
aggfunc=['size', 'sum'],
fill_value=0)
df.columns = (df.columns.to_flat_index()
.str.join('_')
.str.replace('size','total_houses')
.str.replace('sum', 'is_detached'))
print(df.reset_index())
Output:
town total_houses_2016 total_houses_2018 is_detached_2016 is_detached_2018
0 Cambridge 2 0 1 0
1 Oxford 2 1 2 0
CodePudding user response:
Check Below code:
new_df = pd.crosstab(df['town'], df['year'], df['is_detached'], aggfunc=['size','sum']).reset_index().fillna(0)
new_df.columns = ['is_detached_' str(i[1]) if i[0]=='sum' else 'total_houses_' str(i[1]) if i[0]=='size' else i[0] for i in new_df.columns]
new_df
Output: