Pandas groupby two columns, one by row and another by column-CodePudding

I have a csv file that contains n rows of sales of houses.

House	House_type	Sale_year
One	Semi	2010
two	Flat	2011
three	bungalow	2012
four	Semi	2013
five	Semi	2013

I want to groupby the data by House_type (flat, bungalow, semi) by sale_year (2010,2011,etc) counts as columns. So I'm trying to output the data in the below format.

House_type	2010	2011	2012	2013
Semi	1	0	0	2
Flat	0	1	0	0
bungalow	0	0	1	0

However, when I run the code, it returns both House_type and Sale_year as two columns.

house= housedata.groupby(["House_type", "Sale_year"])["Sale_year"].count()
house

House_type               Sale_year
Flat                     2011.0          1
bungalow                 2012.0          1
Semi                     2010.0          1
                         2013.0          2

How do I get pandas to output the data desired?

Many thanks

CodePudding user response：

You can achieve the same using get_dummies method of pandas. It basically creates multiple columns for a categorical column and fills it with values.

df = pd.DataFrame({'House_type':['Semi','Flat','Bungalow','Semi','Semi'],'sale_year':[2010,2011,2012,2013,2013]})
df_final = pd.get_dummies(df,columns=['sale_year']).groupby('House_type').sum()
df_final

CodePudding user response：

You can use pivot_table here:

result = pd.pivot_table(df, index='House_type', columns='Sale_year',
                        aggfunc='count', fill_value=0)

It gives directly:

           House               
Sale_year   2010 2011 2012 2013
House_type                     
Flat           0    1    0    0
Semi           1    0    0    2
bungalow       0    0    1    0

You can format it a little if you want:

result.columns = result.columns.get_level_values(1).rename(None)
result = result.reset_index()

to finally get:

  House_type  2010  2011  2012  2013
0       Flat     0     1     0     0
1       Semi     1     0     0     2
2   bungalow     0     0     1     0