pandas pivot table on multiple columns-CodePudding

INPUT TABLE

pcd	INCOME	Education	age1to_20	TG
a1001	INCOME_1	Education_1	1	1
a1003	INCOME_2	Education_2	0	2
a1001	INCOME_3	Education_2	5	2
a1002	INCOME_2	Education_2	1	5
a1003	INCOME_1	Education_2	3	4

REQUIRED OUTPUT

pcd	INCOME_1	INCOME_2	INCOME_3	Education_1	Education_2	age1to_20	TG
a1001	1	0	1	1	1	6	1.5
a1002	0	1	0	0	1	1	5
a1003	1	1	0	0	2	3	3

pcd is index and income1,income2,income3,education1,education2,age are aggregated to sum while TG is aggregated to average.

pd.pivot_table(df, index=['pcd', 'age1to_20'],
           aggfunc={'INCOME':sum,'Education'=sum,'age1to_20'=sum,'TG':avg},fill_value=0)

Tried above code but finding no success

CodePudding user response：

You can first melt, then pivot_table to reshape, and finally groupby.agg to combine the 'pcd':

agg_funcs = {'TG': 'mean', 'pcd': 'first'}

out = (df
   .melt(['pcd', 'age1to_20', 'TG'])
   .assign(v=1)
   .pivot_table(index=['pcd', 'age1to_20', 'TG'], columns='value',
                values='v', fill_value=0)
   .reset_index().rename_axis(columns=None)
   .pipe(lambda d: d.groupby('pcd', as_index=False)
                    .agg({c: agg_funcs.get(c, 'sum') for c in d.columns}))
)

Output:

     pcd  age1to_20   TG  Education_1  Education_2  INCOME_1  INCOME_2  INCOME_3
0  a1001          6  1.5            1            1         1         0         1
1  a1002          1  5.0            0            1         0         1         0
2  a1003          3  3.0            0            2         1         1         0

CodePudding user response：

As an alternative, you can use crosstab and groupby:

x=pd.crosstab(df['pcd'],columns=df['INCOME'])
print(x)
'''
INCOME  INCOME_1  INCOME_2  INCOME_3
pcd                                 
a1001          1         0         1
a1002          0         1         0
a1003          1         1         0
'''

y=pd.crosstab([df['pcd']],columns=[df['Education']])
z=df.groupby('pcd').agg({'age1to_20':'sum','TG':'mean'})
final=x.join([y,z])
print(final)
'''
       INCOME_1  INCOME_2  INCOME_3  Education_1  Education_2  age1to_20   TG
pcd                                                                          
a1001         1         0         1            1            1          6  1.5
a1002         0         1         0            0            1          1  5.0
a1003         1         1         0            0            2          3  3.0
'''

CodePudding user response：

You also need to specify where the values are going to come from and which columns will specify your new column names.

Also, it's not very clear to me where your values are coming from.

But it would look like something like this, e.g.,

pd.pivot_table(df, index=['pcd', 'age1to_20'], 
           values=['age1to_20']
           columns=['income', 'education']
           aggfunc={'INCOME':sum,'TG':avg},fill_value=0)