Home > Mobile >  pandas pivot table on multiple columns
pandas pivot table on multiple columns

Time:11-16

INPUT TABLE

pcd INCOME Education age1to_20 TG
a1001 INCOME_1 Education_1 1 1
a1003 INCOME_2 Education_2 0 2
a1001 INCOME_3 Education_2 5 2
a1002 INCOME_2 Education_2 1 5
a1003 INCOME_1 Education_2 3 4

REQUIRED OUTPUT

pcd INCOME_1 INCOME_2 INCOME_3 Education_1 Education_2 age1to_20 TG
a1001 1 0 1 1 1 6 1.5
a1002 0 1 0 0 1 1 5
a1003 1 1 0 0 2 3 3

pcd is index and income1,income2,income3,education1,education2,age are aggregated to sum while TG is aggregated to average.

pd.pivot_table(df, index=['pcd', 'age1to_20'],
           aggfunc={'INCOME':sum,'Education'=sum,'age1to_20'=sum,'TG':avg},fill_value=0)

Tried above code but finding no success

CodePudding user response:

You can first melt, then pivot_table to reshape, and finally groupby.agg to combine the 'pcd':

agg_funcs = {'TG': 'mean', 'pcd': 'first'}

out = (df
   .melt(['pcd', 'age1to_20', 'TG'])
   .assign(v=1)
   .pivot_table(index=['pcd', 'age1to_20', 'TG'], columns='value',
                values='v', fill_value=0)
   .reset_index().rename_axis(columns=None)
   .pipe(lambda d: d.groupby('pcd', as_index=False)
                    .agg({c: agg_funcs.get(c, 'sum') for c in d.columns}))
)

Output:

     pcd  age1to_20   TG  Education_1  Education_2  INCOME_1  INCOME_2  INCOME_3
0  a1001          6  1.5            1            1         1         0         1
1  a1002          1  5.0            0            1         0         1         0
2  a1003          3  3.0            0            2         1         1         0

CodePudding user response:

As an alternative, you can use crosstab and groupby:

x=pd.crosstab(df['pcd'],columns=df['INCOME'])
print(x)
'''
INCOME  INCOME_1  INCOME_2  INCOME_3
pcd                                 
a1001          1         0         1
a1002          0         1         0
a1003          1         1         0
'''

y=pd.crosstab([df['pcd']],columns=[df['Education']])
z=df.groupby('pcd').agg({'age1to_20':'sum','TG':'mean'})
final=x.join([y,z])
print(final)
'''
       INCOME_1  INCOME_2  INCOME_3  Education_1  Education_2  age1to_20   TG
pcd                                                                          
a1001         1         0         1            1            1          6  1.5
a1002         0         1         0            0            1          1  5.0
a1003         1         1         0            0            2          3  3.0
'''

CodePudding user response:

You also need to specify where the values are going to come from and which columns will specify your new column names.

Also, it's not very clear to me where your values are coming from.

But it would look like something like this, e.g.,

pd.pivot_table(df, index=['pcd', 'age1to_20'], 
           values=['age1to_20']
           columns=['income', 'education']
           aggfunc={'INCOME':sum,'TG':avg},fill_value=0)
  • Related