INPUT TABLE
pcd | INCOME | Education | age1to_20 | TG |
---|---|---|---|---|
a1001 | INCOME_1 | Education_1 | 1 | 1 |
a1003 | INCOME_2 | Education_2 | 0 | 2 |
a1001 | INCOME_3 | Education_2 | 5 | 2 |
a1002 | INCOME_2 | Education_2 | 1 | 5 |
a1003 | INCOME_1 | Education_2 | 3 | 4 |
REQUIRED OUTPUT
pcd | INCOME_1 | INCOME_2 | INCOME_3 | Education_1 | Education_2 | age1to_20 | TG |
---|---|---|---|---|---|---|---|
a1001 | 1 | 0 | 1 | 1 | 1 | 6 | 1.5 |
a1002 | 0 | 1 | 0 | 0 | 1 | 1 | 5 |
a1003 | 1 | 1 | 0 | 0 | 2 | 3 | 3 |
pcd is index and income1,income2,income3,education1,education2,age are aggregated to sum while TG is aggregated to average.
pd.pivot_table(df, index=['pcd', 'age1to_20'],
aggfunc={'INCOME':sum,'Education'=sum,'age1to_20'=sum,'TG':avg},fill_value=0)
Tried above code but finding no success
CodePudding user response:
You can first melt
, then pivot_table
to reshape, and finally groupby.agg
to combine the 'pcd':
agg_funcs = {'TG': 'mean', 'pcd': 'first'}
out = (df
.melt(['pcd', 'age1to_20', 'TG'])
.assign(v=1)
.pivot_table(index=['pcd', 'age1to_20', 'TG'], columns='value',
values='v', fill_value=0)
.reset_index().rename_axis(columns=None)
.pipe(lambda d: d.groupby('pcd', as_index=False)
.agg({c: agg_funcs.get(c, 'sum') for c in d.columns}))
)
Output:
pcd age1to_20 TG Education_1 Education_2 INCOME_1 INCOME_2 INCOME_3
0 a1001 6 1.5 1 1 1 0 1
1 a1002 1 5.0 0 1 0 1 0
2 a1003 3 3.0 0 2 1 1 0
CodePudding user response:
As an alternative, you can use crosstab and groupby:
x=pd.crosstab(df['pcd'],columns=df['INCOME'])
print(x)
'''
INCOME INCOME_1 INCOME_2 INCOME_3
pcd
a1001 1 0 1
a1002 0 1 0
a1003 1 1 0
'''
y=pd.crosstab([df['pcd']],columns=[df['Education']])
z=df.groupby('pcd').agg({'age1to_20':'sum','TG':'mean'})
final=x.join([y,z])
print(final)
'''
INCOME_1 INCOME_2 INCOME_3 Education_1 Education_2 age1to_20 TG
pcd
a1001 1 0 1 1 1 6 1.5
a1002 0 1 0 0 1 1 5.0
a1003 1 1 0 0 2 3 3.0
'''
CodePudding user response:
You also need to specify where the values are going to come from and which columns will specify your new column names.
Also, it's not very clear to me where your values are coming from.
But it would look like something like this, e.g.,
pd.pivot_table(df, index=['pcd', 'age1to_20'],
values=['age1to_20']
columns=['income', 'education']
aggfunc={'INCOME':sum,'TG':avg},fill_value=0)