I have the following toy dataframe example:
import pandas as pd
df = pd.DataFrame({'id': [0, 0, 0], 'key':['role', 'role', 'role'], 'val': ['admin', 'local_usr', 'fin_dep_ds']})
As you may see the same key has multiple values.
When I pivot the table, the agg
function I am using is ",".join()
but the final goal is to apply one hot encoding on these values:
pd.DataFrame(pd.pivot_table(df, \
values='val', \
index='id', \
columns='key', \
aggfunc=','.join).to_records())
Final goal:
id admin local_usr fin_dep_ds
0 1 1 1
Please advise how can I do it? Any best practice to tackle this situation?
CodePudding user response:
IIUC, .pivot_table()
with aggfunc="size"
produces your result:
x = df.pivot_table(index="id", columns="val", aggfunc="size").reset_index()
x.columns.name = None
print(x)
Prints:
id admin fin_dep_ds local_usr
0 0 1 1 1
CodePudding user response:
Another option is to use pandas.crosstab
res = (
pd.crosstab(index=df.id, columns=df.val)
.reset_index()
.rename_axis(columns=None)
)
Output:
>>> res
id admin fin_dep_ds local_usr
0 0 1 1 1
Setup:
import pandas as pd
df = pd.DataFrame({
'id': [0, 0, 0],
'key':['role', 'role', 'role'],
'val': ['admin', 'local_usr', 'fin_dep_ds']
})