Home > Back-end >  Dataframe same key with multiple values and One Hot Encoding (Python, Pandas)?
Dataframe same key with multiple values and One Hot Encoding (Python, Pandas)?

Time:04-17

I have the following toy dataframe example:

import pandas as pd
df = pd.DataFrame({'id': [0, 0, 0], 'key':['role', 'role', 'role'], 'val': ['admin', 'local_usr', 'fin_dep_ds']})

As you may see the same key has multiple values. When I pivot the table, the agg function I am using is ",".join() but the final goal is to apply one hot encoding on these values:

pd.DataFrame(pd.pivot_table(df, \
                            values='val', \
                            index='id', \
                            columns='key', \
                            aggfunc=','.join).to_records())

Final goal:

id admin local_usr fin_dep_ds
0  1     1         1

Please advise how can I do it? Any best practice to tackle this situation?

CodePudding user response:

IIUC, .pivot_table() with aggfunc="size" produces your result:

x = df.pivot_table(index="id", columns="val", aggfunc="size").reset_index()
x.columns.name = None
print(x)

Prints:

   id  admin  fin_dep_ds  local_usr
0   0      1           1          1

CodePudding user response:

Another option is to use pandas.crosstab

res = (
    pd.crosstab(index=df.id, columns=df.val)
      .reset_index()
      .rename_axis(columns=None)
)

Output:

>>> res

   id  admin  fin_dep_ds  local_usr
0   0      1           1          1

Setup:

import pandas as pd

df = pd.DataFrame({
    'id': [0, 0, 0], 
    'key':['role', 'role', 'role'], 
    'val': ['admin', 'local_usr', 'fin_dep_ds']
})
  • Related