Make dummy variables for categorical data, based on an ID column with duplicate values, in Python

Time:03-23

I have the following pandas dataframe:

    ID    value
0   1     A
1   1     B
2   1     C
3   2     B
4   10    C
5   4     C
6   4     A

I want to make dummy variables for the values in the column 'value', with one row for each distinct value in the column 'ID'. The result should look like this:

    ID    A    B    C
0   1     1    1    1
1   2     0    1    0
2   10    0    0    1
3   4     1    0    1

How can I do this in Python?

CodePudding user response:

Use pd.crosstab to count each ID/value pair, then cap the counts at 1 with DataFrame.clip:

import pandas as pd

# Count each (ID, value) pair, then cap the counts at 1 to get 0/1 dummies
df1 = (pd.crosstab(df['ID'], df['value'])
         .clip(upper=1)
         .reset_index()
         .rename_axis(None, axis=1))
print(df1)
   ID  A  B  C
0   1  1  1  1
1   2  0  1  0
2   4  1  0  1
3  10  0  0  1
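
Note that crosstab sorts the IDs, so the rows above come out as 1, 2, 4, 10 rather than in the order shown in the question (1, 2, 10, 4). If the original order matters, one possible approach is to reindex the crosstab result by the IDs in order of first appearance. The sketch below rebuilds the sample frame from the question so it runs on its own; the names dummies and result are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 10, 4, 4],
    "value": ["A", "B", "C", "B", "C", "C", "A"],
})

# Count each (ID, value) pair, then cap the counts at 1 to get 0/1 indicators.
dummies = pd.crosstab(df["ID"], df["value"]).clip(upper=1)

# crosstab sorts the IDs; reindex to restore the order of first appearance.
first_seen = pd.Index(df["ID"].unique(), name="ID")
dummies = dummies.reindex(first_seen)

# Turn ID back into a regular column and drop the 'value' label on the columns.
result = dummies.reset_index().rename_axis(None, axis=1)
print(result)
```

With the sample data, the rows of result follow the IDs 1, 2, 10, 4, matching the layout requested in the question.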