Convert categories to binary columns (concat the category columns)

Time:01-10

I want to convert the categories into binary columns concatenated to the df. Each value of the category column should become a new column holding 1 or 0 for each id, depending on whether that value is present for that id.

import pandas as pd

df = pd.DataFrame({"id": [0, 1, 1, 3, 3],
                   "value1": ["ryan", "delta", "delta", "delta", "alpha"],
                   "category": ["teacher", "pilot", "engineer", "pilot", "teacher"],
                   "value2": [1, 1, 2, 3, 7]})
df

The expected result is:

finaldf = pd.DataFrame({"id": [0,1,3],
                       "teacher":[1,0,1],
                       "pilot":[0,1,1],
                       "engineer": [0,1,0]})

Answer:

For the original version of the question, use pd.get_dummies:

# dtype=int gives 0/1 columns; since pandas 2.0 the default dtype is bool (True/False)
finaldf = pd.get_dummies(df, columns=["category"], prefix="", prefix_sep="", dtype=int)

output:

   id value1  value2  engineer  pilot  teacher
0   0   ryan       1         0      0        1
1   1  delta       1         0      1        0
2   2  delta       2         1      0        0
3   3  delta       3         0      1        0
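pd.get_dummies on its own keeps one row per original row, so with the updated data ids 1 and 3 would still appear twice. A minimal sketch of one way to collapse it to one row per id (an alternative, not part of the original answer): build the dummies from the category column only, then take the per-id maximum, which is 1 whenever that category occurs for the id at least once.

```python
import pandas as pd

# Data from the updated question: ids 1 and 3 each appear on two rows.
df = pd.DataFrame({"id": [0, 1, 1, 3, 3],
                   "value1": ["ryan", "delta", "delta", "delta", "alpha"],
                   "category": ["teacher", "pilot", "engineer", "pilot", "teacher"],
                   "value2": [1, 1, 2, 3, 7]})

# One 0/1 column per category value, still one row per original row.
dummies = pd.get_dummies(df["category"], dtype=int)

# Collapse to one row per id: max() is 1 if the category occurs for that id at least once.
finaldf = dummies.groupby(df["id"]).max().reset_index()
print(finaldf)
```

Grouping by df["id"] works because dummies shares the original row index, so each dummy row lines up with its id.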

Edit, for the updated question ;)

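Use pd.crosstab, which tabulates id against category, then tidy the result: reset_index() turns id back into a regular column, and rename_axis(columns=None) drops the leftover "category" name from the column axis: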

finaldf = pd.crosstab(df["id"], df["category"]).reset_index().rename_axis(columns=None)

output:

   id  engineer  pilot  teacher
0   0         0      0        1
1   1         1      1        0
2   3         0      1        1
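Note that pd.crosstab counts occurrences rather than flagging presence: if an id had the same category on two rows, its cell would be 2, not 1. The question's data has no such duplicate, so the crosstab output is already 0/1. The following sketch uses a hypothetical extra row (id 3 listed as "pilot" twice) to show the effect, and caps the counts with DataFrame.clip(upper=1).

```python
import pandas as pd

# Hypothetical variant of the question's data: id 3 is a "pilot" on two rows.
df = pd.DataFrame({"id": [0, 1, 1, 3, 3, 3],
                   "category": ["teacher", "pilot", "engineer", "pilot", "teacher", "pilot"]})

# crosstab counts occurrences, so the (3, "pilot") cell is 2 here.
counts = pd.crosstab(df["id"], df["category"])

# Cap every count at 1 to get a presence/absence (0/1) indicator, then tidy as before.
finaldf = counts.clip(upper=1).reset_index().rename_axis(columns=None)
print(finaldf)
```

(counts > 0).astype(int) is an equivalent way to turn the counts into 0/1 flags.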