I have a dataframe like this,
id Name
100 Apple
200 Orange
100 Banana
I want to create new columns based on the name, and mark them with 0 or 1.
Like this,
id Name_Apple Name_Orange Name_Banana
100 1 0 1
200 0 1 0
Id 100 is marked 1 for apple and banana in the name and 0 for orange. Id 200 marked only 1 for orange. I am not actually sure how to do this. Appreciate any help.
Thanks.
CodePudding user response:
Version 1: If you want to mark the entries with only either 0 or 1:
Use pd.get_dummies()
GroupBy.max()
, as follows:
df_out = (pd.get_dummies(df)
.groupby('id').max()
.reset_index()
)
Result:
print(df_out)
id Name_Apple Name_Banana Name_Orange
0 100 1 1 0
1 200 0 0 1
Version 2: If you want to mark the entries as count of occurrences (can be > 1)
If you want the entries to reflect the count of occurrences (can be multiple occurences and thus > 1), you can use pd.crosstab()
, as follows:
pd.crosstab(df['id'], df['Name'])
Data Input
id Name
0 100 Apple
1 200 Orange
2 100 Banana <=== multiple occurrences
3 100 Banana <=== multiple occurrences
Result
Name Apple Banana Orange
id
100 1 2 0
200 0 0 1
Note that Banana
for id==100
has a count of 2 instead of either 0 or 1.
You can also use pd.get_dummies()
, as follows:
df_out = (pd.get_dummies(df)
.groupby('id').sum()
.reset_index()
)
Here, we use GroupBy.sum()
instead of GroupBy.max()
Result:
print(df_out)
id Name_Apple Name_Banana Name_Orange
0 100 1 2 0
1 200 0 0 1