I have a dataset with many names. For each of a few specific names, I want to create a new column that contains 1 if the row's name matches that name and 0 otherwise.
Original data:
Desired output:
I've tried the following:
```python
names=['Tom','Sarah','Bob']

def function(x):
    for n in names:
        if (x['Name']==n):
            return 1
        else:
            return 0

for n in names:
    df[n]=df.apply(function,axis=1)
```
This doesn't work: every new column ends up with the same values as the 'Tom' column.
What am I doing wrong?
Answer:
You can just use `str.get_dummies`:

```python
out = df.join(df.Name.str.get_dummies()[names])
```
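One caveat with selecting `[names]`: if one of the names never occurs in the data, `get_dummies` creates no column for it and the selection raises a `KeyError`. A minimal sketch that avoids this by reindexing to exactly the requested columns, using a hypothetical sample DataFrame (the question's real data is not shown):

```python
import pandas as pd

# Hypothetical sample data; 'Bob' deliberately does not occur.
df = pd.DataFrame({'Name': ['Tom', 'Sarah', 'Tom', 'Alice']})
names = ['Tom', 'Sarah', 'Bob']

# One 0/1 column per distinct value in 'Name' (here: Alice, Sarah, Tom).
dummies = df['Name'].str.get_dummies()

# Keep only the requested names, in that order; names with no
# matching rows get a column of zeros instead of raising KeyError.
out = df.join(dummies.reindex(columns=names, fill_value=0))
print(out)
```

Note that `str.get_dummies` splits values on `'|'` by default, so this assumes the names themselves contain no `'|'`.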
Answer:
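You don't need the `for` loop inside your function. Both branches `return`, so the function exits on the loop's first iteration, when `n` is 'Tom'. Every column therefore gets the result of comparing against 'Tom'.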
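For comparison, here is a minimal sketch that keeps the question's row-wise `apply` approach but drops the inner loop and passes the target name in through `apply`'s `args` parameter. The helper name `is_name` and the sample data are hypothetical. This is much slower than a vectorized comparison on large frames.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Name': ['Tom', 'Sarah', 'Bob', 'Alice']})
names = ['Tom', 'Sarah', 'Bob']

def is_name(row, name):
    # Compare this row against a single target name; with no loop,
    # nothing returns early on the first name in the list.
    return 1 if row['Name'] == name else 0

for n in names:
    # args=(n,) is passed to is_name after the row itself.
    df[n] = df.apply(is_name, axis=1, args=(n,))

print(df)
```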
You can use:

```python
names = ['Tom','Sarah','Bob']
for n in names:
    df[n] = df['Name'].eq(n).astype(int)
```
Or with NumPy broadcasting:

```python
df[names] = (df[['Name']].values == names).astype(int)
```
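To make the broadcasting step explicit, here is a sketch with a hypothetical four-row DataFrame that prints the shapes involved: the `(n, 1)` column array is compared against the `(3,)` array of names, which broadcasts to an `(n, 3)` boolean matrix with one column per name. Converting `names` to an object-dtype array is a choice made here so both sides of the comparison hold Python strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Name': ['Tom', 'Sarah', 'Bob', 'Alice']})
names = ['Tom', 'Sarah', 'Bob']

col = df[['Name']].to_numpy()            # shape (4, 1): one row per record
targets = np.array(names, dtype=object)  # shape (3,): one entry per name
matches = col == targets                 # broadcasts to shape (4, 3)
print(col.shape, targets.shape, matches.shape)  # (4, 1) (3,) (4, 3)

# Column j of `matches` is True where the row's name equals names[j].
df[names] = matches.astype(int)
print(df)
```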