I have a pandas DataFrame and I want to create a new dummy variable that indicates whether the values of one of its columns appear in a list.
import numpy as np
import pandas as pd
df = pd.DataFrame({'variable1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']
How can I create a new dummy variable for the dataframe, called variable3, that equals 1 if the value of variable2 is present in the list and 0 if not?
I tried this using:
df['variable3'] = np.where(
    df['variable2'] in my_list,
    1, 0)
However, this throws a ValueError: The truth value of a Series is ambiguous.
I've been looking for an answer to this for a long time, but none of the ones I found solved this problem.
Do you have any suggestions?
CodePudding user response:
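You're almost there. The `in` operator expects a single True/False, but you are testing a whole Series, which is why pandas raises the ValueError. When you want to check whether each value of a dataframe column matches an entry in a list or in another dataframe column, you can use Series.isin, which returns one boolean per row.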
df['variable3'] = np.where(
    df['variable2'].isin(my_list),
    1, 0)
df
Out[16]:
   variable1 variable2  variable3
0          1         a          1
1          2         r          0
2          3         b          1
3          4         w          0
4          5         c          1
5          6         p          0
6          7         l          0
7          8         a          1
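As a side note, since isin already returns a boolean Series, np.where is not strictly needed for a 1/0 dummy. Below is a minimal sketch of that alternative, reusing the df and my_list from the question; the names mask and same are only illustrative and are not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'variable1': [1, 2, 3, 4, 5, 6, 7, 8],
    'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a'],
})
my_list = ['a', 'b', 'c', 'd', 'e']

# isin compares element-wise and returns one True/False per row
mask = df['variable2'].isin(my_list)

# Casting the boolean mask to int gives the 1/0 dummy directly
df['variable3'] = mask.astype(int)

# Sanity check: this matches the np.where version from the answer
same = bool((df['variable3'] == np.where(mask, 1, 0)).all())

print(df['variable3'].tolist())  # [1, 0, 1, 0, 1, 0, 0, 1]
print(same)                      # True
```

Both versions should give the same column here. np.where is probably the more convenient choice if you later want values other than 1 and 0.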