I have a column in a DataFrame that looks like this:
Col1 |
---|
A |
B |
A |
C |
B |
I want to add a boolean column that indicates, for each row, whether the value in that row is appearing for the first time, i.e. whether it has not appeared in any previous row. The desired output would look like this:
Col1 | col2 |
---|---|
A | True |
B | True |
A | False |
C | True |
B | False |
How can I achieve this? I've tried window.expanding() with isin(), but it appears to apply to numeric columns only (mine contains strings only).
CodePudding user response:
Use Series.duplicated and invert the resulting mask with ~. An alternative solution is to use DataFrame.duplicated and specify the column name:
df['col2'] = ~df['Col1'].duplicated()
# alternative solution
# df['col2'] = ~df.duplicated('Col1')
print(df)
Col1 col2
0 A True
1 B True
2 A False
3 C True
4 B False
Details:
print(df['Col1'].duplicated())
0 False
1 False
2 True
3 False
4 True
Name: Col1, dtype: bool
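Put together as a self-contained sketch: the DataFrame construction below is an assumption rebuilt from the table in the question, and is_repeat is just an illustrative name. duplicated() uses keep="first" by default, so the first occurrence of each value is marked False and every later repeat True; ~ flips that mask.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about how df was created).
df = pd.DataFrame({"Col1": ["A", "B", "A", "C", "B"]})

# keep="first" is the default: the first occurrence of each value is not a duplicate.
is_repeat = df["Col1"].duplicated(keep="first")

# Invert the mask so that first occurrences become True.
df["col2"] = ~is_repeat

print(df)
```

This works on string columns because duplicated() only compares values for equality, so no numeric dtype is needed.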
CodePudding user response:
Just use duplicated and invert the result with ~:
df['col2'] = ~df['Col1'].duplicated()
Output:
Col1 col2
0 A True
1 B True
2 A False
3 C True
4 B False
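To see what the inverted duplicated mask computes, here is a rough plain-Python sketch of the same logic. The function name first_occurrence_flags is hypothetical and this is not how pandas implements it internally; it only illustrates the "have I seen this value before?" idea from the question.

```python
def first_occurrence_flags(values):
    """Return True for the first time each value appears, False for repeats."""
    seen = set()   # values encountered in earlier rows
    flags = []
    for value in values:
        flags.append(value not in seen)  # True only if no previous row had it
        seen.add(value)
    return flags


print(first_occurrence_flags(["A", "B", "A", "C", "B"]))
# [True, True, False, True, False]
```

In practice the vectorized ~df['Col1'].duplicated() is the better choice on a DataFrame; the loop is only there to make the behavior explicit.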