Pandas - check if a value has appeared in previous rows

Time:07-18

I have a column in a DataFrame that looks like this:

Col1
A
B
A
C
B

I want to add a boolean column that indicates, for each row, whether the value in that row is appearing for the first time, i.e. whether it has not appeared in any previous row. The desired output would look like this:

Col1 col2
A True
B True
A False
C True
B False

How can I achieve this? I've tried expanding() with isin(), but it appears to work on numeric columns only (mine contains only strings).

CodePudding user response:

Use Series.duplicated and invert the resulting mask with ~. An alternative is DataFrame.duplicated with the column name specified:

df['col2'] = ~df['Col1'].duplicated()
# alternative solution
# df['col2'] = ~df.duplicated('Col1')

print(df)
  Col1   col2
0    A   True
1    B   True
2    A  False
3    C   True
4    B  False

Details:

print(df['Col1'].duplicated())
0    False
1    False
2     True
3    False
4     True
Name: Col1, dtype: bool
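
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is built from the sample values in the question; the name seen_before is only illustrative and does not come from the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Col1": ["A", "B", "A", "C", "B"]})

# duplicated() marks every occurrence after the first as True
# (keep="first" is the default), so it works on strings as well as numbers.
seen_before = df["Col1"].duplicated()

# Inverting the mask gives True only where a value occurs for the first time.
df["col2"] = ~seen_before

print(df)
```

If the opposite flag is wanted (True when the value has already been seen), seen_before can be assigned directly without the ~.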

CodePudding user response:

Just use duplicated and invert the result with ~:

df['col2'] = ~df['Col1'].duplicated()

Output:

  Col1   col2
0    A   True
1    B   True
2    A  False
3    C   True
4    B  False
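
To make it clearer what the inverted duplicated() mask computes, here is a rough plain-Python equivalent that tracks the values seen so far in a set. This is only an illustrative cross-check, not how pandas implements it, and the names seen, first_time and col2_manual are hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Col1": ["A", "B", "A", "C", "B"]})

seen = set()       # values encountered in earlier rows
first_time = []    # True when the current value has not been seen yet
for value in df["Col1"]:
    first_time.append(value not in seen)
    seen.add(value)

df["col2_manual"] = first_time

# The manual result should agree with the vectorized one-liner.
vectorized = (~df["Col1"].duplicated()).tolist()
print(df["col2_manual"].tolist() == vectorized)
```

The vectorized version is generally preferable on large frames; the loop is here only to spell out the logic.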