I have a dataframe that I am working with and I would like to identify the second instance of a particular value in a column, and then map a value in a different column for that same row. An example might be easiest:
col1 col2
'a' 's'
'b' 's'
'f' 's'
'g' 's'
'b' 's'
'l' 's'
The above is a short example of the dataframe(s) that I am working with, so I would like to identify the second occurance of 'b'
and for that row map col2
to 'p'
.
So the output would be:
col1 col2
'a' 's'
'b' 's'
'f' 's'
'g' 's'
'b' 'p'
'l' 's'
I am quite new to dataframes so no existing code to work with I'm afraid. I should mention that there IS NO INDEXING for this dataframe, so iloc
cannot be used, otherwise it would be trivial.
CodePudding user response:
Use boolean indexing with two masks (equality to "b" and (&
) duplicated
to identify the second "b"):
# is the row a "b"?
m1 = df['col1'].eq('b')
# is the value duplicated? (i.e. a second "b")
m2 = df['col1'].duplicated()
# if both match, then update
df.loc[m1&m2, 'col2'] = 'p'
Output:
col1 col2
0 a s
1 b s
2 f s
3 g s
4 b p
5 l s
Intermediates:
col1 col2 m1 m2 m1&m2
0 a s False False False
1 b s True False False
2 f s False False False
3 g s False False False
4 b p True True True
5 l s False False False
CodePudding user response:
Use Series.duplicated
chained by &
for bitwise AND
with compare value by Series.eq
and pass to DataFrame.loc
- here all duplicated values without first are converted to p
:
df.loc[df['col1'].eq('b') & df['col1'].duplicated(), 'col2'] = 'p'
Or chain GroupBy.cumcount
for compare second value by compare 1
(python count from 0
):
m = df['col1'].eq('b')
df.loc[m & df[m].groupby('col1').cumcount().eq(1), 'col2'] = 'p'
Difference of solutions are if 3 or more values b
:
m = df['col1'].eq('b')
df.loc[m & df[m].groupby('col1').cumcount().eq(1), 'col3'] = 'p'
df.loc[df['col1'].eq('b') & df['col1'].duplicated(), 'col4'] = 'p'
print (df)
col1 col2 col3 col4
0 a s NaN NaN
1 b s NaN NaN
2 f s NaN NaN
3 g s NaN NaN
4 b s p p
5 l s NaN NaN
6 b s NaN p