I have a data frame of a few thousand rows. In one column, every value appears exactly twice. For each row, I want to find the index of the other row that holds the same value. The column looks like so:
col
1 cat
2 dog
3 bird
4 dog
5 bird
6 cat
For each row, I would like the index of the row where its match appears, so it would return something like this:
[1] 6 4 5 2 3 1
CodePudding user response:
Group by the values in col, and use np.roll to shift each group's indices by 1 (the last index comes to the front):
import numpy as np
s = df['col']
# within each group, rotate the index labels by one position
s.groupby(s).transform(lambda x: np.roll(x.index, 1))
Result:
1 6
2 4
3 5
4 2
5 3
6 1
Name: col, dtype: int64
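For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example column from the question. The 1-based index and the variable name match are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example data from the question, with a 1-based index (assumed)
df = pd.DataFrame(
    {"col": ["cat", "dog", "bird", "dog", "bird", "cat"]},
    index=range(1, 7),
)

s = df["col"]
# Rotate each group's index labels by one, so every row gets its partner's label
match = s.groupby(s).transform(lambda x: np.roll(x.index, 1))
print(match.tolist())
```

Because each value appears exactly twice, rolling by 1 simply swaps the two labels within each pair. With more than two occurrences, each row would instead point to the previous occurrence in its group, and the first occurrence would wrap around to the last.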
CodePudding user response:
You can use the following to get a list of all the indices for each word in col:
df.reset_index().groupby(['col'])['index'].apply(list).reset_index(name='matching_index')
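As a rough, runnable sketch of the same idea on the question's data: the 1-based index and the name result are assumptions for illustration. Note that reset_index() only produces a column called index when the original index is unnamed, which is the case here.

```python
import pandas as pd

# Example data from the question, with a 1-based, unnamed index (assumed)
df = pd.DataFrame(
    {"col": ["cat", "dog", "bird", "dog", "bird", "cat"]},
    index=range(1, 7),
)

# Move the index into a regular column, then collect its labels per value of col
result = (
    df.reset_index()
      .groupby("col")["index"]
      .apply(list)
      .reset_index(name="matching_index")
)
print(result)
```

This gives one row per distinct value rather than one row per original row, so it answers "where does each word occur" rather than "where is this row's partner".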
CodePudding user response:
This works for me:
import pandas as pd
df = pd.DataFrame(['Car','Bike','Truck','Car','Airplane'])
df.columns = ["Something"]
duplicateRows = df[df.duplicated()] # Just this line makes the magic =)
duplicateRows
The Output:
Something
3 Car
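To consider: df[df.duplicated()] finds duplicate rows across all columns and, by default, flags every occurrence except the first. If you want to treat the last occurrence as the one to keep, so that the earlier ones are flagged instead, you can use df[df.duplicated(keep='last')]. And if you need to find duplicates in specific columns only, you can use df[df.duplicated(['column1', 'column2'])].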
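To make the effect of keep concrete, here is a small sketch on the same example frame. The variable names are made up for illustration, and keep=False is an extra option of duplicated() that the answer above does not mention; it flags every row that has a duplicate, which is closer to what the question asks for.

```python
import pandas as pd

df = pd.DataFrame({"Something": ["Car", "Bike", "Truck", "Car", "Airplane"]})

# Default keep='first': the first occurrence is kept, later repeats are flagged
first_kept = df.duplicated()
# keep='last': the last occurrence is kept, earlier repeats are flagged
last_kept = df.duplicated(keep="last")
# keep=False: every row that has a duplicate anywhere is flagged
all_dupes = df.duplicated(keep=False)

print(df[all_dupes].index.tolist())
```

Note that duplicated() only tells you which rows are repeats. It does not pair each row with the index of its match, so it would still need to be combined with a groupby to get the output the question shows.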