Python Pandas: Index of a matching value within the same column

I have a DataFrame with a few thousand rows. In one column, every value appears exactly twice. For each row, I want to find the index of the other row that holds the same value. The column looks like this:

  col
1 cat
2 dog 
3 bird
4 dog
5 bird
6 cat

For each row, I would like the index of the row where its match appears, so the result would look something like this:

[1] 6 4 5 2 3 1

CodePudding user response：

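Group by the values in col and use np.roll to shift each group's index labels by one position, so the last label wraps around to the front. Because every value appears exactly twice, this swaps the two labels within each group: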

import numpy as np
s = df['col']
# Rolling each group's two index labels by 1 swaps them.
s.groupby(s).transform(lambda x: np.roll(x.index, 1))

Result:

1    6
2    4
3    5
4    2
5    3
6    1
Name: col, dtype: int64
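
For a self-contained check, the sketch below rebuilds the sample column from the question and applies the same groupby plus np.roll step. The 1-based index and the name `partner` are assumptions made for illustration. Note that the roll only identifies "the other row" because each value occurs exactly twice; with three or more occurrences it would cycle through the group instead of pairing rows.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with a 1-based index (assumed).
df = pd.DataFrame(
    {"col": ["cat", "dog", "bird", "dog", "bird", "cat"]},
    index=range(1, 7),
)

s = df["col"]
# Within each group of equal values, rotate the index labels by one position.
# For a group of two labels this swaps them, giving each row its partner.
partner = s.groupby(s).transform(lambda x: np.roll(x.index, 1))

print(partner.tolist())
```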

CodePudding user response：

You can use the following to get a list of all the indices for each value in col:

df.reset_index().groupby(['col'])['index'].apply(list).reset_index(name='matching_index')
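
As a rough illustration of what this returns, the sketch below runs it on the sample column from the question. The 1-based index and the names `matches` and `lookup` are assumptions, not part of the answer. Each row of the result holds one distinct value and the list of index labels where it occurs.

```python
import pandas as pd

# Sample data from the question, with a 1-based index (assumed).
df = pd.DataFrame(
    {"col": ["cat", "dog", "bird", "dog", "bird", "cat"]},
    index=range(1, 7),
)

# reset_index() moves the unnamed index into a column called "index",
# so it can be aggregated like any other column.
matches = (
    df.reset_index()
      .groupby("col")["index"]
      .apply(list)
      .reset_index(name="matching_index")
)

# Turn the result into a plain dict for quick lookups by value.
lookup = dict(zip(matches["col"], matches["matching_index"]))
print(lookup)
```

Unlike the np.roll approach, this also handles values that occur more or fewer than twice, at the cost of returning one row per distinct value rather than one per original row.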

CodePudding user response：

This works for me:

import pandas as pd

df = pd.DataFrame(['Car','Bike','Truck','Car','Airplane'])
df.columns = ["Something"]
duplicateRows = df[df.duplicated()] # This one line does the work =)
duplicateRows

The Output:

    Something
3         Car

To consider:

df[df.duplicated()] finds rows that are duplicated across all columns and returns every occurrence after the first. With df[df.duplicated(keep='last')], the last occurrence is treated as the original, so the earlier occurrences are returned instead. To look for duplicates in specific columns only, use df[df.duplicated(['column1', 'column2'])].
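
To make the effect of the keep argument concrete, here is a small sketch on the same example column. It also uses keep=False, which flags every member of a duplicated group; that is closer to what the question asks for, though it returns the matching rows rather than pairing them. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Something": ["Car", "Bike", "Truck", "Car", "Airplane"]})

# Default keep='first': every occurrence after the first is flagged.
first = df.duplicated().tolist()

# keep='last': every occurrence before the last is flagged.
last = df.duplicated(keep="last").tolist()

# keep=False: all rows whose value appears more than once are flagged.
all_dupes = df["Something"].duplicated(keep=False)
dupe_index = df[all_dupes].index.tolist()

print(first)
print(last)
print(dupe_index)
```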