Return rows that have the same value in one column but the column value is unknown-CodePudding

I have a large dataset called pop and want to return the only 2 rows that have the same value in column 'J'. I do not know what rows have the same value and do not know what the common value is... I want to return these two rows.

Without knowing the common value, this code is not helpful:

pop.loc[pop['X'] == some_value]

I tried this but it returned the entire dataset:

pop.query('X' == 'X')

Any input is appreciated...

CodePudding user response：

You can do .value_counts() then get the first element, which has been sorted to be the most common value.

I'll use some dummy data here:

In [2]: df = pd.DataFrame(['a', 'b', 'c', 'd', 'b', 'f'], columns=['X'])

In [3]: df
Out[3]: 
   X
0  a
1  b
2  c
3  d
4  b
5  f

In [4]: wanted_value = df['X'].value_counts().index[0]

In [5]: wanted_value
Out[5]: 'b'

In [6]: df[df['X'] == wanted_value]
Out[6]: 
   X
1  b
4  b

For reference, df['X'].value_counts() is:

b    2
a    1
c    1
d    1
f    1
Name: X, dtype: int64

CodePudding user response：

Thanks, I figured out another way that seemed a bit easier...

pop['X'].value_counts()

the top value was 21 and showed '2', indicating 21 was the duplicated value; all remaining values indicated '1', no duplicates

pop.loc[pop['X'] == 21]

returned the 2 rows with the duplicated value in column X.