Home > front end >  Return rows that have the same value in one column but the column value is unknown
Return rows that have the same value in one column but the column value is unknown

Time:10-02

I have a large dataset called pop and want to return the only 2 rows that have the same value in column 'J'. I do not know what rows have the same value and do not know what the common value is... I want to return these two rows.

Without knowing the common value, this code is not helpful:

pop.loc[pop['X'] == some_value]

I tried this but it returned the entire dataset:

pop.query('X' == 'X')

Any input is appreciated...

CodePudding user response:

You can do .value_counts() then get the first element, which has been sorted to be the most common value.

I'll use some dummy data here:

In [2]: df = pd.DataFrame(['a', 'b', 'c', 'd', 'b', 'f'], columns=['X'])

In [3]: df
Out[3]: 
   X
0  a
1  b
2  c
3  d
4  b
5  f

In [4]: wanted_value = df['X'].value_counts().index[0]

In [5]: wanted_value
Out[5]: 'b'

In [6]: df[df['X'] == wanted_value]
Out[6]: 
   X
1  b
4  b

For reference, df['X'].value_counts() is:

b    2
a    1
c    1
d    1
f    1
Name: X, dtype: int64

CodePudding user response:

Thanks, I figured out another way that seemed a bit easier...

pop['X'].value_counts()

  • the top value was 21 and showed '2', indicating 21 was the duplicated value; all remaining values indicated '1', no duplicates

pop.loc[pop['X'] == 21]

  • returned the 2 rows with the duplicated value in column X.
  • Related