I have a large dataset called pop and want to return the only two rows that share a value in column 'X'. I do not know which rows these are, or what the common value is. I just want to return those two rows.
Without knowing the common value, this code is not helpful:
pop.loc[pop['X'] == some_value]
I tried this but it returned the entire dataset:
pop.query('X' == 'X')
Any input is appreciated...
CodePudding user response:
You can call .value_counts() on the column. It sorts the result by count in descending order, so the first entry of its index is the most common value.
I'll use some dummy data here:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(['a', 'b', 'c', 'd', 'b', 'f'], columns=['X'])
In [3]: df
Out[3]:
   X
0  a
1  b
2  c
3  d
4  b
5  f
In [4]: wanted_value = df['X'].value_counts().index[0]
In [5]: wanted_value
Out[5]: 'b'
In [6]: df[df['X'] == wanted_value]
Out[6]:
   X
1  b
4  b
For reference, df['X'].value_counts() returns:
b    2
a    1
c    1
d    1
f    1
Name: X, dtype: int64
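If the goal is only to pull out the rows whose value repeats, a possible alternative (not part of the answer above) is Series.duplicated with keep=False, which flags every occurrence of a repeated value rather than just the later ones. A minimal sketch on the same dummy data; the name dupes is only illustrative:

```python
import pandas as pd

# Same dummy data as in the answer above
df = pd.DataFrame(['a', 'b', 'c', 'd', 'b', 'f'], columns=['X'])

# keep=False marks ALL rows whose value in 'X' occurs more than once,
# so there is no need to look up the common value first
dupes = df[df['X'].duplicated(keep=False)]
print(dupes)
```

Unlike taking the first entry of value_counts(), this does not rely on the repeated value being the single most common one, and it returns an empty frame when nothing repeats.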
CodePudding user response:
Thanks, I figured out another way that seemed a bit easier:
pop['X'].value_counts()
The top value was 21 with a count of 2, indicating that 21 was the duplicated value; every other value had a count of 1, meaning no duplicates.
pop.loc[pop['X'] == 21]
This returned the two rows with the duplicated value in column X.
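The two manual steps above (reading the counts, then filtering on the value found) could probably be combined so the repeated value never has to be typed in by hand. A rough sketch, assuming a small made-up stand-in for pop; the 'name' column and all of the data are hypothetical and only there for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; only column 'X' matters here
pop = pd.DataFrame({
    'X': [3, 21, 7, 21, 9],
    'name': ['a', 'b', 'c', 'd', 'e'],
})

# Count how often each value occurs in 'X'
counts = pop['X'].value_counts()

# Keep only the values that occur more than once
repeated_values = counts[counts > 1].index

# Select every row whose 'X' is one of those repeated values
result = pop.loc[pop['X'].isin(repeated_values)]
print(result)
```

With this made-up data, result holds the two rows where X is 21. If several values repeated, all of their rows would be returned.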