I have a DataFrame:
df =
a b c d e
0 0 1 2 3 4
1 1 2 3 0 4
2 2 3 1 4 0
I would like to get the values that occur N times in a certain column.
For example, if I want all the values that occur 2 times in column "e", I would get result = [4], and if I want all the values that occur 1 time in column "d", I would get result = [3, 0, 4].
I can do df['e'].value_counts() == 2, but that gives a boolean (True/False) Series. I just want the values where it is True.
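To make the question reproducible, here is a minimal sketch that rebuilds the sample DataFrame from the table above with pd.DataFrame (assuming all columns hold plain integers) and produces the boolean Series the question describes.

```python
import pandas as pd

# Rebuild the sample frame from the printed table (integer columns assumed)
df = pd.DataFrame({
    "a": [0, 1, 2],
    "b": [1, 2, 3],
    "c": [2, 3, 1],
    "d": [3, 0, 4],
    "e": [4, 4, 0],
})

# One entry per distinct value of "e", True where that value's count is 2
mask = df["e"].value_counts() == 2
print(mask)
```

The index of mask holds the distinct values of column "e", which is why the answers below select from that index.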
Answer:
What you did returns a boolean Series indexed by the distinct values of the column, so we can use it as a mask to select the matching index values.
col = 'd'
n = 1
df[col].value_counts() == n
# 3 True
# 0 True
# 4 True
# Name: d, dtype: bool
To get the index values whose entry is True, we can do:
df[col].value_counts().index[df[col].value_counts() == n]
# Int64Index([3, 0, 4], dtype='int64')
To get a plain list, wrap the result in list():
list(df[col].value_counts().index[df[col].value_counts() == n])
# [3, 0, 4]
EDIT:
To avoid calling value_counts() twice, you can assign val_counts = df[col].value_counts() and use it like so (or see the answer from @jezrael):
list(val_counts.index[val_counts == n])
# [3, 0, 4]
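The same approach can be wrapped in a small reusable function. This is only a sketch: the name values_occurring_n_times is hypothetical, and the frame below keeps just columns "d" and "e" from the question for brevity.

```python
import pandas as pd

def values_occurring_n_times(frame: pd.DataFrame, col: str, n: int) -> list:
    """Return the distinct values of frame[col] that appear exactly n times (hypothetical helper)."""
    val_counts = frame[col].value_counts()  # computed once and reused for the mask
    return list(val_counts.index[val_counts == n])

df = pd.DataFrame({"d": [3, 0, 4], "e": [4, 4, 0]})
print(values_occurring_n_times(df, "e", 2))
print(sorted(values_occurring_n_times(df, "d", 1)))
```

If no value has exactly n occurrences, the mask is all False and the function returns an empty list.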
Answer:
You can filter the index values of the Series returned by Series.value_counts:
s = df['e'].value_counts()
L = s.index[s.eq(2)].tolist()
print(L)
[4]
s = df['d'].value_counts()
L = s.index[s.eq(1)].tolist()
print(L)
[0, 4, 3]
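The two answers print the same values for column "d" in a different order ([3, 0, 4] vs [0, 4, 3]). All three values have a count of 1, and the relative order of tied counts from value_counts() is best not relied on, so if a deterministic result matters, sort it. A minimal sketch, again assuming a frame reduced to column "d":

```python
import pandas as pd

df = pd.DataFrame({"d": [3, 0, 4]})

s = df["d"].value_counts()
# Every value ties with a count of 1, so sort to get a stable order
L = sorted(s.index[s.eq(1)].tolist())
print(L)  # [0, 3, 4]
```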