Pandas: how to get only values of columns that value_counts() equals N

Time:10-14

I have a DataFrame:

df = 
    a   b   c   d   e
0   0   1   2   3   4
1   1   2   3   0   4
2   2   3   1   4   0

I would like to get the values that occur N times in a certain column.

For example, if I want all the values that occur 2 times in column "e", I would get result = [4], and if I want all the values that occur 1 time in column "d", I would get result = [3, 0, 4].

I can do df['e'].value_counts() == 2, but that gives a True/False Series. I only want the values for which it is True.

CodePudding user response:

Your comparison returns a True/False Series whose index holds the values themselves, so we can use it as a mask to pick out the matching index entries.

col = 'd'
n = 1
df[col].value_counts() == n
# 3    True
# 0    True
# 4    True
# Name: d, dtype: bool

To get the index values that are marked True, we can do:

df[col].value_counts().index[df[col].value_counts() == n]
# Int64Index([3, 0, 4], dtype='int64')

To create a list, we only need to use list():

list(df[col].value_counts().index[df[col].value_counts() == n])
# [3, 0, 4]

EDIT: To avoid calling value_counts() twice, you can assign val_counts = df[col].value_counts() and reuse it like so (or see the answer from @jezrael):

list(val_counts.index[val_counts == n])
# [3, 0, 4]
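
Putting the steps above together, here is a minimal runnable sketch. The helper name values_with_count is my own (it is not a pandas function), and the DataFrame is rebuilt from the question's sample data:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "a": [0, 1, 2],
    "b": [1, 2, 3],
    "c": [2, 3, 1],
    "d": [3, 0, 4],
    "e": [4, 4, 0],
})


def values_with_count(frame, col, n):
    """Return the values in frame[col] that occur exactly n times.

    Hypothetical helper for illustration only.
    """
    val_counts = frame[col].value_counts()  # computed once, reused below
    # The boolean Series (val_counts == n) masks the index of the counts.
    return list(val_counts.index[val_counts == n])


result_e = values_with_count(df, "e", 2)
result_d = values_with_count(df, "d", 1)

print(result_e)          # [4]
print(sorted(result_d))  # [0, 3, 4]
```

The second result is sorted before printing because all three values are tied at one occurrence, and I would not rely on the order in which value_counts() returns ties.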

CodePudding user response:

You can filter index values after Series.value_counts:

s = df['e'].value_counts()

L = s.index[s.eq(2)].tolist()
print(L)
# [4]

s = df['d'].value_counts()

L = s.index[s.eq(1)].tolist()
print(L)
# [0, 4, 3]
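
A small variation, as a sketch rather than the one right way: you can also mask the counts Series itself and then take its index. The values for column "d" are all tied at one occurrence, and the order in which ties come back may depend on the pandas version, so this example sorts the list when the order matters. The variable names twice and once are only illustrative.

```python
import pandas as pd

# Only the two columns used in the question's examples.
df = pd.DataFrame({"d": [3, 0, 4], "e": [4, 4, 0]})

# Values that occur exactly 2 times in column "e".
s = df["e"].value_counts()
twice = s[s.eq(2)].index.tolist()  # filter the Series, then read its index
print(twice)  # [4]

# Values that occur exactly 1 time in column "d".
s = df["d"].value_counts()
once = sorted(s[s.eq(1)].index.tolist())  # sorted for a stable order among ties
print(once)  # [0, 3, 4]
```

Both forms, s.index[s.eq(n)] and s[s.eq(n)].index, select the same values; which one reads better is a matter of taste.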