Pandas: Get indices of rows that match group-by value_count

Time:04-29

I want to get the row indices for values that meet my search criteria.

Consider this data:

import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'A'],
                   'item': ['BIOS', 'BIOS', 'Memory', 'Disk', 'BIOS'],
                   'val': [1, 2, 1, 2, 3]})

I can get the value counts like so:

df.value_counts(subset=['group', 'item'])

I can generate a new dataframe of results like so:

df.groupby('group')['item'].value_counts().loc[lambda x: x > 1].reset_index(name='count')

But how can I get the indices of the original rows whose (group, item) combination satisfies my value_count condition?

# tacking on .index will yield a multi-index result
df.value_counts(subset=['group', 'item']).loc[lambda x: x > 2].index

Expected output:

[0, 1, 4]

CodePudding user response:

You can use groupby in place of value_counts:

df.groupby(['group', 'item']).filter(lambda g: len(g)>1).index

Variant using transform:

mask = df.groupby(['group', 'item'])['val'].transform('size').gt(1)
df.index[mask]

Output:

Int64Index([0, 1, 4], dtype='int64')

NB. Append .to_list() to get a plain Python list.
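
For reference, both variants can be run end to end roughly as follows. This is a minimal sketch that assumes the example frame from the question; the names idx_filter and idx_transform are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'A'],
                   'item': ['BIOS', 'BIOS', 'Memory', 'Disk', 'BIOS'],
                   'val': [1, 2, 1, 2, 3]})

# Variant 1: keep only the rows whose (group, item) group has more than one row,
# then read the surviving index labels (illustrative variable name)
idx_filter = df.groupby(['group', 'item']).filter(lambda g: len(g) > 1).index.to_list()

# Variant 2: broadcast each group's size back onto its rows,
# then use the boolean mask to select from the original index
mask = df.groupby(['group', 'item'])['val'].transform('size').gt(1)
idx_transform = df.index[mask].to_list()

print(idx_filter)     # [0, 1, 4]
print(idx_transform)  # [0, 1, 4]
```

The transform variant avoids calling a Python lambda once per group, which may matter on frames with many groups.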

CodePudding user response:

Use transform to broadcast each group's count back onto its rows, then mask the index:

df.index[df.groupby(['group', 'item'])['val'].transform('count')>1]
Out[403]: Int64Index([0, 1, 4], dtype='int64')
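
One caveat worth keeping in mind: 'count' only counts non-null values in the selected column, while 'size' counts rows. On the question's data the two agree, but they can differ once val contains missing values. The sketch below uses a made-up frame (df_nan, not part of the question) to illustrate the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second 'val' in the (A, BIOS) group is missing
df_nan = pd.DataFrame({'group': ['A', 'A', 'B'],
                       'item': ['BIOS', 'BIOS', 'Disk'],
                       'val': [1.0, np.nan, 2.0]})

grouped = df_nan.groupby(['group', 'item'])['val']

# 'count' ignores the NaN, so (A, BIOS) only counts 1 and no row passes
by_count = df_nan.index[grouped.transform('count') > 1].to_list()

# 'size' counts rows regardless of NaN, so both (A, BIOS) rows pass
by_size = df_nan.index[grouped.transform('size') > 1].to_list()

print(by_count)  # []
print(by_size)   # [0, 1]
```

If the goal is to match what value_counts on ['group', 'item'] reports, 'size' is probably the closer fit, since the contents of val should not affect the group tally.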