I want to get the row indices for values that meet my search criteria.
Consider this data:
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'A'],
'item': ['BIOS', 'BIOS', 'Memory', 'Disk', 'BIOS'],
'val': [1, 2, 1, 2, 3]})
I can get the value counts like so:
df.value_counts(subset=['group', 'item'])
I can generate a new dataframe of results like so:
df.groupby('group')['item'].value_counts().loc[lambda x: x > 1].reset_index(name='count')
But how can I get the indices of the rows that return True when checking for my value_count
condition?
# tacking on .index will yield a multi-index result
df.value_counts(subset=['group', 'item']).loc[lambda x: x > 2].index
Expected out:
[0, 1, 4]
CodePudding user response:
You can use groupby
in place of value_counts
:
df.groupby(['group', 'item']).filter(lambda g: len(g)>1).index
variant:
mask = df.groupby(['group', 'item'])['val'].transform('size').gt(1)
df.index[mask]
output:
Int64Index([0, 1, 4], dtype='int64')
NB. add .to_list()
to get a list
CodePudding user response:
Let us do transform
df.index[df.groupby(['group', 'item'])['val'].transform('count')>1]
Out[403]: Int64Index([0, 1, 4], dtype='int64')