Remove rows in dataframe based on value counts of unique identifier

Time:05-06

Identifier Value_1 Value_2 Value_3
123        20.     30.     1
123.       12.     14.     1
123.       18.     12.     1
124.       12.     10.     6
124.       12.     16.     1
...
321.       14.     20.     3
Size 871 x 24

Hi! I have a dataframe of questionnaire data with size 871 x 24. The dataframe consists of questionnaire answers given by a number of participants, each with a unique ID in the "Identifier" column. I want to keep only the participants who gave 10 or more responses. So far, I've managed to identify which IDs meet this condition by using:

df['Identifier'].value_counts()>=10

But how do I apply this to the full dataframe and create a new one that still contains the other columns and values?

CodePudding user response:

Use Series.map with boolean indexing:

df[df['Identifier'].map(df['Identifier'].value_counts())>=10]
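To see why this works, here is a minimal sketch on a small made-up frame. The sample data, the variable names, and the lowered threshold of 3 are assumptions for illustration only; the question uses 10 on an 871 x 24 frame. value_counts() returns a Series indexed by identifier, and map looks up each row's identifier in it, which gives one count per row, aligned with df. Comparing that to the threshold yields a boolean mask for the rows.

```python
import pandas as pd

# Made-up sample data (assumption); the real frame in the question is 871 x 24.
df = pd.DataFrame({
    "Identifier": [123, 123, 123, 124, 124, 321],
    "Value_1": [20, 12, 18, 12, 12, 14],
    "Value_2": [30, 14, 12, 10, 16, 20],
})

min_responses = 3  # the question uses 10; lowered here to fit the tiny sample

# Number of rows per identifier, indexed by the identifier value.
counts = df["Identifier"].value_counts()

# Look up each row's identifier in counts: one count per row, aligned with df.
rows_per_id = df["Identifier"].map(counts)

# Keep rows whose identifier occurs at least min_responses times.
kept = df[rows_per_id >= min_responses]

# The complementary rows, in case those are the ones to keep instead.
dropped = df[rows_per_id < min_responses]

# An alternative that computes the same per-row count with groupby/transform.
alt = df[df.groupby("Identifier")["Identifier"].transform("size") >= min_responses]

print(kept)
```

Both kept and alt keep all original columns and the original row index. If a fresh 0..n-1 index is wanted, call reset_index(drop=True) on the result.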