Pandas drop row if column value has appeared more than some number of times depending on the value

Time:09-04

I have a DataFrame that looks like the following:

import pandas as pd

t = {1: ['A','B'], 2: ['D','F'], 3: ['A','C'], 4: ['B','E'], 5: ['B','B'], 6: ['D','D'], 7: ['A','H']}
df = pd.DataFrame.from_dict(t,orient='index',columns=['X','Y'])
df

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
5  B  B
6  D  D
7  A  H

I then have a dictionary

d = {'A': 2, 'B': 1, 'D': 4}

What I would like to do is drop the rows of my DataFrame that hold the nth occurrence of a value in the X column, where n is greater than the integer my dictionary specifies for that value, while preserving the order of the remaining rows. So with the above dictionary, the result of my operation should be the DataFrame that looks like

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
6  D  D

whereas with the dictionary

d = {'A': 1, 'B': 2, 'D': 1}

it should look like

   X  Y
1  A  B
2  D  F
4  B  E
5  B  B

CodePudding user response:

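You can use groupby.cumcount to number each row by how many times its X value has already appeared (starting at 0), then compare that count to the per-value threshold obtained by mapping X through the dictionary: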

mask = df.groupby('X').cumcount() < df['X'].map(d)

df[mask]
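
As a check, the mask approach above can be run against both dictionaries from the question. This is a minimal sketch: the helper name keep_first_n and its parameters are hypothetical, introduced only for illustration, and it assumes the same df as in the question.

```python
import pandas as pd

t = {1: ['A', 'B'], 2: ['D', 'F'], 3: ['A', 'C'], 4: ['B', 'E'],
     5: ['B', 'B'], 6: ['D', 'D'], 7: ['A', 'H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X', 'Y'])


def keep_first_n(frame, limits, column='X'):
    """Hypothetical helper: keep at most limits[value] rows per value of `column`."""
    # cumcount numbers the rows of each group 0, 1, 2, ... in original row order
    seen = frame.groupby(column).cumcount()
    # map looks up each row's threshold; values missing from `limits` become NaN
    allowed = frame[column].map(limits)
    # boolean indexing keeps the surviving rows in their original order
    return frame[seen < allowed]


first = keep_first_n(df, {'A': 2, 'B': 1, 'D': 4})
second = keep_first_n(df, {'A': 1, 'B': 2, 'D': 1})

print(first.index.tolist())   # [1, 2, 3, 4, 6]
print(second.index.tolist())  # [1, 2, 4, 5]
```

Note that a value of X that is missing from the dictionary maps to NaN, and any comparison with NaN is False, so such rows are dropped entirely. If they should be kept instead, one option is to fill the mapped thresholds before comparing, for example with .fillna(float('inf')).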