Home > Net >  Drop rows in a pandas dataframe by criteria from another dataframe
Drop rows in a pandas dataframe by criteria from another dataframe

Time:11-04

I have the following dataframe containing scores for a competition as well as a column that counts what number entry for each person.

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jim', 'John','Jim', 'John','Jim','John','Jim','John','Jim','Jack','Jack','Jack','Jack'],'Score': [10,8,9,3,5,0, 1, 2,3, 4,5,6,8,9]})
df['Entry_No'] = df.groupby(['Name']).cumcount()   1
df

enter image description here

Then I have another table that stores data on the maximum number of entries that each person can have:

df2 = pd.DataFrame({'Name': ['John', 'Jim', 'Jack'],'Limit': [2,3,1]})
df2

enter image description here

I am trying to drop rows from df where the entry number is greater than the Limit according to each person in df2 so that my expected output is this:

enter image description here

If there are any ideas on how to help me achieve this that would be fantastic! Thanks

CodePudding user response:

You can use pandas.merge to create another dataframe and drop columns by your condition:

df3 = pd.merge(df, df2, on="Name", how="left")
df3[df3["Entry_No"] <= df3["Limit"]][df.columns].reset_index(drop=True)
    Name  Score  Entry_No
0   John     10         1
1    Jim      8         1
2   John      9         2
3    Jim      3         2
4    Jim      0         3
5   Jack      5         1

I used how="left" to keep the order of df and reset_index(drop=True) to reset the index of the resulting dataframe.

CodePudding user response:

You could join the 2 dataframes, and then drop with a condition:

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jim', 'John','Jim', 'John','Jim','John','Jim','John','Jim','Jack','Jack','Jack','Jack'],'Score': [10,8,9,3,5,0, 1, 2,3, 4,5,6,8,9]})
df['Entry_No'] = df.groupby(['Name']).cumcount()   1
df2 = pd.DataFrame({'Name': ['John', 'Jim', 'Jack'],'Limit': [2,3,1]})
df2 = df2.set_index('Name')

df = df.join(df2, on='Name')
df.drop(df[df.Entry_No>df.Limit].index, inplace = True)

gives the expected output

  • Related