Home > Software design >  How do you filter duplicate columns in a dataframe based on a value in another column
How do you filter duplicate columns in a dataframe based on a value in another column

Time:02-24

I would like to filter duplicate rows in a DataFrame according to columns "NID", "Lact" and code when the column "Code" = 10.

The following data provides example data

data_list = {'NID': {1: '1', 2: '1', 3: '1', 4: '1', 5: '2', 6: '2', 7: '1'},
 'Lact': {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 1},
            'Code': {1: 10, 2: 0, 3: 10, 4: 0, 5: 0, 6: 10, 7: 0}}

The DataFrame appears below

  NID Lact  Code
1   1   1   10
2   1   1   0
3   1   1   10
4   1   2   0
5   2   2   0
6   2   2   10
7   1   1   0

If I run the following filter to identify duplicates it identifies duplicate rows based on "NID", "Lact", and "Code"

df[(df.duplicated(['NID', 'Lact', 'Code'], keep=False))]

The output is provided below

  NID Lact  Code
1   1   1   10
2   1   1   0
3   1   1   10
7   1   1   0

I would like to make this filter conditional on Code = 10 as I would like to delete the first instance of duplicate rows when code = 10 but not when code is not equal to 10

Is there a way to add a condition for Code == 10 to this filter?

CodePudding user response:

If need remove first duplicated row if condition Code == 10 chain it with DataFrame.duplicated with default keep='first' parameter and if need also filter all duplicates chain m2 with & for bitwise AND:

m1 = df['Code'].eq(10)
m2 = df.duplicated(['NID', 'Lact', 'Code'], keep=False)
m3 = df.duplicated(['NID', 'Lact', 'Code'])


df = df[(~m1 | m3) & m2]
print (df)
  NID  Lact  Code
2   1     1     0
3   1     1    10
7   1     1     0

CodePudding user response:

IIUC, you want to keep all rows if Code is not equal to 10 but drop duplicates otherwise, right? Then you could add that into the boolean mask:

out = df[~df.duplicated(['NID', 'Lact', 'Code'], keep=False) | df['Code'].ne(10)]

Output:

  NID  Lact  Code
2   1     1     0
4   1     2     0
5   2     2     0
6   2     2    10
7   1     1     0
  • Related