Python CSV check fields with same id in column if value is null in another column-CodePudding

I want to check all fields with a specific ID and delete the rows if their values are null in another column.

for example in the below table, I'm checking the Id (84) fields in patientId column, if all fields are null in pathology column, then i should delete this rows.

Thanks!

CodePudding user response：

df = pd.DataFrame({
    'id': [1,1,2,2],
    'value': [1,None,2,None]
})
df = df[~((df['id'] == 1) & df['value'].isna())]
print(df.to_markdown(index=False))
|   id |   value |
|-----:|--------:|
|    1 |       1 |
|    2 |       2 |
|    2 |     nan |

CodePudding user response：

I find duplicate strings where there is null. I get labels by comparing the main data frame([False False True False False True False False True]). Apply masking using ~, that is, I show rows with opposite indexes.

import pandas as pd

df = pd.DataFrame({'patientld': [89, 84, 84, 9, 9, 84, 5, 3, 84],
                   'pathology': ['null', 'null', 'null', 'yes', 'null', 'null', 'yes', 'yes', 'null']})

a = df[(df.duplicated()) & (df['pathology'].isin(['null']))]
index = df.index.isin(a.index)
print(df[~index])

Output

   patientld pathology
0         89      null
1         84      null
3          9       yes
4          9      null
6          5       yes
7          3       yes