search for common value(patient ID) from column 1 and if all there values in other column (pathologi-CodePudding

PATIENT_ID	PATHOLOGIES
12	null
12	null
3	patho1
3	null
5	patho2
2	patho1
12	null

If you can see, patient ID 12 is always null but others can be null or has pathologies if the same ID is always null, I want to delete it with the related rows in all columns note: I have 2 million ID, so I want a code to search for the ID's (Python, CSV)

CodePudding user response：

To remove all patients with only "null" variables you can use this example:

import csv
from itertools import groupby


with open("input.csv", "r") as f_in:
    reader = csv.reader(f_in)
    next(reader)  # skip header

    out = []
    for id_, g in groupby(sorted(reader), lambda k: k[0]):
        g = list(g)
        if all(pathology == "null" for _, pathology in g):
            continue
        out.extend(g)

with open("output.csv", "w") as f_out:
    writer = csv.writer(f_out)

    writer.writerow(["PATIENT_ID", "PATHOLOGIES"])
    writer.writerows(out)

This creates output.csv:

PATIENT_ID	PATHOLOGIES
2	patho1
3	null
3	patho1
5	patho2