I have a table with diagnoses icd_code
and patients subject_id
as seen in the screenshot below.
One patient can have several diagnoses. I would like to arrange the table such that subject_id
is the index and include a list of all diagnoses of that patient horizontally in separate columns. Is there a way I can do that with pandas? Also, how can I count how many diagnoses each patient has?
Thanks and best regards
CodePudding user response:
Try with groupby
and pivot
:
df["idx"] = df.groupby("subject_id").cumcount() 1
output = df.pivot("subject_id", "idx", "icd_code")
output["Count"] = output.count(axis=1)
>>> output
idx 1 2 3 4 5 Count
subject_id
11442057 65971 64231 V270 64511 NaN 4
15734973 2825 V0251 V270 64891 66481 5