I have a DataFrame called df that looks like this:
Provider | fid | pid | datetime | score | system |
---|---|---|---|---|---|
CHE-229 | 2bfc9a62 | 2f43d557 | 2021-09-26 | -3.0 | SOFA |
CHE-229 | 78d5d845 | 88c59d92 | 2021-09-12 | -4.0 | SAPS |
I would like to create one new column for each distinct value in system, holding the matching score. For example, I want a column called SOFA and another called SAPS, each containing the score for the rows that belong to that system.
The output I want is:
Provider | fid | pid | datetime | SOFA | SAPS |
---|---|---|---|---|---|
CHE-229 | 2bfc9a62 | 2f43d557 | 2021-09-26 | -3.0 | |
CHE-229 | 78d5d845 | 88c59d92 | 2021-09-12 | | -4.0 |
CodePudding user response:
You can do this by looping over the distinct values of system and using numpy.where() to pick the score where the condition holds, then dropping the original columns:
import numpy as np

# One new column per distinct system; rows of other systems get None
for system_name in df["system"].unique():
    df[system_name] = np.where(df["system"] == system_name, df["score"], None)
df = df.drop(columns=["system", "score"])
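One detail of this approach is the fill value passed to np.where(): it decides the dtype of the new columns. The sketch below rebuilds the two sample rows from the question (the variable names as_object and as_float are made up for illustration) and compares None with np.nan as the fill value.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Provider": ["CHE-229", "CHE-229"],
    "fid": ["2bfc9a62", "78d5d845"],
    "pid": ["2f43d557", "88c59d92"],
    "datetime": ["2021-09-26", "2021-09-12"],
    "score": [-3.0, -4.0],
    "system": ["SOFA", "SAPS"],
})

is_sofa = df["system"] == "SOFA"

# Filling with None forces an object array (floats mixed with None)
as_object = np.where(is_sofa, df["score"], None)

# Filling with np.nan keeps a float array, so numeric operations still work
as_float = np.where(is_sofa, df["score"], np.nan)

print(as_object.dtype)
print(as_float.dtype)
```

If the new columns will be summed, averaged, or plotted later, np.nan is probably the more convenient fill value.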
CodePudding user response:
df.pivot(index=list(df.columns[:-2]), columns='system', values='score').fillna('').reset_index()
system Provider       fid       pid    datetime  SAPS  SOFA
0       CHE-229  2bfc9a62  2f43d557  2021-09-26        -3.0
1       CHE-229  78d5d845  88c59d92  2021-09-12  -4.0
If you want the new columns to stay numeric, drop the fillna('') call so the gaps remain NaN, or fill them with 0 instead, i.e. .fillna(0).
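As a rough end-to-end sketch of the numeric variant, the snippet below rebuilds the two sample rows from the question and pivots without fillna(''). The names id_cols and wide are illustrative, and the explicit column list assumes the four identifier columns shown in the question. Passing a list as index requires pandas 1.1 or later.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Provider": ["CHE-229", "CHE-229"],
    "fid": ["2bfc9a62", "78d5d845"],
    "pid": ["2f43d557", "88c59d92"],
    "datetime": ["2021-09-26", "2021-09-12"],
    "score": [-3.0, -4.0],
    "system": ["SOFA", "SAPS"],
})

# Columns that identify a row and should be kept as-is
id_cols = ["Provider", "fid", "pid", "datetime"]

wide = (
    df.pivot(index=id_cols, columns="system", values="score")
      .reset_index()                # turn the identifier index back into columns
      .rename_axis(columns=None)    # drop the leftover "system" label on the column axis
)

print(wide)
print(wide.dtypes)
```

Here SAPS and SOFA remain float columns with NaN where a row has no score for that system. Note that pivot raises an error if the same combination of identifier columns and system appears more than once; in that case pivot_table with an aggregation function may be a better fit.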