I have the following dataset:
df =
id test score date
1 A 1 2000-01-01
1 B 3 2000-01-01
1 A 1 2000-01-30
1 B 5 2000-01-30
2 A 2 2000-01-01
2 B 4 2000-01-01
3 A 6 2000-01-01
4 A 2 2000-01-01
I want to group_by id
and date
in order to get columns containing the type of the test for the same id
and date
:
df_outcome =
id testA testB date
1 1 3 2000-01-01
1 1 5 2000-01-30
2 2 4 2000-01-01
note that individuals 3 and 4 do not have test A and B in the same date and therefore are excluded from the analysis.
CodePudding user response:
(df.pivot(index=["id", "date"], columns="test", values="score")
.dropna()
.rename_axis(columns=None)
.add_prefix("test")
.reset_index()
.convert_dtypes())
- pivot over id & date with test variables in the column, scores as the content
- NaN will arise if a id-date pair doesn't have A or B score: so drop them
- move the "test" from the name of the columns to be the prefixes of the column names, i.e., testA, testB
- move id & date back to the columns side
- due to possible NaNs, things got floated, so convert datatypes for depromotion if possible, i.e., float to int here
to get
id date testA testB
0 1 2000-01-01 1 3
1 1 2000-01-30 1 5
2 2 2000-01-01 2 4