Home > database >  Pandas: reshaping data pandas
Pandas: reshaping data pandas

Time:01-11

I have the following dataset:

df =
id test score date
1  A    1     2000-01-01
1  B    3     2000-01-01
1  A    1     2000-01-30
1  B    5     2000-01-30
2  A    2     2000-01-01
2  B    4     2000-01-01
3  A    6     2000-01-01
4  A    2     2000-01-01

I want to group_by id and date in order to get columns containing the type of the test for the same id and date:

df_outcome =
id testA testB date
1  1     3     2000-01-01
1  1     5     2000-01-30
2  2     4     2000-01-01

note that individuals 3 and 4 do not have test A and B in the same date and therefore are excluded from the analysis.

CodePudding user response:

(df.pivot(index=["id", "date"], columns="test", values="score")
   .dropna()
   .rename_axis(columns=None)
   .add_prefix("test")
   .reset_index()
   .convert_dtypes())
  • pivot over id & date with test variables in the column, scores as the content
  • NaN will arise if a id-date pair doesn't have A or B score: so drop them
  • move the "test" from the name of the columns to be the prefixes of the column names, i.e., testA, testB
  • move id & date back to the columns side
  • due to possible NaNs, things got floated, so convert datatypes for depromotion if possible, i.e., float to int here

to get

   id        date  testA  testB
0   1  2000-01-01      1      3
1   1  2000-01-30      1      5
2   2  2000-01-01      2      4
  • Related