I have a dataframe that looks like this:
df = pd.DataFrame({'patient_id': ['p1', 'p2', 'p3', 'p1', 'p2', 'p3'],
'treatment_time': ['pre', 'pre', 'pre', 'post', 'post', 'post'],
'val1': [1, 4, 9, 2, 6, 10],
'val2': [3, 5, 11, 1, 4, 9],
'val3': [2, 4, 6, 3, 5, 7],
})
print(df)
patient_id treatment_time val1 val2 val3
0 p1 pre 1 3 2
1 p2 pre 4 5 4
2 p3 pre 9 11 6
3 p1 post 2 1 3
4 p2 post 6 4 5
5 p3 post 10 9 7
My goal is to group data by patient_id
and treatment_time
and to calculate the difference between pre-treatment and post-treatment values. The expected output should look like this:
patient_id val1 val2 val3
0 p1 1.0 -2.0 1.0
1 p2 2.0 -1.0 1.0
2 p3 1.0 -2.0 1.0
CodePudding user response:
Use DataFrame.pivot
with subtract in MultiIndex DataFrame
selected by DataFrame.xs
:
df1 = df.pivot('patient_id','treatment_time')
print (df1)
val1 val2 val3
treatment_time post pre post pre post pre
patient_id
p1 2 1 1 3 3 2
p2 6 4 4 5 5 4
p3 10 9 9 11 7 6
df = df1.xs('post', axis=1, level=1).sub(df1.xs('pre', axis=1, level=1))
print (df)
val1 val2 val3
patient_id
p1 1 -2 1
p2 2 -1 1
p3 1 -2 1
CodePudding user response:
If you always have only one 'pre' and 'post' per patient_id and the rows are sorted, a simple trick would be:
cols = df.filter(like='val').columns
df.groupby('patient_id')[cols].diff().dropna(how='all')
output:
val1 val2 val3
3 1.0 -2.0 1.0
4 2.0 -1.0 1.0
5 1.0 -2.0 1.0