Home > Software design >  Calculate difference between grouped elements in pandas
Calculate difference between grouped elements in pandas

Time:11-23

I have a dataframe that looks like this:

df = pd.DataFrame({'patient_id': ['p1', 'p2', 'p3', 'p1', 'p2', 'p3'],
              'treatment_time': ['pre', 'pre', 'pre', 'post', 'post', 'post'],
              'val1': [1, 4, 9, 2, 6, 10],
              'val2': [3, 5, 11, 1, 4, 9],
              'val3': [2, 4, 6, 3, 5, 7],
              })
print(df)

  patient_id treatment_time  val1  val2  val3
0         p1            pre     1     3     2
1         p2            pre     4     5     4
2         p3            pre     9    11     6
3         p1           post     2     1     3
4         p2           post     6     4     5
5         p3           post    10     9     7

My goal is to group data by patient_id and treatment_time and to calculate the difference between pre-treatment and post-treatment values. The expected output should look like this:

  patient_id  val1  val2  val3
0         p1   1.0  -2.0   1.0
1         p2   2.0  -1.0   1.0
2         p3   1.0  -2.0   1.0

CodePudding user response:

Use DataFrame.pivot with subtract in MultiIndex DataFrame selected by DataFrame.xs:

df1 = df.pivot('patient_id','treatment_time')
print (df1)
               val1     val2     val3    
treatment_time post pre post pre post pre
patient_id                               
p1                2   1    1   3    3   2
p2                6   4    4   5    5   4
p3               10   9    9  11    7   6

df = df1.xs('post', axis=1, level=1).sub(df1.xs('pre', axis=1, level=1))
print (df)
            val1  val2  val3
patient_id                  
p1             1    -2     1
p2             2    -1     1
p3             1    -2     1

CodePudding user response:

If you always have only one 'pre' and 'post' per patient_id and the rows are sorted, a simple trick would be:

cols = df.filter(like='val').columns
df.groupby('patient_id')[cols].diff().dropna(how='all')

output:

   val1  val2  val3
3   1.0  -2.0   1.0
4   2.0  -1.0   1.0
5   1.0  -2.0   1.0
  • Related