I am looking to calculate the changed in mental health scores of each individual between two timepoints.
Each user has a name, and a mental health score from 3 different timepoints. I would like to calculate the change in mental health score between timepoint 3 and 1
Below is example of df I'm starting with:
User Timepoint Mental Health Score
Bill 1 5
Bill 2 10
Bill 3 15
Wiz 1 10
Wiz 2 10
Wiz 3 15
Sam 1 5
Sam 2 5
Sam 3 5
This is desired output:
User Timepoint Mental Health Score Change in Mental Health (TP1 and 3)
Bill 1 5
Bill 2 10
Bill 3 15 10
Wiz 1 10
Wiz 2 10
Wiz 3 15 5
Sam 1 5
Sam 2 5
Sam 3 5 0
Does anyone know how to do this?
CodePudding user response:
You can accomplish this using shift()
and np.where()
df['Change in Mental Health (TP1 and 3)'] = df['Mental Health Score'] - df['Mental Health Score'].shift(2)
df['Change in Mental Health (TP1 and 3)'] = np.where(df['Timepoint'] != 3, 0, df['Change in Mental Health (TP1 and 3)']).astype(int)
df
CodePudding user response:
Try with groupby
and where
:
#sort by Timepoint if needed
#df = df.sort_values("Timepoint")
changes = df.groupby("User")["Mental Health Score"].transform('last')-df.groupby("User")["Mental Health Score"].transform('first')
df["Change"] = changes.where(df["Timepoint"].eq(3))
>>> df
User Timepoint Mental Health Score Change
0 Bill 1 5 NaN
1 Bill 2 10 NaN
2 Bill 3 15 10.0
3 Wiz 1 10 NaN
4 Wiz 2 10 NaN
5 Wiz 3 15 5.0
6 Sam 1 5 NaN
7 Sam 2 5 NaN
8 Sam 3 5 0.0
CodePudding user response:
As it is stated already in the comments you can groupby
your dataframe on User
and calculate difference on Mental Health Score
I put a snippet code here to demonstrate
def _overall_change(scores):
return scores.iloc[-1] - scores.iloc[0]
person = df.groupby('User')['Score'].agg(_overall_change)
CodePudding user response:
Using groupby
and a merge
:
g = df.sort_values(by='Timepoint').groupby('User')['Mental Health Score']
s = pd.concat({3: g.last()-g.first()})
# User
# 3 Bill 10
# Sam 0
# Wiz 5
# Name: Mental Health Score, dtype: int64
df.merge(s, left_on=['Timepoint', 'User'], right_index=True, how='left')
output:
User Timepoint Mental Health Score_x Mental Health Score_y
0 Bill 1 5 NaN
1 Bill 2 10 NaN
2 Bill 3 15 10.0
3 Wiz 1 10 NaN
4 Wiz 2 10 NaN
5 Wiz 3 15 5.0
6 Sam 1 5 NaN
7 Sam 2 5 NaN
8 Sam 3 5 0.0
CodePudding user response:
Here's another possible solution:
import pandas as pd
def calculate_change(mhs):
mhs = list(mhs)
return mhs[-1] - mhs[0]
df = df.sort_values(["User", "Timepoint"])
diff = df.groupby('User')['Mental Health Score'].agg(calculate_change)
df = pd.merge(df, diff, how='left', left_on='User', right_index=True)
df.columns = ['User', 'Timepoint', 'Mental Health Score', 'Change']
df['Change'] = df['Change'].loc[df['Timepoint']==3]
print(df)
Output
User Timepoint Mental Health Score Change
0 Bill 1 5 NaN
1 Bill 2 10 NaN
2 Bill 3 15 10.0
3 Wiz 1 10 NaN
4 Wiz 2 10 NaN
5 Wiz 3 15 5.0
6 Sam 1 5 NaN
7 Sam 2 5 NaN
8 Sam 3 5 0.0