I tried to find the Desired_Output column, which is defined as follow: For each Name and Subj group, find the Min of all previous Score.
Name Date Subj Score Desired_Output
A 2022-05-11 1200 70.88 69.60
A 2022-03-20 1200 69.96 69.60
A 2022-02-23 1200 69.60 69.63
A 2022-01-26 1200 69.63 70.22
A 2022-01-05 1200 70.35 70.22
A 2021-12-08 1200 70.22 70.69
A 2021-11-17 1000 56.73 null
A 2021-11-10 1200 70.69 null
B 2022-05-07 1600 96.16 96.53
B 2022-04-24 1600 94.53 null
B 2022-03-20 2000 124.60 null
B 2022-02-27 1800 109.16 null
B 2022-02-03 1400 82.54 null
Here is the dataset:
pd.DataFrame({
'Name': ['A','A','A','A','A','A','A','A','B','B','B','B','B'],
'Date': ['2022-05-11','2022-03-20','2022-02-23','2022-01-26','2022-01-05','2021-12-08','2021-11-17','2021-11-10','2022-05-07','2022-04-24','2022-03-20','2022-02-27','2022-02-03'],
'Subj': [1200,1200,1200,1200,1200,1200,1000,1200,1600,1600,2000,1800,1400],
'Score': [70.88,69.96,69.6,69.63,70.35,70.22,56.73,70.69,96.16,94.53,124.6,109.16,82.54]})
I don't know how to achieve that in Pandas, especially without looping the DataFrame.
CodePudding user response:
Assuming the dates are sorted in reverse order, you can use a reversed cummin
shift
per group:
df['Desired'] = (df[::-1]
.groupby(['Name', 'Subj'])['Score']
.apply(lambda s: s.cummin().shift())
)
Output:
Name Date Subj Score Desired
0 A 2022-05-11 1200 70.88 69.60
1 A 2022-03-20 1200 69.96 69.60
2 A 2022-02-23 1200 69.60 69.63
3 A 2022-01-26 1200 69.63 70.22
4 A 2022-01-05 1200 70.35 70.22
5 A 2021-12-08 1200 70.22 70.69
6 A 2021-11-17 1000 56.73 NaN
7 A 2021-11-10 1200 70.69 NaN
8 B 2022-05-07 1600 96.16 94.53
9 B 2022-04-24 1600 94.53 NaN
10 B 2022-03-20 2000 124.60 NaN
11 B 2022-02-27 1800 109.16 NaN
12 B 2022-02-03 1400 82.54 NaN