I have a dataframe:
df = pd.DataFrame([['ann', 23.3, 0], ['bob', 36.5, 0], ['don', 29.3, 1], ['jul', 45.8, 0], ['ken', 36.2, -1], ['nic', 38.9, 1], ['pal', 16.7, 0], ['qiu', 32.5, -1], ['sun', 33.9, 0], ['tom', 28.5, 1]], columns = ['name', 'score', 'grade'])
df
name score grade
0 ann 23.3 0
1 bob 36.5 0
2 don 29.3 1
3 jul 45.8 0
4 ken 36.2 -1
5 nic 38.9 1
6 pal 16.7 0
7 qiu 32.5 -1
8 sun 33.9 0
9 tom 28.5 1
that I need to calculate the rolling (consecutive) difference between score
where grade
moves from 1 to -1
or from -1 to 1
then add them in two new columns:
df
name score grade down up
0 ann 23.3 0
1 bob 36.5 0
2 don 29.3 1 ↰
3 jul 45.8 0 ⎥
4 ken 36.2 -1 6.9 ↲↰
5 nic 38.9 1 2.7 ↲↰
6 pal 16.7 0 ⎥
7 qiu 32.5 -1 -6.4 ↲↰
8 sun 33.9 0 ⎥
9 tom 28.5 1 -4.0 ↲
that is, column down
has score
of [row 4 - row 2] as grade
is from 1 to -1, column up
has score
of [row 5 - row 4] as grade
is from -1 to 1, and so on.
Is there a pandas way to get the desired results without using for loop?
Note: 1 & -1s in grade
always alternate.
CodePudding user response:
First, we build an intermediate DataFrame that have nonzero grades. Since 1s and -1s always alternate, it suffices to analyze the difference between consecutive grade
values.
Again, since 1s and -1s alternate, the difference between consecutive scores
can be either -2 or 2, so depending on what it is, we can identify whether it is an up
row or a down
row.
tmp = df.loc[df['grade'].ne(0), ['score','grade']].diff()
down_idx = tmp['grade'].lt(0).loc[lambda x: x].index
df.loc[down_idx, 'down'] = tmp.loc[down_idx, 'score']
up_idx = tmp['grade'].gt(0).loc[lambda x: x].index
df.loc[up_idx, 'up'] = tmp.loc[up_idx, 'score']
Output:
name score grade down up
0 ann 23.3 0 NaN NaN
1 bob 36.5 0 NaN NaN
2 don 29.3 1 NaN NaN
3 jul 45.8 0 NaN NaN
4 ken 36.2 -1 6.9 NaN
5 nic 38.9 1 NaN 2.7
6 pal 16.7 0 NaN NaN
7 qiu 32.5 -1 -6.4 NaN
8 sun 33.9 0 NaN NaN
9 tom 28.5 1 NaN -4.0