Home > OS >  Pandas: rolling difference between rows based on alternating value changes in the other column
Pandas: rolling difference between rows based on alternating value changes in the other column

Time:05-10

I have a dataframe:

df = pd.DataFrame([['ann', 23.3, 0], ['bob', 36.5, 0], ['don', 29.3, 1], ['jul', 45.8, 0], ['ken', 36.2, -1], ['nic', 38.9, 1], ['pal', 16.7, 0], ['qiu', 32.5, -1], ['sun', 33.9, 0], ['tom', 28.5, 1]], columns = ['name', 'score', 'grade'])

df
  name  score  grade
0  ann   23.3      0
1  bob   36.5      0
2  don   29.3      1
3  jul   45.8      0
4  ken   36.2     -1
5  nic   38.9      1
6  pal   16.7      0
7  qiu   32.5     -1
8  sun   33.9      0
9  tom   28.5      1

that I need to calculate the rolling (consecutive) difference between score where grade moves from 1 to -1 or from -1 to 1 then add them in two new columns:

df
  name  score  grade  down    up
0  ann   23.3      0
1  bob   36.5      0
2  don   29.3      1            ↰
3  jul   45.8      0            ⎥
4  ken   36.2     -1   6.9      ↲↰
5  nic   38.9      1         2.7 ↲↰
6  pal   16.7      0              ⎥
7  qiu   32.5     -1  -6.4        ↲↰
8  sun   33.9      0               ⎥
9  tom   28.5      1        -4.0   ↲

that is, column down has score of [row 4 - row 2] as grade is from 1 to -1, column up has score of [row 5 - row 4] as grade is from -1 to 1, and so on.

Is there a pandas way to get the desired results without using for loop?

Note: 1 & -1s in grade always alternate.

CodePudding user response:

First, we build an intermediate DataFrame that have nonzero grades. Since 1s and -1s always alternate, it suffices to analyze the difference between consecutive grade values.

Again, since 1s and -1s alternate, the difference between consecutive scores can be either -2 or 2, so depending on what it is, we can identify whether it is an up row or a down row.

tmp = df.loc[df['grade'].ne(0), ['score','grade']].diff()
down_idx = tmp['grade'].lt(0).loc[lambda x: x].index
df.loc[down_idx, 'down'] = tmp.loc[down_idx, 'score']
up_idx = tmp['grade'].gt(0).loc[lambda x: x].index
df.loc[up_idx, 'up'] = tmp.loc[up_idx, 'score']

Output:

  name  score  grade  down   up
0  ann   23.3      0   NaN  NaN
1  bob   36.5      0   NaN  NaN
2  don   29.3      1   NaN  NaN
3  jul   45.8      0   NaN  NaN
4  ken   36.2     -1   6.9  NaN
5  nic   38.9      1   NaN  2.7
6  pal   16.7      0   NaN  NaN
7  qiu   32.5     -1  -6.4  NaN
8  sun   33.9      0   NaN  NaN
9  tom   28.5      1   NaN -4.0
  • Related