In python pandas, how to use rolling function to compare the elements?

Time:03-15

I want to use the pandas rolling function to check whether the first element of each window is smaller than the second one. I think the following code should work:

import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0,10,10), columns=['temperature'])
df.rolling(window=2).apply(lambda x: x[0] < x[1])

but it does not work. Instead, I got an error message:

ValueError: 0 is not in range

Does anyone know what caused the issue?

Update: I know I can use the diff function, but what I really want to do is something like this:

df.rolling(window=3).apply(lambda x: x[0] < x[1] < x[2])

CodePudding user response:

IIUC, what you want to achieve is determining whether the temperature is greater than the previous one?

You could use diff:

df['temperature'].diff().gt(0)

To check consecutive increases:

N = 3  # 3 consecutive values are increasing (= 2 increases)
df['increases2'] = df['temperature'].diff().gt(0).rolling(N-1).sum().eq(N-1)

example:

>>> df['increases'] = df['temperature'].diff().gt(0)
>>> df['increases2'] = df['temperature'].diff().gt(0).rolling(N-1).sum().eq(N-1)

   temperature  increases  increases2
0            7      False       False
1            7      False       False
2            9       True       False
3            1      False       False
4            7       True       False
5            0      False       False
6            6       True       False
7            9       True        True
8            9      False       False
9            7      False       False
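The diff-based approach can be reproduced with the fixed temperatures from the table above. This is only a minimal sketch: the helper name consecutive_increases is hypothetical and not part of pandas, and the data is typed in by hand rather than generated randomly.

```python
import pandas as pd

def consecutive_increases(series: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: True where the last n values are strictly increasing."""
    rises = series.diff().gt(0)                   # True where value > previous value
    return rises.rolling(n - 1).sum().eq(n - 1)   # n values increasing = n-1 rises in a row

# Same temperatures as in the example table above
temps = pd.Series([7, 7, 9, 1, 7, 0, 6, 9, 9, 7], name="temperature")
result = consecutive_increases(temps, 3)
print(result.tolist())
```

Only row 7 comes out True, because 0 < 6 < 9 is the only run of three increasing values in this data.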

CodePudding user response:

Replacing x[n] with x.iloc[n] (positional indexing) should work:

import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0,10,10), columns=['temperature'])
df['increasing'] = df.rolling(window=2).apply(lambda x: x.iloc[0] < x.iloc[1])

   temperature  increasing
0            8         NaN
1            9         1.0
2            0         0.0
3            3         1.0
4            8         1.0
5            7         0.0
6            7         0.0
7            8         1.0
8            7         0.0
9            6         0.0
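The same positional idea extends to the three-element comparison from the question's update. One possible variant, sketched here under the assumption that plain NumPy windows are acceptable, is to pass raw=True to apply: each window then arrives as a NumPy array, so w[0], w[1], w[2] are positions and no index labels are involved. The temperatures are the ones from the output above; the function name is made up for illustration.

```python
import pandas as pd

# Temperatures copied from the output above
temps = pd.Series([8, 9, 0, 3, 8, 7, 7, 8, 7, 6], name="temperature")

def three_increasing(w):
    # With raw=True, w is a NumPy array, so w[0], w[1], w[2] are positional.
    # apply expects a number back, so the boolean is converted to float.
    return float(w[0] < w[1] < w[2])

increasing3 = temps.rolling(window=3).apply(three_increasing, raw=True)
print(increasing3.tolist())
```

The first two entries are NaN because the window is not yet full; after that, 1.0 marks a window of three strictly increasing values.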

Why?

The value of 'x' in your lambda function looks something like this:

first iteration:

index temperature
0 8
1 9

second iteration:

index temperature
1 9
2 0

third iteration:

index temperature
2 0
3 3

The first iteration works because the index labels 0 and 1 are both present in the window (so x[0] < x[1] works fine). However, in the second iteration the label 0 is no longer present, so x[0] fails with your ValueError. My solution uses positional indexing (with .iloc), which ignores the index labels (see https://pandas.pydata.org/docs/user_guide/indexing.html).

This is also why your code works fine when the DataFrame has only two rows, e.g.:

df = pd.DataFrame(data=np.random.randint(0,10,2), columns=['temperature'])
df.rolling(window=2).apply(lambda x: x[0] < x[1])
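The difference between label and positional lookup can be checked outside of rolling. The sketch below builds by hand a Series that looks like the second window described above (values 9 and 0 with labels 1 and 2); it is an illustration, not what pandas constructs internally. It uses .loc to make the label lookup explicit, since x[0] on an integer-indexed Series is likewise treated as a label lookup.

```python
import pandas as pd

# Hand-built stand-in for the second window: values 9 and 0 keep labels 1 and 2
window = pd.Series([9.0, 0.0], index=[1, 2])

first_by_position = window.iloc[0]   # position 0 always exists in a non-empty window
first_by_label = window.loc[1]       # label 1 happens to be the first element here

try:
    window.loc[0]                    # label 0 is not in this window
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(first_by_position, first_by_label, label_lookup_failed)
```

Position 0 is valid in every window, whereas label 0 exists only in the very first one, which is why the two-row DataFrame happens to work.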