I want to use the pandas rolling function to check whether the first element of each window is smaller than the second one. I thought the following code would work:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0,10,10), columns=['temperature'])
df.rolling(window=2).apply(lambda x: x[0] < x[1])
but it does not work. Instead, I got an error message:
ValueError: 0 is not in range
Does anyone know what caused the issue?
Update:
I know I can use the diff function, but what I really want to do is something like this:
df.rolling(window=3).apply(lambda x: x[0] < x[1] < x[2])
CodePudding user response:
IIUC, you want to determine whether each temperature is greater than the previous one?
You could use diff:
df['temperature'].diff().gt(0)
To check consecutive increases:
N = 3 # 3 consecutive values are increasing (= 2 increases)
df['increases2'] = df['temperature'].diff().gt(0).rolling(N-1).sum().eq(N-1)
example:
>>> df['increases'] = df['temperature'].diff().gt(0)
>>> df['increases2'] = df['temperature'].diff().gt(0).rolling(N-1).sum().eq(N-1)
   temperature  increases  increases2
0            7      False       False
1            7      False       False
2            9       True       False
3            1      False       False
4            7       True       False
5            0      False       False
6            6       True       False
7            9       True        True
8            9      False       False
9            7      False       False
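Since the question uses random data, here is a small reproducible sketch of the same diff-based check, using the temperature values from the table above. The helper name consecutive_increases is only an illustration, not a pandas function.

```python
import pandas as pd

def consecutive_increases(series: pd.Series, n: int) -> pd.Series:
    """True where the last n values are strictly increasing (n - 1 rises in a row)."""
    rises = series.diff().gt(0)                   # True where a value exceeds the previous one
    return rises.rolling(n - 1).sum().eq(n - 1)   # all n - 1 most recent steps were rises

# Fixed values copied from the example table, so the result is reproducible
df = pd.DataFrame({"temperature": [7, 7, 9, 1, 7, 0, 6, 9, 9, 7]})
df["increases2"] = consecutive_increases(df["temperature"], 3)
print(df)
```

Only row 7 comes out True, because 6 < 9 < 9 fails at row 8 and every other window contains a drop or a tie.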
CodePudding user response:
Replacing x[n] with x.iloc[n] (positional indexing) should work:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.randint(0,10,10), columns=['temperature'])
df['increasing'] = df.rolling(window=2).apply(lambda x: x.iloc[0] < x.iloc[1])
   temperature  increasing
0            8         NaN
1            9         1.0
2            0         0.0
3            3         1.0
4            8         1.0
5            7         0.0
6            7         0.0
7            8         1.0
8            7         0.0
9            6         0.0
Why?
The value of x in your lambda function looks something like this:
first iteration:
index temperature
0 8
1 9
second iteration:
index temperature
1 9
2 0
third iteration:
index temperature
2 0
3 3
The first iteration works because the index labels 0 and 1 are both present (so x[0] < x[1] works fine). However, in the second iteration the label 0 is no longer present, so x[0] fails with your ValueError. My solution uses positional indexing (with .iloc), which ignores the index labels (see https://pandas.pydata.org/docs/user_guide/indexing.html).
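To see the difference outside of rolling, here is a minimal sketch that builds the second window by hand. The variable names are just for illustration, and .loc is used to make the label lookup explicit.

```python
import pandas as pd

s = pd.Series([8, 9, 0, 3], name="temperature")

# Slice out what the second window looks like: it keeps the original labels 1 and 2
window = s.iloc[1:3]
window_labels = list(window.index)

# Positional access ignores the labels and returns the first element of the window
first = window.iloc[0]

# Label access looks for the label 0, which this window does not contain
try:
    window.loc[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(window_labels, first, label_lookup_failed)
```

The window keeps the labels of the original frame, so any label-based lookup for 0 only succeeds in the very first window.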
This is also why your code works fine with only two rows, e.g.:
df = pd.DataFrame(data=np.random.randint(0,10,2), columns=['temperature'])
df.rolling(window=2).apply(lambda x: x[0] < x[1])
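As an alternative to .iloc, you could pass raw=True to apply, which hands each window to the function as a NumPy array, so x[0] is positional again. This is a sketch for the three-element comparison from the question's update, with fixed sample values I made up so the result is reproducible. The float() wrapper is just my choice to make the returned value an explicit number.

```python
import pandas as pd

# Made-up sample values (not from the question) so the output is deterministic
df = pd.DataFrame({"temperature": [1, 2, 3, 2, 5, 6]})

# raw=True: each window arrives as a NumPy array, so x[0], x[1], x[2] are positions
df["increasing3"] = (
    df["temperature"]
    .rolling(window=3)
    .apply(lambda x: float(x[0] < x[1] < x[2]), raw=True)
)
print(df)
```

The first two rows are NaN because the window is not yet full. After that, 1.0 marks windows whose three values are strictly increasing.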