I have the following dataframe (sample):
import pandas as pd
data = [['A', '2022-09-01', 2], ['A', '2022-09-02', 1], ['A', '2022-09-03', 1.5], ['A', '2022-09-04', 1.5],
['A', '2022-09-05', 1], ['A', '2022-09-06', 1], ['A', '2022-09-07', 0.5], ['A', '2022-09-08', 1],
['B', '2022-09-01', 3], ['B', '2022-09-02', 2], ['B', '2022-09-03', 1], ['B', '2022-09-04', 2],
['B', '2022-09-05', 1], ['B', '2022-09-06', 0.5], ['B', '2022-09-07', 1.2], ['B', '2022-09-08', 1.2]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'value'])
group date value
0 A 2022-09-01 2.0
1 A 2022-09-02 1.0
2 A 2022-09-03 1.5
3 A 2022-09-04 1.5
4 A 2022-09-05 1.0
5 A 2022-09-06 1.0
6 A 2022-09-07 0.5
7 A 2022-09-08 1.0
8 B 2022-09-01 3.0
9 B 2022-09-02 2.0
10 B 2022-09-03 1.0
11 B 2022-09-04 2.0
12 B 2022-09-05 1.0
13 B 2022-09-06 0.5
14 B 2022-09-07 1.2
15 B 2022-09-08 1.2
I would like to check per "value" if the values before and after are both higher per value. If the before and after values are both higher, it should return True otherwise False in the column called "class". Here is the desired output:
data = [['A', '2022-09-01', 2, False], ['A', '2022-09-02', 1, True], ['A', '2022-09-03', 1.5, False], ['A', '2022-09-04', 1.5, False],
['A', '2022-09-05', 1, False], ['A', '2022-09-06', 1, False], ['A', '2022-09-07', 0.5, True], ['A', '2022-09-08', 1, False],
['B', '2022-09-01', 3, False], ['B', '2022-09-02', 2, False], ['B', '2022-09-03', 1, True], ['B', '2022-09-04', 2, False],
['B', '2022-09-05', 1, False], ['B', '2022-09-06', 0.5, True], ['B', '2022-09-07', 1.2, False], ['B', '2022-09-08', 1.2, False]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'value', 'class'])
group date value class
0 A 2022-09-01 2.0 False
1 A 2022-09-02 1.0 True
2 A 2022-09-03 1.5 False
3 A 2022-09-04 1.5 False
4 A 2022-09-05 1.0 False
5 A 2022-09-06 1.0 False
6 A 2022-09-07 0.5 True
7 A 2022-09-08 1.0 False
8 B 2022-09-01 3.0 False
9 B 2022-09-02 2.0 False
10 B 2022-09-03 1.0 True
11 B 2022-09-04 2.0 False
12 B 2022-09-05 1.0 False
13 B 2022-09-06 0.5 True
14 B 2022-09-07 1.2 False
15 B 2022-09-08 1.2 False
Some explanations:
- For example row 1: has a before value of 2 and after value of 1.5 which means that they are both higher than 1, so class = True.
- For example row 3 has a before value of 1.5 and after value of 1 which means that not both values are higher, so class = False.
- For example row 9: has a before value of 3 and after value of 1 which means that not both values are higher, so class = False.
So I was wondering if anyone knows how to check per value if the value before and after are both higher or not using pandas
?
CodePudding user response:
Use and
condition on comparing with previous and next row
df['class'] = df['value'].lt(df['value'].shift()) & df['value'].lt(df['value'].shift(-1))
# or
df['class'] = (df['value'] < df['value'].shift()) & (df['value'] < df['value'].shift(-1))
print(df)
group date value class
0 A 2022-09-01 2.0 False
1 A 2022-09-02 1.0 True
2 A 2022-09-03 1.5 False
3 A 2022-09-04 1.5 False
4 A 2022-09-05 1.0 False
5 A 2022-09-06 1.0 False
6 A 2022-09-07 0.5 True
7 A 2022-09-08 1.0 False
8 B 2022-09-01 3.0 False
9 B 2022-09-02 2.0 False
10 B 2022-09-03 1.0 True
11 B 2022-09-04 2.0 False
12 B 2022-09-05 1.0 False
13 B 2022-09-06 0.5 True
14 B 2022-09-07 1.2 False
15 B 2022-09-08 1.2 False
If you want to check within each group
, you can use groupby.transform
df['class'] = (df.groupby('group')['value']
.transform(lambda col: col.lt(col.shift()) & col.lt(col.shift(-1))))
print(df)
group date value class
0 A 2022-09-01 2.0 False
1 A 2022-09-02 1.0 True
2 A 2022-09-03 1.5 False
3 A 2022-09-04 1.5 False
4 A 2022-09-05 1.0 False
5 A 2022-09-06 1.0 False
6 A 2022-09-07 0.5 True
7 A 2022-09-08 1.0 False
8 B 2022-09-01 3.0 False
9 B 2022-09-02 2.0 False
10 B 2022-09-03 1.0 True
11 B 2022-09-04 2.0 False
12 B 2022-09-05 1.0 False
13 B 2022-09-06 0.5 True
14 B 2022-09-07 1.2 False
15 B 2022-09-08 1.2 False