Home > Back-end >  Check if both before and after values are higher per value per group
Check if both before and after values are higher per value per group

Time:10-28

I have the following dataframe (sample):

import pandas as pd

data = [['A', '2022-09-01', 2], ['A', '2022-09-02', 1], ['A', '2022-09-03', 1.5], ['A', '2022-09-04', 1.5],
        ['A', '2022-09-05', 1], ['A', '2022-09-06', 1], ['A', '2022-09-07', 0.5], ['A', '2022-09-08', 1],
        ['B', '2022-09-01', 3], ['B', '2022-09-02', 2], ['B', '2022-09-03', 1], ['B', '2022-09-04', 2],
        ['B', '2022-09-05', 1], ['B', '2022-09-06', 0.5], ['B', '2022-09-07', 1.2], ['B', '2022-09-08', 1.2]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'value'])

   group        date  value
0      A  2022-09-01    2.0
1      A  2022-09-02    1.0
2      A  2022-09-03    1.5
3      A  2022-09-04    1.5
4      A  2022-09-05    1.0
5      A  2022-09-06    1.0
6      A  2022-09-07    0.5
7      A  2022-09-08    1.0
8      B  2022-09-01    3.0
9      B  2022-09-02    2.0
10     B  2022-09-03    1.0
11     B  2022-09-04    2.0
12     B  2022-09-05    1.0
13     B  2022-09-06    0.5
14     B  2022-09-07    1.2
15     B  2022-09-08    1.2

I would like to check per "value" if the values before and after are both higher per value. If the before and after values are both higher, it should return True otherwise False in the column called "class". Here is the desired output:

data = [['A', '2022-09-01', 2, False], ['A', '2022-09-02', 1, True], ['A', '2022-09-03', 1.5, False], ['A', '2022-09-04', 1.5, False],
        ['A', '2022-09-05', 1, False], ['A', '2022-09-06', 1, False], ['A', '2022-09-07', 0.5, True], ['A', '2022-09-08', 1, False],
        ['B', '2022-09-01', 3, False], ['B', '2022-09-02', 2, False], ['B', '2022-09-03', 1, True], ['B', '2022-09-04', 2, False],
        ['B', '2022-09-05', 1, False], ['B', '2022-09-06', 0.5, True], ['B', '2022-09-07', 1.2, False], ['B', '2022-09-08', 1.2, False]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'value', 'class'])

   group        date  value  class
0      A  2022-09-01    2.0  False
1      A  2022-09-02    1.0   True
2      A  2022-09-03    1.5  False
3      A  2022-09-04    1.5  False
4      A  2022-09-05    1.0  False
5      A  2022-09-06    1.0  False
6      A  2022-09-07    0.5   True
7      A  2022-09-08    1.0  False
8      B  2022-09-01    3.0  False
9      B  2022-09-02    2.0  False
10     B  2022-09-03    1.0   True
11     B  2022-09-04    2.0  False
12     B  2022-09-05    1.0  False
13     B  2022-09-06    0.5   True
14     B  2022-09-07    1.2  False
15     B  2022-09-08    1.2  False

Some explanations:

  • For example row 1: has a before value of 2 and after value of 1.5 which means that they are both higher than 1, so class = True.
  • For example row 3 has a before value of 1.5 and after value of 1 which means that not both values are higher, so class = False.
  • For example row 9: has a before value of 3 and after value of 1 which means that not both values are higher, so class = False.

So I was wondering if anyone knows how to check per value if the value before and after are both higher or not using pandas?

CodePudding user response:

Use and condition on comparing with previous and next row

df['class'] = df['value'].lt(df['value'].shift()) & df['value'].lt(df['value'].shift(-1))
# or
df['class'] = (df['value'] < df['value'].shift()) & (df['value'] < df['value'].shift(-1))
print(df)

   group        date  value  class
0      A  2022-09-01    2.0  False
1      A  2022-09-02    1.0   True
2      A  2022-09-03    1.5  False
3      A  2022-09-04    1.5  False
4      A  2022-09-05    1.0  False
5      A  2022-09-06    1.0  False
6      A  2022-09-07    0.5   True
7      A  2022-09-08    1.0  False
8      B  2022-09-01    3.0  False
9      B  2022-09-02    2.0  False
10     B  2022-09-03    1.0   True
11     B  2022-09-04    2.0  False
12     B  2022-09-05    1.0  False
13     B  2022-09-06    0.5   True
14     B  2022-09-07    1.2  False
15     B  2022-09-08    1.2  False

If you want to check within each group, you can use groupby.transform

df['class'] = (df.groupby('group')['value']
               .transform(lambda col: col.lt(col.shift()) & col.lt(col.shift(-1))))
print(df)

   group        date  value  class
0      A  2022-09-01    2.0  False
1      A  2022-09-02    1.0   True
2      A  2022-09-03    1.5  False
3      A  2022-09-04    1.5  False
4      A  2022-09-05    1.0  False
5      A  2022-09-06    1.0  False
6      A  2022-09-07    0.5   True
7      A  2022-09-08    1.0  False
8      B  2022-09-01    3.0  False
9      B  2022-09-02    2.0  False
10     B  2022-09-03    1.0   True
11     B  2022-09-04    2.0  False
12     B  2022-09-05    1.0  False
13     B  2022-09-06    0.5   True
14     B  2022-09-07    1.2  False
15     B  2022-09-08    1.2  False
  • Related