Home > Back-end >  Check last change that occur per row in a Dataframe using Pandas
Check last change that occur per row in a Dataframe using Pandas

Time:10-03

I have the following dataframe (sample):

import pandas as pd

data = [[1, 1, 'no'], [2, 1, 'no'], [3, 3, 'pos'], [4, 3, 'no'], [5, 3, 'no'], 
        [6, 2, 'neg'], [7, 2, 'no'], [8, 2, 'no'], [9, 3, 'pos'], [10, 3, 'no']]
df = pd.DataFrame(data = data, columns = ['seq', 'val', 'change'])

   seq  val change
0    1    1     no
1    2    1     no
2    3    3    pos
3    4    3     no
4    5    3     no
5    6    2    neg
6    7    2     no
7    8    2     no
8    9    3    pos
9   10    3     no

This dataframe describes changes per row. If the "val" increases, the "change" will be called 'pos' and if the value decreases will it be 'neg' otherwise it is called 'no' of no changes. I would like to add a column called "last change" which identifies what the last change was of the "val" in the "change" column. Here is the desired output:

data = [[1, 1, 'no', 'no'], [2, 1, 'no', 'no'], [3, 3, 'pos', 'pos'], [4, 3, 'no', 'pos'], [5, 3, 'no', 'pos'], 
        [6, 2, 'neg', 'neg'], [7, 2, 'no', 'neg'], [8, 2, 'no', 'neg'], [9, 3, 'pos', 'pos'], [10, 3, 'no', 'pos']]
df_desired = pd.DataFrame(data = data, columns = ['seq', 'val', 'change', 'last change'])

   seq  val change last change
0    1    1     no          no
1    2    1     no          no
2    3    3    pos         pos
3    4    3     no         pos
4    5    3     no         pos
5    6    2    neg         neg
6    7    2     no         neg
7    8    2     no         neg
8    9    3    pos         pos
9   10    3     no         pos

As you can see for seq 1 and 2 the "last change" is 'no' but for seq 3 the "change" column changes to 'pos' which means that in "last change" the value will also be 'pos' until the "change" will be 'neg' or again 'pos'. In this case the next change is at seq 6. So I was wondering if anyone knows how to identify these last changes using Pandas?

CodePudding user response:

You can use a mask (where) and ffill:

df['last change'] = (df['change']
                    # keep only pos/neg values
                    .where(df['change'].isin(['pos', 'neg']))
                    # forward fill and restore initial values
                    .ffill().fillna(df['change'])
                    )

output:

   seq  val change last change
0    1    1     no          no
1    2    1     no          no
2    3    3    pos         pos
3    4    3     no         pos
4    5    3     no         pos
5    6    2    neg         neg
6    7    2     no         neg
7    8    2     no         neg
8    9    3    pos         pos
9   10    3     no         pos

Intermediates:

   seq  val change where ffill fillna
0    1    1     no   NaN   NaN     no
1    2    1     no   NaN   NaN     no
2    3    3    pos   pos   pos    pos
3    4    3     no   NaN   pos    pos
4    5    3     no   NaN   pos    pos
5    6    2    neg   neg   neg    neg
6    7    2     no   NaN   neg    neg
7    8    2     no   NaN   neg    neg
8    9    3    pos   pos   pos    pos
9   10    3     no   NaN   pos    pos
  • Related