I have the following dataframe (sample):
import pandas as pd
data = [[1, 1, 'no'], [2, 1, 'no'], [3, 3, 'pos'], [4, 3, 'no'], [5, 3, 'no'],
[6, 2, 'neg'], [7, 2, 'no'], [8, 2, 'no'], [9, 3, 'pos'], [10, 3, 'no']]
df = pd.DataFrame(data = data, columns = ['seq', 'val', 'change'])
seq val change
0 1 1 no
1 2 1 no
2 3 3 pos
3 4 3 no
4 5 3 no
5 6 2 neg
6 7 2 no
7 8 2 no
8 9 3 pos
9 10 3 no
This dataframe describes changes per row. If the "val" increases, the "change" will be called 'pos' and if the value decreases will it be 'neg' otherwise it is called 'no' of no changes. I would like to add a column called "last change" which identifies what the last change was of the "val" in the "change" column. Here is the desired output:
data = [[1, 1, 'no', 'no'], [2, 1, 'no', 'no'], [3, 3, 'pos', 'pos'], [4, 3, 'no', 'pos'], [5, 3, 'no', 'pos'],
[6, 2, 'neg', 'neg'], [7, 2, 'no', 'neg'], [8, 2, 'no', 'neg'], [9, 3, 'pos', 'pos'], [10, 3, 'no', 'pos']]
df_desired = pd.DataFrame(data = data, columns = ['seq', 'val', 'change', 'last change'])
seq val change last change
0 1 1 no no
1 2 1 no no
2 3 3 pos pos
3 4 3 no pos
4 5 3 no pos
5 6 2 neg neg
6 7 2 no neg
7 8 2 no neg
8 9 3 pos pos
9 10 3 no pos
As you can see for seq 1 and 2 the "last change" is 'no' but for seq 3 the "change" column changes to 'pos' which means that in "last change" the value will also be 'pos' until the "change" will be 'neg' or again 'pos'. In this case the next change is at seq 6. So I was wondering if anyone knows how to identify these last changes using Pandas
?
CodePudding user response:
You can use a mask (where
) and ffill
:
df['last change'] = (df['change']
# keep only pos/neg values
.where(df['change'].isin(['pos', 'neg']))
# forward fill and restore initial values
.ffill().fillna(df['change'])
)
output:
seq val change last change
0 1 1 no no
1 2 1 no no
2 3 3 pos pos
3 4 3 no pos
4 5 3 no pos
5 6 2 neg neg
6 7 2 no neg
7 8 2 no neg
8 9 3 pos pos
9 10 3 no pos
Intermediates:
seq val change where ffill fillna
0 1 1 no NaN NaN no
1 2 1 no NaN NaN no
2 3 3 pos pos pos pos
3 4 3 no NaN pos pos
4 5 3 no NaN pos pos
5 6 2 neg neg neg neg
6 7 2 no NaN neg neg
7 8 2 no NaN neg neg
8 9 3 pos pos pos pos
9 10 3 no NaN pos pos