Create a new column that counts backwards from a specific point-CodePudding

I would like to look at an outcome in the time prior to a change in product and after a change in product. Here is an example df:

import pandas as pd

ids = [1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]
date = ["11/4/2020", "12/5/2020", "01/5/2021", "02/5/2020", "03/5/2020", "04/5/2020", "05/5/2020", "06/5/2020", "07/5/2020", "08/5/2020", "09/5/2020",
        "01/3/2019", "02/3/2019", "03/3/2019", "04/3/2019", "05/3/2019", "06/3/2019", "07/3/2019", "08/3/2019", "09/3/2019", "10/3/2019"]
months = [0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,0,1,2,3,4]

df = pd.DataFrame({'ids': ids,
                   'date': date,
                   'months': months
                   })

df

    ids       date  months
0     1  11/4/2020       0
1     1  12/5/2020       1
2     1  01/5/2021       2
3     1  02/5/2020       3
4     1  03/5/2020       4
5     1  04/5/2020       0
6     1  05/5/2020       1
7     1  06/5/2020       2
8     1  07/5/2020       3
9     1  08/5/2020       4
10    1  09/5/2020       5
11    2  01/3/2019       0
12    2  02/3/2019       1
13    2  03/3/2019       2
14    2  04/3/2019       3
15    2  05/3/2019       4
16    2  06/3/2019       0
17    2  07/3/2019       1
18    2  08/3/2019       2
19    2  09/3/2019       3
20    2  10/3/2019       4

This is what I would like the end result to be:

    ids       date  months  new_col
0     1  11/4/2020       0       -5
1     1  12/5/2020       1       -4
2     1  01/5/2021       2       -3
3     1  02/5/2020       3       -2
4     1  03/5/2020       4       -1
5     1  04/5/2020       0        0
6     1  05/5/2020       1        1
7     1  06/5/2020       2        2
8     1  07/5/2020       3        3
9     1  08/5/2020       4        4
10    1  09/5/2020       5        5
11    2  01/3/2019       0       -5
12    2  02/3/2019       1       -4
13    2  03/3/2019       2       -3
14    2  04/3/2019       3       -2
15    2  05/3/2019       4       -1
16    2  06/3/2019       0        0
17    2  07/3/2019       1        1
18    2  08/3/2019       2        2
19    2  09/3/2019       3        3
20    2  10/3/2019       4        4

In other words I would like to add a column that finds the second instance of months = 0 for a specific ID and counts backwards from that so I can look at outcomes before that point (all the negative numbers) vs the outcomes after that point (all the positive numbers).

Is there a simple way to do this in pandas?

Thanks in advance

CodePudding user response：

def fun():
    id, m = yield
    curid, out = id, -6
    while True:
        if curid == id:
            out  = 1
        else:
            out = -5
            curid = id
        id, m = yield out

f = fun()
next(f)

df['new_col'] = df.apply(lambda x: f.send((x['ids'], x['months'])), axis=1)
print(df)

    ids       date  months  new_col
0     1  11/4/2020       0       -5
1     1  12/5/2020       1       -4
2     1  01/5/2021       2       -3
3     1  02/5/2020       3       -2
4     1  03/5/2020       4       -1
5     1  04/5/2020       0        0
6     1  05/5/2020       1        1
7     1  06/5/2020       2        2
8     1  07/5/2020       3        3
9     1  08/5/2020       4        4
10    1  09/5/2020       5        5
11    2  01/3/2019       0       -5
12    2  02/3/2019       1       -4
13    2  03/3/2019       2       -3
14    2  04/3/2019       3       -2
15    2  05/3/2019       4       -1
16    2  06/3/2019       0        0
17    2  07/3/2019       1        1
18    2  08/3/2019       2        2
19    2  09/3/2019       3        3
20    2  10/3/2019       4        4

CodePudding user response：

Assume there are 2 and only 2 instances of 0 per group so I don't care about ids because:

(id1, first 0) -> negative counter,
(id1, second 0) -> positive counter,
(id2, first 0) -> negative counter,
(id2, second 0) -> positive count and so on.

Create virtual groups to know if you have to create negative or positive counter:

odd group: negative counter
even group: positive counter

df['new_col'] = (
    df.assign(new_col=df['months'].eq(0).cumsum())
      .groupby('new_col')['new_col']
      .apply(lambda x: range(-len(x), 0, 1) if x.name % 2 else range(len(x))) 
      .explode().values
)

Output:

>>> df
    ids       date  months new_col
0     1  11/4/2020       0      -5
1     1  12/5/2020       1      -4
2     1  01/5/2021       2      -3
3     1  02/5/2020       3      -2
4     1  03/5/2020       4      -1
5     1  04/5/2020       0       0
6     1  05/5/2020       1       1
7     1  06/5/2020       2       2
8     1  07/5/2020       3       3
9     1  08/5/2020       4       4
10    1  09/5/2020       5       5
11    2  01/3/2019       0      -5
12    2  02/3/2019       1      -4
13    2  03/3/2019       2      -3
14    2  04/3/2019       3      -2
15    2  05/3/2019       4      -1
16    2  06/3/2019       0       0
17    2  07/3/2019       1       1
18    2  08/3/2019       2       2
19    2  09/3/2019       3       3
20    2  10/3/2019       4       4