New column based on some conditions

Time:04-11

I want to add a new column based on some conditions: among rows with the same x and y, if c = 1 in the current year or in the previous year (year-1), the new column "c_new" should be 1, otherwise 0. How can I do this?

import pandas as pd
data = {'x': [ 0, 300.1, 0, 300.1, 0, 300.1, 0, 300.1], 'y': [ 160.1, 400.1, 160.1, 400.1, 160.1, 400.1, 160.1, 400.1], 'a': [3, 4, 3, 4, 3, 4, 3, 4], 'c': [0, 0, 1, 0, 0, 0, 1, 0], 'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003]}   
df = pd.DataFrame(data)
df
            
             x        y     a    c      year
        
        1   0.0     160.1   3   0.0     2000
        2   300.1   400.1   4   0.0     2000
        3   0.0     160.1   3   1.0     2001
        4   300.1   400.1   4   0.0     2001
        5   0.0     160.1   3   0.0     2002
        6   300.1   400.1   4   0.0     2002
        7   0.0     160.1   3   1.0     2003
        8   300.1   400.1   4   0.0     2003

Expected output:            
              x       y     a      c     year  c_new   
        
        1   0.0     160.1   3    0.0    2000   0.0       
        2   300.1   400.1   4    0.0    2000   0.0       
        3   0.0     160.1   3    1.0    2001   1.0       
        4   300.1   400.1   4    0.0    2001   0.0       
        5   0.0     160.1   3    0.0    2002   1.0        
        6   300.1   400.1   4    0.0    2002   0.0       
        7   0.0     160.1   3    1.0    2003   1.0       
        8   300.1   400.1   4    0.0    2003   0.0       

CodePudding user response:

Assuming every (x, y) pair has a row for every year and the rows are sorted by year, you can use a shifted rolling max:

N = 2 # number of previous years to consider
df['c_new'] = (df
 .groupby(['x', 'y'])
 # transform keeps the original index, so the result aligns with df
 ['c'].transform(lambda s: s.shift().rolling(N, min_periods=1).max())
)

output:

       x      y  a  c  year  c_new
0    0.0  160.1  3  0  2000    NaN
1  300.1  400.1  4  0  2000    NaN
2    0.0  160.1  3  1  2001    0.0
3  300.1  400.1  4  0  2001    0.0
4    0.0  160.1  3  0  2002    1.0
5  300.1  400.1  4  0  2002    0.0
6    0.0  160.1  3  1  2003    1.0
7  300.1  400.1  4  0  2003    0.0
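The NaN values in the first row of each group come from shift(), since there is no earlier year to look at. If a 0/1 integer column is wanted, one possible sketch is to fill them with 0 and cast. The small frame below is made-up sample data, and treating "no earlier year" as 0 is an assumption:

```python
import pandas as pd

# Made-up sample data: two (x, y) pairs over two years.
df = pd.DataFrame({
    'x': [0, 300.1, 0, 300.1],
    'y': [160.1, 400.1, 160.1, 400.1],
    'c': [1, 0, 0, 0],
    'year': [2000, 2000, 2001, 2001],
})

N = 2  # number of previous years to consider

# Max of c over the previous N rows of each (x, y) group.
prev_max = df.groupby(['x', 'y'])['c'].transform(
    lambda s: s.shift().rolling(N, min_periods=1).max()
)

# Assumption: a missing previous year counts as 0.
df['c_new'] = prev_max.fillna(0).astype(int)
print(df)
```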

NB. Be careful when grouping by floats: round them first so that nearly equal values do not end up in different groups.
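To illustrate the float issue, here is a minimal sketch with made-up values. 0.1 + 0.2 is not exactly 0.3 in floating point, so the raw keys split into two groups, while rounded keys form one. The precision of 1 decimal is an assumption and should match your data:

```python
import pandas as pd

# Made-up data: x holds 0.1 + 0.2 and 0.3, which differ in the last bits.
df = pd.DataFrame({
    'x': [0.1 + 0.2, 0.3, 0.1 + 0.2, 0.3],
    'y': [160.1, 160.1, 160.1, 160.1],
    'c': [0, 1, 0, 0],
})

# Grouping on the raw floats treats the two x values as different keys.
raw_groups = df.groupby(['x', 'y']).ngroups

# Grouping on rounded copies of the keys (1 decimal is an assumption).
keys = [df['x'].round(1), df['y'].round(1)]
rounded_groups = df.groupby(keys).ngroups

print(raw_groups, rounded_groups)
```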

Update: to consider the current year and year-1, drop the shift:
N = 2 # window size: the current year plus N-1 previous years
df['c_new'] = (df
 .groupby(['x', 'y'])
 ['c'].rolling(N, min_periods=1).max().droplevel(['x', 'y'])
)

output:

       x      y  a  c  year  c_new
0    0.0  160.1  3  0  2000    0.0
1  300.1  400.1  4  0  2000    0.0
2    0.0  160.1  3  1  2001    1.0
3  300.1  400.1  4  0  2001    0.0
4    0.0  160.1  3  0  2002    1.0
5  300.1  400.1  4  0  2002    0.0
6    0.0  160.1  3  1  2003    1.0
7  300.1  400.1  4  0  2003    0.0
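Both rolling versions count rows, not years, so they rely on every year being present. If some years may be missing, one alternative (not part of the answer above, just a sketch) is to match on the year value itself with a merge. The sample data and the helper names hits and flagged are made up for illustration:

```python
import pandas as pd

# Made-up data: year 2002 is missing for both (x, y) pairs.
df = pd.DataFrame({
    'x': [0, 300.1, 0, 300.1, 0, 300.1],
    'y': [160.1, 400.1, 160.1, 400.1, 160.1, 400.1],
    'c': [0, 0, 1, 0, 0, 0],
    'year': [2000, 2000, 2001, 2001, 2003, 2003],
})

# Rows where c == 1, reduced to their identifying keys.
hits = df.loc[df['c'] == 1, ['x', 'y', 'year']]

# A hit in year Y flags year Y and year Y + 1 for the same (x, y).
flagged = pd.concat([hits, hits.assign(year=hits['year'] + 1)]).drop_duplicates()
flagged['c_new'] = 1

# A left merge keeps every original row; unmatched rows get 0.
out = df.merge(flagged, on=['x', 'y', 'year'], how='left')
out['c_new'] = out['c_new'].fillna(0).astype(int)
print(out)
```

Here the 2003 row for (0, 160.1) gets 0, because 2002 has no data and 2001 is two years back; a row-based rolling window of 2 would have picked up the 2001 value instead.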