I want to add a new column based on a condition: for rows with the same x and y, if c = 1 in the current year or in the previous year (year-1), the new column "c_new" should be 1, otherwise 0. How can I do this?
import pandas as pd
data = {'x': [ 0, 300.1, 0, 300.1, 0, 300.1, 0, 300.1], 'y': [ 160.1, 400.1, 160.1, 400.1, 160.1, 400.1, 160.1, 400.1], 'a': [3, 4, 3, 4, 3, 4, 3, 4], 'c': [0, 0, 1, 0, 0, 0, 1, 0], 'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003]}
df = pd.DataFrame(data)
df
x y a c year
1 0.0 160.1 3 0.0 2000
2 300.1 400.1 4 0.0 2000
3 0.0 160.1 3 1.0 2001
4 300.1 400.1 4 0.0 2001
5 0.0 160.1 3 0.0 2002
6 300.1 400.1 4 0.0 2002
7 0.0 160.1 3 1.0 2003
8 300.1 400.1 4 0.0 2003
Expected output:
x y a c year c_new
1 0.0 160.1 3 0.0 2000 0.0
2 300.1 400.1 4 0.0 2000 0.0
3 0.0 160.1 3 1.0 2001 1.0
4 300.1 400.1 4 0.0 2001 0.0
5 0.0 160.1 3 0.0 2002 1.0
6 300.1 400.1 4 0.0 2002 0.0
7 0.0 160.1 3 1.0 2003 1.0
8 300.1 400.1 4 0.0 2003 0.0
CodePudding user response:
Assuming each (x, y) group has a row for every year and the rows are sorted by year, you can use a shifted rolling max:
N = 2  # number of previous years to consider
# transform keeps the original index, so the result aligns with df
df['c_new'] = (df
    .groupby(['x', 'y'])['c']
    .transform(lambda s: s.shift().rolling(N, min_periods=1).max())
)
output:
x y a c year c_new
0 0.0 160.1 3 0 2000 NaN
1 300.1 400.1 4 0 2000 NaN
2 0.0 160.1 3 1 2001 0.0
3 300.1 400.1 4 0 2001 0.0
4 0.0 160.1 3 0 2002 1.0
5 300.1 400.1 4 0 2002 0.0
6 0.0 160.1 3 1 2003 1.0
7 300.1 400.1 4 0 2003 0.0
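The rolling window counts rows, not calendar years, so it relies on each (x, y) group having one row per year in ascending order. A quick way to check that assumption could look like the sketch below; the names `diffs` and `has_gaps` are just illustrative.

```python
import pandas as pd

data = {'x': [0, 300.1, 0, 300.1, 0, 300.1, 0, 300.1],
        'y': [160.1, 400.1, 160.1, 400.1, 160.1, 400.1, 160.1, 400.1],
        'a': [3, 4, 3, 4, 3, 4, 3, 4],
        'c': [0, 0, 1, 0, 0, 0, 1, 0],
        'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003]}
df = pd.DataFrame(data)

# Sort so that consecutive rows within a group are consecutive in time.
df = df.sort_values(['x', 'y', 'year'])

# Year-to-year difference within each group; the first row of each group is NaN.
diffs = df.groupby(['x', 'y'])['year'].diff().dropna()

# Any difference other than 1 means a missing (or duplicated) year.
has_gaps = bool(diffs.ne(1).any())
print(has_gaps)
```

If `has_gaps` is True, a row-based rolling window would silently mix non-adjacent years, and a different approach would be needed.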
NB. Be careful when grouping by floats: round them first so that nearly equal values do not end up in different groups.
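To illustrate the float-grouping caveat, here is a small sketch with made-up values (not the question's data). The rounding precision of 1 decimal is an assumption; pick whatever suits your coordinates.

```python
import pandas as pd

# Toy data: 0.1 + 0.2 is 0.30000000000000004, not exactly 0.3.
df = pd.DataFrame({
    'x': [0.1 + 0.2, 0.3],
    'y': [160.1, 160.1],
    'c': [1, 0],
})

# Without rounding, the two rows land in different groups.
n_groups_raw = df.groupby(['x', 'y']).ngroups

# Group on rounded copies of the keys (1 decimal is an assumed precision).
keys = df[['x', 'y']].round(1)
n_groups_rounded = df.groupby([keys['x'], keys['y']]).ngroups

print(n_groups_raw, n_groups_rounded)
```

Grouping on rounded copies leaves the original x and y columns untouched.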
Update: to consider the current year and the previous year (year and year-1), drop the shift:
N = 2  # window size in years: the current year and the previous one
df['c_new'] = (df
    .groupby(['x', 'y'])['c']
    .rolling(N, min_periods=1).max()
    .droplevel(['x', 'y'])  # back to the original row index
)
output:
x y a c year c_new
0 0.0 160.1 3 0 2000 0.0
1 300.1 400.1 4 0 2000 0.0
2 0.0 160.1 3 1 2001 1.0
3 300.1 400.1 4 0 2001 0.0
4 0.0 160.1 3 0 2002 1.0
5 300.1 400.1 4 0 2002 0.0
6 0.0 160.1 3 1 2003 1.0
7 300.1 400.1 4 0 2003 0.0
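If some years might be missing, one alternative that does not depend on row order is to match on the year values directly. This is a sketch of a different technique (a left merge against the flagged years), not the rolling approach; the names `hits`, `flagged` and `out` are illustrative.

```python
import pandas as pd

data = {'x': [0, 300.1, 0, 300.1, 0, 300.1, 0, 300.1],
        'y': [160.1, 400.1, 160.1, 400.1, 160.1, 400.1, 160.1, 400.1],
        'a': [3, 4, 3, 4, 3, 4, 3, 4],
        'c': [0, 0, 1, 0, 0, 0, 1, 0],
        'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003]}
df = pd.DataFrame(data)

# Rows with c == 1 flag their own year and the following year.
hits = df.loc[df['c'].eq(1), ['x', 'y', 'year']]
flagged = pd.concat([hits, hits.assign(year=hits['year'] + 1)]).drop_duplicates()
flagged['c_new'] = 1

# A left merge keeps every row of df in its original order;
# rows without a match get NaN, which becomes 0.
out = df.merge(flagged, on=['x', 'y', 'year'], how='left')
out['c_new'] = out['c_new'].fillna(0).astype(int)

print(out)
```

On the question's data this gives the same c_new values as the rolling version. The caveat about grouping on floats applies to merge keys as well.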