I wanted to add new columns based on other columns and also group by "x" and "y":
- For the first new column (feature2), give the value of the "feature" two years ago.
- For the "feature3", give the value of the "feature" three years ago.
- For the "feature4", give the value of the "feature" four years ago.
- (And also it will continue for 10 years)
You can find the sample dataset and expected output below.
data = {'x': [40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1 ], 'y': [100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1], 'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004], 'feature': [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]}
df = pd.DataFrame(data)
df
x y year feature
0 40.1 100.1 2000 0
1 50.1 110.1 2000 1
2 40.1 100.1 2001 1
3 50.1 110.1 2001 1
4 40.1 100.1 2002 1
5 50.1 110.1 2002 0
6 40.1 100.1 2003 0
7 50.1 110.1 2003 0
8 40.1 100.1 2004 0
9 50.1 110.1 2004 1
Expected Output:
x y year feature feature2 feature3 feature4
0 40.1 100.1 2000 0 NaN NaN NaN
1 50.1 110.1 2000 1 NaN NaN NaN
2 40.1 100.1 2001 1 NaN NaN NaN
3 50.1 110.1 2001 1 NaN NaN NaN
4 40.1 100.1 2002 1 0 NaN NaN
5 50.1 110.1 2002 0 1 NaN NaN
6 40.1 100.1 2003 0 1 0 NaN
7 50.1 110.1 2003 0 1 1 NaN
8 40.1 100.1 2004 0 1 1 0
9 50.1 110.1 2004 1 0 1 1
CodePudding user response:
Assuming the years are sorted and are all present in each group, you can use a loop and groupby.shift
:
g = df.groupby(['x', 'y'])['feature']
for x in range(3):
df[f'feature{x 2}'] = g.shift(x 2)
output:
x y year feature feature2 feature3 feature4
0 40.1 100.1 2000 0 NaN NaN NaN
1 50.1 110.1 2000 1 NaN NaN NaN
2 40.1 100.1 2001 1 NaN NaN NaN
3 50.1 110.1 2001 1 NaN NaN NaN
4 40.1 100.1 2002 1 0.0 NaN NaN
5 50.1 110.1 2002 0 1.0 NaN NaN
6 40.1 100.1 2003 0 1.0 0.0 NaN
7 50.1 110.1 2003 0 1.0 1.0 NaN
8 40.1 100.1 2004 0 1.0 1.0 0.0
9 50.1 110.1 2004 1 0.0 1.0 1.0