Home > Blockchain >  How to add new columns considering previous years?
How to add new columns considering previous years?

Time:06-16

I wanted to add new columns based on other columns and also group by "x" and "y":

  • For the first new column (feature2), give the value of the "feature" two years ago.
  • For the "feature3", give the value of the "feature" three years ago.
  • For the "feature4", give the value of the "feature" four years ago.
  • (And also it will continue for 10 years)

You can find the sample dataset and expected output below.

data = {'x': [40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1 ], 'y': [100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1], 'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004], 'feature': [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]}   
df = pd.DataFrame(data)
df

      x       y     year  feature
0   40.1    100.1   2000    0
1   50.1    110.1   2000    1
2   40.1    100.1   2001    1
3   50.1    110.1   2001    1
4   40.1    100.1   2002    1
5   50.1    110.1   2002    0
6   40.1    100.1   2003    0
7   50.1    110.1   2003    0
8   40.1    100.1   2004    0
9   50.1    110.1   2004    1

Expected Output:
          x       y     year  feature   feature2  feature3   feature4
    0   40.1    100.1   2000    0         NaN       NaN        NaN
    1   50.1    110.1   2000    1         NaN       NaN        NaN
    2   40.1    100.1   2001    1         NaN       NaN        NaN
    3   50.1    110.1   2001    1         NaN       NaN        NaN
    4   40.1    100.1   2002    1          0        NaN        NaN
    5   50.1    110.1   2002    0          1        NaN        NaN
    6   40.1    100.1   2003    0          1         0         NaN
    7   50.1    110.1   2003    0          1         1         NaN
    8   40.1    100.1   2004    0          1         1          0
    9   50.1    110.1   2004    1          0         1          1

CodePudding user response:

Assuming the years are sorted and are all present in each group, you can use a loop and groupby.shift:

g = df.groupby(['x', 'y'])['feature']

for x in range(3):
    df[f'feature{x 2}'] = g.shift(x 2)

output:

      x      y  year  feature  feature2  feature3  feature4
0  40.1  100.1  2000        0       NaN       NaN       NaN
1  50.1  110.1  2000        1       NaN       NaN       NaN
2  40.1  100.1  2001        1       NaN       NaN       NaN
3  50.1  110.1  2001        1       NaN       NaN       NaN
4  40.1  100.1  2002        1       0.0       NaN       NaN
5  50.1  110.1  2002        0       1.0       NaN       NaN
6  40.1  100.1  2003        0       1.0       0.0       NaN
7  50.1  110.1  2003        0       1.0       1.0       NaN
8  40.1  100.1  2004        0       1.0       1.0       0.0
9  50.1  110.1  2004        1       0.0       1.0       1.0
  • Related