Home > Enterprise >  Add new column based on another column
Add new column based on another column

Time:04-11

I want to create a new column "year-1". The values of the new column should be the same as column "c" where x and y same and year= year-1. And also, for year-2. It means that I want to assign values of column "c" to the new column in the previous year according to x and y. How can I do it?

import pandas as pd
data = {'x': [ 80.1, 90.1, 0, 300.1, 80.1, 90.1, 0, 300.1, 80.1, 90.1, 0, 300.1], 'y': [ 140.1, 150.1, 160.1, 400.1, 140.1, 150.1, 160.1, 400.1, 140.1, 150.1, 160.1, 400.1], 'a': [1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16], 'c': [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0], 'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002]}   
df = pd.DataFrame(data)
df
            
             x        y     a    c      year
        
        0   80.1    140.1   1   0.0     2000
        1   90.1    150.1   2   0.0     2000
        2   0.0     160.1   3   0.0     2000
        3   300.1   400.1   4   0.0     2000
        4   80.1    140.1   5   0.0     2001
        5   90.1    150.1   10  0.0     2001
        6   0.0     160.1   11  1.0     2001
        7   300.1   400.1   12  0.0     2001
        8   80.1    140.1   13  1.0     2002
        9   90.1    150.1   14  1.0     2002
        10  0.0     160.1   15  0.0     2002
        11  300.1   400.1   16  0.0     2002

Expected output:            
              x       y     a      c     year   year-1   year-2
        
        0   80.1    140.1   1     0.0    2000   NaN       NaN
        1   90.1    150.1   2     0.0    2000   NaN       NaN
        2   0.0     160.1   3     0.0    2000   NaN       NaN
        3   300.1   400.1   4     0.0    2000   NaN       NaN
        4   80.1    140.1   5     0.0    2001   0.0       NaN
        5   90.1    150.1   10    0.0    2001   0.0       NaN
        6   0.0     160.1   11    1.0    2001   0.0       NaN
        7   300.1   400.1   12    0.0    2001   0.0       NaN
        8   80.1    140.1   13    1.0    2002   0.0       0.0
        9   90.1    150.1   14    1.0    2002   0.0       0.0 
        10  0.0     160.1   15    0.0    2002   1.0       0.0 
        11  300.1   400.1   16    0.0    2002   0.0       0.0

CodePudding user response:

Try this:

df.reset_index()\
  .merge(df.set_index(['x','y','year'])\
  .unstack()\
  .shift(axis=1)\
  .stack(dropna=False)\
  .rename(columns={'c':'year-1'})\
  .reset_index())

Output:

    index      x      y  c  year  year-1
0       0   80.1  140.1  0  2000     NaN
1       1   90.1  150.1  0  2000     NaN
2       2    0.0  160.1  0  2000     NaN
3       3  300.1  400.1  0  2000     NaN
4       4   80.1  140.1  0  2001     0.0
5       5   90.1  150.1  0  2001     0.0
6       6    0.0  160.1  1  2001     0.0
7       7  300.1  400.1  0  2001     0.0
8       8   80.1  140.1  1  2002     0.0
9       9   90.1  150.1  1  2002     0.0
10     10    0.0  160.1  0  2002     1.0
11     11  300.1  400.1  0  2002     0.0

Details:

Reshape the dataframe such that you have x, y on rows and years in columns, then shift that data to the right. Reshape dataframe and rename column to 'year-1' and merge back to original dataframe.


To do the next year:

df.reset_index()\
  .merge(df.set_index(['x','y','year'])\
  .unstack()\
  .shift(axis=1)\
  .stack(dropna=False)\
  .rename(columns={'c':'year-1'})\
  .reset_index()).merge(df.set_index(['x', 'y', 'year'])
                          .unstack()
                          .shift(2, axis=1)
                          .stack(dropna=False)
                          .rename(columns={'c':'year-2'})
                          .reset_index())

Using functools.reduce with list comprehension:

from functools import reduce

n=2
reduce(lambda x, y: x.merge(y), [df.set_index(['x', 'y', 'year'])
                          .unstack()
                          .shift(i, axis=1)
                          .stack(dropna=False)
                          .rename(columns={'c':f'year-{i}'})
                          .reset_index()
  for i in range(n 1)]).sort_values('year')

Output:

        x      y  year  year-0  year-1  year-2
0     0.0  160.1  2000       0     NaN     NaN
3    80.1  140.1  2000       0     NaN     NaN
6    90.1  150.1  2000       0     NaN     NaN
9   300.1  400.1  2000       0     NaN     NaN
1     0.0  160.1  2001       1     0.0     NaN
4    80.1  140.1  2001       0     0.0     NaN
7    90.1  150.1  2001       0     0.0     NaN
10  300.1  400.1  2001       0     0.0     NaN
2     0.0  160.1  2002       0     1.0     0.0
5    80.1  140.1  2002       1     0.0     0.0
8    90.1  150.1  2002       1     0.0     0.0
11  300.1  400.1  2002       0     0.0     0.0
  • Related