I want to create a new column "year-1". The values of the new column should be the same as column "c" where x and y same and year= year-1. And also, for year-2. It means that I want to assign values of column "c" to the new column in the previous year according to x and y. How can I do it?
import pandas as pd
data = {'x': [ 80.1, 90.1, 0, 300.1, 80.1, 90.1, 0, 300.1, 80.1, 90.1, 0, 300.1], 'y': [ 140.1, 150.1, 160.1, 400.1, 140.1, 150.1, 160.1, 400.1, 140.1, 150.1, 160.1, 400.1], 'a': [1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16], 'c': [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0], 'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002]}
df = pd.DataFrame(data)
df
x y a c year
0 80.1 140.1 1 0.0 2000
1 90.1 150.1 2 0.0 2000
2 0.0 160.1 3 0.0 2000
3 300.1 400.1 4 0.0 2000
4 80.1 140.1 5 0.0 2001
5 90.1 150.1 10 0.0 2001
6 0.0 160.1 11 1.0 2001
7 300.1 400.1 12 0.0 2001
8 80.1 140.1 13 1.0 2002
9 90.1 150.1 14 1.0 2002
10 0.0 160.1 15 0.0 2002
11 300.1 400.1 16 0.0 2002
Expected output:
x y a c year year-1 year-2
0 80.1 140.1 1 0.0 2000 NaN NaN
1 90.1 150.1 2 0.0 2000 NaN NaN
2 0.0 160.1 3 0.0 2000 NaN NaN
3 300.1 400.1 4 0.0 2000 NaN NaN
4 80.1 140.1 5 0.0 2001 0.0 NaN
5 90.1 150.1 10 0.0 2001 0.0 NaN
6 0.0 160.1 11 1.0 2001 0.0 NaN
7 300.1 400.1 12 0.0 2001 0.0 NaN
8 80.1 140.1 13 1.0 2002 0.0 0.0
9 90.1 150.1 14 1.0 2002 0.0 0.0
10 0.0 160.1 15 0.0 2002 1.0 0.0
11 300.1 400.1 16 0.0 2002 0.0 0.0
CodePudding user response:
Try this:
df.reset_index()\
.merge(df.set_index(['x','y','year'])\
.unstack()\
.shift(axis=1)\
.stack(dropna=False)\
.rename(columns={'c':'year-1'})\
.reset_index())
Output:
index x y c year year-1
0 0 80.1 140.1 0 2000 NaN
1 1 90.1 150.1 0 2000 NaN
2 2 0.0 160.1 0 2000 NaN
3 3 300.1 400.1 0 2000 NaN
4 4 80.1 140.1 0 2001 0.0
5 5 90.1 150.1 0 2001 0.0
6 6 0.0 160.1 1 2001 0.0
7 7 300.1 400.1 0 2001 0.0
8 8 80.1 140.1 1 2002 0.0
9 9 90.1 150.1 1 2002 0.0
10 10 0.0 160.1 0 2002 1.0
11 11 300.1 400.1 0 2002 0.0
Details:
Reshape the dataframe such that you have x, y on rows and years in columns, then shift that data to the right. Reshape dataframe and rename column to 'year-1' and merge back to original dataframe.
To do the next year:
df.reset_index()\
.merge(df.set_index(['x','y','year'])\
.unstack()\
.shift(axis=1)\
.stack(dropna=False)\
.rename(columns={'c':'year-1'})\
.reset_index()).merge(df.set_index(['x', 'y', 'year'])
.unstack()
.shift(2, axis=1)
.stack(dropna=False)
.rename(columns={'c':'year-2'})
.reset_index())
Using functools.reduce with list comprehension:
from functools import reduce
n=2
reduce(lambda x, y: x.merge(y), [df.set_index(['x', 'y', 'year'])
.unstack()
.shift(i, axis=1)
.stack(dropna=False)
.rename(columns={'c':f'year-{i}'})
.reset_index()
for i in range(n 1)]).sort_values('year')
Output:
x y year year-0 year-1 year-2
0 0.0 160.1 2000 0 NaN NaN
3 80.1 140.1 2000 0 NaN NaN
6 90.1 150.1 2000 0 NaN NaN
9 300.1 400.1 2000 0 NaN NaN
1 0.0 160.1 2001 1 0.0 NaN
4 80.1 140.1 2001 0 0.0 NaN
7 90.1 150.1 2001 0 0.0 NaN
10 300.1 400.1 2001 0 0.0 NaN
2 0.0 160.1 2002 0 1.0 0.0
5 80.1 140.1 2002 1 0.0 0.0
8 90.1 150.1 2002 1 0.0 0.0
11 300.1 400.1 2002 0 0.0 0.0