I have a dataframe like this:
df = pd.DataFrame({
'ref1': [1,1,3,7,7],
'ref2': [1,2,1,1,2],
'value': [1,2,3,5,6],
})
df
ref1 ref2 value
0 1 1 1
1 1 2 2
2 3 1 3
3 7 1 5
4 7 2 6
I want to add the column new_value
and obtain this:
ref1 ref2 value new_value my_comment
0 1 1 1 NaN no prev ref1
1 1 2 2 NaN no prev ref1
2 3 1 3 1.0
3 7 1 5 3.0
4 7 2 6 NaN no same ref2 @ ref1==3
following these rules:
new_value
isvalue
for the sameref2
and previousref1
(in an ordered list like[1,3,7]
), elseNaN
CodePudding user response:
Given that 1, 3, 7
are in order, you can pivot, shift, stack back to get the shifted values, then merge:
df.merge(df.pivot(index='ref1', columns='ref2', values='value')
.shift().stack().reset_index(name='new_value'),
on=['ref1','ref2'], how='left'
)
Output:
ref1 ref2 value new_value
0 1 1 1 NaN
1 1 2 2 NaN
2 3 1 3 1.0
3 7 1 5 3.0
4 7 2 6 NaN
Note If there are duplicate combinations of ref1, ref2
, pivot
will fail. In which case, you want to enumerate the pairs:
df.merge(df.assign(enum=df.groupby(['ref1','ref2']).cumcount())
.pivot(index=['enum','ref1'], columns='ref2', values='value')
.shift().stack()
.reset_index(level='enum', drop=True)
.reset_index(name='new_value'),
on=['ref1','ref2'], how='left'
)