Please consider the following dataframe in CSV format:
time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Unknown
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Unknown
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,Unknown
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,Unknown
I wanna compare the rows in the index time
with the rows in the column spending_time
and if they are equal, substitute the Unknown
value in the sender
column (associated with index time
) by the value of the recipient
(corresponding to the spending_time
column), as follows:
time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Paul
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Me
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,John
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,John
CodePudding user response:
Merge the dataframe with itself on the two related columns (time
and spending_time
).
df = df.merge(df[['spending_time', 'recipient']], left_on='time', right_on='spending_time', how='left', suffixes=('', '_y'))
Intermidiate result:
time transaction_hash index value recipient spending_time sender spending_time_y recipient_y
0 2009-01-09 03:54:39 a0241106ee5a597c9 0 50.0 Paul 2009-01-12 03:30:25 Coinbase NaN NaN
1 2009-01-12 03:30:25 1338530e9831e9e16 1 40.0 Me 2009-01-12 06:02:13 Unknown 2009-01-12 03:30:25 Paul
2 2009-01-12 06:02:13 546c4fb42b41e14be 1 30.0 John 2009-01-12 06:12:16 Unknown 2009-01-12 06:02:13 Me
3 2009-01-12 06:02:14 eaf0775ebca408f7r 1 17.0 Paul 2010-09-22 06:06:02 Unknown NaN NaN
4 2009-01-12 06:02:15 732a865bf5414eab2 1 15.0 Paul 2010-02-23 14:01:23 Unknown NaN NaN
5 2009-01-12 06:12:16 591e911da64588073 1 29.0 John 2009-01-12 06:34:22 Unknown 2009-01-12 06:12:16 John
6 2009-01-12 06:34:22 12b52732a85c191ba 1 28.0 Sara 2009-01-12 20:04:20 Unknown 2009-01-12 06:34:22 John
The new sender
column can now be obtained by merging sender
with recipient_y
:
df['sender'] = df['recipient_y'].combine_first(df['sender'])
df = df.drop(columns=['spending_time_y', 'recipient_y'])
Result:
time transaction_hash index value recipient spending_time sender
0 2009-01-09 03:54:39 a0241106ee5a597c9 0 50.0 Paul 2009-01-12 03:30:25 Coinbase
1 2009-01-12 03:30:25 1338530e9831e9e16 1 40.0 Me 2009-01-12 06:02:13 Paul
2 2009-01-12 06:02:13 546c4fb42b41e14be 1 30.0 John 2009-01-12 06:12:16 Me
3 2009-01-12 06:02:14 eaf0775ebca408f7r 1 17.0 Paul 2010-09-22 06:06:02 Unknown
4 2009-01-12 06:02:15 732a865bf5414eab2 1 15.0 Paul 2010-02-23 14:01:23 Unknown
5 2009-01-12 06:12:16 591e911da64588073 1 29.0 John 2009-01-12 06:34:22 John
6 2009-01-12 06:34:22 12b52732a85c191ba 1 28.0 Sara 2009-01-12 20:04:20 John