Home > Net >  Compare datetime index with a datetime column and change the corresponding value in another column
Compare datetime index with a datetime column and change the corresponding value in another column

Time:11-01

Please consider the following dataframe in CSV format:

time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Unknown
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Unknown
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,Unknown
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,Unknown

I wanna compare the rows in the index time with the rows in the column spending_time and if they are equal, substitute the Unknown value in the sender column (associated with index time) by the value of the recipient (corresponding to the spending_time column), as follows:

time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Paul
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Me
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,John
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,John

CodePudding user response:

Merge the dataframe with itself on the two related columns (time and spending_time).

df = df.merge(df[['spending_time', 'recipient']], left_on='time', right_on='spending_time', how='left', suffixes=('', '_y'))

Intermidiate result:

                  time   transaction_hash  index  value recipient        spending_time    sender      spending_time_y recipient_y
0  2009-01-09 03:54:39  a0241106ee5a597c9      0   50.0      Paul  2009-01-12 03:30:25  Coinbase                  NaN         NaN
1  2009-01-12 03:30:25  1338530e9831e9e16      1   40.0        Me  2009-01-12 06:02:13   Unknown  2009-01-12 03:30:25        Paul
2  2009-01-12 06:02:13  546c4fb42b41e14be      1   30.0      John  2009-01-12 06:12:16   Unknown  2009-01-12 06:02:13          Me
3  2009-01-12 06:02:14  eaf0775ebca408f7r      1   17.0      Paul  2010-09-22 06:06:02   Unknown                  NaN         NaN
4  2009-01-12 06:02:15  732a865bf5414eab2      1   15.0      Paul  2010-02-23 14:01:23   Unknown                  NaN         NaN
5  2009-01-12 06:12:16  591e911da64588073      1   29.0      John  2009-01-12 06:34:22   Unknown  2009-01-12 06:12:16        John
6  2009-01-12 06:34:22  12b52732a85c191ba      1   28.0      Sara  2009-01-12 20:04:20   Unknown  2009-01-12 06:34:22        John

The new sender column can now be obtained by merging sender with recipient_y:

df['sender'] = df['recipient_y'].combine_first(df['sender'])
df = df.drop(columns=['spending_time_y', 'recipient_y'])

Result:

                  time   transaction_hash  index  value recipient        spending_time    sender
0  2009-01-09 03:54:39  a0241106ee5a597c9      0   50.0      Paul  2009-01-12 03:30:25  Coinbase
1  2009-01-12 03:30:25  1338530e9831e9e16      1   40.0        Me  2009-01-12 06:02:13      Paul
2  2009-01-12 06:02:13  546c4fb42b41e14be      1   30.0      John  2009-01-12 06:12:16        Me
3  2009-01-12 06:02:14  eaf0775ebca408f7r      1   17.0      Paul  2010-09-22 06:06:02   Unknown
4  2009-01-12 06:02:15  732a865bf5414eab2      1   15.0      Paul  2010-02-23 14:01:23   Unknown
5  2009-01-12 06:12:16  591e911da64588073      1   29.0      John  2009-01-12 06:34:22      John
6  2009-01-12 06:34:22  12b52732a85c191ba      1   28.0      Sara  2009-01-12 20:04:20      John
  • Related