How to replace a DataFrame's rows with other rows based on column values?

I have a dataframe of this type:

      Time   Copy_from_Time Rest_of_data
0     1               1         foo1
1     2               1         foo2
2     3               3         foo3
3     4               4         foo4
4     5               4         foo5
5     6               4         foo6

I want to update "Rest_of_data" with the data from the row whose Time equals "Copy_from_Time". The result would look like:

      Time   Copy_from_Time Rest_of_data
0     1               1         foo1
1     2               1         foo1
2     3               3         foo3
3     4               4         foo4
4     5               4         foo4
5     6               4         foo4

I can do it with iterrows(), but it is very slow. Is there a faster way, perhaps with indexing tricks and map()?
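The question's iterrows() code isn't shown, so the loop below is only a guess at what such a version might look like. It builds the example frame and serves as a reference result to check faster approaches against. Because it filters the whole frame once per row, it is O(n^2), which is why it gets slow on large data.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "Time": [1, 2, 3, 4, 5, 6],
    "Copy_from_Time": [1, 1, 3, 4, 4, 4],
    "Rest_of_data": ["foo1", "foo2", "foo3", "foo4", "foo5", "foo6"],
})

# Hypothetical row-by-row version: for each row, find the row whose Time
# equals this row's Copy_from_Time and take its Rest_of_data.
# New values are collected first so the frame is not modified while iterating.
new_values = []
for _, row in df.iterrows():
    match = df.loc[df["Time"] == row["Copy_from_Time"], "Rest_of_data"]
    new_values.append(match.iloc[0])  # raises IndexError if no Time matches

df["Rest_of_data"] = new_values
print(df)
```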

(The real data has Time, Time2, Copy_from_Time and Copy_from_Time2, so I would need to match on several fields, but I guess it would be easy to adapt.)

Answer:

You could try the following, which takes the first Rest_of_data within each Copy_from_Time group:

df['Rest_of_data'] = df.groupby('Copy_from_Time')['Rest_of_data'].transform('first')

   Time  Copy_from_Time Rest_of_data
0     1               1         foo1
1     2               1         foo1
2     3               3         foo3
3     4               4         foo4
4     5               4         foo4
5     6               4         foo4

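Note that this only gives the right result when the first row of each Copy_from_Time group is the row whose Time equals Copy_from_Time, as it is in the sample data. To match on multiple columns, group by all of the "copy from" columns, e.g. df.groupby(['Copy_from_Time', 'Copy_from_Time2']).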
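The groupby/transform('first') approach picks the first row in each Copy_from_Time group, not the row whose Time equals Copy_from_Time. The small hypothetical frame below (not from the question) has the row with Time 2 copying from Time 3, and compares the groupby result with a direct lookup by Time (the map approach from the other answer):

```python
import pandas as pd

# Hypothetical frame: row Time=2 should copy from Time=3
df = pd.DataFrame({
    "Time": [1, 2, 3],
    "Copy_from_Time": [1, 3, 3],
    "Rest_of_data": ["foo1", "foo2", "foo3"],
})

# First Rest_of_data in each Copy_from_Time group (group 3 starts at Time=2)
by_group = df.groupby("Copy_from_Time")["Rest_of_data"].transform("first")

# Rest_of_data of the row whose Time equals Copy_from_Time
by_lookup = df["Copy_from_Time"].map(df.set_index("Time")["Rest_of_data"])

print(by_group.tolist())   # ['foo1', 'foo2', 'foo2']
print(by_lookup.tolist())  # ['foo1', 'foo3', 'foo3']
```

The two agree on the question's data only because each group happens to start with its source row.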

Answer:

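Use map to update the Rest_of_data column: build a Series of Rest_of_data indexed by Time, then look up each Copy_from_Time in it. This requires the Time values to be unique, and any Copy_from_Time with no matching Time becomes NaN.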

df['Rest_of_data'] = df['Copy_from_Time'].map(df.set_index('Time')['Rest_of_data'])
df

    Time    Copy_from_Time  Rest_of_data
0      1                 1          foo1
1      2                 1          foo1
2      3                 3          foo3
3      4                 4          foo4
4      5                 4          foo4
5      6                 4          foo4
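For the multi-column case mentioned in the question (Time, Time2, Copy_from_Time, Copy_from_Time2), one option is a left merge on a renamed copy of the key columns. This is a minimal sketch with made-up values; the column names follow the question, but the data is hypothetical. validate="many_to_one" makes pandas raise if a (Time, Time2) pair appears more than once, since duplicates would otherwise multiply rows.

```python
import pandas as pd

# Hypothetical data with two key columns
df = pd.DataFrame({
    "Time": [1, 1, 2, 2],
    "Time2": [10, 20, 10, 20],
    "Copy_from_Time": [1, 1, 1, 2],
    "Copy_from_Time2": [10, 10, 20, 20],
    "Rest_of_data": ["a", "b", "c", "d"],
})

# Lookup table keyed by (Time, Time2), renamed to join on the Copy_from_* columns
lookup = df[["Time", "Time2", "Rest_of_data"]].rename(
    columns={"Time": "Copy_from_Time", "Time2": "Copy_from_Time2"}
)

# Drop the old Rest_of_data so the merged column replaces it without suffixes;
# how="left" keeps the row order of df (the index is reset to 0..n-1)
result = df.drop(columns="Rest_of_data").merge(
    lookup,
    on=["Copy_from_Time", "Copy_from_Time2"],
    how="left",
    validate="many_to_one",
)
print(result)
```

Rows whose (Copy_from_Time, Copy_from_Time2) pair has no matching (Time, Time2) get NaN in Rest_of_data, as with map.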