I have a dataframe of this type:
Time Copy_from_Time Rest_of_data
0 1 1 foo1
1 2 1 foo2
2 3 3 foo3
3 4 4 foo4
4 5 4 foo5
5 6 4 foo6
I want to update "Rest_of_data" with the data from the row whose Time equals "Copy_from_Time". So it would look like:
Time Copy_from_Time Rest_of_data
0 1 1 foo1
1 2 1 foo1
2 3 3 foo3
3 4 4 foo4
4 5 4 foo4
5 6 4 foo4
I can do it with iterrows(), but it is very slow. Is there a faster way with indexing tricks and maybe map()?
(The real example has Time, Time2, Copy_from_Time and Copy_from_Time2, so I would need to match on several fields, but I guess it would be easy to adapt.)
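For reference, a row-by-row version of the update might look like the sketch below. It is a hypothetical reconstruction, not the asker's actual iterrows() code; it rebuilds the sample frame from the table above and reads from an unchanged copy so that earlier overwrites cannot leak into later lookups. Each lookup scans the whole Time column, so the loop does quadratic work on top of the per-row Python overhead, which is why it gets slow on larger frames.

```python
import pandas as pd

# Sample frame from the question (values copied from the table above).
df = pd.DataFrame({
    "Time": [1, 2, 3, 4, 5, 6],
    "Copy_from_Time": [1, 1, 3, 4, 4, 4],
    "Rest_of_data": ["foo1", "foo2", "foo3", "foo4", "foo5", "foo6"],
})

# Hypothetical row-by-row update (not necessarily the asker's exact code).
# Read from an unchanged copy so earlier overwrites cannot affect later lookups.
original = df.copy()
for idx, row in original.iterrows():
    # Scan the whole Time column for the row to copy from (slow: O(n) per row).
    source = original.loc[original["Time"] == row["Copy_from_Time"], "Rest_of_data"]
    df.at[idx, "Rest_of_data"] = source.iloc[0]

print(df["Rest_of_data"].tolist())
# ['foo1', 'foo1', 'foo3', 'foo4', 'foo4', 'foo4']
```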
Answer:
You could try as follows:
df['Rest_of_data'] = df.groupby('Copy_from_Time')['Rest_of_data'].transform('first')
Time Copy_from_Time Rest_of_data
0 1 1 foo1
1 2 1 foo1
2 3 3 foo3
3 4 4 foo4
4 5 4 foo4
5 6 4 foo4
To match on multiple columns, group by all of the Copy_from columns, e.g. df.groupby(['Copy_from_Time', 'Copy_from_Time2']).
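The groupby approach returns the first Rest_of_data value of each Copy_from_Time group. That matches the expected output here only because, in the sample data, the first row of each group is also the row being copied from. A minimal sketch with made-up data where this is not the case, compared against a direct lookup by Time:

```python
import pandas as pd

# Hypothetical data: Time 1 copies from Time 2, so the source row for
# Copy_from_Time == 2 is the *second* row of its group, not the first.
df = pd.DataFrame({
    "Time": [1, 2, 3],
    "Copy_from_Time": [2, 2, 3],
    "Rest_of_data": ["foo1", "foo2", "foo3"],
})

# groupby/transform('first'): takes whichever row comes first in each group.
by_group = df.groupby("Copy_from_Time")["Rest_of_data"].transform("first")

# Direct lookup: takes the row whose Time equals Copy_from_Time.
by_lookup = df["Copy_from_Time"].map(df.set_index("Time")["Rest_of_data"])

print(by_group.tolist())   # ['foo1', 'foo1', 'foo3']
print(by_lookup.tolist())  # ['foo2', 'foo2', 'foo3']
```

If the source row is not guaranteed to come first in its group, a lookup keyed on Time is the safer choice.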
Answer:
Use map() to update the values in the Rest_of_data column, looking up each Copy_from_Time in a Series indexed by Time (the Time values must be unique):
df['Rest_of_data'] = df['Copy_from_Time'].map(df.set_index('Time')['Rest_of_data'])
df
Time Copy_from_Time Rest_of_data
0 1 1 foo1
1 2 1 foo1
2 3 3 foo3
3 4 4 foo4
4 5 4 foo4
5 6 4 foo4
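For the two-key case mentioned in the question, one option is to extend the same lookup idea to a MultiIndex. This sketch swaps map() for Series.reindex, which accepts a MultiIndex of target keys; the Time2 and Copy_from_Time2 values are invented for illustration, and the (Time, Time2) pairs are assumed to be unique.

```python
import pandas as pd

# Hypothetical two-key version of the question's frame; the Time2 and
# Copy_from_Time2 values are made up for illustration.
df = pd.DataFrame({
    "Time": [1, 1, 2, 2],
    "Time2": [10, 20, 10, 20],
    "Copy_from_Time": [1, 1, 1, 2],
    "Copy_from_Time2": [10, 10, 20, 20],
    "Rest_of_data": ["foo1", "foo2", "foo3", "foo4"],
})

# Source values indexed by the (Time, Time2) pair; the pairs must be unique.
source = df.set_index(["Time", "Time2"])["Rest_of_data"]

# Target keys: one (Copy_from_Time, Copy_from_Time2) pair per row.
keys = pd.MultiIndex.from_arrays(
    [df["Copy_from_Time"], df["Copy_from_Time2"]], names=["Time", "Time2"]
)

# reindex looks up every pair at once (NaN where a pair has no source row);
# .to_numpy() drops the MultiIndex so values line up with df's rows by position.
df["Rest_of_data"] = source.reindex(keys).to_numpy()

print(df["Rest_of_data"].tolist())
# ['foo1', 'foo1', 'foo2', 'foo4']
```

Because source is built before the assignment, every row reads the original values, so chained copies are not applied transitively.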