Home > Software engineering >  Replace dataframe rows with identical rows from another dataframe on a column value
Replace dataframe rows with identical rows from another dataframe on a column value

Time:10-16

I have a dataframe data in which I took a subset of it g2_data to perform some operations on. How would I go about replacing values in the original dataframe with values from the subset, using values from one of the columns as the reference?

The column structure from data is retained in the subset g2_data shown below.

data:
         idx  group   x1   y1
0         27      1  0.0  0.0
1         28      1  0.0  0.0
2         29      1  0.0  0.0
3         73      1  0.0  0.0
4         74      1  0.0  0.0
     ...    ...  ...  ...
14612  14674      8  0.0  0.0
14613  14697      8  0.0  0.0
14614  14698      8  0.0  0.0
14615  14721      8  0.0  0.0
14616  14722      8  0.0  0.0

[14617 rows x 4 columns]
g2_data:
       idx  group     x1     y1
1125  1227      2  115.0    0.0
1126  1228      2    0.0  220.0
1127  1260      2    0.0    0.0
1128  1294      2    0.0    0.0
1129  1295      2    0.0    0.0
   ...    ...    ...    ...
3269  3277      2    0.0    0.0
3270  3308      2    0.0    0.0
3271  3309      2    0.0    0.0
3272  3342      2    0.0    0.0
3273  3343      2    0.0    0.0

[2149 rows x 4 columns]

Replace rows in Dataframe using index from another Dataframe has an answer to do this using the index values of the rows, but I would like to do it using the values from the idx column incase I need to reset the index in the subset later on (i.e. starting from 0 instead of using the index values from the original dataframe). It is important to note that the values in the idx column are all unique as they pertain to info about each observation.

CodePudding user response:

This probably isn't optimal, but you can convert g2_data to a dictionary and then map the other columns based on idx, filtering the update to just those ids in the g2_data subset.

g2_data_dict = g2_data.set_index('idx').to_dict()
g2_data_ids = g2_data['idx'].to_list()

for k in g2_data_dict.keys():
    data.loc[df['idx'].isin(g2_data_ids), k] = data['idx'].map(g2_data_dict[k])

CodePudding user response:

Use combine_first:

out = g2_data.set_index('idx').combine_first(data.set_index('idx')).reset_index()
  • Related