I have a dataframe data
in which I took a subset of it g2_data
to perform some operations on. How would I go about replacing values in the original dataframe with values from the subset, using values from one of the columns as the reference?
The column structure from data
is retained in the subset g2_data
shown below.
data:
idx group x1 y1
0 27 1 0.0 0.0
1 28 1 0.0 0.0
2 29 1 0.0 0.0
3 73 1 0.0 0.0
4 74 1 0.0 0.0
... ... ... ...
14612 14674 8 0.0 0.0
14613 14697 8 0.0 0.0
14614 14698 8 0.0 0.0
14615 14721 8 0.0 0.0
14616 14722 8 0.0 0.0
[14617 rows x 4 columns]
g2_data:
idx group x1 y1
1125 1227 2 115.0 0.0
1126 1228 2 0.0 220.0
1127 1260 2 0.0 0.0
1128 1294 2 0.0 0.0
1129 1295 2 0.0 0.0
... ... ... ...
3269 3277 2 0.0 0.0
3270 3308 2 0.0 0.0
3271 3309 2 0.0 0.0
3272 3342 2 0.0 0.0
3273 3343 2 0.0 0.0
[2149 rows x 4 columns]
Replace rows in Dataframe using index from another Dataframe has an answer to do this using the index values of the rows, but I would like to do it using the values from the idx
column incase I need to reset the index in the subset later on (i.e. starting from 0 instead of using the index values from the original dataframe). It is important to note that the values in the idx
column are all unique as they pertain to info about each observation.
CodePudding user response:
This probably isn't optimal, but you can convert g2_data
to a dictionary and then map the other columns based on idx
, filtering the update to just those ids in the g2_data subset.
g2_data_dict = g2_data.set_index('idx').to_dict()
g2_data_ids = g2_data['idx'].to_list()
for k in g2_data_dict.keys():
data.loc[df['idx'].isin(g2_data_ids), k] = data['idx'].map(g2_data_dict[k])
CodePudding user response:
Use combine_first
:
out = g2_data.set_index('idx').combine_first(data.set_index('idx')).reset_index()