With the following 'mask' DataFrame:
>>> mask
city value_1 value_2
index
0 London 10 NaN
1 Paris NaN 21
2 Paris 30 NaN
3 Paris NaN NaN
4 Berlin 3 5
5 Berlin NaN 10
6 New York NaN NaN
and the following 'filling' frame:
>>> filling
value_1 value_2
London 1100 2100
Paris 1150 2200
Berlin NaN 3000
New York 5000 NaN
How to merge filling
into mask
based on city
AND columns so that the resulting DataFrame becomes:
>>> result
city value_1 value_2
index
0 London 10 2100
1 Paris 1150 21
2 Paris 30 2200
3 Paris 1150 2200
4 Berlin 3 5
5 Berlin NaN 10
6 New York 5000 NaN
Conceptually, any value from mask
that is NaN
is susceptible of being "filled" by the value of filling
which matches both its city
and its column (value_1
or value_2
).
The part I'm struggling with is to make DataFrame.merge()
take into account both an index (here, city
) and all columns. Either is fine, but to obtain the expected result, it looks like I would need both.
EDIT:
I've tried the following:
>>> expanded = mask[[]].join(filling, on='city')
>>> mask.merge(expanded)
But this just gives me back mask
, and all values from expanded
are simply ignored (even if the target cell is NaN
).
CodePudding user response:
Try fillna
with reindex
of filling
:
mask.fillna(filling.reindex(mask.city).set_index(mask.index))
Output:
city value_1 value_2
index
0 London 10.0 2100.0
1 Paris 1150.0 21.0
2 Paris 30.0 2200.0
3 Paris 1150.0 2200.0
4 Berlin 3.0 5.0
5 Berlin NaN 10.0
6 New York 5000.0 NaN
CodePudding user response:
We can use DataFrame.update
with overwrite=False
for this.
Note: we seperate the methods in different lines, since update is inplace.
mask = mask.set_index("city")
mask.update(filling, overwrite=False)
mask = mask.reset_index()
city value_1 value_2
0 London 10.0 2100.0
1 Paris 1150.0 21.0
2 Paris 30.0 2200.0
3 Paris 1150.0 2200.0
4 Berlin 3.0 5.0
5 Berlin NaN 10.0
6 New York 5000.0 NaN
CodePudding user response:
You can also use combine_first
which is designed for this purpose:
print (mask.set_index("city").combine_first(filling))
value_1 value_2
Berlin 3.0 5.0
Berlin NaN 10.0
London 10.0 2100.0
New York 5000.0 NaN
Paris 1150.0 21.0
Paris 30.0 2200.0
Paris 1150.0 2200.0
If you need to retain the original order, do a reset_index
first and sort on it later.