Home > Software engineering >  Merge a filling DataFrame into a mask, based on index and columns
Merge a filling DataFrame into a mask, based on index and columns

Time:10-29

With the following 'mask' DataFrame:

>>> mask
               city      value_1      value_2
index
0            London           10          NaN
1             Paris          NaN           21
2             Paris           30          NaN
3             Paris          NaN          NaN
4            Berlin            3            5
5            Berlin          NaN           10
6          New York          NaN          NaN

and the following 'filling' frame:

>>> filling
                value_1      value_2
London             1100         2100
Paris              1150         2200
Berlin              NaN         3000
New York           5000          NaN

How to merge filling into mask based on city AND columns so that the resulting DataFrame becomes:

>>> result
               city      value_1      value_2
index
0            London           10         2100
1             Paris         1150           21
2             Paris           30         2200
3             Paris         1150         2200
4            Berlin            3            5
5            Berlin          NaN           10
6          New York         5000          NaN

Conceptually, any value from mask that is NaN is susceptible of being "filled" by the value of filling which matches both its city and its column (value_1 or value_2).

The part I'm struggling with is to make DataFrame.merge() take into account both an index (here, city) and all columns. Either is fine, but to obtain the expected result, it looks like I would need both.

EDIT:

I've tried the following:

>>> expanded = mask[[]].join(filling, on='city')
>>> mask.merge(expanded)

But this just gives me back mask, and all values from expanded are simply ignored (even if the target cell is NaN).

CodePudding user response:

Try fillna with reindex of filling:

mask.fillna(filling.reindex(mask.city).set_index(mask.index))

Output:

           city  value_1  value_2
index                            
0        London     10.0   2100.0
1         Paris   1150.0     21.0
2         Paris     30.0   2200.0
3         Paris   1150.0   2200.0
4        Berlin      3.0      5.0
5        Berlin      NaN     10.0
6      New York   5000.0      NaN

CodePudding user response:

We can use DataFrame.update with overwrite=False for this.

Note: we seperate the methods in different lines, since update is inplace.

mask = mask.set_index("city")
mask.update(filling, overwrite=False)
mask = mask.reset_index()

       city  value_1  value_2
0    London     10.0   2100.0
1     Paris   1150.0     21.0
2     Paris     30.0   2200.0
3     Paris   1150.0   2200.0
4    Berlin      3.0      5.0
5    Berlin      NaN     10.0
6  New York   5000.0      NaN

CodePudding user response:

You can also use combine_first which is designed for this purpose:

print (mask.set_index("city").combine_first(filling))

          value_1  value_2
Berlin        3.0      5.0
Berlin        NaN     10.0
London       10.0   2100.0
New York   5000.0      NaN
Paris      1150.0     21.0
Paris        30.0   2200.0
Paris      1150.0   2200.0

If you need to retain the original order, do a reset_index first and sort on it later.

  • Related