Home > Back-end >  merge two dataframes and create/fill non-existing rows from other one
merge two dataframes and create/fill non-existing rows from other one

Time:10-20

Lets say we have two dataframe called df which is the data frame that we want modification and a reference data frame as the data frame should look like;

but df is sometimes partially missing some rows from the reference dataframe

import pandas as pd



df = pd.DataFrame({'gr1f': ['A','A','B','A','B','A','B'],
                   'gr2p': ['CC','CC','CC','CC','CC','CC','CC'],

                   'gr3a': ['AL','AL','AL','DEL','DEL','DEL','DEL'],

                   'DP': [ 'l1yrs', 'l10yrs','l10yrs','l1yrs','l1yrs','l10yrs','l10yrs'],

                    'num_vals1' :[
                                1.138044999,
                                1.221786568,
                                1.7,
                                1.159030763,
                                1.2,
                                1.5,
                                1.238341765 ],
                    'num_vals2' :[
                                  0.166637328,
                                  0.04513741,
                                  0.12,
                                  0.141340498,
                                  0.87,
                                  0.033441602,
                                  0.01233441602]})
df

enter image description here

reference_df = pd.DataFrame({
                   'gr2p': ['CC','CC','CC','CC','DD','DD','DD','DD'],

                   'gr3a': ['AL','AL','AL','AL','DEL','DEL','DEL','DEL'],

                   'DP': ['l1yrs', 'l1yrs', 'l10yrs','l10yrs','l1yrs','l1yrs','l10yrs','l10yrs']})

reference_df

enter image description here

to merge this two data frame I used right join enter image description here

The expected output that I need to have

   gr1f gr2p    gr3a    DP     num_vals1    num_vals2   source
0   A   CC       AL    l10yrs   1.221787    0.045137    df
1   B   CC       AL    l10yrs   1.700000    0.120000    df
2   A   CC       AL    l1yrs    1.138045    0.166637    df
3   B   CC       AL    l1yrs    NaN         NaN         reference_df 
4   A   CC       DEL   l10yrs   1.50000     0.033441602 df
5   B   CC       DEL   l10yrs   1.238342    0.012334    df
6   A   CC       DEL   l1yrs    1.15903     0.1413404   df
7   B   CC       DEL   l1yrs    1.20000     0.87        df
8   A   DD       DEL   l10yrs   NaN         NaN         reference_df 
9   B   DD       DEL   l10yrs   NaN         NaN         reference_df
10  A   DD       DEL   l1yrs    NaN         NaN         reference_df
11  B   DD       DEL   l1yrs    NaN         NaN         reference_df

how can we achieve this final table with filling missing rows gr1f with A and B combinations and DP with 'l10yrs, l1yrs' combination. I think the expected output is showing the pattern so its a good guide.

CodePudding user response:

There's indicator option you can use. Also, you're not looking for right merge, you are looking for outer merge:

df1 = (df.merge(reference_df[['gr2p','gr3a','DP']]
                .assign(gr1f=['A','B']*(len(reference_df)//2)),
                on = ['gr1f','gr2p','gr3a','DP'], 
                how='outer', indicator=True)
        .sort_values(['gr2p','gr3a'])
        .assign(source=lambda x: x._merge.map({'both':'df', 'right_only':'ref', 'left_only':'df'}))
        .reset_index(drop=True)
      )

Output:

   gr1f gr2p gr3a      DP  num_vals1  num_vals2      _merge source
0     A   CC   AL   l1yrs   1.138045   0.166637        both     df
1     A   CC   AL  l10yrs   1.221787   0.045137        both     df
2     B   CC   AL  l10yrs   1.700000   0.120000        both     df
3     B   CC   AL   l1yrs        NaN        NaN  right_only    ref
4     A   CC  DEL   l1yrs   1.159031   0.141340   left_only     df
5     B   CC  DEL   l1yrs   1.200000   0.870000   left_only     df
6     A   CC  DEL  l10yrs   1.500000   0.033442   left_only     df
7     B   CC  DEL  l10yrs   1.238342   0.012334   left_only     df
8     A   DD  DEL   l1yrs        NaN        NaN  right_only    ref
9     B   DD  DEL   l1yrs        NaN        NaN  right_only    ref
10    A   DD  DEL  l10yrs        NaN        NaN  right_only    ref
11    B   DD  DEL  l10yrs        NaN        NaN  right_only    ref
  • Related