Home > database >  Pandas Merging Multiple Columns at the Same Between Two Dataframes
Pandas Merging Multiple Columns at the Same Between Two Dataframes

Time:09-29

I'm trying to find a way to merge in multiple columns at the same time with Pandas. I have the output I want by doing five separate merges, but it feels like there should be a more pythonic way to do it.

Essentially I have a dataframe with five keyword columns in a dataframe called df_striking which I'm trying to merge in search volume data from another dataframe (called df_keyword_vol) into adjacent rows.

Minimum Reproducible Example:

import pandas as pd

striking_data = {
    "KW1": ["nectarine", "apricot", "plum"],
    "KW1 Vol": ["", "", ""],
    "KW2": ["apple", "orange", "pear"],
    "KW2 Vol": ["", "", ""],
    "KW3": ["banana", "grapefruit", "cherry"],
    "KW3 Vol": ["", "", ""],
    "KW4": ["kiwi", "lemon", "peach"],
    "KW4 Vol": ["", "", ""],
    "KW5": ["raspberry", "blueberry", "berries"],
    "KW5 Vol": ["", "", ""],
}

df_striking = pd.DataFrame(striking_data)

keyword_vol_data = {
    "Keyword": [
        "nectarine",
        "apricot",
        "plum",
        "apple",
        "orange",
        "pear",
        "banana",
        "grapefruit",
        "cherry",
        "kiwi",
        "lemon",
        "peach",
        "raspberry",
        "blueberry",
        "berries",
    ],
    "Volume": [
        1000,
        500,
        200,
        600,
        800,
        1000,
        450,
        10,
        900,
        1200,
        150,
        700,
        400,
        850,
        1000,
    ],
}

df_keyword_vol = pd.DataFrame(keyword_vol_data)

Desired Output

enter image description here

What I've tried. I've made two functions to merge the keyword data a row a time, but it's just not very pythonic!

# two functions to merge in the keyword volume data for KWs 1 - 5
def merger(col1, col2):
    dx = df_striking.merge(df_keyword_vol, how='left', left_on=col1, right_on=col2)
    return dx

def volume(vol1, vol2):
    vol = df_striking[vol1] = df_striking[vol2]
    df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
    return vol

df_striking = merger("KW1", "Keyword")
volume("KW1 Vol", "Volume")
df_striking = merger("KW2", "Keyword")
volume("KW2 Vol", "Volume")
df_striking = merger("KW3", "Keyword")
volume("KW3 Vol", "Volume")
df_striking = merger("KW4", "Keyword")
volume("KW4 Vol", "Volume")
df_striking = merger("KW5", "Keyword")
volume("KW5 Vol", "Volume")

CodePudding user response:

If you already have the empty columns, you can use:

mapping = df_keyword_vol.set_index('Keyword')['Volume']

df_striking.iloc[:, 1::2] = df_striking.iloc[:, ::2].replace(mapping)

Else, if you only have the KWx columns:

df2 = (pd.concat([df, df.replace(mapping)], axis=1)
         .sort_index(axis=1)
       )

output:

         KW1   KW1     KW2   KW2         KW3  KW3    KW4   KW4        KW5   KW5
0  nectarine  1000   apple   600      banana  450   kiwi  1200  raspberry   400
1    apricot   500  orange   800  grapefruit   10  lemon   150  blueberry   850
2       plum   200    pear  1000      cherry  900  peach   700    berries  1000

CodePudding user response:

It’s easier if you transform it all to a long format:

>>> striking = df_striking.filter(regex='KW[0-9]*$').stack().rename('Keyword').reset_index()
>>> joined = striking.merge(df_keyword_vol)
>>> joined
  level_0 level_1     Keyword  Volume
0       0     KW1   nectarine    1000
1       0     KW2       apple     600
2       0     KW3      banana     450
3       0     KW4        kiwi    1200
4       0     KW5   raspberry     400
5       1     KW1     apricot     500
6       1     KW2      orange     800
7       1     KW3  grapefruit      10
8       1     KW4       lemon     150
9       1     KW5   blueberry     850
10      2     KW1        plum     200
11      2     KW2        pear    1000
12      2     KW3      cherry     900
13      2     KW4       peach     700
14      2     KW5     berries    1000

Then you can get the original format with .pivot, but with a multi-index as columns:

>>> joined.pivot('index', 'level_1', ['Keyword', 'Volume'])
           Keyword                                       Volume                       
level_1        KW1     KW2         KW3    KW4        KW5    KW1   KW2  KW3   KW4   KW5
index                                                                                 
0        nectarine   apple      banana   kiwi  raspberry   1000   600  450  1200   400
1          apricot  orange  grapefruit  lemon  blueberry    500   800   10   150   850
2             plum    pear      cherry  peach    berries    200  1000  900   700  1000

We can get around that weird format with a pd.concat:

>>> pd.concat([
...     joined.pivot('index', 'level_1', 'Keyword'),
...     joined.pivot('index', 'level_1', 'Volume').add_suffix(' Vol')
... ], axis='columns').sort_index(axis='columns')
level_1        KW1  KW1 Vol     KW2  KW2 Vol         KW3  KW3 Vol    KW4  KW4 Vol        KW5  KW5 Vol
index                                                                                                
0        nectarine     1000   apple      600      banana      450   kiwi     1200  raspberry      400
1          apricot      500  orange      800  grapefruit       10  lemon      150  blueberry      850
2             plum      200    pear     1000      cherry      900  peach      700    berries     1000
  • Related