I'm trying to find a way to merge in multiple columns at the same time with Pandas. I have the output I want by doing five separate merges, but it feels like there should be a more pythonic way to do it.
Essentially I have a dataframe with five keyword columns in a dataframe called df_striking which I'm trying to merge in search volume data from another dataframe (called df_keyword_vol) into adjacent rows.
Minimum Reproducible Example:
import pandas as pd
striking_data = {
"KW1": ["nectarine", "apricot", "plum"],
"KW1 Vol": ["", "", ""],
"KW2": ["apple", "orange", "pear"],
"KW2 Vol": ["", "", ""],
"KW3": ["banana", "grapefruit", "cherry"],
"KW3 Vol": ["", "", ""],
"KW4": ["kiwi", "lemon", "peach"],
"KW4 Vol": ["", "", ""],
"KW5": ["raspberry", "blueberry", "berries"],
"KW5 Vol": ["", "", ""],
}
df_striking = pd.DataFrame(striking_data)
keyword_vol_data = {
"Keyword": [
"nectarine",
"apricot",
"plum",
"apple",
"orange",
"pear",
"banana",
"grapefruit",
"cherry",
"kiwi",
"lemon",
"peach",
"raspberry",
"blueberry",
"berries",
],
"Volume": [
1000,
500,
200,
600,
800,
1000,
450,
10,
900,
1200,
150,
700,
400,
850,
1000,
],
}
df_keyword_vol = pd.DataFrame(keyword_vol_data)
Desired Output
What I've tried. I've made two functions to merge the keyword data a row a time, but it's just not very pythonic!
# two functions to merge in the keyword volume data for KWs 1 - 5
def merger(col1, col2):
dx = df_striking.merge(df_keyword_vol, how='left', left_on=col1, right_on=col2)
return dx
def volume(vol1, vol2):
vol = df_striking[vol1] = df_striking[vol2]
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
return vol
df_striking = merger("KW1", "Keyword")
volume("KW1 Vol", "Volume")
df_striking = merger("KW2", "Keyword")
volume("KW2 Vol", "Volume")
df_striking = merger("KW3", "Keyword")
volume("KW3 Vol", "Volume")
df_striking = merger("KW4", "Keyword")
volume("KW4 Vol", "Volume")
df_striking = merger("KW5", "Keyword")
volume("KW5 Vol", "Volume")
CodePudding user response:
If you already have the empty columns, you can use:
mapping = df_keyword_vol.set_index('Keyword')['Volume']
df_striking.iloc[:, 1::2] = df_striking.iloc[:, ::2].replace(mapping)
Else, if you only have the KWx
columns:
df2 = (pd.concat([df, df.replace(mapping)], axis=1)
.sort_index(axis=1)
)
output:
KW1 KW1 KW2 KW2 KW3 KW3 KW4 KW4 KW5 KW5
0 nectarine 1000 apple 600 banana 450 kiwi 1200 raspberry 400
1 apricot 500 orange 800 grapefruit 10 lemon 150 blueberry 850
2 plum 200 pear 1000 cherry 900 peach 700 berries 1000
CodePudding user response:
It’s easier if you transform it all to a long format:
>>> striking = df_striking.filter(regex='KW[0-9]*$').stack().rename('Keyword').reset_index()
>>> joined = striking.merge(df_keyword_vol)
>>> joined
level_0 level_1 Keyword Volume
0 0 KW1 nectarine 1000
1 0 KW2 apple 600
2 0 KW3 banana 450
3 0 KW4 kiwi 1200
4 0 KW5 raspberry 400
5 1 KW1 apricot 500
6 1 KW2 orange 800
7 1 KW3 grapefruit 10
8 1 KW4 lemon 150
9 1 KW5 blueberry 850
10 2 KW1 plum 200
11 2 KW2 pear 1000
12 2 KW3 cherry 900
13 2 KW4 peach 700
14 2 KW5 berries 1000
Then you can get the original format with .pivot
, but with a multi-index as columns:
>>> joined.pivot('index', 'level_1', ['Keyword', 'Volume'])
Keyword Volume
level_1 KW1 KW2 KW3 KW4 KW5 KW1 KW2 KW3 KW4 KW5
index
0 nectarine apple banana kiwi raspberry 1000 600 450 1200 400
1 apricot orange grapefruit lemon blueberry 500 800 10 150 850
2 plum pear cherry peach berries 200 1000 900 700 1000
We can get around that weird format with a pd.concat
:
>>> pd.concat([
... joined.pivot('index', 'level_1', 'Keyword'),
... joined.pivot('index', 'level_1', 'Volume').add_suffix(' Vol')
... ], axis='columns').sort_index(axis='columns')
level_1 KW1 KW1 Vol KW2 KW2 Vol KW3 KW3 Vol KW4 KW4 Vol KW5 KW5 Vol
index
0 nectarine 1000 apple 600 banana 450 kiwi 1200 raspberry 400
1 apricot 500 orange 800 grapefruit 10 lemon 150 blueberry 850
2 plum 200 pear 1000 cherry 900 peach 700 berries 1000