I'm trying to find all combinations of a substring found in a single column, and then explode the dataframe with all possible combinations of each word.
Example Dataframe
URL Keyword
0 http://www.amazon.com Amazon Lightning Sale
1 https://www.ebay.com Shop eBay Today
Desired Output
URL Keyword
0 http://www.amazon.com Amazon Lightning Sale
1 http://www.amazon.com Amazon Sale Lightning
2 http://www.amazon.com Lightning Amazon Sale
3 http://www.amazon.com Sale Amazon Lightning
4 http://www.amazon.com Sale Lightning Amazon
5 http://www.amazon.com Lightning Sale Amazon
6 https://www.ebay.com Shop eBay Today
7 https://www.ebay.com Shop Today eBay
8 https://www.ebay.com eBay Shop Today
9 https://www.ebay.com eBay Today Shop
10 https://www.ebay.com Today eBay Shop
11 https://www.ebay.com Today Shop eBay
Minimum Reproducable Example
import pandas as pd
# initialize data of lists.
data = {'URL': ['http://www.amazon.com', 'https://www.ebay.com'],
'Keyword': ["Amazon Lightning Sale", "Shop eBay Today"]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
I've tried the solution here: Pandas DataFrame Combinations and expand but it's not quite what I need.
CodePudding user response:
from itertools import permutations
df['Keyword'] = df['Keyword'].apply(lambda x: list(permutations(x.split())))
df.explode('Keyword', ignore_index=True)
First the itertools.permutations
method applied to the Keyword column will create all possible combinations of keywords as a list.
Next you can use the pandas.DataFrame.explode
function to create many items from the created lists.
If you really want a full string instead of a tuple of keywords, you can replace the list(...)
part with a string join: [" ".join(t) for t in permutations(x.split())]
CodePudding user response:
create an ID for each row
#create an id: df['ID'] = range(df.shape[0]) URL Keyword ID 0 http://www.amazon.com Amazon Lightning Sale 0 1 https://www.ebay.com Shop eBay Today 1
Create the combinations for each row as a new DataFrame, or dict or ...
import re import itertools def create_combinations(id, kw): # split the keywords by word: words = re.split('\W ', kw) return pd.DataFrame([ {'ID':id, 'Combinations' : ' '.join(x) } for x in itertools.permutations(words)]) # create the combinations data = [] for id, kw in zip(df.ID, df.Keyword): data.append( create_combinations(id, kw) )
Join
# join the data pd.merge(df, pd.concat(data), on = ['ID']) URL Keyword ID Combinations 0 http://www.amazon.com Amazon Lightning Sale 0 Amazon Lightning Sale 1 http://www.amazon.com Amazon Lightning Sale 0 Amazon Sale Lightning 2 http://www.amazon.com Amazon Lightning Sale 0 Lightning Amazon Sale 3 http://www.amazon.com Amazon Lightning Sale 0 Lightning Sale Amazon 4 http://www.amazon.com Amazon Lightning Sale 0 Sale Amazon Lightning 5 http://www.amazon.com Amazon Lightning Sale 0 Sale Lightning Amazon 6 https://www.ebay.com Shop eBay Today 1 Shop eBay Today 7 https://www.ebay.com Shop eBay Today 1 Shop Today eBay 8 https://www.ebay.com Shop eBay Today 1 eBay Shop Today 9 https://www.ebay.com Shop eBay Today 1 eBay Today Shop 10 https://www.ebay.com Shop eBay Today 1 Today Shop eBay 11 https://www.ebay.com Shop eBay Today 1 Today eBay Shop
Joost Döbken's answer is a bit more elegant
CodePudding user response:
Here is an alternative way without using itertools:
(df.assign(Keyword = df['Keyword'].str.split().map(lambda x: [[i,j,k] for i in x for j in x for k in x if len({i,j,k})==len(x)]))
.explode('Keyword')
.assign(Keyword = lambda x: x['Keyword'].str.join(' ')))
Output:
URL Keyword
0 http://www.amazon.com Amazon Lightning Sale
0 http://www.amazon.com Amazon Sale Lightning
0 http://www.amazon.com Lightning Amazon Sale
0 http://www.amazon.com Lightning Sale Amazon
0 http://www.amazon.com Sale Amazon Lightning
0 http://www.amazon.com Sale Lightning Amazon
1 https://www.ebay.com Shop eBay Today
1 https://www.ebay.com Shop Today eBay
1 https://www.ebay.com eBay Shop Today
1 https://www.ebay.com eBay Today Shop
1 https://www.ebay.com Today Shop eBay
1 https://www.ebay.com Today eBay Shop