Home > OS >  Pandas: merge dataframes based on occurrence value in a list
Pandas: merge dataframes based on occurrence value in a list

Time:06-30

I have two dataframes, which I try to merge.

The first one is the information about frequency of some entity (for example, keyword)

 keyword            freq
 specific code      3
 algorithm          2
 problem            1
 app                1

And the second dataframe with all the texts and their info

text_id   keywords                          text
1         ['specific code', 'algorithm']    We're here to help you with specific coding, algorithm.
2         ['algorithm', 'problem']          You can use this algorithm to solve your problem.
3         ['specific code']                 This specific code will help you.
4         ['specific code', 'app']          How to build a specific code for an app?

I want to merge them and get output dataframe

keyword         freq   text_id    text
specific code   3      1          We're here to help you with specific coding, algorithm.
specific code   3      3          This specific code will help you.
specific code   3      4          How to build a specific code for an app?
algorithm       2      1          We're here to help you with specific coding, algorithm.
algorithm       2      2          You can use this algorithm to solve your problem.
problem         1      2          You can use this algorithm to solve your problem.
app             1      4          How to build a specific code for an app?

I have right now code but it's not elegant at all

keywords = []
freq_keywords = []
ids = []
texts = []

for kw, kw_freq in zip(df_1[kw].values, df_1['freq'].values):
    for _, row in df_2.iterrows():
        if kw in row['keywords']:
            keywords.append(kw)
            freq_keywords.append(kw_freq)
            ids.append(row['text_id'])
            texts.append(row['text'])

output_df = pd.DataFrame({
    'keyword': keywords,
    'freq': freq_keywords,
    'text_id': ids,
    'text': texts,
})

So is it possible to it more elegant? I think about using merge based on occurrence of value in the column keyword from the first dataframe df_1 in the column keywords from the second dataframe df_2, but I didn't find something for that.

CodePudding user response:

IIUC this should get you the results you are expecting

df1 = pd.DataFrame({
    'text_id' : [1, 2, 3, 4],
    'keywords' : [['specific code', 'algorithm'], ['algorithm', 'problem'], ['specific code'], ['specific code', 'app']],
    'Text' : ["We're here to help you with specific coding, algorithm.", 'You can use this algorithm to solve your problem.', 'This specific code will help you.', 'How to build a specific code for an app?']
})

df_merge = pd.merge(df1.explode('keywords'), df, left_on = 'keywords', right_on = 'keyword')
df_merge = df_merge[['keyword', 'freq', 'text_id', 'Text']]

CodePudding user response:

Yes, you should use merge after exploding the keywords column of the first DataFrame

out = (
    df_1.explode('keywords')
        .rename(columns={'keywords': 'keyword'})
        .merge(df_2, on='keyword')
)

print(out)

Output:

text_id keyword text freq
0 1 specific code We're here to help you with specific coding, algorithm. 3
1 3 specific code This specific code will help you. 3
2 4 specific code How to build a specific code for an app? 3
3 1 algorithm We're here to help you with specific coding, algorithm. 2
4 2 algorithm You can use this algorithm to solve your problem. 2
5 2 problem You can use this algorithm to solve your problem. 1
6 4 app How to build a specific code for an app? 1
  • Related