I have two dataframes, which I try to merge.
The first one is the information about frequency of some entity (for example, keyword)
keyword freq
specific code 3
algorithm 2
problem 1
app 1
And the second dataframe with all the texts and their info
text_id keywords text
1 ['specific code', 'algorithm'] We're here to help you with specific coding, algorithm.
2 ['algorithm', 'problem'] You can use this algorithm to solve your problem.
3 ['specific code'] This specific code will help you.
4 ['specific code', 'app'] How to build a specific code for an app?
I want to merge them and get output dataframe
keyword freq text_id text
specific code 3 1 We're here to help you with specific coding, algorithm.
specific code 3 3 This specific code will help you.
specific code 3 4 How to build a specific code for an app?
algorithm 2 1 We're here to help you with specific coding, algorithm.
algorithm 2 2 You can use this algorithm to solve your problem.
problem 1 2 You can use this algorithm to solve your problem.
app 1 4 How to build a specific code for an app?
I have right now code but it's not elegant at all
keywords = []
freq_keywords = []
ids = []
texts = []
for kw, kw_freq in zip(df_1[kw].values, df_1['freq'].values):
for _, row in df_2.iterrows():
if kw in row['keywords']:
keywords.append(kw)
freq_keywords.append(kw_freq)
ids.append(row['text_id'])
texts.append(row['text'])
output_df = pd.DataFrame({
'keyword': keywords,
'freq': freq_keywords,
'text_id': ids,
'text': texts,
})
So is it possible to it more elegant?
I think about using merge based on occurrence of value in the column keyword
from the first dataframe df_1
in the column keywords
from the second dataframe df_2
, but I didn't find something for that.
CodePudding user response:
IIUC this should get you the results you are expecting
df1 = pd.DataFrame({
'text_id' : [1, 2, 3, 4],
'keywords' : [['specific code', 'algorithm'], ['algorithm', 'problem'], ['specific code'], ['specific code', 'app']],
'Text' : ["We're here to help you with specific coding, algorithm.", 'You can use this algorithm to solve your problem.', 'This specific code will help you.', 'How to build a specific code for an app?']
})
df_merge = pd.merge(df1.explode('keywords'), df, left_on = 'keywords', right_on = 'keyword')
df_merge = df_merge[['keyword', 'freq', 'text_id', 'Text']]
CodePudding user response:
Yes, you should use merge
after exploding the keywords
column of the first DataFrame
out = (
df_1.explode('keywords')
.rename(columns={'keywords': 'keyword'})
.merge(df_2, on='keyword')
)
print(out)
Output:
text_id | keyword | text | freq | |
---|---|---|---|---|
0 | 1 | specific code | We're here to help you with specific coding, algorithm. | 3 |
1 | 3 | specific code | This specific code will help you. | 3 |
2 | 4 | specific code | How to build a specific code for an app? | 3 |
3 | 1 | algorithm | We're here to help you with specific coding, algorithm. | 2 |
4 | 2 | algorithm | You can use this algorithm to solve your problem. | 2 |
5 | 2 | problem | You can use this algorithm to solve your problem. | 1 |
6 | 4 | app | How to build a specific code for an app? | 1 |