How I can pair columns of 2 data frames with the following assumption:
If the first 2 letters of col1
in df1
is the same as the first 2 leeters of col2
in df2
match.
df1:
col1
'summer'
'good heather'
'sooner come'
'keep up great work'
'really'
df2:
col2
'nice to be'
'good'
'remember'
'recall me'
output
col1 col2
'good heather' 'good'
'really' 'remember'
'really' 'recall me'
I have a big data set and looking for efficient method
CodePudding user response:
import pandas as pd
col1 = pd.DataFrame(
['summer',
'good heather',
'sooner come',
'keep up great work',
'really'])
col2 = pd.DataFrame([
'nice to be',
'good',
'remember',
'recall me'
])
col1["first"] = col1[0].str[:2]
col2["first"] = col2[0].str[:2]
res = pd.merge(col1, col2, how='cross')
res[res['first_x'] == res['first_y']][["0_x", "0_y"]].reset_index(drop=True)
output:
0_x 0_y
1 good heather good
2 really remember
3 really recall me