Home > Mobile >  how match 2 columns of two dataframe based on their 2 first letter?
how match 2 columns of two dataframe based on their 2 first letter?

Time:07-23

How I can pair columns of 2 data frames with the following assumption:

If the first 2 letters of col1 in df1 is the same as the first 2 leeters of col2 in df2 match.

df1:

col1
'summer'
'good heather'
'sooner come'
'keep up great work'
'really'


df2:
col2
'nice to be'
'good'
'remember'
'recall me'

output

col1                col2
'good heather'     'good'
'really'          'remember'
'really'          'recall me'

I have a big data set and looking for efficient method

CodePudding user response:

import pandas as pd
col1 = pd.DataFrame(
    ['summer',
'good heather',
'sooner come',
'keep up great work',
'really'])

col2 = pd.DataFrame([
    'nice to be',
'good',
'remember',
'recall me'
])
col1["first"] = col1[0].str[:2]
col2["first"] = col2[0].str[:2]
res = pd.merge(col1, col2, how='cross')
res[res['first_x'] == res['first_y']][["0_x", "0_y"]].reset_index(drop=True)

output:

    0_x             0_y
1   good heather    good
2   really          remember
3   really          recall me
  • Related