Comparind Pandas column to evaluate if they contains the same number or set of numbers-CodePudding

I have a dataset with two address information that I need to compare to evaluate if they contains the same number or set of numbers.

This is my dataset

data = [['Road 76', 'Road 12, 55'], ['Road 11, 7-9', 'Road 11, 5'], ['Road 25', 'Street 5']]
df_original = pd.DataFrame( data, columns = ['Address 1', 'Address 2'])

This is the outcome

test_data  = [['Road 76', 'Road 12, 55', 0], ['7-9, Road 11', 'Road 11, 5', 1], ['Road 5', 'Street 25', 0]]
df_outcome = pd.DataFrame(test_data,  columns = ['Address 1', 'Address 2', 'Number Match?'])
df_outcome

This is my attempt, but it only considers the first number appearing in the column

df_original['Address 1'] = df_original['Address 1'].str.extract('(\d )')
df_original['Address 2'] = df_original['Address 2'].str.extract('(\d )')
df_original['Number match'] = np.where(df_original['Address 1']==df_original['Address 2'], 1, 0)

Suggestions?

CodePudding user response：

First get all integers by Series.str.findall, convert values to sets and for intersection use &, last convert to boolean for mapping True->1, False->0:

df_original['Address 1'] = df_original['Address 1'].str.findall('(\d )').apply(set)
df_original['Address 2'] = df_original['Address 2'].str.findall('(\d )').apply(set)

df_original['Number match'] = (df_original['Address 1'] & df_original['Address 2']).astype(int)

print (df_original)
    Address 1 Address 2  Number match
0        {76}  {55, 12}             0
1  {9, 7, 11}   {5, 11}             1
2        {25}       {5}             0