The following code will tell me if there are partial matches (via the True values in the final column):
import pandas as pd
x = {'Non-Suffix' : ['1234567', '1234568', '1234569', '1234554'], 'Suffix' : ['1234567:C', '1234568:VXCF', '1234569-01', '1234554-01:XC']}
x = pd.DataFrame(x)
x['"Non-Suffix" Partial Match in "Suffix"?'] = x.apply(lambda row: row['Non-Suffix'] in row['Suffix'], axis=1)
x
However, if I re-arrange the values in the second column, I'll get False values:
x = {'Non-Suffix' : ['1234567', '1234568', '1234569', '1234554'], 'Suffix' : ['1234568:VXCF', '1234567:C', '1234554-01:XC', '1234569-01']}
x = pd.DataFrame(x)
x['"Non-Suffix" Partial Match in "Suffix"?'] = x.apply(lambda row: row['Non-Suffix'] in row['Suffix'], axis=1)
x
Is there a way I can get the second block of code to find these partial matches even if they're not in the same row?
Also, instead of 'True/False' values, is there a way for me to have the value of 'Partial Match Exists!' instead of True, and 'Partial Match Does Not Exist!' instead of False?
CodePudding user response:
You can join the Non-Suffix
column value with |
then use Series.str.contains
to check if contain any value
x['"Non-Suffix" Partial Match in "Suffix"?'] = x['Suffix'].str.contains('|'.join(x['Non-Suffix']))
print(x)
Non-Suffix Suffix "Non-Suffix" Partial Match in "Suffix"?
0 1234567 1234568:VXCF True
1 1234568 1234567:C True
2 1234569 1234554-01:XC True
3 1234554 1234569-01 True
Above solution checks if Suffix
contains any of Non-Suffix
, if you want to do the reverse, you might do
x['"Non-Suffix" Partial Match in "Suffix"?'] = x['Non-Suffix'].apply(lambda v: x['Suffix'].str.contains(v).any())
print(x)
Non-Suffix Suffix "Non-Suffix" Partial Match in "Suffix"?
0 879 1234568:VXCF False
1 1234568 1234567:C True
2 1234569 1234554-01:XC True
3 1234554 1234569-01 True