Pandas: how to check if a substring of one column is a substring of another column

Time:10-07

I have an input like this, and I want to check whether a substring of col_1 exists in col_2. My plan has two steps: split the text in col_1, then use a for loop to compare it with col_2. Can I achieve this with df.apply?

INPUT>

import pandas as pd

dct = {'col_1': ['X_a', 'Y_b'],
       'col_2': ['a_b_c', 'c_d_e',]}
df = pd.DataFrame(dct)

EXPECT RESULT>

  col_1  col_2  result
0   X_a  a_b_c   True
1   Y_b  c_d_e   False

CodePudding user response:

You can use df.apply with axis=1, which applies the function to each row.

>>> import pandas as pd
>>>
>>> dct = {'col_1': ['X_a', 'Y_b'],
...        'col_2': ['a', 'c',]}
>>> df = pd.DataFrame(dct)
>>>
>>> def check_substring(row):
...     _, second = row.col_1.split("_")
...     return second in row.col_2
...
>>> df["result"] = df.apply(check_substring, axis=1)
>>> print(df)
  col_1 col_2  result
0   X_a     a    True
1   Y_b     c   False
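
The same row-wise check can also be sketched as a list comprehension over the two columns, here on the data from the question. This assumes the part of col_1 to look for is whatever follows the last underscore; `rsplit` is used so that a value with more than one underscore does not raise an unpacking error.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['X_a', 'Y_b'],
                   'col_2': ['a_b_c', 'c_d_e']})

# Assumption: the text after the last underscore in col_1 is the part
# to search for. It is tested as a plain substring of col_2 in the same row.
df['result'] = [
    left.rsplit('_', 1)[-1] in right
    for left, right in zip(df['col_1'], df['col_2'])
]
print(df)
```

Iterating with zip avoids building a Series object for every row, which df.apply with axis=1 does.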

CodePudding user response:

Do you need something involving set intersections?

df['result'] = (
  df['col_1'].str.split('_').map(set) & df['col_2'].str.split('_').map(set))
df

  col_1  col_2  result
0   X_a  a_b_c    True
1   Y_b  c_d_e   False
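
If applying & directly to columns of sets feels too implicit, a more explicit sketch of the same idea is below. The helper name `shares_token` is made up for illustration. Note that this compares whole tokens between underscores, so it is stricter than a substring test.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['X_a', 'Y_b'],
                   'col_2': ['a_b_c', 'c_d_e']})

def shares_token(left: str, right: str, sep: str = '_') -> bool:
    """Return True if the two strings have at least one token in common."""
    # isdisjoint is True when the sets share nothing, so negate it.
    return not set(left.split(sep)).isdisjoint(right.split(sep))

df['result'] = [shares_token(a, b) for a, b in zip(df['col_1'], df['col_2'])]
print(df)
```

With token matching, 'X_a' and 'ab_c' do not match, even though 'a' is a substring of 'ab_c'.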

CodePudding user response:

Try:

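df['result'] = False
for i in range(len(df)):
    # Part of col_1 after the underscore; assumes the default 0..n-1 index.
    part = df.loc[i, 'col_1'].split('_')[-1]
    df.loc[i, 'result'] = part in df.loc[i, 'col_2']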

CodePudding user response:

This is a one-liner using apply and an inner loop.

df['result'] =  df.apply(lambda x: any(y in x['col_1'] for y in x['col_2'].split('_')), axis=1)

Example

dct = {'col_1': ['X_a', 'Y_b'],
       'col_2': ['a_b_c', 'c_d_e',]}
df = pd.DataFrame(dct)

df['result'] =  df.apply(lambda x: any(y in x['col_1'] for y in x['col_2'].split('_')), axis=1)
>>> df
  col_1  col_2  result
0   X_a  a_b_c    True
1   Y_b  c_d_e   False
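
One thing to watch with this approach: if col_2 contains consecutive or trailing underscores, split('_') yields empty strings, and an empty string counts as a substring of every string. A possible guard is sketched below; the helper name `any_token_in` and the third row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['X_a', 'Y_b', 'Z_q'],
                   'col_2': ['a_b_c', 'c_d_e', 'c__d']})  # third row is hypothetical

def any_token_in(text: str, token_source: str, sep: str = '_') -> bool:
    """Return True if any non-empty token of token_source occurs in text."""
    # Skip empty tokens: '' in 'anything' is always True in Python.
    return any(tok in text for tok in token_source.split(sep) if tok)

df['result'] = [any_token_in(a, b) for a, b in zip(df['col_1'], df['col_2'])]
print(df)
```

Without the `if tok` filter, the third row would come out True because 'c__d' splits into 'c', '' and 'd'.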