I have an input like this, and I want to check whether a substring in col_1 exists in col_2 or not. What I wanted to try has two steps: split the text from col_1, and then use a for loop to do the comparison with col_2. I'm wondering whether I can achieve this with df.apply.
Input:
import pandas as pd

dct = {'col_1': ['X_a', 'Y_b'],
       'col_2': ['a_b_c', 'c_d_e']}
df = pd.DataFrame(dct)
Expected result:
col_1 col_2 result
0 X_a a_b_c True
1 Y_b c_d_e False
Answer:
You can use df.apply with axis=1, which applies the function to each row (each row is passed to the function as a Series).
>>> import pandas as pd
>>>
>>> dct = {'col_1': ['X_a', 'Y_b'],
...        'col_2': ['a_b_c', 'c_d_e']}
>>> df = pd.DataFrame(dct)
>>>
>>> def check_substring(row):
...     _, second = row.col_1.split("_")
...     return second in row.col_2
...
>>> df["result"] = df.apply(check_substring, axis=1)
>>> print(df)
col_1 col_2 result
0 X_a a_b_c True
1 Y_b c_d_e False
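One thing to be aware of: `second in row.col_2` is a plain substring test on the whole string, so 'a' would also match a value like 'ab_c'. If only whole underscore-separated tokens should count, a variant could split col_2 as well. This is a minimal sketch; the 'ab_c' data and the name check_token are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'ab_c' contains the letter 'a' but not the token 'a'.
df = pd.DataFrame({'col_1': ['X_a', 'Y_b'],
                   'col_2': ['ab_c', 'c_d_e']})

def check_substring(row):
    _, second = row.col_1.split("_")
    return second in row.col_2              # substring test on the whole string

def check_token(row):                       # hypothetical helper name
    _, second = row.col_1.split("_")
    return second in row.col_2.split("_")   # membership test on the list of tokens

df["substring"] = df.apply(check_substring, axis=1)
df["token"] = df.apply(check_token, axis=1)
print(df)
```

With this data, the substring version reports True for the first row while the token version reports False.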
Answer:
You could use set intersections, which checks whether any underscore-separated token of col_1 also appears in col_2:
df['result'] = (
    df['col_1'].str.split('_').map(set) & df['col_2'].str.split('_').map(set))
df
col_1 col_2 result
0 X_a a_b_c True
1 Y_b c_d_e False
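As far as I can tell, the expression above relies on pandas turning each intersected set into a boolean, with an empty set becoming False. A minimal sketch that does the same intersection but makes that conversion explicit, using a plain list comprehension over zip instead of Series operators:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['X_a', 'Y_b'],
                   'col_2': ['a_b_c', 'c_d_e']})

# For each row, intersect the underscore-separated tokens of both columns
# and convert the (possibly empty) intersection to a bool explicitly.
df['result'] = [
    bool(set(a.split('_')) & set(b.split('_')))
    for a, b in zip(df['col_1'], df['col_2'])
]
print(df)
```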
Answer:
Try:
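df['result'] = False
for i in range(len(df)):
    # take the part after the underscore in col_1, e.g. 'a' from 'X_a'
    token = df.at[i, 'col_1'].split('_')[-1]
    df.at[i, 'result'] = token in df.at[i, 'col_2'].split('_')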
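A variant of the same loop (a sketch, not the answerer's original code) that collects the results in a plain Python list and assigns the column once, which avoids writing into the DataFrame cell by cell:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['X_a', 'Y_b'],
                   'col_2': ['a_b_c', 'c_d_e']})

# Build the results in a list, then assign the whole column in one step.
results = []
for i in range(len(df)):
    token = df['col_1'].iloc[i].split('_')[-1]   # e.g. 'a' from 'X_a'
    results.append(token in df['col_2'].iloc[i].split('_'))
df['result'] = results
print(df)
```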
Answer:
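This is a one-liner using apply and an inner loop (a generator inside any), which checks whether any underscore-separated token of col_2 is a substring of col_1.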
df['result'] = df.apply(lambda x: any(y in x['col_1'] for y in x['col_2'].split('_')), axis=1)
Example:
import pandas as pd

dct = {'col_1': ['X_a', 'Y_b'],
       'col_2': ['a_b_c', 'c_d_e']}
df = pd.DataFrame(dct)
df['result'] = df.apply(lambda x: any(y in x['col_1'] for y in x['col_2'].split('_')), axis=1)
>>> df
col_1 col_2 result
0 X_a a_b_c True
1 Y_b c_d_e False
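A caveat with this one-liner, shown on a hypothetical extra row: if col_2 contains a doubled or trailing underscore, split('_') yields an empty string, and '' in any string is True, so the row matches by accident. Skipping empty tokens is one way to guard against that; the 'Z_q' / 'c_d_' row below is made up for illustration.

```python
import pandas as pd

# Hypothetical extra row: 'c_d_' ends with an underscore, so split('_')
# produces an empty token '' at the end.
df = pd.DataFrame({'col_1': ['X_a', 'Y_b', 'Z_q'],
                   'col_2': ['a_b_c', 'c_d_e', 'c_d_']})

# Original one-liner: '' in 'Z_q' is True, so the last row matches by accident.
naive = df.apply(lambda x: any(y in x['col_1'] for y in x['col_2'].split('_')), axis=1)

# Guarded variant: skip empty tokens before the substring test.
guarded = df.apply(lambda x: any(y in x['col_1'] for y in x['col_2'].split('_') if y), axis=1)

print(naive.tolist())    # [True, False, True]
print(guarded.tolist())  # [True, False, False]
```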