Test if string in column starts with string in another column

Time:11-11

I am trying to test whether the string in one column starts with the string in another column of the same row, like so:

>>> import pandas as pd
>>>
>>> df = pd.DataFrame( {'A': ['Sam', 'Ham', 'Pam'], 'B': ['Samuelson', 'Mike', 'Pamela']})
>>> df
     A          B
0  Sam  Samuelson
1  Ham       Mike
2  Pam     Pamela

>>> df.B.str.startswith(df.A)
0   NaN
1   NaN
2   NaN
Name: B, dtype: float64
>>> 

Apparently this does not work. Does anyone know how to accomplish this kind of string comparison?
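For context, Series.str.startswith tests every element against one pattern (a string). It does not pair each row of B with the corresponding row of A, which is why passing the whole column df.A does not produce a row-wise comparison. A minimal sketch of the call it is designed for, using the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Sam', 'Ham', 'Pam'], 'B': ['Samuelson', 'Mike', 'Pamela']})

# One scalar pattern is applied to every element of column B.
same_prefix = df['B'].str.startswith('Sam')
print(same_prefix.tolist())  # [True, False, False]
```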

User response:

You can use apply with axis=1 to compare row by row:

df.apply(lambda row: row['B'].startswith(row['A']), axis=1)

which gives:

0     True
1    False
2     True
dtype: bool

Or just use a list comprehension with zip:

[y.startswith(x) for x,y in zip(df['A'], df['B'])]

If you want the result as a new column:

df['C'] = [y.startswith(x) for x,y in zip(df['A'], df['B'])]

Output:

     A          B      C
0  Sam  Samuelson   True
1  Ham       Mike  False
2  Pam     Pamela   True
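The comprehension (like the apply version) calls str.startswith directly, so it raises an exception if either column holds a missing value such as NaN, which is a float rather than a string. A minimal sketch, assuming you want False for any row where A or B is not a string, guards each pair with isinstance before comparing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Sam', 'Ham', np.nan, 'Pam'],
                   'B': ['Samuelson', 'Mike', 'Pete', np.nan]})

# Only compare when both cells are real strings; anything else counts as False.
df['C'] = [isinstance(a, str) and isinstance(b, str) and b.startswith(a)
           for a, b in zip(df['A'], df['B'])]
print(df)
```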

User response:

You can use a trick: concatenate A and B with a separator, then use a regex with a backreference:

df['C'] = (df['A'] + '%' + df['B']).str.match(r'^(.+)%\1')

Output:

     A          B      C
0  Sam  Samuelson   True
1  Ham       Mike  False
2  Pam     Pamela   True

User response:

df["C"] = list(map(lambda a, b: a == b[:len(a)], df.A, df.B))
df
#      A          B      C
# 0  Sam  Samuelson   True
# 1  Ham       Mike  False
# 2  Pam     Pamela   True

Or, with NumPy:

import numpy as np

def foo(a, b):
    return b.startswith(a)

np.vectorize(foo)(df.A, df.B)
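np.vectorize determines its output type by calling the function on the first element of the input, so on an empty DataFrame it raises ValueError unless otypes is given. A minimal sketch that passes otypes=[bool] so the result is a boolean array even for zero rows (starts_with is just an illustrative name):

```python
import numpy as np
import pandas as pd

def foo(a, b):
    # True when string b begins with string a
    return b.startswith(a)

# Fixing the output type skips the first-element probe and allows empty input.
starts_with = np.vectorize(foo, otypes=[bool])

df = pd.DataFrame({'A': ['Sam', 'Ham', 'Pam'], 'B': ['Samuelson', 'Mike', 'Pamela']})
df['C'] = starts_with(df['A'], df['B'])

empty = df.iloc[0:0]
print(starts_with(empty['A'], empty['B']))  # []
```

Without otypes, the last call would fail because there is no first element to probe.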