I am trying to test whether the string in one column starts with the string in another column, like so:
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'A': ['Sam', 'Ham', 'Pam'], 'B': ['Samuelson', 'Mike', 'Pamela']})
>>> df
     A          B
0  Sam  Samuelson
1  Ham       Mike
2  Pam     Pamela
>>> df.B.str.startswith(df.A)
0   NaN
1   NaN
2   NaN
Name: B, dtype: float64
>>>
Apparently this does not work. Does anyone know how to accomplish this kind of string comparison?
CodePudding user response:
You can use apply:
df.apply(lambda row: row['B'].startswith(row['A']), axis=1)
which gives:
0     True
1    False
2     True
dtype: bool
Or just use a list comprehension with zip:
[y.startswith(x) for x,y in zip(df['A'], df['B'])]
If you want a new column:
df['C'] = [y.startswith(x) for x,y in zip(df['A'], df['B'])]
Output:
     A          B      C
0  Sam  Samuelson   True
1  Ham       Mike  False
2  Pam     Pamela   True
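Both versions above assume every cell is a string; a missing value (None or NaN) in either column would raise an error instead. Here is a minimal sketch of a guard, assuming you want a missing value to count as "no match". The helper name starts_with and the None in the sample data are made up for illustration:

```python
import pandas as pd

# Sample data with one missing prefix (hypothetical, for illustration only)
df = pd.DataFrame({'A': ['Sam', 'Ham', None], 'B': ['Samuelson', 'Mike', 'Pamela']})

def starts_with(prefix, text):
    # Only compare when both values are real strings; anything else is "no match"
    return isinstance(prefix, str) and isinstance(text, str) and text.startswith(prefix)

df['C'] = [starts_with(a, b) for a, b in zip(df['A'], df['B'])]
print(df)
```

If you would rather keep the missing rows as missing, return pd.NA from the helper instead of False.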
CodePudding user response:
You can use a trick: concatenate A and B and use a regex:
df['C'] = (df['A'] + '%' + df['B']).str.match(r'^(.+)%\1')
output:
     A          B      C
0  Sam  Samuelson   True
1  Ham       Mike  False
2  Pam     Pamela   True
CodePudding user response:
df["C"] = list(map(lambda a, b: a == b[:len(a)], df.A, df.B))
df
#      A          B      C
# 0  Sam  Samuelson   True
# 1  Ham       Mike  False
# 2  Pam     Pamela   True
Or, with NumPy:
import numpy as np

def foo(a, b):
    # True when b begins with a
    return b.startswith(a)

np.vectorize(foo)(df.A, df.B)