pandas str.match not working after "+".
import pandas as pd
df = pd.DataFrame()
df['a'] = ["Huwaei p30 4GB+256GB"]
b = df.loc[df['a'].str.match(f"Huwaei p30 4GB+256GB")]
c = df.loc[df['a'].str.match(f"Huwaei p30 4GB+")]
print('b: ', b)
print('c: ', c)
b doesn't work: it returns an empty DataFrame.
c works: it seems to detect the "+" and returns the row.
Any idea? This is driving me crazy. How can I match the whole string, including the "+"?
Thanks
CodePudding user response:
b = df.loc[df['a'].str.match(r"Huwaei p30 4GB\+256GB")]
c = df.loc[df['a'].str.match(r"Huwaei p30 4GB\+")]
Output
b: a
0 Huwaei p30 4GB+256GB
c: a
0 Huwaei p30 4GB+256GB
Your original patterns do not match a literal "+": str.match treats the pattern as a regular expression, and "+" is a quantifier there. (ref)
To make this clearer with your original, unescaped pattern: if you change the entry to df['a'] = ["Huwaei p30 4GB256GB"], both b and c will return the row, since + in a regex matches the preceding element one or more times, as many times as possible. More precisely, against the original entry the pattern still matches
Huwaei p30 4GB
and then tries to match further B characters and finds none. It then fails to match the next character in the pattern, 2, because the string has a literal + at that position.
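The behaviour described above can be reproduced with the standard re module alone. This is a minimal sketch, assuming that str.match applies the pattern the same way re.match does (anchored at the start of the string, not at the end); the variable names are made up for illustration.

```python
import re

text = "Huwaei p30 4GB+256GB"

# Unescaped "+": "B+" means "one or more B". After "4GB" the regex
# expects "2", but the text has a literal "+" there, so there is no match.
unescaped_full = re.match("Huwaei p30 4GB+256GB", text)

# Unescaped prefix: "4GB+" matches "4GB" and stops before the literal "+".
# re.match only anchors at the start, so this still counts as a match.
unescaped_prefix = re.match("Huwaei p30 4GB+", text)

# Escaped "+": now the plus sign is matched literally.
escaped_full = re.match(r"Huwaei p30 4GB\+256GB", text)

# Without the "+" in the data, the unescaped pattern matches.
no_plus = re.match("Huwaei p30 4GB+256GB", "Huwaei p30 4GB256GB")

print(unescaped_full)            # None
print(unescaped_prefix.group())  # Huwaei p30 4GB
print(escaped_full.group())      # Huwaei p30 4GB+256GB
print(no_plus is not None)       # True
```

Note that the unescaped prefix pattern only appears to detect the "+": the matched text ends at "4GB".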
CodePudding user response:
You need to escape + as it is a special regex character:
df.loc[df['a'].str.match(r"Huwaei p30 4GB\+256GB")]
Alternatively, if you do not need a regex, use startswith:
df.loc[df['a'].str.startswith("Huwaei p30 4GB+256GB")]
output:
a
0 Huwaei p30 4GB+256GB
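If the search string comes from data rather than being typed by hand, escaping it manually is error-prone. A possible sketch, assuming a pandas version that provides Series.str.fullmatch (1.1 or later): re.escape builds the escaped pattern, str.match then checks the start of each value, and str.fullmatch or a plain == comparison checks the whole value. The second row and the names target, prefix_rows, exact_rows and equal_rows are hypothetical, added only to show the difference.

```python
import re

import pandas as pd

# The second row is a made-up entry that merely starts with the target.
df = pd.DataFrame({"a": ["Huwaei p30 4GB+256GB", "Huwaei p30 4GB+256GB (used)"]})
target = "Huwaei p30 4GB+256GB"

# re.escape backslash-escapes regex metacharacters such as "+".
pattern = re.escape(target)

# str.match only anchors at the start, so both rows are selected.
prefix_rows = df.loc[df["a"].str.match(pattern)]

# str.fullmatch requires the entire value to match the pattern.
exact_rows = df.loc[df["a"].str.fullmatch(pattern)]

# For an exact literal comparison, no regex is needed at all.
equal_rows = df.loc[df["a"] == target]

print(len(prefix_rows), len(exact_rows), len(equal_rows))
```

For the original question ("match the whole string"), the == comparison is probably the simplest choice; the regex variants matter only when part of the pattern really needs to be a regular expression.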