pandas str.match is not working after "+"

Time:05-06

pandas str.match stops working after a "+".

import pandas as pd

df = pd.DataFrame()

df['a'] = ["Huwaei p30 4GB+256GB"]

b = df.loc[df['a'].str.match(f"Huwaei p30 4GB+256GB")]
c = df.loc[df['a'].str.match(f"Huwaei p30 4GB+")]

print('b: ', b)
print('c: ', c)

b doesn't work; it returns an empty DataFrame.

c works: it seems to detect the "+" and returns the row.

Any idea? This is driving me crazy. How can I match the whole string, including the "+"?

Thanks

CodePudding user response:

b = df.loc[df['a'].str.match(r"Huwaei p30 4GB\+256GB")]
c = df.loc[df['a'].str.match(r"Huwaei p30 4GB\+")]
Output:
b:                        a
0  Huwaei p30 4GB+256GB
c:                        a
0  Huwaei p30 4GB+256GB

str.match treats its argument as a regular expression, so the unescaped patterns in the question are not matched as literal text.

To make the behaviour of your original pattern clearer: if you change the entry to df['a'] = ["Huwaei p30 4GB256GB"], both b and c return the row, because in a regex "+" means "match the preceding element one or more times". With the original entry, the pattern first matches Huwaei p30 4GB, then tries to match further "B" characters and finds none, and finally fails because the next character in the data is "+" where the pattern expects "2".
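The quantifier behaviour described above can be checked directly. This is a minimal sketch, assuming the stored value contains a literal "+"; the Series `s` and the variable names are illustrative only.

```python
import pandas as pd

# Two sample values: one with a literal "+", one without it.
s = pd.Series(["Huwaei p30 4GB+256GB", "Huwaei p30 4GB256GB"])

# Unescaped: "B+" means "one or more B", so a literal "+" in the data never matches.
unescaped = s.str.match(r"Huwaei p30 4GB+256GB")

# Escaped: "\+" matches a literal plus sign.
escaped = s.str.match(r"Huwaei p30 4GB\+256GB")

print(unescaped.tolist())  # [False, True]
print(escaped.tolist())    # [True, False]
```

The raw-string prefix `r` keeps Python from treating `\+` as a string escape sequence.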

CodePudding user response:

You need to escape the "+", as it is a special regex character:

df.loc[df['a'].str.match(r"Huwaei p30 4GB\+256GB")]
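When the search text comes from a variable, escaping it by hand is easy to get wrong. One possible approach, sketched here, is to let the standard library's `re.escape` build the pattern. The names `needle` and `pattern` are illustrative, and the "+" in the sample value is an assumption.

```python
import re
import pandas as pd

df = pd.DataFrame({"a": ["Huwaei p30 4GB+256GB"]})

needle = "Huwaei p30 4GB+256GB"  # literal text to look for
pattern = re.escape(needle)      # backslash-escapes regex metacharacters such as "+"

# str.match anchors at the start of each value and treats `pattern` as a regex.
result = df.loc[df["a"].str.match(pattern)]
print(result)
```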

Alternatively, if you do not need a regex, use startswith:

df.loc[df['a'].str.startswith("Huwaei p30 4GB+256GB")]

Output:

                      a
0  Huwaei p30 4GB+256GB
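Note that both str.match and startswith only anchor at the beginning of the value, so longer strings sharing the same prefix also match. If the goal is to match the whole string, a plain equality test or `str.fullmatch` (available in pandas 1.1 and later) may be a better fit. The sketch below assumes a second, hypothetical row ending in " Blue" to show the difference.

```python
import re
import pandas as pd

# The second row is hypothetical data, added only to show prefix vs. exact matching.
df = pd.DataFrame({"a": ["Huwaei p30 4GB+256GB", "Huwaei p30 4GB+256GB Blue"]})
target = "Huwaei p30 4GB+256GB"

# Prefix match: both rows start with the target text.
prefix_hits = df.loc[df["a"].str.startswith(target)]

# Exact match without any regex: only the first row.
exact_hits = df.loc[df["a"] == target]

# Exact match with a regex: escape the literal text, then require a full match.
regex_exact = df.loc[df["a"].str.fullmatch(re.escape(target))]

print(len(prefix_hits), len(exact_hits), len(regex_exact))  # 2 1 1
```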