Python re, how to capture 12"" / 14""

Time:05-19

I need to capture patterns like this one:

12"" / 14""

in

"Factory SP1 150 12"" / 14"""

The numbers change (always 2 digits), the rest doesn't.
Note that the double quotes at the ends of the string are part of the string and not enclosers.

Also note that I'm working with pandas and using .str.extract(pattern).

My code:

import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(r'filename.csv', delimiter=';', usecols=["OLD_COLUMN", "OTHER_COLUMNS"], encoding='utf-8', on_bad_lines='skip')

pattern = r'(\d{2}""\s*/\s*\d{2}"")'

df["NEW_COLUMN"] = df["OLD_COLUMN"].str.extract(pattern)

I tried changing the groups and escaping every character, but I can't get it to match.

CodePudding user response:

You can use r'\d{2}""\s*/\s*\d{2}""' as regex:

import re

s = '"Factory SP1 150 12"" / 14"""'
re.findall(r'\d{2}""\s*/\s*\d{2}""', s)

output:

['12"" / 14""']

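Be careful with how you write the string in Python source: with double-quote delimiters, "Factory SP1 150 12"" / 14""" is parsed as three adjacent literals, "Factory SP1 150 12" " / 14" "", which Python concatenates into 'Factory SP1 150 12 / 14', with no quote characters left in it. Delimit the string with single quotes, as above, to keep the double quotes.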
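Since the question uses .str.extract rather than re.findall, here is a minimal sketch of the same pattern applied through pandas. The one-row DataFrame is a hypothetical stand-in for the CSV, and it assumes the column value really contains doubled double quotes as described in the question.

```python
import pandas as pd

# Hypothetical one-row frame standing in for the CSV; the single-quoted
# literal keeps every double quote as a real character in the value.
df = pd.DataFrame({"OLD_COLUMN": ['"Factory SP1 150 12"" / 14"""']})

pattern = r'(\d{2}""\s*/\s*\d{2}"")'

# expand=False returns a Series when the pattern has a single capture group
df["NEW_COLUMN"] = df["OLD_COLUMN"].str.extract(pattern, expand=False)
print(df["NEW_COLUMN"].iloc[0])  # 12"" / 14""
```

If NEW_COLUMN still comes out as NaN on the real file, it may be worth printing repr() of one value from OLD_COLUMN. By default read_csv turns a doubled "" inside a quoted field into a single ", in which case the pattern would need one quote after each number instead of two. That is only a guess about the file, not something the question confirms.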

CodePudding user response:

pattern = '([0-9]+""\s*/\s*[0-9]+"")'

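is a regex that will match that, along with other expressions like 1351""/1"". Note that the r (raw string) prefix is not the issue: it only stops Python from processing backslash escapes in the literal, so \d and \s still reach the regex engine and still mean "digit" and "whitespace". Your original pattern therefore matches text like 12"" / 14"", as long as the column really contains two consecutive double quotes after each number.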
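To check how the r prefix affects the pattern, here is a small sketch using only the standard re module. The variable names are made up for illustration. It compares the raw literal with the same pattern written using doubled backslashes, then runs both the strict and the looser [0-9]+ variant.

```python
import re

raw = r'\d{2}""\s*/\s*\d{2}""'
escaped = '\\d{2}""\\s*/\\s*\\d{2}""'  # same pattern written without the r prefix

# Both literals produce the identical pattern string
print(raw == escaped)  # True

s = '"Factory SP1 150 12"" / 14"""'
print(re.findall(raw, s))  # ['12"" / 14""']

# The looser [0-9]+ variant also accepts other digit counts
loose = r'[0-9]+""\s*/\s*[0-9]+""'
print(re.findall(loose, '1351""/1""'))  # ['1351""/1""']
```

The raw literal is usually the safer choice for regexes, because sequences like \d and \s are not valid Python string escapes and newer Python versions warn about them in ordinary literals.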
