I am scraping web content with BeautifulSoup in Python, and I would like to manipulate the following strings:
'Induktora 28" 36V/14 Ah | 16.5" Bordo'
'Induktora 28" 36V/14 Ah | 18" Bordo'
'Induktora 26" 36V/14 Ah | 16.5" Black Matte/Red'
'Induktora 26" 36V/14 Ah | 18" Black Matte/Red'
I would like to get:
- the word after " | " that ends with a quote (")
- the word(s) after " | " that follow that quote (if there are any)
Example:
str='Induktora 28" 36V/14 Ah | 16.5" Bordo'
size='16.5"'
color='Bordo'
newtitle='Induktora 28" 36V/14 Ah'
str='Induktora 26" 36V/14 Ah | 18" Black Matte/Red'
size='18"'
color='Black Matte/Red'
newtitle='Induktora 26" 36V/14 Ah'
CodePudding user response:
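You'd probably use the built-in re module for that. Your pattern would probably look something like \| ([\d\.]+)" (.*)$ where the first group captures the size digits and the second captures the color. Move the " inside the first group if you want the size to keep its trailing quote.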
If your regex pattern isn't doing what you expect, you can debug it at a site like https://pythex.org/.
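As a rough sketch of how that could look in practice, assuming every title has the shape "title | size" color" as in the question's examples: the function name split_title and the group names are made up for illustration, and the pattern is a variation on the one above that also captures the part before the pipe.

```python
import re

# Assumed layout: <title> | <size ending in a quote> <optional color>
# The group names (title, size, color) are illustrative, not from the question.
PATTERN = re.compile(r'^(?P<title>.*?)\s*\|\s*(?P<size>[\d.]+")\s*(?P<color>.*)$')


def split_title(text):
    """Return (newtitle, size, color), or None if the text doesn't match."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.group("title"), match.group("size"), match.group("color")


if __name__ == "__main__":
    for s in [
        'Induktora 28" 36V/14 Ah | 16.5" Bordo',
        'Induktora 26" 36V/14 Ah | 18" Black Matte/Red',
    ]:
        print(split_title(s))
```

If a string has no color after the size, the color group simply comes back as an empty string; strings without a " | " separator return None, so it is worth checking for that before unpacking.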