I have this series:
[Spanish | Intermediate; Portuguese | Native; English | Advanced,
French | Intermediate; Spanish | Native; English | Native,
Spanish | Native; English | Intermediate,
Portuguese | Native; English | Intermediate; Spanish | Intermediate ]
I want to use a regex to extract Spanish followed by its level, e.g. Spanish | Native.
I used:
import re

y = []
for i in la:
    x = re.findall(r"[Spanish [^a-z] [^a-z] [^a-z] "
                   r"Intermediate|Advanced|Native|Beginner]", i)
    y.append(x)
but it does not give the expected result.
Answer:
To get the Language | Level pairs, you can use \w+\s\|\s\w+. This matches a word, then whitespace, then a pipe, then whitespace, then another word. It matches every pair, so to keep only the Spanish one, replace the first \w+ with Spanish.
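A minimal sketch of the word/whitespace/pipe/whitespace/word pattern on the sample data, assuming the series values are plain Python strings held in a list named la (as in the question). The + after each \w makes it match a whole word rather than a single character. The second call fixes the first word to Spanish so only that pair is returned.

```python
import re

# Sample values copied from the question (assumed to be plain strings).
la = [
    "Spanish | Intermediate; Portuguese | Native; English | Advanced",
    "French | Intermediate; Spanish | Native; English | Native",
    "Spanish | Native; English | Intermediate",
    "Portuguese | Native; English | Intermediate; Spanish | Intermediate",
]

# Every "Language | Level" pair: word, whitespace, pipe, whitespace, word.
all_pairs = [re.findall(r"\w+\s\|\s\w+", s) for s in la]

# Only the Spanish pair: same pattern with the first word fixed to "Spanish".
spanish = [re.findall(r"Spanish\s\|\s\w+", s) for s in la]

for pairs in spanish:
    print(pairs)
# ['Spanish | Intermediate']
# ['Spanish | Native']
# ['Spanish | Native']
# ['Spanish | Intermediate']
```

Because no separator is required after the level, the Spanish entry is found whether it comes first, in the middle, or last in the string.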
Answer:
import re

y = []
for i in la:
    # Capture whatever lies between the "|" after "Spanish" and the next ";"
    x = re.findall(r"Spanish.*?\|(.*?);", i)
    y.append(x)
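This extracts and returns all the levels associated with the Spanish language.
The pattern is Spanish, then .*? (any characters, matched non-greedily), then a literal | (escaped as \|), then .*? again (any characters, non-greedily), then a semicolon.
The parentheses around the second .*? make it a capturing group, so re.findall returns only the level.
Note that the level keeps its leading space (e.g. " Native"), and nothing is found when Spanish is the last entry in a string, as in the fourth value, because no semicolon follows the level there.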