I need to extract a specific word after a word using regex

I have this series:

[Spanish | Intermediate; Portuguese | Native; English | Advanced,
French | Intermediate; Spanish | Native; English | Native,
Spanish | Native; English | Intermediate,
Portuguese | Native; English | Intermediate; Spanish | Intermediate ]

I want to use a regex to extract Spanish followed by its level, for example: Spanish | Native.

I used:

import re

# la is the series shown above (one string per row)
y = []
for i in la:
    x = re.findall(r"[Spanish [^a-z] [^a-z] [^a-z] "
                   r"Intermediate|Advanced|Native|Beginner]", i)
    y.append(x)

but I don't get the result I want.
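The attempt above goes wrong in two ways: square brackets define a character class, which matches one character from a set rather than the word Spanish, and an ungrouped | splits the whole pattern into separate alternatives. A minimal sketch of a working version, assuming each element of la is a plain string like the rows shown, puts the levels in a non-capturing group so the | only chooses between levels:

```python
import re

# Sample rows from the question, assumed here to be plain strings.
la = [
    "Spanish | Intermediate; Portuguese | Native; English | Advanced",
    "French | Intermediate; Spanish | Native; English | Native",
    "Spanish | Native; English | Intermediate",
    "Portuguese | Native; English | Intermediate; Spanish | Intermediate",
]

# (?:...) groups the alternatives without capturing, so "|" picks a level
# instead of splitting the whole pattern in two.
pattern = re.compile(r"Spanish \| (?:Beginner|Intermediate|Advanced|Native)")

y = []
for i in la:
    # With no capturing groups, findall returns the full matched text.
    y.append(pattern.findall(i))

print(y)
# [['Spanish | Intermediate'], ['Spanish | Native'], ['Spanish | Native'], ['Spanish | Intermediate']]
```

Because there is no capturing group, each match is the whole "Spanish | Level" text, which is the form asked for.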

Answer:

To get the Language | Level pairs, you can use \w+\s\|\s\w+. This looks for a word, then a whitespace character, then a pipe, then a whitespace character, then a word. It matches every pair, not only the Spanish ones.
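Since that pattern returns every language, the Spanish pairs still have to be picked out afterwards. A minimal sketch, assuming la is a list of strings like the rows in the question:

```python
import re

# Sample rows from the question, assumed here to be plain strings.
la = [
    "Spanish | Intermediate; Portuguese | Native; English | Advanced",
    "French | Intermediate; Spanish | Native; English | Native",
    "Spanish | Native; English | Intermediate",
    "Portuguese | Native; English | Intermediate; Spanish | Intermediate",
]

# \w+ = a word, \s = one whitespace character, \| = a literal pipe.
pair = re.compile(r"\w+\s\|\s\w+")

y = []
for i in la:
    pairs = pair.findall(i)  # every "Language | Level" pair in the row
    # Keep only the pairs whose language is Spanish.
    y.append([p for p in pairs if p.startswith("Spanish ")])

print(y)
# [['Spanish | Intermediate'], ['Spanish | Native'], ['Spanish | Native'], ['Spanish | Intermediate']]
```

Filtering in Python keeps the regex generic, so the same pattern also works for any other language by changing the startswith check.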

Answer:

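import re

# la is the series from the question (one string per row)
y = []
for i in la:
    # Capture the level after "Spanish ... |", up to a semicolon or the end of the row
    x = re.findall(r"Spanish.*?\|\s*(.*?)(?:;|$)", i)
    y.append(x)

This extracts and returns all the levels associated with Spanish. The pattern is Spanish, then .*? (any characters, non-greedy), then a literal | (escaped as \|), then \s* to skip the space, then (.*?), again any characters non-greedy, up to either a semicolon or the end of the string, (?:;|$). The end-of-string alternative is needed when Spanish is the last entry in a row, as in "Portuguese | Native; English | Intermediate; Spanish | Intermediate", where no semicolon follows the level. The parentheses around .*? form a capture group, so findall returns only the levels.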
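A quick check against the four sample rows, assuming la is a plain list of those strings. The pattern here accepts either a semicolon or the end of the row after the level, and the captured levels are then turned back into the "Spanish | Level" form the question asked for:

```python
import re

# Sample rows from the question, assumed here to be plain strings.
la = [
    "Spanish | Intermediate; Portuguese | Native; English | Advanced",
    "French | Intermediate; Spanish | Native; English | Native",
    "Spanish | Native; English | Intermediate",
    "Portuguese | Native; English | Intermediate; Spanish | Intermediate",
]

# The level is whatever follows "Spanish ... |" up to ";" or the end of the row.
spanish_level = re.compile(r"Spanish.*?\|\s*(.*?)(?:;|$)")

levels = [spanish_level.findall(i) for i in la]
print(levels)
# [['Intermediate'], ['Native'], ['Native'], ['Intermediate']]

# Rebuild the "Spanish | Level" text from the captured levels.
labels = [[f"Spanish | {lvl}" for lvl in row] for row in levels]
print(labels)
```

Capturing only the level and formatting it afterwards is convenient when the level is what you want to compare or count.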