I want to extract 2 lists of words that are connected by the sign =. The regex code works for separate lists but not in combination.
Example string: bla word1="word2" blabla abc="xyz" bla bla
One output shall contain the words directly left of =, i.e. word1, abc and the other output shall contain the words directly right of =, i.e. word2, xyz without quotes.
\w+(?==\"(?:(?!\").)*\")
extracts the words left of =, i.e. word1,abc
=\"(?:(?!\").)*\"
extracts the words right of = including quotes and =, i.e. ="word2",="xyz"
How can I combine these 2 queries into a single regular expression that outputs 2 groups? Quotes and equal signs shall not be part of the output.
CodePudding user response:
You can use
([^\s=]+)="([^"]*)"
Details:
- ([^\s=]+) - Group 1: one or more occurrences of a char other than whitespace and =
- =" - a =" substring
- ([^"]*) - Group 2: zero or more chars other than a " char
- " - a " char.
Note: \w+ only matches one or more letters, digits and underscores, and won't match if the keys contain, say, hyphens. The (?:(?!\").)* tempered greedy token is not efficient, and does not match line break chars. As the negative lookahead only contains a single char pattern (\"), it is more efficient to write it as a negated character class, [^"]*. The negated character class also matches line break chars. If you do not want that behavior, just add \r\n into the negated character class.
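As a rough illustration, here is how the combined pattern could be used from Python to build the two lists the question asks for. This is only a sketch: the helper name split_pairs is made up for this example, and Python is assumed because it is the language used elsewhere in this thread.

```python
import re

# Group 1: key (no whitespace, no "="); Group 2: value between the quotes.
PAIR = re.compile(r'([^\s=]+)="([^"]*)"')

def split_pairs(text):
    """Return (keys, values) for every key="value" occurrence in text."""
    pairs = PAIR.findall(text)        # list of (key, value) tuples
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    return keys, values

keys, values = split_pairs('bla word1="word2" blabla abc="xyz" bla bla')
print(keys)    # ['word1', 'abc']
print(values)  # ['word2', 'xyz']
```

Because the key is matched with [^\s=]+ rather than \w+, a key such as data-id is captured whole, and the value may contain spaces.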
CodePudding user response:
This should do what you want:
(?: (\w*)=)(?:\"(\w*)\")
This is for a python regex.
CodePudding user response:
If you are looking for lhs and rhs from lhs="rhs", this should work (sorry, this is what I understood from your question):
import re
test_str='abc="def" ghi'
ans=re.search(r'(\w+)="(\w+)"',test_str)
print(ans.group(1))
print(ans.group(2))
my_list=list(ans.groups())
print(my_list)
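Note that re.search stops at the first match and returns None when nothing matches, so the snippet above handles a single pair only. A possible variant that walks over every pair with re.finditer is sketched below; the helper name all_pairs is hypothetical, and the \w+ pattern is kept, so keys or values containing hyphens or spaces would still be skipped.

```python
import re

# Same idea as above: group 1 is the left-hand side, group 2 the right-hand side.
PAIR = re.compile(r'(\w+)="(\w+)"')

def all_pairs(text):
    """Return a list of (lhs, rhs) tuples, empty if nothing matches."""
    return [m.groups() for m in PAIR.finditer(text)]

test_str = 'bla word1="word2" blabla abc="xyz" bla bla'
pairs = all_pairs(test_str)
lhs = [left for left, _ in pairs]
rhs = [right for _, right in pairs]
print(lhs)   # ['word1', 'abc']
print(rhs)   # ['word2', 'xyz']
```

Returning an empty list for the no-match case avoids the AttributeError that ans.group(1) would raise when re.search returns None.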