I am trying to perform a simple regex pattern match in Python to tell whether an input string is a comma-separated list. Examples of inputs are:
input = '[1]'
input = "['Yes']"
input = '["Yes"]'
input = '["Yes",1,"No"],["HIGH","MEDIUM","LOW"]'
input = "[1,2,3], ['High', 'Medium', 'Low']"
etc. When I match a single list against the regex, it works fine. For a single list I do the following:
import re
pattern = re.compile(r'^\[(((\".+\")|(\'.+\')|(\d+)),?)+\]$')
input = '["Yes", 12, "No"]'
print(pattern.match(input))
print(pattern.match(input).string)
and I get the desired output:
<re.Match object; span=(0, 17), match='["Yes", 12, "No"]'>
["Yes", 12, "No"]
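One thing worth checking here, as a minimal sketch that assumes the pattern reads ^\[(((\".+\")|(\'.+\')|(\d+)),?)+\]$: because .+ is greedy, the double-quoted alternative swallows "Yes", 12, "No" as a single element instead of matching three elements. The same greediness means arbitrary text between the first and last quote is accepted.

```python
import re

# Assumed reading of the single-list pattern, with "+" quantifiers.
pattern = re.compile(r'^\[(((\".+\")|(\'.+\')|(\d+)),?)+\]$')

m = pattern.match('["Yes", 12, "No"]')
# Group 3 is the double-quoted alternative; greedy .+ runs to the LAST quote,
# so the whole interior is captured as one "element".
print(m.group(3))  # "Yes", 12, "No"

# Because of that, non-list text between the quotes is accepted too.
print(bool(pattern.match('["Yes", xyz, "No"]')))  # True
```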
However, when I test a similar pattern on a string containing multiple lists:
import re
pattern = re.compile(r'^((\[(((\".+\")|(\'.+\')|(\d+)),?)+\]),?)+$')
input = "[1,2,3],['High','Medium','Low']"
print(pattern.match(input))
print(pattern.match(input).string)
This works okay and I get the below output.
<re.Match object; span=(0, 31), match="[1,2,3],['High','Medium','Low']">
[1,2,3],['High','Medium','Low']
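A side note on the second print: Match.string is the string that was passed to match(), not the text the pattern matched; Match.group() returns the matched text. With ^...$ anchors the two coincide, so this test cannot tell them apart. A short sketch, assuming the multi-list pattern uses "+" quantifiers, showing where they diverge:

```python
import re

# Assumed reading of the multi-list pattern, with "+" quantifiers.
pattern = re.compile(r'^((\[(((\".+\")|(\'.+\')|(\d+)),?)+\]),?)+$')
text = "[1,2,3],['High','Medium','Low']"

m = pattern.match(text)
# .string is the input that was searched.
print(m.string)   # [1,2,3],['High','Medium','Low']
# .group() (same as .group(0)) is the text the whole pattern matched.
print(m.group())  # [1,2,3],['High','Medium','Low']

# Without an end anchor, match() can succeed on a prefix, and the two differ.
m2 = re.match(r'\[\d+\]', '[1],garbage')
print(m2.string)  # [1],garbage
print(m2.group())  # [1]
```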
However, if I want to find the individual lists using the regex findall method, it doesn't work. For example, I do the following (note that the pattern below is for a single list, without the line-beginning ^ and line-ending $ anchors):
import re
pattern = re.compile(r'\[(((\".+\")|(\'.+\')|(\d+)),?)+\]')
input = "[1,2,3],['High','Medium','Low']"
pattern.findall(input)
I get the output:
[('3', '3', '', '', '3'),
("'High','Medium','Low'",
"'High','Medium','Low'",
'',
"'High','Medium','Low'",
'')]
So the matching completely ignored the list [1,2,3]. Further, the match for 'High','Medium','Low' is missing the opening '[' and the closing ']'.
Also, I am wondering whether there is a better way to write this regex, without using ast.literal_eval.
Answer:
You could use r'(\[(((\".+\")|(\'.+\')|(\d+)),?)+\])' as the pattern, which wraps the whole list in an outer capturing group, and then extract the desired matches from that first group:
import re
pattern = re.compile(r'(\[(((\".+\")|(\'.+\')|(\d+)),?)+\])')
matches = list(map(lambda x: x[0], pattern.findall(input)))
Then matches will be ['[1,2,3]', "['High','Medium','Low']"]
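Why this works: when a pattern contains capturing groups, re.findall returns the groups (a tuple per match when there are several) rather than the whole match, and a group inside a repetition keeps only its last repetition. So [1,2,3] was not ignored in the question; it showed up as ('3', '3', '', '', '3'). A sketch, assuming "+" quantifiers, that reproduces this and shows two alternatives to the x[0] step: finditer with group(0), and non-capturing (?:...) groups so that findall returns whole matches directly.

```python
import re

text = "[1,2,3],['High','Medium','Low']"

# Pattern from the question: 5 capturing groups, so findall returns 5-tuples.
grouped = re.compile(r'\[(((\".+\")|(\'.+\')|(\d+)),?)+\]')
# A repeated group only remembers its last repetition,
# hence ('3', '3', '', '', '3') for the first list.
print(grouped.findall(text))

# finditer yields Match objects; group(0) is always the full match.
print([m.group(0) for m in grouped.finditer(text)])

# Same pattern with non-capturing groups: no groups means findall
# returns the full matched text for each match.
plain = re.compile(r'\[(?:(?:\".+\"|\'.+\'|\d+),?)+\]')
print(plain.findall(text))
```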
Answer:
I don't think I got the whole idea of all this, so sorry for any misunderstanding on my part.
I tried using the pattern (\[.+?\]),? with re.findall() and got the following output:
import re
pattern = re.compile(r'(\[.+?\]),?')
input = "[1,2,3], ['High', xyz'Medium','Low']"
>>> pattern.findall(input)
['[1,2,3]', "['High', xyz'Medium','Low']"]
Is this what you meant to get?
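This answer relies on the lazy quantifier +? (assuming that is what the pattern contains). A small sketch contrasting it with greedy .+ and showing two limitations: it accepts anything between the brackets (such as the xyz in the example above), and it does not understand nesting, because it stops at the first ']' it sees.

```python
import re

text = "[1,2,3], ['High', xyz'Medium','Low']"

# Greedy .+ runs to the LAST ']' in the string: one big match.
print(re.findall(r'\[.+\]', text))

# Lazy .+? stops at the FIRST ']': one match per list.
print(re.findall(r'(\[.+?\]),?', text))

# Nested lists break the lazy approach: the first match ends at the inner ']'.
print(re.findall(r'(\[.+?\]),?', '[[1,2],[3]]'))  # ['[[1,2]', '[3]']
```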
Answer:
This is what worked in the end:
import re
pattern = re.compile(r'(\[(((\"[^,\"\'\[\]]+\")|(\'[^,\"\'\[\]]+\')|(\d+)),?\s?)+\])')
input = "[1,2,3], [1,2,'Low'], ['High', 'Medium','Low']"
matches = list(map(lambda x: x[0], pattern.findall(input)))
print(matches)
Output:
['[1,2,3]', "[1,2,'Low']", "['High', 'Medium','Low']"]
If one of the lists in the input contains a garbage value, that list is not matched at all. For example, if the input was:
input = "[1,2,3], [1,2,'Low'], ['High', xyz'Medium','Low']"
the output would only contain the valid lists:
['[1,2,3]', "[1,2,'Low']"]
I came up with this solution building on the answer provided by @Konny.
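Building on the final answer, here is a sketch that assembles the same element alternatives from named pieces with non-capturing groups, so findall returns whole lists without the x[0] step. It also adds a fullmatch check that rejects the entire input when any list is malformed, which is closer to the original goal of telling whether the string is a comma-separated list. The names ELEMENT, LIST, extract_lists and is_list_of_lists are hypothetical. Unlike the answer's ,?\s? this variant assumes a comma is required between elements and allows any amount of whitespace after it. Like the answer, it does not handle nested lists, empty lists, negative or decimal numbers, or quoted strings containing commas.

```python
import re

# Element alternatives from the final answer: a double- or single-quoted
# string with no commas, quotes or brackets inside, or an unsigned integer.
ELEMENT = r'''(?:"[^,"'\[\]]+"|'[^,"'\[\]]+'|\d+)'''
# One list: '[', elements separated by ',' plus optional whitespace, then ']'.
LIST = rf'\[{ELEMENT}(?:,\s*{ELEMENT})*\]'

list_re = re.compile(LIST)
# Whole input: one or more lists separated by ',' plus optional whitespace.
whole_re = re.compile(rf'{LIST}(?:,\s*{LIST})*')


def extract_lists(text):
    """Return every well-formed list found in text; malformed ones are skipped."""
    # No capturing groups, so findall returns the full matched lists.
    return list_re.findall(text)


def is_list_of_lists(text):
    """Return True only if the entire text consists of well-formed lists."""
    return whole_re.fullmatch(text.strip()) is not None


print(extract_lists("[1,2,3], [1,2,'Low'], ['High', xyz'Medium','Low']"))
print(is_list_of_lists("[1,2,3], ['High', 'Medium', 'Low']"))   # True
print(is_list_of_lists("[1,2,3], ['High', xyz'Medium','Low']"))  # False
```

Where the answer silently drops a malformed list, is_list_of_lists reports the whole input as invalid, so the two functions can be combined: validate first, then extract.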