I have thousands of lines like the following sample:
"[('entryA', 'typeA'), ('entryB', 'typeB'), ('entryC', 'typeC'), ('entryD', 'typeD')]"
How can I extract the first entry of each parenthesized pair and put the results in the following format?
"entries" : ["entryA", "entryB", "entryC", "entryD"]
My code:
import re

s = "[('entryA', 'typeA'), ('entryB', 'typeB'), ('entryC', 'typeC'), ('entryD', 'typeD')]"
result = re.findall(r"\('.*?,", s)
print("\"entries\":",result)
Current output:
"entries": ["('entryA',", "('entryB',", "('entryC',", "('entryD',"]
CodePudding user response:
You can use lookbehind and lookahead assertions, so that the surrounding characters are required but not included in the match:
import re

s = "[('entryA', 'typeA'), ('entryB', 'typeB'), ('entryC', 'typeC'), ('entryD', 'typeD')]"
result = re.findall(r"(?<=\(').*?(?=',)", s)
print("\"entries\":",result)
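Lookahead: (?=EXPR)
asserts that EXPR matches directly after the current position, without including it in the match.
Lookbehind: (?<=EXPR)
asserts that EXPR matches directly before the current position, without including it in the match.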
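As a rough sketch of how the two assertions behave (the variable names below are only illustrative, and the input is shortened), the lookarounds constrain where a match may sit without becoming part of it. A capture group is an alternative that should give the same result for input shaped like this:

```python
import re

s = "[('entryA', 'typeA'), ('entryB', 'typeB')]"

# Lookbehind (?<=\(') requires "('" just before the match;
# lookahead (?=',) requires "'," just after it.
# Neither is included in the returned text.
with_lookarounds = re.findall(r"(?<=\(').*?(?=',)", s)

# Alternative: match the delimiters too, but return only the captured group.
with_group = re.findall(r"\('(.*?)',", s)

print(with_lookarounds)
print(with_group)
```

Both patterns assume the entries themselves never contain the sequence `',`.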
CodePudding user response:
Here's a better way: parse the string as a Python literal instead of using a regex.
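import ast
# s is the string from the question; literal_eval parses it into a list of tuples
pairs = ast.literal_eval(s)
# keep the first element of each tuple
entries = [a[0] for a in pairs]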
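To get the double-quoted format asked for in the question, one option is to pass the list through json.dumps. This is a minimal sketch that assumes the `"entries" : ` prefix can simply be concatenated as text; the name `formatted` is only illustrative.

```python
import ast
import json

s = "[('entryA', 'typeA'), ('entryB', 'typeB'), ('entryC', 'typeC'), ('entryD', 'typeD')]"

# Parse the string into a list of tuples, then keep the first item of each.
pairs = ast.literal_eval(s)
entries = [first for first, _ in pairs]

# json.dumps renders the list with double-quoted strings.
formatted = '"entries" : ' + json.dumps(entries)
print(formatted)
```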
CodePudding user response:
You don't need re; use ast.literal_eval:
>>> s = "[('entryA', 'typeA'), ('entryB', 'typeB'), ('entryC', 'typeC'), ('entryD', 'typeD')]"
>>> from ast import literal_eval
>>> literal_eval(s)
[('entryA', 'typeA'), ('entryB', 'typeB'), ('entryC', 'typeC'), ('entryD', 'typeD')]
>>> out = [i[0] for i in literal_eval(s)]
>>> out
['entryA', 'entryB', 'entryC', 'entryD']
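Since the question mentions thousands of such lines, the same idea could be applied line by line. The sketch below is only an assumption about how the input is organized: `first_entries` is a hypothetical helper, and the `lines` list stands in for wherever the real data comes from.

```python
from ast import literal_eval


def first_entries(line):
    """Hypothetical helper: return the first item of each tuple in one line."""
    return [pair[0] for pair in literal_eval(line)]


# Stand-in for the real input; in practice these might be read from a file.
lines = [
    "[('entryA', 'typeA'), ('entryB', 'typeB')]",
    "[('entryC', 'typeC')]",
    "not a valid literal",
]

results = []
for line in lines:
    try:
        results.append(first_entries(line))
    except (ValueError, SyntaxError):
        # literal_eval raises these for text it cannot parse; record None instead.
        results.append(None)

print(results)
```

Recording None for unparseable lines keeps the results aligned with the input; dropping or logging them instead would work just as well.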