I have a list that looks like the one below. The "error" substring always starts with the special character "‘",
so I was able to get just the errors with something like this:
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
newlist = [x.split('‘')[1] for x in a]
print(newlist)
The output looks like this:
['ARDUINO_I2C_nI2C', 'RPY_I2C_BASE_ADDR_LIST', 'RPY_I2C_IRQ_LIST']
Now I also need to get the name of the file related to each error. The file name always starts with a numeric substring, which I also need to remove. The output I want looks like this:
['ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'], ['rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'], ['rpy_i2c.c','RPY_I2C_IRQ_LIST']
I'd appreciate any suggestions. Thanks.
CodePudding user response:
You could use a regular expression to capture the required parts of each string. For example, the following regex:
\d+([^:]+):.*‘(.*)$
Explanation:
-----------
\d+     : One or more digits
( ) ( ) : Capturing groups
[^:]+   : One or more non-colon characters (in capturing group 1)
:       : One colon
.*      : Any number of any character
‘       : The ‘ character
.*      : Any number of any character (in capturing group 2)
$       : End of string
To use it:
import re
regex = re.compile(r"\d+([^:]+):.*‘(.*)$")
newlist = [regex.search(s).groups() for s in a]
which gives a list of tuples:
[('ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'),
('rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'),
('rpy_i2c.c', 'RPY_I2C_IRQ_LIST')]
If you really want a list of lists, you can convert the result of .groups() to a list:
newlist = [list(regex.search(s).groups()) for s in a]
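Note that regex.search(s) returns None for a string that does not match, so the comprehensions above raise AttributeError on such input. Here is a minimal sketch of a more defensive variant, assuming your real data may contain other compiler lines mixed in. The names pattern, file and error and the third sample string are my own, for illustration only:

```python
import re

# Same idea as the regex above, written in verbose mode with named groups.
pattern = re.compile(
    r"""
    \d+                # numeric prefix to discard
    (?P<file>[^:]+)    # file name: everything up to the first colon
    :.*‘               # skip the line/column numbers and the error label
    (?P<error>.*)$     # the error name after the ‘ character
    """,
    re.VERBOSE,
)

a = [
    ' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
    ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST',
    'In file included from main.c:1:',  # hypothetical line with no error name
]

newlist = []
for s in a:
    m = pattern.search(s)
    if m is None:
        # Skip lines that do not look like an error line.
        continue
    newlist.append([m.group("file"), m.group("error")])

print(newlist)
```

The non-matching line is simply skipped here; you could also collect those lines separately if you need to inspect them.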
CodePudding user response:
This code produces exactly the result you asked for, though there may be more efficient ways. It splits each string and uses a regex to remove the leading digits from the file name.
import re
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', '248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
r = []
for x in a:
    # Split the "file:line:column" part from the error name
    d = x.split(": error: ‘")
    # Keep the text before the first colon, then remove the leading digits only
    r.append([re.sub(r"^\d+", "", d[0].split(":")[0].strip()), d[1]])
print(r)
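One thing to watch when stripping the numeric prefix with re.sub: an unanchored pattern such as [0-9]{3} removes any run of three digits, wherever it appears, and leaves shorter prefixes alone. A small sketch comparing it with a pattern anchored at the start of the string (the second file name is hypothetical, made up to show the difference):

```python
import re

names = ['276ARDUINO_i2c.c', '12driver_100.c']  # second name is hypothetical

# Removes every three-digit run, anywhere in the string.
unanchored = [re.sub(r"[0-9]{3}", "", n) for n in names]

# Removes only the digits at the very start of the string.
anchored = [re.sub(r"^\d+", "", n) for n in names]

print(unanchored)
print(anchored)
```

With the anchored pattern the prefix can be any length, and digits later in the file name are left alone.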
CodePudding user response:
This isn't easy to do in a list comprehension with plain string methods. It's better to use a for loop here, like this:
# Your data
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
# A list to hold either your dicts or your lists
new = []
# For loop
for i in a:
    # We can split on ': ' because it's consistent across all the data.
    # The only problem with this logic is that we get the word 'error' too; we don't need it, so we assign it to '_'.
    # The next problem is the space at the start, so I used .strip() to get rid of it.
    name, _, error = i.strip().split(': ')
    # name is now like '276ARDUINO_i2c.c:70:27', so keep only the part before the first ':'.
    name = name.split(':')[0]
    # You don't need the number at the start of name, so use .lstrip() and pass it all the digits.
    name = name.lstrip('0123456789')  # .lstrip() removes every leading character that appears in the string passed to it.
    # error still starts with the ‘ character, so strip that too.
    error = error.lstrip('‘')
    # If you want a list
    new.append([name, error])
    # Or if you want a dict -> uncomment below & comment out the line above
    ## new.append({name: error})
print(new)
# output:
[['ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'], ['rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'], ['rpy_i2c.c', 'RPY_I2C_IRQ_LIST']]
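Keep in mind that unpacking split(': ') into exactly three names raises ValueError for any line that does not contain exactly two ': ' separators. A rough sketch of a more tolerant variant using str.partition, assuming every error line contains the text ': error: '. The helper name parse_line and the last sample line are made up for illustration:

```python
def parse_line(line):
    """Return [file_name, error_name], or None if the line has no ': error: ' part."""
    location, sep, error = line.strip().partition(': error: ')
    if not sep:
        # partition() returns an empty separator when the marker is missing.
        return None
    # Keep the text before the first ':' and drop the leading digits.
    file_name = location.split(':', 1)[0].lstrip('0123456789')
    # Drop the leading ‘ character from the error name.
    return [file_name, error.lstrip('‘')]


a = [
    ' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
    ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST',
    ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST',
    'make: *** [all] Error 1',  # hypothetical line that is not an error entry
]

new = [parsed for parsed in map(parse_line, a) if parsed is not None]
print(new)
```

Like the loop above, this relies on lstrip('0123456789'), so a file whose real name begins with a digit would lose those digits too.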