I have a large array with this format:
a = np.array([['#define', 'name_1', '(value1) /*comment 1*/'],
['#define', 'name_2', '(value2) /*comment 2*/'],
['#define', 'name_3', '(value3) /*comment 3*/'],
['#define', 'name_4', '(value4) /*comment 4*/']])
The strings in column 3 contain comments, and I only need to keep the value part inside the parentheses, e.g. (0x123). The output should look like this:
[['#define', 'name_1', '(value1)'],
['#define', 'name_2', '(value2)'],
['#define', 'name_3', '(value3)'],
['#define', 'name_4', '(value4)']]
I would appreciate any help, thanks.
CodePudding user response:
Option 1: use .split() to split on /* and keep only the first part, which is the value:
for i in range(len(a)):
    a[i][2] = a[i][2].split("/*")[0].strip()
Option 2: use re.sub() to replace the comment, which is delimited by /* and */, with an empty string:
import re
for i in range(len(a)):
    a[i][2] = re.sub(r"/\*.*\*/", "", a[i][2]).strip()
output:
array([['#define', 'name_1', '(value1)'],
['#define', 'name_2', '(value2)'],
['#define', 'name_3', '(value3)'],
['#define', 'name_4', '(value4)']], dtype='<U22')
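Both options above loop over the rows in Python. As a minimal sketch of the same split-and-strip idea without an explicit loop, NumPy's string helpers np.char.partition and np.char.strip can be applied to the whole column at once. The two-row array here is just a shortened copy of the one in the question.

```python
import numpy as np

a = np.array([['#define', 'name_1', '(value1) /*comment 1*/'],
              ['#define', 'name_2', '(value2) /*comment 2*/']])

# partition() splits each string at the first "/*" into (before, sep, after);
# column 0 of the result is the text before the comment.
before_comment = np.char.partition(a[:, 2], '/*')[:, 0]

# Drop the trailing space left in front of the comment and write the column back.
a[:, 2] = np.char.strip(before_comment)
print(a)
```

Strings that contain no /* are returned unchanged by partition(), so rows without a comment are left as they are.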
CodePudding user response:
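Here is a vectorized one-liner without any for loops or nested NumPy operations. Just change the dtype of that specific column to <U8, which truncates each string to its first 8 characters. Note that this only works because every value in the example, such as (value1), is exactly 8 characters long: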
a[:,-1] = a[:,-1].astype('<U8') #<----
print(a)
array([['#define', 'name_1', '(value1)'],
['#define', 'name_2', '(value2)'],
['#define', 'name_3', '(value3)'],
['#define', 'name_4', '(value4)']], dtype='<U22')
Technically speaking, for your example, since all your other strings are shorter than 8 characters, you could just use a.astype('<U8') on the whole array, but be careful when applying it to the complete array.
a.astype('<U8')
array([['#define', 'name_1', '(value1)'],
['#define', 'name_2', '(value2)'],
['#define', 'name_3', '(value3)'],
['#define', 'name_4', '(value4)']], dtype='<U8')
But note that this applies the 8-character limit to every other cell as well.
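Since the dtype trick is a fixed-width truncation, it is worth checking how it behaves when the values are not all 8 characters long. The sketch below uses two hypothetical strings (not taken from the question) to show the effect, and compares it with splitting on the comment marker.

```python
import numpy as np

# Hypothetical values of differing lengths, for illustration only.
b = np.array(['(0x1) /*short*/', '(0x123456) /*long*/'])

# Fixed-width truncation keeps exactly the first 8 characters of each string.
truncated = b.astype('<U8')
print(truncated.tolist())   # ['(0x1) /*', '(0x12345']

# Splitting on the comment marker does not depend on the value's length.
cleaned = np.array([s.split('/*')[0].strip() for s in b])
print(cleaned.tolist())     # ['(0x1)', '(0x123456)']
```

The short value keeps part of its comment and the long value loses its closing parenthesis, so the truncation approach is only safe when every value has the same known length.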
CodePudding user response:
You should use a regex to extract (value) from '(value1) /*comment 1*/'. To do this, import the re library in Python.
import re
import numpy as np
pattern = re.compile(r'\((.*?)\)')
a = np.array([['#define', 'name_1', '(val1) /*comment 1*/'],
['#define', 'name_2', '(val2) /*comment 2*/'],
['#define', 'name_3', '(val3) /*comment 3*/'],
['#define', 'name_4', '(val4) /*comment 4*/']])
for row in a:
    # group(1) is the text inside the first pair of parentheses
    new = pattern.match(row[2]).group(1)
    row[2] = f'({new})'
print(a)
output
[['#define' 'name_1' '(val1)']
['#define' 'name_2' '(val2)']
['#define' 'name_3' '(val3)']
['#define' 'name_4' '(val4)']]
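The loop above assumes every string starts with a parenthesized value; if one does not, re.match returns None and .group(1) raises an AttributeError. A minimal sketch of a more forgiving variant follows. The helper name keep_parenthesized and the second sample row are hypothetical and only there for illustration.

```python
import re
import numpy as np

PAREN = re.compile(r'\(.*?\)')

def keep_parenthesized(text):
    """Return the first '(...)' group in text, or text unchanged if there is none."""
    match = PAREN.search(text)
    return match.group(0) if match else text

a = np.array([['#define', 'name_1', '(val1) /*comment 1*/'],
              ['#define', 'name_2', 'val2 /*no parentheses*/']])

# Rewrite column 3; rows without parentheses are left untouched.
a[:, 2] = [keep_parenthesized(s) for s in a[:, 2]]
print(a[:, 2].tolist())
```

Using search() instead of match() also finds a value that is preceded by whitespace or other text.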