How to delete part of a string element of a NumPy array in Python

I have a large array in this format:

a = np.array([['#define', 'name_1', '(value1) /*comment 1*/'],
              ['#define', 'name_2', '(value2) /*comment 2*/'],
              ['#define', 'name_3', '(value3) /*comment 3*/'],
              ['#define', 'name_4', '(value4) /*comment 4*/']])

The strings in the third column contain comments, and I only need to keep the value part inside the parentheses, e.g. (0x123). The output should look like this:

[['#define', 'name_1', '(value1)'],
 ['#define', 'name_2', '(value2)'],
 ['#define', 'name_3', '(value3)'],
 ['#define', 'name_4', '(value4)']]

I would appreciate any help, thanks.

CodePudding user response：

Option 1: use .split() to split each string on "/*" and keep only the first part, which is the value.

for i in range(len(a)):
    a[i][2] = a[i][2].split("/*")[0].strip()

Option 2: use re.sub() to replace the comment, which is delimited by /* and */, with an empty string.

import re
for i in range(len(a)):
    # Non-greedy .*? stops at the first */, so text between two comments survives.
    a[i][2] = re.sub(r"/\*.*?\*/", "", a[i][2]).strip()

output:

array([['#define', 'name_1', '(value1)'],
       ['#define', 'name_2', '(value2)'],
       ['#define', 'name_3', '(value3)'],
       ['#define', 'name_4', '(value4)']], dtype='<U22')
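
The split-and-strip idea from Option 1 can also be written without an explicit Python loop. The following is a minimal sketch, assuming the comment always starts with "/*" and using a shortened two-row version of the array from the question; it relies on np.char.partition and np.char.strip, which are not used in the answer above.

```python
import numpy as np

a = np.array([['#define', 'name_1', '(value1) /*comment 1*/'],
              ['#define', 'name_2', '(value2) /*comment 2*/']])

# partition() splits each string at the first "/*" into (before, sep, after).
# The result gains a trailing axis of length 3, so [:, 0] selects "before".
before = np.char.partition(a[:, 2], '/*')[:, 0]

# strip() removes the space that was left in front of the comment.
a[:, 2] = np.char.strip(before)

print(a)
```

Strings that contain no "/*" come back unchanged apart from the whitespace stripping, so rows without a comment are left alone.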

CodePudding user response：

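Here is a vectorized one-liner that avoids for loops and nested NumPy operations: cast the column to dtype <U8, which truncates each string to its first 8 characters. This works only because every '(valueN)' string in the example is exactly 8 characters long.

a[:,-1] = a[:,-1].astype('<U8')   # truncate the last column to 8 characters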

print(a)

array([['#define', 'name_1', '(value1)'],
       ['#define', 'name_2', '(value2)'],
       ['#define', 'name_3', '(value3)'],
       ['#define', 'name_4', '(value4)']], dtype='<U22')

Since the other strings in your example are all shorter than 8 characters, you could also call a.astype('<U8') on the whole array, but be careful when doing so.

a.astype('<U8')

array([['#define', 'name_1', '(value1)'],
       ['#define', 'name_2', '(value2)'],
       ['#define', 'name_3', '(value3)'],
       ['#define', 'name_4', '(value4)']], dtype='<U8')

But note that this truncates every cell in the array to at most 8 characters.
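
To illustrate the length restriction, here is a small sketch with made-up values of different lengths (the hex values and the names col and truncated are assumptions for illustration, not data from the question). Fixed-width truncation keeps a stray space for the shorter value and cuts off the closing parenthesis of the longer one.

```python
import numpy as np

# Hypothetical values whose parenthesized part is 7 and 9 characters long.
col = np.array(['(0x123) /*comment 1*/', '(0x12345) /*comment 2*/'])

# Casting to <U8 keeps exactly the first 8 characters of each string.
truncated = col.astype('<U8')

print(truncated.tolist())   # ['(0x123) ', '(0x12345']
```

So the dtype trick is only safe when every value occupies exactly the same number of characters.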

CodePudding user response：

You can use a regular expression to extract (value) from '(value1) /*comment 1*/'.

This requires Python's built-in re module.

import re
import numpy as np 

pattern = re.compile(r'\((.*?)\)')

a = np.array([['#define', 'name_1', '(val1) /*comment 1*/'],
              ['#define', 'name_2', '(val2) /*comment 2*/'],
              ['#define', 'name_3', '(val3) /*comment 3*/'],
              ['#define', 'name_4', '(val4) /*comment 4*/']])

for row in a:
    # Each row is a view into a, so assigning to row[2] updates the array in place.
    new = pattern.match(row[2]).group(1)
    row[2] = f'({new})'

print(a)

output:

[['#define' 'name_1' '(val1)']
 ['#define' 'name_2' '(val2)']
 ['#define' 'name_3' '(val3)']
 ['#define' 'name_4' '(val4)']]
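
Note that re.match returns None when a string does not start with a parenthesized group, so the loop above would raise an AttributeError on such a row. Below is a hedged sketch of a more defensive variant; the helper name keep_parenthesized and the second sample row are hypothetical and only serve to show the fallback.

```python
import re
import numpy as np

pattern = re.compile(r'\(.*?\)')

def keep_parenthesized(text):
    """Hypothetical helper: return the first '(...)' in text, or text unchanged."""
    m = pattern.search(text)
    return m.group(0) if m else text

a = np.array([['#define', 'name_1', '(0x123) /*comment 1*/'],
              ['#define', 'name_2', 'no_parens /*comment 2*/']])

# Rebuild the third column; rows without parentheses are left as they were.
a[:, 2] = [keep_parenthesized(s) for s in a[:, 2]]

print(a)
```

Using search instead of match also finds a parenthesized value that is not at the very start of the string.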