Create a new numpy array from elements of another numpy array

I've been struggling to create a sub-array from specific elements of a first array.

The first array comes from a txt file with two lines:

L1,(B:A:3:1),(A:C:5:2),(C:D:2:3)
L2,(C:E:2:0.5),(E:F:10:1),(F:D:0.5:0.5)

code

import numpy as np
import pandas as pd

toto = pd.read_csv("bd_2_test.txt", delimiter=",", header=None, names=["Line", "1rst", "2nd", "3rd"])
matrix_toto = toto.values
matrix_toto

result

    Line    1rst    2nd 3rd
0   L1  (B:A:3:1)   (A:C:5:2)   (C:D:2:3)
1   L2  (C:E:2:0.5) (E:F:10:1)  (F:D:0.5:0.5)

How can I transform it into an array like this one?

array([['B', 'A', 3, 1],
       ['A', 'C', 5, 2],
       ['C', 'D', 2, 3],
       ['C', 'E', 2, 0.5],
       ['E', 'F', 10, 1],
       ['F', 'D', 0.5, 0.5]], dtype=object)

I tried vectorizing, but that only gives me the second character of each string:

np.vectorize(lambda s: s[1])(matrix_toto)

array([['1', 'B', 'A', 'C'],
       ['2', 'C', 'E', 'F']], dtype='<U1')

CodePudding user response：

I am not sure that what you are trying is the best solution to your real problem, but here is an approach that stays as close as possible to your initial attempt:

# Imports: re for the regular expression, numpy for the arrays
import re
import numpy as np
# We need a regular expression to transform a string "(x:y:z:t)" into an array ["x","y","z","t"]
# tr does that transformation (raw string, so the backslashes reach the regex engine unchanged)
tr=lambda s: np.array(re.findall(r'\(([^:]*):([^:]*):([^:]*):([^:]*)\)', s)[0])
# Alternative version, without re (and maybe best, I've benchmarked it)
# s[1:-1] removes the first and last characters, i.e. the parentheses; .split(':') returns the list of substrings separated by colons; np.array turns that list into an array, like the regex version
tr=lambda s: np.array(s[1:-1].split(':'))
# trv is the vectorization of tr
# We need the signature, because the return value is an array itself.
trv=np.vectorize(tr, signature='()->(n)')
result=trv(matrix_toto[:,1:].flatten())

Note that matrix_toto[:,1:] is your matrix without the first column (the line name), and matrix_toto[:,1:].flatten() flattens it, so there is one entry per cell of your initial array (excluding the line name). Each of those cells is a string "(x:y:z:t)", which trv transforms into an array.
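To make the slicing and flattening step concrete, here is a small sketch. It builds the array by hand instead of reading bd_2_test.txt, so the hard-coded matrix_toto below is only an assumed stand-in for toto.values:

```python
import numpy as np

# Hand-built stand-in for toto.values (assumed to match the two-line file from the question)
matrix_toto = np.array([
    ["L1", "(B:A:3:1)", "(A:C:5:2)", "(C:D:2:3)"],
    ["L2", "(C:E:2:0.5)", "(E:F:10:1)", "(F:D:0.5:0.5)"],
], dtype=object)

cells = matrix_toto[:, 1:]      # drop the first column (the line names)
flat_cells = cells.flatten()    # 1-D copy in row-major order, one entry per cell

print(matrix_toto.shape, cells.shape, flat_cells.shape)  # (2, 4) (2, 3) (6,)
print(flat_cells[3])                                     # (C:E:2:0.5)
```

Because flatten() goes row by row, the three cells of L1 come first, followed by the three cells of L2, which is the order wanted in the final array.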

Result is

array([['B', 'A', '3', '1'],
       ['A', 'C', '5', '2'],
       ['C', 'D', '2', '3'],
       ['C', 'E', '2', '0'],
       ['E', 'F', '1', '1'],
       ['F', 'D', '0', '0']], dtype='<U1')
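
Notice that this output is not quite the array asked for: 0.5 shows up as '0' and 10 as '1'. When otypes is not given, np.vectorize determines the output type from the first call, and the first cell, (B:A:3:1), only contains one-character fields, hence dtype='<U1'. One possible way around it is sketched below. It uses a plain list comprehension instead of np.vectorize, and to_number and parse_cell are hypothetical helper names, not part of numpy or of the code above. The sample cells are hard-coded here as an assumption, in place of matrix_toto[:,1:].flatten():

```python
import numpy as np

def to_number(text):
    """Convert a numeric field to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def parse_cell(cell):
    """Turn "(x:y:z:t)" into [x, y, z, t], with z and t converted to numbers."""
    first, second, third, fourth = cell[1:-1].split(":")
    return [first, second, to_number(third), to_number(fourth)]

# Assumed stand-in for matrix_toto[:, 1:].flatten()
flat_cells = ["(B:A:3:1)", "(A:C:5:2)", "(C:D:2:3)",
              "(C:E:2:0.5)", "(E:F:10:1)", "(F:D:0.5:0.5)"]

# dtype=object keeps strings and numbers side by side without truncating anything
result = np.array([parse_cell(cell) for cell in flat_cells], dtype=object)
print(result)
```

This gives a 6x4 object array in which the last two columns hold actual numbers, as in the array shown in the question. It assumes every cell has exactly four fields and that the last two are always numeric.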

Obviously, you need only one of the two tr=... lines. I left both in the code because I don't know the exact specification of those (x:y:z:t) patterns, so you may need to adapt one of the two versions.
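
As an illustration of how the two versions can differ, here is a small sketch using a hypothetical cell padded with spaces (something that does not occur in the sample file). The slicing version assumes the parentheses are exactly the first and last characters, while the regex version looks for the pattern anywhere in the string. The names tr_regex and tr_split are mine, and both return plain lists here to keep the comparison simple:

```python
import re

PATTERN = re.compile(r'\(([^:]*):([^:]*):([^:]*):([^:]*)\)')

def tr_regex(s):
    # Fields of the first "(x:y:z:t)" group found anywhere in s
    return list(PATTERN.findall(s)[0])

def tr_split(s):
    # Drop the first and last characters, then split on colons
    return s[1:-1].split(':')

clean = "(C:E:2:0.5)"
padded = " (C:E:2:0.5) "   # hypothetical input with surrounding spaces

print(tr_regex(clean) == tr_split(clean))  # True
print(tr_regex(padded))                    # ['C', 'E', '2', '0.5']
print(tr_split(padded))                    # ['(C', 'E', '2', '0.5)']
```

The regex version also raises an IndexError when no four-field group is found, whereas the slicing version returns however many fields there are, so which one fits better depends on how strict you want to be about the input.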