I've been struggling to create a sub-array from specific elements of a first array.
The first array comes from a txt file with two lines:
L1,(B:A:3:1),(A:C:5:2),(C:D:2:3)
L2,(C:E:2:0.5),(E:F:10:1),(F:D:0.5:0.5)
I load it with this code:
toto = pd.read_csv("bd_2_test.txt",delimiter=",",header=None,names=["Line","1rst","2nd","3rd"])
matrix_toto = toto.values
matrix_toto
Result:
Line 1rst 2nd 3rd
0 L1 (B:A:3:1) (A:C:5:2) (C:D:2:3)
1 L2 (C:E:2:0.5) (E:F:10:1) (F:D:0.5:0.5)
How can I transform it into an array like this one?
array([['B', 'A', 3, 1],
['A', 'C', 5, 2],
['C', 'D', 2, 3],
['C', 'E', 2, 0.5],
['E', 'F', 10, 1],
['F', 'D', 0.5, 0.5]], dtype=object)
I tried vectorizing, but I only get the second character of each element:
np.vectorize(lambda s: s[1])(matrix_toto)
array([['1', 'B', 'A', 'C'],
['2', 'C', 'E', 'F']], dtype='<U1')
CodePudding user response:
I am not sure that what you are trying is the best solution to your real problem. But here is an answer that stays as close as possible to your initial attempt:
# A regular expression is used to transform a string "(x:y:z:t)" into an array ["x","y","z","t"]
import re
import numpy as np
# tr does that transformation
tr=lambda s: np.array(re.findall(r'\(([^:]*):([^:]*):([^:]*):([^:]*)\)', s)[0])
# Alternative version, without re (and maybe the better one; I've benchmarked it)
# s[1:-1] removes the 1st and last characters (the parentheses), and .split(':') splits the rest at the colons
tr=lambda s: np.array(s[1:-1].split(':'))
# trv is the vectorization of tr
# The signature is needed because tr itself returns an array.
# otypes=[object] keeps the full strings: otherwise the dtype is taken from the first result ('<U1') and values such as '0.5' or '10' are cut to one character.
trv=np.vectorize(tr, signature='()->(n)', otypes=[object])
result=trv(matrix_toto[:,1:].flatten())
Note that matrix_toto[:,1:] is your matrix without the 1st column (the line name), and matrix_toto[:,1:].flatten() flattens it, so there is one entry per cell of your initial array (excluding the line name). Each of those cells is a string "(x:y:z:t)", which trv transforms into an array.
The result is
array([['B', 'A', '3', '1'],
['A', 'C', '5', '2'],
['C', 'D', '2', '3'],
['C', 'E', '2', '0.5'],
['E', 'F', '10', '1'],
['F', 'D', '0.5', '0.5']], dtype=object)
Obviously you need only one of the two tr=... lines. I left both in the code because I don't know the exact specification of those (x:y:z:t) patterns, so you may need to adapt one of the two versions.
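The result above holds every field as a string, while the array asked for in the question has numbers in the last two columns. A minimal sketch of one way to get there is shown below. It uses a plain list comprehension instead of np.vectorize, and it assumes every cell has exactly four colon-separated fields with the last two numeric. The helper names to_number and parse_cell are made up for this illustration, and matrix_toto is rebuilt inline so the snippet runs without the txt file.

```python
import numpy as np


def to_number(text):
    """Return an int when the text is an integer, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_cell(cell):
    """Turn "(x:y:z:t)" into ['x', 'y', z, t], with z and t as numbers."""
    # Drop the surrounding parentheses, then split at the colons.
    # Assumption: exactly four fields per cell.
    first, second, third, fourth = cell.strip()[1:-1].split(":")
    return [first, second, to_number(third), to_number(fourth)]


# Same content as matrix_toto in the question, written inline.
matrix_toto = np.array(
    [
        ["L1", "(B:A:3:1)", "(A:C:5:2)", "(C:D:2:3)"],
        ["L2", "(C:E:2:0.5)", "(E:F:10:1)", "(F:D:0.5:0.5)"],
    ],
    dtype=object,
)

# Skip the first column (the line name) and parse every remaining cell.
rows = [parse_cell(cell) for cell in matrix_toto[:, 1:].flatten()]

# dtype=object lets strings, ints and floats live in the same array.
result = np.array(rows, dtype=object)
print(result)
```

Because the parsing happens in plain Python, no dtype is inferred from the first cell, so longer values such as 10 or 0.5 are kept whole.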