Converting a NumPy array with string data into integers after separating

Time:11-23

I have a NumPy array like this:

myarray = np.array(['31,67,82', '31,72,82', '31,41,77'])

I want to split each string using the comma as the separator, and then convert the pieces into integers.

I tried,

a = list()
for x in myarray:
    a.append(np.char.split(x, sep =','))

This works, but when I try to convert the result to integers with astype(np.int), like this:

a = list()
for x in myarray:
    a.append(np.char.split(x, sep =',').astype(np.int))

I get an error:

ValueError: setting an array element with a sequence.

How can I achieve that?

Thanks in advance!

My desired output is something like:

np.array([[31, 67, 82], [31, 72, 82], [31, 41, 77]])

CodePudding user response:

From this answer:

a = list()
for x in myarray:
    # the np.int alias was removed in NumPy 1.24; the builtin int works as the dtype
    a.append(np.fromstring(x, dtype=int, sep=','))
np.array(a)

result:

array([[31, 67, 82],
       [31, 72, 82],
       [31, 41, 77]])

CodePudding user response:

It's probably easier to pre-process your input array rather than trying to modify the data types retrospectively. Something like this:

from numpy import array

def process(a):
    r = []
    for item in a:
        r.append([int(x) for x in item.split(',')])
    return r

myarray = array(process(['31,67,82', '31,72,82', '31,41,77']))
print(myarray)

CodePudding user response:

Split each string, apply int to every piece with map, and turn the result back into a list. Try this:

a = list()
for x in myarray:
    a.append(list(map(int, x.split(','))))
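
The loop above leaves a as a plain list of lists. To get the array asked for in the question, it can be passed to np.array. A minimal self-contained sketch (the name result is only for illustration):

```python
import numpy as np

myarray = np.array(['31,67,82', '31,72,82', '31,41,77'])

a = []
for x in myarray:
    # each x is a NumPy string scalar, which supports the usual str.split
    a.append(list(map(int, x.split(','))))

# build a 2-d integer array from the list of lists
result = np.array(a)
print(result)
```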

CodePudding user response:

In [345]: myarray = np.array(['31,67,82', '31,72,82', '31,41,77'])
In [346]: myarray
Out[346]: array(['31,67,82', '31,72,82', '31,41,77'], dtype='<U8')

Applying char.split to the whole array (it actually iterates over the strings and calls astr.split on each):

In [347]: a = np.char.split(myarray, sep=',')
In [348]: a
Out[348]: 
array([list(['31', '67', '82']), list(['31', '72', '82']),
       list(['31', '41', '77'])], dtype=object)
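
It may help to look at what that result actually is. A small sketch of the same call outside the IPython session: the array is 1-d with object dtype, and each element is an ordinary Python list rather than a row of a 2-d array.

```python
import numpy as np

myarray = np.array(['31,67,82', '31,72,82', '31,41,77'])
a = np.char.split(myarray, sep=',')

print(a.shape)      # (3,)  one slot per input string, not (3, 3)
print(a.dtype)      # object
print(type(a[0]))   # <class 'list'>
```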

We can't apply astype to that whole array:

In [349]: a.astype(int)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

ValueError: setting an array element with a sequence.

But if we make a list, we can easily create an int array from that:

In [350]: a.tolist()
Out[350]: [['31', '67', '82'], ['31', '72', '82'], ['31', '41', '77']]
In [351]: np.array(a.tolist(), int)
Out[351]: 
array([[31, 67, 82],
       [31, 72, 82],
       [31, 41, 77]])

Equivalently, doing the string split directly:

In [352]: [astr.split(',') for astr in myarray]
Out[352]: [['31', '67', '82'], ['31', '72', '82'], ['31', '41', '77']]
In [353]: np.array([astr.split(',') for astr in myarray],int)
Out[353]: 
array([[31, 67, 82],
       [31, 72, 82],
       [31, 41, 77]])

Telling np.array to do the string to int conversion should be faster than a map(int...). But the splitting itself is inherently a string method.
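
I haven't measured that here, so treat it as a guess. A rough way to check on your own data is a timeit sketch like the one below; the helper names via_np_array and via_map and the repetition factor are made up for the comparison, and the timings will depend on the NumPy version and array size.

```python
import timeit
import numpy as np

# repeat the sample strings so the timings are not dominated by call overhead
myarray = np.array(['31,67,82', '31,72,82', '31,41,77'] * 1000)

def via_np_array(arr):
    # split in Python, let np.array convert the strings to int
    return np.array([s.split(',') for s in arr], int)

def via_map(arr):
    # split and convert each piece in Python with map(int, ...)
    return np.array([list(map(int, s.split(','))) for s in arr])

# both approaches should produce the same array
same = np.array_equal(via_np_array(myarray), via_map(myarray))
print("same result:", same)

for fn in (via_np_array, via_map):
    t = timeit.timeit(lambda: fn(myarray), number=20)
    print(f"{fn.__name__}: {t:.4f} s")
```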
