I'm trying to add certain strings to an NxN array (matrix) using a for loop. First I create an empty 2D array using np.empty, to fill with my values later:
matrixK = np.empty(((imgL.shape[0], imgL.shape[1])),dtype=str)
for i in range(imgL.shape[0]):
    for j in range(imgL.shape[1]):
        matrixK[i][j] = 'ab' 'cd'
When I run this code I get an NxN array matrixK of the right shape, but every index holds 'a' instead of 'abcd'. In other words, only the first character of the string is stored rather than the whole string. I suppose something is wrong with the data type of the array, but I don't know what, since I specify the data type as str.
For now I'm just filling the array with strings like 'ab' 'cd' for testing; in practice these would be strings taken from other arrays.
CodePudding user response:
Because numpy arrays allocate a fixed amount of memory for each element, an array created with dtype=str and no length is allocated as dtype='<U1', which holds a single character. That is not enough to store strings with length >= 2, so anything longer is silently truncated on assignment.
You need to use object instead of str, which allocates memory for PyObject pointers in the array. Each element is then a Python object that can be dereferenced to the full Python string you intended.
matrixK = np.empty(((imgL.shape[0], imgL.shape[1])), dtype=object)
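To see the difference side by side, here is a minimal sketch. It assumes a small 2x3 shape in place of imgL.shape (imgL itself is not shown in the question), and the names rows, cols, truncated and full are made up for the illustration.

```python
import numpy as np

# Hypothetical shape standing in for imgL.shape from the question.
rows, cols = 2, 3

# dtype=str with no length becomes a one-character Unicode dtype.
truncated = np.empty((rows, cols), dtype=str)
truncated[0, 0] = 'ab' 'cd'   # adjacent literals concatenate to 'abcd'

# dtype=object stores references to ordinary Python str objects.
full = np.empty((rows, cols), dtype=object)
for i in range(rows):
    for j in range(cols):
        full[i, j] = 'ab' 'cd'

print(truncated.dtype, truncated[0, 0])  # only the first character survives
print(full.dtype, full[0, 0])            # the whole string is kept
```

Note that numpy raises no error or warning when it truncates, which is why the problem is easy to miss.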
EDIT
If you know the maximum length of the strings beforehand, you could also use <U4 to allocate memory to hold up to 4 characters in each position.
matrixK = np.empty(((imgL.shape[0], imgL.shape[1])), dtype='<U4')
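A short sketch of the fixed-width approach, again with a made-up 2x2 shape rather than imgL.shape. It also shows that the width is a hard limit, and that np.full can pick the width for you when every cell starts with the same value (the names fixed and filled are only for illustration).

```python
import numpy as np

# Fixed-width Unicode array: each cell holds at most 4 characters.
fixed = np.empty((2, 2), dtype='<U4')
fixed[:] = 'abcd'          # assign the same string to every cell

# Anything longer than the declared width is cut off without warning.
fixed[0, 1] = 'abcdef'

# np.full infers the dtype (and so the width) from the fill value.
filled = np.full((2, 2), 'abcd')

print(fixed)
print(filled.dtype)
```

So the fixed-width dtype is only safe when the declared width covers the longest string you will ever store; otherwise dtype=object is the more forgiving choice.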
As for why using dtype=str doesn't work: numpy tries to avoid storing PyObject pointers in the array so that the elements are contiguous in memory. This is numpy's behaviour because its primary use case is storing numerical values, where matrix operations are more efficient when values are contiguous in memory instead of having to dereference PyObject pointers (Python's default behavior). But since an array needs a fixed element size for the purpose of memory allocation, numpy has to pick one, and for plain str with no length given it falls back to <U1.
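You can check the per-element storage yourself through itemsize. This is a rough sketch with a made-up 2x2 shape; the pointer size is read with struct.calcsize('P') and will depend on your platform.

```python
import struct

import numpy as np

default_str = np.empty((2, 2), dtype=str)   # no length given
fixed = np.zeros((2, 2), dtype='U4')        # room for 4 characters per cell
boxed = np.empty((2, 2), dtype=object)      # one pointer per cell

# numpy stores each Unicode character in 4 bytes.
print(default_str.dtype.itemsize)   # bytes per cell for one character
print(fixed.dtype.itemsize)         # bytes per cell for four characters

# An object array only stores a pointer per cell; the strings live elsewhere.
pointer_size = struct.calcsize('P')
print(boxed.dtype.itemsize, pointer_size)
```

The string arrays keep the characters inline in one contiguous buffer, while the object array keeps only pointers there.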
In a lower-level language, you'd face the same issue of having to either allocate a pointer to the char array/string or allocate for strings with a predefined length.