I have an array of strings that is composed of number-like strings such as 010. I am trying to build a 2D numpy array by creating an empty numpy array, and then filling in the rows with my array of strings. However, it seems like whenever I assign a row in the numpy array, it converts the number-like strings into numbers. The main issue with this behavior is that I am losing leading zeroes from my strings.
I wrote a simple example to show what is happening:
import numpy as np
num_rows = 5
arr = ["010", "011", "111", "100", "001"]
np_arr = np.empty((num_rows, len(arr)), dtype=str)
for i in range(len(np_arr)):
    np_arr[i] = arr
print(np_arr)
The resulting output is:
[['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']]
vs. the expected output:
[['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']]
I do not understand this behavior and am hoping to find a solution to my problem and understand if this type conversion is being done by numpy or by Python. I have tried quite a few variations to this small example but have not found a working solution.
Thanks!
CodePudding user response:
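The issue is the dtype of the array. With np.empty, dtype=str carries no length, so NumPy creates a one-character Unicode dtype (<U1) and every string you assign is truncated to its first character; nothing is actually converted to a number. You need to set an array-protocol type string with enough room for your strings, like <U3: if you change dtype=str to dtype='<U3', it will work.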
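To make the difference visible, here is a minimal sketch using the arr and num_rows from the question. The names truncated, fixed and width are only illustrative, and computing the width from the longest string (rather than hard-coding 3) is my own assumption about what is convenient here:

```python
import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# dtype=str without a length gives a one-character Unicode dtype,
# so each assigned string keeps only its first character.
truncated = np.empty((num_rows, len(arr)), dtype=str)
truncated[:] = arr
print(truncated.dtype.itemsize // 4)  # characters per element (4 bytes each)

# Size the dtype to the longest string instead of hard-coding it.
width = max(len(s) for s in arr)
fixed = np.empty((num_rows, len(arr)), dtype=f"U{width}")
fixed[:] = arr  # the list is broadcast to every row
print(fixed)
```

Assigning with fixed[:] = arr fills all rows at once, so the explicit loop from the question is not needed either.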
CodePudding user response:
Here's a solution:
import numpy as np
num_rows = 5
arr = ["010", "011", "111", "100", "001"]
# Turn your array into a numpy array with dtype string.
n = np.array(arr, dtype=str)
# Repeat the row as many times as needed.
n = np.tile(n, num_rows).reshape(num_rows, len(n))
Let me know if you have any questions.
A note for the future: in most cases, you can replace for loops with NumPy functions, which tend to be faster due to vectorisation.
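As a small variation on the same idea, np.tile also accepts a tuple of repetitions, which should make the reshape unnecessary. This is only a sketch; the names row and tiled are illustrative and not part of the answer above:

```python
import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# np.array infers the string length from the data, so leading zeroes survive.
row = np.array(arr)

# Repeat the row num_rows times along a new first axis, once along the second.
tiled = np.tile(row, (num_rows, 1))
print(tiled.shape)
print(tiled)
```

Unlike np.empty with dtype=str, np.array works out the string width from its input, which is why no explicit '<U3' is needed in this version.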