Numpy automatically converting array of strings to array of numbers

I have an array of strings that is composed of number-like strings such as 010. I am trying to build a 2D numpy array by creating an empty numpy array, and then filling in the rows with my array of strings. However, it seems like whenever I assign a row in the numpy array, it converts the number-like strings into numbers. The main issue with this behavior is that I am losing leading zeroes from my strings.

I wrote a simple example to show what is happening:

import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]
np_arr = np.empty((num_rows, len(arr)), dtype=str)

for i in range(len(np_arr)):
    np_arr[i] = arr

print(np_arr)

The resulting output is:

[['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']]

vs. the expected output:

[['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']]

I do not understand this behavior and am hoping to find a solution to my problem and understand if this type conversion is being done by numpy or by Python. I have tried quite a few variations to this small example but have not found a working solution.

Thanks!

CodePudding user response：

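The issue is the dtype of the array, not a conversion to numbers. With dtype=str and no length, np.empty creates a one-character Unicode array (<U1), so every string you assign is truncated to its first character: "010" becomes "0" and "111" becomes "1". You need an array-protocol type string that is wide enough for your data, such as <U3. If you change dtype=str to dtype='<U3', it will work.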
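One way to check this is to inspect the dtypes directly. The following is a minimal sketch rather than a definitive fix: the names `truncated`, `fixed`, and `width` are illustrative, and computing the width from the longest string is an assumption added here so the dtype does not have to be hard-coded.

```python
import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# dtype=str without a length yields a one-character Unicode dtype.
truncated = np.empty((num_rows, len(arr)), dtype=str)
print(truncated.dtype)

# Size the dtype from the longest string instead of hard-coding 3.
width = max(len(s) for s in arr)
fixed = np.empty((num_rows, len(arr)), dtype=f"<U{width}")

# Same row-by-row fill as in the question, now without truncation.
for i in range(num_rows):
    fixed[i] = arr

print(fixed)
```

If the strings can vary in length, deriving the width this way avoids silent truncation when a longer value shows up later.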

CodePudding user response：

Here's a solution:

import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# Turn the list into a NumPy array of strings. The width is inferred
# from the data (here <U3), so the leading zeroes are kept.
n = np.array(arr, dtype=str)

# Repeat the row num_rows times, then reshape into a 2D array.
n = np.tile(n, num_rows).reshape(num_rows, len(n))

Let me know if you have any questions.

A note for the future: in most cases you can replace for loops over array rows with NumPy functions, which tend to be faster because the work is vectorised.
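As a rough illustration of that point, the loop from the question and a vectorised version can be compared side by side. This is only a sketch: passing a `(num_rows, 1)` repetition tuple to `np.tile` is a variation on the tile-and-reshape approach above, not the exact code from the answer, and the names `row`, `looped`, and `tiled` are illustrative.

```python
import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# np.array infers the string width from the data, so nothing is truncated.
row = np.array(arr)

# Loop version: fill a preallocated array row by row.
looped = np.empty((num_rows, len(arr)), dtype=row.dtype)
for i in range(num_rows):
    looped[i] = row

# Vectorised version: repeat the row num_rows times along a new first axis.
tiled = np.tile(row, (num_rows, 1))

# Both approaches should produce the same 2D array of strings.
print(np.array_equal(looped, tiled))
```

Reusing `row.dtype` for the preallocated array keeps the loop version from falling back to a one-character dtype.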