I have an ndarray like the one below. I write it to a DataFrame, save the DataFrame as a pickle, read the pickle back, and then build a new array from it. Why does np.array_equal(my_array2, X_train) return False? I tried to debug this and wrote some code to understand the problem, but I'm having a hard time.
How should I change the code so that both arrays match?
import numpy as np
import pandas as pd
X_train = np.array([[" I I want to know how much s it thank you"],
                    [" press any key to connect P Thank you Too <unk> I "]],
                   dtype='<U97064')
X_train
X_train[0]
#array([[' I I want to know how much s it thank you'],
[' press any key to connect P Thank you Too <unk> I ']],
dtype='<U97064')
df = pd.DataFrame(X_train, columns = ['Column_A'])
df.to_pickle('df.pkl')
df2 = pd.read_pickle('df.pkl')
my_array2= df2['Column_A'].to_numpy(dtype='<U97064')
np.array_equal(my_array2[0],X_train[0])
#False
np.array_equal(my_array2,X_train)
#False
The types of the two arrays match:
print (type(my_array2))
print (type(X_train))
#<class 'numpy.ndarray'>
#<class 'numpy.ndarray'>
But the types of the individual elements don't match:
#not sure why the type of the individual elements is different
print (type(my_array2[0]))
print (type(X_train[0]))
#<class 'numpy.str_'>
#<class 'numpy.ndarray'>
X_train.dtype
#dtype('<U97064')
type(X_train.dtype)
#numpy.dtype
CodePudding user response:
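In your code, X_train[0] is itself an array, while my_array2[0] is a string. X_train is 2-D with shape (2, 1), but converting a single DataFrame column with to_numpy gives a 1-D array of shape (2,), and np.array_equal returns False whenever the shapes differ.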
print(X_train[0])
>>array([' I I want to know how much s it thank you'], dtype='<U97064')
print(my_array2[0])
>>' I I want to know how much s it thank you'
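The mismatch can be checked directly by comparing shapes. The following is a minimal sketch that uses short stand-in strings rather than the original data; the variable names shapes_differ, equal_as_is and equal_flattened are only illustrative.

```python
import numpy as np

# Stand-in for X_train: a 2-D array with a single column, shape (2, 1).
X_train = np.array([["first sentence"], ["second sentence"]])

# Stand-in for my_array2: the same strings as a 1-D array, shape (2,),
# which is what converting one DataFrame column produces.
my_array2 = X_train[:, 0]

print(X_train.shape)    # (2, 1)
print(my_array2.shape)  # (2,)

# np.array_equal requires identical shapes as well as identical elements.
shapes_differ = X_train.shape != my_array2.shape
equal_as_is = np.array_equal(my_array2, X_train)
# Flattening X_train removes the shape difference, so the comparison succeeds.
equal_flattened = np.array_equal(my_array2, X_train.ravel())
print(shapes_differ, equal_as_is, equal_flattened)
```

So the strings themselves survive the round trip; only the number of dimensions changes.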
If you want my_array2 to be a NumPy array of shape (2, 1), the same shape as X_train, add .reshape(2,1):
my_array2= df2['Column_A'].to_numpy(dtype='<U97064').reshape(2,1)
print(np.array_equal(my_array2[0],X_train[0]))
>>True
print(np.array_equal(my_array2,X_train))
>>True
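If the number of rows is not known in advance, hard-coding 2 may be inconvenient. A possible variation, sketched here with short stand-in strings, a smaller dtype, and a temporary file path (all assumptions, not part of the original code), is to use reshape(-1, 1) or to select the column with a list so the result stays 2-D:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in data with the same layout as X_train: shape (2, 1), fixed-width unicode.
X_train = np.array([["first sentence"], ["second sentence"]], dtype="<U64")

# Round trip through a pickle stored in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df.pkl")
    pd.DataFrame(X_train, columns=["Column_A"]).to_pickle(path)
    df2 = pd.read_pickle(path)

# Option 1: -1 lets NumPy work out the row count from the data.
restored_a = df2["Column_A"].to_numpy(dtype=X_train.dtype).reshape(-1, 1)

# Option 2: selecting with a list of column names keeps the result 2-D,
# then astype restores the original string dtype.
restored_b = df2[["Column_A"]].to_numpy().astype(X_train.dtype)

print(np.array_equal(restored_a, X_train))
print(np.array_equal(restored_b, X_train))
```

Both options give an array of shape (n_rows, 1), so the comparison with X_train no longer depends on the number of rows.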