I'm running a PCA on 4 very large arrays that contain NaN cells. For the PCA to work, I have to reshape each array into a vector (one dimension instead of two) and delete all the NaN cells, which changes the length of the vectors. The PCA returns 4 new vectors that I need to reshape back to the exact dimensions of the original arrays, with each cell going back to its original index.
The NaN cells aren't in any particular order; they are scattered randomly. I tried using arr.reshape(arr.shape[0]*arr.shape[1], 1) to create the vectors with the NaN values still in them,
then saving the indexes of the NaN values, deleting them, running the PCA on the vectors, re-inserting the NaN values, and reshaping the vectors back into the shape of the original arrays.
*The arrays all have the same dimensions: (23292, 9120).
Because of the size of the arrays, it takes far too long to iterate over them, both to save the NaN indexes and to re-insert them after the PCA.
If anybody has a better idea for how to restore the arrays, it would be much appreciated. Thank you.
CodePudding user response:
I would use a mask image for this. For example:
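import numpy as np
arr = … # your input image
mask = ~np.isnan(arr) # True where the cell holds a valid (non-NaN) value
vec = arr[mask] # 1D array with the non-NaN values, in row-major order
…
arr[mask] = vec # write the values back to their original positions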
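To make the mask idea concrete for the 4-array case, here is a minimal sketch. It rests on assumptions that are not stated in the question: that a cell should be dropped if it is NaN in any of the 4 arrays (so all vectors stay the same length), and that the PCA can be done with a plain NumPy eigendecomposition of the covariance matrix instead of whichever PCA library you are using. The function name `pca_with_nan_mask` is made up for illustration.

```python
import numpy as np

def pca_with_nan_mask(arrays):
    """Hypothetical helper: PCA over same-shaped 2D arrays, ignoring NaN cells.

    Assumption: a cell is used only if it is valid in ALL input arrays.
    Returns the component images (NaN where the cell was masked out) and the mask.
    """
    stack = np.stack(arrays, axis=-1)        # shape (rows, cols, k)
    mask = ~np.isnan(stack).any(axis=-1)     # True where every array has a value
    X = stack[mask]                          # shape (n_valid, k), row-major order
    Xc = X - X.mean(axis=0)                  # center each variable

    # PCA via the (k x k) covariance matrix; cheap because k is tiny.
    cov = (Xc.T @ Xc) / (Xc.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    scores = Xc @ eigvecs[:, order]          # shape (n_valid, k)

    # Put the scores back at their original positions; masked cells stay NaN.
    out = np.full(stack.shape, np.nan)
    out[mask] = scores
    return [out[..., i] for i in range(out.shape[-1])], mask

# Small demo with random data standing in for the real arrays.
rng = np.random.default_rng(0)
demo = [rng.normal(size=(6, 7)) for _ in range(4)]
demo[0][0, 0] = np.nan
demo[2][3, 4] = np.nan
components, valid = pca_with_nan_mask(demo)
print(components[0].shape, int(valid.sum()))
```

No Python-level loop over cells is needed: boolean indexing does both the extraction and the write-back. Note that stacking four (23292, 9120) float64 arrays needs several gigabytes of memory, so on the real data you may prefer to build the mask once and index each array separately.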