I just started working with NumPy arrays in conjunction with pandas DataFrames, and I've hit a problem in a practice project. I have a pandas DataFrame whose rows I pass to a function via DataFrame.apply. The function takes the current row plus two other arrays, labeled best and worst, and builds a new vector from them. It then compares the sums of the two and returns whichever has the lower sum(): the current row or the new vector.
The function works fine, but the result of apply needs to be converted to a NumPy array of shape (20, 5) for further work. When I call np.array() on it, I get an array of shape (20,) instead. I figured .reshape(20,5) would work since there are enough elements, but it fails at run time. Any help is appreciated, as I can't find anything that explains why this is happening.
(The error, as many could guess from the above, is: "cannot reshape array of size 20 into shape (20,5)")
Code excerpt from my program that shows it (can run on its own):
import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((20,5)))
def new_vectors(current, best, worst):
    #convert current to numpy array
    current = current.to_numpy()
    #construct a new vector to check
    new = np.add(current, np.subtract((rng.random()*(np.subtract(best, np.absolute(current)))), ((rng.random()*(np.subtract(worst, np.absolute(current)))))))
    #get the new sum for the new and old vectors
    summed = current.sum()
    newsummed = new.sum()
    #return the smallest one
    return np.add(((newsummed < summed)*(new)), ((newsummed > summed)*(current))).flatten()
z = np.array(df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[11].to_numpy()), axis=1))
z.reshape(20,5) #I know reshape() returns a new array rather than modifying z in place, this is just here to show it fails regardless
CodePudding user response:
You can do the reshape manually.
Delete z.reshape(20,5). It cannot work on an array of arrays: z is a 1-D array with 20 elements, each of which is itself a length-5 array, so NumPy only sees 20 items to reshape. After applying the function, use this instead:
# Create an empty matrix of the desired size
matrix = np.zeros(shape=(20,5))
# Iterate over z and assign each array to a row of the matrix
for i, arr in enumerate(z):
    matrix[i] = arr
If you don't know the desired size of the matrix in advance, create it as matrix = np.zeros(shape=df.shape).
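To see why the reshape fails, it can help to inspect what apply actually hands back. Here is a minimal sketch; the toy DataFrame and the doubling lambda are stand-ins I made up for illustration, not the code from the question:

```python
import numpy as np
import pandas as pd

# Toy 20x5 frame standing in for the data in the question
df = pd.DataFrame(np.arange(100, dtype=float).reshape(20, 5))

# Returning a plain ndarray from apply(axis=1) yields a Series whose
# elements are the arrays themselves, so the dtype is object
s = df.apply(lambda row: row.to_numpy() * 2, axis=1)
z = np.array(s)

print(z.dtype)     # object
print(z.shape)     # (20,)
print(z[0].shape)  # (5,)

# reshape only counts the 20 outer elements, not the 100 numbers inside them
try:
    z.reshape(20, 5)
except ValueError as err:
    print(err)  # cannot reshape array of size 20 into shape (20,5)
```

The outer array holds 20 references to separate length-5 arrays, which is why the numbers have to be copied into a real 2-D array before any reshaping makes sense.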
All the code used:
import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((20,5)))
def new_vectors(current, best, worst):
    #convert current to numpy array
    current = current.to_numpy()
    #construct a new vector to check
    new = np.add(current, np.subtract((rng.random()*(np.subtract(best, np.absolute(current)))), ((rng.random()*(np.subtract(worst, np.absolute(current)))))))
    #get the new sum for the new and old vectors
    summed = current.sum()
    newsummed = new.sum()
    #return the smallest one
    return np.add(((newsummed < summed)*(new)), ((newsummed > summed)*(current))).flatten()
z = np.array(df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[11].to_numpy()), axis=1))
matrix = np.zeros(shape=df.shape)
for i, arr in enumerate(z):
    matrix[i] = arr
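If you'd rather avoid the explicit loop, two alternatives should give the same (20, 5) result. This is only a sketch: halve is a hypothetical stand-in for new_vectors (any function that returns one length-5 array per row), so swap in your own function and args.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((20, 5)))

def halve(row):
    # Hypothetical stand-in for new_vectors: returns one length-5 array per row
    return row.to_numpy() / 2

# Option 1: stack the Series of arrays into a single 2-D array
z = df.apply(halve, axis=1)
matrix = np.stack(list(z))

# Option 2: let pandas expand each returned array into columns
expanded = df.apply(halve, axis=1, result_type="expand")
matrix2 = expanded.to_numpy()

print(matrix.shape, matrix2.shape)  # (20, 5) (20, 5)
```

np.stack works on whatever apply returned, while result_type="expand" avoids creating the array of arrays in the first place.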