I am trying to convert an array into a df, with an index copied from another_df.
another_df:
       test1               test2               test3               test4               test5
test1  0.0                 0.8                 0.6                 0.6                 0.2857142857142857
test2  0.8                 0.0                 0.5                 1.0                 0.8571428571428571
test3  0.6                 0.5                 0.0                 1.0                 0.7142857142857143
test4  0.6                 1.0                 1.0                 0.0                 0.7142857142857143
test5  0.2857142857142857  0.8571428571428571  0.7142857142857143  0.7142857142857143  0.0
print (array)
[[ 0.23052147 0.03058967]
[-0.54449458 -0.08481665]
[-0.21274323 -0.39635658]
[ 0.13880332 0.58125618]
[ 0.38791301 -0.13067262]]
print (type(array))
<class 'numpy.ndarray'>
df = pd.DataFrame(array,
                  index=another_df.index,
                  columns=['x', 'y'])
It does this fine - df is:
x y
test1 0.2305214680511617 0.03058967262464556
test2 -0.544494575705709 -0.08481665342258861
test3 -0.2127432294443813 -0.396356582859552
test4 0.13880332309442767 0.5812561804072454
test5 0.38791301400450073 -0.13067261674975036
However, I also get ValueError: Shape of passed values is (5, 1), indices imply (5, 2). This is very confusing, as
(i) my function completes fine despite the error, which according to the stack trace happens before any return statements.
(ii) my array looks 2d, so I'm not sure why it's getting read as 1d (which is what looks like is happening).
Any ideas for the above, and can I ignore it as it seems to be returning OK?
edit - variable typo
Answer:
(i) my function completes fine despite the error, which according to the stack trace happens before any return statements.
That is not possible: if the statement had raised an exception, nothing would have been assigned to df. The df you see was probably defined earlier in your code. Try del df before df = pd.DataFrame(array, ...), so that a failed call cannot leave a stale value behind.
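To illustrate the point, here is a minimal sketch in which the constructor is made to fail on purpose. The stale string and the (5, 1) array are assumptions chosen for the demonstration, not values from the question; the sketch only shows that a failed assignment leaves the earlier binding of df in place.

```python
import numpy as np
import pandas as pd

# Hypothetical earlier assignment, standing in for a df created previously
df = "stale value from earlier"

try:
    # Deliberately mismatched: the data has 1 column but 2 column labels are given
    df = pd.DataFrame(np.zeros((5, 1)), columns=["x", "y"])
except ValueError as exc:
    error_message = str(exc)

# The constructor raised before the assignment happened, so df is unchanged
still_stale = df == "stale value from earlier"
print(error_message)
print(still_stale)
```

If the name df had been deleted beforehand with del df, using it after the failed call would raise a NameError instead of silently showing old data.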
(ii) my array looks 2d, so I'm not sure why it's getting read as 1d (which is what looks like is happening).
Your data really is 2-D, but that is not the problem. As suggested by @MycchakaKleinbort, you should check the shape of another_df's index with another_df.index.shape.
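One way to run that check is to compare the shape of the data with the shape implied by the labels just before building the DataFrame. The helper shapes_match below is a hypothetical name introduced for this sketch, and the index is a stand-in for another_df.index; the error message in the question reports these same two shapes, passed values first and implied shape second.

```python
import numpy as np
import pandas as pd


def shapes_match(values, index, columns):
    """Compare the data's shape with the shape implied by the labels."""
    passed = np.asarray(values).shape
    implied = (len(index), len(columns))
    return passed == implied, passed, implied


# Stand-in for another_df.index from the question
index = pd.Index(["test1", "test2", "test3", "test4", "test5"])

# A (5, 2) array agrees with 5 index labels and 2 column labels
ok, passed, implied = shapes_match(np.zeros((5, 2)), index, ["x", "y"])

# A (5, 1) array does not, which is the situation the ValueError describes
bad, bad_passed, bad_implied = shapes_match(np.zeros((5, 1)), index, ["x", "y"])

print(ok, passed, implied)
print(bad, bad_passed, bad_implied)
```

Calling it right before each pd.DataFrame(...) call in the function should show which call receives data that does not match its labels.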
Otherwise, your code should work:
import numpy as np
import pandas as pd

array = np.array([[ 0.23052147,  0.03058967],
                  [-0.54449458, -0.08481665],
                  [-0.21274323, -0.39635658],
                  [ 0.13880332,  0.58125618],
                  [ 0.38791301, -0.13067262]])
# Five index labels and two column labels match the (5, 2) array
df = pd.DataFrame(array,
                  index=['test1', 'test2', 'test3', 'test4', 'test5'],
                  columns=['x', 'y'])
print(df)
# Output:
x y
test1 0.230521 0.030590
test2 -0.544495 -0.084817
test3 -0.212743 -0.396357
test4 0.138803 0.581256
test5 0.387913 -0.130673
The DataFrame shape:
>>> df.shape, df.index.shape, df.columns.shape
((5, 2), (5,), (2,))