numpy.ndarray to dataframe conversion - dimension issues

Time:11-11

I am trying to convert an array into a df, with an index copied from another_df.

another_df:
        test1               test2               test3               test4               test5
test1   0.0                 0.8                 0.6                 0.6                 0.2857142857142857
test2   0.8                 0.0                 0.5                 1.0                 0.8571428571428571
test3   0.6                 0.5                 0.0                 1.0                 0.7142857142857143
test4   0.6                 1.0                 1.0                 0.0                 0.7142857142857143
test5   0.2857142857142857  0.8571428571428571  0.7142857142857143  0.7142857142857143  0.0

print (array)
[[ 0.23052147  0.03058967]
 [-0.54449458 -0.08481665]
 [-0.21274323 -0.39635658]
 [ 0.13880332  0.58125618]
 [ 0.38791301 -0.13067262]]

print (type(array))
<class 'numpy.ndarray'>

df = pd.DataFrame(array, 
                  index = another_df.index, 
                  columns = ['x','y'])

It does this fine - df is:

        x                   y
test1   0.2305214680511617  0.03058967262464556
test2   -0.544494575705709  -0.08481665342258861
test3   -0.2127432294443813 -0.396356582859552
test4   0.13880332309442767 0.5812561804072454
test5   0.38791301400450073 -0.13067261674975036

However, I also get ValueError: Shape of passed values is (5, 1), indices imply (5, 2). This is very confusing, as

(i) my function completes fine despite the error, which according to the stack trace happens before any return statements.

(ii) my array looks 2d, so I'm not sure why it's getting read as 1d (which is what looks like is happening).

Any ideas for the above, and can I ignore it as it seems to be returning OK?

edit - variable typo

CodePudding user response:

(i) my function completes fine despite the error, which according to the stack trace happens before any return statements.

That is not possible if the statement raised an exception before the DataFrame was assigned to df. The df variable was probably defined earlier in your code, so what you see is the old value. Try del df before df = pd.DataFrame(array, ...).
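A minimal sketch of how a stale df can survive a failed assignment. The names labels and bad_array are made up for illustration, and the (5, 1) input is an assumption chosen to reproduce the same kind of ValueError, not your actual data:

```python
import numpy as np
import pandas as pd

labels = ["test1", "test2", "test3", "test4", "test5"]

# An earlier, successful assignment (e.g. from a previous notebook cell or call).
df = pd.DataFrame(np.zeros((5, 2)), index=labels, columns=["x", "y"])

# Hypothetical bad input: one column of values, but two column labels.
bad_array = np.zeros((5, 1))

error_message = None
try:
    # The constructor raises before the name df is rebound.
    df = pd.DataFrame(bad_array, index=labels, columns=["x", "y"])
except ValueError as exc:
    error_message = str(exc)

# df still refers to the earlier DataFrame, so it "looks fine".
print(error_message)
print(df.shape)
```

If you run del df first, the failed call leaves no df behind and any later use raises NameError, which makes the real failure obvious.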

(ii) my array looks 2d, so I'm not sure why it's getting read as 1d (which is what looks like is happening).

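Your data really is 2-D, but that is not the problem. As suggested by @MycchakaKleinbort, you should check the shape of the another_df index with another_df.index.shape. It is also worth printing array.shape immediately before the call, since the error reports the passed values as (5, 1).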

Otherwise, your code should work:

import numpy as np
import pandas as pd

array = np.array([[ 0.23052147,  0.03058967],
                  [-0.54449458, -0.08481665],
                  [-0.21274323, -0.39635658],
                  [ 0.13880332,  0.58125618],
                  [ 0.38791301, -0.13067262]])

df = pd.DataFrame(array, 
                  index=['test1', 'test2', 'test3', 'test4', 'test5'], 
                  columns=['x', 'y'])
print(df)

# Output:
              x         y
test1  0.230521  0.030590
test2 -0.544495 -0.084817
test3 -0.212743 -0.396357
test4  0.138803  0.581256
test5  0.387913 -0.130673

The DataFrame shape:

>>> df.shape, df.index.shape, df.columns.shape
((5, 2), (5,), (2,))
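To run the same check on your own data before building the DataFrame, something like the following may help. It is only a sketch: shapes_match is a hypothetical helper, and the zero-filled another_df and array are stand-ins for your real objects.

```python
import numpy as np
import pandas as pd


def shapes_match(values, index, columns):
    """Return True if a 2-D `values` array fits the given index and columns."""
    values = np.asarray(values)
    return values.ndim == 2 and values.shape == (len(index), len(columns))


names = ["test1", "test2", "test3", "test4", "test5"]

# Stand-in for another_df: only its index matters here.
another_df = pd.DataFrame(np.zeros((5, 5)), index=names, columns=names)
array = np.zeros((5, 2))

# Print both shapes right before the constructor call.
print(array.shape, another_df.index.shape)

ok = shapes_match(array, another_df.index, ["x", "y"])          # (5, 2) values
bad = shapes_match(array[:, :1], another_df.index, ["x", "y"])  # (5, 1) values
print(ok, bad)
```

Calling this in the function where the error is raised should show whether the array really has two columns at that point.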