Format Pandas Dataframe from 2D Array

Time:10-22

I have a 2D array that I want to convert to a pandas DataFrame. I will then display this DataFrame in an Excel spreadsheet.

I created my DataFrame like this: df = pd.DataFrame(twoDArray). The 2D array I am converting has length 8, and I named all the columns with the following code: df.columns = ["column1", "column2", "column3", "column4", "column5", "column6", "column7", "column8"]

The nested arrays are very long and not always the same length. I want the nested array at each index to be an entire column of the DataFrame, so one row would be lst[0][0], lst[1][0], lst[2][0].

example:

Pandas seems to do this by default

    lst = [["hello", 1], ["World", 3], ["Goodbye", 5]]
    df = pd.DataFrame(lst)

output:

             column1   column2
    1        hello           1
    2        World           3
    3        Goodbye         5

but I want:

    lst = [["hello", 1, 2], ["World", 3], ["Goodbye", 5, 6, 7, "test"]]
    df = pd.DataFrame(lst)

output:

           column1  column2 column3
    1       hello    World  Goodbye
    2           1        3        5
    3           2        -        6
    4           -        -        7
    5           -        -     test

Is this possible to do?

Thanks for the help.

CodePudding user response:

The following code solves your question:

    import pandas as pd

    lst = [["hello", 1, 2], ["World", 3], ["Goodbye", 5, 6, 7, "test"]]
    df = pd.DataFrame(lst).T

.T transposes the DataFrame, which is what you were trying to do: each inner list becomes a column instead of a row. Because the lists have different lengths, the shorter columns are padded with missing values (NaN or None) rather than the "-" shown in your example.
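If you also want the column names and the "-" placeholder from the question, here is a minimal sketch. It assumes the placeholder should literally be the string "-", and the second variant swaps in itertools.zip_longest from the standard library instead of .T, so the rows are padded before pandas sees them and the original values are left untouched. The names df_t, df_padded and column_names are only illustrative.

```python
from itertools import zip_longest

import pandas as pd

lst = [["hello", 1, 2], ["World", 3], ["Goodbye", 5, 6, 7, "test"]]

# One column name per inner list: column1, column2, column3
column_names = [f"column{i + 1}" for i in range(len(lst))]

# Variant 1: transpose, then name the columns.
# Shorter lists are padded with missing values (NaN/None).
df_t = pd.DataFrame(lst).T
df_t.columns = column_names

# Variant 2: pad the lists up front with "-" and build the rows directly.
# zip_longest(*lst) yields (lst[0][i], lst[1][i], lst[2][i]) for each i.
rows = list(zip_longest(*lst, fillvalue="-"))
df_padded = pd.DataFrame(rows, columns=column_names)

print(df_padded)
```

Either DataFrame could then be written out with to_excel, which needs an Excel engine such as openpyxl to be installed.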
