Home > Software design >  How to convert list of lists into a dataframe?(pandas)
How to convert list of lists into a dataframe?(pandas)

Time:12-16

I want to convert these lists into a dataframe:

col1 = ['', '8465-01-446-8616', '8465-01-538-4334', '8345-01-641-8205', '8465-01-690-3107', '8465-01-526-7827', '', '', '5120-00-097-4136', '6640-00-866-1645']

col2 = [[], ['06222'], ['0N995', '12858'
], ['58367', '58367'], ['4SEN6'], ['0CCM0'], [], [], ['82523', '8BXN8'], ['1KGE8', '725Q0']]

col3 = [[], ['MODEL 703'], ['PMBS9', 'BS-9'], ['24425HBA', '37210'], ['MA66-002'], ['03485'], [], [], ['100250', '5080'], ['09-845C', '4716E20']]


col4 = [[], [''], ['AARDVARK', 'PAULSON MANUFACTURING CORPORAT
ION'], ['WILLARD MARINE, INC.', 'WILLARD MARINE, INC.'], ['CONDOR OUTDOOR PRODUCTS, INC.'], ['CASCADE DESIGNS, INC.'], [], [], ['', 'TELEFLEX LLC'], ['FISHER SCI
ENTIFIC COMPANY L.L.C.', 'THOMAS SCIENTIFIC, LLC']]

Expected dataframe:

col1             col2     col3      col4

8465-01-446-8616 06222  MODEL 703   
8465-01-538-4334 0N995  PMBS9      AARDVARK
8465-01-538-4334 12858  BS-9       PAULSON MANUFACTURING CORPORATION
8345-01-641-8205 58367  24425HBA   WILLARD MARINE, INC.
8345-01-641-8205 58367  37210      WILLARD MARINE, INC.
.....................................
...................................

Note: First row is empty since, all the first elements are empty.

CodePudding user response:

The constructor of DataFrame can take a dictionary col_name -> col_values. So you could use:

df = pd.DataFrame({'col1': col1, 'col2': col2, 'col3': col3, 'col4': col4})

Once this is done, recent versions of Pandas (>= 1.3.0 while current is 1.3.5) allow multi-columns explode:

df = df.explode(['col2', 'col3', 'col4'])

giving:

               col1   col2       col3                               col4
0                      NaN        NaN                                NaN
1  8465-01-446-8616  06222  MODEL 703                                   
2  8465-01-538-4334  0N995      PMBS9                           AARDVARK
2  8465-01-538-4334  12858       BS-9  PAULSON MANUFACTURING CORPORATION
3  8345-01-641-8205  58367   24425HBA               WILLARD MARINE, INC.
3  8345-01-641-8205  58367      37210               WILLARD MARINE, INC.
4  8465-01-690-3107  4SEN6   MA66-002      CONDOR OUTDOOR PRODUCTS, INC.
5  8465-01-526-7827  0CCM0      03485              CASCADE DESIGNS, INC.
6                      NaN        NaN                                NaN
7                      NaN        NaN                                NaN
8  5120-00-097-4136  82523     100250                                   
8  5120-00-097-4136  8BXN8       5080                       TELEFLEX LLC
9  6640-00-866-1645  1KGE8    09-845C   FISHER SCIENTIFIC COMPANY L.L.C.
9  6640-00-866-1645  725Q0    4716E20             THOMAS SCIENTIFIC, LLC

CodePudding user response:

One option is to unnest the lists using sum(col1, []) then to insert them into a dataframe like so :

import pandas as pd

df = pd.DataFrame(
        {'col1': sum(col1, []),
         'col2': sum(col2, []),
         'col3': sum(col3, [])
        })

Note: As brought up in the comments, the sum() method removes empty slots, if those need to be kept you could use a list comprehension like so instead of the sum :

col1 = [v for s in col1 for v in (s or [''])]

you could then do :

df = pd.DataFrame(
        {'col1': col1,
         'col2': col2,
         'col3': col3
        })

Provided you have applied the list comprehension step on each list.

  • Related