I want to convert these lists into a dataframe:
col1 = ['', '8465-01-446-8616', '8465-01-538-4334', '8345-01-641-8205', '8465-01-690-3107', '8465-01-526-7827', '', '', '5120-00-097-4136', '6640-00-866-1645']
col2 = [[], ['06222'], ['0N995', '12858'
], ['58367', '58367'], ['4SEN6'], ['0CCM0'], [], [], ['82523', '8BXN8'], ['1KGE8', '725Q0']]
col3 = [[], ['MODEL 703'], ['PMBS9', 'BS-9'], ['24425HBA', '37210'], ['MA66-002'], ['03485'], [], [], ['100250', '5080'], ['09-845C', '4716E20']]
col4 = [[], [''], ['AARDVARK', 'PAULSON MANUFACTURING CORPORAT
ION'], ['WILLARD MARINE, INC.', 'WILLARD MARINE, INC.'], ['CONDOR OUTDOOR PRODUCTS, INC.'], ['CASCADE DESIGNS, INC.'], [], [], ['', 'TELEFLEX LLC'], ['FISHER SCI
ENTIFIC COMPANY L.L.C.', 'THOMAS SCIENTIFIC, LLC']]
Expected dataframe:
col1 col2 col3 col4
8465-01-446-8616 06222 MODEL 703
8465-01-538-4334 0N995 PMBS9 AARDVARK
8465-01-538-4334 12858 BS-9 PAULSON MANUFACTURING CORPORATION
8345-01-641-8205 58367 24425HBA WILLARD MARINE, INC.
8345-01-641-8205 58367 37210 WILLARD MARINE, INC.
.....................................
...................................
Note: First row is empty since, all the first elements are empty.
CodePudding user response:
The constructor of DataFrame
can take a dictionary col_name -> col_values. So you could use:
df = pd.DataFrame({'col1': col1, 'col2': col2, 'col3': col3, 'col4': col4})
Once this is done, recent versions of Pandas (>= 1.3.0 while current is 1.3.5) allow multi-columns explode:
df = df.explode(['col2', 'col3', 'col4'])
giving:
col1 col2 col3 col4
0 NaN NaN NaN
1 8465-01-446-8616 06222 MODEL 703
2 8465-01-538-4334 0N995 PMBS9 AARDVARK
2 8465-01-538-4334 12858 BS-9 PAULSON MANUFACTURING CORPORATION
3 8345-01-641-8205 58367 24425HBA WILLARD MARINE, INC.
3 8345-01-641-8205 58367 37210 WILLARD MARINE, INC.
4 8465-01-690-3107 4SEN6 MA66-002 CONDOR OUTDOOR PRODUCTS, INC.
5 8465-01-526-7827 0CCM0 03485 CASCADE DESIGNS, INC.
6 NaN NaN NaN
7 NaN NaN NaN
8 5120-00-097-4136 82523 100250
8 5120-00-097-4136 8BXN8 5080 TELEFLEX LLC
9 6640-00-866-1645 1KGE8 09-845C FISHER SCIENTIFIC COMPANY L.L.C.
9 6640-00-866-1645 725Q0 4716E20 THOMAS SCIENTIFIC, LLC
CodePudding user response:
One option is to unnest the lists using sum(col1, [])
then to insert them into a dataframe like so :
import pandas as pd
df = pd.DataFrame(
{'col1': sum(col1, []),
'col2': sum(col2, []),
'col3': sum(col3, [])
})
Note: As brought up in the comments, the sum()
method removes empty slots, if those need to be kept you could use a list comprehension like so instead of the sum :
col1 = [v for s in col1 for v in (s or [''])]
you could then do :
df = pd.DataFrame(
{'col1': col1,
'col2': col2,
'col3': col3
})
Provided you have applied the list comprehension step on each list.