pandas: convert list of lists to dataframe-CodePudding

I have a list of lists containing 12 elements in 10 items. I would like to obtain a dataframe with 12 columns and 10 rows. The twelve comma separated items are treated as one column by pd.DataFrame(). The apostrophe are part of the list to indicate a string but I suspect it is interpreted by the DataFrame function as the column boundaries. They cannot be replaced. How can this be done? What is causing this behaviour? Here is the sample data:

[['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
 ['2,e,7,Elixir Normand,1\'18"2,H,7,2600,S. Schoonhoven,24 755 €,S. Schoonhoven,14'],
 ['3,e,3,Give You All of Me,1\'18"2,H,5,2600,JF. Van Dooyeweerd,17 600 €,JF. Van Dooyeweerd,10'],
 ['4,e,4,Gouritch,1\'18"3,H,5,2600,BJ. Crebas,20 700 €,BJ. Crebas,32'],
 ['5,e,1,Franky du Cap Vert,1\'18"4,H,6,2600,JH. Mieras,15 536 €,N. De Vreede,65'],
 ['6,e,10,Défi Magik,1\'18"0,H,8,2620,F. Verkaik,44 865 €,AW. Bosscha,6,3'],
 ['7,e,9,Fleuron,1\'18"2,H,6,2620,M. Brouwer,44 830 €,D. Brouwer,7,3'],
 ['8,e,8,Dream Gibus,1\'18"6,H,8,2620,R. Ebbinge,33 330 €,Mme A. Lehmann,36'],
 ['9,e,5,Beau Gaillard,1\'19"5,H,10,2600,A. Bakker,20 140 €,N. De Vreede,44'],
 ['0,DAI,6,Bikini de Larcy,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52']]

Any help welcome.

CodePudding user response：

If use only split it working well but last row is mismatched, so all values from column 4 are shifted:

df = pd.DataFrame([y.split(',') for x in L for y in x])

df.iloc[-1, 4:] = df.iloc[-1, 4:].shift()

print (df)
  0    1   2                   3       4  5   6     7                   8   \
0  1   er   2        Fado de Padd  1'18"1  H   6  2600           J. Dekker   
1  2    e   7      Elixir Normand  1'18"2  H   7  2600      S. Schoonhoven   
2  3    e   3  Give You All of Me  1'18"2  H   5  2600  JF. Van Dooyeweerd   
3  4    e   4            Gouritch  1'18"3  H   5  2600          BJ. Crebas   
4  5    e   1  Franky du Cap Vert  1'18"4  H   6  2600          JH. Mieras   
5  6    e  10          Défi Magik  1'18"0  H   8  2620          F. Verkaik   
6  7    e   9             Fleuron  1'18"2  H   6  2620          M. Brouwer   
7  8    e   8         Dream Gibus  1'18"6  H   8  2620          R. Ebbinge   
8  9    e   5       Beau Gaillard  1'19"5  H  10  2600           A. Bakker   
9  0  DAI   6     Bikini de Larcy     NaN  H  10  2600    D. Den Dubbelden   

         9                   10  11    12  
0  17 490 €         A. De Wrede   1     6  
1  24 755 €      S. Schoonhoven  14  None  
2  17 600 €  JF. Van Dooyeweerd  10  None  
3  20 700 €          BJ. Crebas  32  None  
4  15 536 €        N. De Vreede  65  None  
5  44 865 €         AW. Bosscha   6     3  
6  44 830 €          D. Brouwer   7     3  
7  33 330 €      Mme A. Lehmann  36  None  
8  20 140 €        N. De Vreede  44  None  
9  21 834 €              N. Rip  52  None

CodePudding user response：

The apostrophe means that the data is string type in the list, but can be extracted as the first element using my_list[0]. Need to process each list using list comprehension before putting into the dataframe.

There seems some typo (missing coordinates) in the last line of data, so I corrected it by adding 'null'.

import pandas as pd

data = [['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
 ['2,e,7,Elixir Normand,1\'18"2,H,7,2600,S. Schoonhoven,24 755 €,S. Schoonhoven,14'],
 ['3,e,3,Give You All of Me,1\'18"2,H,5,2600,JF. Van Dooyeweerd,17 600 €,JF. Van Dooyeweerd,10'],
 ['4,e,4,Gouritch,1\'18"3,H,5,2600,BJ. Crebas,20 700 €,BJ. Crebas,32'],
 ['5,e,1,Franky du Cap Vert,1\'18"4,H,6,2600,JH. Mieras,15 536 €,N. De Vreede,65'],
 ['6,e,10,Défi Magik,1\'18"0,H,8,2620,F. Verkaik,44 865 €,AW. Bosscha,6,3'],
 ['7,e,9,Fleuron,1\'18"2,H,6,2620,M. Brouwer,44 830 €,D. Brouwer,7,3'],
 ['8,e,8,Dream Gibus,1\'18"6,H,8,2620,R. Ebbinge,33 330 €,Mme A. Lehmann,36'],
 ['9,e,5,Beau Gaillard,1\'19"5,H,10,2600,A. Bakker,20 140 €,N. De Vreede,44'],
 ['0,DAI,6,Bikini de Larcy,null,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52']]

df = pd.DataFrame([line[0].split(',') for line in data])
print(df)

Output

   0    1   2                   3       4  5   6     7                   8   \
0  1   er   2        Fado de Padd  1'18"1  H   6  2600           J. Dekker   
1  2    e   7      Elixir Normand  1'18"2  H   7  2600      S. Schoonhoven   
2  3    e   3  Give You All of Me  1'18"2  H   5  2600  JF. Van Dooyeweerd   
3  4    e   4            Gouritch  1'18"3  H   5  2600          BJ. Crebas   
4  5    e   1  Franky du Cap Vert  1'18"4  H   6  2600          JH. Mieras   
5  6    e  10          Défi Magik  1'18"0  H   8  2620          F. Verkaik   
6  7    e   9             Fleuron  1'18"2  H   6  2620          M. Brouwer   
7  8    e   8         Dream Gibus  1'18"6  H   8  2620          R. Ebbinge   
8  9    e   5       Beau Gaillard  1'19"5  H  10  2600           A. Bakker   
9  0  DAI   6     Bikini de Larcy    null  H  10  2600    D. Den Dubbelden   

          9                  10  11    12  
0  17 490 €         A. De Wrede   1     6  
1  24 755 €      S. Schoonhoven  14  None  
2  17 600 €  JF. Van Dooyeweerd  10  None  
3  20 700 €          BJ. Crebas  32  None  
4  15 536 €        N. De Vreede  65  None  
5  44 865 €         AW. Bosscha   6     3  
6  44 830 €          D. Brouwer   7     3  
7  33 330 €      Mme A. Lehmann  36  None  
8  20 140 €        N. De Vreede  44  None  
9  21 834 €              N. Rip  52  None

Another way with the same output:

df = pd.DataFrame(data)[0].str.split(',', expand=True)