Convert a list into a DataFrame

I need to convert my list into a three-column pandas DataFrame (time, id and ret_exc_lead1m).

However, the list currently looks like this:

 [time        id            
 2010-01     comp_001661_01W   -0.041371
             comp_002410_04W   -0.053836
             comp_004367_02W    0.024752
             comp_004439_08W    0.013136
             comp_011749_09W   -0.013695
             comp_011925_01W    0.043677
 2010-02     comp_001661_01W   -0.041371
             comp_012384_14W   -0.014593
             comp_013498_01W    0.060748
             comp_015321_02W   -0.053604
             comp_015334_02W   -0.155894
 2010-03     comp_001661_01W   -0.041371
             comp_015532_10W    0.003835
             comp_015575_01W   -0.045820
             comp_015576_01W    0.032070
             comp_015598_03W    0.028164
             comp_015617_02W   -0.053060
             comp_015634_05W    0.102842
             comp_018636_04W   -0.029271
 2010-04     comp_001661_01W   -0.041371
             comp_019349_01W   -0.048753
             comp_019565_13W   -0.007516
             comp_025648_05W   -0.015128
             comp_029097_01W    0.085202
             comp_029804_04W   -0.011097
 2010-05     comp_001661_01W   -0.041371
             comp_030807_03W   -0.139678
             comp_031137_03W   -0.042764
             comp_031142_05W    0.055970
             comp_062806_93W   -0.104863
             comp_063914_02W    0.044195
             comp_063987_91W   -0.010617
 2010-06     comp_001661_01W   -0.041371
             comp_064835_03W    0.020164
             comp_064835_90W    0.047719
             comp_065248_07W   -0.045530
  Name: ret_exc_lead1m, dtype: float32]

I'm wondering if there is a way to do it.

CodePudding user response：

It depends on how many dimensions your input list has. If the list is one-dimensional, the conversion looks like this:

import pandas as pd
lst = ['a', 'few', 'important', 'words']
df = pd.DataFrame(lst)  # each list item becomes one row of a single column
print(df)

Output:

           0
0          a
1        few
2  important
3      words

If the list is well structured and has two or more dimensions (for example, a list of rows), you can name the columns directly:

import pandas as pd 
lst = [['dog', 'black', 1], ['cat', 'grey', 15], ['monkey', 'brown', 2]] 
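df = pd.DataFrame(lst, columns=['Animal', 'Color', 'Quantity'])
# Cast only the numeric column; dtype=float on the constructor cannot convert the string columns
df['Quantity'] = df['Quantity'].astype(float)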
print(df)

Result:

   Animal  Color  Quantity
0     dog  black       1.0
1     cat   grey      15.0
2  monkey  brown       2.0

If you have several one-dimensional lists of the same length, you can combine them with the zip() function:

import pandas as pd 
l_1 = ['first', 'second', 'third', 'fourth']
l_2 = [1, 2, 3, 4] 
df = pd.DataFrame(list(zip(l_1, l_2)), columns =['First Column', 'Second Column']) 
print(df)

Output:

  First Column  Second Column
0        first              1
1       second              2
2        third              3
3       fourth              4

CodePudding user response：

I shall assume that your list in fact contains a single element, which is a pandas Series. You just have to:

extract the single element from your list
reset the index of the Series

The code is simply (assuming your list is l):

l[0].reset_index()

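reset_index() moves the index levels (time and id) into ordinary columns and returns a DataFrame, so it should give: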

       time               id  ret_exc_lead1m
0   2010-01  comp_001661_01W       -0.041371
1   2010-01  comp_002410_04W       -0.053836
2   2010-01  comp_004367_02W        0.024752
3   2010-01  comp_004439_08W        0.013136
4   2010-01  comp_011749_09W       -0.013695
5   2010-01  comp_011925_01W        0.043677
6   2010-02  comp_001661_01W       -0.041371
7   2010-02  comp_012384_14W       -0.014593
8   2010-02  comp_013498_01W        0.060748
9   2010-02  comp_015321_02W       -0.053604
10  2010-02  comp_015334_02W       -0.155894
11  2010-03  comp_001661_01W       -0.041371
12  2010-03  comp_015532_10W        0.003835
13  2010-03  comp_015575_01W       -0.045820
14  2010-03  comp_015576_01W        0.032070
15  2010-03  comp_015598_03W        0.028164
16  2010-03  comp_015617_02W       -0.053060
17  2010-03  comp_015634_05W        0.102842
18  2010-03  comp_018636_04W       -0.029271
19  2010-04  comp_001661_01W       -0.041371
20  2010-04  comp_019349_01W       -0.048753
21  2010-04  comp_019565_13W       -0.007516
22  2010-04  comp_025648_05W       -0.015128
23  2010-04  comp_029097_01W        0.085202
24  2010-04  comp_029804_04W       -0.011097
25  2010-05  comp_001661_01W       -0.041371
26  2010-05  comp_030807_03W       -0.139678
27  2010-05  comp_031137_03W       -0.042764
28  2010-05  comp_031142_05W        0.055970
29  2010-05  comp_062806_93W       -0.104863
30  2010-05  comp_063914_02W        0.044195
31  2010-05  comp_063987_91W       -0.010617
32  2010-06  comp_001661_01W       -0.041371
33  2010-06  comp_064835_03W        0.020164
34  2010-06  comp_064835_90W        0.047719
35  2010-06  comp_065248_07W       -0.045530
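
For anyone who wants to try the two steps above without the original data, here is a minimal sketch. It rebuilds a small Series by hand from a few rows of the question's output; the names index, s and l are only illustrative, and the real list may differ in length or contents.

```python
import pandas as pd

# Rebuild a small Series with a (time, id) MultiIndex, mimicking the question's data.
index = pd.MultiIndex.from_tuples(
    [
        ("2010-01", "comp_001661_01W"),
        ("2010-01", "comp_002410_04W"),
        ("2010-02", "comp_001661_01W"),
        ("2010-02", "comp_012384_14W"),
    ],
    names=["time", "id"],
)
s = pd.Series(
    [-0.041371, -0.053836, -0.041371, -0.014593],
    index=index,
    name="ret_exc_lead1m",
    dtype="float32",
)

l = [s]  # assumed shape of the original list: a single Series wrapped in a list

# Step 1: take the single element. Step 2: turn the index levels into columns.
df = l[0].reset_index()
print(df.columns.tolist())
```

The Series name becomes the name of the value column, which is why the result has exactly the three columns asked for. If the list held several Series instead of one, pd.concat(l).reset_index() would probably be the analogous call, but that goes beyond what the question shows.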

CodePudding user response：

One approach that works for me is to wrap the text in a StringIO object and pass it to the pandas.DataFrame constructor. Note that this yields a single column holding one raw text line per row, so you will still need to "filter" or "repair" the strings if you want to extract the fields separately.

from io import StringIO
import pandas as pd

lst = StringIO("""time        id            ret_exc_lead1m
                  2010-01     comp_001661_01W    -0.041371
                              comp_002410_04W    -0.053836
                              comp_004367_02W     0.024752
                              comp_004439_08W     0.013136
                  2010-06     comp_001661_01W    -0.041371
                              comp_064835_03W     0.020164
                              comp_064835_90W     0.047719
                              comp_065248_07W    -0.045530""")

lst.seek(0)                             
df = pd.DataFrame(lst) 

>>> df.index
RangeIndex(start=0, stop=9, step=1)

>>> df.columns
RangeIndex(start=0, stop=1, step=1)

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       9 non-null     object
dtypes: object(1)
memory usage: 200.0  bytes
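
As a rough illustration of the "repair" step mentioned above, the sketch below splits each line on whitespace and carries the last seen time forward when a line omits it. It assumes every data line has either three fields (time, id, value) or two (id, value); the names raw, rows and current_time are illustrative and not part of the original answer.

```python
from io import StringIO
import pandas as pd

# A shortened copy of the text from the answer above.
raw = StringIO("""time        id            ret_exc_lead1m
2010-01     comp_001661_01W    -0.041371
            comp_002410_04W    -0.053836
2010-06     comp_001661_01W    -0.041371
            comp_064835_03W     0.020164""")

header = raw.readline().split()  # ['time', 'id', 'ret_exc_lead1m']

rows = []
current_time = None
for line in raw:
    parts = line.split()
    if not parts:
        continue  # skip blank lines
    if len(parts) == 3:
        # The line starts a new time group: remember it and drop it from parts.
        current_time = parts[0]
        parts = parts[1:]
    # Assumption: the remaining two fields are always id and value.
    rows.append((current_time, parts[0], float(parts[1])))

df = pd.DataFrame(rows, columns=header)
print(df)
```

This only makes sense when the data really is text. If the list holds an actual pandas Series, as the printed dtype line in the question suggests, resetting its index is simpler.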