I need to convert my list into a 3-column pandas DataFrame (time, id and ret_exc_lead1m). However, it looks like this:
[time id
2010-01 comp_001661_01W -0.041371
comp_002410_04W -0.053836
comp_004367_02W 0.024752
comp_004439_08W 0.013136
comp_011749_09W -0.013695
comp_011925_01W 0.043677
2010-02 comp_001661_01W -0.041371
comp_012384_14W -0.014593
comp_013498_01W 0.060748
comp_015321_02W -0.053604
comp_015334_02W -0.155894
2010-03 comp_001661_01W -0.041371
comp_015532_10W 0.003835
comp_015575_01W -0.045820
comp_015576_01W 0.032070
comp_015598_03W 0.028164
comp_015617_02W -0.053060
comp_015634_05W 0.102842
comp_018636_04W -0.029271
2010-04 comp_001661_01W -0.041371
comp_019349_01W -0.048753
comp_019565_13W -0.007516
comp_025648_05W -0.015128
comp_029097_01W 0.085202
comp_029804_04W -0.011097
2010-05 comp_001661_01W -0.041371
comp_030807_03W -0.139678
comp_031137_03W -0.042764
comp_031142_05W 0.055970
comp_062806_93W -0.104863
comp_063914_02W 0.044195
comp_063987_91W -0.010617
2010-06 comp_001661_01W -0.041371
comp_064835_03W 0.020164
comp_064835_90W 0.047719
comp_065248_07W -0.045530
Name: ret_exc_lead1m, dtype: float32]
I'm wondering if there is a way to do it.
CodePudding user response:
It depends on the dimension of your input list. Basically, if your list has one dimension, then the conversion will look like:
import pandas as pd
lst = ['a', 'few', 'important', 'words']
df = pd.DataFrame(lst)
print(df)
Output:
0
0 a
1 few
2 important
3 words
If the list is well structured and has two or more dimensions, you can specify the column names directly:
import pandas as pd
lst = [['dog', 'black', 1], ['cat', 'grey', 15], ['monkey', 'brown', 2]]
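df = pd.DataFrame(lst, columns=['Animal', 'Color', 'Quantity'])
# Cast only the numeric column; dtype=float on the constructor would also apply to the string columns
df['Quantity'] = df['Quantity'].astype(float)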
print(df)
Result:
Animal Color Quantity
0 dog black 1.0
1 cat grey 15.0
2 monkey brown 2.0
If you have several 1-D lists, you can combine them with the zip() function:
import pandas as pd
l_1 = ['first', 'second', 'third', 'fourth']
l_2 = [1, 2, 3, 4]
df = pd.DataFrame(list(zip(l_1, l_2)), columns =['First Column', 'Second Column'])
print(df)
Output:
First Column Second Column
0 first 1
1 second 2
2 third 3
3 fourth 4
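As an alternative to zip(), the same two lists can be passed as a dictionary that maps column names to lists. This is only a small sketch reusing the l_1 and l_2 names from above, and it assumes both lists have the same length:

```python
import pandas as pd

l_1 = ['first', 'second', 'third', 'fourth']
l_2 = [1, 2, 3, 4]

# Each dictionary key becomes a column name, each list becomes that column's values.
df = pd.DataFrame({'First Column': l_1, 'Second Column': l_2})
print(df)
```

Which form to use is mostly a matter of taste; the dictionary keeps each column name next to its data.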
CodePudding user response:
I shall assume that your list in fact contains a single element, which is a pandas Series with a two-level index (time, id). You just have to:
- extract the unique element from your list
- reset the index of the Series
The code is simply (assuming your list is l):
l[0].reset_index()
It should give:
time id ret_exc_lead1m
0 2010-01 comp_001661_01W -0.041371
1 2010-01 comp_002410_04W -0.053836
2 2010-01 comp_004367_02W 0.024752
3 2010-01 comp_004439_08W 0.013136
4 2010-01 comp_011749_09W -0.013695
5 2010-01 comp_011925_01W 0.043677
6 2010-02 comp_001661_01W -0.041371
7 2010-02 comp_012384_14W -0.014593
8 2010-02 comp_013498_01W 0.060748
9 2010-02 comp_015321_02W -0.053604
10 2010-02 comp_015334_02W -0.155894
11 2010-03 comp_001661_01W -0.041371
12 2010-03 comp_015532_10W 0.003835
13 2010-03 comp_015575_01W -0.045820
14 2010-03 comp_015576_01W 0.032070
15 2010-03 comp_015598_03W 0.028164
16 2010-03 comp_015617_02W -0.053060
17 2010-03 comp_015634_05W 0.102842
18 2010-03 comp_018636_04W -0.029271
19 2010-04 comp_001661_01W -0.041371
20 2010-04 comp_019349_01W -0.048753
21 2010-04 comp_019565_13W -0.007516
22 2010-04 comp_025648_05W -0.015128
23 2010-04 comp_029097_01W 0.085202
24 2010-04 comp_029804_04W -0.011097
25 2010-05 comp_001661_01W -0.041371
26 2010-05 comp_030807_03W -0.139678
27 2010-05 comp_031137_03W -0.042764
28 2010-05 comp_031142_05W 0.055970
29 2010-05 comp_062806_93W -0.104863
30 2010-05 comp_063914_02W 0.044195
31 2010-05 comp_063987_91W -0.010617
32 2010-06 comp_001661_01W -0.041371
33 2010-06 comp_064835_03W 0.020164
34 2010-06 comp_064835_90W 0.047719
35 2010-06 comp_065248_07W -0.045530
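To try this without the original data, here is a minimal sketch that rebuilds a small version of the list. The three sample rows and the way the Series is constructed are my assumptions; only the names time, id and ret_exc_lead1m and the float32 dtype come from the question:

```python
import pandas as pd

# Hypothetical sample mirroring the question: a Series with a two-level
# (time, id) index, wrapped in a one-element list.
index = pd.MultiIndex.from_tuples(
    [
        ("2010-01", "comp_001661_01W"),
        ("2010-01", "comp_002410_04W"),
        ("2010-02", "comp_001661_01W"),
    ],
    names=["time", "id"],
)
series = pd.Series(
    [-0.041371, -0.053836, -0.041371],
    index=index,
    name="ret_exc_lead1m",
    dtype="float32",
)
l = [series]

# Take the only element and turn both index levels into ordinary columns.
df = l[0].reset_index()
print(df)
```

The index level names become the first two column names and the Series name becomes the third, which is why no columns argument is needed.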
CodePudding user response:
It works for me if I wrap the text in a StringIO and pass that to pandas.DataFrame to build the table. Note that each line of text ends up as one string in a single column, so you will still need to "filter" or "repair" the strings if you want to extract the values separately.
from io import StringIO
import pandas as pd
lst = StringIO("""time id ret_exc_lead1m
2010-01 comp_001661_01W -0.041371
comp_002410_04W -0.053836
comp_004367_02W 0.024752
comp_004439_08W 0.013136
2010-06 comp_001661_01W -0.041371
comp_064835_03W 0.020164
comp_064835_90W 0.047719
comp_065248_07W -0.045530""")
lst.seek(0)
df = pd.DataFrame(lst)
>>> df.index
RangeIndex(start=0, stop=9, step=1)
>>> df.columns
RangeIndex(start=0, stop=1, step=1)
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 9 non-null object
dtypes: object(1)
memory usage: 200.0 bytes
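One possible way to do the "repair" step, if all you have is the printed text, is to split each line on whitespace and carry the last seen time forward for lines that only contain an id and a value. This is a rough sketch under that assumption (fields never contain spaces); the names text, rows and current_time are mine:

```python
import pandas as pd

text = """time id ret_exc_lead1m
2010-01 comp_001661_01W -0.041371
comp_002410_04W -0.053836
2010-06 comp_001661_01W -0.041371
comp_064835_03W 0.020164"""

header, *lines = text.splitlines()

rows = []
current_time = None
for line in lines:
    parts = line.split()
    if len(parts) == 3:
        # Full row: remember the time for the shorter rows that follow.
        current_time, comp_id, value = parts
    elif len(parts) == 2:
        # Short row: the time is the one from the last full row.
        comp_id, value = parts
    else:
        continue  # skip blank or malformed lines
    rows.append((current_time, comp_id, float(value)))

df = pd.DataFrame(rows, columns=header.split())
print(df)
```

If the original Series object is still available, resetting its index is simpler and avoids parsing text altogether.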