I am trying to convert an dictionary to dataframe which contains nested list.the dataframe output i got is follows,
import pandas as pd
data = {'a': 'test', 'b': 1657, 'c': 'asset', 'd': [['2089', '0.0'], ['2088', '0.0']], 'e': [['2088', '0.0'], ['2088', '0.0'], ['2088', '0.00']]}
df = pd.DataFrame({ key:pd.Series(value) for key, value in data.items() })
My output is
a b c d e
0 test 1657.0 asset [2089, 0.0] [2088, 0.0]
1 NaN NaN NaN [2088, 0.0] [2088, 0.0]
2 NaN NaN NaN NaN [2088, 0.00]
the required output is
a b c d1 d2 e1 e2
0 test 1657.0 asset 2089 0.0 2088 0.0
1 NaN NaN NaN 2088 0.0 2088 0.0
2 NaN NaN NaN NaN NaN 2088 0.0
Thanks in advance for your efforts.
CodePudding user response:
try this after you get rid of NaNs in d column:
df = df.dropna(subset='d')
df[['d1', 'd2']] = pd.DataFrame(df.d.tolist(), columns=['d1', 'd2'])
df[['e1', 'e2']] = pd.DataFrame(df.e.tolist(), columns=['e1', 'e2'])
df= df.drop(columns=['d', 'e'])
df
>a b c d1 d2 e1 e2
0 test 1657.0 asset 2089 0.0 2088 0.0
1 NaN NaN NaN 2088 0.0 2088 0.0
CodePudding user response:
I'm never a fan of hardcoding certain column names and always in favour to find a programmatic way to cover these issues.
import pandas as pd
data = {'a': 'test',
'b': 1657,
'c': 'asset',
'd': [['2089', '0.0'], ['2088', '0.0']],
'e': [['2088', '0.0'], ['2088', '0.0'], ['2088', '0.00']],
'f': [['2088', '0.0', "x", "foo"], ['2088', '0.0', 'bar', "i"], ['2088', '0.00', "z", "0.2"]],
"x": ["test1", "test2"]}
s = []
for key, value in data.items():
if isinstance(value, list):
if isinstance(value[0], list):
cols = [key str(i)for i in range(1, len(value[0]) 1, 1)]
s.append(pd.DataFrame(data=value, columns=cols))
else:
s.append(pd.Series(data=value, name=key))
else:
s.append(pd.Series(data=value, name=key))
df = pd.concat(s, axis=1)
So it loops through the dictionary and takes a look whether the dictionary value is a list.
If it's a list, it'll check whether it is a nested list. If it is a nested list, it'll take a look at the length of the first element and construct the column names (eg. e1, e2) for the dataframe and stores the dataframe of that key value pair in a list. If it's not a nested list, it'll generate a series of the key value pair and store it in that same list.
If it's not a list, it'll generate a series of the key value pair and store it in that same list. Last step is just to concat the list.
I've taken the liberty to add a new key value pair to illustrate that everything will work when you extend the dictionary.
a b c d1 d2 e1 e2 f1 f2 f3 f4
0 test 1657.0 asset 2089 0.0 2088 0.0 2088 0.0 x foo
1 NaN NaN NaN 2088 0.0 2088 0.0 2088 0.0 bar i
2 NaN NaN NaN NaN NaN 2088 0.00 2088 0.00 z 0.2
CodePudding user response:
This code gived me the wanted dataframe:
import pandas as pd
import numpy as np
#import pandas as pd
data = {'a': 'test', 'b': 1657, 'c': 'asset', 'd':
[['2089', '0.0'], ['2088', '0.0']], 'e': [['2088', '0.0'],
['2088', '0.0'], ['2088', '0.00']]}
df = pd.DataFrame({ key:pd.Series(value) for key, value in
data.items() })
a = ['test']
b = [1657]
c = ['asset']
d = [['2089', '0.0'], ['2088', '0.0']]
e = [['2088', '0.0'], ['2088', '0.0'], ['2088', '0.00']]
d1 = [d[i][0] for i in range(len(d))]
d2 = [d[i][1] for i in range(len(d))]
e1 = [e[i][0] for i in range(len(e))]
e2 = [e[i][1] for i in range(len(e))]
df = pd.DataFrame({'a':pd.Series(a),'b':pd.Series(b),'c':pd.Series(c),'d1':pd.Series(d1),'d2':pd.Series(d2),'e1':pd.Series(e1),'e2':pd.Series(e2)})