Home > Software engineering >  Convert nested list in dictionary to dataframe
Convert nested list in dictionary to dataframe

Time:07-15

I am trying to convert an dictionary to dataframe which contains nested list.the dataframe output i got is follows,

import pandas as pd
data = {'a': 'test', 'b': 1657, 'c': 'asset', 'd': [['2089', '0.0'], ['2088', '0.0']], 'e': [['2088', '0.0'], ['2088', '0.0'], ['2088', '0.00']]}
df = pd.DataFrame({ key:pd.Series(value) for key, value in data.items() })

My output is

      a       b      c            d             e
0  test  1657.0  asset  [2089, 0.0]   [2088, 0.0]
1   NaN     NaN    NaN  [2088, 0.0]   [2088, 0.0]
2   NaN     NaN    NaN          NaN  [2088, 0.00]

the required output is

      a       b      c     d1    d2     e1   e2
0  test  1657.0  asset   2089   0.0   2088  0.0
1   NaN     NaN    NaN   2088   0.0   2088  0.0
2   NaN     NaN    NaN   NaN    NaN   2088  0.0

Thanks in advance for your efforts.

CodePudding user response:

try this after you get rid of NaNs in d column:

df = df.dropna(subset='d')
df[['d1', 'd2']] = pd.DataFrame(df.d.tolist(), columns=['d1', 'd2'])
df[['e1', 'e2']] = pd.DataFrame(df.e.tolist(), columns=['e1', 'e2'])
df= df.drop(columns=['d', 'e'])

df
>a  b   c   d1  d2  e1  e2
0   test    1657.0  asset   2089    0.0 2088    0.0
1   NaN NaN NaN 2088    0.0 2088    0.0

CodePudding user response:

I'm never a fan of hardcoding certain column names and always in favour to find a programmatic way to cover these issues.

import pandas as pd

data = {'a': 'test', 
        'b': 1657, 
        'c': 'asset', 
        'd': [['2089', '0.0'], ['2088', '0.0']],
        'e': [['2088', '0.0'], ['2088', '0.0'], ['2088', '0.00']],
        'f': [['2088', '0.0', "x", "foo"], ['2088', '0.0', 'bar', "i"], ['2088', '0.00', "z", "0.2"]], 
        "x": ["test1", "test2"]}

s = []
for key, value in data.items():
    if isinstance(value, list):
        if isinstance(value[0], list):
            cols = [key str(i)for i in range(1, len(value[0])   1, 1)]
            s.append(pd.DataFrame(data=value, columns=cols))
        else:
            s.append(pd.Series(data=value, name=key))
    else:
        s.append(pd.Series(data=value, name=key))
df = pd.concat(s, axis=1)

So it loops through the dictionary and takes a look whether the dictionary value is a list.

If it's a list, it'll check whether it is a nested list. If it is a nested list, it'll take a look at the length of the first element and construct the column names (eg. e1, e2) for the dataframe and stores the dataframe of that key value pair in a list. If it's not a nested list, it'll generate a series of the key value pair and store it in that same list.

If it's not a list, it'll generate a series of the key value pair and store it in that same list. Last step is just to concat the list.

I've taken the liberty to add a new key value pair to illustrate that everything will work when you extend the dictionary.

      a       b      c    d1   d2    e1    e2    f1    f2   f3   f4
0  test  1657.0  asset  2089  0.0  2088   0.0  2088   0.0    x  foo
1   NaN     NaN    NaN  2088  0.0  2088   0.0  2088   0.0  bar    i
2   NaN     NaN    NaN   NaN  NaN  2088  0.00  2088  0.00    z  0.2

CodePudding user response:

This code gived me the wanted dataframe:

import pandas as pd
import numpy as np

#import pandas as pd
data = {'a': 'test', 'b': 1657, 'c': 'asset', 'd': 
[['2089', '0.0'], ['2088', '0.0']], 'e': [['2088', '0.0'], 
['2088', '0.0'], ['2088', '0.00']]}
df = pd.DataFrame({ key:pd.Series(value) for key, value in 
data.items() })
a = ['test']
b = [1657]
c = ['asset']
d = [['2089', '0.0'], ['2088', '0.0']]
e = [['2088', '0.0'], ['2088', '0.0'], ['2088', '0.00']]

d1 = [d[i][0] for i in range(len(d))]
d2 = [d[i][1] for i in range(len(d))]

e1 = [e[i][0] for i in range(len(e))]
e2 = [e[i][1] for i in range(len(e))]

df = pd.DataFrame({'a':pd.Series(a),'b':pd.Series(b),'c':pd.Series(c),'d1':pd.Series(d1),'d2':pd.Series(d2),'e1':pd.Series(e1),'e2':pd.Series(e2)})
  • Related