Converting a dictionary of lists of dictionaries to a dataframe

I have the following subsample of a dictionary of lists of dictionaries (taken from a larger dictionary with millions of items):

bool_dict = {0: [{0: 4680}, {1: 1185}], 
             1: [{0: 172}, {1: 9}], 
             2: [{0: 149}, {1: 1282}], 
             3: [{0: 20}, {1: 127}], 
             4: [{0: 0}, {1: 0}]}

which I converted to a dataframe of the form:

          0          1
0  {0: 4680}  {1: 1185}
1   {0: 172}     {1: 9}
2   {0: 149}  {1: 1282}
3    {0: 20}   {1: 127}
4     {0: 0}     {1: 0}

by doing the following:

import pandas as pd
test = pd.DataFrame(bool_dict.values(), columns=['0', '1'], index=bool_dict.keys()).sort_index()

The problem is that I only need each cell's value, not the key, in the dataframe. So, the desired output is:

      0     1
0  4680  1185
1   172     9
2   149  1282
3    20   127
4     0     0

I tried the following:

test['0'] = test['0'].apply(lambda x: x[0])

but then I get a KeyError on what I thought was a dictionary.

To make sure it indeed was a dictionary, I then tried

from ast import literal_eval
test['0']=test['0'].apply(lambda x: literal_eval(str(x)))

then tried this again

test['0'] = test['0'].apply(lambda x: x[0])

with no success (I also tried the key as '0').

I could do the hacky thing of splitting on ':' and then removing the extraneous characters, but that just feels wrong for many reasons.

Answer:

One way is to merge each inner list into a single dictionary and then pass the result to the DataFrame constructor:

bool_dict_flattened = {i: {k:v for d in lst for k,v in d.items()} for i, lst in bool_dict.items()}
df = pd.DataFrame.from_dict(bool_dict_flattened, orient='index')
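For reference, here is a self-contained, runnable sketch of the flattening approach. It swaps the nested comprehension for a small helper that merges each row's dicts with dict.update; flatten_row is a hypothetical name chosen for illustration, and the sketch assumes that if two inner dicts ever share a key, the later one should win.

```python
import pandas as pd

# Sample from the question: each value is a list of single-key dicts.
bool_dict = {0: [{0: 4680}, {1: 1185}],
             1: [{0: 172}, {1: 9}],
             2: [{0: 149}, {1: 1282}],
             3: [{0: 20}, {1: 127}],
             4: [{0: 0}, {1: 0}]}


def flatten_row(dicts):
    """Merge a list of small dicts into one dict (later keys overwrite earlier ones)."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


# Outer keys become the row index; the merged inner keys become the columns.
flattened = {row: flatten_row(lst) for row, lst in bool_dict.items()}
df = pd.DataFrame.from_dict(flattened, orient='index').sort_index()
print(df)
```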

Another option is to use the .str accessor on each column, relying on the fact that every dictionary in a column has the column's name as its key:

out = pd.DataFrame.from_dict(bool_dict, orient='index').apply(lambda x: x.str[x.name])

Output:

      0     1
0  4680  1185
1   172     9
2   149  1282
3    20   127
4     0     0
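The .str trick works because, on object-dtype data, Series.str[key] is shorthand for Series.str.get(key), and pandas' str.get looks the key up when an element is a dict. A minimal sketch on a single hand-made column (the three values are simply taken from the sample above):

```python
import pandas as pd

# A column of single-key dicts, like column 0 of the frame built from bool_dict.
col = pd.Series([{0: 4680}, {0: 172}, {0: 149}], name=0)

# .str[key] calls .str.get(key), which does a dict lookup for dict elements;
# here the key is the column's own name, just as x.name is used in the answer.
values = col.str[col.name]
print(values.tolist())
```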

Answer:

You can use two nested lambdas: the outer one receives each column of the frame (DataFrame.apply works column by column by default), and the inner one receives each cell in that column and reads the single value out of its dictionary:

df = pd.DataFrame(bool_dict).T
df = df.apply(lambda x: x.apply(lambda y: list(y.values())[0]))
df

      0     1
0  4680  1185
1   172     9
2   149  1282
3    20   127
4     0     0
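Note that DataFrame.apply returns a new frame rather than modifying df in place, so its result has to be assigned back before df is displayed. A self-contained sketch of the same cell-by-cell idea, using Series.map for the inner step and a hypothetical helper only_value that insists each cell holds exactly one item (the three-row subset is just for brevity):

```python
import pandas as pd

bool_dict = {0: [{0: 4680}, {1: 1185}],
             1: [{0: 172}, {1: 9}],
             2: [{0: 149}, {1: 1282}]}

# Transposing gives rows 0..2 and columns 0 and 1, each cell a single-key dict.
df = pd.DataFrame(bool_dict).T


def only_value(cell):
    """Return the sole value of a one-item dict (raises ValueError otherwise)."""
    (value,) = cell.values()
    return value


# DataFrame.apply passes each column to the outer function (axis=0 by default);
# Series.map then visits every cell of that column. apply returns a new frame,
# so the result is assigned back to df.
df = df.apply(lambda col: col.map(only_value))
print(df)
```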

Answer:

test['0'] = test['0'].apply(lambda x: x[0])

but then I get a key error on what I thought was a dictionary.

You get the KeyError because your column labels are integers (0 and 1), but you are accessing them with a string. Try:

test[0] = test[0].apply(lambda x: x[0])
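A minimal sketch of the same fix applied to both columns, assuming the frame was built without a columns= argument (so pandas assigns the integer labels 0 and 1) and using a three-row subset of the sample data. Since the key inside each dict matches its column's label, a loop over test.columns pulls the right value out of every column.

```python
import pandas as pd

bool_dict = {0: [{0: 4680}, {1: 1185}],
             1: [{0: 172}, {1: 9}],
             2: [{0: 149}, {1: 1282}]}

# No columns= argument, so pandas labels the columns with the integers 0 and 1.
test = pd.DataFrame(list(bool_dict.values()), index=list(bool_dict.keys()))
print(test.columns.tolist())  # integer labels, which is why test['0'] raises KeyError

# Index each column by its integer label and read the dict entry with the same key.
# apply runs immediately inside the loop, so the lambda sees the current col.
for col in test.columns:
    test[col] = test[col].apply(lambda d: d[col])
print(test)
```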