Transform python dictionaries with keys and corresponding lists to pandas dataframe-CodePudding

I am trying to transform multiple dictionaries with keys and corresponding lists to a pandas dataframe and can't get to the right way of transforming them. For the pandas data frame, the keys are the index column and the lists

How can I transform python dictionaries with keys and corresponding lists (in values) to a pandas dataframe with keys as the index column and each of the dictionary as the other columns ?

Here is a sample set of dictionaries and one of my non-working solutions:

import pandas as pd

dict1 = {'key_1': [1, 2, 3, 4], 'key_2': [5, 6, 7, 8], 'key_3': [9, 10, 11, 12]}
dict2 = {'key_1': ['a', 'b', 'c', 'd'], 'key_2': ['e', 'f', 'g', 'h'], 'key_3': ['i', 'j', 'k', 'l']}
dict3 = {'key_1': ['DD', 'CC', 'BB', 'AA'], 'key_3': ['II', 'JJ', 'KK', 'LL']}

df = pd.DataFrame.from_dict({'dict1':pd.Series(dict1),
                            'dict2':pd.Series(dict2),
                            'dict3':pd.Series(dict3)})

print(df)

This is what I need the resulting dataframe to look like:

I tried using explode and it would work if I only had one dictionary, but doing it recursively for the other dictionaries did not work. Then, I tried some of the solutions in this Stackoverflow transformation solution but couldn't get the solutions to work, in some cases because of the NaNs in my example.

CodePudding user response：

You would need to fillna with a list with 4 items. Unfortunately fillna doesn't support a list as parameter.

But you can take advantage of a stack/unstack (and the fill_value parameter of unstack), then explode all columns:

(df
 .stack()
 .unstack(fill_value=[pd.NA]*4)
 .explode(list(df))
)

output:

      dict1 dict2 dict3
key_1     1     a    DD
key_1     2     b    CC
key_1     3     c    BB
key_1     4     d    AA
key_2     5     e  <NA>
key_2     6     f  <NA>
key_2     7     g  <NA>
key_2     8     h  <NA>
key_3     9     i    II
key_3    10     j    JJ
key_3    11     k    KK
key_3    12     l    LL

CodePudding user response：

Or try:

df_e = df.stack().explode().to_frame()
df_e = df_e.set_index(df_e.groupby(level=[0,1]).cumcount(), 
                      append=True)
df_out = df_e[0].unstack(1)
df_out

Output:

        dict1 dict2 dict3
key_1 0     1     a    DD
      1     2     b    CC
      2     3     c    BB
      3     4     d    AA
key_2 0     5     e   NaN
      1     6     f   NaN
      2     7     g   NaN
      3     8     h   NaN
key_3 0     9     i    II
      1    10     j    JJ
      2    11     k    KK
      3    12     l    LL