Home > Back-end >  Transform python dictionaries with keys and corresponding lists to pandas dataframe
Transform python dictionaries with keys and corresponding lists to pandas dataframe

Time:06-29

I am trying to transform multiple dictionaries with keys and corresponding lists to a pandas dataframe and can't get to the right way of transforming them. For the pandas data frame, the keys are the index column and the lists

How can I transform python dictionaries with keys and corresponding lists (in values) to a pandas dataframe with keys as the index column and each of the dictionary as the other columns ?

Here is a sample set of dictionaries and one of my non-working solutions:

import pandas as pd

dict1 = {'key_1': [1, 2, 3, 4], 'key_2': [5, 6, 7, 8], 'key_3': [9, 10, 11, 12]}
dict2 = {'key_1': ['a', 'b', 'c', 'd'], 'key_2': ['e', 'f', 'g', 'h'], 'key_3': ['i', 'j', 'k', 'l']}
dict3 = {'key_1': ['DD', 'CC', 'BB', 'AA'], 'key_3': ['II', 'JJ', 'KK', 'LL']}

df = pd.DataFrame.from_dict({'dict1':pd.Series(dict1),
                            'dict2':pd.Series(dict2),
                            'dict3':pd.Series(dict3)})

print(df)

This is what I need the resulting dataframe to look like:

Pandas dataframe from python dictionaries with keys and value lists

I tried using explode and it would work if I only had one dictionary, but doing it recursively for the other dictionaries did not work. Then, I tried some of the solutions in this Stackoverflow transformation solution but couldn't get the solutions to work, in some cases because of the NaNs in my example.

CodePudding user response:

You would need to fillna with a list with 4 items. Unfortunately fillna doesn't support a list as parameter.

But you can take advantage of a stack/unstack (and the fill_value parameter of unstack), then explode all columns:

(df
 .stack()
 .unstack(fill_value=[pd.NA]*4)
 .explode(list(df))
)

output:

      dict1 dict2 dict3
key_1     1     a    DD
key_1     2     b    CC
key_1     3     c    BB
key_1     4     d    AA
key_2     5     e  <NA>
key_2     6     f  <NA>
key_2     7     g  <NA>
key_2     8     h  <NA>
key_3     9     i    II
key_3    10     j    JJ
key_3    11     k    KK
key_3    12     l    LL

CodePudding user response:

Or try:

df_e = df.stack().explode().to_frame()
df_e = df_e.set_index(df_e.groupby(level=[0,1]).cumcount(), 
                      append=True)
df_out = df_e[0].unstack(1)
df_out

Output:

        dict1 dict2 dict3
key_1 0     1     a    DD
      1     2     b    CC
      2     3     c    BB
      3     4     d    AA
key_2 0     5     e   NaN
      1     6     f   NaN
      2     7     g   NaN
      3     8     h   NaN
key_3 0     9     i    II
      1    10     j    JJ
      2    11     k    KK
      3    12     l    LL
  • Related