Home > Enterprise >  Transform a dictionnary into a dataframe
Transform a dictionnary into a dataframe

Time:09-23

I have the following dictionnary :

dico_test = {'BC1': [['name1', 14],['name2', 10]], 'BC2': [['name1', 2],['name2', 3],['name3', 1]]}

And I would like to have a dataframe like this :

       BC1   BC2
name1   14     2
name2   10     3
name3  NaN     1

So far what I did: transform to dataframe with keys as index to fill with 'None' values
and transpose it to have my keys as columns :

df = pd.DataFrame.from_dict(dico_test, orient='index').T

           BC1         BC2
0  [name1, 14]  [name1, 2]
1  [name2, 10]  [name2, 3]
2         None  [name3, 1]

And now I can have my index names and values with loops on columns:

names = list()
for x in df.BC2:
    names.append(x[0])
df.index = names

column = list()
for col in df.columns:
    # 1st iter : BC1
    for x in df[col]:
        # 1st iter : [name1, 14]
        if x is not None: # because None is not subscriptable
            # 1st iter : append 14
            column.append(x[1])
        if x is None:
            column.append(x)
    df[col] = column

# result

        BC1  BC2
name1  14.0    2
name2  10.0    3
name3   NaN    1

PROBLEM : I would like to get my index names 'automatically' from the column there is no "None" value, and I can't find the solution to do so. Because in this example, I selected column BC2 manually and it's not the way I want to do it. And also if there is a better way to do my dataframe directly from my dictionnary without reformate everything after, it could be great.

CodePudding user response:

The problem is that your data defines columns with dictionaries, but then rows with lists of 2 items [index, value]. You can simply convert that list of 2 items to a dictionary and everything will work as expected out of the box:

>>> pd.DataFrame({col: dict(rows) for col, rows in dico_test.items()})
        BC1  BC2
name1  14.0    2
name2  10.0    3
name3   NaN    1

CodePudding user response:

With explode and pivot

s = pd.Series(dico_test).explode()
out = pd.DataFrame(s.tolist(),index=s.index).reset_index().pivot(0,'index',1)
out
Out[260]: 
index   BC1  BC2
0               
name1  14.0  2.0
name2  10.0  3.0
name3   NaN  1.0
  • Related