I have the following dictionnary :
dico_test = {'BC1': [['name1', 14],['name2', 10]], 'BC2': [['name1', 2],['name2', 3],['name3', 1]]}
And I would like to have a dataframe like this :
BC1 BC2
name1 14 2
name2 10 3
name3 NaN 1
So far what I did: transform to dataframe with keys as index to fill with 'None' values
and transpose it to have my keys as columns :
df = pd.DataFrame.from_dict(dico_test, orient='index').T
BC1 BC2
0 [name1, 14] [name1, 2]
1 [name2, 10] [name2, 3]
2 None [name3, 1]
And now I can have my index names and values with loops on columns:
names = list()
for x in df.BC2:
names.append(x[0])
df.index = names
column = list()
for col in df.columns:
# 1st iter : BC1
for x in df[col]:
# 1st iter : [name1, 14]
if x is not None: # because None is not subscriptable
# 1st iter : append 14
column.append(x[1])
if x is None:
column.append(x)
df[col] = column
# result
BC1 BC2
name1 14.0 2
name2 10.0 3
name3 NaN 1
PROBLEM : I would like to get my index names 'automatically' from the column there is no "None" value, and I can't find the solution to do so. Because in this example, I selected column BC2 manually and it's not the way I want to do it. And also if there is a better way to do my dataframe directly from my dictionnary without reformate everything after, it could be great.
CodePudding user response:
The problem is that your data defines columns with dictionaries, but then rows with lists of 2 items [index, value]
. You can simply convert that list of 2 items to a dictionary and everything will work as expected out of the box:
>>> pd.DataFrame({col: dict(rows) for col, rows in dico_test.items()})
BC1 BC2
name1 14.0 2
name2 10.0 3
name3 NaN 1
CodePudding user response:
With explode
and pivot
s = pd.Series(dico_test).explode()
out = pd.DataFrame(s.tolist(),index=s.index).reset_index().pivot(0,'index',1)
out
Out[260]:
index BC1 BC2
0
name1 14.0 2.0
name2 10.0 3.0
name3 NaN 1.0