I have a dictionary whose values are lists of tuples, and I would like to convert it to a pandas DataFrame, but I am having a hard time with it.
My data is as below:
{0: [('A1', 0.0037505763997138838),
('A2', 0.0036963076240675245),
('A3', 0.0035451257931104485),
('A4', 0.003501467316849233),
('A5', 0.00343229837150675),
('A6', 0.0033731723637910062),
('A7', 0.0033713118048861465),
('A8', 0.003325231288305062),
('A9', 0.002885164987475754),
('A10', 0.0028834984584371797)],
1: [('B1', 0.011094831353420088),
('B2', 0.009526049091086916),
('B3', 0.007002935827927014),
('B4', 0.00511673700015512),
('B5', 0.004870300921667765),
('B6', 0.004496108376557714),
('B7', 0.004230892962061271),
('B8', 0.004137434850455194),
('B9', 0.003958335393193675),
('B10', 0.0038285145788315993)]}
I want to transform it into the following pandas DataFrame:
num label probs
0 A1 0.0037505763997138838
0 A2 0.0036963076240675245
0 A3 0.0035451257931104485
0 A4 0.003501467316849233
0 A5 0.00343229837150675
0 A6 0.0033731723637910062
0 A7 0.0033713118048861465
0 A8 0.003325231288305062
0 A9 0.002885164987475754
0 A10 0.0028834984584371797
1 B1 0.011094831353420088
1 B2 0.009526049091086916
1 B3 0.007002935827927014
1 B4 0.00511673700015512
1 B5 0.004870300921667765
1 B6 0.004496108376557714
1 B7 0.004230892962061271
1 B8 0.004137434850455194
1 B9 0.003958335393193675
1 B10 0.0038285145788315993
CodePudding user response:
You can try the following (assuming data is the name of the dict):
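import pandas as pd

# Explode the lists so each tuple gets its own row, then split each tuple into columns.
df = (pd.Series(data)
        .explode()
        .apply(pd.Series)
        .reset_index()
     )
df.columns = ['num', 'label', 'probs']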
Result:
print(df)
num label probs
0 0 A1 0.003751
1 0 A2 0.003696
2 0 A3 0.003545
3 0 A4 0.003501
4 0 A5 0.003432
5 0 A6 0.003373
6 0 A7 0.003371
7 0 A8 0.003325
8 0 A9 0.002885
9 0 A10 0.002883
10 1 B1 0.011095
11 1 B2 0.009526
12 1 B3 0.007003
13 1 B4 0.005117
14 1 B5 0.004870
15 1 B6 0.004496
16 1 B7 0.004231
17 1 B8 0.004137
18 1 B9 0.003958
19 1 B10 0.003829
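To see what each step of the chain contributes, here is a minimal sketch that runs the same calls one at a time. It assumes a shortened copy of the question's dict; the intermediate names s, exploded and wide are only illustrative.

```python
import pandas as pd

# Shortened copy of the dict from the question (two tuples per key).
data = {
    0: [('A1', 0.0037505763997138838), ('A2', 0.0036963076240675245)],
    1: [('B1', 0.011094831353420088), ('B2', 0.009526049091086916)],
}

# Step 1: one row per dict key; each cell holds a whole list of tuples.
s = pd.Series(data)

# Step 2: explode() gives every tuple its own row and repeats the key in the index.
exploded = s.explode()

# Step 3: apply(pd.Series) splits each tuple into two columns labelled 0 and 1.
wide = exploded.apply(pd.Series)

# Step 4: reset_index() moves the repeated keys into a regular column.
df = wide.reset_index()
df.columns = ['num', 'label', 'probs']
print(df)
```

Printing s, exploded and wide in turn shows how the shape changes from one row per key to one row per tuple.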
Alternatively, you can use pd.DataFrame() in place of the second pd.Series() for better performance (thanks to @anky for the suggestion), as follows:
s = pd.Series(data).explode()
df = (pd.DataFrame(s.tolist(), columns=['label', 'probs'], index=s.index)
        .rename_axis(index='num')
        .reset_index()
     )
Result:
print(df)
num label probs
0 0 A1 0.003751
1 0 A2 0.003696
2 0 A3 0.003545
3 0 A4 0.003501
4 0 A5 0.003432
5 0 A6 0.003373
6 0 A7 0.003371
7 0 A8 0.003325
8 0 A9 0.002885
9 0 A10 0.002883
10 1 B1 0.011095
11 1 B2 0.009526
12 1 B3 0.007003
13 1 B4 0.005117
14 1 B5 0.004870
15 1 B6 0.004496
16 1 B7 0.004231
17 1 B8 0.004137
18 1 B9 0.003958
19 1 B10 0.003829
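If you want to check the performance difference on your own machine, a rough sketch like the one below may help. The function names with_apply and with_constructor and the synthetic dict big are made up for this illustration, and the timings will vary with data size and pandas version.

```python
import timeit
import pandas as pd

# Synthetic data (an assumption for this sketch): 20 keys with 50 tuples each.
big = {k: [(f'L{k}_{i}', i / 1000) for i in range(50)] for k in range(20)}

def with_apply(data):
    # First variant: split the tuples with apply(pd.Series).
    df = pd.Series(data).explode().apply(pd.Series).reset_index()
    df.columns = ['num', 'label', 'probs']
    return df

def with_constructor(data):
    # Second variant: hand the list of tuples to the DataFrame constructor.
    s = pd.Series(data).explode()
    return (pd.DataFrame(s.tolist(), columns=['label', 'probs'], index=s.index)
              .rename_axis(index='num')
              .reset_index())

a = with_apply(big)
b = with_constructor(big)

# Compare values column by column (the column dtypes may differ between the two).
same = all(a[col].tolist() == b[col].tolist() for col in ['num', 'label', 'probs'])
print('same values:', same)

# Time a few runs of each variant; lower is faster.
print('apply      :', timeit.timeit(lambda: with_apply(big), number=3))
print('constructor:', timeit.timeit(lambda: with_constructor(big), number=3))
```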
CodePudding user response:
We can use a list comprehension to create a list of triplets (num, label and probs) and then build the DataFrame directly from that list (here d is the dict):
c = ['num', 'label', 'probs']
pd.DataFrame([(k, *t) for k, v in d.items() for t in v], columns=c)
num label probs
0 0 A1 0.003751
1 0 A2 0.003696
2 0 A3 0.003545
3 0 A4 0.003501
4 0 A5 0.003432
5 0 A6 0.003373
6 0 A7 0.003371
7 0 A8 0.003325
8 0 A9 0.002885
9 0 A10 0.002883
10 1 B1 0.011095
11 1 B2 0.009526
12 1 B3 0.007003
13 1 B4 0.005117
14 1 B5 0.004870
15 1 B6 0.004496
16 1 B7 0.004231
17 1 B8 0.004137
18 1 B9 0.003958
19 1 B10 0.003829
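If the comprehension is hard to read, it can be unrolled into two nested loops. This is only an equivalent sketch, assuming a shortened copy of the dict and the column names from the question; rows is an illustrative name.

```python
import pandas as pd

# Shortened copy of the dict from the question.
d = {
    0: [('A1', 0.0037505763997138838), ('A2', 0.0036963076240675245)],
    1: [('B1', 0.011094831353420088), ('B2', 0.009526049091086916)],
}

rows = []
for k, v in d.items():          # k is the dict key, v is its list of tuples
    for label, prob in v:       # unpack each (label, prob) tuple
        rows.append((k, label, prob))

df = pd.DataFrame(rows, columns=['num', 'label', 'probs'])
print(df)
```

Both forms build the same list of triplets; the comprehension is just the more compact spelling.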
CodePudding user response:
You need to restructure your dictionary a bit. Here I used itertools.chain to flatten the values into a single list:
from itertools import chain
import pandas as pd
import numpy as np
df = (pd.DataFrame(list(chain(*d.values())),
                   columns=['label', 'probs'],
                   index=np.repeat(list(d), list(map(len, d.values()))))
        .rename_axis('num')
        .reset_index()
     )
output:
num label probs
0 0 A1 0.003751
1 0 A2 0.003696
2 0 A3 0.003545
3 0 A4 0.003501
4 0 A5 0.003432
...
17 1 B8 0.004137
18 1 B9 0.003958
19 1 B10 0.003829
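Note that the printed probs values look rounded to six decimals, while the question shows the full values. As far as I know this is only pandas' default display precision, and the stored floats are unchanged. A small sketch, assuming a shortened copy of the dict, that checks this and prints more digits:

```python
from itertools import chain

import numpy as np
import pandas as pd

# Shortened copy of the dict from the question.
d = {
    0: [('A1', 0.0037505763997138838), ('A2', 0.0036963076240675245)],
    1: [('B1', 0.011094831353420088), ('B2', 0.009526049091086916)],
}

df = (pd.DataFrame(list(chain.from_iterable(d.values())),
                   columns=['label', 'probs'],
                   index=np.repeat(list(d), [len(v) for v in d.values()]))
        .rename_axis('num')
        .reset_index()
     )

# The stored value still equals the original float; only the printout is rounded.
unchanged = df.loc[0, 'probs'] == 0.0037505763997138838
print(unchanged)

# Temporarily raise the display precision for this block only.
with pd.option_context('display.precision', 19):
    print(df)
```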