I get data in this format:
ListA =
[
[('test1', 'aaa', 'A'),('test2', 'bbb', 'B'),('test3', 'ccc', 'C')],
[('test4', 'ddd', 'D'),('test5', 'eee', 'E'),('test6', 'fff', 'F')],
[('test7', 'ggg', 'A'),('test8', 'hhh', 'B'),('test9', 'ppp', 'C')]
]
and I would like to transform it into this format:
ID, ColA, ColB, ColC
1, 'test1', 'aaa', 'A'
1, 'test2', 'bbb', 'B'
1, 'test3', 'ccc', 'C'
2, 'test4', 'ddd', 'D'
2, 'test5', 'eee', 'E'
2, 'test6', 'fff', 'F'
3, 'test7', 'ggg', 'A'
3, 'test8', 'hhh', 'B'
3, 'test9', 'ppp', 'C'
CodePudding user response:
You can use itertools.chain:
from itertools import chain
import pandas as pd
# flatten the list of lists into a single stream of tuples
df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'])
output:
ColA ColB ColC
0 test1 aaa A
1 test2 bbb B
2 test3 ccc C
3 test4 ddd D
4 test5 eee E
5 test6 fff F
6 test7 ggg A
7 test8 hhh B
8 test9 ppp C
With the index (this also handles sublists of uneven length):
from itertools import chain
import numpy as np
import pandas as pd
# 1-based sublist number, repeated once per tuple in that sublist
idx = np.repeat(np.arange(len(ListA)) + 1, list(map(len, ListA)))
df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'],
                  index=idx).rename_axis('ID')
output:
ColA ColB ColC
ID
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
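To illustrate the uneven-length claim, here is a minimal sketch of the same np.repeat approach. The `uneven` list and the `df_uneven` name are made up for illustration and are not from the question:

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical input whose sublists have 1, 3 and 2 tuples (not from the question).
uneven = [
    [('test1', 'aaa', 'A')],
    [('test2', 'bbb', 'B'), ('test3', 'ccc', 'C'), ('test4', 'ddd', 'D')],
    [('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
]

# One 1-based ID per sublist, repeated once for each tuple in that sublist.
lengths = [len(sub) for sub in uneven]
idx = np.repeat(np.arange(len(uneven)) + 1, lengths)

df_uneven = pd.DataFrame(list(chain.from_iterable(uneven)),
                         columns=['ColA', 'ColB', 'ColC'],
                         index=idx).rename_axis('ID')
print(df_uneven)
```

Because the repeat counts come from the length of each sublist, the ID labels should stay aligned with the rows even when the sublists differ in size.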
CodePudding user response:
Nested list-comprehension to the rescue:
df = pd.DataFrame(
data=[tup for sublist in ListA for tup in sublist],
columns=['ColA', 'ColB', 'ColC'])
Output:
ColA ColB ColC
0 test1 aaa A
1 test2 bbb B
2 test3 ccc C
3 test4 ddd D
4 test5 eee E
5 test6 fff F
6 test7 ggg A
7 test8 hhh B
8 test9 ppp C
If you want the index preserved as in your expected output:
import numpy as np
df = pd.DataFrame(
    data=[tup for sublist in ListA for tup in sublist],
    columns=['ColA', 'ColB', 'ColC'],
    index=np.arange(len(ListA)).repeat([len(sublist) for sublist in ListA]) + 1)
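The expected output in the question lists ID as the first column rather than as the index. If that is what is wanted, one possible variation on the list comprehension is to prepend the sublist number to each tuple. This is only a sketch; the names `rows` and `df_with_id` are illustrative:

```python
import pandas as pd

ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B'), ('test9', 'ppp', 'C')],
]

# enumerate(..., start=1) numbers the sublists 1, 2, 3;
# (i, *tup) puts that number in front of each tuple's three values.
rows = [(i, *tup)
        for i, sublist in enumerate(ListA, start=1)
        for tup in sublist]

df_with_id = pd.DataFrame(rows, columns=['ID', 'ColA', 'ColB', 'ColC'])
print(df_with_id)
```

This keeps the default 0..8 row index and stores the group number in an ordinary ID column, so no NumPy is needed.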
CodePudding user response:
Here's a solution that uses explode to preserve the index:
df = pd.Series(ListA).explode().pipe(
    lambda x: pd.DataFrame(x.tolist(), index=x.index + 1,
                           columns=['ColA', 'ColB', 'ColC']))
Output:
>>> df
ColA ColB ColC
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
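To see why explode preserves the grouping, it may help to look at the intermediate steps separately. The sketch below splits the one-liner into named pieces; `s`, `exploded` and `df_exploded` are illustrative names, not part of the answer above:

```python
import pandas as pd

ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B'), ('test9', 'ppp', 'C')],
]

# Step 1: a Series with one element per sublist, index 0, 1, 2.
s = pd.Series(ListA)

# Step 2: explode gives each tuple its own row and repeats the
# index label of the sublist it came from.
exploded = s.explode()
print(exploded.index.tolist())

# Step 3: turn the tuples into columns and shift the labels to start at 1.
df_exploded = pd.DataFrame(exploded.tolist(),
                           index=exploded.index + 1,
                           columns=['ColA', 'ColB', 'ColC'])
print(df_exploded)
```

The repeated index labels produced in step 2 are what the final frame reuses as the group ID.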
CodePudding user response:
For fun, another solution using pandas.concat:
df = (pd
      .concat(dict(enumerate(map(pd.DataFrame, ListA), start=1)))
      .droplevel(1)
      .rename(columns=dict(enumerate(['ColA', 'ColB', 'ColC'])))
      )
or:
from itertools import count
c = count(1)
df = pd.concat([pd.DataFrame(x, index=[next(c)]*len(x),
columns=['ColA', 'ColB', 'ColC'])
for x in ListA])
output:
ColA ColB ColC
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C