How to transform list to other format in python

Time:03-17

I receive data in this format:

ListA =
[
    [('test1', 'aaa', 'A'),('test2', 'bbb', 'B'),('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'),('test5', 'eee', 'E'),('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'),('test8', 'hhh', 'B'),('test9', 'ppp', 'C')]
]

and I would like to transform it into this format:

ID, ColA, ColB, ColC
1, 'test1', 'aaa', 'A'
1, 'test2', 'bbb', 'B'
1, 'test3', 'ccc', 'C'
2, 'test4', 'ddd', 'D'
2, 'test5', 'eee', 'E'
2, 'test6', 'fff', 'F'
3, 'test7', 'ggg', 'A'
3, 'test8', 'hhh', 'B'
3, 'test9', 'ppp', 'C'

CodePudding user response:

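You can flatten the nested list with itertools.chain.from_iterable and pass the result to the DataFrame constructor:

from itertools import chain
import pandas as pd

df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'])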

output:

    ColA ColB ColC
0  test1  aaa    A
1  test2  bbb    B
2  test3  ccc    C
3  test4  ddd    D
4  test5  eee    E
5  test6  fff    F
6  test7  ggg    A
7  test8  hhh    B
8  test9  ppp    C

With the ID as the index (this also handles sublists of uneven length):

from itertools import chain
import numpy as np
import pandas as pd

# one 1-based ID per sublist, repeated once for each tuple in that sublist
idx = np.repeat(np.arange(len(ListA)) + 1, list(map(len, ListA)))

df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'],
                  index=idx).rename_axis('ID')

output:

     ColA ColB ColC
ID                 
1   test1  aaa    A
1   test2  bbb    B
1   test3  ccc    C
2   test4  ddd    D
2   test5  eee    E
2   test6  fff    F
3   test7  ggg    A
3   test8  hhh    B
3   test9  ppp    C
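
To illustrate the "uneven list lengths" remark, here is a minimal sketch with a ragged input. The name `ragged` and its contents are made up for the example and are not part of the question's data; the flattened iterator is also wrapped in `list()` purely as a precaution.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical ragged input: the sublists hold 2, 1 and 3 tuples.
ragged = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B')],
    [('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
]

# One count per sublist; np.repeat repeats each 1-based ID that many times.
counts = [len(sub) for sub in ragged]
idx = np.repeat(np.arange(len(ragged)) + 1, counts)

ragged_df = pd.DataFrame(list(chain.from_iterable(ragged)),
                         columns=['ColA', 'ColB', 'ColC'],
                         index=idx).rename_axis('ID')
```

Because the repeat counts come from the sublists themselves, the index always lines up with the flattened rows, whatever the sublist lengths are.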

CodePudding user response:

Nested list-comprehension to the rescue:

import pandas as pd

df = pd.DataFrame(
    data=[tup for sublist in ListA for tup in sublist],
    columns=['ColA', 'ColB', 'ColC'])

Output:

    ColA ColB ColC
0  test1  aaa    A
1  test2  bbb    B
2  test3  ccc    C
3  test4  ddd    D
4  test5  eee    E
5  test6  fff    F
6  test7  ggg    A
7  test8  hhh    B
8  test9  ppp    C

If you want the index preserved as in your expected output:

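import numpy as np

df = pd.DataFrame(
        data=[tup for sublist in ListA for tup in sublist],
        columns=['ColA', 'ColB', 'ColC'],
        # repeat each 0-based sublist position once per tuple, then shift to 1-based IDs
        index=np.arange(len(ListA)).repeat([len(sublist) for sublist in ListA]) + 1)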
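
If the ID should be an ordinary column rather than the index (as the expected output in the question suggests), one possible variant is to number the sublists with `enumerate` inside the comprehension. This is only a sketch; the names `rows` and `df_with_id` are chosen for the example.

```python
import pandas as pd

ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B'), ('test9', 'ppp', 'C')],
]

# enumerate(..., start=1) numbers the sublists; every tuple is prefixed
# with the number of the sublist it came from.
rows = [(i, *tup) for i, sublist in enumerate(ListA, start=1) for tup in sublist]

df_with_id = pd.DataFrame(rows, columns=['ID', 'ColA', 'ColB', 'ColC'])
```

Calling `df_with_id.to_csv(index=False)` then writes a header line `ID,ColA,ColB,ColC` followed by one line per tuple.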

CodePudding user response:

Here's a solution that uses explode to preserve the index:

df = pd.Series(ListA).explode().pipe(lambda x: pd.DataFrame(x.tolist(), index=x.index + 1, columns=['ColA', 'ColB', 'ColC']))

Output:

>>> df
    ColA ColB ColC
1  test1  aaa    A
1  test2  bbb    B
1  test3  ccc    C
2  test4  ddd    D
2  test5  eee    E
2  test6  fff    F
3  test7  ggg    A
3  test8  hhh    B
3  test9  ppp    C
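
To see why explode preserves the grouping, it may help to look at the intermediate Series before the pipe step. A small sketch, with `s` and `exploded` as names picked for the example:

```python
import pandas as pd

ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B'), ('test9', 'ppp', 'C')],
]

# Three elements, each one a whole sublist; the index is 0, 1, 2.
s = pd.Series(ListA)

# One element per tuple; each keeps the index label of the sublist it came from.
exploded = s.explode()
```

The repeated labels 0, 0, 0, 1, 1, 1, 2, 2, 2 are what `x.index + 1` turns into the 1-based IDs.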

CodePudding user response:

For fun, another solution using pandas.concat:

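df = (pd
 # the dict keys 1, 2, 3 become the outer index level
 .concat(dict(enumerate(map(pd.DataFrame, ListA), start=1)))
 # drop the inner per-sublist row numbers, keeping only the ID
 .droplevel(1)
 # default column labels 0, 1, 2 -> ColA, ColB, ColC
 .rename(columns=dict(enumerate(['ColA', 'ColB', 'ColC'])))
)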

or:

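from itertools import count
c = count(1)  # running ID counter, starting at 1
# next(c) is evaluated once per sublist, so all rows of a sublist share one ID
df = pd.concat([pd.DataFrame(x, index=[next(c)]*len(x),
                             columns=['ColA', 'ColB', 'ColC'])
                for x in ListA])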

output:

    ColA ColB ColC
1  test1  aaa    A
1  test2  bbb    B
1  test3  ccc    C
2  test4  ddd    D
2  test5  eee    E
2  test6  fff    F
3  test7  ggg    A
3  test8  hhh    B
3  test9  ppp    C
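
As a rough illustration of why the dict-based concat variant needs droplevel(1): concatenating a dict of frames produces a two-level index, with the dict key on the outside and each frame's own row number on the inside. The names `frames`, `stacked` and `flat` are chosen for this sketch.

```python
import pandas as pd

ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B'), ('test9', 'ppp', 'C')],
]

# The dict keys (1, 2, 3) become the outer index level; each frame's own
# 0..n-1 row index becomes the inner level.
frames = {i: pd.DataFrame(sub) for i, sub in enumerate(ListA, start=1)}
stacked = pd.concat(frames)

# Dropping the inner level leaves only the 1-based sublist number.
flat = stacked.droplevel(1)
```

At this point the columns are still labelled 0, 1, 2, which is why the answer finishes with a rename.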