Home > Enterprise >  How to combine multiple DataFrame rows into 1 with a column containing list values
How to combine multiple DataFrame rows into 1 with a column containing list values

Time:07-21

I have a DataFrame like this:

| cat0  | cat1 | cat2 | col_list  | Quantity |
| ----- | ---- | ---- | ----------| -------- |
| alpha | x    | a    | [a,b,c,d] | 4        |
| alpha | y    | a    | [e]       | 1        |
| beta  | z    | a    | [f,g]     | 2        |
| gamma | p    | b    | [h]       | 1        |
| gamma | q    | b    | [I,j,k,l] | 4        |
| phi   | r    | c    | [r,s]     | 2        |
| eita  | s    | c    | [m,n]     | 2        |

I want it to transform it based on cat2 column. If they have same value then combine those rows into 1 and merge the rows like this:

| cat0       | cat1  | cat2 | col_list        | Quantity |
| ---------- | ----- | ---- | ----------------| -------- |
| alpha,beta | x,y,z | a    | [a,b,c,d,e,f,g] | 7        |
| gamma      | p,q   | b    | [h,I,j,k,l]     | 5        |
| phi,eita   | r,s   | c    | [r,s,m,n]       | 4        |

If column have strings values (column: cat0,cat1) join and separate them by a comma. If it has list values (column" col_list) then add those elements into a single list. If it is a int value (column: Quantity), then just add them.

CodePudding user response:

You can use groupby.agg, with list/','.join, sum, or itertools.chain:

from itertools import chain

(df
 .groupby('cat2', as_index=False)
 .agg({'cat0': ','.join, 'cat1': ','.join,
       'col_list': lambda x: list(chain.from_iterable(x)),
       'Quantity': 'sum'
       })
 [df.columns] # reindex like original
)

NB. in place or chain, you can also use: lambda x: [v for l in x for x in l]

output:

               cat0   cat1 cat2               col_list  Quantity
0  alpha,alpha,beta  x,y,z    a  [a, b, c, d, e, f, g]         7
1       gamma,gamma    p,q    b        [h, I, j, k, l]         5
2          phi,eita    r,s    c           [r, s, m, n]         4

CodePudding user response:

You can use agg after pandas.groupby and use functools.reduce for merging mutiple list and ','.join for merging string with comma.

>>> from functools import reduce
>>> df.groupby('cat2').agg({'cat0': ','.join, 
                        'cat1': ','.join,
                        'col_list' : lambda x: reduce(lambda y,z : y z, x),
                        'Quantity' : 'sum'})


cat0          cat1    cat2    col_list         Quantity 
alpha,beta    x,y,z   a       [a,b,c,d,e,f,g]  7
gamma         p,q     b       [h,I,j,k,l]      5
phi,eita      r,s     c       [r,s,m,n]        4

Explanation: (how reduce merge multiple list?)

>>> reduce(lambda y,z : y z, [['a','b','c','d'], ['e'], ['f','g']])
# y <- (['a','b','c','d']   ['e'])
# y   ['f','g']
['a', 'b', 'c', 'd', 'e', 'f', 'g']
  • Related