I have a DataFrame like this:
| cat0 | cat1 | cat2 | col_list | Quantity |
| ----- | ---- | ---- | ----------| -------- |
| alpha | x | a | [a,b,c,d] | 4 |
| alpha | y | a | [e] | 1 |
| beta | z | a | [f,g] | 2 |
| gamma | p | b | [h] | 1 |
| gamma | q | b | [I,j,k,l] | 4 |
| phi | r | c | [r,s] | 2 |
| eita | s | c | [m,n] | 2 |
I want it to transform it based on cat2 column. If they have same value then combine those rows into 1 and merge the rows like this:
| cat0 | cat1 | cat2 | col_list | Quantity |
| ---------- | ----- | ---- | ----------------| -------- |
| alpha,beta | x,y,z | a | [a,b,c,d,e,f,g] | 7 |
| gamma | p,q | b | [h,I,j,k,l] | 5 |
| phi,eita | r,s | c | [r,s,m,n] | 4 |
If column have strings values (column: cat0,cat1) join and separate them by a comma. If it has list values (column" col_list) then add those elements into a single list. If it is a int value (column: Quantity), then just add them.
CodePudding user response:
You can use groupby.agg
, with list
/','.join
, sum
, or itertools.chain
:
from itertools import chain
(df
.groupby('cat2', as_index=False)
.agg({'cat0': ','.join, 'cat1': ','.join,
'col_list': lambda x: list(chain.from_iterable(x)),
'Quantity': 'sum'
})
[df.columns] # reindex like original
)
NB. in place or chain
, you can also use: lambda x: [v for l in x for x in l]
output:
cat0 cat1 cat2 col_list Quantity
0 alpha,alpha,beta x,y,z a [a, b, c, d, e, f, g] 7
1 gamma,gamma p,q b [h, I, j, k, l] 5
2 phi,eita r,s c [r, s, m, n] 4
CodePudding user response:
You can use agg
after pandas.groupby
and use functools.reduce
for merging mutiple list and ','.join
for merging string with comma.
>>> from functools import reduce
>>> df.groupby('cat2').agg({'cat0': ','.join,
'cat1': ','.join,
'col_list' : lambda x: reduce(lambda y,z : y z, x),
'Quantity' : 'sum'})
cat0 cat1 cat2 col_list Quantity
alpha,beta x,y,z a [a,b,c,d,e,f,g] 7
gamma p,q b [h,I,j,k,l] 5
phi,eita r,s c [r,s,m,n] 4
Explanation: (how reduce merge multiple list?)
>>> reduce(lambda y,z : y z, [['a','b','c','d'], ['e'], ['f','g']])
# y <- (['a','b','c','d'] ['e'])
# y ['f','g']
['a', 'b', 'c', 'd', 'e', 'f', 'g']