I have a dataframe `dfsorted`:
dfsorted = df.sort_values(["sku"], ascending=[True])
print(dfsorted.head())
| id  | sku   | bill  | qty_left |
|-----|-------|-------|----------|
| 186 | 01-04 | 50469 | 0        |
| 16  | 01-20 | 50262 | 15       |
| 267 | 01-20 | 50460 | 1        |
| 18  | 01-20 | 50262 | 5        |
| 17  | 01-20 | 50262 | 5        |
How can I group / aggregate the dfsorted into this desired result:
| sku   | bill         | qty_left |
|-------|--------------|----------|
| 01-04 | 50469        | 0        |
| 01-20 | 50262, 50460 | 26       |
So:
- group the dataframe by 'sku'
- for each 'sku', concatenate the 'bill' values (they are already strings; I don't mind duplicates, but unique values would be even nicer)
- for each 'sku', sum the 'qty_left' values.
Thanks!
CodePudding user response:
Use `agg`, where you can apply both custom (`lambda`) functions and standard functions (such as `sum`):
df.groupby('sku').agg({'bill': lambda x: set(x), 'qty_left':'sum'})
`set` ensures the values are unique; using `list` instead keeps them all, duplicates included.
Result:
bill qty_left
sku
01-04 {50469} 0
01-20 {50460, 50262} 26
If you want a string instead of a set for `bill`, you can use:
df2.bill.apply(lambda s: ', '.join(list(map(str, s))))
where `df2` is the result of the `groupby.agg` call above.
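Putting the two steps above together, a minimal runnable sketch (the dataframe is rebuilt from the question's sample rows; `sorted` is added so the joined string is deterministic, since sets have no defined order):

```python
import pandas as pd

df = pd.DataFrame({
    "sku": ["01-04", "01-20", "01-20", "01-20", "01-20"],
    "bill": ["50469", "50262", "50460", "50262", "50262"],
    "qty_left": [0, 15, 1, 5, 5],
})

# Aggregate: collect unique bills into a set, sum qty_left
df2 = df.groupby("sku").agg({"bill": lambda x: set(x), "qty_left": "sum"})

# Turn each set of bills into a comma-separated string
df2["bill"] = df2["bill"].apply(lambda s: ", ".join(sorted(map(str, s))))
print(df2)
```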
CodePudding user response:
Use `GroupBy.agg` with a lambda function to remove duplicates while preserving the original ordering:
df1 = (df.groupby('sku', as_index=False)
.agg({'bill': lambda x:','.join(dict.fromkeys(x)),
'qty_left':'sum'}))
print (df1)
sku bill qty_left
0 01-04 50469 0
1 01-20 50262,50460 26
If the `bill` values are not strings (e.g. numbers), cast them first:
df1 = (df.astype({'bill':str})
.groupby('sku', as_index=False)
.agg({'bill': lambda x:','.join(dict.fromkeys(x)),
'qty_left':'sum'}))
print (df1)
sku bill qty_left
0 01-04 50469 0
1 01-20 50262,50460 26