Home > Enterprise >  Combine multiple related products into one in pandas dataframe
Combine multiple related products into one in pandas dataframe

Time:02-22

I have a dataframe with the following sample data

Product quantity sold 
a   30
at  20
am  10
b   5
bn  7
bt  90
c   76
c1  67
ct  54
m   12
t   87
n   12

I want to group the products that start with a into a new product name Art, those that start with b under name Brt and those that start with c into Crt and leave products m, t and n in the same dataframe into something below:

Product quantity sold 
Art 60
Brt 102
Crt 197
m   12
t   87
n   12  

CodePudding user response:

Since you have complex conditions, might be easy enough to just rename the ones you want.

import pandas as pd
df = pd.DataFrame({'Product': ['a', 'at', 'am', 'b', 'bn', 'bt', 'c', 'c1', 'ct', 'm', 't', 'n'],
 'quantity sold ': [30, 20, 10, 5, 7, 90, 76, 67, 54, 12, 87, 12]})

df.loc[df['Product'].str.startswith('a'), 'Product'] = 'Art'
df.loc[df['Product'].str.startswith('b'), 'Product'] = 'Brt'
df.loc[df['Product'].str.startswith('c'), 'Product'] = 'Crt'

df.groupby('Product', as_index=False).sum()

Output

  Product  quantity sold 
0     Art              60
1     Brt             102
2     Crt             197
3       m              12
4       n              12
5       t              87

CodePudding user response:

You can do it using str.map and dictionary:

grp = df['Product'].str[0].map({'a':'Art', 'b':'BRT', 'c':'CRT'}).fillna(df['Product'])

df.groupby(grp)['quantity sold'].sum()

Output:

Product
Art     60
BRT    102
CRT    197
m       12
n       12
t       87
Name: quantity sold, dtype: int64

Here, we are using a shortcut for .str.get, str[0] indexing the first character from the string, then using map to create desired groups, and those values not in map are filled with the original values from df['Product']. Lastly, we groupby the newly created group.

  • Related