I have a data frame of a grocery store record:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
columns=['customer', 'product'])
| customer | product |
| -------- | ------- |
| Tom | apple1 |
| Tom | banana35 |
| Jeff | pear0 |

I want to get all the products that each customer ever bought, so I used
product_by_customer = df.groupby('customer')['product'].unique()
product_by_customer
| customer | |
| --- | --- |
| Jeff | [pear0] |
| Tom | [apple1, banana35] |
I want to get rid of the numbers after the product name. I tried
product_by_customer.str.replace('[0-9]', '')
but it replaced everything with NaN.
My desired output is

| customer | |
| -------- | -------- |
| Jeff | pear |
| Tom | apple, banana |
Any help is appreciated!
CodePudding user response:
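You can strip the digits first and then aggregate:
# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
product_by_customer=df["product"].str.replace('[0-9]', '', regex=True).groupby(df['customer']).unique()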
print (product_by_customer)
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
Or strip the digits while aggregating:
import re
f = lambda x: [re.sub("[0-9]","", v ) for v in x.unique()]
product_by_customer = df.groupby('customer')['product'].agg(f)
print (product_by_customer)
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
A similar idea is to strip the digits first and then remove any resulting duplicates with the dict.fromkeys trick, which keeps the first-seen order:
f = lambda x: list(dict.fromkeys(x.str.replace('[0-9]', '', regex=True)))
product_by_customer = df.groupby('customer')['product'].agg(f)
print (product_by_customer)
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
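The difference between the last two variants only shows up when two products collapse to the same name once the digits are gone. A minimal sketch, assuming hypothetical data in which Tom buys both apple1 and apple2 (the names unique_then_strip and strip_then_dedupe are made up for illustration):

```python
import re
import pandas as pd

# Hypothetical data: "apple1" and "apple2" both become "apple" after stripping.
df = pd.DataFrame({
    "customer": ["Tom", "Tom", "Tom", "Jeff"],
    "product": ["apple1", "apple2", "banana35", "pear0"],
})

grouped = df.groupby("customer")["product"]

# unique() runs before the digits are removed, so "apple" can appear twice.
unique_then_strip = grouped.agg(
    lambda x: [re.sub(r"[0-9]", "", v) for v in x.unique()]
)

# Digits are removed first; dict.fromkeys then drops duplicates in first-seen order.
strip_then_dedupe = grouped.agg(
    lambda x: list(dict.fromkeys(x.str.replace(r"[0-9]", "", regex=True)))
)

print(unique_then_strip["Tom"])
print(strip_then_dedupe["Tom"])
```

If the numeric suffixes never collide like this, both variants give the same result.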
CodePudding user response:
After groupby(...).unique(), each value in the resulting Series is a NumPy ndarray rather than a string, so .str.replace cannot process it and returns NaN. Try the following code.
import re
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
columns=['customer', 'product'])
df1 = df.groupby(["customer"])["product"].unique().reset_index()
# raw string avoids the invalid-escape warning for \d
df1["product"] = df1["product"].apply(lambda x: [re.sub(r"\d", "", v) for v in x])
df1
Out[52]:
customer product
0 Jeff [pear]
1 Tom [apple, banana]
The lambda function visits each array, loops over its values, and removes the digits from every string.
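To see why the original attempt produced NaN, it can help to inspect one element of the grouped result. A minimal sketch, assuming the same sample data as in the question (the names first, result and cleaned are only for illustration):

```python
import re
import pandas as pd

df = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
)

product_by_customer = df.groupby("customer")["product"].unique()

# Each element is an array of strings, not a single string.
first = product_by_customer.iloc[0]
print(type(first))

# The .str methods expect string elements; anything else comes back as NaN.
result = product_by_customer.str.replace("[0-9]", "", regex=True)
print(result.isna().all())

# Working on each array explicitly avoids the problem.
cleaned = product_by_customer.map(lambda arr: [re.sub(r"\d", "", v) for v in arr])
print(cleaned)
```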
CodePudding user response:
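import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['Tom', 'apple1'], ['Tom', 'banana35'], ['Jeff', 'pear0']]),
columns=['customer', 'product'])
df1 = df.copy()
# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
df1["product"] = df1["product"].str.replace('[0-9]', '', regex=True)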
product_by_customer = df1.groupby('customer')['product'].unique()
product_by_customer
out :
customer
Jeff [pear]
Tom [apple, banana]
Name: product, dtype: object
The idea: make a copy of df and strip the digits before the groupby, so that unique() already sees the cleaned names.
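The outputs above are lists, whereas the question's desired output shows comma-separated strings. If plain strings are wanted, one possible sketch, assuming the same sample data, joins the de-duplicated names per customer (cleaned is a name introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [["Tom", "apple1"], ["Tom", "banana35"], ["Jeff", "pear0"]],
    columns=["customer", "product"],
)

# Remove the digits before grouping.
cleaned = df["product"].str.replace(r"[0-9]", "", regex=True)

# dict.fromkeys drops duplicates in first-seen order; join builds one string per customer.
product_by_customer = cleaned.groupby(df["customer"]).agg(
    lambda s: ", ".join(dict.fromkeys(s))
)

print(product_by_customer)
```

Each value is then a single string such as "apple, banana" instead of a list.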