I am doing business customer segmentation, but when I run my code I get the error: unsupported operand type(s) for -: 'str' and 'str'. The error points to this line of code:
# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: str(snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})
Here is my entire program:
# Import The Libraries
# ! pip install xlrd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Import The Dataset
df = pd.read_csv('path/data.csv',encoding='latin1')
df = df[df['CustomerID'].notna()]
# Sample the dataset
df_fix = df.sample(10000, random_state=42)
# Create TotalSum column
df_fix["TotalSum"] = df_fix["Quantity"] * df_fix["UnitPrice"]
# Convert to show date only
from datetime import datetime
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True).dt.strftime('%Y-%m-%d')
# Create date variable that records recency
import datetime
snapshot_date = max(df_fix.InvoiceDate) + str(datetime.timedelta(days=1))
# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})
Please assist me
CodePudding user response:
Keep the InvoiceDate column as a datetime type when doing the calculation, instead of converting it to a string:
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True)
# Create date variable that records recency
snapshot_date = max(df_fix.InvoiceDate) + pd.Timedelta(days=1)
# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})
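For anyone who wants to try this without the original file, here is a minimal sketch of the same approach on a tiny made-up DataFrame. The three rows and their values are invented for illustration only; the column names are taken from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the real invoice file
df_fix = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": ["2011-12-01 10:00", "2011-12-05 09:30", "2011-12-08 14:15"],
    "TotalSum": [10.0, 20.0, 5.0],
})

# Parse to datetime and keep it that way (no .dt.strftime afterwards)
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors="coerce", utc=True)

# One day after the latest invoice; Timestamp + Timedelta stays a Timestamp
snapshot_date = df_fix["InvoiceDate"].max() + pd.Timedelta(days=1)

# Recency in days, number of invoices, and total spend per customer
customers = df_fix.groupby("CustomerID").agg({
    "InvoiceDate": lambda x: (snapshot_date - x.max()).days,
    "InvoiceNo": "count",
    "TotalSum": "sum",
})
print(customers)
```

If you only want the date part, something like df_fix["InvoiceDate"].dt.normalize() sets the time to midnight while keeping the datetime type, so the subtraction still works.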
CodePudding user response:
Your snapshot_date is no longer a datetime object after you converted it into a string with the following line:
snapshot_date = max(df_fix.InvoiceDate) + str(datetime.timedelta(days=1))
You can check the value of snapshot_date with print(snapshot_date) to figure out how to convert it back to a datetime object.
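As a rough illustration of that check, the sketch below reproduces the error on two made-up date strings (the kind of values .dt.strftime returns) and then shows one possible way back to datetime using pd.to_datetime. The sample values and the names dates, parsed and error_message are assumptions for the example, not taken from the question's data.

```python
import pandas as pd

# Strings like the ones .dt.strftime('%Y-%m-%d') produces (made-up values)
dates = pd.Series(["2011-12-05", "2011-12-08"])

# The maximum of a string column is itself a plain string
print(type(dates.max()))

# Subtracting two strings raises the TypeError from the question
error_message = ""
try:
    dates.max() - dates.min()
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Parsing the strings again gives Timestamps, which support date arithmetic
parsed = pd.to_datetime(dates)
snapshot_date = parsed.max() + pd.Timedelta(days=1)
print((snapshot_date - parsed.min()).days)
```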