Why am I getting the "unsupported operand type(s) for -: 'str' and 'str'" error?

Time:06-16

I am doing business customer segmentation. When I run my code, I get the error unsupported operand type(s) for -: 'str' and 'str'. The error is raised in this line of code:

# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: str(snapshot_date - x.max()).days ,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})

Here is my entire program:

# Import The Libraries
# ! pip install xlrd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import The Dataset
df = pd.read_csv('path/data.csv',encoding='latin1')
df = df[df['CustomerID'].notna()]

# Sample the dataset
df_fix = df.sample(10000, random_state = 42)
# Create TotalSum column
df_fix["TotalSum"] = df_fix["Quantity"] * df_fix["UnitPrice"]

# Convert to show date only
from datetime import datetime
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True).dt.strftime('%Y-%m-%d')

# Create date variable that records recency
import datetime
snapshot_date = max(df_fix.InvoiceDate) + str(datetime.timedelta(days=1))

# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days ,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})

Please assist me

CodePudding user response:

You should keep the datetime type for the calculation instead of converting InvoiceDate to a string with strftime:

df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True)

# Create date variable that records recency
snapshot_date = max(df_fix.InvoiceDate) + pd.Timedelta(days=1)

# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days ,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})
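
To see the approach above end to end, here is a minimal sketch on a tiny made-up DataFrame. The rows and values are hypothetical stand-ins for the real invoice file, and using .dt.normalize() to drop the time of day is my own assumption about what "show date only" was meant to achieve; unlike strftime, it keeps the datetime dtype.

```python
import pandas as pd

# Hypothetical toy data standing in for the real invoice file
df_fix = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": [
        "2011-12-01 08:26:00",
        "2011-12-05 10:00:00",
        "2011-12-09 12:50:00",
    ],
    "TotalSum": [10.0, 20.0, 5.0],
})

# Parse to datetime and reset the time of day to midnight.
# The column stays a datetime column, so subtraction still works.
df_fix["InvoiceDate"] = pd.to_datetime(
    df_fix["InvoiceDate"], errors="coerce", utc=True
).dt.normalize()

# One day after the most recent invoice
snapshot_date = df_fix["InvoiceDate"].max() + pd.Timedelta(days=1)

# Aggregate data by each customer
customers = df_fix.groupby("CustomerID").agg({
    "InvoiceDate": lambda x: (snapshot_date - x.max()).days,
    "InvoiceNo": "count",
    "TotalSum": "sum",
})
print(customers)
```

With this toy data, customer 1 gets a recency of 5 days and customer 2 a recency of 1 day. If you need the date as text for display, format it only at the end, after all date arithmetic is done.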

CodePudding user response:

Your snapshot_date is no longer a datetime object after you converted it into a string with the following line:

snapshot_date = max(df_fix.InvoiceDate) + str(datetime.timedelta(days=1))

You can check the value of your snapshot_date with print(snapshot_date) to figure out how to convert it back to a datetime object.
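
As a rough illustration of what is going on, the sketch below reproduces the problem with plain strings. The date value is hypothetical, and latest_invoice, fixed_snapshot and recency_days are names I made up for the example. Adding two strings concatenates them, and subtracting two strings raises the TypeError from the question.

```python
import datetime
import pandas as pd

# Hypothetical value: after .dt.strftime('%Y-%m-%d') the dates are plain strings
latest_invoice = "2011-12-09"

# str + str concatenates, so this is not a date at all
snapshot_date = latest_invoice + str(datetime.timedelta(days=1))
print(type(snapshot_date).__name__, repr(snapshot_date))

# Subtracting two strings raises the error from the question
try:
    snapshot_date - latest_invoice
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Parsing the date string again restores datetime arithmetic
fixed_snapshot = pd.to_datetime(latest_invoice) + pd.Timedelta(days=1)
recency_days = (fixed_snapshot - pd.to_datetime(latest_invoice)).days
print(recency_days)
```

Here snapshot_date ends up as the string '2011-12-091 day, 0:00:00', which is why it is easier to avoid the string conversion in the first place than to parse it back.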
