I am currently learning segmentation and was using this project.
Here is the cell that is causing the problem:
# sum of purchases / user & order
temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)['TotalPrice'].sum()
basket_price = temp.rename(columns = {'TotalPrice':'Basket Price'})
# percentage of the price of the order / product category
for i in range(5):
    col = 'categ_{}'.format(i)
    temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)[col].sum()
    basket_price.loc[:, col] = temp
# date of the order
df_cleaned['InvoiceDate_int'] = df_cleaned['InvoiceDate'].astype('int64')
temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)['InvoiceDate_int'].mean()
df_cleaned.drop('InvoiceDate_int', axis = 1, inplace = True)
basket_price.loc[:, 'InvoiceDate'] = pd.to_datetime(temp['InvoiceDate_int'])
# selection of significant entries:
basket_price = basket_price[basket_price['Basket Price'] > 0]
basket_price.sort_values('CustomerID', ascending = True)[:5]
This confuses me because I am not sure which array is causing the problem, and Google has not helped. I am completely new to this, so any and all help is appreciated.
CodePudding user response:
I think the problem is here:
for i in range(5):
    col = 'categ_{}'.format(i)
    temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)[col].sum()
    basket_price.loc[:, col] = temp
Here temp is a three-column DataFrame (CustomerID, InvoiceNo, and the categ column, because as_index=False keeps the group keys as columns), and here
basket_price.loc[:, col] = temp
you are assigning a three-column DataFrame to a single column, which doesn't make sense. Assign only the aggregated column, temp[col], instead.
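To make that concrete, here is a minimal sketch of the loop with the assignment changed to pick out one column. The toy df_cleaned below is made up purely for illustration (the real data from the question is not shown), and it only has two categ columns instead of five.

```python
import pandas as pd

# Hypothetical stand-in for df_cleaned; the values are invented for this example.
df_cleaned = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "InvoiceNo": ["A", "A", "B", "C"],
    "TotalPrice": [10.0, 5.0, 7.0, 3.0],
    "categ_0": [10.0, 0.0, 7.0, 0.0],
    "categ_1": [0.0, 5.0, 0.0, 3.0],
})

keys = ["CustomerID", "InvoiceNo"]

# Sum of purchases per user and order, as in the question.
temp = df_cleaned.groupby(by=keys, as_index=False)["TotalPrice"].sum()
basket_price = temp.rename(columns={"TotalPrice": "Basket Price"})

for i in range(2):  # the question uses range(5)
    col = "categ_{}".format(i)
    # temp has three columns here: CustomerID, InvoiceNo and col.
    temp = df_cleaned.groupby(by=keys, as_index=False)[col].sum()
    # Assign only the aggregated column (a Series), not the whole DataFrame.
    basket_price[col] = temp[col]

print(basket_price)
```

This relies on both groupby results having the same row order and the same default integer index, which holds here because they use the same keys on the same DataFrame.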