Value error where a 1D array was expected but got an array with shape (18632, 3)

Time:07-03

I am currently learning segmentation and was using this project.

Here is the cell that is causing the problem:

# sum of purchases / user & order
temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)['TotalPrice'].sum()
basket_price = temp.rename(columns = {'TotalPrice':'Basket Price'})

# percentage of the price of the order / product category
for i in range(5):
    col = 'categ_{}'.format(i)
    temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)[col].sum()
    basket_price.loc[:, col] = temp

# date of the order

df_cleaned['InvoiceDate_int'] = df_cleaned['InvoiceDate'].astype('int64')
temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)['InvoiceDate_int'].mean()
df_cleaned.drop('InvoiceDate_int', axis = 1, inplace = True)
basket_price.loc[:, 'InvoiceDate'] = pd.to_datetime(temp['InvoiceDate_int'])

# selection of significant entries:
basket_price = basket_price[basket_price['Basket Price'] > 0]
basket_price.sort_values('CustomerID', ascending = True)[:5]

This confuses me because I am not sure which array is causing the problem, and Google has not helped. I am completely new to this, so any and all help is appreciated.

CodePudding user response:

I think the problem is here:

for i in range(5):
    col = 'categ_{}'.format(i)
    temp = df_cleaned.groupby(by=['CustomerID', 'InvoiceNo'], as_index=False)[col].sum()
    basket_price.loc[:, col] = temp

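Here temp is a three-column DataFrame (CustomerID, InvoiceNo, and the summed column), because as_index=False keeps the grouping keys as columns. So in this line

basket_price.loc[:, col] = temp

you are assigning a three-column DataFrame to a single column, which does not make sense. The three columns match the shape (18632, 3) in the error message.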
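One possible fix is to assign only the summed column from temp instead of the whole DataFrame. The sketch below uses a small made-up df_cleaned (the values and the reduced number of categ_ columns are assumptions for illustration, not the real dataset), and it relies on both groupby calls using the same keys on the same frame, so the rows come back in the same order.

```python
import pandas as pd

# Hypothetical toy data standing in for df_cleaned; the values are made up.
df_cleaned = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2],
    'InvoiceNo': ['A', 'A', 'B', 'C'],
    'TotalPrice': [10.0, 5.0, 7.0, 3.0],
    'categ_0': [10.0, 0.0, 7.0, 0.0],
    'categ_1': [0.0, 5.0, 0.0, 3.0],
})

keys = ['CustomerID', 'InvoiceNo']

# sum of purchases / user & order
temp = df_cleaned.groupby(by=keys, as_index=False)['TotalPrice'].sum()
basket_price = temp.rename(columns={'TotalPrice': 'Basket Price'})

# Only two category columns exist in this toy frame (the question has five).
for i in range(2):
    col = 'categ_{}'.format(i)
    temp = df_cleaned.groupby(by=keys, as_index=False)[col].sum()
    # temp has three columns (CustomerID, InvoiceNo, col);
    # pick the single summed column before assigning it.
    basket_price[col] = temp[col]

print(temp.shape)     # number of (customer, invoice) groups by 3 columns
print(basket_price)
```

Printing temp.shape before the assignment is a quick way to see which object has three columns when an error like this appears.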
