Home > Mobile >  How Do I Find Out The Most Active Time Of The Day In Sales Data In Python?
How Do I Find Out The Most Active Time Of The Day In Sales Data In Python?

Time:06-15

I've the following sample dataset:

enter image description here

I need to find out the most active time of the day when sales are happening (by the hour or hour segments, any would do fine) from the column of 'Transaction Date'. I also need to find out the most active day (as in Monday, Tuesday, etc.), from the same column.

Now, my initial approach was to first rejig the Order_Details dataframe I'd created for the dataset to be more accomodating as:

Order_Details['Time'] = pd.to_datetime(Order_Details['Transaction Date'])
Order_Details['Hour'] = (Order_Details['Time']).dt.hour

Order_Details['Hour'].unique()

And then I decided to obtain a Series as:

timemost = Order_Details['Hour'].value_counts()

But it is here that I hit an obstacle. Whilst this timemost series does return the most frequent hour of the day by the count, it does so in a sorted order, and I need to have two separate lists -- one for the frequency of occurence and the other for the hour itself, where frequency is sorted by hour and not the other way around. And finally, I intend to plot it as line/bar chart, whatever may be applicable.

What should I do to improve this and get the desired result?

CodePudding user response:

First, it is more encouraging to help if the data is easier to be reproduced.

Order_Details dataframe:

Name                     Email                                     Product    Transaction Date
PERSON_1        [email protected]                                  PRODUCT_75 2021-03-01 00:47:00
PERSON_2 [email protected]                                  PRODUCT_75 2021-03-01 02:04:00
PERSON_3        [email protected]                                  PRODUCT_63 2021-03-01 09:10:00
PERSON_4        [email protected]                                  PRODUCT_63 2021-03-01 09:49:00
PERSON_5        [email protected] PRODUCT_34,PRODUCT_86,PRODUCT_57,PRODUCT_89 2021-03-01 10:56:00
PERSON_6        [email protected] PRODUCT_34,PRODUCT_66,PRODUCT_58,PRODUCT_83 2021-03-01 11:06:00
PERSON_7        [email protected]                                  PRODUCT_75 2021-03-01 12:02:00
PERSON_8        [email protected] PRODUCT_63,PRODUCT_90,PRODUCT_27,PRODUCT_5S 2021-03-01 12:31:00
PERSON_9      [email protected]                                  PRODUCT_26 2021-03-01 12:51:00
PERSON_10       [email protected]                                  PRODUCT_63 2021-03-01 12:52:00
PERSON_11       [email protected]                                  PRODUCT_63 2021-03-01 13:15:00
PERSON_12       [email protected]                        PRODUCT_5,PRODUCT_34 2021-03-01 13:37:00
PERSON_13       [email protected]                       PRODUCT_84,PRODUCT_27 2021-03-01 14:17:00
PERSON_14       [email protected]                                  PRODUCT_27 2021-03-01 14:27:00
PERSON_15       [email protected]                                  PRODUCT_25 2021-03-01 14:37:00
PERSON_16       [email protected]                                  PRODUCT_75 2021-03-01 14:56:00
PERSON_17       [email protected]                                  PRODUCT_84 2021-03-01 15:03:00
PERSON_18       [email protected]                                  PRODUCT_34 2021-03-01 15:08:00
PERSON 19       PERSON [email protected]                                  PRODUCT 85 2021-03-01 15:48:00

You can create the 'Hour' column straightforward:

Order_Details['Time'] = pd.to_datetime(Order_Details['Transaction Date']).dt.hour

I don't think that calling the 'unique' method will change anything

sorting by index:

timemost = Order_Details['Time'].value_counts().sort_index()

0     1
2     1
9     2
10    1
11    1
12    4
13    2
14    4
15    3

the line plot:

line = timemost.plot.line()

and finally, to show you have to import the 'maplotlib.pyplot' module:

import matplotlib.pyplot as plt

plt.show()

most frequent hour

  • Related