Home > Back-end >  Using Pandas and datetime to see what hours don't have sales
Using Pandas and datetime to see what hours don't have sales

Time:09-22

So I have sales data that I'm trying to analyze. I have datetime data ["Order Date Time"] and I'd like to see the most common hours for sales but more importantly I'd like to see what minutes have NO sales.

I have been spinning my wheels for a while and I can't get my brain around a solution. Any help is greatly appreciated.

CodePudding user response:

say you have sales at the daily grain. first, load your sales. 2nd, load every calendar day you are interested in studying (lets take a few days in january). You need to left join based on your "calendar" into your sales data, then fill your na values with 0 to represent a lack of sales that day.

import pandas as pd
salesdata=[
    ['2020-01-01',    0.25],
    ['2020-01-02',    1.40],
    ['2020-01-04',    2.34],
    ['2020-01-07',    1.54],
    ['2020-01-08',    1.47]
]
cols = ['date','sales']
nosales=[
    ['2020-01-01',    0],
    ['2020-01-02',    0],
    ['2020-01-03',    0],
    ['2020-01-04',    0],
    ['2020-01-05',    0],
    ['2020-01-06',    0],
    ['2020-01-07',    0],
    ['2020-01-08',    0],
    ['2020-01-09',    0],
    ['2020-01-10',    0]
]
df1 = pd.DataFrame(salesdata, columns=cols)
df2 = pd.DataFrame(nosales, columns=cols)
df3=pd.merge(df2,df1,on='date',how='left')
df3['sales_y']=df3['sales_y'].fillna(0)
print(df3)  


         date  sales_x  sales_y
0  2020-01-01        0     0.25
1  2020-01-02        0     1.40
2  2020-01-03        0     0.00
3  2020-01-04        0     2.34
4  2020-01-05        0     0.00
5  2020-01-06        0     0.00
6  2020-01-07        0     1.54
7  2020-01-08        0     1.47
8  2020-01-09        0     0.00
9  2020-01-10        0     0.00

CodePudding user response:

Say you have a DataFrame with random sale data

df = pd.DataFrame(data=zip(pd.date_range(start='2021-01-01 00:00:00', periods=100, freq='1min'), np.random.randint(1, 100, 100)), columns=['SaleDate', 'Qty']).sample(20)

Step 1 - generate another df with the time frame you need. this will go from min.SalesDate to max.SaleDate

dft = pd.DataFrame(pd.date_range(start=df.SaleDate.min(), end=df.SaleDate.max(), freq='1min'), columns=['SaleDate'])

Step 2 - left join df to dft.

dfm = dft.merge(df, on='SaleDate', how='left')

Step 3 - Now you can fillna with 0.

dfm = dfm.fillna(0)

and to get the rows where you have no sales:

no_sales = dfm.loc[dfm.Qty == 0]
  • Related