So I have sales data that I'm trying to analyze. I have datetime data ["Order Date Time"] and I'd like to see the most common hours for sales but more importantly I'd like to see what minutes have NO sales.
I have been spinning my wheels for a while and I can't get my brain around a solution. Any help is greatly appreciated.
CodePudding user response:
say you have sales at the daily grain. first, load your sales. 2nd, load every calendar day you are interested in studying (lets take a few days in january). You need to left join based on your "calendar" into your sales data, then fill your na values with 0 to represent a lack of sales that day.
import pandas as pd
salesdata=[
['2020-01-01', 0.25],
['2020-01-02', 1.40],
['2020-01-04', 2.34],
['2020-01-07', 1.54],
['2020-01-08', 1.47]
]
cols = ['date','sales']
nosales=[
['2020-01-01', 0],
['2020-01-02', 0],
['2020-01-03', 0],
['2020-01-04', 0],
['2020-01-05', 0],
['2020-01-06', 0],
['2020-01-07', 0],
['2020-01-08', 0],
['2020-01-09', 0],
['2020-01-10', 0]
]
df1 = pd.DataFrame(salesdata, columns=cols)
df2 = pd.DataFrame(nosales, columns=cols)
df3=pd.merge(df2,df1,on='date',how='left')
df3['sales_y']=df3['sales_y'].fillna(0)
print(df3)
date sales_x sales_y
0 2020-01-01 0 0.25
1 2020-01-02 0 1.40
2 2020-01-03 0 0.00
3 2020-01-04 0 2.34
4 2020-01-05 0 0.00
5 2020-01-06 0 0.00
6 2020-01-07 0 1.54
7 2020-01-08 0 1.47
8 2020-01-09 0 0.00
9 2020-01-10 0 0.00
CodePudding user response:
Say you have a DataFrame with random sale data
df = pd.DataFrame(data=zip(pd.date_range(start='2021-01-01 00:00:00', periods=100, freq='1min'), np.random.randint(1, 100, 100)), columns=['SaleDate', 'Qty']).sample(20)
Step 1 - generate another df with the time frame you need. this will go from min.SalesDate to max.SaleDate
dft = pd.DataFrame(pd.date_range(start=df.SaleDate.min(), end=df.SaleDate.max(), freq='1min'), columns=['SaleDate'])
Step 2 - left join df to dft.
dfm = dft.merge(df, on='SaleDate', how='left')
Step 3 - Now you can fillna with 0.
dfm = dfm.fillna(0)
and to get the rows where you have no sales:
no_sales = dfm.loc[dfm.Qty == 0]