my sql Table
SDATETIME FE014BPV FE011BPV
0 2022-05-28 5.770000 13.735000
1 2022-05-30 16.469999 42.263000
2 2022-05-31 56.480000 133.871994
3 2022-06-01 49.779999 133.561996
4 2022-06-02 45.450001 132.679001
.. ... ... ...
93 2022-09-08 0.000000 0.050000
94 2022-09-09 0.000000 0.058000
95 2022-09-10 0.000000 0.051000
96 2022-09-11 0.000000 0.050000
97 2022-09-12 0.000000 0.038000
My code:
import pandas as pd
import pyodbc
monthSQL = pd.read_sql_query('SELECT SDATETIME,max(FE014BPV) as flow,max(FE011BPV) as steam FROM [SCADA].[dbo].[TOTALIZER] GROUP BY SDATETIME ORDER BY SDATETIME ASC', conn)
monthdata = monthSQL.groupby(monthSQL['SDATETIME'].dt.strftime("%b-%Y"), sort=True).sum()
print(monthdata)
Produces this incorrect output
flow steam
SDATETIME
Aug-2022 1800.970001 2580.276996
Jul-2022 1994.300014 2710.619986
Jun-2022 3682.329998 7633.660018
May-2022 1215.950003 3098.273025
Sep-2022 0.000000 1.705000
I want output some thing like below
SDATETIME flow steam
May-2022 1215.950003 3098.273025
Jun-2022 3682.329998 7633.660018
Jul-2022 1994.300014 2710.619986
Aug-2022 1800.970001 2580.276996
Sep-2022 0.000000 1.705000
Also, need a sum of last 12 month data
CodePudding user response:
You are sorting the date names in alphabetical order - you need to specify which column to sort. You can see that because it goes (the starting letters of the dates):
SDATETIME
Aug-2022 # A goes before J, M, S in the alphabet
Jul-2022 # J goes after A, but before M and S in the alphabet
Jun-2022 # J goes after A, but before M and S in the alphabet
May-2022 # M goes after A, J but before S in the alphabet
Sep-2022 # S goes after A, J, M in the alphabet
CodePudding user response:
The output is correct, just not in the order you expect. Try this:
# This keep the SDATETIME as datetime, not string
monthdata = monthSQL.groupby([pd.Grouper(key="SDATETIME", freq="MS")]).sum()
# Add rolling sum of the last 12 months
pd.concat(
[
monthdata,
monthdata.add_suffix("_LAST12").rolling("366D").sum(),
],
axis=1,
)
About the rolling(...)
operation: it's easy to think that rolling(12)
gives you the rolling sum of the last 12 months. Instead it returns the rolling sum of the last 12 rows. This is important, because if there are gaps in your data, 12 rows may cover more than 12 months. rolling("366D")
makes sure that it only count rows within the last 366 days, which is the maximum length of any 12-month period.
We can't use rolling("12M")
because months do not have fixed durations. There are between 28 to 31 days in a month.