Pandas month sequece are not getting-CodePudding

my sql Table

     SDATETIME   FE014BPV    FE011BPV
0   2022-05-28   5.770000   13.735000
1   2022-05-30  16.469999   42.263000
2   2022-05-31  56.480000  133.871994
3   2022-06-01  49.779999  133.561996
4   2022-06-02  45.450001  132.679001
..         ...        ...         ...
93  2022-09-08   0.000000    0.050000
94  2022-09-09   0.000000    0.058000
95  2022-09-10   0.000000    0.051000
96  2022-09-11   0.000000    0.050000
97  2022-09-12   0.000000    0.038000

My code:

import pandas as pd
import pyodbc

monthSQL = pd.read_sql_query('SELECT SDATETIME,max(FE014BPV) as flow,max(FE011BPV) as steam  FROM [SCADA].[dbo].[TOTALIZER] GROUP BY SDATETIME ORDER BY SDATETIME ASC', conn)

monthdata = monthSQL.groupby(monthSQL['SDATETIME'].dt.strftime("%b-%Y"), sort=True).sum()
print(monthdata)

Produces this incorrect output

                  flow        steam
SDATETIME                          
Aug-2022   1800.970001  2580.276996
Jul-2022   1994.300014  2710.619986
Jun-2022   3682.329998  7633.660018
May-2022   1215.950003  3098.273025
Sep-2022      0.000000     1.705000

I want output some thing like below

SDATETIME         flow        steam
May-2022   1215.950003  3098.273025
Jun-2022   3682.329998  7633.660018
Jul-2022   1994.300014  2710.619986
Aug-2022   1800.970001  2580.276996
Sep-2022      0.000000     1.705000

Also, need a sum of last 12 month data

CodePudding user response：

You are sorting the date names in alphabetical order - you need to specify which column to sort. You can see that because it goes (the starting letters of the dates):

SDATETIME                          
Aug-2022 # A goes before J, M, S in the alphabet
Jul-2022 # J goes after A, but before M and S in the alphabet
Jun-2022 # J goes after A, but before M and S in the alphabet
May-2022 # M goes after A, J but before S in the alphabet
Sep-2022 # S goes after A, J, M in the alphabet

CodePudding user response：

The output is correct, just not in the order you expect. Try this:

# This keep the SDATETIME as datetime, not string
monthdata = monthSQL.groupby([pd.Grouper(key="SDATETIME", freq="MS")]).sum()

# Add rolling sum of the last 12 months
pd.concat(
    [
        monthdata,
        monthdata.add_suffix("_LAST12").rolling("366D").sum(),
    ],
    axis=1,
)

About the rolling(...) operation: it's easy to think that rolling(12) gives you the rolling sum of the last 12 months. Instead it returns the rolling sum of the last 12 rows. This is important, because if there are gaps in your data, 12 rows may cover more than 12 months. rolling("366D") makes sure that it only count rows within the last 366 days, which is the maximum length of any 12-month period.

We can't use rolling("12M") because months do not have fixed durations. There are between 28 to 31 days in a month.