Timeseries Analysis: Group data by month so I can look at it per month-CodePudding

I was wondering how I could group data by month so I can look at the data on a per month basis. How would i do that?

for example, assign january for all data recorded in january in their own dataframe for analysis, etc.

here is my current dataframe:

   WC_Humidity[%]  WC_Htgsetp[C]  WC_Clgsetp[C]       Date   Time  
0       55.553640             18             26 2005-01-01  00:10  
1       54.204342             18             26 2005-01-01  00:20  
2       51.896272             18             26 2005-01-01  00:30  
3       49.007770             18             26 2005-01-01  00:40  
4       45.825810             18             26 2005-01-01  00:50

help is much appreciated.

CodePudding user response：

Try this:

df1['Date'].to_numpy().astype('datetime64[M]')

CodePudding user response：

You can convert your column with following code.

df1['Date'] = pd.to_datetime(df["Date"].dt.strftime('%d-%m-%Y'))

And you can refer official document for why dayfirst is not working. https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html

CodePudding user response：

If the column is in this format 2021-01-29, 30-12-2024 then before the above line should take care of it and parse it accordingly.

pd.to_datetime(df1['Date'])

Now you can use this code to convert the date column to the way you wanted.

df1['Date'] = df['Date1'].dt.strftime('%d/%m/%Y')

This should get you what you want.

CodePudding user response：

If you have strings like 2005-01-01 then you can get

df['year-month'] = df['Date'].str[:7]

and later you can use

df.groupby('year-month')

Minimal working code.

I changed dates to have different months in data.

I use io only to simulate file in memory.

text = '''WC_Humidity[%],WC_Htgsetp[C],WC_Clgsetp[C],Date,Time
55.553640,18,26,2005-01-01,00:10
54.204342,18,26,2005-01-01,00:20
51.896272,18,26,2005-02-01,00:30
49.007770,18,26,2005-02-01,00:40
45.825810,18,26,2005-03-01,00:50
'''

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))

df['year-month'] = df['Date'].str[:7]

print(df)

for value, group in df.groupby('year-month'):
    print()
    print('---', value, '---')
    print(group)
    print()
    print('average WC_Humidity[%]:', group['WC_Humidity[%]'].mean())

Result:

   WC_Humidity[%]  WC_Htgsetp[C]  WC_Clgsetp[C]       Date   Time year-month
0       55.553640             18             26 2005-01-01  00:10    2005-01
1       54.204342             18             26 2005-01-01  00:20    2005-01
2       51.896272             18             26 2005-02-01  00:30    2005-02
3       49.007770             18             26 2005-02-01  00:40    2005-02
4       45.825810             18             26 2005-03-01  00:50    2005-03

--- 2005-01 ---
   WC_Humidity[%]  WC_Htgsetp[C]  WC_Clgsetp[C]       Date   Time year-month
0       55.553640             18             26 2005-01-01  00:10    2005-01
1       54.204342             18             26 2005-01-01  00:20    2005-01

average WC_Humidity[%]: 54.878991

--- 2005-02 ---
   WC_Humidity[%]  WC_Htgsetp[C]  WC_Clgsetp[C]       Date   Time year-month
2       51.896272             18             26 2005-02-01  00:30    2005-02
3       49.007770             18             26 2005-02-01  00:40    2005-02

average WC_Humidity[%]: 50.452021

--- 2005-03 ---
   WC_Humidity[%]  WC_Htgsetp[C]  WC_Clgsetp[C]       Date   Time year-month
4        45.82581             18             26 2005-03-01  00:50    2005-03

average WC_Humidity[%]: 45.82581

If you have objects datetime then you can do

df['year-month'] = df['Date'].dt.strftime('%Y-%m')

and rest is the same

text = '''WC_Humidity[%],WC_Htgsetp[C],WC_Clgsetp[C],Date,Time
55.553640,18,26,2005-01-01,00:10
54.204342,18,26,2005-01-01,00:20
51.896272,18,26,2005-02-01,00:30
49.007770,18,26,2005-02-01,00:40
45.825810,18,26,2005-03-01,00:50
'''

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))

# create datetime objects
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

df['year-month'] = df['Date'].dt.strftime('%Y-%m')

print(df)

for value, group in df.groupby('year-month'):
    print()
    print('---', value, '---')
    print(group)
    print()
    print('average WC_Humidity[%]:', group['WC_Humidity[%]'].mean())