Segmenting a dataframe based on date with datetime column. Python-CodePudding

I have a dataframe named data as shown:

Date	Value	X1	X2	X3
2019-05	15	23	65	98
2019-05	34	132	56	87
2019-06	23	66	90	44

The date column is in a datetime format of Year-Month starting from 2017-01 and the most recent 2022-05. I want to write a piece that will extract data into separate data frames. More specifically I want one data frame to contain the rows of the current month and year (2022-05), another dataframe to contain to data from the previous month (2022-04), and one more dataframe that contains data from 12 months ago (2021-05).

For my code I have the following:

import pandas as pd
from datetime import datetime as dt

data = pd.read_csv("results.csv")
current = data[data["Date"].dt.month == dt.now().month]

My results show the following:

Date	Value	X1	X2	X3
2019-05	15	23	65	98
2019-05	34	132	56	87
2020-05	23	66	90	44

So I get the rows that match the current month but I need it to match the current year I assumed I could just add multiple conditions to match current month and current year but that did not seem to work for me. Also is there a way to write the code in such a way where I can extract the data from the previous month and the previous year based on what the current month-year is? My first thought was to just take the month and subtract 1 and do the same thing for the year and if the current year is in January I would just write an exception to subtract 1 from both the month and year for the previous month analysis.

CodePudding user response：

Split your DF into a dict of DFs and then access the one you want directly by the date (YYYY-MM).

index	Date	Value	X1	X2	X3
0	2017-04	15	23	65	98
1	2019-05	34	132	56	87
2	2021-06	23	66	90	44

dfs = {x:df[df.Date == x ] for x in df.Date.unique()}
dfs['2017-04']

index	Date	Value	X1	X2	X3
0	2017-04	15	23	65	98

CodePudding user response：

You can do this with a groupby operation, which is a first-class kind of thing in tabular analysis (sql/pandas). In this case, you want to group by both year and month, creating dataframes:

dfs = []
for key, group_df in df.groupby([df.Date.dt.year, df.Date.dt.month]):
    dfs.append(group_df)

dfs will have the subgroups you want.

One thing: it's worth noting that there is a performance cost breaking dataframes into list items. Its just as likely that you could do whatever processing comes next directly in the groupby statement, such as df.groupby(...).X1.transform(sum) for example.