Home > Back-end >  Segmenting a dataframe based on date with datetime column. Python
Segmenting a dataframe based on date with datetime column. Python

Time:05-27

I have a dataframe named data as shown:

Date Value X1 X2 X3
2019-05 15 23 65 98
2019-05 34 132 56 87
2019-06 23 66 90 44

The date column is in a datetime format of Year-Month starting from 2017-01 and the most recent 2022-05. I want to write a piece that will extract data into separate data frames. More specifically I want one data frame to contain the rows of the current month and year (2022-05), another dataframe to contain to data from the previous month (2022-04), and one more dataframe that contains data from 12 months ago (2021-05).

For my code I have the following:

import pandas as pd
from datetime import datetime as dt

data = pd.read_csv("results.csv")
current = data[data["Date"].dt.month == dt.now().month]

My results show the following:

Date Value X1 X2 X3
2019-05 15 23 65 98
2019-05 34 132 56 87
2020-05 23 66 90 44

So I get the rows that match the current month but I need it to match the current year I assumed I could just add multiple conditions to match current month and current year but that did not seem to work for me. Also is there a way to write the code in such a way where I can extract the data from the previous month and the previous year based on what the current month-year is? My first thought was to just take the month and subtract 1 and do the same thing for the year and if the current year is in January I would just write an exception to subtract 1 from both the month and year for the previous month analysis.

CodePudding user response:

Split your DF into a dict of DFs and then access the one you want directly by the date (YYYY-MM).

index Date Value X1 X2 X3
0 2017-04 15 23 65 98
1 2019-05 34 132 56 87
2 2021-06 23 66 90 44
dfs = {x:df[df.Date == x ] for x in df.Date.unique()}
dfs['2017-04']
index Date Value X1 X2 X3
0 2017-04 15 23 65 98

CodePudding user response:

You can do this with a groupby operation, which is a first-class kind of thing in tabular analysis (sql/pandas). In this case, you want to group by both year and month, creating dataframes:

dfs = []
for key, group_df in df.groupby([df.Date.dt.year, df.Date.dt.month]):
    dfs.append(group_df)

dfs will have the subgroups you want.

One thing: it's worth noting that there is a performance cost breaking dataframes into list items. Its just as likely that you could do whatever processing comes next directly in the groupby statement, such as df.groupby(...).X1.transform(sum) for example.

  • Related