I have a dataframe named data as shown:
Date | Value | X1 | X2 | X3 |
---|---|---|---|---|
2019-05 | 15 | 23 | 65 | 98 |
2019-05 | 34 | 132 | 56 | 87 |
2019-06 | 23 | 66 | 90 | 44 |
The date column is in a datetime format of Year-Month starting from 2017-01 and the most recent 2022-05. I want to write a piece that will extract data into separate data frames. More specifically I want one data frame to contain the rows of the current month and year (2022-05), another dataframe to contain to data from the previous month (2022-04), and one more dataframe that contains data from 12 months ago (2021-05).
For my code I have the following:
import pandas as pd
from datetime import datetime as dt
data = pd.read_csv("results.csv")
current = data[data["Date"].dt.month == dt.now().month]
My results show the following:
Date | Value | X1 | X2 | X3 |
---|---|---|---|---|
2019-05 | 15 | 23 | 65 | 98 |
2019-05 | 34 | 132 | 56 | 87 |
2020-05 | 23 | 66 | 90 | 44 |
So I get the rows that match the current month but I need it to match the current year I assumed I could just add multiple conditions to match current month and current year but that did not seem to work for me. Also is there a way to write the code in such a way where I can extract the data from the previous month and the previous year based on what the current month-year is? My first thought was to just take the month and subtract 1 and do the same thing for the year and if the current year is in January I would just write an exception to subtract 1 from both the month and year for the previous month analysis.
CodePudding user response:
Split your DF into a dict of DFs and then access the one you want directly by the date (YYYY-MM).
index | Date | Value | X1 | X2 | X3 |
---|---|---|---|---|---|
0 | 2017-04 | 15 | 23 | 65 | 98 |
1 | 2019-05 | 34 | 132 | 56 | 87 |
2 | 2021-06 | 23 | 66 | 90 | 44 |
dfs = {x:df[df.Date == x ] for x in df.Date.unique()}
dfs['2017-04']
index | Date | Value | X1 | X2 | X3 |
---|---|---|---|---|---|
0 | 2017-04 | 15 | 23 | 65 | 98 |
CodePudding user response:
You can do this with a groupby
operation, which is a first-class kind of thing in tabular analysis (sql/pandas). In this case, you want to group by both year and month, creating dataframes:
dfs = []
for key, group_df in df.groupby([df.Date.dt.year, df.Date.dt.month]):
dfs.append(group_df)
dfs
will have the subgroups you want.
One thing: it's worth noting that there is a performance cost breaking dataframes into list items. Its just as likely that you could do whatever processing comes next directly in the groupby statement, such as df.groupby(...).X1.transform(sum)
for example.