Home > OS >  Dividing a dataframe into several dataframes according to date column
Dividing a dataframe into several dataframes according to date column

Time:09-26

I have a dataframe which contains a specific column for the date which is called 'testdate'. And I have a period between two specific date, such as 20110501~20120731.

From a dataframe, I want to divide that dataframe into multiple dataframes according to the year-month of 'testdate'.

For example, if 'testdate' is between 20110501-20110531 then df1, if 'testdate' is between next month, then f2, ... and so on.

For example, a whole dataframe looks like this...
| StudentID | Testdate | Record |
| --------  | -------- | ------ |
| 1         | 20110528 |    50  |
| 2         | 20110601 |    75  |
| 3         | 20110504 |   100  |
| 4         | 20110719 |    82  |
| 5         | 20111120 |    42  |
| 6         | 20111103 |    95  |
| 7         | 20120520 |    42  |
| 8         | 20120503 |    95  |
But, I want to divide it like this...
[DF1]: name should be 201105
| StudentID | Testdate | Record |
| --------  | -------- | ------ |
| 1         | 20110528 |    50  |
| 3         | 20110504 |   100  |
[DF2]: name should be 201106
| StudentID | Testdate | Record |
| --------  | -------- | ------ |
| 2         | 20110601 |    75  |
[DF3]
| StudentID | Testdate | Record |
| --------  | -------- | ------ |
| 4         | 20110719 |    82  |
[DF4]
| StudentID | Testdate | Record |
| --------  | -------- | ------ |
| 5         | 20111120 |    42  |
| 6         | 20111103 |    95  |
[DF5]
| StudentID | Testdate | Record |
| --------  | -------- | ------ |
| 7         | 20120520 |    42  |
| 8         | 20120503 |    95  |

I found some codes for dividing a dataframe according to the quarter, but I could find any codes for my task.

How can I deal with it ? Many thanks to your help.

CodePudding user response:

Create a grouper by slicing yyyymm from testdate then group the dataframe and store each group inside a dict comprehension

s = df['Testdate'].astype(str).str[:6]
dfs = {f'df_{k}': g for k, g in df.groupby(s)}

# dfs['df_201105']
   StudentID  Testdate  Record
0          1  20110528      50
2          3  20110504     100

# dfs['df_201106']
   StudentID  Testdate  Record
1          2  20110601      75
  • Related