Home > OS >  Splitting a pandas dataframe by Dates
Splitting a pandas dataframe by Dates

Time:03-09

I would like to create a pandas datasheet that gets the dictionary a below and adds days_split amount of days from the initial date and creates a table. So for the dictionary below since the first date value is 2/4/2022 1:33:40 PM I would like to add another 10 days into it which would make the range of the first table to be between 2/4/2022 1:33:40 PM and 2/14/2022 1:33:40 PM. Since the last date value in the dictionary is 2/16/2022 9:43:30 AM it will add another days_split amount of days to 2/14/2022 1:33:40 PM so it will create the second tables ranges from 2/14/2022 1:33:40 PM and 2/24/2022 1:33:40 PM. All of the dictionary values could be housed within the 2 tables so there are no more tables that needs to be create. How would I be able to add this function to the code below and achieve the expected Output below?

import pandas as pd 
from datetime import timedelta

a = {'Date': {0: '2/4/2022 1:33:40 PM', 1: '2/7/2022 3:09:46 PM', 2: '2/11/2022 9:35:44 AM', 3: '2/14/2022 2:55:33 PM',
 4: '2/14/2022 2:57:06 PM', 5: '2/14/2022 2:58:12 PM', 6: '2/16/2022 9:30:44 AM', 7: '2/16/2022 9:32:56 AM', 8: '2/16/2022 9:32:59 AM',
 9: '2/16/2022 9:43:30 AM'}, 
'TransactionType': {0: 'Buy       ', 1: 'Buy       ', 2: 'Buy       ', 3: 'Sell      ', 4: 'Sell      ',
 5: 'Sell      ', 6: 'Buy       ', 7: 'Buy       ', 8: 'Buy       ', 9: 'Buy       '}, 
'Symbol': {0: 'META', 1: 'BABA', 2: 'GOOS', 3: 'BABA', 4: 'GOOS', 5: 'META', 6: 'DIS', 7: 'MAR', 
8: 'ABNB', 9: 'EXPE'}, 
'QTY': {0: 185, 1: 21, 2: 75, 3: -21, 4: -75, 5: -185, 6: 10, 7: 9, 8: 10, 9: 9}, 
'Price': {0: 12.79, 1: 116.16, 2: 28.8, 3: 121.82, 4: 28.06, 5: 12.44, 6: 154.55, 
7: 178.75, 8: 179.61, 9: 211.97}, 
'Amount': {0: -2366.15, 1: -2439.36, 2: -2160.0, 3: 2558.22, 4: 2104.5, 5: 2301.4, 6: -1545.5, 7: -1608.75,
 8: -1796.1, 9: -1907.73}}

days_split = 10
df = pd.DataFrame(a)

Expected Output:

Table 1 between 2/4/2022 1:33:40 PM and 2/14/2022 1:33:40 PM

Table 2 between 2/14/2022 1:33:40 PM and 2/24/2022 1:33:40 PM

enter image description here

CodePudding user response:

Use pandas.Grouper

df.Date = pd.to_datetime(df.Date)
df = df.set_index("Date")
groups = df.groupby(pd.Grouper(freq="10D"))
for x in groups:
    print(x[1].reset_index())

CodePudding user response:

I would use pd.Grouper, append each dataframe to a list:

l_dfs = []
df['Date'] = pd.to_datetime(df['Date'])
for _, g in df.groupby(pd.Grouper(key='Date', 
                                  freq='10D', 
                                  offset=df['Date'].min().strftime('%H:%M:%S'))):
    l_dfs.append(g)
    
l_dfs

Ouput:

[                 Date TransactionType Symbol  QTY   Price   Amount
 0 2022-02-04 13:33:40      Buy          META  185   12.79 -2366.15
 1 2022-02-07 15:09:46      Buy          BABA   21  116.16 -2439.36
 2 2022-02-11 09:35:44      Buy          GOOS   75   28.80 -2160.00,
                  Date TransactionType Symbol  QTY   Price   Amount
 3 2022-02-14 14:55:33      Sell         BABA  -21  121.82  2558.22
 4 2022-02-14 14:57:06      Sell         GOOS  -75   28.06  2104.50
 5 2022-02-14 14:58:12      Sell         META -185   12.44  2301.40
 6 2022-02-16 09:30:44      Buy           DIS   10  154.55 -1545.50
 7 2022-02-16 09:32:56      Buy           MAR    9  178.75 -1608.75
 8 2022-02-16 09:32:59      Buy          ABNB   10  179.61 -1796.10
 9 2022-02-16 09:43:30      Buy          EXPE    9  211.97 -1907.73]

  • Related