Home > Back-end >  how to trim dataframe according to datetime?
how to trim dataframe according to datetime?

Time:11-01

I have multiple dataframes with column = ['Date', 'HE', 'Prices']. These dataframes have a different start date and but common end date. I want to trim my dataframe acc to max start date among all dataframe.

import pandas as pd
import numpy as np
dateRange1 = pd.date_range('01/01/2020 04:00', '12/31/2020 23:00', freq='1H')
dateRange2 = pd.date_range('01/09/2020 08:00', '12/31/2020 23:00', freq='1H')
dateRange3 = pd.date_range('01/15/2020 11:00', '12/31/2020 23:00', freq='1H')

dic1 = {
    'df1': {
        'Date': dateRange1.date,
        'HE': dateRange1.hour,
        'Data': np.random.randint(1,100,len(dateRange1))
    },
'df2': {
        'Date': dateRange2.date,
        'HE': dateRange2.hour,
        'Data': np.random.randint(1,100,len(dateRange2))
    },
'df3': {
        'Date': dateRange3.date,
        'HE': dateRange3.hour,
        'Data': np.random.randint(1,100,len(dateRange3))
    }
}

dfList = {}
for i in dic1.keys():
    dfList[i] = pd.DataFrame(dic1[i])
startDate = max([dic1[i]['Date'][0] for i in dic1.keys()]) #this will give us the max 
                                                           start date which is '01/15/2020 11:00'

finalDateRange = pd.date_range(startDate   ' 00:00', '31/12/2020 23:00', freq='1H')
newDFList = {}
for i in dfList.keys():
    newDF = dfList[i].iloc[len(finalDateRange):, :] #Here I am trying to trim that dataframe 
                                                     but the results are not as expected
    print(newDF)

Output:

Date  HE  Data
8424  2020-12-17   0    36
8425  2020-12-17   1    81
...          ...  ..   ...

8782  2020-12-31  22    66
8783  2020-12-31  23    87

[360 rows x 3 columns]
            Date  HE  Data
8424  2020-12-27   0    30
8425  2020-12-27   1    84
...          ...  ..   ...

8542  2020-12-31  22    17
8543  2020-12-31  23    21

[120 rows x 3 columns]
            Date  HE  Data
8424  2020-12-31   0    76
8425  2020-12-31   1    38
...        ...    ..    ..
8446  2020-12-31  22    75
8447  2020-12-31  23    29

The expected output should look like this, new start date should be the next day of the max start date ('01/15/2020 11:00'):

Date  HE  Data
0     2020-01-16   0    61
1     2020-01-16   1    47
...          ...  ..   ...

8422  2020-12-31  22    22
8423  2020-12-31  23    80

[8424 rows x 3 columns]
            Date  HE  Data
0     2020-01-16   0    43
1     2020-01-16   1    22

...          ...  ..   ...

8422  2020-12-31  22    71
8423  2020-12-31  23    63

[8424 rows x 3 columns]
            Date  HE  Data
0     2020-01-16   0    92
1     2020-01-16   1    46

...          ...  ..   ...

8422  2020-12-31  22    77
8423  2020-12-31  23    46

[8424 rows x 3 columns]

CodePudding user response:

I think you want:

newDFList = {}
# use `items()` iterator for both keys and values
for i,d in dfList.items():
    # notice the minus sign
    newDFList[i] = d.iloc[-len(finalDateRange):] 

Or you can (better) use boolean indexing:

newDFList = {}
for i,d in dfList.items():
    # only keep dates >= startDate
    newDFList[i] = d.loc[d['Date'] > startDate]

Bonus Use on-fly dictionary iteration:

newDFList = {i: d.loc[d['Date'] > startDate]
    for i, d in dfList.items()
}

Then, for d in newDFList.values(): print(d) gives you:

            Date  HE  Data
356   2020-01-16   0    90
357   2020-01-16   1    91
358   2020-01-16   2    75
359   2020-01-16   3    68
360   2020-01-16   4    74
...          ...  ..   ...
8775  2020-12-31  19    67
8776  2020-12-31  20    83
8777  2020-12-31  21    55
8778  2020-12-31  22    70
8779  2020-12-31  23    61

[8424 rows x 3 columns]
            Date  HE  Data
160   2020-01-16   0    59
161   2020-01-16   1    40
162   2020-01-16   2    98
163   2020-01-16   3    67
164   2020-01-16   4    78
...          ...  ..   ...
8579  2020-12-31  19    76
8580  2020-12-31  20    11
8581  2020-12-31  21    88
8582  2020-12-31  22    79
8583  2020-12-31  23    82

[8424 rows x 3 columns]
            Date  HE  Data
13    2020-01-16   0    52
14    2020-01-16   1    69
15    2020-01-16   2    44
16    2020-01-16   3    75
17    2020-01-16   4    67
...          ...  ..   ...
8432  2020-12-31  19    35
8433  2020-12-31  20    39
8434  2020-12-31  21    56
8435  2020-12-31  22    83
8436  2020-12-31  23    60

[8424 rows x 3 columns]
  • Related