I have multiple dataframes with column = ['Date', 'HE', 'Prices']
. These dataframes have a different start date and but common end date. I want to trim my dataframe acc to max start date among all dataframe.
import pandas as pd
import numpy as np
dateRange1 = pd.date_range('01/01/2020 04:00', '12/31/2020 23:00', freq='1H')
dateRange2 = pd.date_range('01/09/2020 08:00', '12/31/2020 23:00', freq='1H')
dateRange3 = pd.date_range('01/15/2020 11:00', '12/31/2020 23:00', freq='1H')
dic1 = {
'df1': {
'Date': dateRange1.date,
'HE': dateRange1.hour,
'Data': np.random.randint(1,100,len(dateRange1))
},
'df2': {
'Date': dateRange2.date,
'HE': dateRange2.hour,
'Data': np.random.randint(1,100,len(dateRange2))
},
'df3': {
'Date': dateRange3.date,
'HE': dateRange3.hour,
'Data': np.random.randint(1,100,len(dateRange3))
}
}
dfList = {}
for i in dic1.keys():
dfList[i] = pd.DataFrame(dic1[i])
startDate = max([dic1[i]['Date'][0] for i in dic1.keys()]) #this will give us the max
start date which is '01/15/2020 11:00'
finalDateRange = pd.date_range(startDate ' 00:00', '31/12/2020 23:00', freq='1H')
newDFList = {}
for i in dfList.keys():
newDF = dfList[i].iloc[len(finalDateRange):, :] #Here I am trying to trim that dataframe
but the results are not as expected
print(newDF)
Output:
Date HE Data
8424 2020-12-17 0 36
8425 2020-12-17 1 81
... ... .. ...
8782 2020-12-31 22 66
8783 2020-12-31 23 87
[360 rows x 3 columns]
Date HE Data
8424 2020-12-27 0 30
8425 2020-12-27 1 84
... ... .. ...
8542 2020-12-31 22 17
8543 2020-12-31 23 21
[120 rows x 3 columns]
Date HE Data
8424 2020-12-31 0 76
8425 2020-12-31 1 38
... ... .. ..
8446 2020-12-31 22 75
8447 2020-12-31 23 29
The expected output should look like this, new start date should be the next day of the max start date ('01/15/2020 11:00'):
Date HE Data
0 2020-01-16 0 61
1 2020-01-16 1 47
... ... .. ...
8422 2020-12-31 22 22
8423 2020-12-31 23 80
[8424 rows x 3 columns]
Date HE Data
0 2020-01-16 0 43
1 2020-01-16 1 22
... ... .. ...
8422 2020-12-31 22 71
8423 2020-12-31 23 63
[8424 rows x 3 columns]
Date HE Data
0 2020-01-16 0 92
1 2020-01-16 1 46
... ... .. ...
8422 2020-12-31 22 77
8423 2020-12-31 23 46
[8424 rows x 3 columns]
CodePudding user response:
I think you want:
newDFList = {}
# use `items()` iterator for both keys and values
for i,d in dfList.items():
# notice the minus sign
newDFList[i] = d.iloc[-len(finalDateRange):]
Or you can (better) use boolean indexing:
newDFList = {}
for i,d in dfList.items():
# only keep dates >= startDate
newDFList[i] = d.loc[d['Date'] > startDate]
Bonus Use on-fly dictionary iteration:
newDFList = {i: d.loc[d['Date'] > startDate]
for i, d in dfList.items()
}
Then, for d in newDFList.values(): print(d)
gives you:
Date HE Data
356 2020-01-16 0 90
357 2020-01-16 1 91
358 2020-01-16 2 75
359 2020-01-16 3 68
360 2020-01-16 4 74
... ... .. ...
8775 2020-12-31 19 67
8776 2020-12-31 20 83
8777 2020-12-31 21 55
8778 2020-12-31 22 70
8779 2020-12-31 23 61
[8424 rows x 3 columns]
Date HE Data
160 2020-01-16 0 59
161 2020-01-16 1 40
162 2020-01-16 2 98
163 2020-01-16 3 67
164 2020-01-16 4 78
... ... .. ...
8579 2020-12-31 19 76
8580 2020-12-31 20 11
8581 2020-12-31 21 88
8582 2020-12-31 22 79
8583 2020-12-31 23 82
[8424 rows x 3 columns]
Date HE Data
13 2020-01-16 0 52
14 2020-01-16 1 69
15 2020-01-16 2 44
16 2020-01-16 3 75
17 2020-01-16 4 67
... ... .. ...
8432 2020-12-31 19 35
8433 2020-12-31 20 39
8434 2020-12-31 21 56
8435 2020-12-31 22 83
8436 2020-12-31 23 60
[8424 rows x 3 columns]