I have a df that looks like this (the real df is much larger):
DateTime Value Date Time period DatePeriod
0 2022-09-18 06:00:00 5.4 18/09/2022 06:00 morning 18/09/2022-morning
1 2022-09-18 07:00:00 6.0 18/09/2022 07:00 morning 18/09/2022-morning
2 2022-09-18 08:00:00 6.5 18/09/2022 08:00 morning 18/09/2022-morning
3 2022-09-18 09:00:00 6.7 18/09/2022 09:00 morning 18/09/2022-morning
4 2022-09-18 10:00:00 6.9 18/09/2022 10:00 morning 18/09/2022-morning
11 2022-09-18 17:00:00 6.8 18/09/2022 17:00 morning 18/09/2022-morning
12 2022-09-18 18:00:00 6.4 18/09/2022 18:00 night 18/09/2022-night
13 2022-09-18 19:00:00 5.7 18/09/2022 19:00 night 18/09/2022-night
14 2022-09-18 20:00:00 4.8 18/09/2022 20:00 night 18/09/2022-night
15 2022-09-18 21:00:00 5.4 18/09/2022 21:00 night 18/09/2022-night
16 2022-09-18 22:00:00 4.7 18/09/2022 22:00 night 19/09/2022-night
21 2022-09-19 03:00:00 3.8 19/09/2022 03:00 night 19/09/2022-night
22 2022-09-19 04:00:00 3.5 19/09/2022 04:00 night 19/09/2022-night
23 2022-09-19 05:00:00 2.8 19/09/2022 05:00 night 19/09/2022-night
24 2022-09-19 06:00:00 3.8 19/09/2022 06:00 morning 19/09/2022-morning
I created a dictionary by grouping on DatePeriod and collecting each group's values in a list, like this:
result = df.groupby('DatePeriod')['Value'].apply(list).to_dict()
Output:
{'18/09/2022-morning': [5.4, 6.0, 6.5, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8],
'18/09/2022-night': [6.4, 5.7, 4.8, 5.4, 4.7, 4.3],
'19/09/2022-morning': [3.8],
'19/09/2022-night': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}
Is there any way to get exactly the same result, but with the DateTime as the key instead of the DatePeriod? That is, I still want the grouping to be based on DatePeriod and the values to be a list of values; the only difference is that I want the full date and time to be the key. Using the first DateTime of each group as the key is fine, as long as it is not the DatePeriod. Example:
{'2022-09-18 06:00:00': [5.4, 6.0, 6.5, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8],
'2022-09-18 18:00:00' : [6.4, 5.7, 4.8, 5.4, 4.7, 4.3],
'2022-09-19 06:00:00': [3.8],
'2022-09-19 03:00:00': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}
Is there an easy way to do this? Thanks in advance.
CodePudding user response:
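IIUC, you can aggregate both columns in a single groupby: collect Value into a list, take the first DateTime of each group, and then use that DateTime as the index before converting to a dict: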
result = (df.groupby('DatePeriod')
            .agg({"Value": list, "DateTime": "first"})
            .set_index("DateTime")["Value"]
            .to_dict())
print(result)
{'2022-05-12 06:00:00': [11.8], '2022-05-12 18:00:00': [12.5], '2022-05-13 06:00:00': [10.9], '2022-05-13 18:00:00': [13.5], '2022-05-14 06:00:00': [11.8]}
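For reference, here is a minimal self-contained sketch of the same idea. The small DataFrame is made-up sample data modelled on the question's columns, not the asker's real df. Two things in it are my own assumptions rather than part of the answer above: `sort=False`, which keeps the groups in order of first appearance, and the `strftime` step, which turns the keys into plain strings (if DateTime is a datetime64 column, the keys would otherwise be `Timestamp` objects).

```python
import pandas as pd

# Hypothetical sample data, shaped like the question's df
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-09-18 06:00:00", "2022-09-18 07:00:00",
        "2022-09-18 18:00:00", "2022-09-18 19:00:00",
        "2022-09-19 06:00:00",
    ]),
    "Value": [5.4, 6.0, 6.4, 5.7, 3.8],
    "DatePeriod": [
        "18/09/2022-morning", "18/09/2022-morning",
        "18/09/2022-night", "18/09/2022-night",
        "19/09/2022-morning",
    ],
})

# Group on DatePeriod, collecting the values and the first DateTime per group.
# sort=False (an assumption) keeps groups in order of first appearance.
grouped = (
    df.groupby("DatePeriod", sort=False)
      .agg({"Value": list, "DateTime": "first"})
)

# Build the dict with string keys instead of Timestamp objects
result = {
    ts.strftime("%Y-%m-%d %H:%M:%S"): values
    for ts, values in zip(grouped["DateTime"], grouped["Value"])
}
print(result)
```

If Timestamp keys are acceptable, the `strftime` step can be dropped and the chained `.set_index("DateTime")["Value"].to_dict()` from the answer used directly.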