Groupby a column but have another column as the key

Time:09-26

I have a DataFrame that looks like this (the real one is much larger):

    DateTime             Value  Date        Time   period   DatePeriod
0   2022-09-18 06:00:00  5.4    18/09/2022  06:00  morning  18/09/2022-morning
1   2022-09-18 07:00:00  6.0    18/09/2022  07:00  morning  18/09/2022-morning
2   2022-09-18 08:00:00  6.5    18/09/2022  08:00  morning  18/09/2022-morning
3   2022-09-18 09:00:00  6.7    18/09/2022  09:00  morning  18/09/2022-morning
4   2022-09-18 10:00:00  6.9    18/09/2022  10:00  morning  18/09/2022-morning
11  2022-09-18 17:00:00  6.8    18/09/2022  17:00  morning  18/09/2022-morning
12  2022-09-18 18:00:00  6.4    18/09/2022  18:00  night    18/09/2022-night
13  2022-09-18 19:00:00  5.7    18/09/2022  19:00  night    18/09/2022-night
14  2022-09-18 20:00:00  4.8    18/09/2022  20:00  night    18/09/2022-night
15  2022-09-18 21:00:00  5.4    18/09/2022  21:00  night    18/09/2022-night
16  2022-09-18 22:00:00  4.7    18/09/2022  22:00  night    19/09/2022-night
21  2022-09-19 03:00:00  3.8    19/09/2022  03:00  night    19/09/2022-night
22  2022-09-19 04:00:00  3.5    19/09/2022  04:00  night    19/09/2022-night
23  2022-09-19 05:00:00  2.8    19/09/2022  05:00  night    19/09/2022-night
24  2022-09-19 06:00:00  3.8    19/09/2022  06:00  morning  19/09/2022-morning

I created a dictionary by grouping on DatePeriod and collecting each group's values in a list, like this:

result = df.groupby('DatePeriod')['Value'].apply(list).to_dict()

Output:

{'18/09/2022-morning': [5.4, 6.0, 6.5, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8],
 '18/09/2022-night': [6.4, 5.7, 4.8, 5.4, 4.7, 4.3],
 '19/09/2022-morning': [3.8],
 '19/09/2022-night': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}

Is there any way to get exactly the same result, but with DateTime as the key in the result dictionary instead of DatePeriod? That is, I still want the grouping to be based on DatePeriod and each value to be a list of values; the only difference is that I want the full date-time as the key. It can be the first DateTime of each group, but not the DatePeriod. Example:

{'2022-09-18 06:00:00': [5.4, 6.0, 6.5, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8],
 '2022-09-18 18:00:00': [6.4, 5.7, 4.8, 5.4, 4.7, 4.3],
 '2022-09-19 06:00:00': [3.8],
 '2022-09-19 03:00:00': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}

Is there an easy way to do this? Thanks in advance.

Answer:

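If I understand correctly, you can aggregate both columns in one pass: collect Value into a list and take the first DateTime of each group, then use that DateTime as the index before converting to a dictionary: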

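result = (df.groupby('DatePeriod')
            # one row per DatePeriod: all Values as a list, plus the group's first DateTime
            .agg({"Value": list, "DateTime": "first"})
            # key the lists by that first DateTime instead of by DatePeriod
            .set_index("DateTime")["Value"]
            .to_dict())
print(result)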

Output (from different sample data than the rows shown in the question):

{'2022-05-12 06:00:00': [11.8],
 '2022-05-12 18:00:00': [12.5],
 '2022-05-13 06:00:00': [10.9],
 '2022-05-13 18:00:00': [13.5],
 '2022-05-14 06:00:00': [11.8]}
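For anyone who wants to try this end to end, here is a minimal sketch on a handful of rows taken from the question. It assumes DateTime is stored as strings, which the question does not state. The second part is my own addition, not part of the answer: if DateTime is a real datetime64 column, the dictionary keys come out as pd.Timestamp objects, so formatting the column first (the format string here is just one reasonable choice) keeps plain string keys.

```python
import pandas as pd

# A few rows from the question; DateTime as strings is an assumption.
df = pd.DataFrame({
    "DateTime": ["2022-09-18 06:00:00", "2022-09-18 07:00:00",
                 "2022-09-18 18:00:00", "2022-09-18 19:00:00",
                 "2022-09-19 06:00:00"],
    "Value": [5.4, 6.0, 6.4, 5.7, 3.8],
    "DatePeriod": ["18/09/2022-morning", "18/09/2022-morning",
                   "18/09/2022-night", "18/09/2022-night",
                   "19/09/2022-morning"],
})

# Group by DatePeriod, but key the resulting lists by each group's first DateTime.
result = (df.groupby("DatePeriod")
            .agg({"Value": list, "DateTime": "first"})
            .set_index("DateTime")["Value"]
            .to_dict())
print(result)
# {'2022-09-18 06:00:00': [5.4, 6.0], '2022-09-18 18:00:00': [6.4, 5.7], '2022-09-19 06:00:00': [3.8]}

# Variant for a datetime64 column: format it to strings before grouping,
# otherwise the keys would be pd.Timestamp objects.
df_dt = df.assign(DateTime=pd.to_datetime(df["DateTime"]))
result_dt = (df_dt.assign(DateTime=df_dt["DateTime"].dt.strftime("%Y-%m-%d %H:%M:%S"))
                  .groupby("DatePeriod", sort=False)  # keep groups in order of appearance
                  .agg({"Value": list, "DateTime": "first"})
                  .set_index("DateTime")["Value"]
                  .to_dict())
```

Note that "first" means the first row of each group in the frame's current order, so sort by DateTime beforehand if the rows are not already chronological.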