Pandas data frame with time-series data - grouping without aggregating data

Time:10-19

I have the following pandas dataframe:

    import pandas as pd
    df4 = pd.DataFrame({'timestamp':['2022-10-01 01:00:00', 
                                     '2022-10-02 01:00:00', 
                                     '2022-10-03 01:00:00', 
                                     '2022-10-04 01:00:00', 
                                     '2022-10-05 01:00:00',
                                     '2022-10-01 02:00:00', 
                                     '2022-10-02 02:00:00', 
                                     '2022-10-03 02:00:00', 
                                     '2022-10-04 02:00:00', 
                                     '2022-10-05 02:00:00'],
                       'A': [1,2,3,4,5,6,7,8,9,10],
                       'B': [10,9,8,7,6,5,4,3,2,1]}
                      )
    df4['timestamp'] = pd.to_datetime(df4['timestamp'])
    df4

that gives the following data frame:

| timestamp           | A  | B  |
|---------------------|----|----|
| 2022-10-01 01:00:00 | 1  | 10 |
| 2022-10-02 01:00:00 | 2  | 9  |
| 2022-10-03 01:00:00 | 3  | 8  |
| 2022-10-04 01:00:00 | 4  | 7  |
| 2022-10-05 01:00:00 | 5  | 6  |
| 2022-10-01 02:00:00 | 6  | 5  |
| 2022-10-02 02:00:00 | 7  | 4  |
| 2022-10-03 02:00:00 | 8  | 3  |
| 2022-10-04 02:00:00 | 9  | 2  |
| 2022-10-05 02:00:00 | 10 | 1  |

I am trying to group the rows by day of the timestamp, separately for each value column (A and B, named in a y column), without applying an aggregation function. Instead, all values of each group should be collected in a new column (as a pandas Series or NumPy array):

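| timestamp  | no_name | y |
|------------|---------|---|
| 2022-10-01 | [1, 6]  | A |
| 2022-10-01 | [10, 5] | B |
| 2022-10-02 | [2, 7]  | A |
| 2022-10-02 | [9, 4]  | B |
| 2022-10-03 | [3, 8]  | A |
| 2022-10-03 | [8, 3]  | B |
| 2022-10-04 | [4, 9]  | A |
| 2022-10-04 | [7, 2]  | B |
| 2022-10-05 | [5, 10] | A |
| 2022-10-05 | [6, 1]  | B |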

I tried to follow the suggestions in the linked question "Output pandas grouped data frame without aggregation", but

    df6 = df4.groupby('timestamp').agg( lambda g: dict([(k, g[k].tolist()) for k in g]))

raises KeyError: 1.

Also, I tried the following:

    df4.melt(id_vars=['timestamp'], var_name='target', value_name='value')

This puts the target values in one column, but grouping by timestamp (at the hour level) without aggregation still did not give the desired result.

Any suggestions? What else could I try to solve it?

thx

CodePudding user response:

Add this to your code:

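    # Derive a day-level key from the timestamp
    df4['day'] = df4['timestamp'].dt.strftime('%Y-%m-%d')
    # Collect the values of A and B per day into lists instead of reducing them
    grouped = df4.groupby('day')[['A', 'B']].agg(list)
    # Reshape to long format, keeping 'day' as the index
    result = grouped.melt(ignore_index=False)
    print(result)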
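
As a possible variant, the order can be reversed: melt first (as in the question's own attempt), then group by day and source column. The sketch below is self-contained and rebuilds `df4` from the question's code. The column names `y` and `collected` are my own choices for illustration, and `dt.date` is only one way to obtain a day-level key.

```python
import pandas as pd

# Same data as in the question: five days, two readings per day
df4 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-10-01 01:00:00", "2022-10-02 01:00:00", "2022-10-03 01:00:00",
        "2022-10-04 01:00:00", "2022-10-05 01:00:00",
        "2022-10-01 02:00:00", "2022-10-02 02:00:00", "2022-10-03 02:00:00",
        "2022-10-04 02:00:00", "2022-10-05 02:00:00",
    ]),
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "B": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
})

# Long format: one row per (timestamp, source column, value)
long_df = df4.melt(id_vars="timestamp", var_name="y", value_name="value")

# Group by calendar day and source column, collecting values into lists
result = (
    long_df.groupby([long_df["timestamp"].dt.date, "y"])["value"]
    .agg(list)
    .reset_index(name="collected")  # 'collected' is a hypothetical column name
)
print(result)
```

With the data as constructed in the question's code, each day has two readings, so every list in `collected` holds two values per column. If arrays are preferred over lists, `result["collected"].map(np.array)` (after `import numpy as np`) should convert them.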