I have the following pandas dataframe:
import pandas as pd
df4 = pd.DataFrame({'timestamp':['2022-10-01 01:00:00',
'2022-10-02 01:00:00',
'2022-10-03 01:00:00',
'2022-10-04 01:00:00',
'2022-10-05 01:00:00',
'2022-10-01 02:00:00',
'2022-10-02 02:00:00',
'2022-10-03 02:00:00',
'2022-10-04 02:00:00',
'2022-10-05 02:00:00'],
'A': [1,2,3,4,5,6,7,8,9,10],
'B': [10,9,8,7,6,5,4,3,2,1]}
)
df4['timestamp'] = df4['timestamp'].astype('datetime64')
df4
that gives the following data frame:
| timestamp | A| B |
|--------------------|--| --|
| 2022-10-01 01:00:00| 1| 10|
| 2022-10-02 01:00:00| 2| 9 |
| 2022-10-01 01:00:00| 3| 8 |
| 2022-10-02 01:00:00| 4| 7 |
| 2022-10-01 01:00:00| 5| 6 |
| 2022-10-02 01:00:00| 6| 5 |
| 2022-10-01 01:00:00| 7| 4 |
| 2022-10-02 01:00:00| 8| 3 |
| 2022-10-01 01:00:00| 9| 2 |
| 2022-10-02 01:00:00| 10| 1|
I am trying to get a group on the day level of timestamp, across the y column, and without aggregation function, but with collecting all data in a new column (as a pandas series or NumPy array):
| timestamp | no_name | y |
|------------| -----------------| -----|
| 2022-10-01 | 0 [1,3,5,7,9] | A |
| 2022-10-01 | 0 [10,8, 6, 4, 2]| B |
| 2022-10-02 | 0 [2,4, 6, 8, 10]| A |
| 2022-10-02 | 0 [9,7, 5, 3, 1] | B |
I tried to follow the suggestions on the link Output pandas grouped data frame without aggregation :
but
df6 = df4.groupby('timestamp').agg( lambda g: dict([(k, g[k].tolist()) for k in g]))
gives KeyError: 1
Also, I tried the following:
df4.melt(id_vars=['timestamp'], var_name='target', value_name='value')
to get the target values in one column, the grouping across the timestamp (on hour level) and without aggregation did not give the desired result.
Any suggestions? What else could I try to solve it?
thx
CodePudding user response:
Add this to your code:
df4['day'] = df4['timestamp'].dt.strftime('%Y-%m-%d')
grouped = df4.groupby('day')[['A', 'B']].agg(list)
result = grouped.melt(ignore_index=False)
print(result)