How can I extract the max value of the 'parameter' column for each cycle, along with the time duration of that cycle, into a new DataFrame?
import pandas as pd

# Initialize list of lists: [timestamp (UTC), parameter, cycle]
data = [['2021-01-22 08:00:00.000000+00:00', 2, 1],
        ['2021-01-22 08:00:01.000000+00:00', 3, 1],
        ['2021-01-22 08:00:04.000000+00:00', 2, 1],
        ['2021-01-22 09:00:00.000000+00:00', 5, 2],
        ['2021-01-22 09:00:02.000000+00:00', 4, 2],
        ['2021-01-22 09:00:04.000000+00:00', 4, 2],
        ['2021-01-22 09:00:06.000000+00:00', 4, 2],
        ]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['time', 'parameter', 'cycle'])
# Convert the 'time' column from strings to timezone-aware datetimes
df['time'] = pd.to_datetime(df['time'])
target_df:
cycle | time_duration | max_parameter |
---|---|---|
1 | 4 | 3 |
2 | 6 | 5 |
CodePudding user response:
Group by 'cycle' and use agg to aggregate each column with a different function.
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html
>>> df
                       time  parameter  cycle
0 2021-01-22 08:00:00+00:00          2      1
1 2021-01-22 08:00:01+00:00          3      1
2 2021-01-22 08:00:04+00:00          2      1
3 2021-01-22 09:00:00+00:00          5      2
4 2021-01-22 09:00:02+00:00          4      2
5 2021-01-22 09:00:04+00:00          4      2
6 2021-01-22 09:00:06+00:00          4      2
>>> target_df = (
...     df.groupby("cycle", as_index=False)
...       .agg(time_duration=('time', lambda time: time.max() - time.min()),
...            max_parameter=('parameter', 'max'))
... )
>>> target_df
   cycle   time_duration  max_parameter
0      1 0 days 00:00:04              3
1      2 0 days 00:00:06              5
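If you want 'time_duration' in seconds, change the lambda to lambda time: (time.max() - time.min()).total_seconds(). Prefer total_seconds() over .seconds: .seconds returns only the seconds component of the Timedelta and ignores whole days, so it gives a wrong result for cycles longer than 24 hours.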
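As an alternative, here is a minimal sketch that avoids the lambda: aggregate the min and max of 'time' per cycle with built-in reducers, then subtract and convert to seconds afterwards. The intermediate column names start and end are illustrative and not part of the question, and the timestamps are assumed to carry a +00:00 UTC offset.

```python
import pandas as pd

# Sample data from the question (timestamps assumed to be UTC, offset +00:00)
data = [
    ["2021-01-22 08:00:00.000000+00:00", 2, 1],
    ["2021-01-22 08:00:01.000000+00:00", 3, 1],
    ["2021-01-22 08:00:04.000000+00:00", 2, 1],
    ["2021-01-22 09:00:00.000000+00:00", 5, 2],
    ["2021-01-22 09:00:02.000000+00:00", 4, 2],
    ["2021-01-22 09:00:04.000000+00:00", 4, 2],
    ["2021-01-22 09:00:06.000000+00:00", 4, 2],
]
df = pd.DataFrame(data, columns=["time", "parameter", "cycle"])
df["time"] = pd.to_datetime(df["time"])

# Named aggregation with built-in reducers only; 'start' and 'end' are
# hypothetical helper columns used to compute the duration.
per_cycle = df.groupby("cycle").agg(
    start=("time", "min"),
    end=("time", "max"),
    max_parameter=("parameter", "max"),
)

# Timedelta -> whole seconds; total_seconds() also counts full days.
per_cycle["time_duration"] = (
    (per_cycle["end"] - per_cycle["start"]).dt.total_seconds().astype(int)
)

# Keep only the requested columns and turn 'cycle' back into a column.
target_df = per_cycle[["time_duration", "max_parameter"]].reset_index()
print(target_df)
```

For the sample data this yields durations of 4 and 6 seconds with max parameters 3 and 5, matching the target table. Doing the subtraction on whole columns also keeps the dtype handling explicit, at the cost of two temporary columns.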