How to extract max value and time duration into new df/list

Time:10-29

How can I extract the maximum value of the 'parameter' column for each cycle, together with the time duration of that cycle, into a new DataFrame?

import pandas as pd

# Initialize a list of lists: [timestamp with UTC offset, parameter, cycle]
data = [['2021-01-22 08:00:00.000000+00:00', 2, 1],
        ['2021-01-22 08:00:01.000000+00:00', 3, 1],
        ['2021-01-22 08:00:04.000000+00:00', 2, 1],
        ['2021-01-22 09:00:00.000000+00:00', 5, 2],
        ['2021-01-22 09:00:02.000000+00:00', 4, 2],
        ['2021-01-22 09:00:04.000000+00:00', 4, 2],
        ['2021-01-22 09:00:06.000000+00:00', 4, 2],
       ]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['time', 'parameter', 'cycle'])

# Convert the 'time' strings into timezone-aware datetimes
df['time'] = pd.to_datetime(df['time'])

target_df (time_duration in seconds):

cycle  time_duration  max_parameter
    1              4              3
    2              6              5

CodePudding user response:

Group by 'cycle' and use agg with named aggregation to apply a different function to each column.

https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html

>>> df 

                       time  parameter  cycle
0 2021-01-22 08:00:00+00:00          2      1
1 2021-01-22 08:00:01+00:00          3      1
2 2021-01-22 08:00:04+00:00          2      1
3 2021-01-22 09:00:00+00:00          5      2
4 2021-01-22 09:00:02+00:00          4      2
5 2021-01-22 09:00:04+00:00          4      2
6 2021-01-22 09:00:06+00:00          4      2

>>> target_df = (
       df.groupby("cycle", as_index=False)
         .agg(time_duration = ('time', lambda time: time.max() - time.min()), 
              max_parameter = ('parameter', 'max'))
)
>>> target_df
            
   cycle   time_duration  max_parameter
0      1 0 days 00:00:04              3
1      2 0 days 00:00:06              5

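If you want 'time_duration' in seconds, change the lambda to lambda time: (time.max() - time.min()).total_seconds(). Avoid .seconds for this: it returns only the seconds component of the Timedelta (0 to 86399) and ignores whole days, so it is only correct for cycles shorter than a day.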
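For a duration in whole seconds, matching the target_df layout in the question, one possible variation is to aggregate the first and last timestamp of each cycle with the built-in 'min' and 'max' functions and subtract them afterwards. This is only a sketch: the intermediate column names 'start' and 'end' are made up for illustration, and the '+00:00' offset in the timestamps is assumed from the printed DataFrame.

```python
import pandas as pd

# Sample data from the question; the '+00:00' UTC offset is an assumption.
data = [
    ["2021-01-22 08:00:00.000000+00:00", 2, 1],
    ["2021-01-22 08:00:01.000000+00:00", 3, 1],
    ["2021-01-22 08:00:04.000000+00:00", 2, 1],
    ["2021-01-22 09:00:00.000000+00:00", 5, 2],
    ["2021-01-22 09:00:02.000000+00:00", 4, 2],
    ["2021-01-22 09:00:04.000000+00:00", 4, 2],
    ["2021-01-22 09:00:06.000000+00:00", 4, 2],
]
df = pd.DataFrame(data, columns=["time", "parameter", "cycle"])
df["time"] = pd.to_datetime(df["time"])

# Aggregate per cycle using only built-in reducers.
# 'start' and 'end' are hypothetical helper columns, dropped below.
summary = df.groupby("cycle", as_index=False).agg(
    start=("time", "min"),
    end=("time", "max"),
    max_parameter=("parameter", "max"),
)

# Timedelta column -> total seconds (float) -> whole seconds (int).
summary["time_duration"] = (
    (summary["end"] - summary["start"]).dt.total_seconds().astype(int)
)

target_df = summary[["cycle", "time_duration", "max_parameter"]]
print(target_df)
```

Subtracting the two columns after the aggregation keeps the per-group work in pandas' built-in reducers and avoids calling a Python lambda once per group, which may matter for a large number of cycles.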
