Python: function for daily average using NumPy

Python beginner needing help with an assignment here. I need to write a function that returns a list/array of the daily averages (i.e., 365 values) for a given variable and CSV file name.

The function signature:

def daily_average(data, csv_file_name1:str, variable:str):

There are 3 CSV files containing data like this (the example below is only part of one file):

date time variable1 variable2 variable3
2021-01-01 01:00:00 6.08624 21.3 18.6
2021-01-01 02:00:00 7.40564 45.1 40.3
2021-01-01 03:00:00 5.01157 25.6 23.9
2021-01-01 04:00:00 12.76834 20.8 18.1
2021-01-01 05:00:00 9.09745 20.9 21.7

NumPy can be used. Pandas can also be used, but I'm not proficient in it, so I'd prefer not to use it.
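Since the question prefers NumPy, here is a minimal NumPy-only sketch that groups rows by date and averages one variable per group. It assumes the whitespace-separated layout shown above, with a header line `date time variable1 ...`; `daily_average_numpy` is a hypothetical name, and it does not follow the assignment's exact signature. It relies on `np.genfromtxt` to read the named columns, `np.unique(..., return_inverse=True)` to number the dates, and `np.bincount` to sum and count per date.

```python
import io

import numpy as np


def daily_average_numpy(source, variable: str):
    """Return (dates, means): the mean of `variable` for each distinct date.

    Assumption: `source` is a file name or file-like object in the
    whitespace-separated layout from the question, with a header line
    `date time variable1 ...`.
    """
    # names=True reads field names from the header; dtype=None infers a type
    # per column (text for date/time, floats for the variables)
    table = np.genfromtxt(source, names=True, dtype=None, encoding="utf-8")
    table = np.atleast_1d(table)  # a file with one data row gives a 0-d array

    # dates: sorted unique dates; day_index: position of each row's date in it
    dates, day_index = np.unique(table["date"], return_inverse=True)

    # Per-day sum of the variable divided by per-day row count = daily mean
    sums = np.bincount(day_index, weights=table[variable])
    counts = np.bincount(day_index)
    return dates, sums / counts


# Example with a few rows in the question's format
sample = io.StringIO("""date time variable1 variable2 variable3
2021-01-01 01:00:00 6.08624 21.3 18.6
2021-01-01 02:00:00 7.40564 45.1 40.3
2021-01-02 01:00:00 5.01157 25.6 23.9
""")
dates, means = daily_average_numpy(sample, "variable2")
print(dates, means)
```

With a full year of hourly data, `means` holds one value per date present in the file, so 365 values assuming every day has at least one reading; `dates` says which day each value belongs to. The `data` parameter from the assignment's signature isn't explained in the question, so this sketch leaves it out.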

Answer:

This is a situation where pandas is a good fit. Finding the daily averages is a groupby operation, which turns the problem into a one-liner: (1) group the rows by the date part of the date_time column, then (2) compute the mean of the numeric columns. (The example below merges the question's separate date and time columns into a single comma-separated date_time column.)

I've duplicated some lines from the input, with changed dates, to demonstrate this:

from io import StringIO
import pandas as pd

data_file = StringIO("""date_time,variable1,variable2,variable3
2021-01-01 01:00:00,6.08624,21.3,18.6
2021-01-01 02:00:00,7.40564,45.1,40.3
2021-01-01 03:00:00,5.01157,25.6,23.9
2021-01-01 04:00:00,12.76834,20.8,18.1
2021-01-01 05:00:00,9.09745,20.9,21.7
2021-01-02 01:00:00,6.08624,21.3,18.6
2021-01-02 02:00:00,7.40564,45.1,40.3
2021-01-03 03:00:00,5.01157,25.6,23.9
2021-01-04 04:00:00,12.76834,20.8,18.1
2021-01-05 05:00:00,9.09745,20.9,21.7""")

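# parse_dates=[0] converts the first column (date_time) to datetimes
df = pd.read_csv(data_file, parse_dates=[0])

# Group the rows by calendar date, then average every numeric column
print(df.groupby(by=df['date_time'].dt.date).mean(numeric_only=True))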

Output:

            variable1  variable2  variable3
date_time
2021-01-01   8.073848      26.74      24.52
2021-01-02   6.745940      33.20      29.45
2021-01-03   5.011570      25.60      23.90
2021-01-04  12.768340      20.80      18.10
2021-01-05   9.097450      20.90      21.70
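The snippet above uses one comma-separated date_time column, whereas the question's files keep date and time in separate whitespace-separated columns, and the assignment wants the averages of a single variable. A minimal sketch adapting the same groupby to that layout, assuming a header line `date time variable1 variable2 variable3`; `daily_average_pandas` is a hypothetical name:

```python
import io

import pandas as pd


def daily_average_pandas(source, variable: str):
    """Return a NumPy array with the mean of `variable` for each date.

    Assumption: the question's layout, i.e. whitespace-separated columns
    with a header line `date time variable1 variable2 variable3`.
    """
    # sep=r"\s+" splits on runs of whitespace instead of commas
    df = pd.read_csv(source, sep=r"\s+")
    # The 'date' column already identifies the day, so group on it directly;
    # groupby sorts the keys, and ISO date strings sort chronologically
    return df.groupby("date")[variable].mean().to_numpy()


sample = io.StringIO("""date time variable1 variable2 variable3
2021-01-01 01:00:00 6.08624 21.3 18.6
2021-01-01 02:00:00 7.40564 45.1 40.3
2021-01-02 01:00:00 5.01157 25.6 23.9
""")
print(daily_average_pandas(sample, "variable3"))
```

Calling `.to_numpy()` returns a plain array, matching the list/array the assignment asks for; drop it to keep the dates as the Series index.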

Answer:

I echo the use of pandas here. An alternative is to resample the data to a daily frequency. With resample you can also choose the aggregation method for each column via resample_dict.

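# Assumes df was read as in the previous answer; resample() needs a DatetimeIndex
df = df.set_index('date_time')

resample_dict = {
    'variable1': 'mean',
    'variable2': 'mean',
    'variable3': 'mean'
}

# Bin the rows into calendar days (left-closed, left-labelled) and aggregate each column as listed
daily_average = df.resample('D', closed='left', label='left').agg(resample_dict)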
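One practical difference between the two answers, sketched on a made-up three-row sample: `groupby` only produces groups for dates that actually appear in the data, while `resample('D')` produces a bin for every calendar day in the covered span and leaves days without readings as NaN. That matters if the result must have exactly 365 entries even when a day has no data.

```python
import io

import pandas as pd

# Made-up sample: no readings at all on 2021-01-02
csv_text = """date_time,variable1
2021-01-01 01:00:00,6.0
2021-01-01 02:00:00,8.0
2021-01-03 01:00:00,5.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_time"], index_col="date_time")

# groupby: one group per date present in the data (2 rows here)
by_group = df.groupby(df.index.date)["variable1"].mean()

# resample: one bin per calendar day in the span (3 rows, Jan 2 is NaN)
by_resample = df.resample("D")["variable1"].mean()

print(by_group)
print(by_resample)
```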