Python beginner here, needing help with an assignment. I need to write a function that returns a list/array of the daily averages (i.e., 365 values) for a given variable and CSV file name.
The function signature:
def daily_average(data, csv_file_name1:str, variable:str):
There are 3 CSV files containing data like this (the example below is only part of one file):
date time variable1 variable2 variable3
2021-01-01 01:00:00 6.08624 21.3 18.6
2021-01-01 02:00:00 7.40564 45.1 40.3
2021-01-01 03:00:00 5.01157 25.6 23.9
2021-01-01 04:00:00 12.76834 20.8 18.1
2021-01-01 05:00:00 9.09745 20.9 21.7
NumPy can be used. Pandas can also be used but I'm not proficient in it, so I'd prefer to not use it.
Answer:
This is a situation where pandas is a good approach. Finding the daily averages can be written as a groupby
operation, which turns the problem into a one-liner: (1) group the rows by the date part of the date_time
column, then (2) compute the mean of the numeric columns. (This assumes the question's separate date and time columns are combined into a single date_time column, as in the example below.)
I've copied some of the input rows onto later dates to demonstrate this:
from io import StringIO
import pandas as pd
data_file = StringIO("""date_time,variable1,variable2,variable3
2021-01-01 01:00:00,6.08624,21.3,18.6
2021-01-01 02:00:00,7.40564,45.1,40.3
2021-01-01 03:00:00,5.01157,25.6,23.9
2021-01-01 04:00:00,12.76834,20.8,18.1
2021-01-01 05:00:00,9.09745,20.9,21.7
2021-01-02 01:00:00,6.08624,21.3,18.6
2021-01-02 02:00:00,7.40564,45.1,40.3
2021-01-03 03:00:00,5.01157,25.6,23.9
2021-01-04 04:00:00,12.76834,20.8,18.1
2021-01-05 05:00:00,9.09745,20.9,21.7""")
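# parse_dates=[0] converts the first column (date_time) to datetimes
df = pd.read_csv(data_file, parse_dates=[0])
# Group the rows by calendar day and average every numeric column
print(df.groupby(by=df['date_time'].dt.date).mean(numeric_only=True))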
Output:
            variable1  variable2  variable3
date_time
2021-01-01   8.073848      26.74      24.52
2021-01-02   6.745940      33.20      29.45
2021-01-03   5.011570      25.60      23.90
2021-01-04  12.768340      20.80      18.10
2021-01-05   9.097450      20.90      21.70
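To match the signature from the question, the groupby can be wrapped in a function that returns a plain list for a single variable. This is only a sketch: it assumes the files are comma-separated with a header row that has a separate date column (as in the question's sample) plus the variable names, and it leaves the data parameter unused because the question does not say what it should hold.

```python
import pandas as pd


def daily_average(data, csv_file_name1: str, variable: str):
    """Return the mean of `variable` for each day in the file, oldest day first.

    Sketch only: assumes a comma-separated file whose header row contains a
    'date' column (e.g. 2021-01-01) and the variable names. `data` is unused
    because the assignment does not say what it should hold.
    """
    df = pd.read_csv(csv_file_name1)
    # ISO dates such as 2021-01-01 sort chronologically as strings, and
    # groupby sorts its keys by default, so the result is in date order
    return df.groupby("date")[variable].mean().tolist()
```

With a full year of hourly data, the returned list has one entry per day that appears in the file.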
Answer:
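I'd also use pandas here. An alternative to groupby is to resample the data to a daily frequency. With resample you can also specify the aggregation method per column, using a dict such as resample_dict below.
Unlike groupby, resample('D') produces a row for every calendar day in the range, with NaN for days that have no readings.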
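# Assumes df is the DataFrame from the previous answer, with date_time parsed as datetimes
resample_dict = {
    'variable1': 'mean',
    'variable2': 'mean',
    'variable3': 'mean'
}
# resample needs a DatetimeIndex, so move date_time into the index first
daily_means = df.set_index('date_time').resample('D', closed='left', label='left').agg(resample_dict)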
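Since the question prefers to avoid pandas, here is a rough NumPy-only sketch of the same group-by-day averaging. It assumes comma-separated files whose header row contains a date column in ISO format and the variable names (the question's sample looks whitespace-separated, so this may need adjusting; another single-character delimiter can be passed to csv.DictReader). The data parameter is left unused because its role is not specified. np.unique with return_inverse=True maps each row to its day, and np.bincount sums the values per day.

```python
import csv

import numpy as np


def daily_average(data, csv_file_name1: str, variable: str):
    """Return a NumPy array with the mean of `variable` for each day, oldest first.

    Sketch only: assumes a comma-separated file whose header row contains
    'date' (ISO format, e.g. 2021-01-01) and the variable names. `data` is
    unused because the assignment does not say what it should hold.
    """
    with open(csv_file_name1, newline="") as f:
        rows = list(csv.DictReader(f))
    dates = np.array([row["date"] for row in rows])
    values = np.array([float(row[variable]) for row in rows])
    # unique_days is sorted (ISO dates sort chronologically as strings);
    # day_index[i] is the position of row i's date within unique_days
    unique_days, day_index = np.unique(dates, return_inverse=True)
    # Sum the readings per day, then divide by the number of readings per day
    sums = np.bincount(day_index, weights=values)
    counts = np.bincount(day_index)
    return sums / counts
```

Like groupby, this only produces values for days that actually appear in the file, so a full year of data gives 365 entries.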