Getting the average for a certain column based off specific ranges of two other columns?-CodePudding

I have 4 columns such as the ones below (but a total of 50,000 values not shown here):

Date        Speed        Area        Incidents
1/1/2016     6.5         Maly            L
1/2/2016     7.7         Maly            H
1/3/2016     14          Maly            H
...
1/1/2017     5.5         Reet            M
1/12/2017    9.8         Reet            M
4/8/2017      3          Reet            H
3/12/2017    5.8         Anlow           L

I need to try and find the average speed recorded in all of 2017 in the area of Reet where the incidents were M. So the output for this is supposed to be: 7.65.

I have absolutely no idea what I'm missing, and so far I have tried using date_range() and set_index with .describe() for my average requirement but I couldn't get it right.

CodePudding user response：

Convert Date to datetimes and aggregate year with another columns for sum:

df['Date'] = pd.to_datetime(df['Date'])
df1 = (df.groupby([df['Date'].dt.year.rename('Year'), 'Area','Incidents']).Speed
         .sum() 
         .reset_index())

CodePudding user response：

You can use masks to select the data you want, and then take the mean:

import pandas as pd
data = {'Date' : ['1/1/2016', '1/2/2016', '1/3/2016', '1/1/2017', '1/12/2017', '4/8/2017', '3/12/2017'],
        'Speed' : [6.5, 7.7, 14, 5.5, 9.8, 3, 5.8],
        'Area' : ['Maly', 'Maly', 'Maly', 'Reet', 'Reet', 'Reet', 'Anlow'],
        'Incidents':['L', 'H', 'H', 'M', 'M', 'H', 'L']}
df = pd.DataFrame(data)

# Convert the 'date' to datetime and set it as the index
df['Date'] = pd.to_datetime(df['Date'])
df.index = df['Date']

# Create the mask(s) you want and take the mean of the chosen column
mean_wind_speed = df[(df.index.year==2017) & (df.Area=='Reet') & (df.Incidents == 'M')]['Speed'].mean()
print(mean_wind_speed)

outputs:

7.65

Edit

one can also use query, with a new column for Years:

df['Year'] = df.index.year
mean_wind_speed = df.query('Year == 2017 and Area == "Reet" and Incidents=="M"')['Speed'].mean()