Home > other >  How to use group by and a conditional mean at the same time in R?
How to use group by and a conditional mean at the same time in R?

Time:11-08

I have a data set with the following columns: Ticker, Company_name, Forecast_period, Measure, Estimate, Forecast_date.

In the Measure column the value is either EPS or DPS and the Forecast_period is either 1 or 2.

I would like to calculate the mean of EPS1, (mean of Estimate if Measure = EPS & Forecast_period =1) mean of EPS2 (mean of Estimate if Measure = EPS & Forecast_period = 2) and the mean of DPS1 (mean of Estimate if Measure = DPS & Forecast_period =1).

For that I wrote the following code:

IBESdatamean2 <- IBESdata %>%
  group_by(Company_name, Ticker, Forecast_period) %>%
  summarize(mean[Measure == "EPS" & Forecast_period == 1, 'Estimate'])

I need to group my data by Company_name, Ticker and Forecast_period that I am sure of, however I am not sure if I have written the right code for the mean calculation because I am getting the following error:

Caused by error in `mean[Measure == "EPS" & Forecast_period == 1, "Estimate"]`:
! object of type 'closure' is not subsettable

I want to have a conditional mean calculation as described above and if anyone could help me that would be greatly appreciated.

CodePudding user response:

The summarise function works to implement a function on the entire dataset. In this case we want to filter to a specific subset and then extract a single value.

We can use the filter() function and then implement a pull() and mean()

library(dplyr)

IBESdatamean2 <- IBESdata %>%
  group_by(Company_name, Ticker, Forecast_period) %>%
  filter(Measure == "EPS", Forecast_period == 1) %>%
  pull(Estimate) %>%
  mean(na.rm = T)

This is how you have written your code.

However, what I think you are actually asking for is the mean of each individual measure and forecast period per company.

In which case it would be appropriate to implement a summarise(), however your group_by() is incorrect.

library(dplyr)

IBES_ALL_MEANS <- IBESdata %>%
  group_by(Company_name, Forecast_period, Measure) %>%
  summarise(Mean_Estimates = mean(Estimate, na.rm = T)

If you wanted to further filter down to a specific company, forecast period or measure you can now do so using the filter() function on the IBES_ALL_MEANS dataset.

  • Related