Is there a way to return the names of rows with specific average column values?

Time:10-25

I am using the data set Income_Democracy.dta, and I am trying to find the names of the countries that have an average dem_ind value greater than 0.95.

I figure I need to subset the countries, find the average, and return that as a new data set, but I can't figure out how to do it without naming each country. I've fiddled with the which and subset functions, but I'm new to R and need help. For a specific country, I know you can do

mean(subset(incdem$dem_ind, incdem$country =="Australia"))

but I'm unsure how to generalise.

CodePudding user response:

Group by 'country', get the mean of 'dem_ind' (stored here as 'Avg'), filter the rows where 'Avg' is greater than 0.95, and pull the 'country' column as a vector:

library(dplyr)
incdem %>%
    group_by(country) %>%
    summarise(Avg = mean(dem_ind, na.rm = TRUE), .groups = 'drop') %>%
    filter(Avg > 0.95) %>%
    pull(country)

Another option, using base R only:

names(which(sapply(split(incdem$dem_ind, incdem$country), mean, 
        na.rm = TRUE) > 0.95))

If the average should fall within a range of values instead (here, strictly between 0.2 and 0.8):

names(which(sapply(split(incdem$dem_ind, incdem$country), function(x) {
          avg <- mean(x, na.rm = TRUE)
          avg > 0.2 & avg < 0.8})))
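
To try the base R approach without the original file, here is a minimal sketch on a made-up data frame. The name incdem_toy, the country labels "A", "B", "C" and all values are assumptions for illustration, not taken from Income_Democracy.dta. It uses tapply rather than split plus sapply, which should give the same per-country means.

```r
# Toy stand-in for incdem; countries and values are invented for illustration.
incdem_toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  dem_ind = c(1.00, 0.98, 0.50, NA, 0.96, 0.90)
)

# Mean of dem_ind per country, ignoring missing values
avg <- tapply(incdem_toy$dem_ind, incdem_toy$country, mean, na.rm = TRUE)

# Countries whose average is greater than 0.95
high <- names(which(avg > 0.95))
print(high)

# Countries whose average lies strictly between 0.2 and 0.8
mid <- names(which(avg > 0.2 & avg < 0.8))
print(mid)
```

With these toy values, only "A" (average 0.99) clears the 0.95 threshold and only "B" (average 0.5 after dropping the NA) falls in the 0.2 to 0.8 range. Wrapping the comparison in which() also drops any country whose average comes out as NaN because all of its values are missing.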