Facing a problem with "argument is not numeric or logical" when plotting with geom_vline i

Time:09-22

I am facing a problem that I can't figure out, even after searching the web.

Given a data frame:

diabetes <- data.frame(Age = c(20,23,45,77), Diabetes = c('no', 'no', 'yes', 'yes'))

$ Age      <dbl> 20, 23, 45, 77
$ Diabetes <fct> no, no, yes, yes

I am trying to plot density curves for the two Diabetes outcomes and include vertical lines at the means.

plot_numeric <- function(dataset, predictor, outcome){
  p1 <- dataset %>% ggplot(aes_string(x = predictor)) + 
                    geom_density(fill = 'gray', alpha = 0.5) + 
                    theme_fivethirtyeight()
  
  p2 <- dataset %>% ggplot(aes_string(x = predictor, fill = outcome)) + 
                    geom_density(alpha = 0.5) + 
                    scale_fill_manual(values = c('#999999', '#E69F00')) + 
                    geom_vline(aes_string(xintercept = mean(predictor[outcome == 'no'])), color = '999999') + 
                    geom_vline(aes_string(xintercept = mean(predictor[outcome == 'yes'])), color = '#E69F00') + 
                    theme_fivethirtyeight()
  
  gridExtra::grid.arrange(p1,p2)
}

plot_numeric(diabetes, 'Age', 'Diabetes')

I am receiving the error "argument is not numeric or logical: returning NA" and the vertical lines for the mean are not included.

Everything works fine when doing the plot outside a function.

Any ideas on how to fix this would be much appreciated.

CodePudding user response:

The aes_string() function is no longer recommended (it is soft-deprecated in ggplot2); instead, people are encouraged to use tidy evaluation syntax, as per the vignette.

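In your case, predictor and outcome are length-1 character strings inside the function ('Age' and 'Diabetes'), not columns of the data frame. mean(predictor[outcome == 'no']) therefore compares 'Diabetes' with 'no', subsets 'Age' by the resulting FALSE, and takes the mean of an empty character vector, which produces the warning and returns NA.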
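The failure can be reproduced in base R without ggplot2. The following is a minimal sketch that assumes the same argument values as the call in the question; the variable msg is only there to capture the warning text for inspection.

```r
# Inside plot_numeric(), the arguments are plain strings, not columns.
predictor <- 'Age'
outcome <- 'Diabetes'

outcome == 'no'              # FALSE: 'Diabetes' is not equal to 'no'
predictor[outcome == 'no']   # character(0): nothing is selected

# mean() of a character vector signals a warning; capture its message.
msg <- tryCatch(
  mean(predictor[outcome == 'no']),
  warning = function(w) conditionMessage(w)
)
msg

# With the warning suppressed, the value that reaches xintercept is NA.
result <- suppressWarnings(mean(predictor[outcome == 'no']))
result
```

Because xintercept ends up as NA, no vertical line is drawn.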

To get around the shortcomings of aes_string(), you can use the curly-curly operator ({{ }}) and pass the column names unquoted.

library(ggplot2)
library(magrittr)

diabetes <- data.frame(Age = c(20,23,45,77), Diabetes = c('no', 'no', 'yes', 'yes'))

plot_numeric <- function(dataset, predictor, outcome){
  # {{ }} forwards the unquoted column names into aes()
  p1 <- dataset %>% ggplot(aes(x = {{predictor}})) + 
    geom_density(fill = 'gray', alpha = 0.5)
  
  p2 <- dataset %>% ggplot(aes(x = {{predictor}}, fill = {{outcome}})) + 
    geom_density(alpha = 0.5) + 
    scale_fill_manual(values = c('#999999', '#E69F00')) + 
    # vertical lines at the mean of the predictor within each outcome group
    geom_vline(
      aes(xintercept = mean({{predictor}}[{{outcome}} == 'no'])), 
      color = '#999999'
    ) + 
    geom_vline(
      aes(xintercept = mean({{predictor}}[{{outcome}} == 'yes'])), 
      color = '#E69F00'
    )
  
  gridExtra::grid.arrange(p1, p2)
}

plot_numeric(diabetes, Age, Diabetes)

Created on 2021-09-21 by the reprex package (v2.0.1)
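As a sanity check on where the two vertical lines should land, the group means can be computed directly in base R. This is only a sketch for verifying the plot; expected_means is a name introduced here, not part of the answer.

```r
diabetes <- data.frame(Age = c(20, 23, 45, 77),
                       Diabetes = c('no', 'no', 'yes', 'yes'))

# Mean Age within each Diabetes level: (20 + 23) / 2 and (45 + 77) / 2
expected_means <- tapply(diabetes$Age, diabetes$Diabetes, mean)
expected_means
#   no  yes 
# 21.5 61.0 
```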

Alternatively, if you prefer to pass your column names as strings, you can use the .data pronoun:

library(ggplot2)
library(magrittr)

diabetes <- data.frame(Age = c(20,23,45,77), Diabetes = c('no', 'no', 'yes', 'yes'))

plot_numeric <- function(dataset, predictor, outcome){
  # .data[[...]] looks up a column by its name given as a string
  p1 <- dataset %>% ggplot(aes(x = .data[[predictor]])) + 
    geom_density(fill = 'gray', alpha = 0.5)
  
  p2 <- dataset %>% ggplot(aes(x = .data[[predictor]], fill = .data[[outcome]])) + 
    geom_density(alpha = 0.5) + 
    scale_fill_manual(values = c('#999999', '#E69F00')) + 
    # vertical lines at the mean of the predictor within each outcome group
    geom_vline(
      aes(xintercept = mean(.data[[predictor]][.data[[outcome]] == 'no'])), 
      color = '#999999'
    ) + 
    geom_vline(
      aes(xintercept = mean(.data[[predictor]][.data[[outcome]] == 'yes'])), 
      color = '#E69F00'
    )
  
  gridExtra::grid.arrange(p1, p2)
}

plot_numeric(diabetes, "Age", "Diabetes")
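A possible variation, not part of the answer above, is to compute the group means once and hand them to a single geom_vline() layer as its own data, so the function is not tied to the levels 'no' and 'yes'. The sketch below assumes string column names as in the .data version; group_means is a hypothetical helper introduced only for illustration.

```r
library(ggplot2)

diabetes <- data.frame(Age = c(20, 23, 45, 77),
                       Diabetes = c('no', 'no', 'yes', 'yes'))

# Hypothetical helper: one row per outcome level with the mean of the predictor.
group_means <- function(dataset, predictor, outcome) {
  m <- tapply(dataset[[predictor]], dataset[[outcome]], mean)
  out <- data.frame(names(m), as.numeric(m))
  names(out) <- c(outcome, predictor)
  out
}

means <- group_means(diabetes, 'Age', 'Diabetes')

p <- ggplot(diabetes, aes(x = .data[['Age']], fill = .data[['Diabetes']])) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c('#999999', '#E69F00')) +
  scale_colour_manual(values = c('#999999', '#E69F00')) +
  # This layer uses its own data, so .data refers to `means` here.
  geom_vline(data = means,
             aes(xintercept = .data[['Age']], colour = .data[['Diabetes']]))
```

Printing p draws one line per row of means, coloured to match the corresponding density fill.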

