I am facing a problem that I can't seem to figure out, even after searching the web.
Given a data frame:
diabetes <- data.frame(Age = c(20,23,45,77), Diabetes = c('no', 'no', 'yes', 'yes'))
$ Age <dbl> 20, 23, 45, 77
$ Diabetes <fct> no, no, yes, yes
I am trying to plot density curves for the two Diabetes outcomes and include vertical lines at the means.
plot_numeric <- function(dataset, predictor, outcome){
  p1 <- dataset %>% ggplot(aes_string(x = predictor)) +
    geom_density(fill = 'gray', alpha = 0.5) +
    theme_fivethirtyeight()
  p2 <- dataset %>% ggplot(aes_string(x = predictor, fill = outcome)) +
    geom_density(alpha = 0.5) +
    scale_fill_manual(values = c('#999999', '#E69F00')) +
    geom_vline(aes_string(xintercept = mean(predictor[outcome == 'no'])), color = '999999') +
    geom_vline(aes_string(xintercept = mean(predictor[outcome == 'yes'])), color = '#E69F00') +
    theme_fivethirtyeight()
  gridExtra::grid.arrange(p1,p2)
}
plot_numeric(diabetes, 'Age', 'Diabetes')
I am receiving the error "argument is not numeric or logical: returning NA" and the vertical lines for the mean are not included.
Everything works fine when doing the plot outside a function.
Any ideas on how to fix this would be much appreciated.
CodePudding user response:
The aes_string() function is no longer recommended; instead, people are encouraged to use tidy evaluation syntax, as described in the vignette.
In your case, mean(predictor[outcome == 'no']) tries to calculate the mean of a length-1 character vector (predictor) subsetted by a comparison on another length-1 character vector (outcome), rather than the mean of a data column.
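To see why, here is a small sketch of what R evaluates inside the function when the column names arrive as strings. The variable names simply mirror the function arguments, and the intermediate names keep, subset_values and result are made up for illustration:

```r
# Inside plot_numeric(), the arguments are plain strings, not columns.
predictor <- "Age"
outcome <- "Diabetes"

# Comparing the string "Diabetes" with "no" gives a single FALSE ...
keep <- outcome == "no"

# ... so the subset is an empty character vector.
subset_values <- predictor[keep]

# mean() of a character vector warns
# ("argument is not numeric or logical: returning NA") and returns NA.
result <- suppressWarnings(mean(subset_values))
print(result)
```

The data frame is never consulted here, which is why the same expression works outside the function, where Age and Diabetes refer to real columns.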
To work around the shortcomings of aes_string(), you can use the curly-curly operator ({{ }}).
library(ggplot2)
library(magrittr)
diabetes <- data.frame(Age = c(20,23,45,77), Diabetes = c('no', 'no', 'yes', 'yes'))
plot_numeric <- function(dataset, predictor, outcome){
  # Overall density of the predictor
  p1 <- dataset %>% ggplot(aes(x = {{predictor}})) +
    geom_density(fill = 'gray', alpha = 0.5)
  # Density per outcome, with a vertical line at each group mean
  p2 <- dataset %>% ggplot(aes(x = {{predictor}}, fill = {{outcome}})) +
    geom_density(alpha = 0.5) +
    scale_fill_manual(values = c('#999999', '#E69F00')) +
    geom_vline(
      aes(xintercept = mean({{predictor}}[{{outcome}} == 'no'])),
      color = '#999999'
    ) +
    geom_vline(
      aes(xintercept = mean({{predictor}}[{{outcome}} == 'yes'])),
      color = '#E69F00'
    )
  gridExtra::grid.arrange(p1,p2)
}
plot_numeric(diabetes, Age, Diabetes)
Created on 2021-09-21 by the reprex package (v2.0.1)
Alternatively, if you prefer to give your column names as strings, you can use the .data pronoun:
library(ggplot2)
library(magrittr)
diabetes <- data.frame(Age = c(20,23,45,77), Diabetes = c('no', 'no', 'yes', 'yes'))
plot_numeric <- function(dataset, predictor, outcome){
  # Overall density of the predictor
  p1 <- dataset %>% ggplot(aes(x = .data[[predictor]])) +
    geom_density(fill = 'gray', alpha = 0.5)
  # Density per outcome, with a vertical line at each group mean
  p2 <- dataset %>% ggplot(aes(x = .data[[predictor]], fill = .data[[outcome]])) +
    geom_density(alpha = 0.5) +
    scale_fill_manual(values = c('#999999', '#E69F00')) +
    geom_vline(
      aes(xintercept = mean(.data[[predictor]][.data[[outcome]] == 'no'])),
      color = '#999999'
    ) +
    geom_vline(
      aes(xintercept = mean(.data[[predictor]][.data[[outcome]] == 'yes'])),
      color = '#E69F00'
    )
  gridExtra::grid.arrange(p1,p2)
}
plot_numeric(diabetes, "Age", "Diabetes")
Created on 2021-09-21 by the reprex package (v2.0.1)
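As a quick sanity check on where the vertical lines should land, the group means can also be computed directly in base R. This is only a sketch for verifying the plot, not part of the plotting function; group_means is a name made up for illustration:

```r
diabetes <- data.frame(
  Age = c(20, 23, 45, 77),
  Diabetes = c("no", "no", "yes", "yes")
)

# Mean of Age within each Diabetes group
group_means <- tapply(diabetes$Age, diabetes$Diabetes, mean)
print(group_means)
```

With the example data, the lines should appear at 21.5 for 'no' and at 61 for 'yes'.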