I am working with a large dataset named "BRFSS2015", and my assignment requires that I find the average "Number of Days Mental Health Not Good" for respondents in Pennsylvania who have numeric data. I started with the following code, but an error keeps appearing because the "_" in _STATE is not accepted:
BRFSS2015%>%
filter(_STATE == "Pennsylvania")
Error: unexpected input in: "BRFSS2015%>% filter(_"
I have also tried backticks and gsub() on it but to no avail. Please help.
CodePudding user response:
The issue is the non-syntactic column name, i.e. a name that starts with a digit, an underscore, or other punctuation. R cannot parse such a name as a bare symbol, so wrap the column name in backticks:
library(dplyr)
BRFSS2015 %>%
  filter(`_STATE` == "Pennsylvania")
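To get from the filter to the average the question asks for, a minimal sketch follows. It runs on a toy tibble rather than the real BRFSS2015 data. The column name `MENTHLTH` and the rule that values 1 to 30 are the "numeric" answers are assumptions made for illustration, so check both against the codebook for your file.

```r
library(dplyr)

# Toy stand-in for BRFSS2015; the column names and codes here are assumptions.
toy <- tibble(
  `_STATE` = c("Pennsylvania", "Pennsylvania", "Pennsylvania", "Ohio"),
  MENTHLTH = c(5, 10, 99, 3)  # 99 stands in for a non-numeric code such as "refused"
)

pa_mean <- toy %>%
  filter(
    `_STATE` == "Pennsylvania",   # backticks make the odd name parseable
    between(MENTHLTH, 1, 30)      # keep only actual day counts (assumed range)
  ) %>%
  summarise(mean_days = mean(MENTHLTH))

pa_mean
```

If your data stores missing answers as `NA` instead of special codes, `mean(MENTHLTH, na.rm = TRUE)` would be the usual alternative to the range filter.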
CodePudding user response:
You have to surround the variable name with backticks. Here is a small example:
library(dplyr)
tb = tibble(value = rnorm(5), "_state" = letters[1:5])
tb
#> # A tibble: 5 x 2
#> value `_state`
#> <dbl> <chr>
#> 1 -0.769 a
#> 2 0.273 b
#> 3 -1.32 c
#> 4 -0.0795 d
#> 5 -0.701 e
# Without backticks
tb %>%
  filter(_state == "a")
#> Error: unexpected input in:
#> "tb %>%
#> filter(_"
# With backticks
tb %>%
  filter(`_state` == "a")
#> # A tibble: 1 x 2
#> value `_state`
#> <dbl> <chr>
#> 1 -0.769 a
Or avoid this issue by renaming the variable, e.g. with rename():
tb %>%
  rename(state = `_state`)
#> # A tibble: 5 x 2
#> value state
#> <dbl> <chr>
#> 1 -0.573 a
#> 2 -1.43 b
#> 3 -1.26 c
#> 4 -0.313 d
#> 5 1.06 e
Created on 2022-04-10 by the reprex package (v0.3.0)
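If a dataset has many such columns, renaming them one by one gets tedious. As a sketch (the toy tibble and its `_year` column are made up for illustration), `rename_with()` from dplyr 1.0.0 or later can strip the leading underscore from every affected column in one step:

```r
library(dplyr)

# Hypothetical example data with two non-syntactic column names
tb <- tibble(value = 1:3, `_state` = letters[1:3], `_year` = 2015L)

clean <- tb %>%
  # apply sub() only to the columns whose names start with "_"
  rename_with(~ sub("^_", "", .x), .cols = starts_with("_"))

names(clean)
```

After this, the columns can be used in `filter()` and friends without backticks, provided the stripped names do not collide with existing ones.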