I noticed a behavior of the filter() function that bothered me a bit.
I want to select rows from a tibble that match a simple criterion: they have a particular value in a particular column, and that value is stored in a variable. Super easy stuff. The problem is that my variable has the same name as a column of the tibble, which produces behavior I didn't expect:
> block
[1] 41
> msg %>% filter(text == "trial_run" & block == block)
# A tibble: 42 × 4
...1 block time text
<dbl> <dbl> <dbl> <chr>
1 14 1 1149175 trial_run
2 30 2 1164422 trial_run
3 46 3 1193408 trial_run
4 62 4 1199713 trial_run
5 78 5 1211763 trial_run
6 94 6 1218312 trial_run
7 110 7 1222947 trial_run
8 126 8 1236795 trial_run
9 142 9 1247513 trial_run
10 158 10 1254297 trial_run
# … with 32 more rows
I assume that block == block compares the column with itself, which is always TRUE (like 1 == 1), so my block criterion wasn't applied at all. A simple workaround is to rename the block variable:
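To make the cause concrete, here is a small sketch using a made-up five-row data frame (not the question's data). Inside filter(), names are looked up in the data first (dplyr calls this data masking), so both sides of block == block refer to the column, and the comparison is TRUE for every non-NA row:

```r
library(dplyr)

# Made-up illustrative data, not the data from the question
msg <- data.frame(block = 1:5, text = "trial_run")
block <- 3

# Both `block`s resolve to the column, so every row is kept
all_rows <- msg %>% filter(text == "trial_run" & block == block)
nrow(all_rows)

# A differently named variable is not masked by a column, so it is found in the environment
blk <- block
one_row <- msg %>% filter(text == "trial_run" & block == blk)
nrow(one_row)
```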
> blk <- block
> msg %>% filter(text == "trial_run" & block == blk)
# A tibble: 1 × 4
...1 block time text
<dbl> <dbl> <dbl> <chr>
1 654 41 1513347 trial_run
But I feel that this situation could lead to a hard-to-track bug in the future. What am I doing wrong, and how can I avoid this problem (aside from always using distinct variable and column names)?
Answer:
The filter() function supports the .data and .env "pronouns", which you can use to make it clear where each variable comes from. Try:
msg %>% filter(text == "trial_run" & .data$block == .env$block)
Here .data$block refers to the column of the data frame, and .env$block refers to the variable in the environment where the expression was written. You can read more about this in the rlang help for the .data and .env pronouns, which includes an example just like this one.
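The pronouns matter most inside functions, where an argument can easily share a name with a column. A minimal sketch, assuming a hypothetical helper filter_block() whose block argument is deliberately named after the column. There, .env$block picks up the argument from the function's own environment:

```r
library(dplyr)

# Hypothetical helper: the argument `block` deliberately shares the column's name
filter_block <- function(df, block) {
  df %>%
    # .data$block is the column, .env$block is the function argument
    filter(text == "trial_run" & .data$block == .env$block)
}

# Made-up illustrative data
msg <- data.frame(block = 1:5, text = "trial_run")

res <- filter_block(msg, 3)
res
```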
Answer:
We can either unquote the variable with !! ("bang-bang"), which injects its value into the expression before filter() evaluates it:
library(dplyr)
block <- 4
msg %>%
filter(text == "trial_run" & block == !!block)
-output
...1 block time text
4 62 4 1199713 trial_run
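To see what !! actually does, you can capture the same expression with rlang::expr() (rlang is installed as a dependency of dplyr). This only illustrates the injection step. The current value of block is substituted into the expression before anything is evaluated, so the column name no longer collides with the variable:

```r
block <- 4

# expr() captures the code unevaluated; !! splices in the value of `block`
e <- rlang::expr(block == !!block)
print(e)
```

The printed expression is block == 4, which is what filter() ends up evaluating against the data.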
or access the variable explicitly from the global environment:
msg %>%
filter(text == "trial_run" & block == .GlobalEnv$block)
-output
...1 block time text
4 62 4 1199713 trial_run
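One caveat with .GlobalEnv$block: it always reads the global variable, even inside a function that has its own block. A sketch with two hypothetical helpers, via_global() and via_env(), shows the difference (it assumes the code runs at the top level, so block <- 2 really is a global variable). Using .env$block (or !!block) picks up the local value instead:

```r
library(dplyr)

# Made-up illustrative data
msg <- data.frame(block = 1:5, text = "trial_run")
block <- 2  # global variable

# Hypothetical helpers; their `block` argument shadows the global one
via_global <- function(df, block) {
  df %>% filter(block == .GlobalEnv$block)  # ignores the argument, reads the global value
}
via_env <- function(df, block) {
  df %>% filter(block == .env$block)  # reads the function's own argument
}

via_global(msg, 3)$block
via_env(msg, 3)$block
```

With these definitions, via_global(msg, 3) silently returns the row for block 2, while via_env(msg, 3) returns the row for block 3.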
Data:
msg <- structure(list(...1 = c(14L, 30L, 46L, 62L, 78L, 94L, 110L, 126L,
142L, 158L), block = 1:10, time = c(1149175L, 1164422L, 1193408L,
1199713L, 1211763L, 1218312L, 1222947L, 1236795L, 1247513L, 1254297L
), text = c("trial_run", "trial_run", "trial_run", "trial_run",
"trial_run", "trial_run", "trial_run", "trial_run", "trial_run",
"trial_run")), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"))