How to avoid overwriting a variable by a tibble column name?

Time:11-04

I noticed some behavior of the filter() function that bothered me a bit.

I want to select rows from a tibble that match a simple criterion: they have a given value in a particular column. The value is stored in a variable. Super easy stuff. The problem is that the name of my variable is the same as a column name of the tibble, which produces behaviour that I didn't expect.

> block
[1] 41
> msg %>% filter(text == "trial_run" & block == block)
# A tibble: 42 × 4
    ...1 block    time text     
   <dbl> <dbl>   <dbl> <chr>    
 1    14     1 1149175 trial_run
 2    30     2 1164422 trial_run
 3    46     3 1193408 trial_run
 4    62     4 1199713 trial_run
 5    78     5 1211763 trial_run
 6    94     6 1218312 trial_run
 7   110     7 1222947 trial_run
 8   126     8 1236795 trial_run
 9   142     9 1247513 trial_run
10   158    10 1254297 trial_run
# … with 32 more rows

I assume that block == block is a tautology, like 1 == 1, because both sides resolve to the column, so my block criterion isn't applied. A simple solution is to rename the block variable.

> blk <- block
> msg %>% filter(text == "trial_run" & block == blk)
# A tibble: 1 × 4
   ...1 block    time text     
  <dbl> <dbl>   <dbl> <chr>    
1   654    41 1513347 trial_run

But I feel that this situation may lead me to a hard-to-track bug in the future. What am I doing wrong? How can I avoid this problem in the future (aside from making unique variable and column names)?

CodePudding user response:

The filter command has .data and .env "pronouns" that you can use to make it clear where variables are coming from. Try

msg %>% filter(text == "trial_run" & .data$block == .env$block)

The .data pronoun means the value comes from the data frame, and .env means it comes from the environment. You can read more about them in the rlang help, which has an example just like this one.
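As a minimal, self-contained sketch of the difference (the toy tibble below is made up for illustration and is not the asker's data), the masked and the explicit versions can be compared side by side:

```r
library(dplyr)

# Toy data, loosely modelled on the question; values are invented
msg <- tibble(
  block = c(1, 2, 41, 41),
  text  = c("trial_run", "trial_run", "trial_run", "other")
)
block <- 41

# Data masking: both `block`s resolve to the column, so the test is TRUE for every row
masked <- msg %>% filter(text == "trial_run" & block == block)

# Pronouns make the intent explicit: column on the left, variable on the right
explicit <- msg %>% filter(text == "trial_run" & .data$block == .env$block)

nrow(masked)
nrow(explicit)
```

Here the masked call keeps every "trial_run" row, while the explicit call keeps only the row whose block equals the variable.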

CodePudding user response:

We can either inject the value of the variable with !!

library(dplyr)
block <- 4
msg %>% 
    filter(text == "trial_run" & block == !!block)

-output

  ...1 block    time      text
4   62     4 1199713 trial_run

or access the variable from the global environment (this only works when block is defined there, not when it is local to a function)

msg %>% 
    filter(text == "trial_run" & block == .GlobalEnv$block)

-output

  ...1 block    time      text
4   62     4 1199713 trial_run
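The same name clash often shows up inside functions, where the value is an argument rather than a global. A small sketch, assuming a hypothetical helper pick_block() and a reduced version of the data:

```r
library(dplyr)

# Reduced example data; only the columns needed for the filter
msg <- data.frame(block = 1:10, text = "trial_run")

# Hypothetical helper: `block` is a function argument that shadows the column name
pick_block <- function(data, block) {
  # !!block injects the argument's value before the column lookup happens
  data %>% filter(text == "trial_run" & block == !!block)
}

res <- pick_block(msg, 4)
res
```

Writing .data$block == .env$block inside the helper should behave the same way, and it states on both sides of the comparison where each name comes from.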

data

msg <- structure(list(...1 = c(14L, 30L, 46L, 62L, 78L, 94L, 110L, 126L, 
142L, 158L), block = 1:10, time = c(1149175L, 1164422L, 1193408L, 
1199713L, 1211763L, 1218312L, 1222947L, 1236795L, 1247513L, 1254297L
), text = c("trial_run", "trial_run", "trial_run", "trial_run", 
"trial_run", "trial_run", "trial_run", "trial_run", "trial_run", 
"trial_run")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))