Add column to dataframe to show if an element in that row is in a certain list in R

Time:09-28

I have a dataframe df (tibble in my case) in R and several files in a given directory which have a loose correspondence with the elements of one of the columns in df. I want to track which rows in df correspond to these files by adding a column has_file.

Here's what I've tried.

# SETUP
library(tidyverse)  # provides tibble(), mutate(), %>%, str_c(), str_remove()
dir.create("temp")
setwd("temp")
LETTERS[1:4] %>% 
  str_c(., ".png") %>% 
  file.create()

df <- tibble(x = LETTERS[3:6])

file_list <- list.files()

# ATTEMPT
df %>% 
  mutate(
    has_file = file_list %>% 
      str_remove(".png") %>% 
      is.element(x, .) %>% 
      any()
  )

# RESULT
# A tibble: 4 x 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     TRUE    
4 F     TRUE

I would expect that only the rows with C and D get values of TRUE for has_file, but E and F do as well.

What is happening here, and how may I generate this correspondence in a column?

(Tidyverse solution preferred.)

CodePudding user response:

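Inside mutate(), x refers to the whole column, so is.element(x, .) returns one TRUE/FALSE per row. any() then collapses that vector to a single value, and because two of the elements are TRUE, the result is TRUE, which mutate() recycles to fill the entire column. One fix is to add rowwise() at the top so the expression is evaluated one row at a time. With rowwise() there is no need for any(), because is.element() already returns a single TRUE/FALSE for each element of the x column.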

df %>% 
  rowwise() %>%
  mutate(
    has_file = file_list %>% 
      str_remove(".png") %>% 
      is.element(x, .)
  ) %>%
  ungroup()
# A tibble: 4 × 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     FALSE   
4 F     FALSE   

To see what any() changes, compare the output with and without it:

> is.element(df$x,  LETTERS[1:4])
[1]  TRUE  TRUE FALSE FALSE
> any(is.element(df$x,  LETTERS[1:4]))
[1] TRUE
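
The recycling step can also be seen in isolation. Below is a minimal base-R sketch, assuming the same letters as in the question; the names stems, in_list, collapsed, and demo are illustrative only and do not appear in the original code.

```r
x <- c("C", "D", "E", "F")
stems <- c("A", "B", "C", "D")   # stand-in for the file names without ".png"

in_list <- is.element(x, stems)  # one logical per element of x
collapsed <- any(in_list)        # a single logical for the whole vector

# A length-1 value is recycled to every row when used as a column,
# which is why has_file came out TRUE everywhere in the question.
demo <- data.frame(x = x, elementwise = in_list, collapsed = collapsed)
print(demo)
```

The elementwise column is what the question was after; the collapsed column reproduces the unexpected result.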

Alternatively, map_lgl() from purrr can apply the check to each element of x in turn:

library(purrr)
df %>% 
   mutate(has_file = map_lgl(x, ~ file_list %>% 
                str_remove(".png") %>% 
                is.element(.x, .)))
# A tibble: 4 × 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     FALSE   
4 F     FALSE   

Or, for a fully vectorized option, use %in% directly instead of is.element(). No rowwise() or map is needed:

df %>% 
   mutate(has_file = x %in% str_remove(file_list, ".png"))
# A tibble: 4 × 2
  x     has_file
  <chr> <lgl>   
1 C     TRUE    
2 D     TRUE    
3 E     FALSE   
4 F     FALSE   
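
One possible refinement, offered as a sketch rather than as part of the original answer: the pattern ".png" is treated as a regular expression, so the dot matches any character and the match is not anchored to the end of the name. For the files in the question this makes no difference. The base-R example below uses made-up file names (including a hypothetical "Apng.png") to show where it could matter, and uses tools::file_path_sans_ext() to strip only the trailing extension.

```r
files <- c("A.png", "B.png", "C.png", "D.png", "Apng.png")  # hypothetical names

# Unanchored regex: "." matches any character, so the first match in
# "Apng.png" is "Apng", leaving ".png" behind.
loose <- sub(".png", "", files)

# Strip only the trailing extension.
strict <- tools::file_path_sans_ext(files)

x <- c("C", "D", "E", "F")
has_file <- x %in% strict
print(has_file)
```

Inside mutate(), the same idea would read has_file = x %in% tools::file_path_sans_ext(file_list).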