I have a data frame df (a tibble in my case) in R, and several files in a given directory that loosely correspond to the elements of one of the columns in df. I want to track which rows in df correspond to these files by adding a column has_file.
Here's what I've tried.
# SETUP
library(tidyverse)  # provides tibble(), %>%, str_c(), str_remove(), mutate()
dir.create("temp")
setwd("temp")
LETTERS[1:4] %>%
  str_c(., ".png") %>%
  file.create()
df <- tibble(x = LETTERS[3:6])
file_list <- list.files()
# ATTEMPT
df %>%
  mutate(
    has_file = file_list %>%
      str_remove(".png") %>%
      is.element(x, .) %>%
      any()
  )
# RESULT
# A tibble: 4 x 2
x has_file
<chr> <lgl>
1 C TRUE
2 D TRUE
3 E TRUE
4 F TRUE
I would expect only the rows with C and D to get TRUE for has_file, but E and F get TRUE as well.
What is happening here, and how can I generate this correspondence in a column?
(Tidyverse solution preferred.)
Answer:
We can add rowwise() at the top. The problem is that any() is evaluated on the whole column: is.element(x, .) returns a logical vector with one value per row, two of which are TRUE, so any() returns a single TRUE, and mutate() recycles that one value to fill the whole column. With rowwise(), there is no need for any(), as is.element() then returns a single TRUE/FALSE for each element of the x column:
df %>%
  rowwise() %>%
  mutate(
    has_file = file_list %>%
      str_remove(".png") %>%
      is.element(x, .)
  ) %>%
  ungroup()
# A tibble: 4 × 2
x has_file
<chr> <lgl>
1 C TRUE
2 D TRUE
3 E FALSE
4 F FALSE
To see the difference, compare is.element() with and without any():
> is.element(df$x, LETTERS[1:4])
[1] TRUE TRUE FALSE FALSE
> any(is.element(df$x, LETTERS[1:4]))
[1] TRUE
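The same collapse happens inside mutate(). As a minimal sketch with a toy tibble (the names toy, targets and res are illustrative, not from the question), the first new column keeps one value per row, while any() reduces its input to a single value that mutate() recycles to every row:

```r
library(dplyr)

# Toy data mirroring the question: values C..F, "files" A..D
toy <- tibble(x = c("C", "D", "E", "F"))
targets <- c("A", "B", "C", "D")

res <- toy %>%
  mutate(
    per_row   = is.element(x, targets),      # length 4: one value per row
    collapsed = any(is.element(x, targets))  # length 1: recycled to every row
  )

print(res)
```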
We may also use map_lgl() from purrr to do this:
library(purrr)
df %>%
  mutate(has_file = map_lgl(x, ~ file_list %>%
    str_remove(".png") %>%
    is.element(.x, .)))
# A tibble: 4 × 2
x has_file
<chr> <lgl>
1 C TRUE
2 D TRUE
3 E FALSE
4 F FALSE
Or, for a vectorized option, skip rowwise() and use %in% directly. (is.element(x, table) is equivalent to x %in% table, so the original attempt also works once any() is dropped.)
df %>%
  mutate(has_file = x %in% str_remove(file_list, ".png"))
# A tibble: 4 × 2
x has_file
<chr> <lgl>
1 C TRUE
2 D TRUE
3 E FALSE
4 F FALSE
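A self-contained variant of the vectorized approach, sketched under assumptions that are not in the question: the files live in a fresh temporary folder (so setwd() is not needed), and only .png files should count. The folder name png_dir is hypothetical. list.files(pattern = ) restricts the listing to .png files, and the anchored regex "\\.png$" strips only the trailing extension; by contrast, ".png" as a regex matches any character followed by "png", which can strip the wrong part of an unusual file name.

```r
library(dplyr)
library(stringr)

# Hypothetical folder under the session's temp dir, instead of setwd("temp")
png_dir <- file.path(tempdir(), "png_files")
dir.create(png_dir, showWarnings = FALSE)
invisible(file.create(file.path(png_dir, str_c(LETTERS[1:4], ".png"))))

# List only .png files; list.files() returns bare names, not full paths
file_list <- list.files(png_dir, pattern = "\\.png$")

df <- tibble(x = LETTERS[3:6])

# Strip the anchored extension, then test membership row by row (vectorized)
res <- df %>%
  mutate(has_file = x %in% str_remove(file_list, "\\.png$"))

print(res)
```

If stringr is not wanted, tools::file_path_sans_ext(file_list) is another way to drop the extension.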