I'm struggling to understand why my code below works only when using rowwise() in combination with ifelse(). Or, more precisely, I think I get why it works in that scenario, but not why it doesn't simply work with if_else().
What I'm doing is checking whether a row contains the word "infile" or "outfile" and whether it has a relative path (".."). If it contains "infile"/"outfile" and does not have a relative path, then it has an absolute path ("C:"), and in that case I want to replace the user name with something else (here: "test").
Any ideas?
Data:
library(dplyr)
library(stringr)
df <- structure(list(value = c("infile 'C:\\Users\\USER\\folder\\Data.sav'",
"infile '..\\folder\\Data.sav'", "outfile '..\\folder\\Data.sav'",
"test", "")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L))
user_name <- "test"
Code that works:
df |>
rowwise() |>
mutate(value = ifelse(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
str_replace(value,
str_sub(value,
str_locate_all(value, "\\\\")[[1]][2] + 1,
str_locate_all(value, "\\\\")[[1]][3] - 1),
user_name),
value)) |>
ungroup()
with output:
# A tibble: 5 × 1
value
<chr>
1 "infile 'C:\\Users\\test\\folder\\Data.sav'"
2 "infile '..\\folder\\Data.sav'"
3 "outfile '..\\folder\\Data.sav'"
4 "test"
5 ""
Code that doesn't work (if_else() without rowwise()):
df |>
mutate(value = if_else(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
str_replace(value,
str_sub(value,
str_locate_all(value, "\\\\")[[1]][2] + 1,
str_locate_all(value, "\\\\")[[1]][3] - 1),
user_name),
value))
I think this works, but it gives warning messages:
Warning messages:
1: Problem while computing `value = if_else(...)`.
ℹ empty search patterns are not supported
2: Problem while computing `value = if_else(...)`.
ℹ empty search patterns are not supported
Code that doesn't work (if_else() with rowwise()):
df |>
rowwise() |>
mutate(value = if_else(str_detect(value, "infile|outfile") & !str_detect(value, "\\'\\.\\.\\\\"),
str_replace(value,
str_sub(value,
str_locate_all(value, "\\\\")[[1]][2] + 1,
str_locate_all(value, "\\\\")[[1]][3] - 1),
user_name),
value)) |>
ungroup()
Error in `mutate()`:
! Problem while computing `value = if_else(...)`.
ℹ The error occurred in row 2.
Caused by error:
! Empty `pattern` not supported
Answer:
Here is one way (my substitution of USER is very simple; I'm not sure whether it needs to be more generic):
df %>%
tidyr::separate(value, into = c('Type', 'Path'), sep = ' ') %>%
dplyr::mutate(
Value = dplyr::if_else(
(Type %in% c('infile', 'outfile')) & !startsWith(Path, "'.."),
stringr::str_replace(Path, 'USER', user_name),
Path
)
)
I split the value column to make the check easier.
If you need to replace whatever the user name is with the variable, you can do it like this (here using a back-reference in the regular expression):
df %>%
tidyr::separate(value, into = c('Type', 'Path'), sep = ' ') %>%
dplyr::mutate(
Value = dplyr::if_else(
(Type %in% c('infile', 'outfile')) & !startsWith(Path, "'.."),
sub("^('C:\\\\Users\\\\)([[:alnum:]]+)\\\\", paste0('\\1', user_name, '\\\\'), Path),
Path
)
)
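An alternative that avoids both rowwise() and if_else() is to let one vectorised str_replace() do the detection and the substitution together: rows that don't match the pattern are simply returned unchanged, so relative paths, "test" and "" are left alone. This is a sketch that assumes every absolute path looks like C:\Users\<name>\...; the pattern below is my own illustration, not part of the answer above.

```r
library(stringr)

value <- c("infile 'C:\\Users\\USER\\folder\\Data.sav'",
           "infile '..\\folder\\Data.sav'",
           "outfile '..\\folder\\Data.sav'",
           "test", "")
user_name <- "test"

# Group 1 captures "infile 'C:\Users\" or "outfile 'C:\Users\";
# [^\\]+ then matches the user-name segment up to the next backslash.
# Assumption: absolute paths always start with C:\Users\<name>\.
pattern <- "^((?:infile|outfile) 'C:\\\\Users\\\\)[^\\\\]+"

# Non-matching rows (relative paths, "test", "") come back unchanged
result <- str_replace(value, pattern, paste0("\\1", user_name))
result
```

Inside the pipeline this would be df |> mutate(value = str_replace(value, pattern, paste0("\\1", user_name))). If user_name could ever contain $ or \, it would need escaping before being pasted into the replacement.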
Answer:
Basically, the issue is that without rowwise(), value inside mutate() is the whole column, so str_locate_all(value, "\\\\")[[1]] always picks out the backslash positions of the first string in df$value and uses those same start and end indices for every row.
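If you want to keep the position-based approach but drop rowwise(), one option is to keep the whole list returned by str_locate_all() and extract the cut points from each element. A minimal sketch using vapply(); the "fewer than three backslashes gives NA" rule is my assumption about what should happen to rows like "test" and "":

```r
library(stringr)

value <- c("infile 'C:\\Users\\USER\\folder\\Data.sav'",
           "infile '..\\folder\\Data.sav'",
           "outfile '..\\folder\\Data.sav'",
           "test", "")

# One start/end matrix of backslash positions per string, not just the first
pos <- str_locate_all(value, "\\\\")

# Just after the 2nd backslash and just before the 3rd one;
# NA when a string has fewer than three backslashes (assumed behaviour)
l1 <- vapply(pos, function(m) if (nrow(m) >= 3) as.integer(m[2, "start"]) + 1L else NA_integer_, integer(1))
l2 <- vapply(pos, function(m) if (nrow(m) >= 3) as.integer(m[3, "start"]) - 1L else NA_integer_, integer(1))

cuts <- str_sub(value, l1, l2)
cuts
```

Only the first string yields a user-name segment ("USER", positions 18 to 21); the others give NA instead of the misleading substrings shown in the table further down.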
To debug, I'd suggest breaking the calculation out a bit:
df %>% rowwise() %>%
mutate(n=length(value), slen=str_length(value),
l1=str_locate_all(value,"\\\\")[[1]][2] + 1,
l2=str_locate_all(value,"\\\\")[[1]][3]-1,
ssub=str_sub(value, l1, l2),
detect=str_detect(value, "infile|outfile")& !str_detect(value,"\\'\\.\\.\\\\"),
vout=if_else(detect, ssub, user_name))
# A tibble: 5 × 8
# Rowwise:
value n slen l1 l2 ssub detect vout
<chr> <int> <int> <dbl> <dbl> <chr> <lgl> <chr>
1 "infile 'C:\\Users\\USER\\folder\\Data.sav'" 1 38 18 21 "USER" TRUE USER
2 "infile '..\\folder\\Data.sav'" 1 27 19 10 "" FALSE test
3 "outfile '..\\folder\\Data.sav'" 1 28 20 11 "" FALSE test
4 "test" 1 4 NA NA NA FALSE test
5 "" 1 0 NA NA NA FALSE test
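The l1 = 19, l2 = 10 in row 2 come from how [[1]][2] and [[1]][3] index into the matrix that str_locate_all() returns: single-bracket indexing walks the start/end matrix column by column, so with only two backslashes [3] is the end of the first match rather than the start of a third. A minimal sketch for that one row:

```r
library(stringr)

x <- "infile '..\\folder\\Data.sav'"  # row 2 of df

# Integer matrix with columns "start" and "end", one row per backslash;
# this string has two backslashes, at positions 11 and 18
m <- str_locate_all(x, "\\\\")[[1]]

# Column-major linear indexing: [1] = start 1, [2] = start 2, [3] = end 1
l1 <- m[2] + 1  # 19
l2 <- m[3] - 1  # 10

# start > end, so the extracted "user name" is the empty string
cut <- str_sub(x, l1, l2)
cut  # ""
```

That empty string is then used as the pattern in str_replace(), which is where the "Empty `pattern` not supported" error in the question comes from.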
Without rowwise(), on the other hand, mutate() receives all the strings in the value column at once, so it finds the same cut locations for every single row:
df %>%
mutate(n=length(value), slen=str_length(value),
l1=str_locate_all(value,"\\\\")[[1]][2] + 1,
l2=str_locate_all(value,"\\\\")[[1]][3]-1,
ssub=str_sub(value, l1, l2),
detect=str_detect(value, "infile|outfile")& !str_detect(value,"\\'\\.\\.\\\\"),
vout=if_else(detect, ssub, user_name))
# A tibble: 5 × 8
value n slen l1 l2 ssub detect vout
<chr> <int> <int> <dbl> <dbl> <chr> <lgl> <chr>
1 "infile 'C:\\Users\\USER\\folder\\Data.sav'" 5 38 18 21 "USER" TRUE USER
2 "infile '..\\folder\\Data.sav'" 5 27 18 21 "\\Dat" FALSE test
3 "outfile '..\\folder\\Data.sav'" 5 28 18 21 "r\\Da" FALSE test
4 "test" 5 4 18 21 "" FALSE test
5 "" 5 0 18 21 "" FALSE test
Once the locations used to subset your strings are calculated incorrectly, every row after the first is cut in the wrong place; I think you are just lucky that the empty-pattern error was thrown and exposed the problem.
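One point that may explain the ifelse()/if_else() difference directly: base R ifelse() only evaluates its yes argument if at least one element of the test is TRUE (and no only if at least one is FALSE), whereas dplyr::if_else() evaluates both true and false regardless of the condition. Under rowwise() each row is a length-one test, so for rows 2 to 5 ifelse() never runs the str_replace() call that would receive an empty pattern, while if_else() does. A minimal sketch of that difference; fail_if_evaluated() is a hypothetical helper used only to show which branch gets evaluated:

```r
library(dplyr)

# Hypothetical helper: errors whenever this branch is actually evaluated
fail_if_evaluated <- function() stop("true branch was evaluated")

# Base ifelse(): the test is a single FALSE, so `yes` is never forced
base_result <- ifelse(FALSE, fail_if_evaluated(), "kept")

# dplyr::if_else(): `true` is evaluated even though the condition is FALSE
dplyr_result <- tryCatch(
  if_else(FALSE, fail_if_evaluated(), "kept"),
  error = function(e) conditionMessage(e)
)

base_result   # "kept"
dplyr_result  # the error message raised by fail_if_evaluated()
```

So rowwise() plus ifelse() works because it sidesteps evaluating the failing branch, not because the cut positions are valid for every row.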