I have a wide data frame similar to a document-term matrix:
df_names_tkns <- tibble::tribble(
~name, ~aaa, ~ddd, ~downing, ~eee, ~london, ~street, ~bbb, ~broadway, ~ccc, ~new, ~york,
"AAA LONDON DOWNING STREET DDD EEE", 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, NA,
"AAA NEW YORK BROADWAY BBB CCC", 1L, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L
)
In every column except name, I would like to replace each value where x > 0 with the name of that column. What would be the correct syntax? I have tried the following two approaches, one with if and one with case_when.
df_names_tkns2 <- df_names_tkns |>
mutate(across(2:ncol(df_names_tkns),
function (x) if (x > 0) cur_column(x) else x))
The error message:
Caused by error in `across()`:
! Problem while computing column `aaa`.
Caused by error in `if (x > 0) ...`:
! the condition has length > 1
Or I tried:
df_names_tkns2 <- df_names_tkns |>
mutate(
across(
2:ncol(df_names_tkns),
~ case_when(.x > 1 ~ cur_column(.x)) )
)
)
The error message:
Caused by error in `across()`:
! Problem while computing column `aaa`.
Caused by error in `cur_column()`:
! unused argument (aaa)
Apparently I am not using the right syntax for writing the function. What would be the correct way?
CodePudding user response:
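The error you're getting is caused by passing a vector with length greater than one to if(). Inside mutate(across(...)), your function receives a whole column, so x > 0 is a logical vector with one element per row, whereas if() expects a single TRUE or FALSE.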
For example:
if(c(TRUE, FALSE)) print("true")
# Error in if (c(TRUE, FALSE)) print("true") : the condition has length > 1
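To make the connection to the data frame concrete, here is a minimal sketch of what the comparison looks like for one column. The vector x is a made-up stand-in for a column such as ddd, not something taken from your session:

```r
# Hypothetical stand-in for one column of the data frame (e.g. ddd)
x <- c(1L, NA)

# The comparison is evaluated element by element
cond <- x > 0
cond            # TRUE NA
length(cond)    # 2, so it cannot be used directly as an if() condition

# A vectorised alternative handles every element, including the NA
labelled <- ifelse(cond, "ddd", NA_character_)
labelled        # "ddd" NA
```

Note that the NA in the column stays NA in the result, which is usually what you want for a document-term matrix.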
To avoid this you can either:
- Group your data frame by row
- Use vectorised logic
1. Group data frame by row
Using dplyr::rowwise(). This will ensure that only one value is passed to if() at a time.
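df_names_tkns2 <-
  df_names_tkns |>
  rowwise() |>
  mutate(across(2:ncol(df_names_tkns),
  # ^ NB consider across(2:last_col(), ...) for brevity
  # isTRUE() guards against NA, which if() cannot evaluate;
  # cur_column() is called without arguments
  function (x) if (isTRUE(x > 0)) cur_column() else NA_character_)) |>
  ungroup()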
2. Use vectorised logic
if_else() is a function that vectorises if()/else logic. So you could do:
df_names_tkns2 <- df_names_tkns |>
  mutate(across(2:ncol(df_names_tkns),
  # if_else() needs both branches to share a type, hence as.character(x)
  function (x) if_else(x > 0, cur_column(), as.character(x))))
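For reference, a self-contained sketch of the vectorised approach on a cut-down version of your data (three token columns instead of eleven, chosen only to keep the example short). It assumes dplyr and tibble are installed and R 4.1 or later for the |> and \(x) syntax, and it returns NA_character_ for anything that is not greater than zero, which is an assumption about what you want for such values:

```r
library(dplyr)

# Reduced version of the original data frame (illustrative subset)
df_small <- tibble::tribble(
  ~name,     ~aaa, ~ddd, ~bbb,
  "AAA DDD", 1L,   1L,   NA,
  "AAA BBB", 1L,   NA,   1L
)

df_small2 <- df_small |>
  mutate(across(
    -name,  # every column except name
    # Both branches are character, so if_else() is satisfied;
    # rows where x is NA come out as NA
    \(x) if_else(x > 0, cur_column(), NA_character_)
  ))

df_small2$aaa  # "aaa" "aaa"
df_small2$ddd  # "ddd" NA
df_small2$bbb  # NA    "bbb"
```

Selecting with -name avoids hard-coding column positions, so the call keeps working if columns are added or reordered.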
A note about cur_column()
cur_column() does not take any arguments. The "unused argument" error in your second attempt came from calling cur_column(.x): the column was being handed to a function that accepts nothing. The same applies to function (x) cur_column(x) in your first attempt. Call it as cur_column() inside across() and it returns the name of the column currently being processed.
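A small sketch of how cur_column() behaves when called with no arguments. The data frame demo is invented for illustration, and rep() is used only to make the length of the result explicit:

```r
library(dplyr)

# Hypothetical two-column data frame
demo <- tibble::tibble(aaa = 1:2, bbb = 3:4)

# Inside across(), cur_column() yields the current column's name as a string
demo_names <- demo |>
  mutate(across(everything(), \(x) rep(cur_column(), length(x))))

demo_names$aaa  # "aaa" "aaa"
demo_names$bbb  # "bbb" "bbb"
```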
CodePudding user response:
Another possible solution, using purrr::imap_dfc():
df_names_tkns |>
mutate(purrr::imap_dfc(df_names_tkns[-1], ~ifelse(.x > 0, .y, .x)))
##> # A tibble: 2 × 12
##> name aaa ddd downing eee london street bbb broadway ccc new york
##> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
##> 1 AAA … aaa ddd downing eee london street NA NA NA NA NA
##> 2 AAA … aaa NA NA NA NA NA bbb broadway ccc new york