mutate across columns - if condition met return column name as value

Time:11-21

I have a wide data frame similar to document term matrix:

df_names_tkns <-  tibble::tribble(
                                ~name, ~aaa, ~ddd, ~downing, ~eee, ~london, ~street, ~bbb, ~broadway, ~ccc, ~new, ~york,
  "AAA LONDON DOWNING STREET DDD EEE",   1L,   1L,       1L,   1L,      1L,      1L,   NA,        NA,   NA,   NA,    NA,
      "AAA NEW YORK BROADWAY BBB CCC",   1L,   NA,       NA,   NA,      NA,      NA,   1L,        1L,   1L,   1L,    1L
  )
 

I would like to replace, in bulk across all columns, every value where x > 0 with the name of its column. What would be the correct syntax? I have tried the following two approaches, one with if and one with case_when.

   df_names_tkns2 <- df_names_tkns |> 
      mutate(across(2:ncol(df_names_tkns),
      function (x)  if (x > 0) cur_column(x) else x))

The error quote:

Caused by error in `across()`:
! Problem while computing column `aaa`.
Caused by error in `if (x > 0) ...`:
! the condition has length > 1

Or I tried

df_names_tkns2 <- df_names_tkns |> 
  mutate(
    across(
      2:ncol(df_names_tkns), 
      ~ case_when(.x > 1 ~ cur_column(.x))
    )
  )

Error quote:

Caused by error in `across()`:
! Problem while computing column `aaa`.
Caused by error in `cur_column()`:
! unused argument (aaa)

Apparently I am not using the right syntax for writing the function. What would be the correct way?

CodePudding user response:

The error you're getting is caused by passing a vector with length greater than one to if().

For example:

if(c(TRUE, FALSE)) print("true")
# Error in if (c(TRUE, FALSE)) print("true") : the condition has length > 1
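Inside across(), the function receives a whole column at once, so the comparison produces one logical value per row rather than a single TRUE or FALSE. A minimal sketch of this, using a hypothetical vector `x` that stands in for one column (since R 4.2.0 a condition of length greater than one is an error; earlier versions only warned):

```r
# Hypothetical stand-in for one column as across() would pass it
x <- c(1L, NA, 1L)

# The comparison is vectorised: one result per element, NA stays NA
cond <- x > 0
print(cond)

# if() needs exactly one TRUE/FALSE, but cond has three elements
print(length(cond))
```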

To avoid this you can either:

  1. Group your data frame by row
  2. Use vectorised logic

1. Group data frame by row

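Use dplyr::rowwise(). This ensures that only one value is passed to if() at a time. Because the data contains NA, the condition also needs an is.na() check (if (NA) is itself an error), and both branches should return character so that the rows can be combined into one column. Note that cur_column() is called without arguments.

df_names_tkns2 <-
      df_names_tkns |> 
      rowwise() |>
      mutate(across(2:ncol(df_names_tkns),
      # ^ NB consider across(2:last_col(), ...) for brevity
      function (x) if (!is.na(x) && x > 0) cur_column() else as.character(x))) |>
      ungroup()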

2. Use vectorised logic

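dplyr::if_else() is the vectorised counterpart of if/else. It requires both branches to have the same type, so the integer column has to be converted with as.character(). You could do:

df_names_tkns2 <- df_names_tkns |> 
      mutate(across(2:ncol(df_names_tkns),
      function (x) if_else(x > 0, cur_column(), as.character(x))))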
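To check how missing values pass through the vectorised approach, here is a minimal, self-contained sketch on a small hypothetical tibble (`toy` and its columns are invented for illustration, and dplyr and tibble are assumed to be installed). Since NA > 0 is NA, if_else() leaves those cells missing.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the tibble in the question
toy <- tibble::tibble(
  name = c("A B", "A C"),
  a    = c(1L, 1L),
  b    = c(1L, NA),
  c    = c(NA, 1L)
)

res <- toy |>
  mutate(across(
    -name,
    # both branches are character: the column name, or the value as text
    function(x) if_else(x > 0, cur_column(), as.character(x))
  ))

print(res)
```

Cells that were NA in the input remain NA in the result, now typed as character.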

A note about cur_column()

cur_column() does not take any arguments. The second error you were getting (unused argument) was caused by calling cur_column(.x) inside across(): the column was passed as an argument that cur_column() does not accept, so the call raised an error. Call it as cur_column() and it returns the name of the column currently being processed.
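As a small illustration of that point, the following sketch calls cur_column() with no arguments inside across() and pastes the returned name onto each value. The tibble `demo` is hypothetical and only serves the example.

```r
library(dplyr)

# Hypothetical two-column tibble
demo <- tibble::tibble(a = 1:2, b = 3:4)

# cur_column() is called with no arguments; it is only valid inside across()
labelled <- demo |>
  mutate(across(everything(), function(x) paste0(cur_column(), "_", x)))

print(labelled)
```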

CodePudding user response:

Another possible solution uses purrr::imap_dfc(), where .x is the column and .y is its name:

df_names_tkns |>
    mutate(purrr::imap_dfc(df_names_tkns[-1], ~ifelse(.x > 0, .y, .x)))

##>   # A tibble: 2 × 12
##>   name  aaa   ddd   downing eee   london street bbb   broadway ccc   new   york 
##>   <chr> <chr> <chr> <chr>   <chr> <chr>  <chr>  <chr> <chr>    <chr> <chr> <chr>
##> 1 AAA … aaa   ddd   downing eee   london street NA    NA       NA    NA    NA   
##> 2 AAA … aaa   NA    NA      NA    NA     NA     bbb   broadway ccc   new   york
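To make the roles of the two arguments explicit, here is a rough sketch using purrr::imap() on a plain named list, with the function arguments spelled out instead of .x and .y. The list `cols` and the argument names `values` and `nm` are hypothetical.

```r
library(purrr)

# Hypothetical named list standing in for the token columns
cols <- list(aaa = c(1L, NA), bbb = c(NA, 1L))

# imap() passes each element as the first argument and its name as the second
out <- imap(cols, function(values, nm) ifelse(values > 0, nm, values))

print(out)
```

Base ifelse() is less strict about types than dplyr::if_else(), which is why no explicit as.character() is needed here.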