I have a dataframe like this:
df <- data.frame(
Metric = c('WI', NA, 'MN', NA, 'CO', NA),
Eval = c('WI', NA, 'AK', NA, 'CO', NA),
colA = c(30, 'ABC', 45, 'DEF', 2, 'XYZ'),
colB = c(25, 'BEC', 23, 'FED', 50, 'HIJ')
)
I'd like to evaluate whether the Metric
column and the Eval
column are equal, and, if they are, fill everything to the right of the Metric
column with NAs so the result would look like this:
df_desired <- data.frame(
Metric = c('WI', NA, 'MN', NA, 'CO', NA),
Eval = c(NA, NA, 'AK', NA, NA, NA),
colA = c(NA, 'ABC', 45, 'DEF', NA, 'XYZ'),
colB = c(NA, 'BEC', 23, 'FED', NA, 'HIJ')
)
What's the best way to do this using R, ideally with a tidyverse
function? I tried using mutate
/across
, but defining the conditional here is throwing me off.
CodePudding user response:
Create a logical vector and do the assignment based on row/column index/names (base R
is more direct to do)
i1 <- with(df, Metric == Eval & !is.na(Metric) & !is.na(Eval))
df[i1, -1] <- NA
-output
> df
Metric Eval colA colB
1 WI <NA> <NA> <NA>
2 <NA> <NA> ABC BEC
3 MN AK 45 23
4 <NA> <NA> DEF FED
5 CO <NA> <NA> <NA>
6 <NA> <NA> XYZ HIJ
Or with dplyr
, create a column of logical vector ('i1'), loop across
the columns 'Eval' to 'colB', use case_when/ifelse/if_else/replace
to change the values to NA
based on the 'i1' and remove the temporary column by assigning to NULL
library(dplyr)
df %>%
mutate(i1 = Metric == Eval,
across(Eval:colB, ~ case_when(i1 ~ NA_character_, TRUE ~ .)),
i1 = NULL)
-output
Metric Eval colA colB
1 WI <NA> <NA> <NA>
2 <NA> <NA> ABC BEC
3 MN AK 45 23
4 <NA> <NA> DEF FED
5 CO <NA> <NA> <NA>
6 <NA> <NA> XYZ HIJ
CodePudding user response:
Use mutate()
and if_else()
to conditionally replace values:
df |>
mutate(colA = if_else(Metric == Eval, NA_character_, colA, missing = colA))
#> Metric Eval colA colB
#> 1 WI WI <NA> 25
#> 2 <NA> <NA> ABC BEC
#> 3 MN AK 45 23
#> 4 <NA> <NA> DEF FED
#> 5 CO CO <NA> 50
#> 6 <NA> <NA> XYZ HIJ
Note that we can't just use NA
, we have to match it to the existing column type. In your example colA
and colB
are character vectors, so it's NA_character_
. And we have to specify missing
to handle the NA == NA
case.
To generalise this across multiple columns, use across()
and wrap the if_else()
in an anonymous function:
df |>
mutate(across(Eval:colB, ~if_else(Metric == Eval, NA_character_, ., missing = .)))
#> Metric Eval colA colB
#> 1 WI <NA> <NA> <NA>
#> 2 <NA> <NA> ABC BEC
#> 3 MN AK 45 23
#> 4 <NA> <NA> DEF FED
#> 5 CO <NA> <NA> <NA>
#> 6 <NA> <NA> XYZ HIJ