Hi, I have a data frame, and in 13 columns of my df the values are coded from 1 to 5. I want to replace 1 and 2 with 0, replace 3 and 4 with 1, and drop 5. How can I change my current data without mutating each column one by one? There are 13 columns that need to be changed.
CodePudding user response:
You can try the code below (data borrowed from @benson23, thanks!)
> df[] <- (df >= 3) * NA^(df == 5)
> df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1 1 NA NA 0 0 1 0 NA NA 0 0 0 NA
2 1 1 1 0 0 NA 0 NA NA 0 0 0 NA
3 0 0 1 NA 1 1 0 1 1 1 1 0 1
4 0 0 0 1 1 0 1 0 NA 0 1 0 0
5 1 1 1 0 0 NA 1 0 0 NA 1 1 1
- df >= 3 yields a boolean matrix consisting of TRUE or FALSE
- NA^(df == 5) yields a matrix consisting of NA or 1, since NA^0 = 1 and NA^1 = NA, and this matrix acts as a mask
- The element-wise product of the two matrices retains the non-NA entries and also turns the boolean values into numerics
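The masking idea above can be checked on a small vector. This is only a minimal sketch; the toy vector x and the names keep_mask and is_high are made up for illustration and are not part of the answer.

```r
# Hypothetical toy vector covering every code from 1 to 5
x <- c(1, 2, 3, 4, 5)

keep_mask <- NA^(x == 5)   # NA where x is 5 (NA^1), 1 everywhere else (NA^0)
is_high   <- x >= 3        # TRUE for 3, 4 and 5; FALSE for 1 and 2

# Multiplying coerces TRUE/FALSE to 1/0, and the NA in the mask wins over the 5
recoded <- is_high * keep_mask
print(recoded)   # 0 0 1 1 NA
```

The same arithmetic is applied element-wise when x is replaced by a whole data frame, which is why no loop over the 13 columns is needed.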
CodePudding user response:
I would either perform the conversion as a two-step process (since there are two rules), or write a function that encapsulates your rules and apply it. I’ll be using ‘dplyr’ mutate in the following since that seems to be what you’re using:
Here’s the two-step process:
library(dplyr)

df |>
  mutate(across(everything(), ~ replace(.x, .x == 5L, NA))) |>  # 5 becomes NA
  mutate(across(everything(), ~ .x >= 3L))                      # TRUE for 3 and 4, FALSE for 1 and 2
And here it is using a function:
myrule = function (x) {
  if_else(x == 5L, NA, x >= 3L)
}
df |> mutate(across(everything(), myrule))
Here it is crucial that you give the function myrule a descriptive name that fits your problem domain.
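As one possible illustration of a more descriptive name, here is a base R variant of the same rule that returns 0/1 integers instead of TRUE/FALSE. The function name scale_to_binary and the toy data frame are assumptions made for this sketch, and it uses lapply rather than dplyr's across.

```r
# Hypothetical name: recodes a 1-5 scale to 0/1 and treats 5 as missing
scale_to_binary <- function(x) {
  ifelse(x == 5L, NA_integer_, as.integer(x >= 3L))
}

# Toy data frame, assumed for illustration only
toy <- data.frame(q1 = c(1L, 3L, 5L), q2 = c(4L, 2L, 5L))

# df[] <- lapply(...) keeps the data frame structure while replacing every column
toy[] <- lapply(toy, scale_to_binary)
print(toy)
```

With dplyr loaded, the same function could be passed to mutate(across(everything(), scale_to_binary)) in place of myrule.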
CodePudding user response:
Let's say we have this simulated data frame:
set.seed(123)
df <- as.data.frame(matrix(sample(1:5, 5 * 13, replace = TRUE), ncol = 13))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1 2 1 5 5 5 4 5 3 1 1 1 2 4
2 4 3 3 2 4 3 5 4 3 2 3 1 1
3 3 5 4 1 4 3 4 2 3 3 4 4 2
4 5 3 3 2 5 2 4 2 2 2 1 5 2
5 5 3 1 5 4 1 1 2 1 5 3 2 5
Base R
We can first set the entries where df == 5 to NA, and then use a logical expression to check whether the values are greater than or equal to 3 (proposed by @danlooo in the comments).
The +(df >= 3) syntax is used to convert the logical output of df >= 3 to integer.
df[df == 5] <- NA
df <- as.data.frame(+(df >= 3))
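One common base R idiom for turning a logical matrix into 0/1 integers is a unary plus. The sketch below shows the effect on a small matrix; the names m and flags are hypothetical and exist only for this demonstration.

```r
# Hypothetical 2 x 2 matrix of codes
m <- matrix(c(1L, 3L, 4L, 2L), ncol = 2)

flags <- m >= 3          # logical matrix
print(typeof(flags))     # "logical"

ints <- +flags           # unary plus coerces TRUE/FALSE to 1L/0L, dimensions are kept
print(typeof(ints))      # "integer"
print(ints)
```

Without the plus, as.data.frame() would receive a logical matrix and the columns would hold TRUE/FALSE rather than 1/0.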
dplyr
Or we can use the mutate with across combination from the dplyr package.
library(dplyr)
df <- df %>% mutate(across(everything(), ~ case_when(
  .x %in% 1:2 ~ 0,
  .x %in% 3:4 ~ 1,
  TRUE ~ NA_real_
)))
Output
df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1 0 0 NA NA NA 1 NA 1 0 0 0 0 1
2 1 1 1 0 1 1 NA 1 1 0 1 0 0
3 1 NA 1 0 1 1 1 0 1 1 1 1 0
4 NA 1 1 0 NA 0 1 0 0 0 0 NA 0
5 NA 1 0 NA 1 0 0 0 0 NA 1 0 NA
Data
Here's the dput(df) output for easier data loading.
structure(list(V1 = c(2L, 4L, 3L, 5L, 5L), V2 = c(1L, 3L, 5L,
3L, 3L), V3 = c(5L, 3L, 4L, 3L, 1L), V4 = c(5L, 2L, 1L, 2L, 5L
), V5 = c(5L, 4L, 4L, 5L, 4L), V6 = c(4L, 3L, 3L, 2L, 1L), V7 = c(5L,
5L, 4L, 4L, 1L), V8 = c(3L, 4L, 2L, 2L, 2L), V9 = c(1L, 3L, 3L,
2L, 1L), V10 = c(1L, 2L, 3L, 2L, 5L), V11 = c(1L, 3L, 4L, 1L,
3L), V12 = c(2L, 1L, 4L, 5L, 2L), V13 = c(4L, 1L, 2L, 2L, 5L)), class = "data.frame", row.names = c(NA,
-5L))