How can I use vectorisation in R to change a DF value based on a condition?

Time:05-11

Suppose I have the following DF:

C1 C2
0 0
1 1
1 1
0 0
. .
. .

I now want to apply the following conditions to the data frame:

  • The value for C1 should be 1
  • A random integer between 0 and 5 should be less than 2

If both conditions are true, I change the C1 and C2 values for that row to 2.

I understand this can be done by using the apply function, and I have used the following:

C1 <- c(0, 1,1,0,1,0,1,0,1,0,1)
C2 <- c(0, 1,1,0,1,0,1,0,1,0,1)

df <- data.frame(C1, C2)

fun <- function(x){
  if (sample(0:5, 1) < 2){
    x[1:2] <- 2
  }
  return (x)
}

index <- df$C1 == 1                              # First condition
processed_Df <- t(apply(df[index, ], 1, fun))    # Applies second condition row by row
df[index, ] <- processed_Df

Output:

C1 C2
0 0
2 2
1 1
0 0
. .
. .

Some rows meet both conditions and some don't (this is the main behaviour I would like to achieve).

Now I want to achieve the same result using vectorization, without loops or the apply function. My only confusion is: if I don't use apply, won't every row get the same result for the random condition? For example, the following:

df$C1 <- ifelse(df$C1==1 & sample(0:5, 1) < 5, 2, df$C1)

This changes all the rows in my DF with C1==1 to 2, when some of them should possibly remain 1.

Is there a way to get different results for the second condition for each row without using the apply function? Hopefully my question makes sense.

Thanks

CodePudding user response:

You need to draw nrow(df) random values, one per row, instead of a single one. Try this method:

set.seed(167814)
df[df$C1 == 1 & sample(0:5, nrow(df), replace = TRUE) < 2, ] <- 2
df

#   C1 C2
#1   0  0
#2   2  2
#3   2  2
#4   0  0
#5   1  1
#6   0  0
#7   2  2
#8   0  0
#9   1  1
#10  0  0
#11  1  1
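
As a rough illustration of why the single-draw version in the question behaves differently, the sketch below compares the length of the two conditions. The names one_draw and row_draws and the seed are made up for this example; they are not from the question or the answer.

```r
C1 <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1)
set.seed(1)  # arbitrary seed, only so the run is repeatable

# One draw gives a length-1 logical; `&` recycles it across all rows,
# so every row with C1 == 1 receives the same TRUE or FALSE.
one_draw <- sample(0:5, 1) < 2
cond_recycled <- C1 == 1 & one_draw
length(one_draw)                         # 1
length(unique(cond_recycled[C1 == 1]))   # 1: all eligible rows agree

# One draw per row gives a logical vector as long as C1,
# so each eligible row is decided independently.
row_draws <- sample(0:5, length(C1), replace = TRUE) < 2
cond_per_row <- C1 == 1 & row_draws
length(row_draws)                        # 11
```

With one draw per row, each row where C1 is 1 is changed with probability 2/6, independently of the others.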

CodePudding user response:

Here is a fully vectorized way. Create the logical index index just like in the question. Then sample all random integers r in one call to sample. Replace in place based on the conjunction of the index and the condition r < 2.

x <- 'C1    C2
0   0
1   1
1   1
0   0'
df1 <- read.table(textConnection(x), header = TRUE)

set.seed(1)
index <- df1$C1 == 1
r <- sample(0:5, length(index), TRUE)
df1[index & r < 2, c("C1", "C2")] <- 2
df1
#>   C1 C2
#> 1  0  0
#> 2  1  1
#> 3  2  2
#> 4  0  0

Created on 2022-05-11 by the reprex package (v2.0.1)
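
If the same replacement is needed in several places, the steps above could be wrapped in a small function. This is only a sketch: mark_rows, its arguments and the seed are hypothetical names chosen for illustration, and it assumes the data frame has a numeric column called C1.

```r
# Hypothetical helper, not part of the original answers.
# Sets `cols` to `new_value` in rows where C1 == 1 and a per-row
# random integer from 0:5 is below 2.
mark_rows <- function(df, cols = c("C1", "C2"), new_value = 2) {
  index <- df$C1 == 1                         # first condition
  r <- sample(0:5, nrow(df), replace = TRUE)  # one draw per row
  df[index & r < 2, cols] <- new_value        # both conditions at once
  df
}

C1 <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1)
df <- data.frame(C1, C2 = C1)

set.seed(2022)  # arbitrary seed
out <- mark_rows(df)

# Rows where C1 was not 1 are never modified
all(out$C1[df$C1 != 1] == df$C1[df$C1 != 1])
# Rows where C1 was 1 now hold either 1 or 2, and C1 and C2 stay in step
all(out$C1[df$C1 == 1] %in% c(1, 2))
all(out$C1 == out$C2)
```

Returning the modified copy rather than assigning inside the function keeps the original df untouched, which makes it easy to compare before and after.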
