Suppose I have the following DF:
C1 | C2 |
---|---|
0 | 0 |
1 | 1 |
1 | 1 |
0 | 0 |
. | . |
. | . |
I now want to apply the following conditions to the data frame:
- The value for C1 should be 1
- A random integer between 0 and 5 should be less than 2
If both conditions are true, I change the C1 and C2 values for that row to 2.
I understand this can be done by using the apply function, and I have used the following:
C1 <- c(0, 1,1,0,1,0,1,0,1,0,1)
C2 <- c(0, 1,1,0,1,0,1,0,1,0,1)
df <- data.frame(C1, C2)
fun <- function(x) {
  if (sample(0:5, 1) < 2) {
    x[1:2] <- 2
  }
  return(x)
}
index <- df$C1 == 1  # First condition
processed_Df <- t(apply(df[index, ], 1, fun))  # Applies second condition
df[index,] <- processed_Df
Output:
C1 | C2 |
---|---|
0 | 0 |
2 | 2 |
1 | 1 |
0 | 0 |
. | . |
. | . |
Some rows meet both conditions and some don't (this is the main functionality I would like to achieve).
Now I want to achieve the same thing using vectorization, without loops or the apply function. The only confusion I have is: if I don't use apply, won't each row get the same result for the second condition? For example, the following:
df$C1 <- ifelse(df$C1==1 & sample(0:5, 1) < 5, 2, df$C1)
This changes all the rows in my DF with C1 == 1 to 2, when many of them should possibly stay 1.
Is there a way to get different results for the second condition for each row without using the apply function? Hopefully my question makes sense.
Thanks
CodePudding user response:
You need to sample the values nrow(df) times, once per row, instead of once for the whole data frame. Try this method:
set.seed(167814)
df[df$C1 == 1 & sample(0:5, nrow(df), replace = TRUE) < 2, ] <- 2
df
# C1 C2
#1 0 0
#2 2 2
#3 2 2
#4 0 0
#5 1 1
#6 0 0
#7 2 2
#8 0 0
#9 1 1
#10 0 0
#11 1 1
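To make the difference concrete, here is a minimal sketch contrasting a single draw with one draw per row. The names single_draw, per_row_draw and hit, and the seed value, are only illustrative and do not come from the question. A length-1 condition is recycled by & across every row, which is why the ifelse attempt in the question is all-or-nothing.

```r
# Toy data matching the question
df <- data.frame(C1 = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1),
                 C2 = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1))

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# One draw: a length-1 logical that `&` recycles across every row,
# so all rows with C1 == 1 share the same outcome
single_draw <- sample(0:5, 1) < 2
length(single_draw)

# One draw per row: a logical vector with an independent value for each row
per_row_draw <- sample(0:5, nrow(df), replace = TRUE) < 2
length(per_row_draw)

# Rows where both conditions hold
hit <- df$C1 == 1 & per_row_draw
df[hit, c("C1", "C2")] <- 2
```

Because per_row_draw has the same length as df$C1, the & is evaluated element by element and each row gets its own result.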
CodePudding user response:
Here is a fully vectorized way. Create the logical index, index, just like in the question. Then sample all random integers r in one call to sample. Replace in place based on the conjunction of the index and the condition r < 2.
x <- 'C1 C2
0 0
1 1
1 1
0 0'
df1 <- read.table(textConnection(x), header = TRUE)
set.seed(1)
index <- df1$C1 == 1
r <- sample(0:5, length(index), TRUE)
df1[index & r < 2, c("C1", "C2")] <- 2
df1
#> C1 C2
#> 1 0 0
#> 2 1 1
#> 3 2 2
#> 4 0 0
Created on 2022-05-11 by the reprex package (v2.0.1)
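A possible variant, closer to what the apply version in the question does, draws random integers only for the rows that already satisfy C1 == 1 rather than for every row. This is a sketch under that assumption; using which() to turn the condition into row numbers is my choice here, not something either answer proposes.

```r
# Same toy data as above
df1 <- data.frame(C1 = c(0, 1, 1, 0), C2 = c(0, 1, 1, 0))

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Row numbers that pass the first condition
index <- which(df1$C1 == 1)

# One independent draw per candidate row (not per row of the whole data frame)
r <- sample(0:5, length(index), replace = TRUE)

# Keep only the candidate rows whose draw is below 2 and overwrite both columns
df1[index[r < 2], c("C1", "C2")] <- 2
```

Both versions give each qualifying row the same 2-in-6 chance of being changed. They consume a different number of random draws, though, so with the same seed they will not necessarily change the same rows.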