I want to create a function that I can simulate n times. My ultimate goal is to find the sum of c for each of the n simulations. I am a beginner in R, so I am just starting to practice with for loops and if/else statements.
This is what I hope to achieve for now: if a > b, c would be 2, and if a < b, c would be -2. If a = b, c would be determined by the a and b values of the NEXT row. This is what I have so far, but I keep getting errors. I would like to know whether what I have for a = b is how I should approach this. Any help is appreciated.
a<-c(5,6,7,8,9,10,1,4,6,7)
b<-c(4,6,8,5,3,4,5,2,1,3)
c<-c(0,0,0,0,0,0,0,0,0,0)
df<-data.frame(a,b,c)
if(df$a > df$b){
df$c<- c(2)}
else if(df$a < df$b){
df$c<- c(-2)}
else if(df$a == df$b){ # a=b
if(df$a[ 1,] > df$b[ 1,]) {
df$c<- c(2)}
else(df$a[ 1,] < df$b[ 1,]){
df$c<- c(-2) }
}
else
print("error")
}
sum(df$c)
CodePudding user response:
Edit: two users already pointed out what to do if there are consecutive rows of a == b. This is a good opportunity to dive into the tidyverse (as already suggested by others):
library(dplyr)
library(tidyr)
df <- data.frame(
a = c(5,6,7,8,9,10,1,4,6,7),
b = c(4,6,8,5,3,4,5,2,1,3)
)
df %>%
  mutate(c = ifelse(a == b, NA, 2 * sign(a - b))) %>% ## (1)
  fill(c, .direction = 'up') ## (2)
(1) sets c to NA when a == b
(2) fills (replaces) each NA with the next available value further down the rows
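To see what the two steps do separately, here is a minimal sketch that keeps the intermediate column before filling it. The names c_raw, df_raw and filled are only illustrative, and the data is the question's original data.

```r
library(tidyr)

a <- c(5, 6, 7, 8, 9, 10, 1, 4, 6, 7)
b <- c(4, 6, 8, 5, 3, 4, 5, 2, 1, 3)

# Step (1): NA where a == b, otherwise 2 or -2
c_raw <- ifelse(a == b, NA, 2 * sign(a - b))
c_raw      # 2 NA -2 2 2 2 -2 2 2 2

# Step (2): fill each NA with the next non-NA value below it
df_raw <- data.frame(a, b, c = c_raw)
filled <- fill(df_raw, c, .direction = "up")
filled$c   # 2 -2 -2 2 2 2 -2 2 2 2
sum(filled$c)   # 8
```

If the last row is a tie, there is no later value to copy, so fill() leaves that NA in place and sum() would need na.rm = TRUE.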
When starting with R, it's helpful to know that vectorizing (the x[n] thing) usually makes your code more concise and, in certain situations, much faster than using loops. In your case:
df$c <- 2 * sign(df$a - df$b) ## see ?sign
z <- df$c == 0 ## see (1)
df$c[z] <- lead(df$c, 1)[z] ## see (2)
(1) Equal numbers have sign zero, so z is a boolean vector that is TRUE at the positions (rows) where a == b.
(2) Change c only at the positions where z is TRUE. lead and lag (from dplyr) are functions that take a vector and return it shifted by a given number of positions.
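To make the shift concrete, here is a small sketch of what lead and lag return on a toy vector. The vector x and the mask z are made up for illustration and are not part of the question's data.

```r
library(dplyr)

x <- c(10, 20, 30, 40)   # hypothetical toy vector

lead(x, 1)   # values moved one position up:   20 30 40 NA
lag(x, 1)    # values moved one position down: NA 10 20 30

# Replace only the selected positions with the "next" value
z <- c(FALSE, TRUE, FALSE, FALSE)
x[z] <- lead(x, 1)[z]
x            # 10 30 30 40
```

Because a one-step lead only looks at the immediately following row, two consecutive rows with a == b would leave a 0 in the first of them, and a tie in the last row would become NA.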
CodePudding user response:
The problem
if and else in R are meant for control flow and are not vectorized. In plain English, this means that if() expects a condition that evaluates to a single TRUE or FALSE. When you do df$a > df$b you get a boolean vector with one element per row of your data frame. When this happens, if() will only use the first item and give you a warning (from R 4.2.0 onwards it stops with an error instead). Either way, this will give you the wrong answers.
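If you do want to keep practising with loops, if and else still work as long as each condition looks at one row at a time. The following is only a sketch in base R using the question's data. It walks the rows from the bottom up, and it assumes that a tie in the very last row should stay NA because there is no next row to look at.

```r
a <- c(5, 6, 7, 8, 9, 10, 1, 4, 6, 7)
b <- c(4, 6, 8, 5, 3, 4, 5, 2, 1, 3)
df <- data.frame(a, b, c = NA_real_)

# Go from the last row to the first, so that a tied row can copy
# the already computed value of the row below it.
for (i in rev(seq_len(nrow(df)))) {
  if (df$a[i] > df$b[i]) {          # a single TRUE/FALSE per iteration
    df$c[i] <- 2
  } else if (df$a[i] < df$b[i]) {
    df$c[i] <- -2
  } else if (i < nrow(df)) {
    df$c[i] <- df$c[i + 1]          # a == b: use the next row's value
  }                                 # tie in the last row stays NA (assumption)
}

df$c        # 2 -2 -2 2 2 2 -2 2 2 2
sum(df$c)   # 8
```

Going bottom-up also covers several consecutive ties, since each tied row copies a value that has already been resolved.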
A better solution
I think you are looking for ifelse(), which is vectorized. And since you have nested if-else statements, you are probably better off with dplyr::case_when().
Here is an example which also fixes cases where a == b for multiple rows:
# Note that I've added two consecutive rows where a == b
a <- c(5,6,6,7,8,9,10,1,4,6,7)
b <- c(4,6,6,8,5,3,4,5,2,1,3)
df <- data.frame(a, b)
library(dplyr)
df %>%
  mutate(
    c = case_when(
      a > b ~ 2,
      a < b ~ -2,
      # If neither a > b nor a < b is TRUE, they must be equal,
      # so we set all other cases to NA...
      TRUE ~ NA_real_
    )
  ) %>%
  # ... and then we use fill() to replace NAs with the first
  # non-NA value after it
  tidyr::fill(c, .direction = "up")
#> a b c
#> 1 5 4 2
#> 2 6 6 -2
#> 3 6 6 -2
#> 4 7 8 -2
#> 5 8 5 2
#> 6 9 3 2
#> 7 10 4 2
#> 8 1 5 -2
#> 9 4 2 2
#> 10 6 1 2
#> 11 7 3 2
Created on 2022-03-30 by the reprex package (v2.0.1)
How this works:
- ifelse() works like if() and else() in your code, but it accepts multiple values.
- case_when() acts like nested ifelse() statements, so it will first check if a > b and set those values equal to 2, next it will check the remaining rows if a < b and set those to -2, and so on.
- In cases where a is neither less nor more than b, they must be equal. We set these cases to NA.
- Afterwards, we use tidyr::fill() to replace missing values with the first instance of a non-missing value after it. This handles cases where there are multiple consecutive rows of a == b.
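As a rough illustration of the "nested ifelse()" point above, the same intermediate result can be written both ways. The shortened vectors and the names c_nested and c_cw are only for this sketch.

```r
library(dplyr)

a <- c(5, 6, 6, 7, 8)
b <- c(4, 6, 6, 8, 5)

# Nested ifelse(): the inner call only matters where a > b is FALSE
c_nested <- ifelse(a > b, 2, ifelse(a < b, -2, NA))
c_nested   # 2 NA NA -2 2

# case_when(): conditions are checked in order, the first match wins
c_cw <- case_when(
  a > b ~ 2,
  a < b ~ -2,
  TRUE  ~ NA_real_
)
c_cw       # 2 NA NA -2 2
```

With only two or three branches the two forms are interchangeable; case_when() mainly pays off in readability once more conditions are added.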
CodePudding user response:
Here is a tidyverse solution. This will also work with multiple equal a and b in series (I have added row 3 to the data to demonstrate).
It relies on cumsum() to group the data, such that rows with a == b are in the same group as the next row that is a != b. Then it sets c to the last value in the group.
library(tidyverse)
a<-c(5,6,5,7,8,9,10,1,4,6,7)
b<-c(4,6,5,8,5,3,4,5,2,1,3)
df <-data.frame(a,b)
df |>
  mutate(c = ifelse(a > b, 2, -2),            # Determines c for `a != b` cases
         grp = rev(cumsum(rev(a != b)))) |>   # create group variable, use rev() since we want backward cumsum
  group_by(grp) |>
  mutate(c = last(c)) |>
  ungroup() |>
  select(-grp)
#> # A tibble: 11 × 3
#> a b c
#> <dbl> <dbl> <dbl>
#> 1 5 4 2
#> 2 6 6 -2
#> 3 5 5 -2
#> 4 7 8 -2
#> 5 8 5 2
#> 6 9 3 2
#> 7 10 4 2
#> 8 1 5 -2
#> 9 4 2 2
#> 10 6 1 2
#> 11 7 3 2
Created on 2022-03-30 by the reprex package (v2.0.1)
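To see why the backward cumsum() puts tied rows in the same group as the next non-tied row, it can help to look at the grouping variable on its own. This is only a base R sketch of that intermediate step, using the same data as above. The names neq and c_grp are illustrative, and ave() stands in for the group_by() / last() step.

```r
a <- c(5, 6, 5, 7, 8, 9, 10, 1, 4, 6, 7)
b <- c(4, 6, 5, 8, 5, 3, 4, 5, 2, 1, 3)

neq <- a != b                   # FALSE on the tied rows 2 and 3
grp <- rev(cumsum(rev(neq)))    # count non-tied rows from the bottom up
grp                             # 9 8 8 8 7 6 5 4 3 2 1

# Rows 2, 3 and 4 share group 8, so taking the last value in each
# group copies row 4's result onto the tied rows above it.
c_grp <- ave(ifelse(a > b, 2, -2), grp, FUN = function(v) v[length(v)])
c_grp                           # 2 -2 -2 -2 2 2 2 -2 2 2 2
```

One thing to keep in mind with this approach: ties at the very end of the data end up in group 0 and keep the -2 that ifelse(a > b, 2, -2) assigns them, whereas the fill() based solutions leave those rows as NA.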