I have a data frame and want to create a new variable at random, based on other columns. My data contains these key variables:
iq | Age | Educ_y |
---|---|---|
5 | 23 | 15 |
4 | 54 | 17 |
2 | 43 | 6 |
3 | 13 | 7 |
5 | 14 | 8 |
1 | 51 | 16 |
I want to generate a new variable (years of experience) at random using these criteria:
If Age >= 15 & iq <= 2, "Exp_y" takes a random number between (Age-15)/2 and Age-15.
If Age >= 15 & (iq == 3 | iq == 4), "Exp_y" takes a random number between (Age-Educ_y-6)/2 and Age-Educ_y-6.
Otherwise it is 0.
I tried this code:
Df <- Df %>%
  rowwise() %>%
  mutate(Exep_y = case_when(
    Age > 14 & iq <= 2 ~ sample(seq((Age-15)/2, Age-15, 1), 1),
    Age > 14 & between(iq, 3, 4) ~ sample(seq((Age-Educ_y-6)/2, Age-Educ_y-6, 1), 1),
    TRUE ~ 0
  ))
But I end up with this error message:
Error in `mutate()`:
! Problem while computing `Exep_y = case_when(...)`.
i The error occurred in row 3.
Caused by error in `seq.default()`:
! wrong sign in 'by' argument
Any ideas, please? Best regards.
CodePudding user response:
You could try using if_else() rather than case_when().
Documentation can be found here: https://dplyr.tidyverse.org/reference/if_else.html
CodePudding user response:
This error occurs because case_when() evaluates all the right-hand-side expressions and then selects among them based on the left-hand sides. Therefore, even though row 4 of your sample dataset, for example, will fall through to TRUE ~ 0, the right-hand sides of the first two conditions are also evaluated. For that row, the first condition's right-hand side is seq((13-15)/2, 13-15, 1), which returns an error: from = -1 and to = -2, so the by argument cannot be 1 (it has the wrong sign).
seq((13-15)/2, 13-15, 1)
Error in seq.default((13 - 15)/2, 13 - 15, 1) :
wrong sign in 'by' argument
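To make the difference concrete, here is a minimal sketch, assuming dplyr is installed and using a single hypothetical value age <- 13. It contrasts case_when(), which evaluates the failing seq() call even though its condition is FALSE, with a plain if/else, which only evaluates the branch it takes. The variable names eager and lazy are only for illustration.

```r
library(dplyr)

age <- 13  # hypothetical value, matching row 4 of the sample data

# case_when() evaluates every right-hand side before choosing, so the
# failing seq() call raises an error even though its condition is FALSE.
eager <- tryCatch(
  case_when(
    age > 14 ~ sample(seq((age - 15) / 2, age - 15, 1), 1),
    TRUE ~ 0
  ),
  error = function(e) "error"
)

# A plain if/else only evaluates the branch that is actually taken,
# so seq() is never called for this age.
lazy <- if (age > 14) sample(seq((age - 15) / 2, age - 15, 1), 1) else 0
```

This is why moving the logic into an ordinary function with if statements avoids the problem.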
You could do something like this:
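# Draw Exep_y for one row: i = iq, a = Age, e = Educ_y
f <- function(i, a, e) {
  # iq above 4, or younger than 15: no experience
  if (i > 4 || a < 15) return(0)
  # iq of 1 or 2: draw between (Age-15)/2 and Age-15
  if (i <= 2) return(sample(seq((a - 15) / 2, a - 15), 1))
  # iq of 3 or 4: draw between (Age-Educ_y-6)/2 and Age-Educ_y-6
  return(sample(seq((a - e - 6) / 2, a - e - 6), 1))
}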
Df %>% rowwise() %>% mutate(Exep_y=f(iq,Age,Educ_y))
Output:
     iq   Age Educ_y Exep_y
  <int> <int>  <int>  <dbl>
1     5    23     15      0
2     4    54     17   16.5
3     2    43      6     21
4     3    13      7      0
5     5    14      8      0
6     1    51     16     27
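Because the draws are random, the numbers will differ from run to run. As a minimal sketch, assuming the sample data from the question and the column names iq, Age and Educ_y, the following rebuilds the data frame, fixes the seed with set.seed() (the seed value 1 is arbitrary), and applies the same rule row by row with base R's mapply() instead of rowwise(). The helper name draw_exp is hypothetical; it repeats the logic of f above.

```r
# Sample data from the question (column names assumed to be iq, Age, Educ_y)
Df <- data.frame(
  iq     = c(5L, 4L, 2L, 3L, 5L, 1L),
  Age    = c(23L, 54L, 43L, 13L, 14L, 51L),
  Educ_y = c(15L, 17L, 6L, 7L, 8L, 16L)
)

# Hypothetical helper: same rule as f, with the upper bound computed once
draw_exp <- function(i, a, e) {
  if (i > 4 || a < 15) return(0)
  upper <- if (i <= 2) a - 15 else a - e - 6
  sample(seq(upper / 2, upper), 1)
}

set.seed(1)  # arbitrary seed, only to make the run repeatable

# mapply() calls draw_exp once per row, so only the needed branch runs
Df$Exep_y <- mapply(draw_exp, Df$iq, Df$Age, Df$Educ_y)
print(Df)
```

Rows that meet neither condition get 0, and every other value lies between half the upper bound and the upper bound, in steps of 1.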