Error in `mutate()` while creating a new variable using R

Time:07-02

I have a data frame and I want to create a new variable randomly, based on other variables. My data contains these key variables:

iq Age Educ_y
5 23 15
4 54 17
2 43 6
3 13 7
5 14 8
1 51 16

I want to generate a new variable (years of experience) randomly using these criteria:

If Age >= 15 & iq <= 2, then "Exep_y" takes a random number between (Age-15)/2 and Age-15.

If Age >= 15 & (iq == 3 | iq == 4), then "Exep_y" takes a random number between (Age-Educ_y-6)/2 and (Age-Educ_y-6).

And 0 otherwise.

I tried using this code :

Df <- Df %>% 
  rowwise() %>% 
  mutate(Exep_y = case_when(
    Age > 14 & iq <= 2 ~ sample(seq((Age-15)/2, Age-15, 1), 1),
    Age > 14 & between(iq, 3, 4)  ~ sample(seq((Age-Educ_y-6)/2, Age-Educ_y-6, 1), 1),
    TRUE               ~ 0
  ))

But I end up with this Error message:

Error in `mutate()`:
! Problem while computing `Exep_y = case_when(...)`.
i The error occurred in row 3.
Caused by error in `seq.default()`:
! wrong sign in 'by' argument

Any ideas, please? Best regards.

CodePudding user response:

You could try using if_else() rather than case_when:

Documentation can be found here: https://dplyr.tidyverse.org/reference/if_else.html

CodePudding user response:

This error occurs because case_when() evaluates all of the right-hand-side (RHS) expressions and then selects among them based on the left-hand-side conditions. So even though row 4 of your sample dataset, for example, falls through to TRUE ~ 0, the RHS of the first two conditions is still evaluated for it. For that row, the first condition's RHS is seq((13-15)/2, 13-15, 1), which fails: from = -1 and to = -2, so the by argument cannot be 1 (it has the wrong sign).

seq((13-15)/2, 13-15, 1)
Error in seq.default((13 - 15)/2, 13 - 15, 1) : 
  wrong sign in 'by' argument
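To make the eager evaluation visible, here is a minimal sketch. The helper track() and the vector calls are hypothetical names introduced only for this illustration; track() simply records its label whenever it is evaluated.

```r
library(dplyr)

# Hypothetical helper: records its label each time it is evaluated,
# then returns the value unchanged.
calls <- character(0)
track <- function(label, value) {
  calls <<- c(calls, label)
  value
}

x <- 1  # the first condition below is FALSE for this value

res <- case_when(
  x > 5 ~ track("first", 100),
  TRUE  ~ track("fallback", 0)
)

res    # 0, taken from the fallback branch
calls  # includes "first", although that branch was never selected
```

The same thing happens in your code: the seq() call in the first branch runs for every row, including rows where Age is below 15.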

You could do something like this:

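# i = iq, a = Age, e = Educ_y (one row at a time)
f <- function(i, a, e) {
  # neither rule applies: iq above 4 or Age below 15
  if (i > 4 || a < 15) return(0)
  # iq <= 2: draw between (Age-15)/2 and Age-15
  if (i <= 2) return(sample(seq((a - 15) / 2, a - 15), 1))
  # iq 3 or 4: draw between (Age-Educ_y-6)/2 and Age-Educ_y-6
  return(sample(seq((a - e - 6) / 2, a - e - 6), 1))
}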

Df %>% rowwise() %>% mutate(Exep_y=f(iq,Age,Educ_y))

Output (the non-zero values are random, so yours will differ):

     iq   Age Educ_y Exep_y
  <int> <int>  <int>  <dbl>
1     5    23     15    0  
2     4    54     17   16.5
3     2    43      6   21  
4     3    13      7    0  
5     5    14      8    0  
6     1    51     16   27 
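If you would rather keep the case_when() form from the question, another option is to make every right-hand side safe to evaluate on every row. The following is only a sketch under assumptions: draw_between() is a hypothetical helper, and returning NA when the bounds are reversed is my choice, not something the question specifies. The data frame is rebuilt from the sample rows in the question.

```r
library(dplyr)

# Hypothetical helper: pick one value from lo, lo + 1, ... (not exceeding hi).
# Returns NA instead of failing when the bounds are reversed (an assumption).
draw_between <- function(lo, hi) {
  if (hi < lo) return(NA_real_)
  vals <- as.numeric(seq(lo, hi, by = 1))
  vals[sample.int(length(vals), 1)]
}

# Sample data copied from the question
Df <- tibble(
  iq     = c(5L, 4L, 2L, 3L, 5L, 1L),
  Age    = c(23L, 54L, 43L, 13L, 14L, 51L),
  Educ_y = c(15L, 17L, 6L, 7L, 8L, 16L)
)

out <- Df %>%
  rowwise() %>%
  mutate(Exep_y = case_when(
    Age >= 15 & iq <= 2     ~ draw_between((Age - 15) / 2, Age - 15),
    Age >= 15 & iq %in% 3:4 ~ draw_between((Age - Educ_y - 6) / 2, Age - Educ_y - 6),
    TRUE                    ~ 0
  )) %>%
  ungroup()
```

All branches are still evaluated for each row, but none of them can raise the 'by' error any more, so the unselected results are simply discarded.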