Good afternoon,
I'm fairly new and currently working in Spark using the sparklyr and dplyr libraries, and I've run into a problem: after performing a transformation with the mutate function (for example, adding a column), I cannot reference the newly created column, even though it is vital for my later calculations. In other words, my initial df does not contain the new column; the column is present only in the output of the transformation I ran.
Here is an example:
#Creating a df
block1_value <- c(1000, 1500, 2000, 3000, 3500, 4000, 5000)
block2_value <- c(1, 2, 3, 4, 5, 6, 7)
block3_value <- c("a", "b", "c", "d", "e", "f", "g")
df <- data.frame(block1_value, block2_value, block3_value)
#Using mutate() to add a new calculated column
df %>%
  mutate(Result = block1_value + block2_value)
#Referencing this newly created column gives an error
df %>%
  mutate(Result2 = ifelse(Result > 3000, "Yes", "No"))
How can I fix this problem using dplyr syntax? (The constraint is that I can only use the dplyr library, as all the work is performed in Spark.)
Thanks a lot!!
CodePudding user response:
mutate doesn't actually mutate a variable. It produces a modified copy of the dataframe. The following code works because the %>% operator forwards the result of the first mutate (i.e. the modified df) to the second mutate.
df %>%
  mutate(Result = block1_value + block2_value) %>%
  mutate(Result2 = ifelse(Result > 3000, "Yes", "No"))
#>   block1_value block2_value block3_value Result Result2
#> 1         1000            1            a   1001      No
#> 2         1500            2            b   1502      No
#> 3         2000            3            c   2003      No
#> 4         3000            4            d   3004     Yes
#> 5         3500            5            e   3505     Yes
#> 6         4000            6            f   4006     Yes
#> 7         5000            7            g   5007     Yes
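To make the "modified copy" point concrete, here is a minimal sketch on a local data frame (a shortened version of the question's data; the name df_with_result is only illustrative). It checks whether the Result column exists before and after assigning the output of mutate.

```r
library(dplyr)

# Shortened version of the data frame from the question
df <- data.frame(
  block1_value = c(1000, 1500, 2000),
  block2_value = c(1, 2, 3)
)

# Without assignment, the result is printed and then discarded
df %>% mutate(Result = block1_value + block2_value)
"Result" %in% names(df)               # FALSE: df itself is unchanged

# Assigning the result keeps the new column (df_with_result is a hypothetical name)
df_with_result <- df %>% mutate(Result = block1_value + block2_value)
"Result" %in% names(df_with_result)   # TRUE
```

This was written against a plain data.frame; the assumption is that a Spark table accessed through sparklyr behaves the same way in this respect, since mutate returns a new object there as well.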
CodePudding user response:
You have not assigned the result of mutate back to the dataframe.
This works:
df <- df %>% mutate(Result = block1_value + block2_value)
df <- df %>% mutate(Result2 = ifelse(Result > 3000, "Yes", "No"))
But chaining both steps in one pipeline is cleaner:
df <- df %>%
  mutate(Result = block1_value + block2_value) %>%
  mutate(Result2 = ifelse(Result > 3000, "Yes", "No"))
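As a possible further simplification, both columns can also be created in a single mutate call, because dplyr evaluates the expressions in order and later ones may refer to columns created earlier in the same call. A minimal sketch on a small local data frame (values taken from three rows of the question's data):

```r
library(dplyr)

# Three rows from the question's data, enough to hit both branches of ifelse()
df <- data.frame(
  block1_value = c(1000, 3000, 5000),
  block2_value = c(1, 4, 7)
)

df <- df %>%
  mutate(
    Result  = block1_value + block2_value,         # created first
    Result2 = ifelse(Result > 3000, "Yes", "No")   # can already see Result
  )

df$Result2   # "No" "Yes" "Yes"
```

This was only run against a local data frame; it is an assumption here that the same single-call form translates correctly for a Spark table through sparklyr, so it is worth checking on your own connection.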