Choose dataframe variables by name and multiply with a vector elementwise

Time:11-01

I have a data frame and a vector as follows:

my_df <- as.data.frame(
list(year = c(2001, 2001, 2001, 2001, 2001, 2001), month = c(1, 
2, 3, 4, 5, 6), Pdt_d0 = c(0.379045935402736, 0.377328817455841, 
0.341158889847019, 0.36761990427443, 0.372442657083218, 0.382702189949558
), Pdt_d1 = c(0.146034519173855, 0.166289573095497, 0.197787188740911, 
0.137071647982617, 0.162103042313547, 0.168566518193772), Pdt_d2 = c(0.126975939811326, 
0.107708783271871, 0.14096203677089, 0.142228236885706, 0.115542396064519, 
0.106935751726809), Pdt_tot = c(2846715, 2897849.5, 2935406.25, 
2850649, 2840313.75, 3087993.5))
)

my_vector <- 1:3

I want to multiply each of the columns Pdt_d0 to Pdt_d2 by the corresponding element of my_vector, while keeping the other columns untouched. I can get the desired multiplication with dplyr::select(my_df, num_range("Pdt_d", 0:2)) %>% mapply(`*`, ., my_vector), but I lose the year, month, and Pdt_tot columns in the process. I tried to assign the result back with dplyr::select(my_df, num_range("Pdt_d", 0:2)) <- dplyr::select(my_df, num_range("Pdt_d", 0:2)) %>% mapply(`*`, ., my_vector), which returns the error 'select<-' is not an exported object. Is there an obvious trick I am not seeing?

I don't think my question is a duplicate; I have seen the answers here and here, but neither lets me choose variables by name.

CodePudding user response:

This is probably messier than you want, but it does work:

library(dplyr)
library(tidyr)

my_df %>%
  gather(variable, value, -year, -month, -Pdt_tot) %>%
  group_by(year, month, Pdt_tot) %>%
  mutate(value = value * my_vector) %>%
  spread(variable,value)

   year month  Pdt_tot Pdt_d0 Pdt_d1 Pdt_d2
  <dbl> <dbl>    <dbl>  <dbl>  <dbl>  <dbl>
1  2001     1 2846715   0.379  0.292  0.381
2  2001     2 2897850.  0.377  0.333  0.323
3  2001     3 2935406.  0.341  0.396  0.423
4  2001     4 2850649   0.368  0.274  0.427
5  2001     5 2840314.  0.372  0.324  0.347
6  2001     6 3087994.  0.383  0.337  0.321

To avoid naming year, month, and Pdt_tot explicitly, select the Pdt_d columns instead and group by everything else:

my_df %>%
  gather(variable, value, num_range("Pdt_d", 0:2)) %>%
  group_by(across(c(-variable, -value))) %>%
  mutate(value = value * my_vector) %>%
  spread(variable, value)

   year month  Pdt_tot Pdt_d0 Pdt_d1 Pdt_d2
  <dbl> <dbl>    <dbl>  <dbl>  <dbl>  <dbl>
1  2001     1 2846715   0.379  0.292  0.381
2  2001     2 2897850.  0.377  0.333  0.323
3  2001     3 2935406.  0.341  0.396  0.423
4  2001     4 2850649   0.368  0.274  0.427
5  2001     5 2840314.  0.372  0.324  0.347
6  2001     6 3087994.  0.383  0.337  0.321
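In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape-multiply-reshape idea can be sketched with those functions as below. This is only an illustration under some assumptions: the small my_df here is made-up toy data with the same column names as the question, and multipliers is a hypothetical helper vector (not in the original answer) that ties each multiplier to a column name so the pairing does not depend on row order within groups.

```r
library(dplyr)
library(tidyr)

# Toy data with the same column layout as the question (values are made up)
my_df <- data.frame(
  year    = 2001,
  month   = 1:3,
  Pdt_d0  = c(0.1, 0.2, 0.3),
  Pdt_d1  = c(0.4, 0.5, 0.6),
  Pdt_d2  = c(0.7, 0.8, 0.9),
  Pdt_tot = c(10, 20, 30)
)
my_vector <- 1:3

# Hypothetical helper: name each multiplier after the column it belongs to
multipliers <- setNames(my_vector, paste0("Pdt_d", 0:2))

result <- my_df %>%
  # Long format: one row per (original row, Pdt_d column)
  pivot_longer(num_range("Pdt_d", 0:2),
               names_to = "variable", values_to = "value") %>%
  # Look the multiplier up by column name instead of by position
  mutate(value = value * unname(multipliers[variable])) %>%
  # Back to wide format; year, month, and Pdt_tot identify the rows
  pivot_wider(names_from = variable, values_from = value)

result
```

As with the gather()/spread() version, the reshaped columns end up after year, month, and Pdt_tot rather than in their original positions.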

CodePudding user response:

The logic you tried, overwriting a selection on the left-hand side with the Map/mapply result on the right-hand side, works outside of the tidy world in base R:

vars <- paste0("Pdt_d", 0:2)
my_df[vars] <- Map(`*`, my_df[vars], my_vector)
my_df

#  year month    Pdt_d0    Pdt_d1    Pdt_d2 Pdt_tot
#1 2001     1 0.3790459 0.2920690 0.3809278 2846715
#2 2001     2 0.3773288 0.3325791 0.3231263 2897850
#3 2001     3 0.3411589 0.3955744 0.4228861 2935406
#4 2001     4 0.3676199 0.2741433 0.4266847 2850649
#5 2001     5 0.3724427 0.3242061 0.3466272 2840314
#6 2001     6 0.3827022 0.3371330 0.3208073 3087994

This works because `[<-` exists as a replacement function in R: it handles assignment to a left-hand-side selection made with square brackets, like my_df[vars].
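To make the role of the replacement function concrete, the following sketch writes the same assignment twice: once with the usual bracket syntax and once as an explicit call to `[<-`. The data frame is made-up toy data, and a and b are illustrative names only.

```r
# Toy data (values are made up); year is recycled to both rows
my_df <- data.frame(
  year   = 2001,
  Pdt_d0 = c(0.1, 0.2),
  Pdt_d1 = c(0.3, 0.4),
  Pdt_d2 = c(0.5, 0.6)
)
my_vector <- 1:3
vars <- paste0("Pdt_d", 0:2)

# Usual replacement syntax: assign into the selected columns
a <- my_df
a[vars] <- Map(`*`, a[vars], my_vector)

# The same operation written as an explicit call to the replacement function
b <- `[<-`(my_df, vars, value = Map(`*`, my_df[vars], my_vector))

# Both forms should give the same data frame
identical(a, b)
```

R rewrites an assignment of the form x[i] <- value into a call of this second form, which is why a function used on the left-hand side needs a matching replacement function.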
The error was returned because the code has a select() call on the left-hand side of the assignment, and there is no 'select<-' function. In other words, you can't assign to a select()-ion because select() isn't set up to work like that. The tidy functions are usually expected to be piped, like my_df %>% select() %>% etc., without overwriting the original input.
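If you would rather stay in that piped style, one possible approach is mutate() with across(), which modifies only the chosen columns and leaves the rest in place. This is a sketch, not part of the original answer: it assumes dplyr 1.0.0 or later (for across() and cur_column()), uses made-up toy data, and introduces a hypothetical named vector multipliers to pair each column with its factor.

```r
library(dplyr)

# Toy data with the same column layout as the question (values are made up)
my_df <- data.frame(
  year    = 2001,
  month   = 1:3,
  Pdt_d0  = c(0.1, 0.2, 0.3),
  Pdt_d1  = c(0.4, 0.5, 0.6),
  Pdt_d2  = c(0.7, 0.8, 0.9),
  Pdt_tot = c(10, 20, 30)
)
my_vector <- 1:3
vars <- paste0("Pdt_d", 0:2)

# Hypothetical helper: name each multiplier after the column it belongs to
multipliers <- setNames(my_vector, vars)

# Inside across(), cur_column() gives the name of the column being processed,
# so each column is multiplied by its own factor
my_df <- my_df %>%
  mutate(across(all_of(vars), ~ .x * multipliers[[cur_column()]]))

my_df
```

Here the result is assigned back to my_df explicitly, and the column order of the original data frame is kept.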
