Selecting variables that are in a vector with a substitution string in R

Time:10-22

I have this dataset:

df <- data.frame(kgs_chicken = c(0,1,2,1,2,3,0,1,2,8),
                 kgs_total = c(2,4,8,2,3,4,2,4,6,20),
                 price = c(0.81, 1.42, 2.85, 0.73, 1.07, 
                           1.52, 0.72, 1.42, 1.94, 7.44))

And I applied some transformations:

library(dplyr)

df_trans <- df %>%
  mutate(ratio = kgs_chicken / kgs_total,
         # shift so the minimum maps to log(1) = 0
         kgs_chicken_ln = log(kgs_chicken - min(kgs_chicken) + 1),
         kgs_total_ln = log(kgs_total - min(kgs_total) + 1),
         ratio_price_kgs_total = price / kgs_total)

Then, after running an algorithm, I get a recommendation of which variables to keep. The algorithm returns only a vector with the names of the variables (hardcoded here):

filter_vector <- c("kgs_chicken_ln", "kgs_total")

Now I want to select only the variables named in that vector, but if an element of the vector ends in "_ln", I want the original variable without the "_ln" suffix. I have tried this:

df %>%
  select(across(ends_with("_ln"), .fns = function (x) gsub("_ln","",names(x))))

But I get an error:

Error: `across()` must only be used inside dplyr verbs.

The expected result is:

   kgs_chicken kgs_total
1            0         2
2            1         4
3            2         8
4            1         2
5            2         3
6            3         4
7            0         2
8            1         4
9            2         6
10           8        20

Note that my real dataset has hundreds of variables, so I need a solution that automates this selection. Any help would be greatly appreciated.

CodePudding user response:

You can remove the trailing "_ln" from the names in the vector and then select those columns.

df[sub('_ln$', '', filter_vector)]

#   kgs_chicken kgs_total
#1            0         2
#2            1         4
#3            2         8
#4            1         2
#5            2         3
#6            3         4
#7            0         2
#8            1         4
#9            2         6
#10           8        20

In dplyr, you can use it within select -

library(dplyr)
df %>% select(sub('_ln$', '', filter_vector))
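With hundreds of variables, it may help to build the cleaned name vector once and pass it through all_of(), which fails loudly if a recommended name is missing from the data. This is only a sketch: the name base_names is made up for illustration, and the unique() call assumes the algorithm could return both a variable and its "_ln" version.

```r
library(dplyr)

df <- data.frame(kgs_chicken = c(0,1,2,1,2,3,0,1,2,8),
                 kgs_total = c(2,4,8,2,3,4,2,4,6,20),
                 price = c(0.81, 1.42, 2.85, 0.73, 1.07,
                           1.52, 0.72, 1.42, 1.94, 7.44))
filter_vector <- c("kgs_chicken_ln", "kgs_total")

# Strip a trailing "_ln" and drop duplicates
# (base_names is a hypothetical helper name)
base_names <- unique(sub("_ln$", "", filter_vector))

# all_of() raises an error if any name is not a column of df;
# any_of() would skip missing names silently instead
result <- df %>% select(all_of(base_names))
```

Whether all_of() or any_of() is the better fit depends on whether a missing column should stop the pipeline or be ignored.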

CodePudding user response:

Would this work?

library(dplyr)
library(stringr)

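# all_of() makes it explicit that filter_vector is an external character vector
df_trans %>% select(all_of(filter_vector)) %>% 
       # drop the trailing "_ln" from the selected column names
       rename_with(~ str_remove(.x, '_ln$'), ends_with('_ln'))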
   kgs_chicken kgs_total
1    0.0000000         2
2    0.6931472         4
3    1.0986123         8
4    0.6931472         2
5    1.0986123         3
6    1.3862944         4
7    0.0000000         2
8    0.6931472         4
9    1.0986123         6
10   2.1972246        20
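
Note that this approach starts from df_trans, so the kgs_chicken column above holds the log-transformed values under the original name, not the raw values shown in the expected result. A minimal base R check of that difference, assuming the "+ 1" shift in the transformation (the names renamed and original are made up for this sketch):

```r
df <- data.frame(kgs_chicken = c(0,1,2,1,2,3,0,1,2,8),
                 kgs_total = c(2,4,8,2,3,4,2,4,6,20),
                 price = c(0.81, 1.42, 2.85, 0.73, 1.07,
                           1.52, 0.72, 1.42, 1.94, 7.44))
df_trans <- transform(df,
  kgs_chicken_ln = log(kgs_chicken - min(kgs_chicken) + 1),
  kgs_total_ln = log(kgs_total - min(kgs_total) + 1))
filter_vector <- c("kgs_chicken_ln", "kgs_total")

# Select the transformed columns, then rename them
renamed <- df_trans[filter_vector]
names(renamed) <- sub("_ln$", "", names(renamed))

# Rename first, then select the untransformed columns
original <- df[sub("_ln$", "", filter_vector)]

# Same column names, but kgs_chicken holds different values
same_names   <- identical(names(renamed), names(original))
same_chicken <- isTRUE(all.equal(renamed$kgs_chicken, original$kgs_chicken))
```

Here same_names is TRUE while same_chicken is FALSE, so the choice between the two answers depends on whether the raw or the transformed values are wanted.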