I have this dataset:
df <- data.frame(kgs_chicken = c(0, 1, 2, 1, 2, 3, 0, 1, 2, 8),
                 kgs_total   = c(2, 4, 8, 2, 3, 4, 2, 4, 6, 20),
                 price       = c(0.81, 1.42, 2.85, 0.73, 1.07,
                                 1.52, 0.72, 1.42, 1.94, 7.44))
And I applied some transformations:
library(dplyr)
df_trans <- df %>%
  mutate(ratio = kgs_chicken / kgs_total,
         kgs_chicken_ln = log(kgs_chicken - min(kgs_chicken) + 1),
         kgs_total_ln = log(kgs_total - min(kgs_total) + 1),
         ratio_price_kgs_total = price / kgs_total)
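The `_ln` columns are meant to be a shifted log, log(x - min(x) + 1), so the smallest value maps to log(1) = 0. Assuming that is the intended formula, a quick sanity check in base R is that it matches `log1p()` applied to the shifted values (`log1p(y)` computes log(1 + y)):

```r
# Rebuild the question's data (values copied from the question)
df <- data.frame(kgs_chicken = c(0, 1, 2, 1, 2, 3, 0, 1, 2, 8),
                 kgs_total   = c(2, 4, 8, 2, 3, 4, 2, 4, 6, 20))

# Assumed intent: shift so the minimum becomes 0, then take log(x + 1)
shifted_log <- function(x) log(x - min(x) + 1)

kgs_chicken_ln <- shifted_log(df$kgs_chicken)

# Same result via log1p(), which is also more accurate for small inputs
stopifnot(isTRUE(all.equal(kgs_chicken_ln,
                           log1p(df$kgs_chicken - min(df$kgs_chicken)))))

# The minimum value maps to log(1) = 0
stopifnot(kgs_chicken_ln[which.min(df$kgs_chicken)] == 0)

print(round(kgs_chicken_ln, 4))
```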
Then, after running an algorithm, I get a recommendation of which variables to keep. The algorithm returns only a character vector with the variable names (hardcoded here):
filter_vector <- c("kgs_chicken_ln", "kgs_total")
Now I want to select only the variables named in that vector, but if an element ends in "_ln", I want the original variable without the "_ln" suffix. I have tried this:
df %>%
select(across(ends_with("_ln"), .fns = function (x) gsub("_ln","",names(x))))
But I get an error:
Error: `across()` must only be used inside dplyr verbs.
The expected result is:
   kgs_chicken kgs_total
1            0         2
2            1         4
3            2         8
4            1         2
5            2         3
6            3         4
7            0         2
8            1         4
9            2         6
10           8        20
My real dataset has hundreds of variables, so I am looking for a solution that automates this selection. Any help would be greatly appreciated.
Answer:
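You can remove the _ln suffix from the names in the vector and use the result to select columns from the original df. The $ anchors the pattern, so only a trailing _ln is removed: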
df[sub('_ln$', '', filter_vector)]
#   kgs_chicken kgs_total
#1            0         2
#2            1         4
#3            2         8
#4            1         2
#5            2         3
#6            3         4
#7            0         2
#8            1         4
#9            2         6
#10           8        20
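With hundreds of variables, the same idea can be wrapped in a small helper. This is a sketch; `strip_suffix_select()` is a hypothetical name, and the suffix is treated as a regular expression (fine for "_ln", which has no special characters). `unique()` guards against the vector containing both `x` and `x_ln`, which would otherwise select `x` twice, and the helper stops with a clear message if a mapped name is not a column.

```r
# Hypothetical helper: map recommended names back to the raw columns
strip_suffix_select <- function(data, vars, suffix = "_ln") {
  # Remove the suffix only at the end of each name, then drop duplicates
  raw_names <- unique(sub(paste0(suffix, "$"), "", vars))
  # Fail early if a mapped name is not a column of `data`
  not_found <- setdiff(raw_names, names(data))
  if (length(not_found) > 0) {
    stop("Columns not found: ", paste(not_found, collapse = ", "))
  }
  data[raw_names]
}

df <- data.frame(kgs_chicken = c(0, 1, 2, 1, 2, 3, 0, 1, 2, 8),
                 kgs_total   = c(2, 4, 8, 2, 3, 4, 2, 4, 6, 20),
                 price       = c(0.81, 1.42, 2.85, 0.73, 1.07,
                                 1.52, 0.72, 1.42, 1.94, 7.44))

# "kgs_total" and "kgs_total_ln" both map to "kgs_total"; it is kept once
res <- strip_suffix_select(df, c("kgs_chicken_ln", "kgs_total", "kgs_total_ln"))
print(names(res))
```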
In dplyr, you can use the same expression inside select():
library(dplyr)
df %>% select(sub('_ln$', '', filter_vector))
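When the names come from a vector outside the data frame, tidyselect recommends wrapping it in `all_of()` (errors if any name is missing) or `any_of()` (silently skips missing names). A sketch, assuming dplyr >= 1.0, which re-exports both helpers; "not_a_column" is a made-up name used only to show the lenient behaviour:

```r
library(dplyr)

df <- data.frame(kgs_chicken = c(0, 1, 2, 1, 2, 3, 0, 1, 2, 8),
                 kgs_total   = c(2, 4, 8, 2, 3, 4, 2, 4, 6, 20))

filter_vector <- c("kgs_chicken_ln", "kgs_total")
wanted <- sub("_ln$", "", filter_vector)

# all_of(): strict, errors if any name in `wanted` is not a column
strict <- df %>% select(all_of(wanted))

# any_of(): lenient, drops names that do not exist
lenient <- df %>% select(any_of(c(wanted, "not_a_column")))

print(strict)
```

Use `all_of()` when a missing column should be treated as a bug, and `any_of()` when the recommended list may mention variables that are absent from some datasets.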
Answer:
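Will this work? Note that it starts from df_trans and only renames the _ln columns, so in the output kgs_chicken holds the log-transformed values rather than the original ones: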
library(dplyr)
library(stringr)
df_trans %>%
  select(all_of(filter_vector)) %>%
  rename_with(~ str_remove(., '_ln$'), ends_with('_ln'))
   kgs_chicken kgs_total
1    0.0000000         2
2    0.6931472         4
3    1.0986123         8
4    0.6931472         2
5    1.0986123         3
6    1.3862944         4
7    0.0000000         2
8    0.6931472         4
9    1.0986123         6
10   2.1972246        20