Removing a prefix from a subset of column names using the str_remove function

Time:08-27

In this Stack Overflow post, the accepted answer shows how to remove a prefix from a subset of column names. I will reproduce the toy data and solution and then get to my issue. Note that I have altered the toy data by adding a suffix (_end) to two of the variables.

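library(dplyr)    # %>%, rename_at(), rename_with(), ends_with()
library(stringr)  # str_remove()

# rnorm() draws random values, so the printed numbers differ from run to run
df <- data.frame(ATH_V1 = rnorm(10), ATH_V2_end = rnorm(10), ATH_V3_end = rnorm(10),
                 ATH_V4 = rnorm(10), ATH_V5 = rnorm(10), ATH_V6 = rnorm(10), ATH_V7 = rnorm(10))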

df
    
#        ATH_V1  ATH_V2_end ATH_V3_end     ATH_V4     ATH_V5      ATH_V6      ATH_V7
# 1  -1.5520380  1.16782520 -0.3628090  1.5238728 -1.1660806 -1.01416226 -0.95163564
# 2   0.6270134  1.63810443  0.2199733 -0.6175186 -1.8909463 -0.23913125 -0.70650296
# 3  -0.7462879  0.08504734  0.6506818 -0.5436457  1.3369322  1.69883194 -1.07623124
# 4   0.3196569  0.95782069 -0.3454795 -1.7485607  2.3896003  1.24958489 -0.73316675
# 5  -0.8820414 -2.01739089 -0.5881156  1.2725712  1.4251221  0.56213069 -0.47188011
# 6  -0.5534390  1.48974625 -0.2532402 -1.2333677  1.6690452 -0.48178503  0.30727117
# 7  -0.4637729 -1.13762829  1.3072153  1.0082090 -1.7958189 -1.37604307 -0.08900913
# 8  -0.3878013 -1.09693619 -0.9022672  0.1809460 -1.0303186  0.54576930 -0.64634653
# 9  -0.9553941  0.91495814 -0.2993733 -0.5860527 -0.5623538 -0.24521585  0.21297231
# 10  2.2891475  0.05568124 -0.1718192  0.4249103  2.6009601  0.06357305  0.47794076

I would like to remove the ATH_ prefix ONLY from the columns that end with _end.

The solution in the original post uses the following code: the columns to operate on are listed in a character vector passed to rename_at(), and the ATH_ prefix is then removed with str_remove():

df %>% rename_at(c("ATH_V2_end", "ATH_V3_end"), ~ .x %>% str_remove("^ATH_"))

#         ATH_V1     V2_end      V3_end      ATH_V4     ATH_V5     ATH_V6      ATH_V7
# 1   1.14822123 -0.6285561  0.52458507 -0.63906454  1.1401342 -1.6559726  0.41732258
# 2   0.07519307  2.0090135  0.13440368  1.24337727 -0.2906335 -0.1349698  1.45647898
# 3  -0.87465492 -1.8766134 -0.17119197 -1.22701678 -0.7603659  0.1015543 -1.06211069
# 4   1.01402581 -0.4744169  0.78326842 -0.02910686  0.1548202  1.0042147 -0.23739832
# 5   1.00613252 -1.5701097  1.64415870  0.86733910  0.1558727  0.3011537  0.05700506
# 6  -1.01416351 -1.7687648 -0.13999833 -1.01482747 -0.5732621 -0.2504362  2.20762232
# 7   1.00861721  0.7494679  0.08853307  1.46402775 -0.1153655  0.8427913 -1.16114455
# 8   0.28117809 -0.6669487 -0.50816389 -0.12875270  0.7798111 -0.3937148 -1.30894602
# 9  -0.23092640  2.8516271 -1.36959691 -0.39303227  1.9862182  1.2378769 -1.66039502
# 10  0.65034202  0.9009923  0.58264859  0.50931251  1.7284268  1.8420746 -0.71894637

However, the dplyr documentation states that rename_at() has been superseded by rename_with(), which can use the tidyselect helpers familiar from select() to choose a subset of columns.
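For reference, a minimal sketch of the direct translation, assuming dplyr 1.0.0 or later (where rename_with() was introduced): the explicit name vector from the rename_at() call can be handed to .cols through the tidyselect helper all_of(). The object names old_style and new_style are only illustrative.

```r
library(dplyr)
library(stringr)

# Toy data with the same column names as in the question (values are random)
df <- data.frame(ATH_V1 = rnorm(10), ATH_V2_end = rnorm(10), ATH_V3_end = rnorm(10),
                 ATH_V4 = rnorm(10), ATH_V5 = rnorm(10), ATH_V6 = rnorm(10), ATH_V7 = rnorm(10))

# Superseded form: explicit column names given to rename_at()
old_style <- df %>% rename_at(c("ATH_V2_end", "ATH_V3_end"), ~ str_remove(.x, "^ATH_"))

# Current form: the same names wrapped in all_of() and passed to .cols
new_style <- df %>%
  rename_with(~ str_remove(.x, "^ATH_"), .cols = all_of(c("ATH_V2_end", "ATH_V3_end")))

names(new_style)
```

Both calls should yield the same column names; all_of() makes the "these exact names" intent explicit, whereas a helper such as ends_with() selects by pattern.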

So I would like to remove the ATH_ prefix ONLY from the columns that end with _end, by using ends_with() inside rename_with(), in tidyverse style.

I tried

df %>%
  select(ends_with("_end")) %>%
    rename_with(str_remove(string = ~.x,
                           pattern = "^ATH_"))

and

df %>%
  rename_with(cols = ends_with("_end"),
              .fn = str_remove(string = ~.x,
                               pattern = "^ATH_"))

In both cases I got the same error:

Error in `rename_with()`:
! Can't convert `.fn`, a character vector, to a function.

Any help much appreciated

Answer:

If you use select() to pick the columns first, every other column is dropped from the data frame, so the result would contain only the _end columns. You're on the right track, though.

If you don't use the tilde (~) with .x standing in for the column names, you have to pass an actual function, written with function().
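A small sketch of why the attempts in the question fail: writing str_remove(...) calls the function immediately, so .fn receives its return value (a character vector), not a function. A ~ formula is not a function either, but rlang::as_function() converts it into one with .x bound to the first argument; dplyr applies this kind of conversion to .fn (treated here as an implementation detail). The names called, lambda and fn are only illustrative.

```r
library(stringr)
library(rlang)

# Calling str_remove() evaluates it right away: the result is a character
# vector, which rename_with() cannot use as .fn
called <- str_remove("ATH_V2_end", "^ATH_")
is.function(called)

# A ~ formula on its own is not a function ...
lambda <- ~ str_remove(.x, "^ATH_")
is.function(lambda)

# ... but as_function() turns it into one, with .x as the first argument
fn <- as_function(lambda)
fn(c("ATH_V2_end", "ATH_V3_end"))
```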

For example, you can use the tilde, like this:

rename_with(df, .cols = ends_with("_end"),
            ~ gsub("^ATH_", "", .x))

Or you can designate a variable name of your choice, instead of .x, and use function(), like this:

rename_with(df, .cols = ends_with("_end"),
            .fn = function(frenchFries) {
              gsub("^ATH_", "", frenchFries)
             })
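On R 4.1.0 or later, the function() form can also be written with the backslash lambda shorthand, where \(x) means function(x). This is an alternative not used in the answer, sketched assuming R >= 4.1; renamed is an illustrative name.

```r
library(dplyr)

df <- data.frame(ATH_V1 = rnorm(10), ATH_V2_end = rnorm(10), ATH_V3_end = rnorm(10),
                 ATH_V4 = rnorm(10), ATH_V5 = rnorm(10), ATH_V6 = rnorm(10), ATH_V7 = rnorm(10))

# \(x) is base R shorthand for function(x) (R >= 4.1.0)
renamed <- rename_with(df, .cols = ends_with("_end"), .fn = \(x) gsub("^ATH_", "", x))

names(renamed)
```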

You can pipe the result into names() to check your work before you change the object:

rename_with(df, .cols = ends_with("_end"),
            .fn = function(frenchFries) {
              gsub("^ATH_", "", frenchFries)
            }) %>% names()
# [1] "ATH_V1" "V2_end" "V3_end" "ATH_V4" "ATH_V5" "ATH_V6" "ATH_V7" 

In R, objects are generally not modified in place (very few packages expose mutable objects), so you have to assign the result to actually change df:

df <- rename_with(df, .cols = ends_with("_end"),
                ~ gsub("^ATH_", "", .x))
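To illustrate the point about assignment, a minimal sketch: calling rename_with() on its own leaves df untouched, and only reassigning the result changes its names. The helper variables original_names and unchanged are illustrative.

```r
library(dplyr)

df <- data.frame(ATH_V1 = rnorm(10), ATH_V2_end = rnorm(10), ATH_V3_end = rnorm(10),
                 ATH_V4 = rnorm(10), ATH_V5 = rnorm(10), ATH_V6 = rnorm(10), ATH_V7 = rnorm(10))
original_names <- names(df)

# rename_with() returns a renamed copy; here that copy is simply discarded
invisible(rename_with(df, .cols = ends_with("_end"), ~ gsub("^ATH_", "", .x)))

# df still carries its original names
unchanged <- identical(names(df), original_names)

# Only reassigning the result updates df
df <- rename_with(df, .cols = ends_with("_end"), ~ gsub("^ATH_", "", .x))
names(df)
```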

Answer:

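You put the ~ in the wrong place: it has to come before the whole str_remove() call, so that .fn receives a function instead of the result of calling str_remove(). It should be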

df %>%
  rename_with(.cols = ends_with("_end"),   # note the leading dot: .cols, not cols
              .fn = ~ str_remove(string = .x, pattern = "^ATH_"))

#        ATH_V1     V2_end     V3_end      ATH_V4      ATH_V5     ATH_V6      ATH_V7
# 1  -0.7211939 -0.8369699  0.8317321 -0.05233632  0.05711023 -1.1028795 -0.44261881
# 2  -1.2497923 -0.9062427  1.6472891 -0.77403163 -0.37941031 -0.8270005  1.14721669
# 3  -0.1343481 -1.2049003  0.5347915  0.16202132 -0.38939422 -1.6720070 -1.55429956
# 4   0.1664160  1.9248057 -0.1133589 -0.48717961  0.89363994  1.0983927  0.82700398
# 5  -1.0916865 -0.8093323 -1.3128583 -0.68529918 -0.22614257  0.3307024 -2.45071083
# 6   0.4191887  1.6177852  1.7017075  1.40316160 -1.30115133 -0.6129785  1.28648456
# 7   0.8725919 -0.2706190  1.3131828 -2.99366849  1.28976332 -0.2348865  1.09045642
# 8  -0.5935664 -0.2918142  0.7699294 -1.30566644 -1.53736071 -0.2689142  0.10605338
# 9   1.4284704 -0.3578967 -0.8106887  1.04486145 -0.32881870  0.2486389  0.08226489
# 10  1.2323733 -0.2241655  0.2167915 -0.31868072 -0.74497243 -1.7778882 -0.70894820
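A pitfall worth checking here: the selection argument of rename_with() is .cols. A bare cols = does not match it (R's partial matching needs the supplied name to be a prefix of the formal, and "cols" is not a prefix of ".cols"), so it falls into ... and is handed to .fn unevaluated, while .cols keeps its default of every column. A sketch comparing the two spellings; with_dot and without_dot are illustrative names.

```r
library(dplyr)
library(stringr)

df <- data.frame(ATH_V1 = rnorm(10), ATH_V2_end = rnorm(10), ATH_V3_end = rnorm(10),
                 ATH_V4 = rnorm(10), ATH_V5 = rnorm(10), ATH_V6 = rnorm(10), ATH_V7 = rnorm(10))

# Correct: .cols restricts the renaming to the selected columns
with_dot <- df %>% rename_with(.cols = ends_with("_end"), .fn = ~ str_remove(.x, "^ATH_"))

# Pitfall: `cols` goes into `...`; the lambda never uses it, and
# .cols defaults to every column, so all seven names lose "ATH_"
without_dot <- df %>% rename_with(cols = ends_with("_end"), .fn = ~ str_remove(.x, "^ATH_"))

names(with_dot)
names(without_dot)
```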

A more concise version:

df %>%
  rename_with(~ str_remove(.x, "^ATH_"), ends_with("_end"))

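and, shorter still, you can pass str_remove itself as .fn; the extra argument "^ATH_" is forwarded through ... as its pattern: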

df %>%
  rename_with(str_remove, ends_with("_end"), "^ATH_")
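A quick sketch confirming that the two concise forms select and rename the same columns: arguments are matched by position as .fn, .cols and then ..., so in the second call str_remove(names, "^ATH_") is what gets evaluated. The objects lambda_form and fn_form are illustrative.

```r
library(dplyr)
library(stringr)

df <- data.frame(ATH_V1 = rnorm(10), ATH_V2_end = rnorm(10), ATH_V3_end = rnorm(10),
                 ATH_V4 = rnorm(10), ATH_V5 = rnorm(10), ATH_V6 = rnorm(10), ATH_V7 = rnorm(10))

# Lambda as .fn, selection helper as .cols (both positional)
lambda_form <- df %>% rename_with(~ str_remove(.x, "^ATH_"), ends_with("_end"))

# Function passed by name; "^ATH_" travels through ... as str_remove()'s pattern
fn_form <- df %>% rename_with(str_remove, ends_with("_end"), "^ATH_")

identical(names(lambda_form), names(fn_form))
names(fn_form)
```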