How to deselect many variables without removing specific variables in dplyr

Time:09-02

Say there is a data frame that has a structure like this:

df <- data.frame(x.1 = rnorm(n=100),
                 x.2 = rnorm(n=100),
                 x.3 = rnorm(n=100),
                 x.special = rnorm(n=100),
                 x.y.z = rnorm(n=100))

Inspecting the head, we get this output:

          x.1        x.2         x.3  x.special      x.y.z
1  1.01014580 -1.4047666  1.50374721 -0.8339784 -0.0831983
2  0.44307253 -0.4695634 -0.71951820  1.5758893  1.2163749
3 -0.87051845  0.1793721 -0.26838489 -1.0477929 -1.0813926
4 -0.28491936  0.4186763 -0.07494088 -0.2177471  0.3490200
5 -0.03769566 -0.3656822  0.12478667 -0.7975811 -0.4481193
6 -0.83808036  0.6842561  0.71231627 -0.3348798  1.7418141

Suppose I want to remove all the numbered variables but keep the x.special and x.y.z variables. I know that I can easily deselect with:

df %>% 
  select(-x.1,
         -x.2,
         -x.3)

However, for something like 50 or 100 variables, this becomes cumbersome. Similarly, I know I can match on a pattern like so:

df %>% 
  select(-contains("x."))

But this of course removes everything, because the special variables also contain "x." in their names. Is there a smarter way of picking these variables? I feel like there should be an option for matching the digits in the name.

CodePudding user response:

# Use a regex to flag the columns whose names contain no digit
colsBool <- !grepl(x = names(df), pattern = "\\d")

Result:

> head(df[, colsBool])
   x.special      x.y.z
1  1.1145156 -0.4911891
2  0.7059937  0.4500111
3 -0.6566422  1.6085353
4 -0.6322514 -0.8017260
5  0.4785106  0.6014765
6 -0.8508830 -0.5078307

Regular expressions are your best friend in this situation.
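Since the question asks about dplyr, the same regular expression can most likely be passed straight to select() through the tidyselect helper matches(). The following is a minimal sketch, assuming dplyr is installed and df is built as in the question; the set.seed() call and the name kept are additions for illustration only.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the example reproducible
df <- data.frame(x.1 = rnorm(n = 100),
                 x.2 = rnorm(n = 100),
                 x.3 = rnorm(n = 100),
                 x.special = rnorm(n = 100),
                 x.y.z = rnorm(n = 100))

# Drop every column whose name contains a digit
kept <- df %>%
  select(-matches("\\d"))

names(kept)
# "x.special" "x.y.z"
```

If the numbered columns share a fixed prefix and a known range, select(-num_range("x.", 1:3)) should be an alternative that avoids writing a regex at all.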

For instance, to remove only the columns whose names end in a digit, use !grepl(pattern = "\\d$", ...). The $ at the end of the expression anchors the match to the end of the name, so only names ending in a digit match. The ! in front of grepl() negates the result: each TRUE becomes FALSE and vice versa.
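The difference between the anchored and unanchored patterns only shows up when a digit sits in the middle of a name. A small sketch, using hypothetical column names that are not part of the original data frame:

```r
# Hypothetical column names, chosen only to show the difference
cols <- c("x.1", "x2.total", "x.special")

# TRUE where the name contains no digit anywhere
any_digit <- !grepl(pattern = "\\d", x = cols)

# TRUE where the name does not end in a digit
ends_digit <- !grepl(pattern = "\\d$", x = cols)

cols[any_digit]   # "x.special"
cols[ends_digit]  # "x2.total" "x.special"
```

With the unanchored pattern "x2.total" is dropped as well, so the anchored form is the safer choice when only trailing numbers mark the columns to remove.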
