Creating a dataframe in R that is a subset of a number of other columns

Time:01-27

I have a data frame with 854 observations and 47 variables (India_Summary). I want to create another data frame that contains only some columns from the 47 variables, named 'MEMSEXCOV1', 'PostSecAvailable', 'TertiaryYears'.

I thought I could simply use this (assuming I am just naming the new df 'India_Summary2'):

India_Summary2 <- India_Summary[['MEMSEXCOV1', 'PostSecAvailable', 'TertiaryYears']]

The error I receive is:

Error in `[[.default`(col, i, exact = exact) : subscript out of bounds.

I tried using an equal sign instead:

India_Summary2 = India_Summary[['MEMSEXCOV1', 'PostSecAvailable', 'TertiaryYears']]

and I receive the below error:

Error in `[[.default`(col, i, exact = exact) : subscript out of bounds
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state

CodePudding user response:

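Your code looks like Python (pandas uses double brackets to select several columns). In R, [[ extracts a single element, so it cannot take several column names at once. I'd recommend using the dplyr package. You'd have something like this: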

library(dplyr)

India_Summary2 <- India_Summary %>% 
   select(MEMSEXCOV1, PostSecAvailable, TertiaryYears)
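
For comparison, here is a minimal base R sketch of the same selection, using single brackets with a character vector of column names. The real India_Summary was not posted, so the small data frame below is a hypothetical stand-in: only the three column names come from the question, and the values and the extra column are made up.

```r
# Hypothetical stand-in for India_Summary; values are invented for illustration.
India_Summary <- data.frame(
  MEMSEXCOV1       = c(1, 0, 1),
  PostSecAvailable = c(TRUE, FALSE, TRUE),
  TertiaryYears    = c(3, 4, 2),
  OtherVariable    = c("a", "b", "c")  # assumed extra column that should be dropped
)

# Single brackets with a character vector keep only the named columns.
wanted <- c("MEMSEXCOV1", "PostSecAvailable", "TertiaryYears")
India_Summary2 <- India_Summary[, wanted]

names(India_Summary2)
```

If this matches what you need, the same line should work on the real data frame, provided the column names are spelled exactly as they appear in names(India_Summary).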

CodePudding user response:

You haven't provided any of your data, and Justin has already given a solution using the dplyr package. Without your data I can't confirm that this will work for you, so I'll demonstrate with the iris dataset that ships with R, using a method that doesn't require any extra packages.

First, the data. I can inspect the top with head(iris):

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

I want Sepal.Length and Sepal.Width. Base R offers two ways to get them. The first is matrix notation, which addresses values by [rows, columns]. Since I only want the columns Sepal.Length and Sepal.Width, I leave the row position empty, [, columns], which keeps every row.

#### Subset by Matrix Notation ####
iris.2 <- iris[,c(1,2)]
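
Column positions can shift if the data changes, so it may be safer to select by name. A short sketch of that variant, which also shows the drop argument that matters when only one column is requested (the object names here are just illustrative):

```r
# Select columns by name rather than by position.
iris.by.name <- iris[, c("Sepal.Length", "Sepal.Width")]

# A single column collapses to a plain vector unless drop = FALSE is given.
sepal.vector <- iris[, "Sepal.Length"]
sepal.frame  <- iris[, "Sepal.Length", drop = FALSE]

class(sepal.vector)  # "numeric"
class(sepal.frame)   # "data.frame"
```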

Alternatively, I can do the same thing with the subset function, naming the columns I want in its select argument.

#### Subset with Function ####
iris.2 <- subset(iris,
       select = c("Sepal.Length",
                  "Sepal.Width"))

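The select argument of subset is fairly flexible. As a sketch of a few other forms it accepts (unquoted names, a range of adjacent columns, and a negative selection), with illustrative object names:

```r
# Unquoted column names work inside select.
by.name <- subset(iris, select = c(Sepal.Length, Sepal.Width))

# A range of adjacent columns can be written with a colon.
by.range <- subset(iris, select = Sepal.Length:Sepal.Width)

# A minus sign drops a column and keeps the rest.
no.species <- subset(iris, select = -Species)

names(by.range)
names(no.species)
```

These shortcuts are convenient interactively; inside functions, the bracket notation is usually the more predictable choice.
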
Both achieve the same thing. If I now use head(iris.2), I only see two columns:

  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9
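
To check that the two approaches really give the same result, one option is to build both and compare them. This is only a quick sanity check, and the object names are made up for the example:

```r
# Build the subset both ways.
by.position <- iris[, c(1, 2)]
by.function <- subset(iris, select = c("Sepal.Length", "Sepal.Width"))

# all.equal compares values and column names; isTRUE turns the result into a single logical.
same <- isTRUE(all.equal(by.position, by.function))
same

# Every row is kept, only the columns are reduced.
dim(by.position)
```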