R - tidyverse approach to split a dataframe by columns and keep a set of common columns

Time:12-09

My question is very similar to this one, but I would prefer a tidyverse approach.

I have a dataset with several columns and I want to split it columnwise (not rowwise!), but keep a list of common columns in every dataset. To illustrate this, I will use the iris dataset, and let's say that Species is the common column that I want to keep.

It would be easy to do this with simple base R subsetting:

iris1 <- iris[,c("Species", "Sepal.Width")]
iris2 <- iris[,c("Species", "Sepal.Length")]
iris3 <- iris[,c("Species", "Petal.Width")]
iris4 <- iris[,c("Species", "Petal.Length")]
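For reference, the four assignments above generalize in base R by looping over every column that is not in the common set. This is a sketch that assumes `Species` is the only common column. The names `common`, `others` and `iris_split` are illustrative, and the pieces come out in the original column order rather than the iris1 to iris4 order above.

```r
# Column(s) to keep in every piece (assumption: only Species)
common <- "Species"

# Every other column gets its own data frame
others <- setdiff(names(iris), common)

# One data frame per remaining column, with the common column first
iris_split <- lapply(others, function(col) iris[, c(common, col)])
names(iris_split) <- others
```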

I want the same output, but in tidyverse style, so that it can be used inside a pipeline without breaking it.

Answer:

One approach is to write a function that extracts Species plus one column of your choice (by number or name) from iris, and then map the column numbers over that function.

library(dplyr)
library(purrr)  # provides map()
# Species plus one other column of iris, chosen by position or name
make_df <- function(col) { iris %>% select(Species, {{ col }}) }
c(2, 1, 4, 3) %>% map(make_df)
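Since `select()` accepts column names as well as positions, the same idea works with names, which is less fragile if the column order ever changes. Naming the input vector with `purrr::set_names()` also gives the list elements readable names. A sketch, using `all_of()` from tidyselect (re-exported by dplyr) instead of `make_df()`; `cols` and `iris_list` are illustrative names:

```r
library(dplyr)
library(purrr)

# Columns to split off, given by name rather than by position
cols <- c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length")

iris_list <- cols %>%
  set_names() %>%                              # name each element after its column
  map(~ iris %>% select(Species, all_of(.x)))  # Species + one column per data frame
```

Each piece can then be pulled out by name, e.g. `iris_list$Sepal.Width`.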

or as one line:

c(2, 1, 4, 3) %>% purrr::map(~ iris %>% select(Species, all_of(.x)))

This returns a list with four elements, each of which is a data frame as you describe. For many workflows, a list is safer and more convenient than four free-floating data frames in the global environment.
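`make_df()` hardcodes `iris`, so it cannot sit at the end of an arbitrary pipeline. One way around that is a helper that takes the data frame as its first argument and works out the non-common columns itself. A minimal sketch; `split_cols()` is a hypothetical name, not an existing dplyr or purrr function:

```r
library(dplyr)
library(purrr)

# Hypothetical helper: split `df` columnwise, keeping the `common` columns
# in every piece. Data-first, so it can be the last step of a pipe.
split_cols <- function(df, common) {
  others <- setdiff(names(df), common)  # every column that is not shared
  others %>%
    set_names() %>%                     # list elements named after the columns
    map(~ df %>% select(all_of(c(common, .x))))
}

iris_pieces <- iris %>% split_cols("Species")
```

Because `common` goes through `all_of()`, it can also hold several column names, e.g. `mtcars %>% split_cols(c("cyl", "gear"))`. If separate objects really are needed, base R's `list2env(iris_pieces, envir = globalenv())` would create one per list element, named after its column.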

c(2, 1, 4, 3) %>% purrr::map(make_df) %>% purrr::map(head)

[[1]]
  Species Sepal.Width
1  setosa         3.5
2  setosa         3.0
3  setosa         3.2
4  setosa         3.1
5  setosa         3.6
6  setosa         3.9

[[2]]
  Species Sepal.Length
1  setosa          5.1
2  setosa          4.9
3  setosa          4.7
4  setosa          4.6
5  setosa          5.0
6  setosa          5.4

[[3]]
  Species Petal.Width
1  setosa         0.2
2  setosa         0.2
3  setosa         0.2
4  setosa         0.2
5  setosa         0.2
6  setosa         0.4

[[4]]
  Species Petal.Length
1  setosa          1.4
2  setosa          1.4
3  setosa          1.3
4  setosa          1.5
5  setosa          1.4
6  setosa          1.7

Answer:

To do this in tidyverse style, you can use the select() function from the dplyr package to subset the columns of the iris dataset. For example, the first of the four subsets is:

# Load the dplyr package
library(dplyr)

# Subset the iris dataset to keep only the Species and Sepal.Width columns
iris1 <- iris %>% select(Species, Sepal.Width)
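The same `select()` call scales from a single column to groups of columns through tidyselect helpers such as `starts_with()`, and mapping over the groups keeps everything in one pipe. A sketch, assuming the goal is one data frame per measurement group (sepal vs. petal) rather than per column; the list names `sepal` and `petal` are illustrative:

```r
library(dplyr)
library(purrr)

# Prefixes that define each group of columns (illustrative choice)
groups <- c(sepal = "Sepal", petal = "Petal")

# map() keeps the names of `groups`, so the result is a named list
iris_groups <- groups %>%
  map(~ iris %>% select(Species, starts_with(.x)))
```

Note that `starts_with()` ignores case by default (`ignore.case = TRUE`).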
