How to slice a dataset into multiple datasets in R

Time:03-11

For this example, I'm going to use the iris dataset that is built into R.

How can I get the same output without copying and pasting the code below?

Package

library(dplyr)

Input

head(iris)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

Manual Solution

I have to subset my dataset based on the column names. I know how to do this "manually", but it would require a lot of copying and pasting on my current dataset.

Sepal <- iris %>% select(contains("Sepal")) 
Petal <- iris %>% select(contains("Petal")) 

Output

head(Sepal)
# Sepal.Length Sepal.Width
# 1          5.1         3.5
# 2          4.9         3.0
# 3          4.7         3.2
# 4          4.6         3.1
# 5          5.0         3.6
# 6          5.4         3.9
head(Petal)
# Petal.Length Petal.Width
# 1          1.4         0.2
# 2          1.4         0.2
# 3          1.3         0.2
# 4          1.5         0.2
# 5          1.4         0.2
# 6          1.7         0.4

How can I automate this process? I think I can use the purrr package here, but I couldn't find a way to do it.

Answer:

You can use purrr's map() over a named vector of column-name prefixes:

library(tidyverse)

map(set_names(c("Sepal", "Petal")), ~ select(iris, starts_with(.x)))

Output (first six rows of each element)

$Sepal
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9

$Petal
  Petal.Length Petal.Width
1          1.4         0.2
2          1.4         0.2
3          1.3         0.2
4          1.5         0.2
5          1.4         0.2
6          1.7         0.4
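
If the prefixes should not be typed by hand, they can probably be derived from the column names themselves. The following is a minimal sketch, assuming the grouping key is the text before the first dot of each numeric column name; the names numeric_cols, prefixes and slices are made up for this illustration and do not come from the answer above.

```r
library(dplyr)
library(purrr)

# Keep only the numeric columns (this drops Species), then take the text
# before the first dot of each column name as its prefix.
numeric_cols <- names(iris)[map_lgl(iris, is.numeric)]
prefixes <- unique(sub("\\..*", "", numeric_cols))

# One data frame per prefix, returned as a named list.
slices <- map(set_names(prefixes), ~ select(iris, starts_with(.x)))

names(slices)
```

Each slice is then available as slices$Sepal, slices$Petal, and so on, so no per-prefix line has to be copied and pasted.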

Answer:

Another option is to use split.default() on the prefix of each column name, which returns a named list of data.frames:

library(dplyr)
library(stringr)
head(iris) %>% 
 select(-Species) %>%
 split.default(str_remove(names(.), "\\..*"))

Output

$Petal
  Petal.Length Petal.Width
1          1.4         0.2
2          1.4         0.2
3          1.3         0.2
4          1.5         0.2
5          1.4         0.2
6          1.7         0.4

$Sepal
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9
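
To end up with separate objects named Sepal and Petal, as in the manual solution from the question, the named list can be unpacked with list2env(). This is a sketch in base R only, assuming the non-numeric columns should be dropped and the prefix before the first dot is the grouping key; measurements, groups and slices are illustrative names.

```r
# Base R only: drop the non-numeric Species column, then split the
# remaining columns by the part of each name before the first dot.
measurements <- iris[sapply(iris, is.numeric)]
groups <- sub("\\..*", "", names(measurements))
slices <- split.default(measurements, groups)

# Optional: copy each list element into the global environment
# as its own object (here: Petal and Sepal).
list2env(slices, envir = globalenv())

head(Sepal)
```

Keeping the data frames in the list is usually easier to work with; list2env() is only needed when separate top-level objects are really wanted.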