Home > front end >  How to create a loop which creates multiple subset dataframes from a larger data frame?
How to create a loop which creates multiple subset dataframes from a larger data frame?

Time:04-20

I am trying to create code in R that will instantly recognize the value of a certain column, find all rows with that value, and extract the data from of all of those rows including all other columns intersecting those rows in a new data frame. I want this to repeat for every distinct value inside the base column. for instance:

mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))
colors <- list(mydata$z)
for (i in 1:length(colors)) {
   assign(paste0("mydata_",i), subset(mydata, z == colors[[i]]))
}

this was my latest attempt but can't get it to work. the goal is to have in this example 2 new dataframes called "mydata_red" and "my_data_yellow". Each will only contain the matching rows

CodePudding user response:

Using assign to split a frame or list into multiple objects is an anti-pattern, and rarely an improvement over the preferred method of keeping all frames in a list. See How do I make a list of data frames? discussions on this topic. One premise is that when you do something to one frame in the list, it is likely that you will do something very similar to other elements of the list of frames, and working on the list using lapply and generalizing your methods a little can make for cleaner solutions and such.

To get there with this data, it is as easy as splitting:

LOF <- split(mydata, mydata$z)
LOF  ## <- "List Of Frames", perhaps not the most awesome name?
# $red
#   x y   z
# 1 1 a red
# 2 2 b red
# $yellow
#   x y      z
# 3 3 c yellow

As suggested by jay.sf's comment, this can be used to convert this list of frames into individual objects. While I discourage it in general, perhaps it's best for your use-case.

names(LOF) <- paste0("mydata_", names(LOF))
list2env(LOF, envir = globalenv())
# <environment: R_GlobalEnv>  ### this can be safely ignored
ls()
# [1] "LOF"           "mydata"        "mydata_red"    "mydata_yellow"
mydata_red
#   x y   z
# 1 1 a red
# 2 2 b red

CodePudding user response:

Your code works fine. Just remove list so you create a vector of color names and not a list. If you only want distinct values, use unique.

mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))

colors <- unique(mydata$z)

for (i in 1:length(colors)) {
    assign(paste0("mydata_",i), subset(mydata, z == colors[[i]]))
    }

CodePudding user response:

In tidyverse:

mydata %>% group_by(z) %>% group_map(~.x %>% mutate(z=.y$z))
[[1]]
# A tibble: 2 × 3
      x y     z    
  <dbl> <chr> <chr>
1     1 a     red  
2     2 b     red  

[[2]]
# A tibble: 1 × 3
      x y     z     
  <dbl> <chr> <chr> 
1     3 c     yellow

The ~.x %>% mutate(z = .y$z) may look a bit strange at first sight. The ~ creates a lambda (function). By default the .f argument to group_map takes one required and one optional parameter. The required argument is by default named .x and it contains the subset of the input data frame that contain the current group. Similarly, .y, the optional argument, contains a single row that defines the current group. group_map applies the function defined by .f to each group of the input data frame in turn and returns the results in a list.

mydata %>% group_by(z) %>% group_map(~.x %>% bind_cols(.y))

Has the same effect.

  • Related