I am trying to create code in R that will instantly recognize the value of a certain column, find all rows with that value, and extract the data from of all of those rows including all other columns intersecting those rows in a new data frame. I want this to repeat for every distinct value inside the base column. for instance:
mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))
colors <- list(mydata$z)
for (i in 1:length(colors)) {
assign(paste0("mydata_",i), subset(mydata, z == colors[[i]]))
}
this was my latest attempt but can't get it to work. the goal is to have in this example 2 new dataframes called "mydata_red" and "my_data_yellow". Each will only contain the matching rows
CodePudding user response:
Using assign
to split a frame or list into multiple objects is an anti-pattern, and rarely an improvement over the preferred method of keeping all frames in a list
. See How do I make a list of data frames? discussions on this topic. One premise is that when you do something to one frame in the list, it is likely that you will do something very similar to other elements of the list of frames, and working on the list using lapply
and generalizing your methods a little can make for cleaner solutions and such.
To get there with this data, it is as easy as splitting:
LOF <- split(mydata, mydata$z)
LOF ## <- "List Of Frames", perhaps not the most awesome name?
# $red
# x y z
# 1 1 a red
# 2 2 b red
# $yellow
# x y z
# 3 3 c yellow
As suggested by jay.sf's comment, this can be used to convert this list of frames into individual objects. While I discourage it in general, perhaps it's best for your use-case.
names(LOF) <- paste0("mydata_", names(LOF))
list2env(LOF, envir = globalenv())
# <environment: R_GlobalEnv> ### this can be safely ignored
ls()
# [1] "LOF" "mydata" "mydata_red" "mydata_yellow"
mydata_red
# x y z
# 1 1 a red
# 2 2 b red
CodePudding user response:
Your code works fine. Just remove list
so you create a vector of color names and not a list. If you only want distinct values, use unique
.
mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))
colors <- unique(mydata$z)
for (i in 1:length(colors)) {
assign(paste0("mydata_",i), subset(mydata, z == colors[[i]]))
}
CodePudding user response:
In tidyverse:
mydata %>% group_by(z) %>% group_map(~.x %>% mutate(z=.y$z))
[[1]]
# A tibble: 2 × 3
x y z
<dbl> <chr> <chr>
1 1 a red
2 2 b red
[[2]]
# A tibble: 1 × 3
x y z
<dbl> <chr> <chr>
1 3 c yellow
The ~.x %>% mutate(z = .y$z)
may look a bit strange at first sight. The ~
creates a lambda (function). By default the .f
argument to group_map
takes one required and one optional parameter. The required argument is by default named .x
and it contains the subset of the input data frame that contain the current group. Similarly, .y
, the optional argument, contains a single row that defines the current group. group_map
applies the function defined by .f
to each group of the input data frame in turn and returns the results in a list.
mydata %>% group_by(z) %>% group_map(~.x %>% bind_cols(.y))
Has the same effect.