I have a dataframe that needs to be split into individual files based on the value of a variable in the dataframe. The real data covers scores of individuals and contains confidential information, so a simplified example is below. I want the split to be based on the variable "first".
first <- c("Jon", "Bill", "Bill" , "Maria", "Ben", "Tina")
age <- c(23, 41, 41 , 32, 58, 26)
df <- data.frame(first , age)
df
For example, I want the file with Jon to have one line and the file with Bill to have two lines. I've attempted the following but I'm stuck. I don't know how to get individual dataframes from the list df.split.
library(tidyverse)
df.grped <-
  df %>%
  group_by(first)
df.split <-
  group_split(df.grped)
So I would like to have the files: df.split_Jon, df.split_Bill, df.split_Maria, etc. The actual source file is large so I don't want to specify each.
Since I understand the tidyverse best, I'd like a tidyverse solution if possible. Thanks for any help!
CodePudding user response:
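After splitting the data set by the values of the first column, we can use the list2env function to create a separate dataframe for each subset in the global environment. Note that group_split returns the groups sorted by key (Ben, Bill, Jon, ...), not in order of appearance, so the names are taken from group_keys rather than from unique(df$first), which would attach the wrong name to most subsets:
library(tidyverse)
df.grped <- df %>%
  group_by(first)
# group_keys() lists the keys in the same order as group_split()
setNames(group_split(df.grped),
         paste0("df.split_", group_keys(df.grped)$first)) %>%
  list2env(envir = globalenv())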
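If the goal is only to reach each subset by name, a named list may be enough and avoids filling the global environment with objects. Below is a minimal sketch, assuming the example df from the question; the name df.list is made up for illustration. It relies on base split(), which names each element after its group value, so names and data cannot get out of step.

```r
library(tidyverse)

# Example data from the question
first <- c("Jon", "Bill", "Bill", "Maria", "Ben", "Tina")
age <- c(23, 41, 41, 32, 58, 26)
df <- data.frame(first, age)

# split() names each list element after its value of `first`
df.list <- split(df, df$first)

# Add the prefix asked for in the question (df.list is a hypothetical name)
df.list <- set_names(df.list, str_c("df.split_", names(df.list)))

nrow(df.list$df.split_Jon)   # 1
nrow(df.list$df.split_Bill)  # 2
```

Each subset is then available as df.list$df.split_Jon, df.list[["df.split_Bill"]], and so on, and the whole list can still be passed to list2env later if separate objects are really needed.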
CodePudding user response:
An alternative that assigns each group with walk and assign:
library(tidyverse)
df %>%
  group_split(first) %>%
  walk(~ assign(str_c("df.split_", .[1, 1]), value = ., envir = .GlobalEnv))
names(.GlobalEnv)
#> [1] "df.split_Bill" "first" "df.split_Maria" "df.split_Ben"
#> [5] "df.split_Tina" "age" "df.split_Jon" "df"
Created on 2022-01-01 by the reprex package (v2.0.1)
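Since the question mentions individual files, the same grouping can also write one file per value of first instead of creating objects. This is only a sketch under assumptions: the CSV format, the temporary output directory and the df.split_<name>.csv naming scheme are illustrative choices, not something stated in the question.

```r
library(tidyverse)

# Example data from the question
first <- c("Jon", "Bill", "Bill", "Maria", "Ben", "Tina")
age <- c(23, 41, 41, 32, 58, 26)
df <- data.frame(first, age)

# Assumption: write to a temporary directory; replace with a real path as needed
out_dir <- tempdir()

df %>%
  group_by(first) %>%
  group_walk(
    # .x holds the rows of one group, .y is a one-row tibble with the group key
    ~ write_csv(.x, file.path(out_dir, str_c("df.split_", .y$first, ".csv"))),
    .keep = TRUE  # keep the `first` column in each file
  )

# List the files that were written
list.files(out_dir, pattern = "^df\\.split_")
```

Because group_walk passes the group key alongside the data, the file name always comes from the same group as the rows written to it.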