I have a large dataset with numerous sample IDs. A very simplified version looks something like this:
df <- data.frame(ID = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), times = c(10, 4, 12, 19, 5, 22, 6, 7, 11, 4)),
                 Value = sample(x = 20:30, size = 100, replace = TRUE))
I would like to split my large dataset into multiple smaller dataframes based on ID so that when I plot the data my graph doesn't get too crowded. In this simplified example, I would like to split it into two dataframes/plots, one with data from the first 5 unique IDs (A-E) and the other with data from the next 5 unique IDs (F-J). How can I do this easily using index notation (assuming I have hundreds of IDs)? My code below doesn't work and I don't know what's wrong with it:
subset.1 <- df[unique(df$ID)[1:5]]
subset.2 <- df[unique(df$ID)[6:10]]
Answer:
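Your attempt fails because df[x] with a single index selects columns, not rows, and unique(df$ID)[1:5] returns ID values rather than row positions. Subset the rows with a logical vector instead, and keep the comma so the index applies to rows: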
df[df$ID %in% unique(df$ID)[1:5], ]
df[df$ID %in% unique(df$ID)[6:10], ]
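With hundreds of IDs, writing out each index range by hand gets tedious. Here is a minimal sketch that generalizes the same %in% subsetting: it groups the unique IDs into chunks and builds one data frame per chunk. The chunk size ids_per_plot is a hypothetical parameter, and set.seed() is added only so the sample data is reproducible.

```r
# Sample data from the question; set.seed() is an addition for reproducibility
set.seed(1)
df <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
           times = c(10, 4, 12, 19, 5, 22, 6, 7, 11, 4)),
  Value = sample(x = 20:30, size = 100, replace = TRUE)
)

ids_per_plot <- 5  # hypothetical: how many IDs each subset should contain

# Unique IDs in order of first appearance
ids <- unique(df$ID)

# Chunk number for each ID: 1,1,1,1,1,2,2,2,2,2,...
chunks <- split(ids, ceiling(seq_along(ids) / ids_per_plot))

# For each chunk, keep the rows whose ID belongs to it (logical-vector subsetting)
subsets <- lapply(chunks, function(chunk_ids) df[df$ID %in% chunk_ids, ])

length(subsets)
```

With the example data, subsets[[1]] holds the rows for IDs A to E and subsets[[2]] the rows for F to J; the same code produces more subsets automatically as the number of IDs grows.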
You can also use split() with cut() to split your data frame into n data frames (here, 2) by group:
split(df, cut(as.numeric(as.factor(df$ID)), 2))
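Note that cut() divides the factor codes into equal-width intervals and as.factor() orders the IDs alphabetically, so you choose the number of groups rather than the number of IDs per group. The sketch below assumes roughly 5 IDs per plot (a hypothetical choice) to derive the number of groups, then draws one base-R boxplot per part; the plot type is only an illustration.

```r
# Sample data from the question; set.seed() is an addition for reproducibility
set.seed(1)
df <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
           times = c(10, 4, 12, 19, 5, 22, 6, 7, 11, 4)),
  Value = sample(x = 20:30, size = 100, replace = TRUE)
)

# Hypothetical: about 5 IDs per plot (cut() needs at least 2 intervals)
n_groups <- ceiling(length(unique(df$ID)) / 5)

# Same split as above, with the group count derived from the data
parts <- split(df, cut(as.numeric(as.factor(df$ID)), n_groups))

# One plot per part; droplevels() keeps absent IDs off the axis if ID is a factor
for (i in seq_along(parts)) {
  boxplot(Value ~ ID, data = droplevels(parts[[i]]), main = names(parts)[i])
}
```

For the example data this gives two parts, A to E and F to J, each named after its cut() interval.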