Use index to subset dataframe based on unique values in a column

I have a large dataset with numerous sample IDs. A very simplified version looks something like this:

df <- data.frame(ID = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), times = c(10, 4, 12, 19, 5, 22, 6, 7, 11, 4)),
                  Value = sample(x = 20:30, size = 100, replace = TRUE))

I would like to split my large dataset into multiple smaller dataframes based on ID so that when I plot the data my graph doesn't get too crowded. In this simplified example, I would like to split it into two dataframes/plots, one with data from the first 5 unique IDs (A-E) and the other with data from the next 5 unique IDs (F-J). How can I do this easily using index notation (assuming I have hundreds of IDs)? My code below doesn't work and I don't know what's wrong with it:

subset.1 <- df[unique(df$ID)[1:5]]
subset.2 <- df[unique(df$ID)[6:10]]

Answer:

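Indexing a data frame with a single vector, as in df[x], selects columns, not rows, so your code looks for columns named after the IDs and fails. To select rows, put a logical vector in the row position (note the trailing comma):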

df[df$ID %in% unique(df$ID)[1:5], ]
df[df$ID %in% unique(df$ID)[6:10], ]
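
With hundreds of IDs, writing out each index range by hand gets tedious. Here is a minimal sketch that builds on the same idea: number the unique IDs in order of appearance, give every row a chunk number, and split on it. The names chunk_size, chunk and subsets are made up for illustration, and 5 IDs per chunk is just the value from the question.

```r
set.seed(42)  # only to make the example reproducible
df <- data.frame(ID = rep(LETTERS[1:10], times = c(10, 4, 12, 19, 5, 22, 6, 7, 11, 4)),
                 Value = sample(x = 20:30, size = 100, replace = TRUE))

chunk_size <- 5        # number of IDs per plot (assumed value)
ids <- unique(df$ID)   # unique IDs in order of appearance

# match() returns the position of each row's ID within `ids`;
# dividing by chunk_size and rounding up turns that into a chunk number
chunk <- ceiling(match(df$ID, ids) / chunk_size)

# list of data frames, one per chunk, named "1", "2", ...
subsets <- split(df, chunk)
```

Here subsets[[1]] holds the rows for A-E and subsets[[2]] the rows for F-J, so you can loop over the list (for example with lapply) to draw one plot per chunk.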

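You can also use split with cut to split your dataframe into n datasets (here, 2) by group. Note that as.factor orders the IDs alphabetically, so the groups follow sorted ID order rather than order of appearance, and cut divides the resulting integer codes into n equal-width intervals.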

split(df, cut(as.numeric(as.factor(df$ID)), 2))
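
To see which IDs ended up in which piece, you can inspect the resulting list. This is a small sketch using the example data from the question; parts and parts_numbered are illustrative names. Passing labels = FALSE to cut is optional and only swaps the interval labels for plain integers as list names.

```r
set.seed(42)  # only to make the example reproducible
df <- data.frame(ID = rep(LETTERS[1:10], times = c(10, 4, 12, 19, 5, 22, 6, 7, 11, 4)),
                 Value = sample(x = 20:30, size = 100, replace = TRUE))

# as.factor() sorts the IDs alphabetically and as.numeric() gives their codes 1..10;
# cut(..., 2) bins those codes into two equal-width intervals
parts <- split(df, cut(as.numeric(as.factor(df$ID)), 2))

# list the IDs contained in each piece
lapply(parts, function(d) unique(as.character(d$ID)))

# same split, but with list names "1" and "2" instead of interval labels
parts_numbered <- split(df, cut(as.numeric(as.factor(df$ID)), 2, labels = FALSE))
```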