Subset a patterned series from one column of a dataframe

Time:10-20

Given the following dataframe:

set.seed(1)
df <- data.frame(rnorm(600))

I'd like to subset every other group of five rows from it, essentially cutting the dataset in half so that n = 300. One way to do this is the following:

subset.df <- data.frame(df$rnorm.600.[c(1:5, 11:15, 21:25, 31:35, 41:45, 51:55, 61:65, 71:75, 81:85, 91:95, 101:105, 111:115, 121:125, 131:135, 141:145, 151:155, 161:165, 171:175, 181:185, 191:195, 201:205, 211:215, 221:225, 231:235, 241:245, 251:255, 261:265, 271:275, 281:285, 291:295, 301:305, 311:315, 321:325, 331:335, 341:345, 351:355, 361:365, 371:375, 381:385, 391:395, 401:405, 411:415, 421:425, 431:435, 441:445, 451:455, 461:465, 471:475, 481:485, 491:495, 501:505, 511:515, 521:525, 531:535, 541:545, 551:555, 561:565, 571:575, 581:585, 591:595)])

However, this code is very cumbersome. Is there a function out there that can do this more efficiently? Thank you for any help!

CodePudding user response:

You could use a bit of modular math in your indexing:

df[((seq(nrow(df)) - 1) %% 10) < 5,]
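
To see why the modulo test picks the right rows, here is a small sketch on 20 positions instead of 600 (the size of 20 and the names idx and keep are only illustrative assumptions, not part of the original answer):

```r
# Toy example: 20 row positions instead of 600, for readability
idx <- seq_len(20)

# Shift to 0-based, take the remainder mod 10, keep remainders 0 through 4
keep <- ((idx - 1) %% 10) < 5

# Positions that would be retained
which(keep)
```

Each block of ten positions maps to remainders 0 through 9, so the first five of every block pass the `< 5` test.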

CodePudding user response:

df[c(rep(TRUE, 5), rep(FALSE,5)),]

This works by creating an alternating pattern of five TRUE values followed by five FALSE values, which R recycles to the number of rows in the data. Rows that line up with TRUE are kept and the rest are dropped, since df[ROWS_I_WANT,] returns all of ROWS_I_WANT and all the columns.
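
A minimal sketch of the recycling idea, reusing the df from the question. The names pattern and half are hypothetical, and drop = FALSE is an addition here: because df has only one column, indexing it without drop = FALSE returns a plain numeric vector rather than a data frame.

```r
set.seed(1)
df <- data.frame(rnorm(600))

# Ten-element logical pattern: five TRUE, then five FALSE
pattern <- rep(c(TRUE, FALSE), each = 5)

# R recycles the pattern across all 600 rows;
# drop = FALSE keeps the result as a one-column data frame
half <- df[pattern, , drop = FALSE]

nrow(half)
```

This relies on the number of rows being a multiple of the pattern length, which holds here since 600 is divisible by 10.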

CodePudding user response:

You could split the dataframe into a list:

df_list <- split(df, gl(2, 5, nrow(df)))

You then select the subset you want:

df_list[[1]]
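
As a quick sanity check of this approach, the sketch below builds the grouping factor with gl() and confirms that each piece holds half of the rows. The intermediate name groups is an assumption added for readability.

```r
set.seed(1)
df <- data.frame(rnorm(600))

# gl(2, 5, n): two levels, each repeated 5 times, cycled up to length n
groups <- gl(2, 5, nrow(df))

# split() returns a list with one data frame per factor level
df_list <- split(df, groups)

# Number of rows in each piece
sapply(df_list, nrow)

# First few original row names in the first piece
head(rownames(df_list[[1]]), 10)
```

One upside of splitting is that the complementary half is available as df_list[[2]] without any extra work.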

CodePudding user response:

In base R, we could use subset() this way:

subset(df$rnorm.600., rep(0:1, times=nrow(df)/10, each=5) == 0)