For loop to divide a large data frame (30columns) into several smaller data frames (with 3 columns)

Time:03-27

This is my first time asking a question here, so please don't be too harsh. I have a large data.frame called merged with a "Time" column, a "Well" column, and 28 further columns, each storing the measurements from one plate (so 28 plates). I want to write a for loop that creates new data frames containing "Time", "Well", and the measurements of a single plate, starting with plate 1 (column 3) and ending with the last plate (plate 28, column 30). The problem is that I have no idea how to change the name of the new data frame inside the loop. Instead of getting 28 data frames with one plate each, I keep overwriting the same data frame and end up with only the last plate. If you need more information, feel free to ask and I will do my best to provide it.

Thank you all in advance!

Roger

CodePudding user response:

Here's a cooked-up example. I begin by creating a data.frame that is similar to your data. Then I convert it from wide format to long format using the tidyr package. (There are other ways to do this.)

With the long format, it's then easy to select out the data you want by Plate identifier.

#----------------------
# Cook up a data.frame
#----------------------
# 30 sequential dates
dates = seq.Date(as.Date("2022-03-01"), as.Date("2022-03-30"), 1)
# 50 wells 
wells <- lapply(LETTERS[1:5], function(l) {paste0(l, seq(1, 10))})
wells <- unlist(wells)
# Create a data.frame
wells_data <- data.frame(expand.grid(dates, wells))
names(wells_data) <- c("Dates", "Wells")

# 30 columns of artificial data
for (i in 1:30) {
  new_data <- data.frame(runif(nrow(wells_data)))
  names(new_data) <- paste0("Plate", i)
  wells_data <- cbind(wells_data, new_data)
}
head(wells_data)
           Dates Wells     Plate1    Plate2    Plate3     Plate4     Plate5
1 2022-03-01    A1 0.20418463 0.5932133 0.7070428 0.04231371 0.25872767
2 2022-03-02    A1 0.95218240 0.1114270 0.3763757 0.22992064 0.05632674
3 2022-03-03    A1 0.07162576 0.9902931 0.1437405 0.40102327 0.56432590
4 2022-03-04    A1 0.17148644 0.1849485 0.2062618 0.45908182 0.44657831
5 2022-03-05    A1 0.11334931 0.4820294 0.1663636 0.87436576 0.60177308
6 2022-03-06    A1 0.13949741 0.7862085 0.6162253 0.50698110 0.75309069
      Plate6     Plate7      Plate8    Plate9    Plate10    Plate11   Plate12
1 0.77206623 0.45816279 0.002027475 0.3821823 0.30170925 0.08730046 0.7638708
2 0.31140577 0.39479768 0.919386005 0.2369556 0.33105790 0.86560846 0.9464049
3 0.36804632 0.30644346 0.782938605 0.3723977 0.21561693 0.14770805 0.7371391
4 0.07265802 0.68454399 0.916244462 0.7688442 0.36590464 0.42293563 0.8448824
5 0.59587190 0.78073586 0.338200076 0.3895508 0.61586528 0.47494553 0.8315232
6 0.41189998 0.06666752 0.721342234 0.5130501 0.06648771 0.61675408 0.9384815
# ...more columns...

#----------------------
# Now convert from wide to long
# and split by plate identifier
#----------------------
library(tidyr)
wells_data <- pivot_longer(wells_data,
                           cols=(3:ncol(wells_data)),
                           names_to="Plate",
                           values_to="measurement")
head(wells_data)
# A tibble: 6 × 4
  Dates      Wells Plate  measurement
  <date>     <fct> <chr>        <dbl>
1 2022-03-01 A1    Plate1      0.204 
2 2022-03-01 A1    Plate2      0.593 
3 2022-03-01 A1    Plate3      0.707 
4 2022-03-01 A1    Plate4      0.0423
5 2022-03-01 A1    Plate5      0.259 
6 2022-03-01 A1    Plate6      0.772 

# Now it's easy to select out each Plate:
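plates <- unique(wells_data$Plate)
# invisible() stops the console from echoing the list that lapply returns
invisible(lapply(plates, function(p) {
         # Named plate_data so it does not shadow base::subset()
         plate_data <- wells_data[wells_data$Plate == p, ]
         # Do whatever you want with this subset
         print(paste("Mean for", p, ":",
                     mean(plate_data$measurement)))
}))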
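
If you would rather keep one data frame per plate than process them on the fly, base R's split() can do that on the long format. This is only a minimal sketch: the tiny `long` data frame and its values are made up for illustration and stand in for the pivoted `wells_data`.

```r
# Tiny long-format stand-in (hypothetical values) with the same columns
# as the pivoted wells_data above
long <- data.frame(
  Dates = as.Date("2022-03-01") + c(0, 0, 1, 1),
  Wells = "A1",
  Plate = c("Plate1", "Plate2", "Plate1", "Plate2"),
  measurement = c(0.25, 0.5, 0.75, 1.0)
)

# split() returns a named list with one data frame per plate identifier
by_plate <- split(long, long$Plate)

names(by_plate)
by_plate$Plate1

# Apply the same operation to every plate in one call
plate_means <- sapply(by_plate, function(d) mean(d$measurement))
plate_means
```

Each element of `by_plate` is an ordinary data frame, so it can be passed to head(), summary(), plotting functions and so on.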

Hope this might help to get you going.

CodePudding user response:

Avoid flooding your global environment with many separate, structurally similar data frames. Instead, build a single named list of subset data frames with lapply, or with sapply(..., simplify = FALSE), which names the list elements for you. As shown below, a data frame stored in a list loses none of its functionality:

# RETRIEVE ALL V-STARTING COLUMN NAMES
v_cols <- colnames(merged_df)[grep("^V", colnames(merged_df))]

# NAMED LIST OF PLATE SUBSETTED DATA FRAMES
plate_measurements_list <- sapply(
    v_cols, 
    function(col) merged_df[, c("Time", "Well", col)],
    simplify = FALSE
)


# ACCESS AND USE EACH DATA FRAME
head(plate_measurements_list$V1)
tail(plate_measurements_list$V2)
summary(plate_measurements_list$V3)
...
str(plate_measurements_list$V28)
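
Since the question asked specifically for a for loop, the same named list can also be filled one element at a time. The sketch below is an illustration under assumptions: the small `merged_df` and its values are made up, and the V1, V2, V3 column names follow the naming assumed in the code above rather than anything confirmed in the question.

```r
# Stand-in for the question's data: Time, Well, then one column per plate.
# Values and V* column names are hypothetical.
merged_df <- data.frame(
  Time = rep(c(0, 10), times = 2),
  Well = rep(c("A1", "A2"), each = 2),
  V1 = c(0.11, 0.12, 0.13, 0.14),
  V2 = c(0.21, 0.22, 0.23, 0.24),
  V3 = c(0.31, 0.32, 0.33, 0.34)
)

# Every column after Time and Well is treated as a plate
plate_cols <- names(merged_df)[-(1:2)]

plate_list <- list()
for (col in plate_cols) {
  # The element name changes on each iteration, so nothing is overwritten
  plate_list[[col]] <- merged_df[, c("Time", "Well", col)]
}

names(plate_list)
head(plate_list[["V2"]])
```

The overwriting problem disappears because the loop assigns to a differently named list element on each pass instead of reusing one variable name. If separate objects in the global environment are really required, list2env(plate_list, envir = .GlobalEnv) should create them from the list, though keeping them in the list is usually easier to work with.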