Create objects with a loop that subsets a list in R

Time:10-19

I have a list of 90 names that I would like to split into separate objects using a loop. I have selected names from the list based on a pattern, but I am not sure how to write a loop that creates the object names. I tried the assign() function before, but it creates values (wrapped in backticks) and not objects. Thanks!

The list has 90 names, and each sample name is repeated 5 times, so I have 18 samples in total with 5 files per sample. I want to create one object per sample containing the names for that sample, i.e. a list with 5 items. I wanted to write a loop instead of copy-pasting this line (sample.1 = sample.names.dilutions[grep("Sample 1_", sample.names.dilutions)]) 18 times. I hope this makes sense.

#list
> sample.names.dilutions
> length(sample.names.dilutions)
[1] 90

#names in list
> sample.names.dilutions[1:20]
 [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"  "New AS Plate 21_AS Plate_Sample 1_25.fcs"  
 [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"  "New AS Plate 21_AS Plate_Sample 1_50.fcs"  
 [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"  "New AS Plate 21_AS Plate_Sample 10_100.fcs"
 [7] "New AS Plate 21_AS Plate_Sample 10_25.fcs"  "New AS Plate 21_AS Plate_Sample 10_250.fcs"
 [9] "New AS Plate 21_AS Plate_Sample 10_50.fcs"  "New AS Plate 21_AS Plate_Sample 10_500.fcs"
[11] "New AS Plate 21_AS Plate_Sample 11_100.fcs" "New AS Plate 21_AS Plate_Sample 11_25.fcs" 
[13] "New AS Plate 21_AS Plate_Sample 11_250.fcs" "New AS Plate 21_AS Plate_Sample 11_50.fcs" 
[15] "New AS Plate 21_AS Plate_Sample 11_500.fcs" "New AS Plate 21_AS Plate_Sample 12_100.fcs"
[17] "New AS Plate 21_AS Plate_Sample 12_25.fcs"  "New AS Plate 21_AS Plate_Sample 12_250.fcs"
[19] "New AS Plate 21_AS Plate_Sample 12_50.fcs"  "New AS Plate 21_AS Plate_Sample 12_500.fcs"

# the line I want to repeat with a loop
> sample.1 = sample.names.dilutions[grep("Sample 1_", sample.names.dilutions)]
> length(sample.1)
[1] 5
> sample.1
[1] "New AS Plate 21_AS Plate_Sample 1_100.fcs" "New AS Plate 21_AS Plate_Sample 1_25.fcs" 
[3] "New AS Plate 21_AS Plate_Sample 1_250.fcs" "New AS Plate 21_AS Plate_Sample 1_50.fcs" 
[5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"

> # I have 18 different samples and want to subset by sample name, assigning each subset to its own object
> for(i in 1:18) {
    print(sample.names[i], quote=FALSE) = sample.names.dilutions[grep(paste0("Sample ",i,"_"), sample.names.dilutions)]}

Error in print(sample.names[i], FALSE) <- sample.names.dilutions[grep(paste0("Sample ",  : 
  could not find function "print<-"
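For context on this error: R reads a call on the left-hand side of an assignment, such as f(x) = value, as x <- "f<-"(x, value), i.e. a call to a replacement function named "f<-". Base R defines "names<-" but not "print<-", which is why the loop fails. A minimal illustration; the vectors x and nm are made up for the example:

```r
x <- c(a = 1, b = 2)

# names() has a replacement function `names<-`, so this is valid...
names(x) = c("first", "second")

# ...and equivalent to calling that replacement function explicitly
x <- `names<-`(x, c("first", "second"))

exists("names<-")  # TRUE
exists("print<-")  # FALSE, hence: could not find function "print<-"

# To create an object whose name is stored in a string, use assign() instead
nm <- "sample_1"  # hypothetical name built from a sample id
assign(nm, c("file_100.fcs", "file_25.fcs"))
sample_1
```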

Answer:

I think I understand now; thank you for clarifying your question in the comments. If there's something I missed or you have any questions, please let me know.

Terminology, quickly

I believe you are interested in splitting a vector of strings into multiple shorter vectors of strings based on a pattern within each element. In R, a list is a generic vector: each of its elements can be any object, including a whole vector of strings.
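A quick way to see the difference: each element of a character vector is a single string, while each element of a list can be a whole vector of any length or type. The objects v and l below are made up for illustration.

```r
# A character vector: each element is one string
v <- c("a", "b", "c")

# A list: each element can be any R object, here two vectors of different lengths and types
l <- list(letters = v, counts = 1:5)

is.vector(l)  # TRUE: a list is a generic vector
is.list(v)    # FALSE: an atomic vector is not a list
length(l)     # 2: the list holds two elements
lengths(l)    # letters 3, counts 5: the length of each element
```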

g is a vector of 20 string elements (see the Data code chunk below).

is.vector(g)
#> [1] TRUE

Here's a list that only contains one vector.

str(list(g))
#> List of 1
#>  $ : chr [1:20] "New AS Plate 21_AS Plate_Sample 12_50.fcs" "New AS Plate 21_AS Plate_Sample 1_100.fcs" "New AS Plate 21_AS Plate_Sample 1_25.fcs" "New AS Plate 21_AS Plate_Sample 1_250.fcs" ...

Now onto the question...

In your question, you specifically ask about using assign(). Although using assign() can be convenient, [it is usually not recommended][1]. But sometimes you gotta do what you gotta do, no shame in that. Here's how you could use it manually, one group at a time (as in your question).

# Using assign() one group at a time
h <- g[grep("Sample 1_", g)]
assign(x = "sample_1_group", value = h)
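assign() does create a real object. One possible explanation for the backticks mentioned in the question (an assumption, since the exact names used there aren't shown): if the name contains a space, such as "Sample 1", the object is still created, but its name is non-syntactic, so R prints it in backticks and you have to refer to it the same way. The vector h below is a shortened stand-in for the real file names.

```r
h <- c("Sample 1_100.fcs", "Sample 1_25.fcs")  # shortened stand-in data

# Syntactic name: usable directly afterwards
assign("sample_1_group", h)
sample_1_group

# Non-syntactic name (contains a space): still a real object,
# but it must be referenced with backticks or retrieved with get()
assign("Sample 1", h)
`Sample 1`
get("Sample 1")

# make.names() converts a string into a syntactic name
make.names("Sample 1")  # "Sample.1"
```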

It is pretty straightforward (and seemingly logical) to use assign() in a for-loop.

The first step in writing a for-loop is deciding what the loop will iterate over; in other words, what changes on each iteration. In your case, that's the sample number. We can define it manually or programmatically.

# Define groups manually
ids <- c(12,1,10,11)
ids
#> [1] 12  1 10 11

# Pattern match groups
all_ids <- gsub(pattern = ".*Sample (\\d+).*", replacement = "\\1", x = g)
all_ids
#>  [1] "12" "1"  "1"  "1"  "1"  "1"  "10" "10" "10" "10" "10" "11" "11" "11" "11"
#> [16] "11" "12" "12" "12" "12"
ids <- unique(all_ids)
ids
#> [1] "12" "1"  "10" "11"
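Two caveats with the gsub() approach: an element that doesn't match the pattern is returned unchanged rather than dropped, and the ids are character strings, so sorting them gives alphabetical order ("1", "10", "11", "12", "2", ...). A sketch that keeps only actual matches and sorts numerically, using regmatches() with a Perl-style pattern; x, ids_num, and the file "notes.txt" are hypothetical:

```r
x <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs",
       "New AS Plate 21_AS Plate_Sample 1_100.fcs",
       "New AS Plate 21_AS Plate_Sample 12_100.fcs",
       "notes.txt")  # hypothetical file that doesn't follow the naming scheme

# gsub() passes non-matching elements through unchanged
gsub(".*Sample (\\d+)_.*", "\\1", x)
# "12" "1" "12" "notes.txt"

# regmatches() + regexpr() return only the substrings that matched
found <- regmatches(x, regexpr("(?<=Sample )\\d+(?=_)", x, perl = TRUE))

# Unique ids as integers, sorted numerically
ids_num <- sort(unique(as.integer(found)))
ids_num  # 1 12
```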

After we know what we are looping over, we can specify the loop. paste0() can be a workhorse here. This loop iterates over ids (one id at a time), finds matching strings in g, and writes them to your environment as a vector. During each iteration of the loop, we'd expect a new vector to appear in our environment.

# For-loop with assign
for(i in ids){
  a <- paste0("Sample ", i, "_")
  h <- g[grep(a, g)]
  h_name <- paste0("sample_", i, "_group")
  assign(x = h_name, value = h)
}
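If you do go the assign() route, you can still find and gather the created objects afterwards: ls() with a pattern lists them, and mget() collects them into a named list. A sketch on a shortened stand-in for g:

```r
g <- c("New AS Plate 21_AS Plate_Sample 1_100.fcs",
       "New AS Plate 21_AS Plate_Sample 1_25.fcs",
       "New AS Plate 21_AS Plate_Sample 10_100.fcs")
ids <- c("1", "10")

# Same loop as above: one object per id in the current environment
for (i in ids) {
  assign(paste0("sample_", i, "_group"), g[grep(paste0("Sample ", i, "_"), g)])
}

# List the objects the loop created
ls(pattern = "^sample_\\d+_group$")

# Gather them back into one named list
groups <- mget(paste0("sample_", ids, "_group"), envir = environment())
lengths(groups)  # sample_1_group 2, sample_10_group 1
```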

That works, but it isn't ideal. "Works" may be good enough, and there's no problem with that, but you may find it more convenient to store the results of a for-loop in a list (one object holding many vectors). It's quick to write, you don't end up with a bunch of new objects crowding your workspace, and the pitfalls described in the assign() link above don't apply.

# Save the results of a for-loop in a list!
# First, make a blank list to hold the results
results <- list()
for(i in ids){
  a <- paste0("Sample ", i, "_")
  h <- g[grep(a, g)]
  h_name <- paste0("sample_", i, "_group")
  results[[h_name]] <- h
}
results
#> $sample_12_group
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs" 
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs" 
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#> 
#> $sample_1_group
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#> 
#> $sample_10_group
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#> 
#> $sample_11_group
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"
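Once the groups are in a list, each one can be used much like a standalone object, by name. You could also preallocate the list instead of growing it inside the loop; with 18 groups either way is fine. A sketch using a shortened stand-in for g:

```r
g <- c("New AS Plate 21_AS Plate_Sample 1_100.fcs",
       "New AS Plate 21_AS Plate_Sample 1_25.fcs",
       "New AS Plate 21_AS Plate_Sample 10_100.fcs")
ids <- c("1", "10")

# Preallocate a named list with one slot per id
results <- vector("list", length(ids))
names(results) <- paste0("sample_", ids, "_group")

for (k in seq_along(ids)) {
  results[[k]] <- g[grep(paste0("Sample ", ids[k], "_"), g)]
}

# Access a group by name, much like a separate object
results[["sample_1_group"]]
results$sample_10_group

# Apply a function to every group at once
lengths(results)
```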

Extra credit

For-loops are great: it's easy to see what's going on inside them, it's easy to do a lot of data handling in them, and they are usually reasonably fast to execute. But sometimes you want more compact code. R has truly vectorized functions ([see this question for what that means][2]), which take a whole vector and do the looping internally in compiled code. The apply() family is often lumped in with them, but lapply() still calls your function once per element, so it isn't necessarily faster than a well-written for-loop; its main advantages are shorter code and a result list that is built for you. It is usually easy to use wherever you might otherwise write a for-loop. Here's how you could do that with your data:

# With lapply()
lapply(ids, function(i) g[grep(paste0("Sample ", i, "_"), g)])
#> [[1]]
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs" 
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs" 
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#> 
#> [[2]]
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#> 
#> [[3]]
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#> 
#> [[4]]
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs" 
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs" 
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"
Created on 2021-10-14 by the reprex package (v2.0.1)
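Two possible refinements, sketched on a shortened stand-in for g. lapply() returns an unnamed list here because ids has no names; naming the input with setNames() carries the names through to the result. And split(), a different technique from the one above, groups the whole vector in a single call, using the id extracted with gsub() as the grouping key.

```r
g <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs",
       "New AS Plate 21_AS Plate_Sample 1_100.fcs",
       "New AS Plate 21_AS Plate_Sample 1_25.fcs",
       "New AS Plate 21_AS Plate_Sample 12_100.fcs")
ids <- unique(gsub(".*Sample (\\d+)_.*", "\\1", g))

# lapply() keeps the names of its input, so name the ids first
named <- lapply(setNames(ids, paste0("sample_", ids, "_group")),
                function(i) g[grep(paste0("Sample ", i, "_"), g)])
names(named)  # "sample_12_group" "sample_1_group"

# split() groups every element by its extracted id in one call;
# groups come back ordered by id as text ("1" before "12")
by_id <- split(g, gsub(".*Sample (\\d+)_.*", "\\1", g))
names(by_id)  # "1" "12"
```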

Data:

g <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs", 
       "New AS Plate 21_AS Plate_Sample 1_100.fcs",
       "New AS Plate 21_AS Plate_Sample 1_25.fcs", 
       "New AS Plate 21_AS Plate_Sample 1_250.fcs",
       "New AS Plate 21_AS Plate_Sample 1_50.fcs",
       "New AS Plate 21_AS Plate_Sample 1_500.fcs",
       "New AS Plate 21_AS Plate_Sample 10_100.fcs",
       "New AS Plate 21_AS Plate_Sample 10_25.fcs",
       "New AS Plate 21_AS Plate_Sample 10_250.fcs",
       "New AS Plate 21_AS Plate_Sample 10_50.fcs",
       "New AS Plate 21_AS Plate_Sample 10_500.fcs",
       "New AS Plate 21_AS Plate_Sample 11_100.fcs",
       "New AS Plate 21_AS Plate_Sample 11_25.fcs",
       "New AS Plate 21_AS Plate_Sample 11_250.fcs",
       "New AS Plate 21_AS Plate_Sample 11_50.fcs",
       "New AS Plate 21_AS Plate_Sample 11_500.fcs",
       "New AS Plate 21_AS Plate_Sample 12_100.fcs",
       "New AS Plate 21_AS Plate_Sample 12_25.fcs",
       "New AS Plate 21_AS Plate_Sample 12_250.fcs",
       "New AS Plate 21_AS Plate_Sample 12_500.fcs")

[1]: Why is using assign bad?
[2]: How do I know a function or an operation in R is vectorized?
