How do I make if else statements if my data contains regex string in R?

Time:01-12

I have two lines of code

file.list <- file.list[order(as.numeric(str_match(file.list, "-R\\s*(.*?)\\s*-")[,2]))]

and

file.list<-file.list[str_order(gsub("W.*1-","", file.list), numeric = TRUE)]

The first line of code will sort the names in the order R1,R2,R3,R4,R5 if my file.list contains:

file.list<-c( "W2345_S-001-R2-20D_790.datavalue.csv",
         "W2346_S-001-R4-20D_792.datavalue.csv",
         "W2347_S-001-R1-20D_789.datavalue.csv",
         "W2348_S-001-R3-20D_791.datavalue.csv",
         "W2349_S-001-R5-20D_793.datavalue.csv")

The second line of code will sort the names in the order R1_1, R1_2, R1_9, R1_10, R2_1, R2_2, R2_9, R2_10 if my file.list contains:

file.list= c("W2345_S-001-R1_1.csv",
             "W2346_S-001-R1_10.csv",
             "W2347_S-001-R1_2.csv",
             "W2348_S-001-R1_9.csv",
             "W2345_S-001-R2_1.csv",
             "W2346_S-001-R2_10.csv",
             "W2347_S-001-R2_2.csv",
             "W2348_S-001-R2_9.csv")  

How do I construct an if/else statement such that, if file.list contains names in the first format (e.g. "W2345_S-001-R2-20D_790.datavalue.csv"), the first line runs, and otherwise the second line runs?

CodePudding user response:

You can wrap grepl in any. grepl returns one TRUE/FALSE per file name, and any collapses that vector to a single TRUE if at least one name matches your regex:

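library(stringr)  # provides str_match() and str_order()

if (any(grepl("-R\\s*(.*?)\\s*-", file.list))) {
  # names like "...-R2-20D_790...": sort by the number between "-R" and the next "-"
  file.list <- file.list[order(as.numeric(str_match(file.list, "-R\\s*(.*?)\\s*-")[,2]))]
} else {
  # names like "...-R1_10.csv": strip the prefix and sort numerically
  file.list <- file.list[str_order(gsub("W.*1-","", file.list), numeric = TRUE)]
}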
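
To see why the condition picks the right branch, it can help to look at what grepl returns before any collapses it. This is a small base R sketch; the names format_a, format_b, hits_a and hits_b are made up for illustration, and the file names are taken from the question:

```r
# Two names in each format from the question (variable names are illustrative).
format_a <- c("W2345_S-001-R2-20D_790.datavalue.csv",
              "W2346_S-001-R4-20D_792.datavalue.csv")
format_b <- c("W2345_S-001-R1_1.csv",
              "W2346_S-001-R1_10.csv")

pattern <- "-R\\s*(.*?)\\s*-"

# grepl() gives one logical value per element
hits_a <- grepl(pattern, format_a)  # TRUE TRUE: "-R2-" and "-R4-" match
hits_b <- grepl(pattern, format_b)  # FALSE FALSE: no "-" follows "-R1_..."

# any() reduces each vector to the single value that if() needs
any(hits_a)  # TRUE
any(hits_b)  # FALSE
```

Note that any applies one decision to the whole vector, so this approach assumes every name in file.list follows the same format.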

You can test this by putting the input in test_list and starting from an empty file.list:

test_list <- c( "W2345_S-001-R2-20D_790.datavalue.csv",
              "W2346_S-001-R4-20D_792.datavalue.csv",
              "W2347_S-001-R1-20D_789.datavalue.csv",
              "W2348_S-001-R3-20D_791.datavalue.csv",
              "W2349_S-001-R5-20D_793.datavalue.csv")

file.list <- c()
if(any(grepl("-R\\s*(.*?)\\s*-", test_list))){
  file.list <- test_list[order(as.numeric(str_match(test_list, "-R\\s*(.*?)\\s*-")[,2]))] 
} else {
  file.list <- test_list[str_order(gsub("W.*1-","", test_list), numeric = TRUE)]
}
# the condition is TRUE, so the first line runs

# or
test_list <-  c("W2345_S-001-R1_1.csv",
             "W2346_S-001-R1_10.csv",
             "W2347_S-001-R1_2.csv",
             "W2348_S-001-R1_9.csv",
             "W2345_S-001-R2_1.csv",
             "W2346_S-001-R2_10.csv",
             "W2347_S-001-R2_2.csv",
             "W2348_S-001-R2_9.csv")

file.list <- c()
if(any(grepl("-R\\s*(.*?)\\s*-", test_list))){
  file.list <- test_list[order(as.numeric(str_match(test_list, "-R\\s*(.*?)\\s*-")[,2]))] 
} else {
  file.list <- test_list[str_order(gsub("W.*1-","", test_list), numeric = TRUE)]
}
# the condition is FALSE, so the second line runs

CodePudding user response:

We can use a single expression that handles either format:

file.list[do.call(order, strcapture("R([0-9]+)_?([0-9]*).*", file.list, list(a=0L,b=0L)))]
# [1] "W2347_S-001-R1-20D_789.datavalue.csv"
# [2] "W2345_S-001-R2-20D_790.datavalue.csv"
# [3] "W2348_S-001-R3-20D_791.datavalue.csv"
# [4] "W2346_S-001-R4-20D_792.datavalue.csv"
# [5] "W2349_S-001-R5-20D_793.datavalue.csv"

# file.list2 holds the second set of names (R1_1, R1_10, ...)
file.list2[do.call(order, strcapture("R([0-9]+)_?([0-9]*).*", file.list2, list(a=0L,b=0L)))]
# [1] "W2345_S-001-R1_1.csv"  "W2347_S-001-R1_2.csv"  "W2348_S-001-R1_9.csv" 
# [4] "W2346_S-001-R1_10.csv" "W2345_S-001-R2_1.csv"  "W2347_S-001-R2_2.csv" 
# [7] "W2348_S-001-R2_9.csv"  "W2346_S-001-R2_10.csv"

The same regex sorts either of your file.list vectors.
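
As a rough illustration of how this works, the intermediate result of strcapture can be inspected on its own. The sketch below uses base R only (strcapture lives in utils); the names files, keys, sorted and key_a are made up for the example:

```r
# Three names in the second format, deliberately out of order
files <- c("W2346_S-001-R1_10.csv",
           "W2347_S-001-R1_2.csv",
           "W2345_S-001-R2_1.csv")

pattern <- "R([0-9]+)_?([0-9]*).*"

# strcapture() returns a data frame with one integer column per capture group
keys <- strcapture(pattern, files, list(a = 0L, b = 0L))
keys$a  # 1 1 2   (the number right after "R")
keys$b  # 10 2 1  (the number after the underscore)

# do.call() passes the columns to order(), so ties in a are broken by b
sorted <- files[do.call(order, keys)]
sorted  # R1_2, R1_10, R2_1

# In the first format there is no "_<number>" part, so the second
# group captures an empty string and b becomes NA
key_a <- strcapture(pattern, "W2345_S-001-R2-20D_790.datavalue.csv",
                    list(a = 0L, b = 0L))
key_a$b  # NA
```

Because b is NA for every name in the first format, the order there is decided by a alone, which is why one expression covers both cases.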
