How to return dataframe if no error is thrown when downloading file links

Time:04-26

I am trying to download single-cell expression matrices from a Gene Expression Omnibus data set, but every file has its own URL. The addresses differ by one number, so I wrote a loop that tries combinations of numbers in the URL until the file is found:

file_number <- as.character(29:38) # SRA file numbers
for (i in 1:10)
{
data <- tryCatch(      
           lapply(file_number[1], function(x) {
              
           base_url <- paste0(
               'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM4715nnn/GSM47154', x,
                           '/suppl/GSM47154', x,'_P', i,
                           '.expression_matrix.txt.gz')
               
            # Download data to temporary directory
            temp <- tempfile()
            download.file(base_url, temp)
            gzfile(temp, 'rt')
            counts <- read.table(file = temp, row.names = 1)
            unlink(temp)}),
    
    error = function(e) { skip_to_next <<- TRUE})
    
    if(skip_to_next) { next }
    
    else print(data)
}

The code runs, but it does not return the data frame from the correct link, which should be ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM4715nnn/GSM4715429/suppl/GSM4715429_P1.expression_matrix.txt.gz

Answer:

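Your loop only ever tries file_number[1], and the anonymous function ends with unlink(temp), so it returns the status of unlink rather than counts. Since you are effectively running nested loops and need the data back, consider nested lapply calls without any skip logic. Return the data frame on success and NULL on error, printing each error to the console: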

file_numbers <- as.character(29:38)     # SRA file numbers

df_list <- lapply(1:10, function(i) {
  lapply(file_numbers, function(x) {
    # Build URL
    base_url <- paste0(
      'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM4715nnn/GSM47154', x,
      '/suppl/GSM47154', x, '_P', i,
      '.expression_matrix.txt.gz'
    )

    # Temporary file, removed when this function exits (success or error)
    temp <- tempfile()
    on.exit(unlink(temp))

    tryCatch({
      print(x)
      # Download data in binary mode so the gzip file is not corrupted
      download.file(base_url, temp, mode = 'wb')

      # Read in data (read.table detects the gzip compression itself)
      counts <- read.table(file = temp, row.names = 1)

      return(counts)                       # RETURN DATA ON SUCCESS
    },
    error = function(e) {
      message('Failed: ', base_url)        # OUTPUT ATTEMPTED URL TO CONSOLE
      print(e)                             # OUTPUT MESSAGE TO CONSOLE
      return(NULL)                         # RETURN NULL ON ERROR
    })
  })
})

# REMOVE NULL ELEMENTS
df_list <- lapply(df_list, function(dfs) Filter(NROW, dfs))
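The same pattern can be tried offline first. In this minimal sketch, fetch_counts is a hypothetical stand-in for the download-and-read step (it is not part of the code above) and is assumed to fail for every file number except "30". The value of tryCatch is either the data frame or the NULL returned by the error handler:

```r
# Hypothetical stand-in for the download step: fails for everything but "30".
fetch_counts <- function(x) {
  if (x != "30") stop("cannot open URL for file number ", x)
  data.frame(cell_a = c(1, 2), cell_b = c(3, 4), row.names = c("B2M", "LYZ"))
}

file_numbers <- as.character(29:31)

results <- lapply(file_numbers, function(x) {
  tryCatch(
    fetch_counts(x),                       # value of tryCatch() on success
    error = function(e) {
      message("Skipping ", x, ": ", conditionMessage(e))
      NULL                                 # value of tryCatch() on error
    }
  )
})

# Drop the failed attempts, keeping only the data frames.
found <- Filter(Negate(is.null), results)

length(results)  # 3
length(found)    # 1
```

Filter(Negate(is.null), ...) and Filter(NROW, ...) both drop the NULL entries here; the latter would also drop data frames with zero rows.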

Output (first three elements at each level of the nested list):

Before Filter

lapply(df_list, \(x) lapply(x, \(d) d[1:5, 1:5]))
[[1]]
[[1]][[1]]
      GTTCTTAATCTG AGCCTCAACGCC CTGTCCTTCATG CCCCAGACGCTA CGCCCCAACTTA
ARPC2           72           88           62           78           90
RPS2            48           64           60           43           55
SNX10           22           36           29           26           25
B2M            660          620          684          611          672
LYZ            242          232          264          273          193

[[1]][[2]]
NULL

[[1]][[3]]
NULL


[[2]]
[[2]][[1]]
NULL

[[2]][[2]]
        CCGAGTCCCTGT CCCCCGGTGATG TTGCATATCACT TCGTCGGAAGGG CTATCCCGGGCT
KIF22              0            1            0            0            3
B2M              389           69          468          260          142
ADD1               0            2            0            4            0
IFITM3             6           22           13            5           14
SUPT16H            0            2            0            5            2

[[2]][[3]]
NULL


[[3]]
[[3]][[1]]
NULL

[[3]][[2]]
NULL

[[3]][[3]]
      AGATCATACCTT GCGAAGTCGCGA CGGGCGGCCGCA TTACCATGTCTT AAGTCAGGCGTC
SDAD1            1            4            4            1            4
COX7C           54           36           35           26           31
RPS24           76           74           79           32           51
ACIN1           10            4            5            4            3
GLO1             6            2            3            5            4

After Filter

lapply(df_list, \(x) lapply(x, \(d) d[1:5, 1:5]))
[[1]]
[[1]][[1]]
      GTTCTTAATCTG AGCCTCAACGCC CTGTCCTTCATG CCCCAGACGCTA CGCCCCAACTTA
ARPC2           72           88           62           78           90
RPS2            48           64           60           43           55
SNX10           22           36           29           26           25
B2M            660          620          684          611          672
LYZ            242          232          264          273          193


[[2]]
[[2]][[1]]
        CCGAGTCCCTGT CCCCCGGTGATG TTGCATATCACT TCGTCGGAAGGG CTATCCCGGGCT
KIF22              0            1            0            0            3
B2M              389           69          468          260          142
ADD1               0            2            0            4            0
IFITM3             6           22           13            5           14
SUPT16H            0            2            0            5            2


[[3]]
[[3]][[1]]
      AGATCATACCTT GCGAAGTCGCGA CGGGCGGCCGCA TTACCATGTCTT AAGTCAGGCGTC
SDAD1            1            4            4            1            4
COX7C           54           36           35           26           31
RPS24           76           74           79           32           51
ACIN1           10            4            5            4            3
GLO1             6            2            3            5            4
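
If a single flat list is more convenient than the nested result, one level of nesting can be removed afterwards. The sketch below uses a toy stand-in for the filtered df_list so that it runs without a network connection. The P1, P2, ... labels are hypothetical and assume that every outer element kept exactly one data frame:

```r
# Toy stand-in for the filtered df_list: one data frame per outer element.
df_list <- list(
  list(data.frame(a = 1:2)),
  list(data.frame(a = 3:4)),
  list(data.frame(a = 5:6))
)

# Remove one level of nesting so that each element is a data frame.
flat <- unlist(df_list, recursive = FALSE)

# Hypothetical labels; only valid if each outer element held one data frame.
names(flat) <- paste0("P", seq_along(flat))

names(flat)
```

With recursive = FALSE, unlist strips only the outer list level, so the data frames themselves are left intact.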