I have a list of matrices called list1 with the following structure:

     start end
[1,]   360 360
[2,]   394 394

     start end
[1,]    15  15

     start end
[1,]    45  45

     start end

     start  end
[1,]    13   13
[2,]   369  369
[3,]   602  602
[4,]   775  775
[5,]   983  983
[6,]  1200 1200
[7,]  1491 1491

Some of the items are empty. I would like to transform this list into a data.frame with two columns, where ID is the name of the list element and pos is its start value:

ID                         pos
ENSLAFT00000000003         360
ENSLAFT00000000003         394
ENSLAFT00000000011          15
ENSLAFT00000000020          45
ENSLAFT00000000024          13
ENSLAFT00000000024         369
ENSLAFT00000000024         602
ENSLAFT00000000024         775
ENSLAFT00000000024         983
ENSLAFT00000000024        1200
ENSLAFT00000000024        1491 

ENSLAFT00000000023 is omitted from the output because it is an empty matrix in the initial list.

I can get close to the desired structure with the following, but it loses the names that identify which list element each row came from:

as.data.frame(do.call(rbind, list1)[,1])

I need to keep those names in the result.

Do you have any suggestions for doing this data transformation in R?

CodePudding user response:

We can loop over the list, extract the 'start' column from each matrix, and stack the result into a two-column data.frame. Empty matrices are kept as a single NA row:

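out <- stack(lapply(list1, \(x) {
          st <- x[,"start"]
          # keep empty matrices as one NA row instead of dropping them
          if(length(st) == 0) st <- NA_real_
          st
          }))[2:1]   # stack() returns (values, ind); reorder to (ind, values)
names(out) <- c("ID", "pos")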
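
If the empty matrices should be dropped rather than kept as NA, as in the desired output in the question, a minimal base R sketch is to repeat each list name by the row count of its matrix. The sample list1 below is an assumption rebuilt from the first few matrices shown in the question; substitute your own list.

```r
# Sample data (assumed, rebuilt from the question): a named list of matrices
list1 <- list(
  ENSLAFT00000000003 = cbind(start = c(360, 394), end = c(360, 394)),
  ENSLAFT00000000011 = cbind(start = 15, end = 15),
  ENSLAFT00000000023 = cbind(start = numeric(0), end = numeric(0))  # empty
)

# Number of rows per matrix; an empty matrix has 0 rows
n <- vapply(list1, nrow, integer(1))

out <- data.frame(
  ID  = rep(names(list1), times = n),  # a name repeated 0 times disappears
  pos = unlist(lapply(list1, function(m) m[, "start"]), use.names = FALSE)
)
out
```

Because rep() repeats a name zero times for a zero-row matrix, ENSLAFT00000000023 does not appear in the result.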

CodePudding user response:

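Here is a tidyverse option. With keep_empty = TRUE, empty matrices are kept as a row with NA; leave that argument out to drop them instead: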


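library(dplyr)   # select(), %>%
library(purrr)   # map()
library(tibble)  # enframe()
library(tidyr)   # unnest()

# convert each matrix to a data.frame and keep only the 'start' column
map(list1, ~ select(as.data.frame(.x), start)) %>%
  enframe %>%                          # list names go into the 'name' column
  unnest(value, keep_empty = TRUE)     # one row per 'start' value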


  name               start
  <chr>              <dbl>
1 ENSLAFT00000000003   360
2 ENSLAFT00000000003   360
3 ENSLAFT00000000011    15
4 ENSLAFT00000000023    NA

Data:


list1 <- list(ENSLAFT00000000003 = structure(c(360, 360, 394, 394), .Dim = c(2L, 
2L), .Dimnames = list(NULL, c("start", "end"))), ENSLAFT00000000011 = structure(c(15, 
15), .Dim = 1:2, .Dimnames = list(NULL, c("start", "end"))), 
    ENSLAFT00000000023 = structure(logical(0), .Dim = c(0L, 2L
    ), .Dimnames = list(NULL, c("start", "end"))))
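
As a variation that omits the empty entries, one could bind the matrices as data frames with dplyr::bind_rows() and its .id argument. This is only a sketch: the list1 below is re-created with cbind() using the same values as the data above so the block runs on its own, and the column names ID and pos are taken from the question.

```r
library(dplyr)

# Same values as the data above, rebuilt here so the example is self-contained
list1 <- list(
  ENSLAFT00000000003 = cbind(start = c(360, 360), end = c(394, 394)),
  ENSLAFT00000000011 = cbind(start = 15, end = 15),
  ENSLAFT00000000023 = cbind(start = numeric(0), end = numeric(0))  # empty
)

out <- list1 %>%
  lapply(as.data.frame) %>%    # bind_rows() needs data frames, not matrices
  bind_rows(.id = "ID") %>%    # list names become the ID column
  select(ID, pos = start)      # keep ID and rename 'start' to 'pos'
out
```

A zero-row data frame contributes no rows to bind_rows(), so the empty element drops out without any extra filtering.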

CodePudding user response:

Another possible solution:


l <- list(a = as.data.frame(matrix(sample(1:20,4), 2, 2)),
          b = as.data.frame(matrix(sample(1:20,4), 2, 2)),
          c = as.data.frame(matrix(sample(1:40,8), 4, 2)))

#> $a
#>   V1 V2
#> 1 15 14
#> 2 19  3
#> $b
#>   V1 V2
#> 1 10 11
#> 2 18  5
#> $c
#>   V1 V2
#> 1 14  5
#> 2 25 37
#> 3 26 28
#> 4 27  9

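library(tidyverse)  # provides enframe(), unnest(), select() and %>%

# select() picks columns by position: 1 = list name, 2 = first value column (V1)
l %>% enframe %>% unnest(everything()) %>% select(ID = 1, pos = 2)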

#> # A tibble: 8 × 2
#>   ID      pos
#>   <chr> <int>
#> 1 a        15
#> 2 a        19
#> 3 b        10
#> 4 b        18
#> 5 c        14
#> 6 c        25
#> 7 c        26
#> 8 c        27
