Follow-up: Putting back a given missing column from a data.frame into a list of data.frames

I'm following up on this question. My LIST of data.frames below is made from my data. However, this LIST is missing the paper column (the name(s) of the missing column(s) are always provided) which is available in the original data.

I was wondering how to put the missing paper column back into LIST to achieve my DESIRED_LIST below?

I tried the solution suggested in this answer (lapply(LIST, function(x)data[do.call(paste, data[names(x)]) %in% do.call(paste, x),])) but it doesn't produce my DESIRED_LIST.

A Base R or tidyverse solution is appreciated.

Reproducible data and code are below.

m2="
paper     study sample    comp ES bar
1         1     1         1    1  7
1         2     2         2    2  6
1         2     3         3    3  5
2         3     4         4    4  4
2         3     4         4    5  3
2         3     4         5    6  2
2         3     4         5    7  1"
data <- read.table(text = m2, header = TRUE)

LIST <- list(data.frame(study=1       ,sample=1       ,comp=1),
             data.frame(study=rep(3,4),sample=rep(4,4),comp=c(4,4,5,5)),
             data.frame(study=c(2,2)  ,sample=c(2,3)  ,comp=c(2,3)))

DESIRED_LIST <- list(data.frame(paper=1       ,study=1       ,sample=1       ,comp=1),
                     data.frame(paper=rep(2,4),study=rep(3,4),sample=rep(4,4),comp=c(4,4,5,5)),
                     data.frame(paper=rep(1,2),study=c(2,2)  ,sample=c(2,3)  ,comp=c(2,3)))

CodePudding user response：

Please find a solution with the package data.table. Is this what you were looking for?

Reprex 1

library(data.table)

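# "bar" has to be dropped as well, otherwise it appears in every list element
cols_to_remove <- c("ES", "bar")

# setDT() converts `data` in place and := removes the columns by reference
split(setDT(data)[, (cols_to_remove) := NULL], by = c("paper", "study"))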
#> $`1.1`
#>    paper study sample comp
#> 1:     1     1      1    1
#> 
#> $`1.2`
#>    paper study sample comp
#> 1:     1     2      2    2
#> 2:     1     2      3    3
#> 
#> $`2.3`
#>    paper study sample comp
#> 1:     2     3      4    4
#> 2:     2     3      4    4
#> 3:     2     3      4    5
#> 4:     2     3      4    5

Created on 2021-11-06 by the reprex package (v2.0.1)

EDIT

Please find solution 2 with the package dplyr

Reprex 2

library(dplyr)

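# "bar" has to be dropped as well, otherwise it appears in every tibble
drop.cols <- c("ES", "bar")

data %>%
  group_by(paper, study) %>%
  select(-all_of(drop.cols)) %>%   # all_of() is the supported way to pass a character vector
  group_split()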

#> <list_of<
#>   tbl_df<
#>     paper : integer
#>     study : integer
#>     sample: integer
#>     comp  : integer
#>   >
#> >[3]>
#> [[1]]
#> # A tibble: 1 x 4
#>   paper study sample  comp
#>   <int> <int>  <int> <int>
#> 1     1     1      1     1
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>   paper study sample  comp
#>   <int> <int>  <int> <int>
#> 1     1     2      2     2
#> 2     1     2      3     3
#> 
#> [[3]]
#> # A tibble: 4 x 4
#>   paper study sample  comp
#>   <int> <int>  <int> <int>
#> 1     2     3      4     4
#> 2     2     3      4     4
#> 3     2     3      4     5
#> 4     2     3      4     5

Created on 2021-11-07 by the reprex package (v2.0.1)

CodePudding user response：

Consider using ave to create a grouping column that numbers the repeated rows within each key combination, and then merge each data frame in LIST with data on the key columns plus that counter.

DESIRED_LIST_SO <- lapply(
  LIST,
  function(df) merge(
      transform(data, grp = ave(paper, paper, study, sample, comp, FUN=seq_along)),
      transform(df, grp = ave(study, study, sample, comp, FUN=seq_along)),
      by=c("study", "sample", "comp", "grp")
  )[c("paper", "study", "sample", "comp")]
)

all.equal(DESIRED_LIST, DESIRED_LIST_SO)
# [1] TRUE

(Consider keeping the unique identifiers, ES and bar, in the desired list to avoid duplicate rows.)
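
To see why the extra counter matters, here is a small sketch using only the four-row element of LIST. The names df, keys, naive and fixed are mine, chosen for illustration, and keys simply re-types the matching rows of data by hand. ave() with FUN = seq_along numbers the repeated key combinations; merging without that number pairs every duplicate with every duplicate.

```r
# Second element of LIST from the question: two key combinations, each repeated twice
df <- data.frame(study = rep(3, 4), sample = rep(4, 4), comp = c(4, 4, 5, 5))

# Occurrence counter within each (study, sample, comp) combination
grp <- ave(df$study, df$study, df$sample, df$comp, FUN = seq_along)
grp
# [1] 1 2 1 2

# The matching rows of the original data (hand-copied for this illustration)
keys <- data.frame(paper = rep(2, 4), study = rep(3, 4),
                   sample = rep(4, 4), comp = c(4, 4, 5, 5))

# Without the counter, each duplicated key matches twice: 2 x 2 + 2 x 2 rows
naive <- merge(keys, df, by = c("study", "sample", "comp"))
nrow(naive)
# [1] 8

# With the counter in the join key, rows pair up one to one
fixed <- merge(transform(keys, grp = grp), transform(df, grp = grp),
               by = c("study", "sample", "comp", "grp"))
nrow(fixed)
# [1] 4
```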

CodePudding user response：

A tidyverse solution. First, create a look-up table, data2, which contains the four target columns. mutate(across(everything(), as.numeric)) makes the column types consistent with LIST (read.table returns integer columns, while LIST holds doubles); it may not be needed for the join itself. Second, use map to apply left_join to each data frame in LIST. LIST2 and DESIRED_LIST are then identical.

library(tidyverse)

data2 <- data %>%
  distinct(paper, study, sample, comp) %>%
  mutate(across(everything(), as.numeric))

LIST2 <- map(LIST, function(x){
  x2 <- x %>%
    left_join(data2, by = names(x)) %>%
    select(all_of(names(data2)))
  return(x2)
})

# Check if the results are the same
identical(DESIRED_LIST, LIST2)
# [1] TRUE
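
Since the question also asks for base R, the same look-up idea can be sketched without any packages. This is only a sketch under two assumptions: the names of the missing columns are supplied (missing_cols, a name I made up here), and each key combination maps to a single value of those columns. The helper column .row and the result name LIST_BASE are likewise hypothetical.

```r
data <- read.table(header = TRUE, text = "
paper study sample comp ES bar
1     1     1      1    1  7
1     2     2      2    2  6
1     2     3      3    3  5
2     3     4      4    4  4
2     3     4      4    5  3
2     3     4      5    6  2
2     3     4      5    7  1")

LIST <- list(data.frame(study = 1, sample = 1, comp = 1),
             data.frame(study = rep(3, 4), sample = rep(4, 4), comp = c(4, 4, 5, 5)),
             data.frame(study = c(2, 2), sample = c(2, 3), comp = c(2, 3)))

missing_cols <- "paper"   # assumed to be provided, as stated in the question

LIST_BASE <- lapply(LIST, function(x) {
  keys <- names(x)
  # One row per distinct key combination, so the join cannot multiply rows
  lookup <- unique(data[c(missing_cols, keys)])
  # merge() sorts its result, so remember the original row order
  x$.row <- seq_len(nrow(x))
  out <- merge(x, lookup, by = keys, all.x = TRUE)
  out <- out[order(out$.row), c(missing_cols, keys)]
  rownames(out) <- NULL
  out
})
```

The result can be compared with DESIRED_LIST using all.equal(), which should tolerate paper coming back as integer rather than double. If a key combination can belong to more than one paper, the look-up table has several rows for it and this approach will return extra rows.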