I'm following up on this question. My LIST of data.frames below is made from my data. However, this LIST is missing the paper column (the name(s) of the missing column(s) are always provided), which is available in the original data.
I was wondering how to put the missing paper column back into LIST to achieve my DESIRED_LIST below?
I tried the solution suggested in this answer (lapply(LIST, function(x) data[do.call(paste, data[names(x)]) %in% do.call(paste, x), ])) but it doesn't produce my DESIRED_LIST.
A Base R or tidyverse solution is appreciated.
Reproducible data and code are below.
m2="
paper study sample comp ES bar
1 1 1 1 1 7
1 2 2 2 2 6
1 2 3 3 3 5
2 3 4 4 4 4
2 3 4 4 5 3
2 3 4 5 6 2
2 3 4 5 7 1"
data <- read.table(text=m2,h=T)
LIST <- list(data.frame(study=1 ,sample=1 ,comp=1),
data.frame(study=rep(3,4),sample=rep(4,4),comp=c(4,4,5,5)),
data.frame(study=c(2,2) ,sample=c(2,3) ,comp=c(2,3)))
DESIRED_LIST <- list(data.frame(paper=1 ,study=1 ,sample=1 ,comp=1),
data.frame(paper=rep(2,4),study=rep(3,4),sample=rep(4,4),comp=c(4,4,5,5)),
data.frame(paper=rep(1,2),study=c(2,2) ,sample=c(2,3) ,comp=c(2,3)))
CodePudding user response:
- Please find a solution with the package data.table. Is this what you were looking for?
Reprex 1
library(data.table)
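# Drop both extra columns so the result matches the output shown below
cols_to_remove <- c("ES", "bar")
# Note: setDT() and := modify `data` by reference
split(setDT(data)[, (cols_to_remove) := NULL], by = c("paper", "study"))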
#> $`1.1`
#> paper study sample comp
#> 1: 1 1 1 1
#>
#> $`1.2`
#> paper study sample comp
#> 1: 1 2 2 2
#> 2: 1 2 3 3
#>
#> $`2.3`
#> paper study sample comp
#> 1: 2 3 4 4
#> 2: 2 3 4 4
#> 3: 2 3 4 5
#> 4: 2 3 4 5
Created on 2021-11-06 by the reprex package (v2.0.1)
EDIT
- Please find solution 2 with the package dplyr
Reprex 2
library(dplyr)
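# Drop both extra columns so the result matches the output shown below
drop.cols <- c("ES", "bar")
data %>%
  group_by(paper, study) %>%
  # all_of() is the supported way to select with an external character vector
  select(-all_of(drop.cols)) %>%
  group_split()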
#> <list_of<
#> tbl_df<
#> paper : integer
#> study : integer
#> sample: integer
#> comp : integer
#> >
#> >[3]>
#> [[1]]
#> # A tibble: 1 x 4
#> paper study sample comp
#> <int> <int> <int> <int>
#> 1 1 1 1 1
#>
#> [[2]]
#> # A tibble: 2 x 4
#> paper study sample comp
#> <int> <int> <int> <int>
#> 1 1 2 2 2
#> 2 1 2 3 3
#>
#> [[3]]
#> # A tibble: 4 x 4
#> paper study sample comp
#> <int> <int> <int> <int>
#> 1 2 3 4 4
#> 2 2 3 4 4
#> 3 2 3 4 5
#> 4 2 3 4 5
Created on 2021-11-07 by the reprex package (v2.0.1)
CodePudding user response:
Consider ave to create a grouping column (needed because of the repeated rows) and then run an iterative merge.
DESIRED_LIST_SO <- lapply(
  LIST,
  function(df) merge(
    transform(data, grp = ave(paper, paper, study, sample, comp, FUN=seq_along)),
    transform(df, grp = ave(study, study, sample, comp, FUN=seq_along)),
    by=c("study", "sample", "comp", "grp")
  )[c("paper", "study", "sample", "comp")]
)
all.equal(DESIRED_LIST, DESIRED_LIST_SO)
[1] TRUE
(Consider keeping the unique identifiers, ES and bar, in the desired list to avoid duplicate rows.)
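To see why the grp counter matters, here is a minimal sketch on toy data that mirrors the duplicated rows in the question. The names lookup, add_grp, naive and keyed are made up for illustration and are not part of the answer above.

```r
# Illustrative toy data (not the question's full data set).
df <- data.frame(study = rep(3, 4), sample = rep(4, 4), comp = c(4, 4, 5, 5))
lookup <- data.frame(paper = rep(2, 4), df)

# Merging on the key columns alone pairs every duplicate with every duplicate.
naive <- merge(lookup, df, by = c("study", "sample", "comp"))
nrow(naive)
#> [1] 8

# ave() runs FUN within each key combination; seq_along numbers the rows 1, 2, ...
# (add_grp is a hypothetical helper used only in this sketch.)
add_grp <- function(d) {
  transform(d, grp = ave(study, study, sample, comp, FUN = seq_along))
}

# With the counter as an extra key, each duplicate matches exactly one partner.
keyed <- merge(add_grp(lookup), add_grp(df),
               by = c("study", "sample", "comp", "grp"))
nrow(keyed)
#> [1] 4
```

Without the counter, the two identical rows on each side produce 2 x 2 matches per key, which is where the extra rows come from.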
CodePudding user response:
A tidyverse solution. First, create a look-up table, data2, which contains the four target columns. mutate(across(.fns = as.numeric)) makes the column types consistent with LIST; it may not be needed. Second, use map to apply left_join to all data frames in LIST. LIST2 and DESIRED_LIST are identical.
library(dplyr)
library(purrr)
data2 <- data %>%
  distinct(paper, study, sample, comp) %>%
  mutate(across(.fns = as.numeric))
LIST2 <- map(LIST, function(x){
  x2 <- x %>%
    left_join(data2, by = names(x)) %>%
    select(all_of(names(data2)))
  return(x2)
})
# Check if the results are the same
identical(DESIRED_LIST, LIST2)
# [1] TRUE
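
Since the question also asks for Base R, the same look-up idea can be sketched without any packages. This is only a sketch: key_cols, lookup and LIST3 are names made up here, and it assumes every study/sample/comp combination belongs to exactly one paper (true for the sample data, not checked in general). The result is not compared against DESIRED_LIST with identical(), because paper stays integer here.

```r
data <- read.table(text = "
paper study sample comp ES bar
1 1 1 1 1 7
1 2 2 2 2 6
1 2 3 3 3 5
2 3 4 4 4 4
2 3 4 4 5 3
2 3 4 5 6 2
2 3 4 5 7 1", header = TRUE)

LIST <- list(data.frame(study = 1, sample = 1, comp = 1),
             data.frame(study = rep(3, 4), sample = rep(4, 4), comp = c(4, 4, 5, 5)),
             data.frame(study = c(2, 2), sample = c(2, 3), comp = c(2, 3)))

# Assumption: these are the columns shared by `data` and every element of LIST.
key_cols <- c("study", "sample", "comp")

# One row per distinct paper/key combination, like data2 above.
lookup <- unique(data[c("paper", key_cols)])

LIST3 <- lapply(LIST, function(x) {
  # match() returns one position per row of x, so duplicated rows
  # and the original row order are both preserved.
  idx <- match(do.call(paste, x[key_cols]), do.call(paste, lookup[key_cols]))
  cbind(paper = lookup$paper[idx], x)
})

LIST3[[2]]
```

Because match() looks each row up individually, no de-duplication step is needed after the join.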