Find same elements in columns of data frames within a list

Time:03-23

I have a list of 5 data frames, each with 3 variables (columns) and a few observations (rows; between 2 and 5 in the subset below).

Here is a subset of my data:

rar_list <- list(rar_data_1 = structure(list(Allele1 = c(1L, 1L, 1L, 1L), 
    Allele2 = c(1L, 0L, 0L, 0L), Allele3 = c(0L, 0L, 0L, 0L)), row.names = c(NA, 
4L), class = "data.frame"), rar_data_2 = structure(list(Allele1 = c(0L, 
0L), Allele2 = c(1L, 0L), Allele3 = c(0L, 0L)), row.names = 5:6, class = "data.frame"), 
    rar_data_3 = structure(list(Allele1 = c(1L, 1L, 1L, 0L), 
        Allele2 = c(0L, 0L, 0L, 0L), Allele3 = c(0L, 0L, 0L, 
        0L)), row.names = 7:10, class = "data.frame"), rar_data_4 = structure(list(
        Allele1 = c(1L, 0L, 1L, 1L, 1L), Allele2 = c(0L, 0L, 
        0L, 0L, 0L), Allele3 = c(0L, 0L, 0L, 0L, 0L)), row.names = 11:15, class = "data.frame"), 
    rar_data_5 = structure(list(Allele1 = c(1L, 1L, 1L, 1L, 0L
    ), Allele2 = c(0L, 0L, 0L, 0L, 0L), Allele3 = c(0L, 0L, 0L, 
    0L, 0L)), row.names = 16:20, class = "data.frame"))


> rar_list

$rar_data_1

 Allele1 Allele2 Allele3
       1       1       0
       1       0       0
       1       0       0
       1       0       0

$rar_data_2

 Allele1 Allele2 Allele3
       0       1       0
       0       0       0

$rar_data_3

  Allele1 Allele2 Allele3
        1       0       0
        1       0       0
        1       0       0
        0       0       0

$rar_data_4

 Allele1 Allele2 Allele3
       1       0       0
       0       0       0
       1       0       0
       1       0       0
       1       0       0

$rar_data_5

 Allele1 Allele2 Allele3
       1       0       0
       1       0       0
       1       0       0
       1       0       0
       0       0       0

I need a table with the following information for each data frame (rar_data_1, rar_data_2, ...) and each column (Allele1, Allele2, ...): if there is a "1" in a given column of one data frame (e.g. rar_data_1/Allele1), in how many of the remaining data frames (rar_data_2/Allele1, rar_data_3/Allele1, etc.) do I also find a "1" in that column?

In this case, the expected result for rar_data_1 is:

 Allele1 Allele2 Allele3
       3       1      NA

CodePudding user response:

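library(dplyr)
library(purrr)
# Sum each column per data frame; the list names go into a "source" column
map_dfr(rar_list, colSums, .id = "source") %>%
  # If a data frame has at least one 1 in a column, count the other data frames that do too; otherwise NA
  mutate(across(starts_with("Allele"), ~ if_else(.x > 0, sum(.x > 0) - 1L, NA_integer_), .names = "{.col}_count_other_ones"))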
# # A tibble: 5 × 7
#   source     Allele1 Allele2 Allele3 Allele1_count_other_ones Allele2_count_other_ones Allele3_count_oth…
#   <chr>        <dbl>   <dbl>   <dbl>                    <int>                    <int>              <int>
# 1 rar_data_1       4       1       0                        3                        1                 NA
# 2 rar_data_2       0       1       0                       NA                        1                 NA
# 3 rar_data_3       3       0       0                        3                       NA                 NA
# 4 rar_data_4       4       0       0                        3                       NA                 NA
# 5 rar_data_5       4       0       0                        3                       NA                 NA
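
For comparison, the same idea can be sketched in base R without dplyr or purrr. This is only a rough sketch, not part of the answer above: it rebuilds the question's data so it runs on its own, and the names has_one, n_with_one and others are made up for illustration. It assumes the goal is the number of other data frames that contain at least one 1 in the same column, which is what the expected output (3, 1, NA) suggests.

```r
# Self-contained copy of the question's data (same values as the dput output)
rar_list <- list(
  rar_data_1 = data.frame(Allele1 = c(1L, 1L, 1L, 1L), Allele2 = c(1L, 0L, 0L, 0L), Allele3 = integer(4)),
  rar_data_2 = data.frame(Allele1 = c(0L, 0L), Allele2 = c(1L, 0L), Allele3 = integer(2)),
  rar_data_3 = data.frame(Allele1 = c(1L, 1L, 1L, 0L), Allele2 = integer(4), Allele3 = integer(4)),
  rar_data_4 = data.frame(Allele1 = c(1L, 0L, 1L, 1L, 1L), Allele2 = integer(5), Allele3 = integer(5)),
  rar_data_5 = data.frame(Allele1 = c(1L, 1L, 1L, 1L, 0L), Allele2 = integer(5), Allele3 = integer(5))
)

# Logical matrix (data frames in rows, alleles in columns):
# TRUE where that column contains at least one 1
has_one <- t(sapply(rar_list, function(df) colSums(df == 1) > 0))

# Number of data frames with at least one 1, per allele
n_with_one <- colSums(has_one)

# Start from "all data frames with a 1, minus the current one" in every row,
# then blank out the cells where the current data frame has no 1 itself
others <- matrix(n_with_one - 1, nrow = nrow(has_one), ncol = ncol(has_one),
                 byrow = TRUE, dimnames = dimnames(has_one))
others[!has_one] <- NA

others["rar_data_1", ]
# Allele1 Allele2 Allele3
#       3       1      NA
```

If the total number of 1s in the other data frames is wanted instead, the logical test would need to be replaced by the column sums themselves; that variant is not shown here.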