Row-wise sum of paired values in 2 identically labelled lists-CodePudding

I have two lists Sis1 and Sis2 with identical column names and identical first column values but different values in columns 2:n:

 Time    Nuc_11    Nuc_15    Nuc_16
 835.20  71.00670 101.13133  89.51763
 839.84 103.49544 146.28099  77.43385
 844.48 142.79152  85.98028  83.36374
 849.12  82.49453  90.12871  69.49881
 853.76  63.62815  77.91290 104.42857
 858.40  66.04051 110.48225 108.12739

What I want is to have a row-wise sum of Sis1 and Sis2 for each identically named pair of columns in the two lists (ie. sum of Sis1$Nuc_11 and Sis2$Nuc_11 for each row defined by the shared Time value). I'm missing something obvious. My best guess is:

NucID <- colnames(Sis1) 
SisSum <- bind_cols(Sis1, Sis2) %>%
      group_by(!!!NucID) %>%
      summarise(across(everything(),list("SisSum" = sum)))

but that's not correct. I'm not sworn to dplyr for the answer, so if there's a nicer approach I'll take it.

Thanks in advance for the help :)

CodePudding user response：

Since you say that they should be added "by the shared Time value", we cannot use bind_cols, which would require perfect alignment and shared-existence of all Time values. (If this is always the case, then just ignore it ... and pray that your assumption is always true.) I suggest a merge/join operation.

For discussion on merge/join operations, see How to join (merge) data frames (inner, outer, left, right) and What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?.
Further, joining on a floating-point is subject to a fundamental problem in programming (not just R), testing for perfect equality in floating-point numbers. To work around this, we need to determine a precision level at which two numbers are assumed to be effectively the same. (Note that often R can get floating-point intuitively correct, but there is no clear indication when it does not work.)

For more discussion on issues regarding floating-point equality, test (0.10 0.05) == 0.15 (returns false), and then read Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754.
Doing a merge/join based on common columns will be much simpler if we first pivot the data from the current wide format to a long format. (This relies on all of them being the same class, numeric in this case. If there is mixed-class, then this does not work.)

For discussion on pivoting, see Reshaping data.frame from wide to long format (wide-to-long) and the reverse, Reshape multiple value columns to wide format (long to wide).

I'll create a similarly-shaped Sis2, changing the numbers, and changing one of the Time values just a little to show what happens.

set.seed(42)
Sis2 <- Sis1 %>%
  mutate(across(starts_with("Nuc"), ~ runif(n(), 50, 100)))
Sis2$Time[3] <- Sis2$Time[3]   1e-5
Sis2
#     Time   Nuc_11   Nuc_15   Nuc_16
# 1 835.20 95.74030 86.82942 96.73361
# 2 839.84 96.85377 56.73333 62.77144
# 3 844.48 64.30698 82.84961 73.11464
# 4 849.12 91.52238 85.25324 97.00073
# 5 853.76 82.08728 72.88709 98.91132
# 6 858.40 75.95480 85.95561 55.87437

Despite the fact that they look similar, one of the Time values is in fact different, and will not join properly.

Sis1$Time
# [1] 835.20 839.84 844.48 849.12 853.76 858.40
Sis2$Time
# [1] 835.20 839.84 844.48 849.12 853.76 858.40

Sis1$Time == Sis2$Time
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

Assuming that 0.001 is sufficient precision to determine as "effectively identical", here is working code that will work:

library(dplyr)
# library(tidyr) # pivot_*
full_join(
    Sis1 %>%
      mutate(Time_chr = sprintf("%0.03f", Time)) %>%
      tidyr::pivot_longer(-c(Time, Time_chr)),
    Sis2 %>%
      mutate(Time_chr = sprintf("%0.03f", Time)) %>%
      tidyr::pivot_longer(-c(Time, Time_chr)),
    by = c("Time_chr", "name")) %>%
  transmute(
    Time = coalesce(Time.x, Time.y),
    name,
    values = rowSums(cbind(value.x, value.y), na.rm = TRUE)
  ) %>%
  tidyr::pivot_wider(Time, names_from = "name", values_from = "values") %>%
  as.data.frame()
#     Time   Nuc_11   Nuc_15   Nuc_16
# 1 835.20 166.7470 187.9607 186.2512
# 2 839.84 200.3492 203.0143 140.2053
# 3 844.48 207.0985 168.8299 156.4784
# 4 849.12 174.0169 175.3819 166.4995
# 5 853.76 145.7154 150.8000 203.3399
# 6 858.40 141.9953 196.4379 164.0018

(The use of %>% as.data.frame() is purely to show all digits and rows without tibble's convenient brief-view. It is not required. Also ... I'm generally not a fan of nesting %>%-pipes inside function calls like that, but it shows what's being done well, I think; feel free to use temporary variables for pivoted storage if you like.)

Because we're doing a full-join, if there were any mismatches then we would see NA values. Such as this, without the use of converting Time to a string:

full_join(
    tidyr::pivot_longer(Sis1, -Time),
    tidyr::pivot_longer(Sis2, -Time),
    by = c("Time", "name")
  ) %>%
  transmute(
    Time, 
    name, 
    values = rowSums(cbind(value.x, value.y), na.rm = TRUE)
  ) %>%
  tidyr::pivot_wider(Time, names_from = "name", values_from = "values") %>%
  as.data.frame()
#     Time    Nuc_11    Nuc_15    Nuc_16
# 1 835.20 166.74700 187.96075 186.25124
# 2 839.84 200.34921 203.01432 140.20529
# 3 844.48 142.79152  85.98028  83.36374
# 4 849.12 174.01691 175.38195 166.49954
# 5 853.76 145.71543 150.79999 203.33989
# 6 858.40 141.99531 196.43786 164.00176
# 7 844.48  64.30698  82.84961  73.11464

Notice that 844.48 in row 3 and its similar-looking "844.48" in row 7 have not been added, due to the 1e-5 difference I added. You can get R to show them with more precision (see ?options for the 'digits' and 'scipen' arguments).

CodePudding user response：

library(tidyverse)

list <- list(
  read_table(
    " Time    Nuc_11    Nuc_15    Nuc_16
 835.20  71.00670 101.13133  89.51763
 839.84 103.49544 146.28099  77.43385
 844.48 142.79152  85.98028  83.36374
 849.12  82.49453  90.12871  69.49881
 853.76  63.62815  77.91290 104.42857
 858.40  66.04051 110.48225 108.12739"
  ),
  read_table(
    " Time    Nuc_11    Nuc_15    Nuc_16
 835.20  71.00670 101.13133  89.51763
 839.84 103.49544 146.28099  77.43385
 844.48 142.79152  85.98028  83.36374
 849.12  82.49453  90.12871  69.49881
 853.76  63.62815  77.91290 104.42857
 858.40  66.04051 110.48225 108.12739"
  )
)

do.call(rbind, list) %>% 
  pivot_longer(-Time) %>% 
  group_by(Time, name) %>% 
  summarise(sum = sum(value))

# A tibble: 18 × 3
# Groups:   Time [6]
    Time name     sum
   <dbl> <chr>  <dbl>
 1  835. Nuc_11  142.
 2  835. Nuc_15  202.
 3  835. Nuc_16  179.
 4  840. Nuc_11  207.
 5  840. Nuc_15  293.
 6  840. Nuc_16  155.
 7  844. Nuc_11  286.
 8  844. Nuc_15  172.
 9  844. Nuc_16  167.
10  849. Nuc_11  165.
11  849. Nuc_15  180.
12  849. Nuc_16  139.
13  854. Nuc_11  127.
14  854. Nuc_15  156.
15  854. Nuc_16  209.
16  858. Nuc_11  132.
17  858. Nuc_15  221.
18  858. Nuc_16  216.