Find a the value in one group that corresponds to the maximum value in another, then subtract that v-CodePudding

I have data like these:


A1    A2     A3       A4       B      C1        C2         C3       C4 

1     3      2        2        7       2        NA          6       9
4     6      12       1        3       1        6           5       2   
6     1      NA       1        7       3        2           2       1

I want to take the maximum value of the columns starting with "C" and then subtract the "A" column ending with the same number from "B". For example, the max of the "C"s in the first row would be 9 and so I would want to subtract A4 from B (7-2)

A1    A2     A3       A4       B      C1        C2         C3       C4      new 

1     11     2        2        7       2        NA          6       9        5
4     6      12       1        3       1        6           5       2       -3
6     1      NA       1        7       3        2           2       1        1

Is this possible? Maybe using "starts_with"?

CodePudding user response：

Maybe this could help:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(idx = which.max(c_across(starts_with('C'))), 
         new = B - get(paste0('A', idx))) %>%
  select(-idx)


# A tibble: 3 × 10
# Rowwise: 
     A1    A2    A3    A4     B    C1    C2    C3    C4   new
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     2     4     7     2    NA     6     9     3
2     4     6    12     1     3     1     6     5     2    -3
3     6     1    NA     1     7     3     2     2     1     1

CodePudding user response：

Here is a base solution:

cs = grep("C", names(df))
as = grep("A", names(df))

c_max = apply(df[cs], 1, which.max)
df$new = df$B - as.matrix(df[as])[cbind(1:nrow(df), c_max)]
df
#   A1 A2 A3 A4 B C1 C2 C3 C4 new
# 1  1  3  2  4 7  2 NA  6  9   3
# 2  4  6 12  1 3  1  6  5  2  -3
# 3  6  1 NA  1 7  3  2  2  1   1

Using this data (note that in your input your A values in input are different from the A values in the output... I used the input.)

df = read.table(text = 'A1    A2     A3       A4       B      C1        C2         C3       C4 
1     3      2        4        7       2        NA          6       9
4     6      12       1        3       1        6           5       2   
6     1      NA       1        7       3        2           2       1', header = T)

CodePudding user response：

Here is one with max.col

library(dplyr)
library(tidyr)
df1 %>%
   mutate( new = B- as.data.frame(across(starts_with('A')))[
        cbind(row_number(), max.col(across(starts_with('C'),
       replace_na, -5), 'first'))])

-output

   A1 A2 A3 A4 B C1 C2 C3 C4 new
1  1  3  2  2 7  2 NA  6  9   5
2  4  6 12  1 3  1  6  5  2  -3
3  6  1 NA  1 7  3  2  2  1   1

data

df1 <- structure(list(A1 = c(1L, 4L, 6L), A2 = c(3L, 6L, 1L), A3 = c(2L, 
12L, NA), A4 = c(2L, 1L, 1L), B = c(7L, 3L, 7L), C1 = c(2L, 1L, 
3L), C2 = c(NA, 6L, 2L), C3 = c(6L, 5L, 2L), C4 = c(9L, 2L, 1L
)), class = "data.frame", row.names = c(NA, -3L))