How do I populate upper.tri of matrix with matched integers from the lower.tri?-CodePudding

Issue

I have a dataframe of familial relationships coded with integers, where R01 is the relationship of person N to person 1, R02 their relationship to person 2 etc.

However, only the lower.tri of each family matrix is coded, so I am trying to write a function to match the correct relationship in the upper.tri.

Relationships

The relationships are coded in integers as follows:

1 = Spouse, 2 = Cohabiting partner, 3 = Son/daughter, 4 = Step son/daughter, 5 = Foster child, 6 = Son-in-law/daughter-in-law, 7 = Parent/guardian, 8 = Step-parent, 9 = Foster parent, 10 = Parent-in-law, 11 = Brother/sister, 12 = Step-brother/sister, 13 = Foster brother/sister, 14 = Brother/sister-in-law, 15 = Grand-child, 16 = Grand-parent, 17 = Other relative, 18 = Other non-relative.

thus the relationships are:

rel = c("1" = 1, "2" = 2, "3" = 7, "4" = 8, "5" = 9, "6" = 10, "7" = 3, "8" = 4, "9" = 5, "10" = 6, "11" = 11, "12" = 12, "13" = 13, "14" = 14, "15" = 16, "16" = 15, "17" = 17, "18" = 18)

Example Data

   household person R01 R02 R03 R04 R05 R06
1          1      1  NA  NA  NA  NA  NA  NA
2          1      2   1  NA  NA  NA  NA  NA
3          1      3   3   3  NA  NA  NA  NA
4          1      4   3   3  11  NA  NA  NA
5          2      1  NA  NA  NA  NA  NA  NA
6          2      2   3  NA  NA  NA  NA  NA
7          2      3  15   3  NA  NA  NA  NA
8          3      1  NA  NA  NA  NA  NA  NA
9          3      2  18  NA  NA  NA  NA  NA
10         4      1  NA  NA  NA  NA  NA  NA
11         5      1  NA  NA  NA  NA  NA  NA
12         5      2   5  NA  NA  NA  NA  NA

Required Output

   household person R01 R02 R03 R04 R05 R06
1          1      1  NA   1   7   7  NA  NA
2          1      2   1  NA   7   7  NA  NA
3          1      3   3   3  NA  11  NA  NA
4          1      4   3   3  11  NA  NA  NA
5          2      1  NA   1  16  NA  NA  NA
6          2      2   3  NA   1  NA  NA  NA
7          2      3  15   3  NA  NA  NA  NA
8          3      1  NA  18  NA  NA  NA  NA
9          3      2  18  NA  NA  NA  NA  NA
10         4      1  NA  NA  NA  NA  NA  NA
11         5      1  NA   9  NA  NA  NA  NA
12         5      2   5  NA  NA  NA  NA  NA

Example Code

df <- data.frame(household = c(1,1,1,1,2,2,2,3,3,4,5,5),
                 person = c(1,2,3,4,1,2,3,1,2,1,1,2),
                 R01 = c(NA, 1, 3, 3, NA, 3, 15, NA, 18, NA, NA, 5),
                 R02 = c(NA, NA, 3, 3, NA, NA, 3, rep(NA, 5)),
                 R03 = c(rep(NA,3), 11, rep(NA, 8)),
                 R04 = rep(NA, 12),
                 R05 = rep(NA, 12),
                 R06 = rep(NA, 12))

I know it's possible to write a function to do the matrix match and then apply it to each household with dplyr, however I'm not great at functions yet so I'm running into issues in a few areas.

CodePudding user response：

You can make the relationship matrix symmetric in each household, and at the same time recode the elements according to rel.

library(dplyr)

df %>%
  group_by(household) %>%
  group_modify(~ {
    mat <- as.matrix(.x[-1][1:nrow(.x)])
    mat[upper.tri(mat)] <- recode(t(mat)[upper.tri(mat)], !!!rel)
    cbind(.x[1], as_tibble(mat))
  }) %>% 
  ungroup()

# A tibble: 12 × 6
   household person   R01   R02   R03   R04
       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
 1         1      1    NA     1     7     7
 2         1      2     1    NA     7     7
 3         1      3     3     3    NA    11
 4         1      4     3     3    11    NA
 5         2      1    NA     7    16    NA
 6         2      2     3    NA     7    NA
 7         2      3    15     3    NA    NA
 8         3      1    NA    18    NA    NA
 9         3      2    18    NA    NA    NA
10         4      1    NA    NA    NA    NA
11         5      1    NA     9    NA    NA
12         5      2     5    NA    NA    NA

CodePudding user response：

Here's a way to do it mostly using base R.

First, create f, a function that replace the upper triangle of a matrix with the matching value from the rel vector and the lower triangle of the same matrix.

Then, split your data according to the household, compute the lengths of each group so that the resulting matrix has the right number of columns, and then apply the function to each groups. Finally, bind_rows and cbind with the original data set.

f <- function(m) {
  m[upper.tri(m)] <- match(t(m)[upper.tri(m)], rel)
  m
}

l <- split(df[3:6], df$household)
len <- lapply(l, \(l) ncol(l) - (sum(sapply(l, \(x) any(!is.na(x))))   1))
l <- mapply(\(x, y) x[1:(length(x) - y)], l, len, SIMPLIFY = F)

cbind(df[1:2],
      dplyr::bind_rows(lapply(l, f)))

output

   household person R01 R02 R03 R04
1          1      1  NA   1   7   7
2          1      2   1  NA   7   7
3          1      3   3   3  NA  11
4          1      4   3   3  11  NA
5          2      1  NA   7  16  NA
6          2      2   3  NA   7  NA
7          2      3  15   3  NA  NA
8          3      1  NA  18  NA  NA
9          3      2  18  NA  NA  NA
10         4      1  NA  NA  NA  NA
11         5      1  NA   9  NA  NA
12         5      2   5  NA  NA  NA