Issue
I have a dataframe of familial relationships coded with integers, where R01
is the relationship of person N
to person 1
, R02
their relationship to person 2
etc.
However, only the lower.tri
of each family matrix is coded, so I am trying to write a function to match
the correct relationship in the upper.tri
.
Relationships
The relationships are coded in integers as follows:
1
= Spouse, 2
= Cohabiting partner, 3
= Son/daughter, 4
= Step son/daughter, 5
= Foster child, 6
= Son-in-law/daughter-in-law, 7
= Parent/guardian, 8
= Step-parent, 9
= Foster parent, 10
= Parent-in-law, 11
= Brother/sister, 12
= Step-brother/sister, 13
= Foster brother/sister, 14
= Brother/sister-in-law, 15
= Grand-child, 16
= Grand-parent, 17
= Other relative, 18
= Other non-relative.
thus the relationships are:
rel = c("1" = 1, "2" = 2, "3" = 7, "4" = 8, "5" = 9, "6" = 10, "7" = 3, "8" = 4, "9" = 5, "10" = 6, "11" = 11, "12" = 12, "13" = 13, "14" = 14, "15" = 16, "16" = 15, "17" = 17, "18" = 18)
Example Data
household person R01 R02 R03 R04 R05 R06
1 1 1 NA NA NA NA NA NA
2 1 2 1 NA NA NA NA NA
3 1 3 3 3 NA NA NA NA
4 1 4 3 3 11 NA NA NA
5 2 1 NA NA NA NA NA NA
6 2 2 3 NA NA NA NA NA
7 2 3 15 3 NA NA NA NA
8 3 1 NA NA NA NA NA NA
9 3 2 18 NA NA NA NA NA
10 4 1 NA NA NA NA NA NA
11 5 1 NA NA NA NA NA NA
12 5 2 5 NA NA NA NA NA
Required Output
household person R01 R02 R03 R04 R05 R06
1 1 1 NA 1 7 7 NA NA
2 1 2 1 NA 7 7 NA NA
3 1 3 3 3 NA 11 NA NA
4 1 4 3 3 11 NA NA NA
5 2 1 NA 1 16 NA NA NA
6 2 2 3 NA 1 NA NA NA
7 2 3 15 3 NA NA NA NA
8 3 1 NA 18 NA NA NA NA
9 3 2 18 NA NA NA NA NA
10 4 1 NA NA NA NA NA NA
11 5 1 NA 9 NA NA NA NA
12 5 2 5 NA NA NA NA NA
Example Code
df <- data.frame(household = c(1,1,1,1,2,2,2,3,3,4,5,5),
person = c(1,2,3,4,1,2,3,1,2,1,1,2),
R01 = c(NA, 1, 3, 3, NA, 3, 15, NA, 18, NA, NA, 5),
R02 = c(NA, NA, 3, 3, NA, NA, 3, rep(NA, 5)),
R03 = c(rep(NA,3), 11, rep(NA, 8)),
R04 = rep(NA, 12),
R05 = rep(NA, 12),
R06 = rep(NA, 12))
I know it's possible to write a function to do the matrix match and then apply it to each household with dplyr
, however I'm not great at functions yet so I'm running into issues in a few areas.
CodePudding user response:
You can make the relationship matrix symmetric in each household
, and at the same time recode the elements according to rel
.
library(dplyr)
df %>%
group_by(household) %>%
group_modify(~ {
mat <- as.matrix(.x[-1][1:nrow(.x)])
mat[upper.tri(mat)] <- recode(t(mat)[upper.tri(mat)], !!!rel)
cbind(.x[1], as_tibble(mat))
}) %>%
ungroup()
# A tibble: 12 × 6
household person R01 R02 R03 R04
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 NA 1 7 7
2 1 2 1 NA 7 7
3 1 3 3 3 NA 11
4 1 4 3 3 11 NA
5 2 1 NA 7 16 NA
6 2 2 3 NA 7 NA
7 2 3 15 3 NA NA
8 3 1 NA 18 NA NA
9 3 2 18 NA NA NA
10 4 1 NA NA NA NA
11 5 1 NA 9 NA NA
12 5 2 5 NA NA NA
CodePudding user response:
Here's a way to do it mostly using base R
.
First, create f
, a function that replace the upper triangle of a matrix with the match
ing value from the rel
vector and the lower triangle of the same matrix.
Then, split
your data according to the household, compute the lengths of each group so that the resulting matrix has the right number of columns, and then apply the function to each groups. Finally, bind_rows
and cbind
with the original data set.
f <- function(m) {
m[upper.tri(m)] <- match(t(m)[upper.tri(m)], rel)
m
}
l <- split(df[3:6], df$household)
len <- lapply(l, \(l) ncol(l) - (sum(sapply(l, \(x) any(!is.na(x)))) 1))
l <- mapply(\(x, y) x[1:(length(x) - y)], l, len, SIMPLIFY = F)
cbind(df[1:2],
dplyr::bind_rows(lapply(l, f)))
output
household person R01 R02 R03 R04
1 1 1 NA 1 7 7
2 1 2 1 NA 7 7
3 1 3 3 3 NA 11
4 1 4 3 3 11 NA
5 2 1 NA 7 16 NA
6 2 2 3 NA 7 NA
7 2 3 15 3 NA NA
8 3 1 NA 18 NA NA
9 3 2 18 NA NA NA
10 4 1 NA NA NA NA
11 5 1 NA 9 NA NA
12 5 2 5 NA NA NA