I have a dataframe of family relationships (parent, child, spouse, etc.) that is partially filled, as in the example below. I am trying to use R to fill in the missing values (<NA>), but I'm not sure where to begin. I've tried using ifelse(), but the code becomes so unwieldy that I'm sure there must be a more efficient way.
Example dataframe
family person R01 R02 R03 R04 R05 R06
1 A 1 X Spouse Child Parent Parent Parent
2 A 2 <NA> X Child-in-law Parent Parent Parent
3 A 3 <NA> <NA> X GrandParent GrandParent GrandParent
4 A 4 <NA> <NA> <NA> X Sibling Sibling
5 A 5 <NA> <NA> <NA> <NA> X Sibling
6 A 6 <NA> <NA> <NA> <NA> <NA> X
7 B 1 X Spouse Parent Parent <NA> <NA>
8 B 2 <NA> X Parent Parent <NA> <NA>
9 B 3 <NA> <NA> X Sibling <NA> <NA>
10 B 4 <NA> <NA> <NA> X <NA> <NA>
11 C 1 X Parent <NA> <NA> <NA> <NA>
12 C 2 <NA> X <NA> <NA> <NA> <NA>
where R01 is the relationship of person x to person 1. For the second row of the dataframe above, I would need R01 to be Spouse, as that matches R02 in the first row. The relationships would match as per the df below.
Relationship Matches
[,1] [,2]
[1,] "Spouse" "Spouse"
[2,] "Parent" "Child"
[3,] "Child" "Parent"
[4,] "GrandParent" "GrandChild"
[5,] "GrandChild" "GrandParent"
[6,] "Parent-in-law" "Child-in-law"
[7,] "Child-in-law" "Parent-in-law"
Code to replicate Example
df1 <- data.frame(family = c(rep("A", 6), rep("B", 4), rep("C", 2)),
                  person = c(1:6, 1:4, 1:2),
                  R01 = c("X", rep(NA, 5), "X", rep(NA, 3), "X", NA),
                  R02 = c("Spouse", "X", rep(NA, 4), "Spouse", "X", NA, NA, "Parent", "X"),
                  R03 = c("Child", "Child-in-law", "X", NA, NA, NA, "Parent", "Parent", "X", rep(NA, 3)),
                  R04 = c(rep("Parent", 2), "GrandParent", "X", NA, NA, rep("Parent", 2), "Sibling", "X", NA, NA),
                  R05 = c(rep("Parent", 2), "GrandParent", "Sibling", "X", rep(NA, 7)),
                  R06 = c(rep("Parent", 2), "GrandParent", rep("Sibling", 2), "X", rep(NA, 6)))
relationshipmatch <- matrix(c("Spouse", "Parent", "Child", "GrandParent", "GrandChild", "Parent-in-law", "Child-in-law", "Spouse", "Child", "Parent", "GrandChild", "GrandParent", "Child-in-law", "Parent-in-law"), ncol = 2)
CodePudding user response:
This solution works with character columns only. Since you have numeric (integer?) data in reality, you may need to adapt the [-indexing in the function.
I'm assuming that the frame is always ordered row-wise by person and column-wise by incrementing R01:R06.
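library(dplyr)

invert_relationships <- function(mat) {
  # Lookup: each relationship (name) maps to its inverse (value)
  rel <- c(Spouse = "Spouse", Child = "Parent", Parent = "Child", GrandChild = "GrandParent",
           GrandParent = "GrandChild", "Child-in-law" = "Parent-in-law",
           "Parent-in-law" = "Child-in-law", Sibling = "Sibling", X = "X")
  # Square block: one column per person in the family
  mat0 <- as.matrix(mat)[, seq_len(nrow(mat))]
  # Replace every relationship with its inverse (NA stays NA)
  mat0[] <- rel[match(as.matrix(mat0), names(rel))]
  mat1 <- as.data.frame(mat)[, seq_len(nrow(mat0))]
  # Fill the lower triangle with the inverted, transposed upper triangle
  mat1[lower.tri(mat1)] <- t(mat0)[lower.tri(mat0)]
  # Re-attach any columns beyond the family size (unchanged)
  cbind(mat1, mat[, -seq_len(nrow(mat0))])
}

# Note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()
df1 %>%
  group_by(family) %>%
  mutate(invert_relationships(select(cur_data(), -person))) %>%
  ungroup()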
# # A tibble: 12 x 8
# family person R01 R02 R03 R04 R05 R06
# <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 A 1 X Spouse Child Parent Parent Parent
# 2 A 2 Spouse X Child-in-law Parent Parent Parent
# 3 A 3 Parent Parent-in-law X GrandParent GrandParent GrandParent
# 4 A 4 Child Child GrandChild X Sibling Sibling
# 5 A 5 Child Child GrandChild Sibling X Sibling
# 6 A 6 Child Child GrandChild Sibling Sibling X
# 7 B 1 X Spouse Parent Parent NA NA
# 8 B 2 Spouse X Parent Parent NA NA
# 9 B 3 Child Child X Sibling NA NA
# 10 B 4 Child Child Sibling X NA NA
# 11 C 1 X Parent NA NA NA NA
# 12 C 2 Child X NA NA NA NA
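To see the core idea in isolation, here is a minimal base R sketch on a single three-person family (persons 1 to 3 of family B). The names rel_small, m and inv are made up for this illustration, the lookup is deliberately shortened, and it assumes the upper triangle is already filled.

```r
# Shortened lookup for illustration: name = relationship, value = its inverse
rel_small <- c(Spouse = "Spouse", Child = "Parent", Parent = "Child",
               Sibling = "Sibling", X = "X")

# Upper triangle filled, lower triangle missing (persons 1-3 of family B)
m <- matrix(c("X", "Spouse", "Parent",
              NA,  "X",      "Parent",
              NA,  NA,       "X"),
            nrow = 3, byrow = TRUE)

# Invert every known relationship; NA stays NA
inv <- m
inv[] <- rel_small[match(m, names(rel_small))]

# Copy the inverted upper triangle into the lower triangle
m[lower.tri(m)] <- t(inv)[lower.tri(inv)]
m
```

Row 2 now starts with Spouse and row 3 with Child, Child, which agrees with rows 8 and 9 of the output above.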
CodePudding user response:
You can make the relationship matrix symmetric in each family, and at the same time swap Child with Parent in those relationships containing them. Here stringr::str_replace_all is used to do the swapping.
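library(dplyr)

df1 %>%
  group_by(family) %>%
  group_modify(~ {
    # Keep the relationship columns that are not entirely NA (a square block per family)
    mat <- as.matrix(select(.x, starts_with("R") & !where(~ all(is.na(.x)))))
    # Mirror the upper triangle into the lower one, swapping Parent and Child via a placeholder
    mat[lower.tri(mat)] <- stringr::str_replace_all(
      t(mat)[lower.tri(mat)],
      c("Parent" = "Temp", "Child" = "Parent", "Temp" = "Child")
    )
    cbind(select(.x, !starts_with("R")), mat)
  }) %>%
  ungroup()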
# A tibble: 12 × 8
family person R01 R02 R03 R04 R05 R06
<chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 A 1 X Spouse Child Parent Parent Parent
2 A 2 Spouse X Child-in-law Parent Parent Parent
3 A 3 Parent Parent-in-law X GrandParent GrandParent GrandParent
4 A 4 Child Child GrandChild X Sibling Sibling
5 A 5 Child Child GrandChild Sibling X Sibling
6 A 6 Child Child GrandChild Sibling Sibling X
7 B 1 X Spouse Parent Parent NA NA
8 B 2 Spouse X Parent Parent NA NA
9 B 3 Child Child X Sibling NA NA
10 B 4 Child Child Sibling X NA NA
11 C 1 X Parent NA NA NA NA
12 C 2 Child X NA NA NA NA
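The swap works because the three replacements run one after another, with "Temp" as a placeholder so that the new "Parent" values are not turned straight back into "Child". As a rough illustration, the same three steps can be written in base R with gsub(); swap_parent_child is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: swap "Parent" and "Child" inside each string,
# using "Temp" as a placeholder so the two replacements do not collide
swap_parent_child <- function(x) {
  x <- gsub("Parent", "Temp", x, fixed = TRUE)
  x <- gsub("Child", "Parent", x, fixed = TRUE)
  gsub("Temp", "Child", x, fixed = TRUE)
}

swapped <- swap_parent_child(c("Parent", "Child-in-law", "GrandParent", "Sibling", NA))
swapped
```

Compound labels such as GrandParent and Child-in-law are handled too, since the replacement matches substrings, while Spouse, Sibling, X and NA pass through unchanged. This assumes no relationship label already contains the text "Temp".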