I am working with imaging data in a format similar to this:
   name  side  modality
   <chr> <chr> <chr>
 1 alex  right xray
 2 alex  left  xray
 3 brad  right xray
 4 brad  left  xray
 5 alex  right ct
 6 alex  left  ct
 7 brad  right ct
 8 alex  right mri
 9 brad  right mri
10 brad  left  mri
Given that each person is supposed to have left and right images for every modality, the data show that Alex is missing a left MRI, Brad is missing a left CT, and Charlie (who doesn't appear in the data at all) is missing every image. I am trying to create a summary table that shows which elements are 'present' or 'absent', given a list of names (in which Charlie is included). It would look something like this:
  name    left_xray right_xray left_ct right_ct left_mri right_mri n_absent
  <chr>   <chr>     <chr>      <chr>   <chr>    <chr>    <chr>        <dbl>
1 alex    present   present    present present  absent   present          1
2 brad    present   present    absent  present  present  present          1
3 charlie absent    absent     absent  absent   absent   absent           6
I have used various dplyr verbs to get a list of patients with missing data for each modality, but I'm not really sure where to start with creating a summary table.
Dummy data:
data <- tibble(name = c('alex', 'alex', 'brad', 'brad', 'alex', 'alex', 'brad', 'alex', 'brad', 'brad'),
               side = c('right', 'left', 'right', 'left', 'right', 'left', 'right', 'right', 'right', 'left'),
               modality = c('xray', 'xray', 'xray', 'xray', 'ct', 'ct', 'ct', 'mri', 'mri', 'mri'))
names <- tibble(name = c('alex', 'brad', 'charlie'))
Thank you!
CodePudding user response:
Code
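library(dplyr)
library(tidyr)
# Build every expected name/modality/side combination
expand_grid(
  name = c('alex', 'brad', 'charlie'),
  modality = c("xray", "ct", "mri"),
  side = c("right", 'left')
) %>%
  # Rows that exist in the data are flagged "present"; the rest stay NA
  left_join(
    data %>%
      mutate(aux = "present"),
    by = c("name", "modality", "side")
  ) %>%
  mutate(aux = replace_na(aux, "absent")) %>%
  unite(modality_side, side, modality) %>%
  pivot_wider(names_from = modality_side, values_from = aux) %>%
  # Count the "absent" cells in each row
  rowwise() %>%
  mutate(n_absent = sum(c_across(-name) == "absent"))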
Output
# A tibble: 3 x 8
# Rowwise:
name right_xray left_xray right_ct left_ct right_mri left_mri n_absent
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 alex present present present present present absent 1
2 brad present present present absent present present 1
3 charlie absent absent absent absent absent absent 6
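The grid above hard-codes the names and modalities. As a minimal sketch of the same idea, the grid could instead be taken from the question's `names` and `data` objects, with the absences counted by `rowSums()` rather than `rowwise()`. The object names `grid` and `summary_tbl` and the column name `status` are illustrative only, and this assumes every side and modality of interest appears at least once in `data`:

```r
library(dplyr)
library(tidyr)

# Dummy data from the question
data <- tibble(
  name     = c("alex", "alex", "brad", "brad", "alex", "alex", "brad", "alex", "brad", "brad"),
  side     = c("right", "left", "right", "left", "right", "left", "right", "right", "right", "left"),
  modality = c("xray", "xray", "xray", "xray", "ct", "ct", "ct", "mri", "mri", "mri")
)
names <- tibble(name = c("alex", "brad", "charlie"))

# Every name/modality/side combination that is expected to exist
grid <- expand_grid(
  name     = names$name,
  modality = unique(data$modality),
  side     = unique(data$side)
)

summary_tbl <- grid %>%
  # Combinations found in the data get "present"; the rest stay NA
  left_join(mutate(data, status = "present"), by = c("name", "modality", "side")) %>%
  mutate(status = replace_na(status, "absent")) %>%
  unite(side_modality, side, modality) %>%
  pivot_wider(names_from = side_modality, values_from = status) %>%
  # Count "absent" cells per row without switching to rowwise()
  mutate(n_absent = rowSums(across(-name) == "absent"))

summary_tbl
```

Because the result is never made rowwise, there is no grouping to undo afterwards; `n_absent` comes back as a double rather than an integer.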
CodePudding user response:
You can first concatenate the side and modality columns, then use complete to generate every combination of that new column and the names. Then transform this "long" format into a "wide" format and count the number of absences.
Update
I've added full_join(names, by = "name") to my solution to accommodate the OP's updated request; it brings in anyone listed in names who never appears in data (here, Charlie).
library(tidyverse)
data %>%
  mutate(tmp = paste0(side, "_", modality),
         tmp2 = 1,
         .keep = "unused") %>%
  complete(name, tmp) %>%
  pivot_wider(names_from = tmp, values_from = tmp2) %>%
  full_join(names, by = "name") %>%
  mutate(across(-name, ~ifelse(is.na(.x), "absent", "present"))) %>%
  rowwise() %>%
  mutate(n_absent = sum(c_across(-name) == "absent")) %>%
  ungroup()
# A tibble: 3 × 8
name left_ct left_mri left_xray right_ct right_…¹ right…² n_abs…³
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 alex present absent present present present present 1
2 brad absent present present present present present 1
3 charlie absent absent absent absent absent absent 6
# … with abbreviated variable names ¹right_mri, ²right_xray,
# ³n_absent
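To see what `complete()` contributes in this pipeline, here is a small sketch on a made-up three-row tibble (`toy` and `completed` are hypothetical names, not part of the question's data). Combinations that never occur are added as new rows with `NA` in the remaining columns, and those `NA`s are what later get labelled "absent":

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input: brad has no "left_ct" row
toy <- tibble(
  name = c("alex", "alex", "brad"),
  tmp  = c("left_ct", "right_ct", "right_ct"),
  tmp2 = 1
)

# Expand to all name x tmp combinations seen in the data
completed <- complete(toy, name, tmp)

# The only row that had to be filled in is brad / left_ct, with tmp2 = NA
filter(completed, is.na(tmp2))
```

Note that `complete()` only expands over values that occur in the data, so a name that never appears at all still has to be brought in separately, for example with a join.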
CodePudding user response:
An approach using full_join of the rows that are present with all possible combinations of name, side and modality.
library(dplyr)
library(tidyr)
full_join(
  # Rows that actually exist in the data
  df %>% mutate(grp = 1),
  # All possible name/side/modality combinations, named like df's columns
  setNames(
    crossing(unique(unlist(c(df$name, Names))), unique(df$side), unique(df$modality)),
    colnames(df)
  ) %>% mutate(grp = 2),
  by = c("name", "side", "modality")
) %>%
  select(name:grp.x) %>%
  # grp.x is NA for combinations that were not in df
  mutate(grp.x = if_else(is.na(grp.x), "absent", "present")) %>%
  pivot_wider(names_from = c("side", "modality"), values_from = grp.x) %>%
  rowwise() %>%
  mutate(n_absent = sum(across(contains("_"), ~ .x == "absent"))) %>%
  ungroup()
Result
# A tibble: 3 × 8
  name    right_xray left_xray right_ct left_ct right_mri left_mri n_absent
  <chr>   <chr>      <chr>     <chr>    <chr>   <chr>     <chr>       <int>
1 alex    present    present   present  present present   absent          1
2 brad    present    present   present  absent  present   present         1
3 charlie absent     absent    absent   absent  absent    absent          6
Data
df <- structure(list(name = c("alex", "alex", "brad", "brad", "alex",
"alex", "brad", "alex", "brad", "brad"), side = c("right", "left",
"right", "left", "right", "left", "right", "right", "right",
"left"), modality = c("xray", "xray", "xray", "xray", "ct", "ct",
"ct", "mri", "mri", "mri")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
Names <- structure(list(name = c("alex", "brad", "charlie")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L))