I have the x, y coordinates of cells, grouped by patient ID, like this:
PatientID | cX | cY |
---|---|---|
1 | 5348 | 4902 |
1 | 6360 | 4887 |
1 | 5398 | 4874 |
2 | 5348 | 4902 |
2 | 6360 | 4887 |
2 | 5398 | 4874 |
Each row holds the x, y coordinates of an individual cell and its associated patient ID.
For each patient ID, I want to build a distance matrix between that patient's cells, compute the minimum, maximum, and mean distance from each cell to the other cells, and add these values as columns to the original data frame.
Answer:
First, I put your data into a reproducible data frame:
```r
patient_data <- data.frame(
  PatientId = c(1, 1, 1, 2, 2, 2),
  cX = c(5348, 6360, 5398, 5348, 6360, 5398),
  cY = c(4902, 4887, 4874, 4902, 4887, 4874)
)
```

```
  PatientId   cX   cY
1         1 5348 4902
2         1 6360 4887
3         1 5398 4874
4         2 5348 4902
5         2 6360 4887
6         2 5398 4874
```
Then you are looking for the `dplyr::group_by()` and `dplyr::group_modify()` functions. You can use `dplyr::group_map()` to inspect the intermediate output of each step, since it returns a plain list with one result per group.
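For example, one way to look at the per-patient distance matrices before summarising them is to run the same grouping through `group_map()`. This is a minimal sketch assuming the `patient_data` data frame above; the name `dist_matrices` is only illustrative.

```r
library(dplyr)

patient_data <- data.frame(
  PatientId = c(1, 1, 1, 2, 2, 2),
  cX = c(5348, 6360, 5398, 5348, 6360, 5398),
  cY = c(4902, 4887, 4874, 4902, 4887, 4874)
)

# group_map() drops the grouping column from .x by default,
# so dist() only sees the cX and cY coordinates
dist_matrices <- patient_data %>%
  group_by(PatientId) %>%
  group_map(~ as.matrix(dist(.x)))

# One full, symmetric 3 x 3 matrix per patient, with zeros on the diagonal
dist_matrices[[1]]
```

Because the result is an ordinary list, you can index into it (`dist_matrices[[2]]`, and so on) to check each patient's matrix before moving on to the summary step.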
```r
library(magrittr)

patient_data %>%
  dplyr::group_by(PatientId) %>%
  dplyr::group_modify(~ {
    # get the full distance matrix for this patient's cells
    distance_matrix <- .x %>% dist(diag = FALSE, upper = TRUE) %>% as.matrix()
    # set diagonal values (each cell's distance to itself) to NA
    diag(distance_matrix) <- NA
    # get min/max/avg for each row of the distance matrix
    data.frame(
      cell_id = seq(nrow(distance_matrix)),
      min_dist = apply(distance_matrix, MARGIN = 1, FUN = min, na.rm = TRUE),
      max_dist = apply(distance_matrix, MARGIN = 1, FUN = max, na.rm = TRUE),
      avg_dist = apply(distance_matrix, MARGIN = 1, FUN = mean, na.rm = TRUE)
    )
  })
```

```
# A tibble: 6 × 5
# Groups:   PatientId [2]
  PatientId cell_id min_dist max_dist avg_dist
      <dbl>   <int>    <dbl>    <dbl>    <dbl>
1         1       1     57.3    1012.     535.
2         1       2    962.     1012.     987.
3         1       3     57.3     962.     510.
4         2       1     57.3    1012.     535.
5         2       2    962.     1012.     987.
6         2       3     57.3     962.     510.
```
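The result above is a separate summary table keyed by `cell_id`. If you want the three statistics as new columns on the original rows, as asked in the question, one option is to return `.x` with the columns appended from inside `group_modify()`. This is a sketch under the same assumptions as the answer; note that `group_modify()` returns rows ordered by `PatientId`, and a patient with only one cell would get `Inf`, `-Inf` and `NaN` (with warnings), since there are no other cells to compare against.

```r
library(dplyr)

patient_data <- data.frame(
  PatientId = c(1, 1, 1, 2, 2, 2),
  cX = c(5348, 6360, 5398, 5348, 6360, 5398),
  cY = c(4902, 4887, 4874, 4902, 4887, 4874)
)

result <- patient_data %>%
  group_by(PatientId) %>%
  group_modify(~ {
    # Full pairwise Euclidean distance matrix for this patient's cells
    d <- as.matrix(dist(.x[, c("cX", "cY")]))
    # Ignore each cell's zero distance to itself
    diag(d) <- NA
    # Keep the original columns and append the per-cell summaries
    mutate(
      .x,
      min_dist = apply(d, 1, min, na.rm = TRUE),
      max_dist = apply(d, 1, max, na.rm = TRUE),
      avg_dist = apply(d, 1, mean, na.rm = TRUE)
    )
  }) %>%
  ungroup()

result
```

Computing the matrix once per group and reusing it for all three statistics avoids calling `dist()` three times.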