Compare the distance of same individual between groups [R]

Time:10-01

I would like to know the difference in position of the same individual between different groups. In other words, I want the distance between the position of "B150" (df$ID) in group "1" and the position of "B150" in group "2" (df$date), repeated for every individual ("B145", "B140", ...). The distance between different individuals (e.g. "B150" and "B145") does not interest me.

Here is a sample of the dataset:

df <- structure(list(ID = c("B150", "B145", "B140", "B136", "B150", 
"B145", "B140", "B136"), Ellipsoid_height_m = c(155.5, 155.5, 
155.4, 155.3, 155.5, 155.5, 155.4, 155.3), X_Lambert_72_m = c(232762.455, 
232763.271, 232764.444, 232765.093, 232764.955, 232765.771, 232766.944, 
232767.593), Y_Lambert_72_m = c(125994.937, 125996.489, 125997.991, 
125998.854, 125994.937, 125996.489, 125997.991, 125998.854), 
    Z_DNG_plus130cm = c(111.102, 111.102, 111.002, 110.902, 111.102, 
    111.102, 111.002, 110.902), Z_DNG = c(109.802, 109.802, 109.702, 
    109.602, 109.802, 109.802, 109.702, 109.602), Validite_Z = c("Non", 
    "Non", "Non", "Non", "Non", "Non", "Non", "Non"), Type = c("Pittag", 
    "Pittag", "Pittag", "Pittag", "Pittag", "Pittag", "Pittag", 
    "Pittag"), date = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L), .Label = c("1", "2"), class = "factor")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -8L))

Answer:

Here is a way.
To keep things simple, keep only the variables relevant to the distance, reshape to wide format so that both positions of each individual are in the same row, and compute the distances.

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Euclidean distance between points (x1, y1) and (x2, y2)
euclid <- function(x1, y1, x2, y2) {
  sqrt((x1 - x2)^2 + (y1 - y2)^2)
}
df %>%
  select(ID, contains("Lambert"), date) %>%
  pivot_wider(
    id_cols = ID,
    names_from = date,
    values_from = contains("Lambert")
  ) %>%
  mutate(Dist = euclid(X_Lambert_72_m_1, Y_Lambert_72_m_1, 
                       X_Lambert_72_m_2, Y_Lambert_72_m_2)) %>%
  select(ID, Dist)
#> # A tibble: 4 × 2
#>   ID     Dist
#>   <chr> <dbl>
#> 1 B150    2.5
#> 2 B145    2.5
#> 3 B140    2.5
#> 4 B136    2.5

Created on 2022-10-01 with reprex v2.0.2
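If you would rather avoid the tidyr dependency, the same wide reshape can be sketched in base R with reshape(direction = "wide"). This is a substitute technique, not part of the answer above: reshape() suffixes each coordinate column with the date as ".1" and ".2" instead of "_1" and "_2". The data frame name `pos` is hypothetical; it simply copies the ID, X, Y and date columns from the question's sample.

```r
# Hypothetical `pos`: the ID, X/Y coordinates and date columns copied
# from the question's sample data.
pos <- data.frame(
  ID = rep(c("B150", "B145", "B140", "B136"), times = 2),
  X_Lambert_72_m = c(232762.455, 232763.271, 232764.444, 232765.093,
                     232764.955, 232765.771, 232766.944, 232767.593),
  Y_Lambert_72_m = rep(c(125994.937, 125996.489, 125997.991, 125998.854),
                       times = 2),
  date = factor(rep(c("1", "2"), each = 4))
)

# Base-R counterpart of pivot_wider(): one row per ID, with each coordinate
# column repeated once per date and suffixed ".1" / ".2".
wide <- reshape(pos,
                idvar = "ID", timevar = "date",
                v.names = c("X_Lambert_72_m", "Y_Lambert_72_m"),
                direction = "wide")

# Euclidean distance between each individual's date-1 and date-2 positions.
wide$Dist <- sqrt((wide$X_Lambert_72_m.2 - wide$X_Lambert_72_m.1)^2 +
                  (wide$Y_Lambert_72_m.2 - wide$Y_Lambert_72_m.1)^2)

print(wide[, c("ID", "Dist")])
```

Rows keep the order in which the IDs first appear, so the result lines up with the tibble printed above.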

Answer:

A combination of pivot_longer() and pivot_wider() from the tidyr package puts the two dates on the same row for each ID and coordinate. summarise() from dplyr then subtracts the date-1 coordinate from the date-2 coordinate, squares the differences, and takes the square root of their sum for each ID. Because summarise() already returns one row per ID, no further reshaping is needed.

I'm assuming that X_Lambert_72_m, Y_Lambert_72_m and Z_DNG are your x, y and z coordinates from which to calculate the distance. If not, change the variables in the select() line.

library(tidyr)
library(dplyr)

want <- df %>%
  select(ID, date, X_Lambert_72_m, Y_Lambert_72_m, Z_DNG) %>% # keep ID, date and coordinates
  pivot_longer(cols = -c(ID, date), names_to = "measure") %>% # one row per ID, date and coordinate
  pivot_wider(names_from = date) %>%                           # one column per date: `1` and `2`
  group_by(ID) %>%                                             # one result per ID
  summarise(distance = sqrt(sum((`2` - `1`)^2)))               # Euclidean distance between the dates
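If the set of coordinates changes (for example 2-D instead of 3-D), a small base-R helper can take the column names as an argument instead of editing the pipeline. This is a sketch under stated assumptions, not part of the answer above: `dist_between()`, its arguments and `pos_xyz` are hypothetical names, it assumes exactly one row per ID and date, and it pairs the rows with merge() rather than pivoting.

```r
# Hypothetical helper: distance between each individual's positions at two
# dates, for any set of coordinate columns. Assumes one row per ID and date.
dist_between <- function(data, coords, from = "1", to = "2") {
  a <- data[data$date == from, c("ID", coords)]
  b <- data[data$date == to,   c("ID", coords)]
  # Pair the rows of the same individual; shared coordinate columns get
  # ".from" / ".to" suffixes. merge() sorts the result by ID.
  m <- merge(a, b, by = "ID", suffixes = c(".from", ".to"))
  delta <- as.matrix(m[paste0(coords, ".to")]) -
           as.matrix(m[paste0(coords, ".from")])
  data.frame(ID = m$ID, distance = sqrt(rowSums(delta^2)))
}

# Hypothetical `pos_xyz`: ID, x/y/z coordinates and date from the question.
pos_xyz <- data.frame(
  ID = rep(c("B150", "B145", "B140", "B136"), times = 2),
  X_Lambert_72_m = c(232762.455, 232763.271, 232764.444, 232765.093,
                     232764.955, 232765.771, 232766.944, 232767.593),
  Y_Lambert_72_m = rep(c(125994.937, 125996.489, 125997.991, 125998.854),
                       times = 2),
  Z_DNG = rep(c(109.802, 109.802, 109.702, 109.602), times = 2),
  date = factor(rep(c("1", "2"), each = 4))
)

res <- dist_between(pos_xyz, c("X_Lambert_72_m", "Y_Lambert_72_m", "Z_DNG"))
print(res)
```

Passing only c("X_Lambert_72_m", "Y_Lambert_72_m") as `coords` gives the 2-D distance used in the first answer.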
