How to calculate the distance of each pair in one column in R

Time:10-17

I tried to calculate the Lp norm of all pairs of values in one column. The answer is just not right and I don't know why.

Here is my sample code.

a <- c(23,41,32,58,26,77,45,67,23,78,22,9,20)
lp_norm = function(x, y, p){
  return(sum((abs(x-y))^p)^(1/p))
}
i = 1
while (i <= 13) {
  for(j in i:12){
    lp1 <- lp_norm(a[i], a[j+1], p=1)
  }
  i = i+1
  print(lp1)
}

I also have a dataframe with 10 columns that needs the same treatment. How can I apply this to all columns?

CodePudding user response:

Here is one way to calculate this for different combinations of columns in a dataframe.

library(tidyverse)    

lp_norm <- function(data, x, y, p){
  data |>
    select(v1:= !!sym(x), v2:= !!sym(y))|>
    summarise(lp_norm = sum((abs(v1-v2))^p)^(1/p)) |>
    pull(lp_norm)
}

calc_lp_norm <- function(data, vars, p){
  combn(vars, 2) |>
    t() |>
    `colnames<-`(c("var1", "var2")) |>
    as_tibble()  |>
    mutate(lp_norm = map2_dbl(var1, var2, ~lp_norm(x = .x, y = .y, data = data, p = p)))
}


#few columns
calc_lp_norm(mtcars, c("mpg", "cyl", "hp", "wt"), p = 1)
#> # A tibble: 6 x 3
#>   var1  var2  lp_norm
#>   <chr> <chr>   <dbl>
#> 1 mpg   cyl     445. 
#> 2 mpg   hp     4051. 
#> 3 mpg   wt      540. 
#> 4 cyl   hp     4496  
#> 5 cyl   wt       95.0
#> 6 hp    wt     4591.

#all columns
calc_lp_norm(mtcars, colnames(mtcars), p = 1)
#> # A tibble: 55 x 3
#>    var1  var2  lp_norm
#>    <chr> <chr>   <dbl>
#>  1 mpg   cyl      445.
#>  2 mpg   disp    6740.
#>  3 mpg   hp      4051.
#>  4 mpg   drat     528.
#>  5 mpg   wt       540.
#>  6 mpg   qsec     136.
#>  7 mpg   vs       629.
#>  8 mpg   am       630.
#>  9 mpg   gear     525.
#> 10 mpg   carb     553.
#> # ... with 45 more rows

CodePudding user response:

One option is combn in base R, which returns only the pairwise combinations. Loop over the columns of the data.frame 'dat' and, for each column, apply the lp_norm function from the question to every pair of its elements (this assumes all elements are unique; otherwise use combn(unique(u), 2)):

lapply(dat, \(u) combn(u, 2, FUN = \(x) lp_norm(x[1], x[2], p = 1)))
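To make the call above concrete, here is a minimal self-contained sketch. The data frame `dat`, its columns `a` and `b`, and the helper `pair_dist` are made up for illustration; `lp_norm` is the three-argument function from the question. The sketch runs combn over positions rather than values, so each distance can be labelled with the two elements it came from and duplicated values need no special handling.

```r
# lp_norm as defined in the question
lp_norm <- function(x, y, p) {
  sum(abs(x - y)^p)^(1 / p)
}

# Hypothetical data frame, for illustration only
dat <- data.frame(a = c(23, 41, 32, 58), b = c(1, 4, 9, 16))

# Hypothetical helper: every unordered pair (i < j) of one column and its distance
pair_dist <- function(u, p = 1) {
  idx <- combn(length(u), 2)  # 2 x choose(n, 2) matrix of index pairs
  data.frame(
    i = idx[1, ],
    j = idx[2, ],
    dist = apply(idx, 2, \(k) lp_norm(u[k[1]], u[k[2]], p = p))
  )
}

# Apply to every column; the result is a named list of data frames
res <- lapply(dat, pair_dist, p = 1)
res$a
```

Each element of `res` has one row per pair, so a column of length n yields choose(n, 2) rows.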

Or, if the output is needed as a matrix that also includes the mirrored and self pairs (i.e. 1 vs 2, 2 vs 1, and 1 vs 1), use outer:

lapply(dat, \(u) outer(u, u, FUN = Vectorize(\(x, y) lp_norm(x, y, p = 1))))

However, because this is a distance function, outer computes every distance twice (1 vs 2 and 2 vs 1 are equal) and also computes the zero distance between each element and itself.
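One possible way to avoid the duplicated work, offered as a sketch rather than as part of the answer above, is base R's dist() with method = "minkowski", which computes each unordered pair once and stores only the lower triangle. The vector `a` is taken from the question; the data frame `dat` and its column `b` are hypothetical. Note that for single numbers the Lp distance is |x - y| for every p, so the exponent only matters when each observation has several coordinates.

```r
a <- c(23, 41, 32, 58, 26, 77, 45, 67, 23, 78, 22, 9, 20)

# Each pair (i < j) is computed once; p is the Minkowski exponent
d <- dist(a, method = "minkowski", p = 1)

length(d)               # choose(13, 2) = 78 distances
as.matrix(d)[1:3, 1:3]  # expand to a full symmetric matrix only if needed

# Hypothetical data frame: one dist object per column
dat <- data.frame(a = a, b = rev(a))
dists <- lapply(dat, dist, method = "minkowski", p = 1)
```

The `dist` object can be passed directly to functions such as hclust(), or converted with as.matrix() when the outer-style layout is wanted.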
