I tried to calculate the Lp norm of all pairs of values in one column. The answer is not right and I don't know why.
Here is my sample code.
a <- c(23,41,32,58,26,77,45,67,23,78,22,9,20)
lp_norm = function(x, y, p){
return(sum((abs(x-y))^p)^(1/p))
}
i = 1
while (i <= 13) {
for(j in i:12){
lp1 <- lp_norm(a[i],a[j+1],p=1)
}
i=i+1
print(lp1)
}
I also have a dataframe with 10 columns that needs the same calculation. How can I apply this to all columns?
CodePudding user response:
Here is one way to calculate this for different combinations of columns in a dataframe.
library(tidyverse)
lp_norm <- function(data, x, y, p){
data |>
select(v1:= !!sym(x), v2:= !!sym(y))|>
summarise(lp_norm = sum((abs(v1-v2))^p)^(1/p)) |>
pull(lp_norm)
}
calc_lp_norm <- function(data, vars, p){
combn(vars, 2) |>
t() |>
`colnames<-`(c("var1", "var2")) |>
as_tibble() |>
mutate(lp_norm = map2_dbl(var1, var2, ~lp_norm(x = .x, y = .y, data = data, p = p)))
}
#few columns
calc_lp_norm(mtcars, c("mpg", "cyl", "hp", "wt"), p = 1)
#> # A tibble: 6 x 3
#> var1 var2 lp_norm
#> <chr> <chr> <dbl>
#> 1 mpg cyl 445.
#> 2 mpg hp 4051.
#> 3 mpg wt 540.
#> 4 cyl hp 4496
#> 5 cyl wt 95.0
#> 6 hp wt 4591.
#all columns
calc_lp_norm(mtcars, colnames(mtcars), p = 1)
#> # A tibble: 55 x 3
#> var1 var2 lp_norm
#> <chr> <chr> <dbl>
#> 1 mpg cyl 445.
#> 2 mpg disp 6740.
#> 3 mpg hp 4051.
#> 4 mpg drat 528.
#> 5 mpg wt 540.
#> 6 mpg qsec 136.
#> 7 mpg vs 629.
#> 8 mpg am 630.
#> 9 mpg gear 525.
#> 10 mpg carb 553.
#> # ... with 45 more rows
CodePudding user response:
We could use combn (which returns only the pairwise combinations) in base R. Loop over the columns of the data.frame 'dat', take the pairwise combinations of the elements in each column (assuming all are unique; otherwise use combn(unique(u), 2)), and apply the lp_norm function to each pair
lapply(dat, \(u) combn(u, 2, FUN = \(x) lp_norm(x[1], x[2], p = 1)))
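As a minimal sketch of the combn approach, here it is run on the vector a from the question. The two-column data frame dat is made up purely for illustration and stands in for the real data.

```r
# lp_norm as defined in the question
lp_norm <- function(x, y, p) {
  sum(abs(x - y)^p)^(1 / p)
}

a <- c(23, 41, 32, 58, 26, 77, 45, 67, 23, 78, 22, 9, 20)

# One distance per unordered pair of elements: choose(13, 2) = 78 values
pair_dist <- combn(a, 2, FUN = \(x) lp_norm(x[1], x[2], p = 1))
length(pair_dist)
head(pair_dist, 3)

# Hypothetical data frame standing in for the real 'dat'
dat <- data.frame(a = a, b = rev(a))

# Same calculation for every column; the result is a named list of vectors
res <- lapply(dat, \(u) combn(u, 2, FUN = \(x) lp_norm(x[1], x[2], p = 1)))
str(res)
```

Note that the loop in the question overwrites lp1 on every pass of the inner for loop, so only the last pair for each i is printed; collecting all pairs in one call avoids that.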
Or, if we need the output as a matrix (which also includes the mirrored and self pairs, i.e. 1 vs 2, 2 vs 1 and 1 vs 1)
lapply(dat, \(u) outer(u, u, FUN = Vectorize(\(x, y) lp_norm(x, y, p = 1))))
But, as this is a distance function, using outer will calculate every distance twice (1 vs 2 and 2 vs 1) and will also calculate the zero distance between each element and itself
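One way to avoid the duplicated work might be base R's dist, which computes each pair only once and stores the lower triangle. This is a swapped-in alternative, not part of the lp_norm approach above. For single numbers the Lp distance reduces to abs(x - y) for any p, so the sketch below simply passes p through. It again uses the vector from the question and a made-up dat.

```r
a <- c(23, 41, 32, 58, 26, 77, 45, 67, 23, 78, 22, 9, 20)

# Minkowski distance with exponent p is the Lp distance;
# each unordered pair is computed once
d <- dist(a, method = "minkowski", p = 1)
length(d)               # choose(13, 2) = 78 distances

# Expand to a full symmetric matrix only when it is actually needed
m <- as.matrix(d)
m[1:3, 1:3]

# Hypothetical data frame standing in for the real 'dat'
dat <- data.frame(a = a, b = rev(a))
res <- lapply(dat, \(u) dist(u, method = "minkowski", p = 1))
```

The dist object keeps only the lower triangle, so it stays compact for long columns; as.matrix gives the same layout that outer would return.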