Apply a function to a set of columns in a dataset

Time:12-06

Using these functions, I calculate the variance of a set of 3D points.

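    # Sum of squared Euclidean distances from each point to the centroid.
    centroid_3d_sq_dist <- function(
      point_matrix
    ) {
      # A single point coincides with its own centroid.
      if (nrow(point_matrix) == 1) {
        return(0)
      }
      # Centroid: column-wise mean of x, y and z.
      mean_point <- apply(point_matrix, 2, mean)

      # Squared distance of every row (point) to the centroid.
      point_sq_distances <- apply(
        point_matrix,
        1,
        function(row_point) {
          sum((row_point - mean_point)^2)
        }
      )
      sum_sq_distances <- sum(point_sq_distances)
      return(sum_sq_distances)
    }

    # Sample variance of the points: summed squared distance / (n - 1).
    point_3d_variance <- function(
      point_matrix
    ) {
      # Guard against dividing by zero when there is only one point.
      if (nrow(point_matrix) == 1) {
        return(0)
      }
      dist_var <- centroid_3d_sq_dist(point_matrix) /
        (nrow(point_matrix) - 1)
      return(dist_var)
    }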

The argument of each function is a matrix with one row per point and three columns (x, y, z).
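To make the expected input concrete, here is a minimal sketch that calls the function on a small matrix. The three points are hypothetical values chosen for illustration, and `point_3d_variance()` is restated in a condensed form (using `colMeans()` and `sweep()`) only so that the block runs on its own. The result should match the sum of the per-axis sample variances.

```r
# Condensed restatement of point_3d_variance() so this block is self-contained.
point_3d_variance <- function(point_matrix) {
  if (nrow(point_matrix) == 1) {
    return(0)
  }
  # Subtract the centroid from every row, then sum all squared deviations.
  centered <- sweep(point_matrix, 2, colMeans(point_matrix))
  sum(centered^2) / (nrow(point_matrix) - 1)
}

# Three hypothetical 3D points, one per row (columns: x, y, z).
pts <- matrix(
  c(1, 2, 3,
    3, 2, 1,
    2, 5, 2),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("x", "y", "z"))
)

v_points <- point_3d_variance(pts)        # variance around the centroid
v_axes   <- sum(apply(pts, 2, var))       # sum of per-axis sample variances
v_single <- point_3d_variance(pts[1, , drop = FALSE])  # one point only

print(c(v_points = v_points, v_axes = v_axes, v_single = v_single))
```

Note the `drop = FALSE` when selecting a single row: without it R returns a plain vector, and `nrow()` would be `NULL`.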

Now I have a dataset in which every row contains two 3D points.

   ID   Trial Size   PP    PA  FkA ciccioX ciccioY ciccioZ pinoX pinoY pinoZ
 1 Gigi     1   40 39.6 1050. 31.5    521.    293.    10.6  516.  323.  6.41
 2 Gigi     2 20.0 30.7  944. 9.35    525.    300.    12.6  520.  305.  7.09
 3 Gigi     3   30 29.5 1056. 24.1    521.    298.    12.3  519.  321.  5.89
 4 Gigi     5   60 53.0 1190. 53.0    680.    287.    64.4  699.  336.  68.6
 5 Bibi     1   40 38.3 1038. 31.4    524.    289.    10.9  519.  319.  6.17
 6 Bibi     2   60 64.7 1293. 47.8    516.    282.    10.4  519.  330.  6.32
 7 Bibi     3 20.0 33.8 1092. 17.5    523.    300.    12.8  518.  315.  6.22
 8 Bibi     4   30 35.0 1108. 26.4    525.    295.    11.7  517.  320.  5.78
 9 Bibi     5   50 46.5 1199. 34.2    515.    289.    11.2  517.  323.  6.27
10 Bibi     6   30 28.7 1016. 17.1    528.    298.    12.7  524.  314.  6.36

The 3D points are ciccio (ciccioX, ciccioY, ciccioZ) and pino (pinoX, pinoY, pinoZ).

I want to calculate the variance of ciccio and the variance of pino, grouped by ID and Size.

I tried to do:

data %>%
  group_by(SubjectID, Size) %>%
  summarize(as.data.frame(matrix(f4(dd[7:9],dd[10:12]), nr = 1)))

But it doesn't work.

Do you have any advice?

CodePudding user response:

The dataset you show is too small to calculate meaningful variances: most ID/Size groups contain a single row, for which your function returns 0. But you could use:

library(dplyr)

df %>% 
  group_by(ID, Size) %>% 
  summarise(var_ciccio = point_3d_variance(as.matrix(across(ciccioX:ciccioZ))),
            var_pino   = point_3d_variance(as.matrix(across(pinoX:pinoZ))),
            .groups    = "drop")

This returns

# A tibble: 9 x 4
  ID     Size var_ciccio  var_pino
  <chr> <dbl>      <dbl>     <dbl>
1 Bibi     20        0         0  
2 Bibi     30        9.5      42.7
3 Bibi     40        0         0  
4 Bibi     50        0         0  
5 Bibi     60        0         0  
6 Gigi     20        0         0  
7 Gigi     30        0         0  
8 Gigi     40        0         0  
9 Gigi     60        0         0 
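
As a cross-check of the one non-zero row, here is a hedged sketch that rebuilds only the Bibi rows with Size 20 and 30 from the question's table and runs the same grouping. It assumes dplyr 1.1.0 or later, where `pick()` is available as the replacement for calling `across()` without a function. The `point_3d_variance()` definition is a condensed restatement so that the block runs on its own.

```r
library(dplyr)

# Condensed restatement of point_3d_variance() so this block is self-contained.
point_3d_variance <- function(point_matrix) {
  if (nrow(point_matrix) == 1) {
    return(0)
  }
  centered <- sweep(point_matrix, 2, colMeans(point_matrix))
  sum(centered^2) / (nrow(point_matrix) - 1)
}

# Rows 7, 8 and 10 of the question's dataset (ID "Bibi", Size 20 and 30).
df <- data.frame(
  ID      = c("Bibi", "Bibi", "Bibi"),
  Size    = c(20, 30, 30),
  ciccioX = c(523, 525, 528),
  ciccioY = c(300, 295, 298),
  ciccioZ = c(12.8, 11.7, 12.7),
  pinoX   = c(518, 517, 524),
  pinoY   = c(315, 320, 314),
  pinoZ   = c(6.22, 5.78, 6.36)
)

res <- df %>%
  group_by(ID, Size) %>%
  summarise(
    # pick() returns the selected columns for the current group only.
    var_ciccio = point_3d_variance(as.matrix(pick(ciccioX:ciccioZ))),
    var_pino   = point_3d_variance(as.matrix(pick(pinoX:pinoZ))),
    .groups    = "drop"
  )

print(res)
```

The Size 30 group has two rows and gives 9.5 for ciccio and about 42.7 for pino, matching the output above. The Size 20 group has a single row and gives 0 for both.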