Using dplyr to apply a function to each group of a dataset

Time:12-05

I have this dataframe.

Sub <- c(1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
trial <-c(1,1,1,1,2,2,2,2,2,2,1,1,1,1,2,2,2,2,2,2)
One <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
Two <- c(1,0,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,0,0,1)
Three <- c(2,0,0,1,3,0,0,0,0,1,7,8,0,0,0,1,1,1,1,0)
Four  <- c(3,4,5,4,3,4,5,6,7,8,6,5,4,5,6,7,6,5,6,5)
Five  <- c(3,4,5,4,6,7,5,4,3,2,3,4,5,4,3,5,7,4,3,5)
Six <- c(3,4,5,4,6,7,5,4,3,2,3,4,5,4,3,5,7,4,3,5)
Seven <- c(3,4,5,4,9,7,5,4,3,2,3,4,5,4,3,5,7,4,3,5)

dat <- data.frame(Sub, trial, One, Two, Three, Four, Five, Six, Seven)

I created this function to calculate the correlation among my variables.

fun <- function(a,b,c,d,e,f,g) {
  v = cor(a,b)
  v1 = cor(a,c)
  v2 = cor(a,d)
  v3 = cor(a,e)
  v4 = cor(a,f)
  v5 = cor(a,g)
  
  return(c(v,v1,v2,v3,v4,v5))
    
  
}

I need to apply this function to each group of my dataset (Sub,trial).

library(dplyr)

dat %>%
    group_by(Sub,trial) %>%
    summarize(as.data.frame(matrix(fun(One, Two, Three, Four, Five, Six, Seven), nrow = 1)))

However I got this result:

 Sub trial    V1    V2    V3    V4    V5    V6
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1    NA    NA    NA    NA    NA    NA
2     1     2    NA    NA    NA    NA    NA    NA
3     2     1    NA    NA    NA    NA    NA    NA
4     2     2    NA    NA    NA    NA    NA    NA

The Sub/trial grouping is correct, but every other variable comes back as NA.

Do you have any advice?

Thank you.

CodePudding user response:

The solution by user @user438383 is the correct one.

The reason you get NA has nothing to do with applying the function.

The warning tells you that the standard deviation is zero: One is constant (always 1), so its standard deviation is zero in every group and cor() returns NA. See also: R - Warning message: "In cor(...): the standard deviation is zero"
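A minimal base R sketch of the mechanism, using the values of One and Two from the first group. The variable names one, two, sd_one and r are only illustrative.

```r
# One holds the same value in every row, so it has no spread
one <- c(1, 1, 1, 1, 1)
two <- c(1, 0, 0, 0, 1)

sd_one <- sd(one)   # 0

# The correlation is undefined when a standard deviation is zero;
# cor() returns NA and warns. suppressWarnings() only keeps the demo quiet.
r <- suppressWarnings(cor(one, two))
is.na(r)            # TRUE
```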

Here is an example:

# generate a list of dataframes with your groups:

my_list <- dat %>% 
  group_by(Sub, trial) %>% 
  group_split()

[[1]]
# A tibble: 5 x 9
    Sub trial   One   Two Three  Four  Five   Six Seven
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     2     3     3     3     3
2     1     1     1     0     0     4     4     4     4
3     1     1     1     0     0     5     5     5     5
4     1     1     1     0     1     4     4     4     4
5     1     1     1     1     7     6     3     3     3

[[2]]
# A tibble: 6 x 9
    Sub trial   One   Two Three  Four  Five   Six Seven
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     1     1     3     3     6     6     9
2     1     2     1     0     0     4     7     7     7
3     1     2     1     0     0     5     5     5     5
4     1     2     1     0     0     6     4     4     4
5     1     2     1     0     0     7     3     3     3
6     1     2     1     0     1     8     2     2     2

[[3]]
# A tibble: 3 x 9
    Sub trial   One   Two Three  Four  Five   Six Seven
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     2     1     1     0     8     5     4     4     4
2     2     1     1     0     0     4     5     5     5
3     2     1     1     0     0     5     4     4     4

[[4]]
# A tibble: 6 x 9
    Sub trial   One   Two Three  Four  Five   Six Seven
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     2     2     1     1     0     6     3     3     3
2     2     2     1     1     1     7     5     5     5
3     2     2     1     1     1     6     7     7     7
4     2     2     1     0     1     5     4     4     4
5     2     2     1     0     1     6     3     3     3
6     2     2     1     1     0     5     5     5     5

Now apply cor to the first group:

my_list[[1]] %>%
  summarise(across(Two:Seven, ~cor(One, .)))

# gives:
# A tibble: 1 x 6
    Two Three  Four  Five   Six Seven
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    NA    NA    NA    NA    NA    NA
Warning messages:
1: In cor(One, Two) : the standard deviation is zero
2: In cor(One, Three) : the standard deviation is zero
3: In cor(One, Four) : the standard deviation is zero
4: In cor(One, Five) : the standard deviation is zero
5: In cor(One, Six) : the standard deviation is zero
6: In cor(One, Seven) : the standard deviation is zero

# or the correlation of just two columns, One and Two, in group one
cor(my_list[[1]]$One, my_list[[1]]$Two)

# gives:
[1] NA
Warning message:
In cor(my_list[[1]]$One, my_list[[1]]$Two) : the standard deviation is zero

An analogous example with the mtcars dataset:

mtcars %>% 
  relocate(cyl, vs, everything()) %>% 
  group_by(cyl, vs) %>% 
  summarise(across(hp:carb, ~cor(., mpg)))
  
    cyl    vs     hp    drat     wt    qsec      am    gear   carb
  <dbl> <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
1     4     0 NA     NA      NA     NA      NA      NA      NA    
2     4     1 -0.522  0.466  -0.721 -0.296   0.557   0.442  -0.189
3     6     0 -1      1      -0.101  0.931  NA      -1      -1    
4     6     1 -0.248 -0.249  -0.936 -0.0424 NA      -0.442  -0.442
5     8     0 -0.284  0.0479 -0.650 -0.104   0.0496  0.0496 -0.394
Warning messages:
1: In cor(am, mpg) : the standard deviation is zero
2: In cor(am, mpg) : the standard deviation is zero
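If the NA values are expected and only the warnings are a nuisance, one option is to check for constant input before calling cor(). The sketch below is an assumption about what might be wanted, not part of the accepted solution: safe_cor is a hypothetical helper, and the data frame rebuilds only four of the question's value columns.

```r
library(dplyr)

# Hypothetical helper: return NA quietly when either input is constant
# (or has fewer than two values) instead of letting cor() warn.
safe_cor <- function(x, y) {
  if (length(x) < 2 || sd(x) == 0 || sd(y) == 0) return(NA_real_)
  cor(x, y)
}

# The question's data, rebuilt compactly with a subset of the columns
dat <- data.frame(
  Sub   = rep(c(1, 2), times = c(11, 9)),
  trial = c(1,1,1,1,2,2,2,2,2,2,1,1,1,1,2,2,2,2,2,2),
  One   = rep(1, 20),
  Two   = c(1,0,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,0,0,1),
  Four  = c(3,4,5,4,3,4,5,6,7,8,6,5,4,5,6,7,6,5,6,5),
  Five  = c(3,4,5,4,6,7,5,4,3,2,3,4,5,4,3,5,7,4,3,5)
)

res <- dat %>%
  group_by(Sub, trial) %>%
  summarise(
    One_vs_Four  = safe_cor(One, Four),   # NA in every group: One is constant
    Four_vs_Five = safe_cor(Four, Five),  # both columns vary within each group
    .groups = "drop"
  )
```

Comparing sd() to exactly zero works here because the constant column holds identical values; for computed floating-point data a small tolerance may be safer.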