Home > other >  collapse a dataframe in R that contains both numeric and character variables
collapse a dataframe in R that contains both numeric and character variables

Time:03-03

I have the following data.frame:

data <- data.frame("ag" = rep(LETTERS[1:4],6), 
                   "date" = c(sapply(1:3, function(x) rep(x, 8))), 
                   "num_var1"= 1:24, 
                   "num_var2"= 24:1, 
                   "alpha_var1" = LETTERS[1:24], 
                   "alpha_var2" = LETTERS[25:2] )

and I would like to summarize (mean) its rows by ag and date using dplyr. The issue is that some rows include characters: in this case, I would like to get the first entry by group (the example dataset is already sorted).

Since my dataset has several entries, I would like the code to be able to recognize whether a variable is numeric (including integers) or a character. However, the best solution that I have so far is the following one:

data %>%
 dplyr::group_by(ag, date) %>%
 summarise(across(everything(), mean))

which creates NAs for non-numeric variables. Do you have a better solution?

CodePudding user response:

Is this what you are looking for?

library(dplyr)

data %>%
  dplyr::group_by(ag, date) %>%
  summarise(across(everything(), ~
                     if(is.numeric(.x)) mean(.x) else first(.x)))

#> `summarise()` has grouped output by 'ag'. You can override using the `.groups` argument.
#> # A tibble: 12 x 6
#> # Groups:   ag [4]
#>    ag     date num_var1 num_var2 alpha_var1 alpha_var2
#>    <chr> <int>    <dbl>    <dbl> <chr>      <chr>     
#>  1 A         1        3       22 A          Y         
#>  2 A         2       11       14 I          Q         
#>  3 A         3       19        6 Q          I         
#>  4 B         1        4       21 B          X         
#>  5 B         2       12       13 J          P         
#>  6 B         3       20        5 R          H         
#>  7 C         1        5       20 C          W         
#>  8 C         2       13       12 K          O         
#>  9 C         3       21        4 S          G         
#> 10 D         1        6       19 D          V         
#> 11 D         2       14       11 L          N         
#> 12 D         3       22        3 T          F

Created on 2022-03-03 by the reprex package (v2.0.1)

CodePudding user response:

Another possible solution:

library(tidyverse)

data %>%
  group_by(ag, date) %>%
  summarise(across(where(is.numeric), mean), 
    across(where(is.character), first), .groups = "drop")

#> # A tibble: 12 × 6
#>    ag     date num_var1 num_var2 alpha_var1 alpha_var2
#>    <chr> <int>    <dbl>    <dbl> <chr>      <chr>     
#>  1 A         1        3       22 A          Y         
#>  2 A         2       11       14 I          Q         
#>  3 A         3       19        6 Q          I         
#>  4 B         1        4       21 B          X         
#>  5 B         2       12       13 J          P         
#>  6 B         3       20        5 R          H         
#>  7 C         1        5       20 C          W         
#>  8 C         2       13       12 K          O         
#>  9 C         3       21        4 S          G         
#> 10 D         1        6       19 D          V         
#> 11 D         2       14       11 L          N         
#> 12 D         3       22        3 T          F
  • Related