dataframe - how to perform actions on grouped dataframe

Time:02-19

I have the following dataframe morphology:

  month site  depth num.core num.plant num.leaf
  <chr> <chr> <dbl>    <dbl>     <dbl>    <dbl>
1 Oct   SB       12        1         1        5
2 Oct   SB       12        1         2       29
3 Oct   SB       12        1         3        7
4 Oct   SB       12        2         1        9
5 Oct   SB       12        2         2        4
6 Oct   SB       12        2         3       13

My aim is to count the number of plants (num.plant) per core (num.core) at a given date (month) and depth.

I have grouped the dataframe and counted the number of plants per core as I need:

morpho.group <- morphology %>%
  group_by(month, site, num.core, depth) %>%
  count(month, site, num.core, depth, name = "plant.count.Xcore")

  month site   num.core depth plant.count.Xcore
  <chr> <chr>     <dbl> <dbl>             <int>
1 Dec   D           1     3                 4
2 Dec   D           2     3                 2
3 Dec   D           3     3                 3
4 Dec   D           4     3                 3
5 Dec   N           1    12                 1
6 Dec   N           2    12                 5

My issue is that I need to perform more actions on the morphology dataframe, for example summing the number of leaves per core:

count.morpho <- morphology %>%
  group_by(month, site, num.core, depth) %>%
  summarise_at(vars("num.leaf", "num.roots"), sum)

  month site   num.core depth num.leaf num.roots
  <chr> <chr>     <dbl> <dbl>    <dbl>     <dbl>
1 Dec   D           1     3       11        13
2 Dec   D           2     3       17         8
3 Dec   D           3     3       14         4
4 Dec   D           4     3       40        10
5 Dec   N           1    12        3         2
6 Dec   N           2    12       40        10

I need to perform these actions in one continuous chain so that the results accumulate in a single dataframe, instead of pulling each calculated column into a new dataframe.

Any help is much appreciated :)

CodePudding user response:

count is really just a convenience function that reports n() for each group; you can call n() explicitly inside summarize and add other metrics alongside it.

(FYI, your data doesn't include num.roots, so I replaced it with num.plant here just for demonstration.)

morphology %>%
  group_by(month, site, num.core, depth) %>%
  summarize(
    plant.count.Xcore = n(), 
    across(c(num.leaf, num.plant), sum)
  ) %>%
  ungroup()
# # A tibble: 2 x 7
#   month site  num.core depth plant.count.Xcore num.leaf num.plant
#   <chr> <chr>    <int> <int>             <int>    <int>     <int>
# 1 Oct   SB           1    12                 3       41         6
# 2 Oct   SB           2    12                 3       26         6
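
To see why n() can stand in for count, here is a minimal sketch that rebuilds the six sample rows from the question and compares the two approaches. The object names via.count and via.n are made up for this illustration, and the data frame is typed in by hand, so treat it as a check under those assumptions rather than a run on the real data.

```r
library(dplyr)

# The six sample rows shown in the question (typed in by hand)
morphology <- data.frame(
  month     = "Oct",
  site      = "SB",
  depth     = 12,
  num.core  = c(1, 1, 1, 2, 2, 2),
  num.plant = c(1, 2, 3, 1, 2, 3),
  num.leaf  = c(5, 29, 7, 9, 4, 13)
)

# Approach 1: count() does the grouping and the tally in one call
via.count <- morphology %>%
  count(month, site, num.core, depth, name = "plant.count.Xcore")

# Approach 2: group explicitly and tally with n() inside summarise()
via.n <- morphology %>%
  group_by(month, site, num.core, depth) %>%
  summarise(plant.count.Xcore = n(), .groups = "drop")

# Both give one row per core with the same counts
print(via.count)
print(via.n)
identical(via.count$plant.count.Xcore, via.n$plant.count.Xcore)
```

Because the second form is an ordinary summarise call, any further per-group metric can be added as another argument in the same call.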

FYI, summarize_at is "superseded" by across. Notice how the change works: call summarize as usual and use across by itself, without assigning it to a name. The first argument to across is the set of columns to operate on, chosen with the same methods as select, including c(col1, col2), starts_with("num"), and negation of those options. The second argument is one or more functions, supplied in various forms much like summarize_at's function argument(s). See the colwise vignette for more details.
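
As a sketch of the two across arguments described above, the example below selects columns with starts_with("num") and passes a named list of functions. The result name summary.morpho, the labels total and mean, and the .names pattern are choices made for this illustration, not something from the question. Note that across does not select grouping variables, so num.core is left out even though it starts with "num".

```r
library(dplyr)

# The six sample rows shown in the question (typed in by hand)
morphology <- data.frame(
  month     = "Oct",
  site      = "SB",
  depth     = 12,
  num.core  = c(1, 1, 1, 2, 2, 2),
  num.plant = c(1, 2, 3, 1, 2, 3),
  num.leaf  = c(5, 29, 7, 9, 4, 13)
)

summary.morpho <- morphology %>%
  group_by(month, site, num.core, depth) %>%
  summarise(
    # rows per group, same as count()
    plant.count.Xcore = n(),
    # first argument: tidyselect column choice (grouping columns are skipped)
    # second argument: named list of functions, one output column per pair
    across(
      starts_with("num"),
      list(total = sum, mean = mean),
      .names = "{.col}.{.fn}"
    ),
    .groups = "drop"
  )

print(summary.morpho)
```

With a single function, as in the answer, the output columns keep their original names; the .names pattern only matters once several functions would otherwise collide.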
