Home > Net >  Group rows based one column and sum up the rest of the columns
Group rows based one column and sum up the rest of the columns

Time:11-12

I try to group the rows of a data.frame by the ID column and sum up all the numerical columns and drop the string column. For that, I created a small data.frame df1:

   ID string 2018 2019 2020 2021
1: a1     x2    3    3    0    4
2: a2     g3    5    5    4    0
3: a2     n2   11    6   13    3
4: a1     m3    3   21    9    8
5: a2     2w    9    1   16    5
6: a1    ps2   22    4    7    4
7: a1    kg2    6    0    9    6

and I try to get the sum of the years like in df2:

   ID 2018 2019 2020 2021
1: a1   34   28   25   22
2: a2   25   12   33    8

I was trying it with the group_by and summarize function of dplyr, but I wasn't able to get what I want.

library(dplyr)
df1 %>%
  group_by(ID) %>%
  summarize(df$2018 = sum(2018))

Thanks a lot

CodePudding user response:

You could use

library(data.table)

df1[, lapply(.SD[, `2018`:`2021`], sum), by = ID]

This returns

   ID 2018 2019 2020 2021
1: a1   34   28   25   22
2: a2   25   12   33    8

CodePudding user response:

You can use the across function to help you to select the right columns

library(tidyverse)

library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose
df1 = data.table(
  ID = c("a1", "a2", "a2", "a1", "a2", "a1", "a1"),
  "string" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
  "2018" = c(3,5,11,3,9,22,6),
  "2019" = c(3,5,6,21,1,4,0),
  "2020" = c(0,4,13,9,16,7,9),
  "2021" = c(4,0,3,8,5,4,6))


df2 = data.table(
  ID = c("a1", "a2"),
  "2018" = c(34,25),
  "2019" = c(28,12),
  "2020" = c(25,33),
  "2021" = c(22,8))

df1 |> 
  group_by(ID) |> 
  summarise(across(where(is.numeric),sum))
#> # A tibble: 2 x 5
#>   ID    `2018` `2019` `2020` `2021`
#>   <chr>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 a1        34     28     25     22
#> 2 a2        25     12     33      8

Created on 2021-11-11 by the reprex package (v2.0.1)

  • Related