I try to group the rows of a data.frame by the ID column and sum up all the numerical columns and drop the string column. For that, I created a small data.frame df1:
ID string 2018 2019 2020 2021
1: a1 x2 3 3 0 4
2: a2 g3 5 5 4 0
3: a2 n2 11 6 13 3
4: a1 m3 3 21 9 8
5: a2 2w 9 1 16 5
6: a1 ps2 22 4 7 4
7: a1 kg2 6 0 9 6
and I try to get the sum of the years like in df2:
ID 2018 2019 2020 2021
1: a1 34 28 25 22
2: a2 25 12 33 8
I was trying it with the group_by and summarize function of dplyr, but I wasn't able to get what I want.
library(dplyr)
df1 %>%
group_by(ID) %>%
summarize(df$2018 = sum(2018))
Thanks a lot
CodePudding user response:
You could use
library(data.table)
df1[, lapply(.SD[, `2018`:`2021`], sum), by = ID]
This returns
ID 2018 2019 2020 2021
1: a1 34 28 25 22
2: a2 25 12 33 8
CodePudding user response:
You can use the across function to help you to select the right columns
library(tidyverse)
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#>
#> between, first, last
#> The following object is masked from 'package:purrr':
#>
#> transpose
df1 = data.table(
ID = c("a1", "a2", "a2", "a1", "a2", "a1", "a1"),
"string" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
"2018" = c(3,5,11,3,9,22,6),
"2019" = c(3,5,6,21,1,4,0),
"2020" = c(0,4,13,9,16,7,9),
"2021" = c(4,0,3,8,5,4,6))
df2 = data.table(
ID = c("a1", "a2"),
"2018" = c(34,25),
"2019" = c(28,12),
"2020" = c(25,33),
"2021" = c(22,8))
df1 |>
group_by(ID) |>
summarise(across(where(is.numeric),sum))
#> # A tibble: 2 x 5
#> ID `2018` `2019` `2020` `2021`
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a1 34 28 25 22
#> 2 a2 25 12 33 8
Created on 2021-11-11 by the reprex package (v2.0.1)