I have this df containing numeric values in each cell, and I need to create a new variable holding the mean of specific columns.
For each row, I need the mean across all columns from 2013 to 2019 (2013:2019).
SE_at `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01 NA 22 17 15.2 16.7 5.1 5.4 5.3 7.4 12.8 10.6 NA 6 7.4 7.5 8.4 8.8 6.1 8.3 17.4 38.7
2 02 NA 18.1 16 12.8 13.2 3.5 4.6 5.1 7.9 11.8 10.1 NA 6.6 6.8 7.1 8.1 8.3 5.1 8.1 15.1 36.3
3 03 NA 13.5 16.9 11.3 13.5 2.8 3.8 4.4 6.7 12.7 9.2 NA 5.1 6.2 6.9 8.8 7 5 7.9 15.5 37.3
4 04 NA 14.9 15 9.5 14.3 4.2 4.1 4.2 7.1 12.3 10.6 NA 6.7 7.8 6.9 7.9 7.9 4.3 7.7 14 34.9
5 05 NA 11.9 14.3 12.9 13.7 4.5 4.6 4.6 6.6 11.4 10.1 NA 7.3 14.6 6.4 7.6 8.4 4.2 8 13.2 36.2
6 06 NA 13.9 22 18.1 12.6 4.3 6.1 5.1 7.4 11.6 9.4 NA 6.5 14.5 6.4 8.5 9.2 4.7 8.3 14.7 26.2
I tried summarise_at and calculating it manually, but I haven't found a good solution.
Here's the dput:
structure(list(SE_at = c("01", "02", "03", "04", "05", "06"),
`2002` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), `2003` = c(22, 18.1, 13.5, 14.9, 11.9, 13.9),
`2004` = c(17, 16, 16.9, 15, 14.3, 22), `2005` = c(15.2,
12.8, 11.3, 9.5, 12.9, 18.1), `2006` = c(16.7, 13.2, 13.5,
14.3, 13.7, 12.6), `2007` = c(5.1, 3.5, 2.8, 4.2, 4.5, 4.3
), `2008` = c(5.4, 4.6, 3.8, 4.1, 4.6, 6.1), `2009` = c(5.3,
5.1, 4.4, 4.2, 4.6, 5.1), `2010` = c(7.4, 7.9, 6.7, 7.1,
6.6, 7.4), `2011` = c(12.8, 11.8, 12.7, 12.3, 11.4, 11.6),
`2012` = c(10.6, 10.1, 9.2, 10.6, 10.1, 9.4), `2013` = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `2014` = c(6,
6.6, 5.1, 6.7, 7.3, 6.5), `2015` = c(7.4, 6.8, 6.2, 7.8,
14.6, 14.5), `2016` = c(7.5, 7.1, 6.9, 6.9, 6.4, 6.4), `2017` = c(8.4,
8.1, 8.8, 7.9, 7.6, 8.5), `2018` = c(8.8, 8.3, 7, 7.9, 8.4,
9.2), `2019` = c(6.1, 5.1, 5, 4.3, 4.2, 4.7), `2020` = c(8.3,
8.1, 7.9, 7.7, 8, 8.3), `2021` = c(17.4, 15.1, 15.5, 14,
13.2, 14.7), `2022` = c(38.7, 36.3, 37.3, 34.9, 36.2, 26.2
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
CodePudding user response:
Select the columns within across() and get the rowMeans():
library(dplyr)
df1 <- df1 %>%
  mutate(Mean = rowMeans(across(`2013`:`2019`), na.rm = TRUE))
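In the posted data the `2013` column is entirely NA, so `na.rm = TRUE` matters here: each row mean is effectively taken over the six columns 2014 to 2019. A minimal base R sketch of the same idea, assuming a cut-down stand-in for the data (only the first two rows and the 2013:2019 columns; the `cols` and `Mean_strict` names are just for illustration):

```r
# Stand-in for the question's data: first two rows, columns 2013-2019 only.
# check.names = FALSE keeps the numeric column names as they are.
df1 <- data.frame(
  SE_at  = c("01", "02"),
  `2013` = c(NA_real_, NA_real_),
  `2014` = c(6, 6.6),
  `2015` = c(7.4, 6.8),
  `2016` = c(7.5, 7.1),
  `2017` = c(8.4, 8.1),
  `2018` = c(8.8, 8.3),
  `2019` = c(6.1, 5.1),
  check.names = FALSE
)

# Column names to average over, as character strings
cols <- as.character(2013:2019)

# NA values are dropped per row, so 2013 does not contribute
df1$Mean <- rowMeans(df1[cols], na.rm = TRUE)

# Without na.rm = TRUE every row mean is NA, because 2013 is missing
df1$Mean_strict <- rowMeans(df1[cols])

print(df1[c("SE_at", "Mean", "Mean_strict")])
```

If a missing year should instead make the whole row mean NA, dropping `na.rm = TRUE` gives that behaviour.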
CodePudding user response:
Here is a variation of @akrun's answer that uses select() instead of across():
library(dplyr)
df %>%
  mutate(Mean = rowMeans(select(., `2013`:`2019`), na.rm = TRUE), .before = 1)
# A tibble: 6 x 23
Mean SE_at `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009`
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7.37 01 NA 22 17 15.2 16.7 5.1 5.4 5.3
2 7 02 NA 18.1 16 12.8 13.2 3.5 4.6 5.1
3 6.5 03 NA 13.5 16.9 11.3 13.5 2.8 3.8 4.4
4 6.92 04 NA 14.9 15 9.5 14.3 4.2 4.1 4.2
5 8.08 05 NA 11.9 14.3 12.9 13.7 4.5 4.6 4.6
6 8.3 06 NA 13.9 22 18.1 12.6 4.3 6.1 5.1
# ... with 13 more variables: `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
# `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>,
# `2018` <dbl>, `2019` <dbl>, `2020` <dbl>, `2021` <dbl>, `2022` <dbl>