New variable containing ROW MEAN at specific columns

Time:07-30

I have this df containing numeric values in each cell, and I need to create a new variable holding the mean of specific columns.

I need the mean for each row across all columns from `2013` to `2019`.

    SE_at `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 01        NA   22     17     15.2   16.7    5.1    5.4    5.3    7.4   12.8   10.6     NA    6      7.4    7.5    8.4    8.8    6.1    8.3   17.4   38.7
2 02        NA   18.1   16     12.8   13.2    3.5    4.6    5.1    7.9   11.8   10.1     NA    6.6    6.8    7.1    8.1    8.3    5.1    8.1   15.1   36.3
3 03        NA   13.5   16.9   11.3   13.5    2.8    3.8    4.4    6.7   12.7    9.2     NA    5.1    6.2    6.9    8.8    7      5      7.9   15.5   37.3
4 04        NA   14.9   15      9.5   14.3    4.2    4.1    4.2    7.1   12.3   10.6     NA    6.7    7.8    6.9    7.9    7.9    4.3    7.7   14     34.9
5 05        NA   11.9   14.3   12.9   13.7    4.5    4.6    4.6    6.6   11.4   10.1     NA    7.3   14.6    6.4    7.6    8.4    4.2    8     13.2   36.2
6 06        NA   13.9   22     18.1   12.6    4.3    6.1    5.1    7.4   11.6    9.4     NA    6.5   14.5    6.4    8.5    9.2    4.7    8.3   14.7   26.2

I tried summarise_at and calculating it manually, but I haven't found a good solution.

Here's the dput:

     structure(list(SE_at = c("01", "02", "03", "04", "05", "06"), 
    `2002` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), `2003` = c(22, 18.1, 13.5, 14.9, 11.9, 13.9), 
    `2004` = c(17, 16, 16.9, 15, 14.3, 22), `2005` = c(15.2, 
    12.8, 11.3, 9.5, 12.9, 18.1), `2006` = c(16.7, 13.2, 13.5, 
    14.3, 13.7, 12.6), `2007` = c(5.1, 3.5, 2.8, 4.2, 4.5, 4.3
    ), `2008` = c(5.4, 4.6, 3.8, 4.1, 4.6, 6.1), `2009` = c(5.3, 
    5.1, 4.4, 4.2, 4.6, 5.1), `2010` = c(7.4, 7.9, 6.7, 7.1, 
    6.6, 7.4), `2011` = c(12.8, 11.8, 12.7, 12.3, 11.4, 11.6), 
    `2012` = c(10.6, 10.1, 9.2, 10.6, 10.1, 9.4), `2013` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `2014` = c(6, 
    6.6, 5.1, 6.7, 7.3, 6.5), `2015` = c(7.4, 6.8, 6.2, 7.8, 
    14.6, 14.5), `2016` = c(7.5, 7.1, 6.9, 6.9, 6.4, 6.4), `2017` = c(8.4, 
    8.1, 8.8, 7.9, 7.6, 8.5), `2018` = c(8.8, 8.3, 7, 7.9, 8.4, 
    9.2), `2019` = c(6.1, 5.1, 5, 4.3, 4.2, 4.7), `2020` = c(8.3, 
    8.1, 7.9, 7.7, 8, 8.3), `2021` = c(17.4, 15.1, 15.5, 14, 
    13.2, 14.7), `2022` = c(38.7, 36.3, 37.3, 34.9, 36.2, 26.2
    )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

CodePudding user response:

Select the columns within across() and pass them to rowMeans(); na.rm = TRUE skips the NA values (here the whole `2013` column):

library(dplyr)
df1 <- df1 %>%
     mutate(Mean = rowMeans(across(`2013`:`2019`), na.rm = TRUE))
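
For comparison, the same row mean can be sketched in base R without dplyr. This is only an illustration under assumptions: the small `df1` below is a cut-down stand-in for the question's data (first two rows, years 2012 to 2020), and `year_cols` is a helper name introduced here, not something from the original answer.

```r
# Cut-down stand-in for the question's data: first two rows, 2012-2020 only.
# check.names = FALSE keeps the numeric column names as they are.
df1 <- data.frame(
  SE_at  = c("01", "02"),
  `2012` = c(10.6, 10.1),
  `2013` = c(NA_real_, NA_real_),
  `2014` = c(6, 6.6),
  `2015` = c(7.4, 6.8),
  `2016` = c(7.5, 7.1),
  `2017` = c(8.4, 8.1),
  `2018` = c(8.8, 8.3),
  `2019` = c(6.1, 5.1),
  `2020` = c(8.3, 8.1),
  check.names = FALSE
)

# Hypothetical helper: the column names to average, built from the year range.
year_cols <- as.character(2013:2019)

# Select the columns by name and average across each row.
# na.rm = TRUE drops the NA in `2013`, so each mean is over six values.
df1$Mean <- rowMeans(df1[, year_cols], na.rm = TRUE)
```

Selecting by name rather than by position means the result does not change if columns are reordered, as long as the names `2013` through `2019` are all present.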

CodePudding user response:

Here is a variation of @akrun's answer that uses select() instead of across(); .before = 1 places the new column first:

library(dplyr)

df %>% 
  mutate(Mean = rowMeans(select(.,`2013`:`2019`), na.rm=TRUE), .before=1)

# A tibble: 6 x 23
   Mean SE_at `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009`
  <dbl> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1  7.37 01        NA   22     17     15.2   16.7    5.1    5.4    5.3
2  7    02        NA   18.1   16     12.8   13.2    3.5    4.6    5.1
3  6.5  03        NA   13.5   16.9   11.3   13.5    2.8    3.8    4.4
4  6.92 04        NA   14.9   15      9.5   14.3    4.2    4.1    4.2
5  8.08 05        NA   11.9   14.3   12.9   13.7    4.5    4.6    4.6
6  8.3  06        NA   13.9   22     18.1   12.6    4.3    6.1    5.1
# ... with 13 more variables: `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>,
#   `2018` <dbl>, `2019` <dbl>, `2020` <dbl>, `2021` <dbl>, `2022` <dbl>
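
As a quick sanity check on the output above, the first row's mean can be reproduced by hand. A minimal sketch, with the values copied from row `01` of the question's data; `row1` and `n_used` are names made up for this illustration.

```r
# Values for SE_at "01" in columns 2013 to 2019, copied from the question.
row1 <- c(`2013` = NA, `2014` = 6, `2015` = 7.4, `2016` = 7.5,
          `2017` = 8.4, `2018` = 8.8, `2019` = 6.1)

# Without na.rm the single NA propagates and the result is NA.
mean_with_na <- mean(row1)

# With na.rm = TRUE the NA is dropped and the mean is taken over the rest.
mean_no_na <- mean(row1, na.rm = TRUE)

# Number of values that actually entered the mean (six, not seven).
n_used <- sum(!is.na(row1))

round(mean_no_na, 2)  # 7.37, matching the first row of the Mean column
```

Note that the divisor is six rather than seven, because `2013` is NA in every row. If a missing year should instead make the whole mean NA, leaving out na.rm = TRUE gives that behaviour.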