I want to compute the mean for each subject id with different lengths and using a start year and end-CodePudding

I have dataset and for each observation there is a start year and end year. The dataset has being populated for all the years (2003 to 2011). And these years are variables/columns of their own and so for each observation, I just to keep the data that correspond to both the start year and end year. For example if

Id	Start Yr	End yr	Y2003	Y2004	Y2005	Y2006	Y2007	Y2008	Y2009
T001	2003	2007	2.3	1.1	1.2	1.8	7.2	5.6	10.2
T002	2006	2007	3.0	3.9	2.2	2.7	5.4	4.5	3.9

so subject with id T001, I want to only take the values in year 2003 to year 2007 and compute the mean. So I want to have

Id	Start Yr	End yr	Y2003	Y2004	Y2005	Y2006	Y2007	Y2008	Y2009	Mean
T001	2003	2007	2.3	1.1	1.2	1.8	7.2
T002	2006	2007				2.7	5.4

For id T001, the mean will be 2.72 and for T002 , the mean will be 4.05. Thank you. I'm new to R

CodePudding user response：

You could use rowwise() and c_across() to get the job done.

library(tidyverse)

df <- tribble(
  ~Id,  ~"Start Yr",    ~"End yr",  ~Y2003,     ~Y2004,     ~Y2005,     ~Y2006,     ~Y2007, ~Y2008, ~Y2009,
  "T001",   2003,   2007,   2.3,    1.1,    1.2,    1.8,    7.2,    5.6,    10.2,
  "T002",   2006,   2007,   3.0,    3.9,    2.2,    2.7,    5.4,    4.5,    3.9,
)


df %>% 
  rowwise() %>% 
  mutate(Mean = mean(c_across(
    cols = ends_with(as.character(`Start Yr`)):ends_with(as.character(`End yr`))
  )))
#> # A tibble: 2 × 11
#> # Rowwise: 
#>   Id    `Start Yr` `End yr` Y2003 Y2004 Y2005 Y2006 Y2007 Y2008 Y2009  Mean
#>   <chr>      <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 T001        2003     2007   2.3   1.1   1.2   1.8   7.2   5.6  10.2  2.72
#> 2 T002        2006     2007   3     3.9   2.2   2.7   5.4   4.5   3.9  4.05

But they require the use of the dplyr package.

CodePudding user response：

A tidyverse approach to achieve your desired result may look like so:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-c(Id, Start.Yr, End.yr), names_prefix = "Y", names_transform = as.integer) %>%
  mutate(value = if_else(name >= Start.Yr & name <= End.yr, value, NA_real_)) %>%
  group_by(Id) %>%
  mutate(Mean = mean(value, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value, names_prefix = "Y") %>%
  relocate(Mean, .after = last_col())
#> # A tibble: 2 × 11
#>   Id    Start.Yr End.yr Y2003 Y2004 Y2005 Y2006 Y2007 Y2008 Y2009  Mean
#>   <chr>    <int>  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 T001      2003   2007   2.3   1.1   1.2   1.8   7.2    NA    NA  2.72
#> 2 T002      2006   2007  NA    NA    NA     2.7   5.4    NA    NA  4.05

DATA

df <- structure(list(Id = c("T001", "T002"), Start.Yr = c(2003L, 2006L), End.yr = c(2007L, 2007L), Y2003 = c(2.3, 3), Y2004 = c(
  1.1,
  3.9
), Y2005 = c(1.2, 2.2), Y2006 = c(1.8, 2.7), Y2007 = c(
  7.2,
  5.4
), Y2008 = c(5.6, 4.5), Y2009 = c(10.2, 3.9)), class = "data.frame", row.names = c(
  NA,
  -2L
))
``