I have dataset and for each observation there is a start year and end year. The dataset has being populated for all the years (2003 to 2011). And these years are variables/columns of their own and so for each observation, I just to keep the data that correspond to both the start year and end year. For example if
Id | Start Yr | End yr | Y2003 | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 |
---|---|---|---|---|---|---|---|---|---|
T001 | 2003 | 2007 | 2.3 | 1.1 | 1.2 | 1.8 | 7.2 | 5.6 | 10.2 |
T002 | 2006 | 2007 | 3.0 | 3.9 | 2.2 | 2.7 | 5.4 | 4.5 | 3.9 |
so subject with id T001, I want to only take the values in year 2003 to year 2007 and compute the mean. So I want to have
Id | Start Yr | End yr | Y2003 | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Mean |
---|---|---|---|---|---|---|---|---|---|---|
T001 | 2003 | 2007 | 2.3 | 1.1 | 1.2 | 1.8 | 7.2 | |||
T002 | 2006 | 2007 | 2.7 | 5.4 |
For id T001, the mean will be 2.72 and for T002 , the mean will be 4.05. Thank you. I'm new to R
CodePudding user response:
You could use rowwise()
and c_across()
to get the job done.
library(tidyverse)
df <- tribble(
~Id, ~"Start Yr", ~"End yr", ~Y2003, ~Y2004, ~Y2005, ~Y2006, ~Y2007, ~Y2008, ~Y2009,
"T001", 2003, 2007, 2.3, 1.1, 1.2, 1.8, 7.2, 5.6, 10.2,
"T002", 2006, 2007, 3.0, 3.9, 2.2, 2.7, 5.4, 4.5, 3.9,
)
df %>%
rowwise() %>%
mutate(Mean = mean(c_across(
cols = ends_with(as.character(`Start Yr`)):ends_with(as.character(`End yr`))
)))
#> # A tibble: 2 × 11
#> # Rowwise:
#> Id `Start Yr` `End yr` Y2003 Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Mean
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 T001 2003 2007 2.3 1.1 1.2 1.8 7.2 5.6 10.2 2.72
#> 2 T002 2006 2007 3 3.9 2.2 2.7 5.4 4.5 3.9 4.05
But they require the use of the dplyr
package.
CodePudding user response:
A tidyverse
approach to achieve your desired result may look like so:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-c(Id, Start.Yr, End.yr), names_prefix = "Y", names_transform = as.integer) %>%
mutate(value = if_else(name >= Start.Yr & name <= End.yr, value, NA_real_)) %>%
group_by(Id) %>%
mutate(Mean = mean(value, na.rm = TRUE)) %>%
ungroup() %>%
pivot_wider(names_from = name, values_from = value, names_prefix = "Y") %>%
relocate(Mean, .after = last_col())
#> # A tibble: 2 × 11
#> Id Start.Yr End.yr Y2003 Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Mean
#> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 T001 2003 2007 2.3 1.1 1.2 1.8 7.2 NA NA 2.72
#> 2 T002 2006 2007 NA NA NA 2.7 5.4 NA NA 4.05
DATA
df <- structure(list(Id = c("T001", "T002"), Start.Yr = c(2003L, 2006L), End.yr = c(2007L, 2007L), Y2003 = c(2.3, 3), Y2004 = c(
1.1,
3.9
), Y2005 = c(1.2, 2.2), Y2006 = c(1.8, 2.7), Y2007 = c(
7.2,
5.4
), Y2008 = c(5.6, 4.5), Y2009 = c(10.2, 3.9)), class = "data.frame", row.names = c(
NA,
-2L
))
``