Home > Software engineering >  Supplementing age class determination with observation data in R
Supplementing age class determination with observation data in R

Time:03-01

I am working with a dataset of yearly observations structured like this

data<-data.frame(ID=c(rep("A",6),rep("B",12),rep("C",9)),
             FeatherID=rep(c("a","b","c"),9),
             Year=c(rep(2020,3),rep(2021,3),rep(2017,3),rep(2019,3),rep(2020,3),rep(2021,3),rep(2018,3),rep(2019,3),rep(2020,3)),
             Age_Field=c(rep("1CY",3),rep("2CY",3),rep("1CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3)))
   ID FeatherID Year Age_Field
1   A         a 2020       1CY
2   A         b 2020       1CY
3   A         c 2020       1CY
4   A         a 2021       2CY
5   A         b 2021       2CY
6   A         c 2021       2CY
7   B         a 2017       1CY
8   B         b 2017       1CY
9   B         c 2017       1CY
10  B         a 2019      >2CY
11  B         b 2019      >2CY
12  B         c 2019      >2CY
13  B         a 2020      >2CY
14  B         b 2020      >2CY
15  B         c 2020      >2CY
16  B         a 2021      >2CY
17  B         b 2021      >2CY
18  B         c 2021      >2CY
19  C         a 2018      >2CY
20  C         b 2018      >2CY
21  C         c 2018      >2CY
22  C         a 2019      >2CY
23  C         b 2019      >2CY
24  C         c 2019      >2CY
25  C         a 2020      >2CY
26  C         b 2020      >2CY
27  C         c 2020      >2CY

For each yearly observation the age class of the individual is determined as first year (1CY), second year (2CY) or older (>2CY). I would like to combine this age class data with the information given by the year the individuals are observed in, in order to calculate a column with the true (minimum) age of each individual in each year, corresponding to this for the earlier example

 > data.frame(Age_True=c(rep(0,3),rep(1,3),rep(0,3),rep(2,3),rep(3,3),rep(4,3),rep(2,3),rep(3,3),rep(4,3)))
   Age_True
1         0
2         0
3         0
4         1
5         1
6         1
7         0
8         0
9         0
10        2
11        2
12        2
13        3
14        3
15        3
16        4
17        4
18        4
19        2
20        2
21        2
22        3
23        3
24        3
25        4
26        4
27        4

I've been trying different ways to do this, but I can't quite figure it out. Hopefully someone knows a way to do this. Cheers!

CodePudding user response:

This may be a bit hacky, but it works well. First, convert the Age_Field column into 1, 2, 3 values with readr::parse_number. Then computing the age with simple arithmetic.

library(reader)
library(dplyr)
data %>% 
  group_by(ID, Age_Field) %>% 
  mutate(Age = ifelse(nchar(Age_Field) == 4, parse_number(Age_Field)   1, parse_number(Age_Field)),
         Age = Year - Year[first(Age)]   Age - 1)

output

# A tibble: 27 x 5
# Groups:   ID, Age_Field [5]
   ID    FeatherID  Year Age_Field   Age
   <chr> <chr>     <dbl> <chr>     <dbl>
 1 A     a          2020 1CY           0
 2 A     b          2020 1CY           0
 3 A     c          2020 1CY           0
 4 A     a          2021 2CY           1
 5 A     b          2021 2CY           1
 6 A     c          2021 2CY           1
 7 B     a          2017 1CY           0
 8 B     b          2017 1CY           0
 9 B     c          2017 1CY           0
10 B     a          2019 >2CY          2
11 B     b          2019 >2CY          2
12 B     c          2019 >2CY          2
13 B     a          2020 >2CY          3
14 B     b          2020 >2CY          3
15 B     c          2020 >2CY          3
16 B     a          2021 >2CY          4
17 B     b          2021 >2CY          4
18 B     c          2021 >2CY          4
19 C     a          2018 >2CY          2
20 C     b          2018 >2CY          2
21 C     c          2018 >2CY          2
22 C     a          2019 >2CY          3
23 C     b          2019 >2CY          3
24 C     c          2019 >2CY          3
25 C     a          2020 >2CY          4
26 C     b          2020 >2CY          4
27 C     c          2020 >2CY          4
  • Related