I am working with a dataset of yearly observations structured like this
data<-data.frame(ID=c(rep("A",6),rep("B",12),rep("C",9)),
FeatherID=rep(c("a","b","c"),9),
Year=c(rep(2020,3),rep(2021,3),rep(2017,3),rep(2019,3),rep(2020,3),rep(2021,3),rep(2018,3),rep(2019,3),rep(2020,3)),
Age_Field=c(rep("1CY",3),rep("2CY",3),rep("1CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3),rep(">2CY",3)))
ID FeatherID Year Age_Field
1 A a 2020 1CY
2 A b 2020 1CY
3 A c 2020 1CY
4 A a 2021 2CY
5 A b 2021 2CY
6 A c 2021 2CY
7 B a 2017 1CY
8 B b 2017 1CY
9 B c 2017 1CY
10 B a 2019 >2CY
11 B b 2019 >2CY
12 B c 2019 >2CY
13 B a 2020 >2CY
14 B b 2020 >2CY
15 B c 2020 >2CY
16 B a 2021 >2CY
17 B b 2021 >2CY
18 B c 2021 >2CY
19 C a 2018 >2CY
20 C b 2018 >2CY
21 C c 2018 >2CY
22 C a 2019 >2CY
23 C b 2019 >2CY
24 C c 2019 >2CY
25 C a 2020 >2CY
26 C b 2020 >2CY
27 C c 2020 >2CY
For each yearly observation the age class of the individual is determined as first year (1CY), second year (2CY) or older (>2CY). I would like to combine this age class data with the information given by the year the individuals are observed in, in order to calculate a column with the true (minimum) age of each individual in each year, corresponding to this for the earlier example
> data.frame(Age_True=c(rep(0,3),rep(1,3),rep(0,3),rep(2,3),rep(3,3),rep(4,3),rep(2,3),rep(3,3),rep(4,3)))
Age_True
1 0
2 0
3 0
4 1
5 1
6 1
7 0
8 0
9 0
10 2
11 2
12 2
13 3
14 3
15 3
16 4
17 4
18 4
19 2
20 2
21 2
22 3
23 3
24 3
25 4
26 4
27 4
I've been trying different ways to do this, but I can't quite figure it out. Hopefully someone knows a way to do this. Cheers!
CodePudding user response:
This may be a bit hacky, but it works well. First, convert the Age_Field
column into 1, 2, 3 values with readr::parse_number
. Then computing the age with simple arithmetic.
library(reader)
library(dplyr)
data %>%
group_by(ID, Age_Field) %>%
mutate(Age = ifelse(nchar(Age_Field) == 4, parse_number(Age_Field) 1, parse_number(Age_Field)),
Age = Year - Year[first(Age)] Age - 1)
output
# A tibble: 27 x 5
# Groups: ID, Age_Field [5]
ID FeatherID Year Age_Field Age
<chr> <chr> <dbl> <chr> <dbl>
1 A a 2020 1CY 0
2 A b 2020 1CY 0
3 A c 2020 1CY 0
4 A a 2021 2CY 1
5 A b 2021 2CY 1
6 A c 2021 2CY 1
7 B a 2017 1CY 0
8 B b 2017 1CY 0
9 B c 2017 1CY 0
10 B a 2019 >2CY 2
11 B b 2019 >2CY 2
12 B c 2019 >2CY 2
13 B a 2020 >2CY 3
14 B b 2020 >2CY 3
15 B c 2020 >2CY 3
16 B a 2021 >2CY 4
17 B b 2021 >2CY 4
18 B c 2021 >2CY 4
19 C a 2018 >2CY 2
20 C b 2018 >2CY 2
21 C c 2018 >2CY 2
22 C a 2019 >2CY 3
23 C b 2019 >2CY 3
24 C c 2019 >2CY 3
25 C a 2020 >2CY 4
26 C b 2020 >2CY 4
27 C c 2020 >2CY 4