I'm working on calculating the new diameters of trees I've measured by taking their initial diameter (measured in 2018) and adding the centimeters of diameter growth (diam_growth) from 2020 and 2021. This would fill in the NAs in the column "dbh". Here's some sample data to help explain:
tag <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
diam_growth <- c(0.4, 0.5, NA, 0.7, 0.8, NA, 0.9, 1.0, NA, 0.1, 0.2, NA)
dbh <- c(NA, NA, 10, NA, NA, 15, NA, NA, 7, NA, NA, 12)
year <- c(2020, 2021, 2018, 2020, 2021, 2018, 2020, 2021, 2018, 2020, 2021, 2018)
tree_growth <- data.frame(tag, diam_growth, dbh, year)
   tag diam_growth dbh year
1    1         0.4  NA 2020
2    1         0.5  NA 2021
3    1          NA  10 2018
4    2         0.7  NA 2020
5    2         0.8  NA 2021
6    2          NA  15 2018
7    3         0.9  NA 2020
8    3         1.0  NA 2021
9    3          NA   7 2018
10   4         0.1  NA 2020
11   4         0.2  NA 2021
12   4          NA  12 2018
So, for example, for tag 1 the code would take the 2018 dbh (10) and add 0.4 for 2020 and 0.5 for 2021, giving 10.4 and 10.5. For tag 2, it would add 0.7 and 0.8 to the initial dbh of 15 for 2020 and 2021, respectively, and so on for each tag ID.
I'm happy to clarify further if this isn't super clear!
Any help or suggestions would be greatly appreciated!!
Answer:
Grouped by 'tag', replace the 'diam_growth' values in the rows where 'dbh' is NA by adding them to that tag's single non-NA 'dbh' value (the 2018 measurement):
library(dplyr)
tree_growth %>%
  group_by(tag) %>%
  # rows without a dbh get their growth plus the tag's single non-NA (2018) dbh
  mutate(diam_growth = replace(diam_growth, is.na(dbh),
                               diam_growth[is.na(dbh)] + dbh[!is.na(dbh)])) %>%
  ungroup()
Output:
# A tibble: 12 × 4
     tag diam_growth   dbh  year
   <dbl>       <dbl> <dbl> <dbl>
 1     1        10.4    NA  2020
 2     1        10.5    NA  2021
 3     1        NA      10  2018
 4     2        15.7    NA  2020
 5     2        15.8    NA  2021
 6     2        NA      15  2018
 7     3         7.9    NA  2020
 8     3         8      NA  2021
 9     3        NA       7  2018
10     4        12.1    NA  2020
11     4        12.2    NA  2021
12     4        NA      12  2018
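The result above is stored in diam_growth. To fill the NAs in dbh itself instead, as the question describes, one variant is to use dplyr's coalesce(), which keeps an existing dbh and otherwise takes the computed value. This is a sketch that assumes each tag has exactly one 2018 row; the name filled is just illustrative:

```r
library(dplyr)

# Sample data from the question
tree_growth <- data.frame(
  tag = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  diam_growth = c(0.4, 0.5, NA, 0.7, 0.8, NA, 0.9, 1.0, NA, 0.1, 0.2, NA),
  dbh = c(NA, NA, 10, NA, NA, 15, NA, NA, 7, NA, NA, 12),
  year = c(2020, 2021, 2018, 2020, 2021, 2018, 2020, 2021, 2018, 2020, 2021, 2018)
)

filled <- tree_growth %>%
  group_by(tag) %>%
  # keep a measured dbh; otherwise use the tag's 2018 dbh plus this row's growth
  mutate(dbh = coalesce(dbh, dbh[year == 2018] + diam_growth)) %>%
  ungroup()

filled
```

Here diam_growth is left untouched, so the yearly growth values are still available alongside the filled-in diameters.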
Or use case_when():
tree_growth %>%
  group_by(tag) %>%
  # rows without a dbh get growth + the tag's 2018 dbh; other rows are unchanged
  mutate(diam_growth = case_when(is.na(dbh) ~ diam_growth + dbh[!is.na(dbh)],
                                 TRUE ~ diam_growth)) %>%
  ungroup()
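If you'd prefer not to depend on dplyr, the same idea can be sketched in base R. This minimal sketch writes the results into dbh (as the question asks) and, like both approaches above, assumes each tag has exactly one non-NA dbh (its 2018 measurement). It checks that assumption first, because with zero or several baseline values per tag the grouped code can either error or silently pair values up incorrectly. The names n_baseline, has_dbh, and baseline are illustrative:

```r
# Sample data from the question
tree_growth <- data.frame(
  tag = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  diam_growth = c(0.4, 0.5, NA, 0.7, 0.8, NA, 0.9, 1.0, NA, 0.1, 0.2, NA),
  dbh = c(NA, NA, 10, NA, NA, 15, NA, NA, 7, NA, NA, 12),
  year = c(2020, 2021, 2018, 2020, 2021, 2018, 2020, 2021, 2018, 2020, 2021, 2018)
)

# Assumption check: exactly one non-NA (baseline) dbh per tag
n_baseline <- tapply(!is.na(tree_growth$dbh), tree_growth$tag, sum)
stopifnot(all(n_baseline == 1))

# Baseline dbh for each tag, named by tag
has_dbh <- !is.na(tree_growth$dbh)
baseline <- setNames(tree_growth$dbh[has_dbh], tree_growth$tag[has_dbh])

# Fill the missing dbh values with the tag's baseline plus that row's growth
tree_growth$dbh[!has_dbh] <-
  baseline[as.character(tree_growth$tag[!has_dbh])] +
  tree_growth$diam_growth[!has_dbh]

tree_growth
```

The lookup by name means the rows don't need to be sorted by tag or year for this to work.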