My data are as follows:
year site value dist
2019 A 200 10
2019 B NA 15
2019 C 50 30
2019 D NA 36
2020 A 300 10
2020 B NA 15
2020 C 100 30
2020 D NA 36
I would like to replace the NA values for site B as follows.
Site B NA = the sum of the products of the following two equations:
1) value of site A * (1/(dist of site B - dist of site A))
and
2) value of site C * (1/(dist of site C - dist of site B))
I would like to do this for each year, ideally using dplyr. Thank you in advance!
CodePudding user response:
library(dplyr)
df %>%
arrange(year, site) %>%
group_by(year) %>%
mutate(value = ifelse(is.na(value) & site == "B",
(lag(value) * (1/(dist - lag(dist)))) (lead(value)* (1/(lead(dist) - dist))),
value))
Output
year site value dist
<int> <chr> <dbl> <int>
1 2019 A 200 10
2 2019 B 43.3 15
3 2019 C 50 30
4 2019 D NA 36
5 2020 A 300 10
6 2020 B 66.7 15
7 2020 C 100 30
8 2020 D NA 36
Data
df <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2020L, 2020L,
2020L, 2020L), site = c("A", "B", "C", "D", "A", "B", "C", "D"
), value = c(200L, NA, 50L, NA, 300L, NA, 100L, NA), dist = c(10L,
15L, 30L, 36L, 10L, 15L, 30L, 36L)), class = "data.frame", row.names = c(NA,
-8L))
CodePudding user response:
Update: Thanks to AndrewGB: wraped around parenthesis:
If I understand it correctly it should be like this?
library(dplyr)
df %>%
group_by(year) %>%
mutate(value = ifelse(is.na(value) & site == "B",
value[site=="A"]*(1/(dist[site=="B"] - dist[site=="A"]))
value[site=="C"]*(1/(dist[site=="C"] - dist[site=="B"])),
value)
)
year site value dist
<int> <chr> <dbl> <int>
1 2019 A 200 10
2 2019 B 43.3 15
3 2019 C 50 30
4 2019 D NA 36
5 2020 A 300 10
6 2020 B 66.7 15
7 2020 C 100 30
8 2020 D NA 36