Consider a dataset:
df1 <- tibble::tribble(~Place, ~Year, ~Cake, ~Coffee, ~Tea,
"Local Cafe 1", 2022, 50, 100, 30,
"Local Cafe 1", 2021, 50, NA, 30,
"Local Cafe 1", 2020, 50, 80, NA,
"Local Cafe 1", 2019, 50, 70, 20,
"Local Cafe 1", 2018, NA, 60, 20,
"Local Cafe 2", 2022, 60, NA, 40,
"Local Cafe 2", 2021, NA, 50, NA,
"Local Cafe 2", 2020, 40, 40, NA,
"Local Cafe 2", 2019, 30, 30, NA,
"Local Cafe 3", 2022, 30, 40, NA,
"Local Cafe 3", 2021, NA, NA, NA)
Here is the same dataset for a visual representation:
The desired actions:
- if sum is between, then mean (example: sequence 30, NA, 40 => NA<- 35 (30 40)/2
- If gap is more than one, then equal growth (example: sequence 30, NA, NA, 60 => 30, 40, 50, 60.
- If NA is first (say 2021 value is 100, and 2022 - NA), then sequence if available (say, 10, 20, 30, NA => 10, 20, 30, 40; or, if there is no data for sequence, then the final value is available (example: 30, NA => 30, 30)
- If NA is last - then the same logic: (NA, 10, 20, 30 => 0, 10, 20, 30; or if NA, 20 => 20, 20
The desired output:
Orange cells - are the filled-in values. I would be thankful for any suggestions for a nice solution :)
CodePudding user response:
We could group by 'Place', apply na.approx
(from zoo
) on the columns 'Cake' to 'Tea' and fill
the NAs at the end with the previous non-NA values
library(dplyr)
library(tidyr)
library(zoo)
df1 %>%
group_by(Place) %>%
mutate(across(Cake:Tea, ~ na.approx(.x, na.rm = FALSE))) %>%
fill(Cake:Tea, .direction = "downup") %>%
ungroup
-output
# A tibble: 11 × 5
Place Year Cake Coffee Tea
<chr> <dbl> <dbl> <dbl> <dbl>
1 Local Cafe 1 2022 50 100 30
2 Local Cafe 1 2021 50 90 30
3 Local Cafe 1 2020 50 80 25
4 Local Cafe 1 2019 50 70 20
5 Local Cafe 1 2018 50 60 20
6 Local Cafe 2 2022 60 50 40
7 Local Cafe 2 2021 50 50 40
8 Local Cafe 2 2020 40 40 40
9 Local Cafe 2 2019 30 30 40
10 Local Cafe 3 2022 30 40 NA
11 Local Cafe 3 2021 30 40 NA