Home > database >  Replace NA based on similar data in differend rows
Replace NA based on similar data in differend rows

Time:08-02

I have a dataframe as follows: (first three columns, fourth shows what I want)

table

Type <- rep(c("fruit", "vegetable"), each=4)
Kind <- c("Reference", "apple", "apple", "banana", "Reference", "tomato", "tomato", "mushroom")
Expected <- c(3,2,NA, NA, 5,4,NA, NA)
wanted <- c(3,2,2,3, 5,4,4,5)

mydat <- data.frame(Type, Kind, Expected, wanted)

I need the column "Expected" to be filled based on the data in other rows. If a row as NA in Expected, I want the value from Expected to be taken from the closest similar row regarding Type and Kind (example NA with an apple: fruit -> apple -> take the value 2 from the expected mass for all apple NAs). If there is none available, as for banana, I want to take the value it can find in the closest Reference, in the banana example this would be fruit -> Reference. The column wanted shows the values I would like to receive. Is this possible?

Thanks for your help!

CodePudding user response:

We can do this in 2 stages. First stage we fill by Type and Kind, second stage we fill again by Type, lumping together the "Reference" Kind with whatever still needs filling:

mydat %>%
  group_by(Type, Kind) %>%
  fill(Expected, .direction = "downup") %>%
  group_by(Type, temp_kind = Kind == "Reference" | is.na(Expected)) %>% 
  fill(Expected, .direction = "downup") %>%
  ungroup() %>%
  select(-temp_kind)
# # A tibble: 8 × 4
#   Type      Kind      Expected wanted
#   <chr>     <chr>        <dbl>  <dbl>
# 1 fruit     Reference        3      3
# 2 fruit     apple            2      2
# 3 fruit     apple            2      2
# 4 fruit     banana           3      3
# 5 vegetable Reference        5      5
# 6 vegetable tomato           4      4
# 7 vegetable tomato           4      4
# 8 vegetable mushroom         5      5

CodePudding user response:

Is this what you need?

mydat %>%
  group_by(Type) %>%
  mutate(wanted_2 = ifelse(is.na(Expected) & Kind == lag(Kind), lag(Expected), 
                           ifelse(is.na(Expected) & Kind != lag(Kind), Expected[Kind == "Reference"], Expected)))
# A tibble: 8 × 5
# Groups:   Type [2]
  Type      Kind      Expected wanted wanted_2
  <chr>     <chr>        <dbl>  <dbl>    <dbl>
1 fruit     Reference        3      3        3
2 fruit     apple            2      2        2
3 fruit     apple           NA      2        2
4 fruit     banana          NA      3        3
5 vegetable Reference        5      5        5
6 vegetable tomato           4      4        4
7 vegetable tomato          NA      4        4
8 vegetable mushroom        NA      5        5
  • Related