I'm relatively new to R. I have a large dataset in which some values are missing, and I would like to replace them with values that are already in the data frame. Here is a small example.
In the following example df ... :
df <- read.table(text = "Farm, Month, Stable, Food_consumed
AA, Apr, Out, 45
AA, Jun, Out, 56
BB, Apr, Out, 37
BB, Jun, Out, 79
CC, Apr, Out, 24
AA, Apr, In,
BB, Apr, In,
CC, Apr, In, 6.7", header = TRUE, sep = ",")
... I want R to fill in the two empty cells. For Stable=In on farm AA in April, it should fill in the value from Stable=Out on farm AA in April. The same should happen for every farm.
I tried using an ifelse statement in dplyr, but I could not work out how to insert the correct value. I also assume it needs a loop to do the same thing for every farm.
df %>% mutate(Food_consumed=ifelse(is.na(Food_consumed) & Stable == "In" & Month == "Apr", .... , ....))
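One thing worth checking before any comparison on Stable or Month: the example text has a space after each comma, and read.table with sep = "," keeps that space as part of the character values by default, so a test like Stable == "In" would not match anything. Below is a minimal sketch of the check and one possible fix, assuming the data is read exactly as in the example. The name df_clean is only illustrative, and whether the real dataset carries the same padding is an assumption.

```r
# Same example data as in the question; stringsAsFactors = FALSE keeps the
# text columns as character on older R versions too
df <- read.table(text = "Farm, Month, Stable, Food_consumed
AA, Apr, Out, 45
AA, Jun, Out, 56
BB, Apr, Out, 37
BB, Jun, Out, 79
CC, Apr, Out, 24
AA, Apr, In,
BB, Apr, In,
CC, Apr, In, 6.7", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Inspect the distinct labels; the padding shows up as a leading space
print(unique(df$Stable))

# Comparing against the unpadded label matches no rows
print(any(df$Stable == "In"))

# Trim surrounding whitespace from every character column (base R trimws)
df_clean <- df
is_chr <- vapply(df_clean, is.character, logical(1))
df_clean[is_chr] <- lapply(df_clean[is_chr], trimws)

# After trimming, the comparison finds the "In" rows
print(any(df_clean$Stable == "In"))
```

Passing strip.white = TRUE to read.table is another way to avoid the padding when the data is read.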
In the end, the dataframe should look like this:
df <- read.table(text = "Farm, Month, Stable, Food_consumed
AA, Apr, Out, 45
AA, Jun, Out, 56
BB, Apr, Out, 37
BB, Jun, Out, 79
CC, Apr, Out, 24
AA, Apr, In, 45
BB, Apr, In, 37
CC, Apr, In, 6.7", header = TRUE, sep = ",")
Any help is highly appreciated.
CodePudding user response:
Using tidyr::fill you could fill the NA values like so:
library(tidyr)
library(dplyr)
df %>%
  group_by(Farm, Month) %>%
  fill(Food_consumed) %>%
  ungroup()
#> # A tibble: 8 × 4
#>   Farm  Month  Stable Food_consumed
#>   <chr> <chr>  <chr>          <dbl>
#> 1 AA    " Apr" " Out"          45
#> 2 AA    " Jun" " Out"          56
#> 3 BB    " Apr" " Out"          37
#> 4 BB    " Jun" " Out"          79
#> 5 CC    " Apr" " Out"          24
#> 6 AA    " Apr" " In"           45
#> 7 BB    " Apr" " In"           37
#> 8 CC    " Apr" " In"            6.7
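Note that fill() copies values downward by default, so the approach above relies on the Out row appearing before the In row within each Farm/Month group, which is the case in the example. If the row order is not guaranteed, one alternative is to look up the Out value explicitly. This is only a sketch under some assumptions: strip.white = TRUE is added to read.table so that Stable == "Out" matches, each Farm/Month group has at most one Out row, and the name filled is illustrative.

```r
library(dplyr)

# Example data from the question; strip.white = TRUE (an addition, not in the
# original call) removes the space after each comma from the text columns
df <- read.table(text = "Farm, Month, Stable, Food_consumed
AA, Apr, Out, 45
AA, Jun, Out, 56
BB, Apr, Out, 37
BB, Jun, Out, 79
CC, Apr, Out, 24
AA, Apr, In,
BB, Apr, In,
CC, Apr, In, 6.7", header = TRUE, sep = ",", strip.white = TRUE)

filled <- df %>%
  group_by(Farm, Month) %>%
  mutate(
    # Keep existing values; where Food_consumed is NA, take the value of the
    # "Out" row in the same Farm/Month group (NA if the group has no such row)
    Food_consumed = coalesce(Food_consumed, Food_consumed[Stable == "Out"][1])
  ) %>%
  ungroup()

print(filled)
```

Because the lookup is done per group, no explicit loop over farms is needed, and existing values such as the 6.7 for CC are left untouched.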