Assume that you have a dataset like starwars, with two columns: a numeric one (height) containing 20 NA values, and one holding the species (Human, Droid, machine, etc.).
Using pipes, how can I replace only the NA values that belong to the Human category with the mean of the humans' heights?
If we replace them with the overall mean, the result will be wrong: machines may be a bit shorter or taller, so we would end up with some strange values for the heights of the humans.
P.S. I know how to do it using replace or ifelse, but how do I add the categorization?
CodePudding user response:
In the starwars scenario, you can do the following:
library(dplyr)
starwars %>%
group_by(species) %>%
mutate(height = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
ungroup()
As you can see below, height is filled with the group average only where species is Human:
library(dplyr)
starwars %>%
group_by(species) %>%
mutate(newheight = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
ungroup() %>%
select(species, height, newheight) %>%
filter(is.na(height))
#> # A tibble: 6 x 3
#> species height newheight
#> <chr> <int> <dbl>
#> 1 Human NA 177.
#> 2 Human NA 177.
#> 3 Human NA 177.
#> 4 Human NA 177.
#> 5 Droid NA NA
#> 6 NA NA NA
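In this specific example, you need to convert height to a double because it is stored as an integer: if_else is type-consistent and mean returns a double, so both branches have to be doubles. (Recent dplyr releases cast the two branches to a common type themselves, so the explicit as.double may be optional there, but it is harmless.)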
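To check the behaviour without depending on the contents of starwars, here is a minimal sketch on a made-up tibble. The object names toy and filled and all the values are hypothetical, chosen only so that the result is easy to verify by hand:

```r
library(dplyr)

# Hypothetical toy data: one missing height per species
toy <- tibble(
  species = c("Human", "Human", "Human", "Droid", "Droid"),
  height  = c(170L, 180L, NA, 100L, NA)
)

filled <- toy %>%
  group_by(species) %>%
  mutate(newheight = if_else(
    species == "Human" & is.na(height),
    mean(height, na.rm = TRUE),  # mean of the current group, always a double
    as.double(height)            # cast so both branches share one type
  )) %>%
  ungroup()

filled$newheight
#> [1] 170 180 175 100  NA
```

The missing Human height becomes 175, the mean of 170 and 180, while the missing Droid height stays NA because the condition only matches Human rows.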
CodePudding user response:
If I understand you correctly, you just want to replace NAs by group means?
This should do:
library(dplyr)
data(starwars)
head(starwars)
# This shows one missing value (NAs) for "Droid"
starwars %>%
group_by(species) %>%
summarize(M = mean(height, na.rm = TRUE),
NAs = sum(is.na(height)))
# Replace NAs by group-wise means
starwars <- starwars %>%
group_by(species) %>%
mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
ungroup()
# Now there are no missing values any more and the means (M) remain the same
starwars %>%
group_by(species) %>%
summarize(M = mean(height, na.rm = TRUE),
NAs = sum(is.na(height)))
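One caveat that may be worth checking with group-wise means: if a group has no observed height at all, mean(height, na.rm = TRUE) returns NaN, so the NA is replaced by NaN rather than by a usable number. A minimal sketch on a made-up tibble (the names toy2 and res and the values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: the only Droid has no recorded height
toy2 <- tibble(
  species = c("Human", "Human", "Droid"),
  height  = c(170, NA, NA)
)

res <- toy2 %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), height)) %>%
  ungroup()

res$height
#> [1] 170 170 NaN
```

Since is.na(NaN) is TRUE, a count such as sum(is.na(height)) still flags these rows, so the check in the summary above would reveal such a group.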
CodePudding user response:
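I would use case_when and replace_na (from tidyr), which was designed for these NA-replacing operations.
library(dplyr)
library(tidyr)
output <- starwars %>%
  mutate(height = case_when(
    # fill Human NAs with the mean of the Human rows only
    species == "Human" ~ replace_na(as.double(height), mean(height[species %in% "Human"], na.rm = TRUE)),
    # keep every other row unchanged instead of turning it into NA
    TRUE ~ as.double(height)
  ))
If we are interested in Humans only, we do not need group_by, as long as the mean is computed over the Human rows only and the other species are kept through a default branch.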
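If we want this transformation for every group, we could use:
output <- starwars %>%
  group_by(species) %>%
  # replace_na keeps the type of its input, so cast height to double first
  mutate(height = replace_na(as.double(height), mean(height, na.rm = TRUE))) %>%
  ungroup()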
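We can also use the zoo package with na.aggregate, which fills NAs with the mean by default. Grouping by species first makes it use the mean of each group:
library(zoo)
output <- starwars %>%
  group_by(species) %>%
  mutate(height = case_when(
    species == "Human" ~ na.aggregate(as.double(height)),
    # keep every other row unchanged instead of turning it into NA
    TRUE ~ as.double(height)
  )) %>%
  ungroup()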