Some NA values and not all

Time:11-04

Assume that you have a dataset like starwars, with two columns: a numeric one (height) that contains 20 NA values, and one with the species (Human, Droid, machine, etc.).

How can I use pipes to replace only the NA values that belong to the Human category with the mean height of the humans?

If we replace them with the overall mean, it will be wrong: machines may be shorter or taller, so we would end up with strange values for the height of the humans.

P.S. I know how to do it using replace or ifelse, but how do I add the categorization?

CodePudding user response:

In the starwars scenario, you can do the following:

library(dplyr)

starwars %>% 
  group_by(species) %>% 
  mutate(height = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>% 
  ungroup()

As the output below shows, missing heights are filled with the group average only where species is Human; the missing heights of other species stay NA:

library(dplyr)

starwars %>% 
  group_by(species) %>% 
  mutate(newheight = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>% 
  ungroup() %>% 
  select(species, height, newheight) %>% 
  filter(is.na(height))

#> # A tibble: 6 x 3
#>   species height newheight
#>   <chr>    <int>     <dbl>
#> 1 Human       NA      177.
#> 2 Human       NA      177.
#> 3 Human       NA      177.
#> 4 Human       NA      177.
#> 5 Droid       NA       NA 
#> 6 NA          NA       NA 

In this specific example, you need to convert height to a double because it is stored as an integer: if_else is type-consistent and mean returns a double, so both branches have to be doubles.
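To illustrate the type point in isolation, here is a minimal sketch on a made-up integer vector (the values are hypothetical, not taken from starwars). As far as I know, recent dplyr releases cast both branches to a common type, so the explicit conversion may be optional there, but it is harmless and keeps the code working with stricter versions.

```r
library(dplyr)

# Made-up integer heights with one missing value (an assumption for illustration)
x <- c(170L, NA, 180L)

# mean() returns a double, so the "false" branch is converted to double too;
# both branches of if_else() then have the same type.
filled <- if_else(is.na(x), mean(x, na.rm = TRUE), as.double(x))

typeof(filled)  # "double"
filled          # 170 175 180
```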

CodePudding user response:

If I understand you correctly, you just want to replace NAs by group means?

This should do:

library(dplyr)

data(starwars)

head(starwars)

# Mean height (M) and number of missing heights (NAs) per species;
# this shows one missing value for "Droid"
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))

# Replace NAs by group-wise means
starwars <- starwars %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()

# Now no missing values remain, and the group means (M) are unchanged
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))
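The group-wise fill can also be checked on a small data frame where the expected result is easy to verify by hand. This is only a sketch: the `toy` tibble and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data, not taken from starwars
toy <- tibble(
  species = c("Human", "Human", "Human", "Droid", "Droid"),
  height  = c(170L, NA, 180L, 100L, NA)
)

# Fill each missing height with the mean of its own species
filled <- toy %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()

filled$height  # 170 175 180 100 100
```

The missing Human height becomes 175 (the mean of 170 and 180) and the missing Droid height becomes 100, so neither group is affected by the other.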

CodePudding user response:

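I would use case_when together with replace_na from tidyr, which is designed for this kind of NA replacement.

library(tidyr)  # for replace_na()
output <- starwars %>%
  mutate(height = case_when(species == "Human" ~ replace_na(as.double(height), mean(height[species == "Human"], na.rm = TRUE)),
                            TRUE ~ as.double(height)))

If we are interested in Humans only, we do not need group_by, as long as the mean is computed over the Human rows only. The TRUE branch keeps the heights of all other species unchanged; without it, case_when would set them to NA. If we want this transformation for every group, we could use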

output <- starwars %>%
  group_by(species) %>%
  mutate(height = replace_na(as.double(height), mean(height, na.rm = TRUE))) %>%
  ungroup()

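We can also use na.aggregate from the zoo package, which replaces NAs with the mean by default. Grouping by species first makes that the per-species mean:

library(zoo)

output <- starwars %>%
  group_by(species) %>%
  mutate(height = case_when(species == "Human" ~ na.aggregate(as.double(height)),
                            TRUE ~ as.double(height))) %>%
  ungroup()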