How to mutate shared value from similar and dissimilar variables using dplyr?

Time:09-02

I was having an oddly difficult time finding the answer to this probably much-answered question, so here is my query. Suppose I have a data frame like the one below:

df <- data.frame(age = c(30,40,-999,40,20),
           money.usd = c(-999,55,55,54,30),
           cars = c(1,1,2,0,-999))

Filtering and mutating the values is straightforward for a single variable. For example, with ifelse, I can turn the -999 in age into NA in the following way:

df %>% 
  mutate(age = ifelse(age == -999, NA, age))  # unquoted NA; "NA" would turn the column into character

However, since all of these variables contain this value and have different names, I was curious how I could apply this sort of mutation across several variables at once. I imagine it gets more complicated when there are many similarly named variables alongside many dissimilar ones, but there must be ways to make it easier. For example, suppose I have the following data with three "cars" variables:

df.2 <- data.frame(age = c(30,40,-999,40,20),
           money.usd = c(-999,55,55,54,30),
           cars.1 = c(1,1,2,0,-999),
           cars.2 = c(0,1,-999,0,0),
           cars.3 = c(-999,5,4,5,4))

How would one mutate the value for both age and money.usd while also selecting the several cars variables, in order to replace the -999 value? To summarize, my main objective is to switch this -999 value to NA across the whole data frame.

CodePudding user response:

You can use dplyr::na_if to replace a value with NA, and across to apply it to multiple columns.

library(dplyr)  # provides na_if, across and mutate

df.2 %>% 
  mutate(across(everything(), ~ na_if(.x, -999)))
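The question also asks about targeting age and money.usd together with the several cars columns rather than every column. A minimal sketch of one way to do that, assuming dplyr 1.0 or later, is to name the dissimilar columns directly and match the similar ones with starts_with. The result name cleaned is only illustrative:

```r
library(dplyr)

# Example data from the question
df.2 <- data.frame(age = c(30, 40, -999, 40, 20),
                   money.usd = c(-999, 55, 55, 54, 30),
                   cars.1 = c(1, 1, 2, 0, -999),
                   cars.2 = c(0, 1, -999, 0, 0),
                   cars.3 = c(-999, 5, 4, 5, 4))

# Name the dissimilar columns directly and match the similar ones by prefix
cleaned <- df.2 %>%
  mutate(across(c(age, money.usd, starts_with("cars")), ~ na_if(.x, -999)))
```

Any column left out of the selection inside across() is passed through unchanged, so the same pattern should work when only some columns use -999 as a missing-value code.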

If the replacement value is something other than NA, use replace (shown here with NA so the result matches the above):

df.2 %>% 
  mutate(across(everything(), ~ replace(.x, .x == -999, NA)))

  age money.usd cars.1 cars.2 cars.3
1  30        NA      1      0     NA
2  40        55      1      1      5
3  NA        55      2     NA      4
4  40        54      0      0      5
5  20        30     NA      0      4
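To illustrate why replace is the more general tool, here is a small sketch that swaps -999 for a value other than NA. The replacement value 0 and the name replaced are purely hypothetical choices for the example, not something the question asks for:

```r
library(dplyr)

# Example data from the question
df.2 <- data.frame(age = c(30, 40, -999, 40, 20),
                   money.usd = c(-999, 55, 55, 54, 30),
                   cars.1 = c(1, 1, 2, 0, -999),
                   cars.2 = c(0, 1, -999, 0, 0),
                   cars.3 = c(-999, 5, 4, 5, 4))

# replace() takes any replacement value; 0 is a hypothetical choice here
replaced <- df.2 %>%
  mutate(across(everything(), ~ replace(.x, .x == -999, 0)))
```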

Or in base R:

df.2[df.2 == -999] <- NA
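The base R one-liner touches every column. If only some columns should be cleaned, a sketch along the following lines restricts it to a subset. The helper name cars_cols and the copy df.3 are made up for this example:

```r
# Example data from the question
df.2 <- data.frame(age = c(30, 40, -999, 40, 20),
                   money.usd = c(-999, 55, 55, 54, 30),
                   cars.1 = c(1, 1, 2, 0, -999),
                   cars.2 = c(0, 1, -999, 0, 0),
                   cars.3 = c(-999, 5, 4, 5, 4))

# Work on a copy so the original stays available for comparison
df.3 <- df.2

# Pick only the columns whose names start with "cars"
cars_cols <- grep("^cars", names(df.3), value = TRUE)

# Replace -999 with NA in those columns only
df.3[cars_cols][df.3[cars_cols] == -999] <- NA
```

Here age and money.usd keep their -999 values, which makes it easy to check that only the selected columns were modified.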