Changing values in multiple column given one condition (preferably in dplyr)

Time:12-09

I'm looking for an easy way to change several values for the same person, preferably with dplyr or another package from the tidyverse.

Here is my example:

df <- data.frame(personid = 1:3, class = c("class1", "class3", "class3"), classlevel = c(1, 11, 3), education = c("BA", "Msc", "BA"))
df

My dataset contains an entry with several mistakes. Person #2 should be in class 1, at classlevel 1, and his education is BA, not Msc. I use mutate() with case_when() a lot, but in this case I don't want to change one variable based on multiple conditions. I have one condition and want to change the values of several other variables based on it.

Basically, I'm looking for shorter code that replaces this:

df$class[df$personid == 2] <- "class1"
df$classlevel[df$personid == 2] <- 1
df$education[df$personid == 2] <- "BA"
df
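Base R can also apply one condition to several columns in a single indexed assignment: select the matching rows and the target columns together, then assign a list with one value per column. A minimal sketch using the question's example data; the one-line form is just one possible approach:

```r
# Same example data as in the question
df <- data.frame(personid = 1:3,
                 class = c("class1", "class3", "class3"),
                 classlevel = c(1, 11, 3),
                 education = c("BA", "Msc", "BA"))

# One condition, several columns: the list supplies one value per selected column,
# in the same order as the column names
df[df$personid == 2, c("class", "classlevel", "education")] <- list("class1", 1, "BA")
df
```

Each correction then takes one line per person instead of three.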

or this:

library(tidyverse)
df <- df |> 
   mutate(class = case_when(personid == 2 ~ "class1", TRUE ~ class)) |> 
   mutate(classlevel = case_when(personid == 2 ~ 1, TRUE ~ as.numeric(classlevel))) |> 
   mutate(education = case_when(personid == 2 ~ "BA", TRUE ~ education))
df

In my original data there are several dozen cases like this, and I find it a bit tedious to write three lines of code for each person. Is there a shorter way?

Thanks for your input!

Answer:

One way would be to create a data frame of the values to be updated and use rows_update(). Note that this assumes each row is uniquely identified by the key column (here personid).

library(dplyr)

df_update <- tribble(
  ~personid, ~class,   ~classlevel, ~education,
  2,         "class1", 1,           "BA"
)

df %>%
  rows_update(df_update, by = "personid")

  personid  class classlevel education
1        1 class1          1        BA
2        2 class1          1        BA
3        3 class3          3        BA
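Because rows_update() takes a whole table of corrections, the dozens of fixes mentioned in the question can go into one tribble with one row per person. A sketch assuming personid is the unique key; the correction for person 3 is hypothetical and only illustrates a second row:

```r
library(dplyr)

df <- data.frame(personid = 1:3,
                 class = c("class1", "class3", "class3"),
                 classlevel = c(1, 11, 3),
                 education = c("BA", "Msc", "BA"))

# One row per person to correct; person 3's values are hypothetical
df_update <- tribble(
  ~personid, ~class,   ~classlevel, ~education,
  2,         "class1", 1,           "BA",
  3,         "class2", 2,           "BA"
)

# Rows are matched on personid and only the listed columns are overwritten
df_fixed <- df %>%
  rows_update(df_update, by = "personid")
df_fixed
```

The rows keep their original order, and columns that do not appear in the correction table are left unchanged. In recent dplyr versions, rows_update() raises an error by default if a personid in the correction table does not exist in df.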

Answer:

I think I need a little more information to answer your question, but I'll try anyway.

If you want to change the values of some columns based on a single condition applied across all rows, I recommend doing this (I created new columns with the suffix 1, e.g. class1, so you can compare the original and the output):

df <- df %>% mutate(class1 = case_when(class != "class1" ~ "class1", TRUE ~ class),
                    classlevel1 = case_when(classlevel != 1 ~ 1, TRUE ~ as.numeric(classlevel)),
                    education1 = case_when(education != "BA" ~ "BA", TRUE ~ education))

If that was your problem, then you are probably not yet familiar with the concept of vectorization. Briefly, a vectorized function operates on every element of a vector (for a data frame column, every row) without you having to loop over them explicitly. There are a lot of examples and tutorials on the web if you search for "vectorization in R" or something similar.
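To make the vectorization point concrete, here is a small sketch with the classlevel values from the question: the comparison and case_when() each process the whole vector at once, and the explicit loop shows the work that vectorization spares you from writing.

```r
library(dplyr)

classlevel <- c(1, 11, 3)

# A vectorized comparison returns one TRUE/FALSE per element, no loop needed
needs_fix <- classlevel != 1
needs_fix
# [1] FALSE  TRUE  TRUE

# case_when() is also element-wise: each element gets its own result
fixed <- case_when(classlevel != 1 ~ 1, TRUE ~ classlevel)
fixed
# [1] 1 1 1

# The equivalent explicit loop
looped <- numeric(length(classlevel))
for (i in seq_along(classlevel)) {
  looped[i] <- if (classlevel[i] != 1) 1 else classlevel[i]
}
identical(looped, fixed)
# [1] TRUE
```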

Otherwise, if your condition changes for each single id (or row) in your data, then the problem is more complicated.
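For the per-id case, one possible sketch is to keep the corrections in a lookup table, join it to the data, and let coalesce() fall back to the old value wherever the lookup holds NA. The fixes table, the ".new" suffix, and the person-3 entry are hypothetical; NA means "keep the current value":

```r
library(dplyr)

df <- data.frame(personid = 1:3,
                 class = c("class1", "class3", "class3"),
                 classlevel = c(1, 11, 3),
                 education = c("BA", "Msc", "BA"))

# Hypothetical corrections: one row per person, NA = leave that value as is
fixes <- tibble(
  personid   = c(2L, 3L),
  class      = c("class1", NA),
  classlevel = c(1, 4),
  education  = c("BA", NA)
)

df_fixed <- df %>%
  # Corrected values arrive as class.new, classlevel.new, education.new
  left_join(fixes, by = "personid", suffix = c("", ".new")) %>%
  # Take the corrected value if present, otherwise keep the original
  mutate(
    class      = coalesce(class.new, class),
    classlevel = coalesce(classlevel.new, classlevel),
    education  = coalesce(education.new, education)
  ) %>%
  select(-ends_with(".new"))
df_fixed
```

With this layout the number of mutate() lines depends on the number of corrected columns, not on the number of persons.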

Let me know if that helps and, if it doesn't, consider providing more information in your question.
