How do I replace values in column B with other values in column B associated with the same value in-CodePudding

Not sure if this is possible, but I am trying to tell R that if there are two different possible values in column B for a unique value in column A, and one of those column B values is "Unknown", replace "Unknown" with the other column B value associated with the same column A value.

For example, in this simple dataset, I would like 'Unknown' in [2,2] to be replaced with "Red", because there is another instance of group "A" which is color "Red". However, "Unknown" in [4,2] would not be replaced because there is no other color value for group "C".

group <- c("A","A","B","C","B")
color <- c("Red", "Unknown", "Red","Unknown","Pink")
df <- data.frame(group, color)

CodePudding user response：

You can use the following code:

library(tidyverse)

group <- c("A","A","B","C","B")
color <- c("Red", "Unknown", "Red","Unknown","Pink")
df <- data.frame(group, color)

> df
  group   color
1     A     Red
2     A Unknown
3     B     Red
4     C Unknown
5     B    Pink

df %>% group_by(group) %>% mutate(color2 = ifelse(color == "Unknown",
                                                   ifelse(length(color[which(color != "Unknown")])==0,
                                                           "Unknown",
                                                           color[which(color != "Unknown")]),
                                                   color))


# A tibble: 5 x 3
# Groups:   group [3]
  group color   color2 
  <chr> <chr>   <chr>  
1 A     Red     Red    
2 A     Unknown Red    
3 B     Red     Red    
4 C     Unknown Unknown
5 B     Pink    Pink

CodePudding user response：

Here is a tidyverse solution:

left_join(df,
          df %>% filter(color!="Unknown") %>% 
            distinct() %>% 
            group_by(group) %>% filter(n()==1) %>% 
            rename(new_color=color)) %>%
  mutate(color=if_else(color=="Unknown", new_color, color)) %>% 
  select(!new_color)

  group color
1     A   Red
2     A   Red
3     B   Red
4     C  <NA>
5     B  Pink

Here is a data.table solution

library(data.table)
df = merge(setDT(df),unique(df[color!="Unknown"])[, if(.N==1) .(nc=color), by=group],all.x=T) %>% 
  .[, `:=`(color=fifelse(color=="Unknown",nc,color),nc=NULL)]

   group color
1:     A   Red
2:     A   Red
3:     B   Red
4:     B  Pink
5:     C  <NA>

CodePudding user response：

Here is a dplyr solution.

First group_by the group column, since you want to have some operations based on the group. Then use an if_else statement to find values that match "Unknown" AND also have more than one row in group. Replace these values with a random choice of colour present within group, and if the condition is not met, replace them with their original values.

Therefore, in a hypothetical case, when you have a "Unknown" value in B, it'll replace that "Unknown" with a random choice of "Red" or "Pink".

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(color = ifelse(color == "Unknown" & n() > 1,
                         sample(
                           grep("Unknown", 
                                str_split(paste0(color, collapse = ","), ",", simplify = T), 
                                value = T, 
                                invert = T), 1
                         ), 
                         color))

# A tibble: 5 x 2
# Groups:   group [3]
  group color  
  <chr> <chr>  
1 A     Red    
2 A     Red    
3 B     Red    
4 C     Unknown
5 B     Pink

Your dataset

  group   color
1     A     Red
2     A Unknown
3     B     Red
4     C Unknown
5     B    Pink