Not sure if this is possible, but I am trying to tell R that if there are two different possible values in column B for a unique value in column A, and one of those column B values is "Unknown", replace "Unknown" with the other column B value associated with the same column A value.
For example, in this simple dataset, I would like 'Unknown' in [2,2] to be replaced with "Red", because there is another instance of group "A" which is color "Red". However, "Unknown" in [4,2] would not be replaced because there is no other color value for group "C".
group <- c("A","A","B","C","B")
color <- c("Red", "Unknown", "Red","Unknown","Pink")
df <- data.frame(group, color)
CodePudding user response:
You can use the following code:
library(tidyverse)
group <- c("A","A","B","C","B")
color <- c("Red", "Unknown", "Red","Unknown","Pink")
df <- data.frame(group, color)
> df
group color
1 A Red
2 A Unknown
3 B Red
4 C Unknown
5 B Pink
df %>% group_by(group) %>% mutate(color2 = ifelse(color == "Unknown",
ifelse(length(color[which(color != "Unknown")])==0,
"Unknown",
color[which(color != "Unknown")]),
color))
# A tibble: 5 x 3
# Groups: group [3]
group color color2
<chr> <chr> <chr>
1 A Red Red
2 A Unknown Red
3 B Red Red
4 C Unknown Unknown
5 B Pink Pink
CodePudding user response:
Here is a tidyverse solution:
left_join(df,
df %>% filter(color!="Unknown") %>%
distinct() %>%
group_by(group) %>% filter(n()==1) %>%
rename(new_color=color)) %>%
mutate(color=if_else(color=="Unknown", new_color, color)) %>%
select(!new_color)
group color
1 A Red
2 A Red
3 B Red
4 C <NA>
5 B Pink
Here is a data.table solution
library(data.table)
df = merge(setDT(df),unique(df[color!="Unknown"])[, if(.N==1) .(nc=color), by=group],all.x=T) %>%
.[, `:=`(color=fifelse(color=="Unknown",nc,color),nc=NULL)]
group color
1: A Red
2: A Red
3: B Red
4: B Pink
5: C <NA>
CodePudding user response:
Here is a dplyr
solution.
First group_by
the group
column, since you want to have some operations based on the group
. Then use an if_else
statement to find values that match "Unknown" AND also have more than one row in group
. Replace these values with a random choice of colour present within group
, and if the condition is not met, replace them with their original values.
Therefore, in a hypothetical case, when you have a "Unknown" value in B, it'll replace that "Unknown" with a random choice of "Red" or "Pink".
library(dplyr)
df %>%
group_by(group) %>%
mutate(color = ifelse(color == "Unknown" & n() > 1,
sample(
grep("Unknown",
str_split(paste0(color, collapse = ","), ",", simplify = T),
value = T,
invert = T), 1
),
color))
# A tibble: 5 x 2
# Groups: group [3]
group color
<chr> <chr>
1 A Red
2 A Red
3 B Red
4 C Unknown
5 B Pink
Your dataset
group color
1 A Red
2 A Unknown
3 B Red
4 C Unknown
5 B Pink