I have a dataset like this. I want to recognize all of the observations that have multiple values in the "color" columns and replace them with "multicolor"
ID color1 color2
23 red NA
44 blue purple
51 yellow NA
59 green orange
like this:
ID color
23 red
44 multicolor
51 yellow
59 multicolor
Any ideas would be much appreciated, thank you!
CodePudding user response:
Here is a way to do it in the tidyverse.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with("color"), values_to = "color", values_drop_na = TRUE) %>%
group_by(ID) %>%
summarize(n = n(),
color = toString(color), .groups = "drop") %>%
mutate(color = if_else(n > 1, "multicolor", color)) %>%
select(-n)
# # A tibble: 4 x 2
# ID color
# <int> <chr>
# 1 23 red
# 2 44 multicolor
# 3 51 yellow
# 4 59 multicolor
I did it this way on purpose. Note that if you stopped after the summarize()
line, you would get the actual colors.
# # A tibble: 4 x 3
# ID n color
# <int> <int> <chr>
# 1 23 1 red
# 2 44 2 blue, purple
# 3 51 1 yellow
# 4 59 2 green, orange
This would scale if you had many color columns, not just 2. Play around with it, there are a lot of ways to tweaks things like this.
Data
df <- read.table(textConnection("ID color1 color2
23 red NA
44 blue purple
51 yellow NA
59 green orange"), header = TRUE)
CodePudding user response:
You can do this, assuming data
is your dataset.
library(dplyr)
data <- data.frame(ID = c(23, 44, 51, 59),
color1 = c("red", "blue", "yellow", "green"),
color2 = c(NA, "purple", NA, "orange"))
data %>%
mutate(color = ifelse(is.na(color2), color1, "multicolor")) %>%
select(ID, color)
CodePudding user response:
Here's what seems like a straightforward solution:
library(dplyr)
library(stringr)
data %>%
mutate(
# step 1 - paste `color1` and `color2` together and remove " NA":
color = gsub("\\sNA", "", paste(color1, color2)),
# step 2 - count the number of white space characters:
color = str_count(color, " "),
# step 3 - label `color` as "multicolor" where `color` != 0:
color = ifelse(color == 0, color1, "multicolor")) %>%
# remove the obsolete color columns:
select(-matches("\\d$"))
ID color
1 23 red
2 44 multicolor
3 51 yellow
4 59 multicolor
Data:
data <- data.frame(ID = c(23, 44, 51, 59),
color1 = c("red", "blue", "yellow", "green"),
color2 = c(NA, "purple", NA, "orange"))
CodePudding user response:
A base R approach
# get colors from columns named color*
colo <- paste(names(table(unlist(df1[,grep("color",colnames(df1))]))), collapse="|")
colo
[1] "blue|green|red|yellow|orange|purple"
# match the colors and do the conversion
data.frame(
ID=df1$ID,
color=apply( df1, 1, function(x){
y=x[grep(colo, x)];
if(length(y)>1){y="multicolor"}; y } ) )
ID color
1 23 red
2 44 multicolor
3 51 yellow
4 59 multicolor
Data
df1 <- structure(list(ID = c(23L, 44L, 51L, 59L), color1 = c("red",
"blue", "yellow", "green"), color2 = c(NA, "purple", NA, "orange"
)), class = "data.frame", row.names = c(NA, -4L))