I have the following data frame:
fruit <- c("apple", "orange", "peach", "")
color <- c("red", "orange", "", "purple")
taste <- c("sweet", "", "sweet", "neutral")
df <- data.frame(fruit, color, taste)
I want to add all the columns together into one column named "combined":
combined <- c("apple red sweet", "orange orange", "peach sweet", "purple neutral")
The result I'm after is the following data frame:
df2 <- data.frame(fruit, color, taste, combined)
I took a stab at using unite() from tidyr, which leaves stray spaces wherever a value is blank:
df %>%
  unite("combined",
        fruit,
        color,
        taste,
        sep = " ",
        remove = FALSE)
I've been trying to remove " " when it is in the beginning or in the end or if there's a blank preceding it using the following regex, but it feels sloppy and doesn't seem to achieve exactly what I want:
df %>%
as_tibble() %>%
mutate(across(any_of(combined), gsub, pattern = "^\\ |\\ \\ \\ \\ |\\ \\ \\ |\\ \\ |\\ $", replacement = "")) %>%
mutate_if(is.character, trimws)
Any guidance would be appreciated! Thanks!
CodePudding user response:
We may replace the blanks ("") with NA and then use na.rm = TRUE in unite:
library(dplyr)
library(tidyr)
df %>%
  mutate(across(everything(), ~ na_if(.x, ""))) %>%
  unite(combined, everything(), sep = " ", na.rm = TRUE,
        remove = FALSE)
-output
         combined  fruit  color   taste
1 apple red sweet  apple    red   sweet
2   orange orange orange orange    <NA>
3     peach sweet  peach   <NA>   sweet
4  purple neutral   <NA> purple neutral
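Note that in the output above the original columns now hold NA where they used to be blank, and combined comes first. If the goal is to match the df2 layout from the question exactly, one possible follow-up is sketched below. It assumes the df from the question (rebuilt here so the snippet runs on its own); the replace_na() and relocate() steps and the name result are my additions for illustration, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE keeps the columns
# as character vectors on older R versions too.
df <- data.frame(
  fruit = c("apple", "orange", "peach", ""),
  color = c("red", "orange", "", "purple"),
  taste = c("sweet", "", "sweet", "neutral"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # blanks become NA so that unite() can drop them
  mutate(across(everything(), ~ na_if(.x, ""))) %>%
  unite(combined, everything(), sep = " ", na.rm = TRUE, remove = FALSE) %>%
  # turn the NAs back into blanks in every column
  mutate(across(everything(), ~ replace_na(.x, ""))) %>%
  # move combined to the end, as in df2
  relocate(combined, .after = last_col())

print(result)
```

Whether restoring the blanks is worth the extra steps depends on how the columns are used afterwards; keeping NA is often the more convenient representation of a missing value.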
CodePudding user response:
Create a function which takes two strings and pastes them together, inserting a space only when both are non-empty, and apply it across the columns using Reduce.
library(dplyr)
Paste <- function(x, y) paste0(x, ifelse(x == "" | y == "", "", " "), y)
df %>% mutate(combined = Reduce(Paste, .))
giving
   fruit  color   taste        combined
1  apple    red   sweet apple red sweet
2 orange orange           orange orange
3  peach          sweet     peach sweet
4        purple neutral  purple neutral
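To see what Reduce is doing here, it may help to unroll the fold by hand. The sketch below uses only base R and the same Paste function; step1 and step2 are hypothetical names introduced for illustration, and the data frame is rebuilt so the snippet is self-contained.

```r
df <- data.frame(
  fruit = c("apple", "orange", "peach", ""),
  color = c("red", "orange", "", "purple"),
  taste = c("sweet", "", "sweet", "neutral"),
  stringsAsFactors = FALSE
)

# Join two character vectors element-wise; the separator is a space only
# when both elements are non-empty, otherwise it is the empty string.
Paste <- function(x, y) paste0(x, ifelse(x == "" | y == "", "", " "), y)

# A data frame is a list of columns, so Reduce(Paste, df) amounts to
# Paste(Paste(fruit, color), taste):
step1 <- Paste(df$fruit, df$color)  # "apple red" "orange orange" "peach" "purple"
step2 <- Paste(step1, df$taste)     # the final combined column

print(step2)
print(identical(step2, Reduce(Paste, df)))
```

Because the separator is decided pairwise, the same function should work unchanged for any number of columns.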