I have a large dataframe like below where there is an unknown number of color columns. The number of color columns is unknown because the columns are generated by transposition of a single column called color.
Painting | Color1 | Color2 | Color3 | Color... | ... |
---|---|---|---|---|---|
1 | Blue | Yellow | ... | ||
2 | Green | Blue | Orange | Yellow | ... |
3 | Yellow | Orange | Orange | ... | |
. | . | . | . | . | ... |
. | . | . | . | . | ... |
I would like to concatenate the colors into a single column like so:
Painting | Color1 | Color2 | Color3 | Color... | ... | All_colors |
---|---|---|---|---|---|---|
1 | Blue | Yellow | ... | Colors(Blue, Yellow) | ||
2 | Green | Blue | Orange | Yellow | ... | Colors(Green, Blue, Orange, Yellow . |
3 | Yellow | Orange | Green | ... | Colors(Yellow, Orange, Green) | |
. | . | . | . | . | ... | ...... |
. | . | . | . | . | ... |
I would like my code to be dynamic, so it automatically counts the number of color columns (can be any number greater than 1) and concatenates across all color columns for each row to create the value of All_colors.
I can do this relatively easily by looping over each column in each row and building the value of All_colors iteratively. However, the resulting code is unwieldly and feels unnecessarily complicated. Is there simple code that I can use to combine the values of all the columns whose name contains "Color" and which will iterate row by row through the dataset (something in dplyr perhaps?).
The example above is generalised. Please let me know if you need to see my approach (which like I've said was successful but I don't like it because it is too complex) and I will generalise it.
CodePudding user response:
With dplyr
we could use rowwise()
and c_across()
with toString
:
library(dplyr)
df |>
rowwise() |>
mutate(all_colours = toString(na.omit(c_across(starts_with("color"))))) |>
ungroup()
Output:
# A tibble: 3 × 5
painting color1 color2 color3 all_colours
<dbl> <chr> <chr> <chr> <chr>
1 1 red blue green red, blue, green
2 2 blue NA NA blue
3 2 green red NA green, red
Data
library(readr)
df <- read_table("painting color1 color2 color3
1 red blue green
2 blue
2 green red")
CodePudding user response:
Just to chip in another tidyverse
solution, using the pivot_*
family:
library(dplyr)
library(tidyr)
df <- structure(
list(painting = 1:3,
color1 = c("red", "blue", "green"),
color2 = c("blue", NA, "red"),
color3 = c("green", NA, NA)),
row.names = c(NA, -3L),
class = "data.frame")
df %>%
pivot_longer(-painting) %>%
group_by(painting) %>%
mutate(all_colors = paste(na.omit(value), collapse = ", ")) %>%
pivot_wider()
# # A tibble: 3 × 5
# # Groups: painting [3]
# painting all_colors color1 color2 color3
# <int> <chr> <chr> <chr> <chr>
# 1 1 red, blue, green red blue green
# 2 2 blue blue <NA> <NA>
# 3 3 green, red green red <NA>