Is there a dynamic way of concatenating values over an unknown number (i.e., a dynamic number) of co-CodePudding

I have a large dataframe like below where there is an unknown number of color columns. The number of color columns is unknown because the columns are generated by transposition of a single column called color.

Painting	Color1	Color2	Color3	Color...	...
1	Blue	Yellow			...
2	Green	Blue	Orange	Yellow	...
3	Yellow	Orange	Orange		...
.	.	.	.	.	...
.	.	.	.	.	...

I would like to concatenate the colors into a single column like so:

Painting	Color1	Color2	Color3	Color...	...	All_colors
1	Blue	Yellow			...	Colors(Blue, Yellow)
2	Green	Blue	Orange	Yellow	...	Colors(Green, Blue, Orange, Yellow .
3	Yellow	Orange	Green		...	Colors(Yellow, Orange, Green)
.	.	.	.	.	...	......
.	.	.	.	.	...

I would like my code to be dynamic, so it automatically counts the number of color columns (can be any number greater than 1) and concatenates across all color columns for each row to create the value of All_colors.

I can do this relatively easily by looping over each column in each row and building the value of All_colors iteratively. However, the resulting code is unwieldly and feels unnecessarily complicated. Is there simple code that I can use to combine the values of all the columns whose name contains "Color" and which will iterate row by row through the dataset (something in dplyr perhaps?).

The example above is generalised. Please let me know if you need to see my approach (which like I've said was successful but I don't like it because it is too complex) and I will generalise it.

CodePudding user response：

With dplyr we could use rowwise() and c_across() with toString:

library(dplyr) 

df |> 
  rowwise() |> 
  mutate(all_colours = toString(na.omit(c_across(starts_with("color"))))) |>
  ungroup()

Output:

# A tibble: 3 × 5
  painting color1 color2 color3 all_colours     
     <dbl> <chr>  <chr>  <chr>  <chr>           
1        1 red    blue   green  red, blue, green
2        2 blue   NA     NA     blue            
3        2 green  red    NA     green, red

Data

library(readr)

df <- read_table("painting  color1 color2 color3
1 red blue green
2   blue
2   green red")

CodePudding user response：

Just to chip in another tidyverse solution, using the pivot_* family:

library(dplyr)
library(tidyr)
df <- structure(
         list(painting = 1:3, 
              color1 = c("red", "blue", "green"), 
              color2 = c("blue", NA, "red"), 
              color3 = c("green", NA, NA)), 
          row.names = c(NA, -3L), 
          class = "data.frame")

df %>% 
   pivot_longer(-painting) %>% 
   group_by(painting) %>% 
   mutate(all_colors = paste(na.omit(value), collapse = ", ")) %>% 
   pivot_wider()

# # A tibble: 3 × 5
# # Groups:   painting [3]
#   painting all_colors       color1 color2 color3
#      <int> <chr>            <chr>  <chr>  <chr> 
# 1        1 red, blue, green red    blue   green 
# 2        2 blue             blue   <NA>   <NA>  
# 3        3 green, red       green  red    <NA>