Home > Net >  Is there a dynamic way of concatenating values over an unknown number (i.e., a dynamic number) of co
Is there a dynamic way of concatenating values over an unknown number (i.e., a dynamic number) of co

Time:12-06

I have a large dataframe like below where there is an unknown number of color columns. The number of color columns is unknown because the columns are generated by transposition of a single column called color.

Painting Color1 Color2 Color3 Color... ...
1 Blue Yellow ...
2 Green Blue Orange Yellow ...
3 Yellow Orange Orange ...
. . . . . ...
. . . . . ...

I would like to concatenate the colors into a single column like so:

Painting Color1 Color2 Color3 Color... ... All_colors
1 Blue Yellow ... Colors(Blue, Yellow)
2 Green Blue Orange Yellow ... Colors(Green, Blue, Orange, Yellow .
3 Yellow Orange Green ... Colors(Yellow, Orange, Green)
. . . . . ... ......
. . . . . ...

I would like my code to be dynamic, so it automatically counts the number of color columns (can be any number greater than 1) and concatenates across all color columns for each row to create the value of All_colors.

I can do this relatively easily by looping over each column in each row and building the value of All_colors iteratively. However, the resulting code is unwieldly and feels unnecessarily complicated. Is there simple code that I can use to combine the values of all the columns whose name contains "Color" and which will iterate row by row through the dataset (something in dplyr perhaps?).

The example above is generalised. Please let me know if you need to see my approach (which like I've said was successful but I don't like it because it is too complex) and I will generalise it.

CodePudding user response:

With dplyr we could use rowwise() and c_across() with toString:

library(dplyr) 

df |> 
  rowwise() |> 
  mutate(all_colours = toString(na.omit(c_across(starts_with("color"))))) |>
  ungroup()

Output:

# A tibble: 3 × 5
  painting color1 color2 color3 all_colours     
     <dbl> <chr>  <chr>  <chr>  <chr>           
1        1 red    blue   green  red, blue, green
2        2 blue   NA     NA     blue            
3        2 green  red    NA     green, red      

Data

library(readr)

df <- read_table("painting  color1 color2 color3
1 red blue green
2   blue
2   green red")

CodePudding user response:

Just to chip in another tidyverse solution, using the pivot_* family:

library(dplyr)
library(tidyr)
df <- structure(
         list(painting = 1:3, 
              color1 = c("red", "blue", "green"), 
              color2 = c("blue", NA, "red"), 
              color3 = c("green", NA, NA)), 
          row.names = c(NA, -3L), 
          class = "data.frame")

df %>% 
   pivot_longer(-painting) %>% 
   group_by(painting) %>% 
   mutate(all_colors = paste(na.omit(value), collapse = ", ")) %>% 
   pivot_wider()

# # A tibble: 3 × 5
# # Groups:   painting [3]
#   painting all_colors       color1 color2 color3
#      <int> <chr>            <chr>  <chr>  <chr> 
# 1        1 red, blue, green red    blue   green 
# 2        2 blue             blue   <NA>   <NA>  
# 3        3 green, red       green  red    <NA>  
  • Related