How to combine values with the same column name into a new dataframe in R?-CodePudding

I have the following dataset

Original dataset:

ID    Col1    Col1    Col1    Col2    Col2    Col2
A     Dog                     House
B     Dog                             Car     Bike
C             Cat             House
D                     Mouse                   Bike

Is there any way to create a new dataframe that combines all values with the same column name like below

Expected dataset:

ID    Col1    Col2    
A     Dog     House
B     Dog     Car, Bike
C     Cat     House
D     Mouse   Bike

CodePudding user response：

You could do something like this:

df <- structure(list(
  ID = c("A", "B", "C", "D"),
  Col1 = c("Dog", "Dog", NA, NA),
  Col1 = c(NA, NA, "Cat", NA),
  Col1 = c(NA, NA, NA, "Mouse"),
  Col2 = c("House", NA, "House", NA),
  Col2 = c(NA, "Car", NA, NA),
  Col2 = c(NA, "Bike", NA, "Bike")
),
class = c("data.frame"), row.names = c(NA, -4L)
)

library(tidyr)

df %>% 
  as_tibble(.name_repair = "unique") %>% 
  unite(col = "Col1", starts_with("Col1"), sep = ", ", na.rm = TRUE) %>% 
  unite(col = "Col2", starts_with("Col2"), sep = ", ", na.rm = TRUE)

#> New names:
#> • `Col1` -> `Col1...2`
#> • `Col1` -> `Col1...3`
#> • `Col1` -> `Col1...4`
#> • `Col2` -> `Col2...5`
#> • `Col2` -> `Col2...6`
#> • `Col2` -> `Col2...7`

#> # A tibble: 4 × 3
#>   ID    Col1  Col2     
#>   <chr> <chr> <chr>    
#> 1 A     Dog   House    
#> 2 B     Dog   Car, Bike
#> 3 C     Cat   House    
#> 4 D     Mouse Bike

^{Created on 2022-06-01 by the reprex package (v2.0.1)}

CodePudding user response：

Base R solution:

# Input data: df => data.frame
df <- structure(list(
  ID = c("A", "B", "C", "D"),
  Col1 = c("Dog", "Dog", NA, NA),
  Col1 = c(NA, NA, "Cat", NA),
  Col1 = c(NA, NA, NA, "Mouse"),
  Col2 = c("House", NA, "House", NA),
  Col2 = c(NA, "Car", NA, NA),
  Col2 = c(NA, "Bike", NA, "Bike")
),
  class = c("data.frame"), row.names = c(NA, -4L)
)


# Split-Apply-Combine: res => data.frame
res <- data.frame(
  do.call(
    cbind, 
      lapply(
        split.default(
          df,
          names(df)
        ),
        function(x){
          apply(
            x, 
            1, 
            FUN = function(y){
              toString(
                na.omit(y)
              )
            }
          )
        }
      )
  )[,unique(names(df))],
  stringsAsFactors = FALSE,
  row.names = row.names(df)
)

# output Result: data.frame => stdout(console)
res