I have the following dataset
Original dataset:
ID Col1 Col1 Col1 Col2 Col2 Col2
A Dog House
B Dog Car Bike
C Cat House
D Mouse Bike
Is there any way to create a new dataframe that combines all values with the same column name like below
Expected dataset:
ID Col1 Col2
A Dog House
B Dog Car, Bike
C Cat House
D Mouse Bike
CodePudding user response:
You could do something like this:
df <- structure(list(
ID = c("A", "B", "C", "D"),
Col1 = c("Dog", "Dog", NA, NA),
Col1 = c(NA, NA, "Cat", NA),
Col1 = c(NA, NA, NA, "Mouse"),
Col2 = c("House", NA, "House", NA),
Col2 = c(NA, "Car", NA, NA),
Col2 = c(NA, "Bike", NA, "Bike")
),
class = c("data.frame"), row.names = c(NA, -4L)
)
library(tidyr)
df %>%
as_tibble(.name_repair = "unique") %>%
unite(col = "Col1", starts_with("Col1"), sep = ", ", na.rm = TRUE) %>%
unite(col = "Col2", starts_with("Col2"), sep = ", ", na.rm = TRUE)
#> New names:
#> • `Col1` -> `Col1...2`
#> • `Col1` -> `Col1...3`
#> • `Col1` -> `Col1...4`
#> • `Col2` -> `Col2...5`
#> • `Col2` -> `Col2...6`
#> • `Col2` -> `Col2...7`
#> # A tibble: 4 × 3
#> ID Col1 Col2
#> <chr> <chr> <chr>
#> 1 A Dog House
#> 2 B Dog Car, Bike
#> 3 C Cat House
#> 4 D Mouse Bike
Created on 2022-06-01 by the reprex package (v2.0.1)
CodePudding user response:
Base R solution:
# Input data: df => data.frame
df <- structure(list(
ID = c("A", "B", "C", "D"),
Col1 = c("Dog", "Dog", NA, NA),
Col1 = c(NA, NA, "Cat", NA),
Col1 = c(NA, NA, NA, "Mouse"),
Col2 = c("House", NA, "House", NA),
Col2 = c(NA, "Car", NA, NA),
Col2 = c(NA, "Bike", NA, "Bike")
),
class = c("data.frame"), row.names = c(NA, -4L)
)
# Split-Apply-Combine: res => data.frame
res <- data.frame(
do.call(
cbind,
lapply(
split.default(
df,
names(df)
),
function(x){
apply(
x,
1,
FUN = function(y){
toString(
na.omit(y)
)
}
)
}
)
)[,unique(names(df))],
stringsAsFactors = FALSE,
row.names = row.names(df)
)
# output Result: data.frame => stdout(console)
res