I have a data frame such as:
group <- c("A", "A", "B", "C", "C")
tx <- c("A-201", "A-202", "B-201", "C-205", "C-206")
feature <- c("coding", "decay", "pending", "coding", "coding")
df <- data.frame(group, tx, feature)
I want to generate a new data frame with one row per group, in which the tx entries are listed under their feature (several entries for the same group and feature joined by "|"). The output should look like:
group <- c("A", "B", "C")
coding <- c("A-201", NA, "C-205|C-206")
decay <- c("A-202", NA, NA)
pending <- c(NA, "B-201", NA)
df.out <- data.frame(group, coding, decay, pending)
So far I have not found a way to do this with a dplyr function. Do I have to loop through my initial df?
Answer:
You can get the data into wide format with tidyr::pivot_wider and pass a function to values_fn that collapses several tx values into one string:
df.out <- tidyr::pivot_wider(df, names_from = feature, values_from = tx,
values_fn = function(x) paste0(x, collapse = '|'))
df.out
# group coding decay pending
# <chr> <chr> <chr> <chr>
#1 A A-201 A-202 NA
#2 B NA NA B-201
#3 C C-205|C-206 NA NA
Answer:
Here is an alternative way:
library(dplyr)
library(tidyr)
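df %>%
  group_by(group, feature) %>%
  # paste all tx values of each group/feature pair together (repeated on every row of the pair)
  mutate(tx = paste(tx, collapse = "|")) %>%
  # drop the now-duplicated rows, leaving one row per group/feature pair
  distinct() %>%
  pivot_wider(
    names_from = feature,
    values_from = tx
  )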
#   group coding      decay pending
#   <chr> <chr>       <chr> <chr>
# 1 A     A-201       A-202 NA
# 2 B     NA          NA    B-201
# 3 C     C-205|C-206 NA    NA
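A slightly shorter variant of the same idea (a sketch, not part of the original answer) replaces mutate() + distinct() with summarise(), which already returns exactly one row per group/feature pair. It assumes dplyr >= 1.0 for the .groups argument; df.wide is only an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group   = c("A", "A", "B", "C", "C"),
  tx      = c("A-201", "A-202", "B-201", "C-205", "C-206"),
  feature = c("coding", "decay", "pending", "coding", "coding")
)

df.wide <- df %>%
  group_by(group, feature) %>%
  # one row per group/feature pair, with its tx values joined by "|"
  summarise(tx = paste(tx, collapse = "|"), .groups = "drop") %>%
  # spread the features into columns; pairs with no rows become NA
  pivot_wider(names_from = feature, values_from = tx)
df.wide
```

Because summarise() drops the grouping here, the result is a plain (ungrouped) tibble.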
Answer:
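Using dcast from data.table:
library(data.table)
# setDT() converts df to a data.table by reference, so df itself is modified
dcast(setDT(df), group ~ feature, value.var = 'tx',
      fun.aggregate = function(x) paste(x, collapse = "|"),
      # without fill = NA, empty cells would get fun.aggregate applied to a
      # zero-length vector, i.e. ""
      fill = NA)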
#    group      coding  decay pending
# 1:     A       A-201  A-202    <NA>
# 2:     B        <NA>   <NA>   B-201
# 3:     C C-205|C-206   <NA>    <NA>
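For comparison, a minimal base-R sketch (not from the original answers) that needs no extra packages and no explicit loop: tapply() applies paste() to every group/feature combination that has rows, and combinations without rows are left as NA through tapply's default argument. It assumes R >= 4.0, where data.frame() keeps strings as character; m and df.wide are illustrative names.

```r
df <- data.frame(
  group   = c("A", "A", "B", "C", "C"),
  tx      = c("A-201", "A-202", "B-201", "C-205", "C-206"),
  feature = c("coding", "decay", "pending", "coding", "coding")
)

# group x feature character matrix; each cell holds the tx values of that
# pair joined by "|" (empty combinations stay NA via tapply's default = NA)
m <- tapply(df$tx, list(df$group, df$feature), paste, collapse = "|")

# turn the matrix into a data frame with the groups as an ordinary column
df.wide <- data.frame(group = rownames(m), m, row.names = NULL)
df.wide
```

The feature columns come out in the sorted order of the feature values (coding, decay, pending), which here matches the desired df.out.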