I have a following dataframe:
df1 <- structure(list(group = c("KO", "WT", "KO", "KO", "KO", "KO",
"WT", "KO", "KO", "WT", "WT", "WT", "WT", "WT", "WT", "WT", "WT",
"WT", "WT", "KO", "KO"), name = c("rike", "rabe", "smake", "rike",
"rike", "rike", "rabe", "rike", "rike", "due", "rabe", "ene",
"ene", "due", "ene", "rabe", "due", "rabe", "due", "smake", "kum"
), type = c("C", "A", "A", "A", "C", "B", "A", "B", "B", "A",
"B", "A", "C", "C", "C", "C", "B", "C", "A", "C", "A"), posit = c(10,
2, 21, 5, 12, 22, 18, 19, 81, 22, 33, 31, 80, 40, 16, 16, 7,
9, 26, 27, 7)), row.names = c(NA, -21L), class = "data.frame")
I would like to combine 2 columns, one character ("type") and one numeric ("posit") in that manner, that all of categories (letters) would be joined with corresponding posits (numbers), for instance "A" and "37" as "A37" and all of the type-posit pairs for given "name" would be pasted in new column in ascending order (from lesser to bigger values). Also I'd like them to be separated with ":". The desired output is given below:
df2 <-structure(list(group = c("WT", "WT", "WT", "KO", "KO", "KO"),
name = c("ene", "due", "rabe", "kum", "rike", "smake"), type_posit = c("C16:A31:C80",
"B7:A22:A26:C40", "A2:C9:C16:A18:B33", "A7", "A5:C10:C12:B19:B22:B81",
"A21:C27")), class = "data.frame", row.names = c(NA, -6L))
I can achieve this by using set of dplyr functions, and creating intermediate dataframes, like this:
df2 <- df1 %>%
dplyr::mutate(t_p = paste0(type,posit)) %>%
dplyr::arrange(name,posit) %>%
dplyr::select(-type, -posit) %>%
dplyr::group_by(group, name) %>%
dplyr::summarise(tag_pos =paste0(t_p, collapse = ":"))
However I wonder, whether there is more efficient and/or cleaner way to do so? I would like to write a clean, understandable code.
CodePudding user response:
Using aggregate
basically.
aggregate(paste0(type, posit) ~ group name, df1, paste, collapse=':')
# group name paste0(type, posit)
# 1 WT due A22:C40:B7:A26
# 2 WT ene A31:C80:C16
# 3 KO kum A7
# 4 WT rabe A2:A18:B33:C16:C9
# 5 KO rike C10:A5:C12:B22:B19:B81
# 6 KO smake A21:C27
Sorted version:
aggregate(paste0(type, posit) ~ group name, df1, \(x) paste(sort(x), collapse=':'))
# group name paste0(type, posit)
# 1 WT due A22:A26:B7:C40
# 2 WT ene A31:C16:C80
# 3 KO kum A7
# 4 WT rabe A18:A2:B33:C16:C9
# 5 KO rike A5:B19:B22:B81:C10:C12
# 6 KO smake A21:C27
CodePudding user response:
You need to sort the rows by posit
and then summarise each group with paste0(..., collapse = ':')
.
library(dplyr)
df1 %>%
group_by(group, name) %>%
arrange(posit, .by_group = TRUE) %>%
summarise(type_posit = paste0(type, posit, collapse = ':'), .groups = 'drop')
# # A tibble: 6 × 3
# group name type_posit
# <chr> <chr> <chr>
# 1 KO kum A7
# 2 KO rike A5:C10:C12:B19:B22:B81
# 3 KO smake A21:C27
# 4 WT due B7:A22:A26:C40
# 5 WT ene C16:A31:C80
# 6 WT rabe A2:C9:C16:A18:B33
CodePudding user response:
Using data.table
library(data.table)
setDT(df1)[order(posit), .(type_posit = paste(type, sep = "",
posit, collapse = ":")),.(group, name)]
-output
group name type_posit
<char> <char> <char>
1: WT rabe A2:C9:C16:A18:B33
2: KO rike A5:C10:C12:B19:B22:B81
3: WT due B7:A22:A26:C40
4: KO kum A7
5: WT ene C16:A31:C80
6: KO smake A21:C27