I would like to merge several cells in a dataframe, based on a value in another column. However, I would also like to keep the data and not lose any values when merging, please see below.
I've tried to use the aggregate function but I don't think it works as my desired column is in the format character, rather than numeric.
Current dataframe format:
id food
1 salt
1 chicken
1 pasta
2 rice
2 broccoli
3 cream
3 pasta
3 garlic
3 lemon
... ...
Desired dataframe format:
id food
1 salt chicken pasta
2 rice broccoli
3 cream pasta garlic lemon
... ...
I was wondering if it was even possible to merge cells that have character/string properties, and if so how would I go about doing this?
Thank you very much
Data
dat <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), food = c("salt",
"chicken", "pasta", "rice", "broccoli", "cream", "pasta", "garlic",
"lemon")), class = "data.frame", row.names = c(NA, -9L))
CodePudding user response:
You can do this with dplyr
:
library(dplyr)
dat |>
group_by(id) |>
summarise(food = paste(food, collapse = " "))
# A tibble: 3 x 2
# id food
# <int> <chr>
# 1 1 salt chicken pasta
# 2 2 rice broccoli
# 3 3 cream pasta garlic lemon
Alternatively here is a data.table
approach:
library(data.table)
setDT(dat)
dat[, .(food = paste(food, collapse = " ")), by = id]
# id food
# 1: 1 salt chicken pasta
# 2: 2 rice broccoli
# 3: 3 cream pasta garlic lemon
And for the sake of completeness a base R approach:
aggregate(
list(food = dat$food),
by = list(id = dat$id),
paste
)
# id food
# 1 1 salt, chicken, pasta
# 2 2 rice, broccoli
# 3 3 cream, pasta, garlic, lemon
CodePudding user response:
The code from SamR is excellent,
no matter what, do not use mine,
Conc<-NULL
for (i in 1:max(unique(dat[,1]))){
Conc[i]<- paste(subset.data.frame(dat, dat[,1]==i)[,2], sep = "", collapse = " ")
}
df <- as.data.frame(matrix(c(1:max(unique(dat[,1])),Conc), nrow= max(unique(dat[,1])),ncol = 2))
colnames(df)<-c("ID", "Food")