I have a data frame with three columns: title, text and grp.
The data frame looks like this:
title | text | grp |
---|---|---|
Week 2 | In Week 2 the encoding... | 2 |
comments | collection of comments about Week 2 | 2 |
Statistics | Statistics is the discipline... | 3 |
comments | collection of comments about Statistics | 3 |
I want to add comments as a column to this data frame, where each value is the comment text that corresponds to the title in that row.
Desired data frame:
title | text | comments | grp |
---|---|---|---|
Week 2 | In Week 2 the encoding... | collection of comments about Week 2 | 2 |
Statistics | Statistics is the discipline... | collection of comments about Statistics | 3 |
I tried re-casting:
library(reshape2)
recast(df, text ~ comment, id.var = c("text"))
But it gives me an error:
Error in unique.default(x) : unique() applies only to vectors
CodePudding user response:
Try this:
library(dplyr, warn.conflicts = FALSE)
# Comment rows only, with text renamed to comments
cdf <- df %>% filter(title == "comments") %>% select(comments = text, grp)
# Remaining (non-comment) rows
nocdf <- df %>% filter(title != "comments") %>% select(title, text, grp)
# Attach each comment to its title via grp, then move grp to the end
new_df <- right_join(nocdf, cdf, by = "grp")
new_df %>% relocate(grp, .after = last_col())
#> title text
#> 1 Week.2 In.Week.2.the.encoding...
#> 2 Statistics Statistics.is.the.discipline...
#> comments grp
#> 1 collection.of.comments.about.Week.2 2
#> 2 collection.of.comments.about.Statistics 3
Created on 2022-06-05 by the reprex package (v2.0.1)
CodePudding user response:
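You could split the data frame into its odd and even rows and cbind them. This relies on every title row being immediately followed by its comments row.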
split(dat, 1:2) |> {\(.) cbind(.$`1`, comments=.$`2`[, 2])}()
# title text grp comments
# 1 Week 2 In Week 2 the encoding... 2 collection of comments about Week 2
# 3 Statistics Statistics is the discipline... 3 collection of comments about Statistics
Data:
dat <- structure(list(title = c("Week 2", "comments", "Statistics",
"comments"), text = c("In Week 2 the encoding...", "collection of comments about Week 2",
"Statistics is the discipline...", "collection of comments about Statistics"
), grp = c(2L, 2L, 3L, 3L)), class = "data.frame", row.names = c(NA,
-4L))
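If the rows are not guaranteed to alternate strictly, matching on grp instead of row position may be safer. Below is a minimal base R sketch of that idea, assuming each grp has exactly one non-comment row and one row titled exactly "comments". The names is_comment, titles, comments and result are illustrative and not taken from the answers above.

```r
# Same example data as above, repeated so the block runs on its own
dat <- data.frame(
  title = c("Week 2", "comments", "Statistics", "comments"),
  text = c("In Week 2 the encoding...", "collection of comments about Week 2",
           "Statistics is the discipline...", "collection of comments about Statistics"),
  grp = c(2L, 2L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Flag the comment rows (assumption: they are labelled exactly "comments")
is_comment <- dat$title == "comments"

# Title rows keep all columns; comment rows keep only grp and the renamed text
titles <- dat[!is_comment, ]
comments <- data.frame(
  grp = dat$grp[is_comment],
  comments = dat$text[is_comment],
  stringsAsFactors = FALSE
)

# Match on grp rather than on row position, then put the columns in the desired order
result <- merge(titles, comments, by = "grp")[, c("title", "text", "comments", "grp")]
result
```

Because merge() matches on the key, this should give the same pairing even if the rows of dat are shuffled, as long as the one-title-one-comment assumption holds.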