I have two columns title
and text
. I want to group the title by the number of texts it has received. Also, I want to uniquely group the title having the same name.
For e.g.
I have
title | text
-------------
A | I like...
B | I wish...
C | review1
C | review2
C | review3
D | Detecting...
C | review1
C | review2
E | New...
What I would like to have is:
title | text
-------------
A | I like...
B | I wish...
C | review1 review2 review3
D | Detecting...
C | review1 review2
E | New...
What I have tried is:
df %>%
filter(title %in% sample(unique(title))) %>%
group_by(title) %>%
select(title, text)
But it still does not give what I expect to have.
CodePudding user response:
Here's a dplyr
approach. The key is to set group_by
correctly so that it defines group based on row position AND values in the title
column.
library(dplyr)
df %>%
group_by(gp = c(0, na.omit(cumsum(lead(title) != title)))) %>%
summarize(title = unique(title), text = paste0(text, collapse = " ")) %>%
select(-gp)
# A tibble: 6 × 2
title text
<chr> <chr>
1 A I like...
2 B I wish...
3 C review1 review2 review3
4 D Detecting...
5 C review1 review2
6 E New...
CodePudding user response:
Another base R way
tmp=rle(df$title)
df$grp=rep(1:length(tmp$lengths),tmp$lengths)
aggregate(text~title grp,data=df,FUN=paste0,collapse=" ")
title grp text
1 A 1 I like...
2 B 2 I wish...
3 C 3 review1 review2 review3
4 D 4 Detecting...
5 C 5 review1 review2
6 E 6 New...
CodePudding user response:
I guess you can try the following base R option using aggregate
aggregate(text ~. unique(df), toString)
CodePudding user response:
I'm not using dplyr, but base R could handle it:
do.line = function(a.title){
return(c(a.title, paste(text[df$title == a.title], collapse = ' ' )))
}
t(sapply(unique(df$title), do.line))
CodePudding user response:
Pivoting comes to mind:
library(tidyverse)
# Build test data
df <- data.frame(title=c("A","B","C","C","C","D","C","C","E"),
text=c("I like...","I wish...","review1","review2","review3","Detecting","review1","review2","New..."))
# Combine all values in a list by pivoting
new_df <- df %>% pivot_wider(names_from=title, values_from=text, values_fn=list)
# Bring to desired format by pivoting back
new_df <- new_df %>% pivot_longer(cols=c(names(new_df)), names_to="title", values_to="text")
# Inspecting result
new_df
str(new_df)
# Example query
new_df %>% filter(title=="C") %>% unlist()
CodePudding user response:
You can write a function that concatenates the unique values of a vector and use that after the group_by:
library(dplyr)
df <- data.frame(title = c('A','B','C','C','C','D','C','C','E'),
text = c('I like...', 'I wish...', 'review1','review2','review3',
'Detecting...','review1','review2', 'New...'))
unique_paste <- function(text_vec) {
paste(unique(text_vec), collapse = " ")
}
df2 <- df %>%
mutate(id = cumsum(title != lag(title, default = 'A'))) %>%
group_by(id, title) %>%
do(text = unique_paste(.$text)) %>%
ungroup()