I have 4 columns: title, text, year, and code. I want to collapse consecutive rows that share the same title into a single row, pasting their texts together. Rows with the same title should stay separate when their year (int) and code (char) differ.
For example, I have:
title | text | year | code
-------------------------------------
A | I like... | 2012 | i12
B | I wish... | 2012 | i12
C | review1 | 2013 | i13
C | review2 | 2013 | i13
C | review3 | 2013 | i13
D | Detecting... | 2014 | i14
C | review1 | 2015 | i15
C | review2 | 2015 | i15
E | New... | 2016 | i16
What I want is:
title | text | year | code
-----------------------------------------------
A | I like... | 2012 | i12
B | I wish... | 2012 | i12
C | review1 review2 review3 | 2013 | i13
D | Detecting... | 2014 | i14
C | review1 review2 | 2015 | i15
E | New... | 2016 | i16
I have tried:
df %>%
group_by(gp = c(0, na.omit(cumsum(lead(title) != title)))) %>%
summarize(title = unique(title), text = paste0(text, collapse = " ")) %>%
select(-gp)
which gives me:
title | text |
----------------------------------
A | I like... |
B | I wish... |
C | review1 review2 review3 |
D | Detecting... |
C | review1 review2 |
E | New... |
But when I do:
df %>%
group_by(gp = c(0, na.omit(cumsum(lead(title) != title)))) %>%
summarize(title = unique(title), text = paste0(text, collapse = " ")) %>%
select(-gp, year, code)
It gives:
Error in `stop_subscript()`:
! Can't subset columns that don't exist.
x Column `year` doesn't exist.
Data
df <- data.frame(title = c('A','B','C','C','C','D','C','C','E'),
text = c('I like...', 'I wish...', 'review1','review2','review3',
'Detecting...','review1','review2', 'New...'),
year = c(2012, 2012, 2013, 2013, 2013, 2014, 2015, 2015, 2016),
code = c("i12", "i12", "i13", "i13", "i13", "i14", "i15", "i15", "i16"))
CodePudding user response:
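Try adding year and code to the grouping so that summarize() keeps them. Passing .groups = "drop" ungroups the result; otherwise gp stays a grouping column and select(-gp) cannot remove it:
library(dplyr)
df %>%
  group_by(gp = c(0, na.omit(cumsum(lead(title) != title))), year, code) %>%
  summarize(title = unique(title), text = paste0(text, collapse = " "), .groups = "drop") %>%
  relocate(year, code, .after = last_col()) %>%
  select(-gp)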
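If dplyr 1.1.0 or later is available, the run-based grouping key could probably be written with consecutive_id() instead of the cumsum(lead(...)) expression. The following is only a minimal sketch under that assumption, using the df from the question; the name res is introduced here purely for illustration.

```r
library(dplyr)

# Data copied from the question
df <- data.frame(
  title = c("A", "B", "C", "C", "C", "D", "C", "C", "E"),
  text  = c("I like...", "I wish...", "review1", "review2", "review3",
            "Detecting...", "review1", "review2", "New..."),
  year  = c(2012, 2012, 2013, 2013, 2013, 2014, 2015, 2015, 2016),
  code  = c("i12", "i12", "i13", "i13", "i13", "i14", "i15", "i15", "i16")
)

res <- df %>%
  # consecutive_id() (assumes dplyr >= 1.1.0) starts a new id whenever title changes;
  # grouping by title, year and code as well carries them through summarise()
  group_by(gp = consecutive_id(title), title, year, code) %>%
  # collapse the texts of each run into one string and drop all grouping
  summarise(text = paste(text, collapse = " "), .groups = "drop") %>%
  # keep the original column order and leave out the helper column
  select(title, text, year, code)

print(res)
```

Because year and code are part of the grouping, a run of identical titles that spans two different years would be split into two rows rather than merged.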
CodePudding user response:
Try using unique() after you mutate:
library(dplyr)
df |>
group_by(title, year) |>
mutate(text = paste(text, collapse = " ")) |>
unique()