Is there a method that can remove similar (not just identical) duplicated words from a comma-separated string? There are a few methods out there, but they seem to remove exact duplicates only.
For example, given the following comma-separated string
words <- c("Hello, Hello, At desk (Idle), At desk (Idle)†, On floor (Active), On floor (Active)†, In meeting (Advisors), In meeting (Advisors)†, Day off (Birthday), Day off (Birthday)†")
and the desired result is
"Hello, At desk (Idle), On floor (Active), In meeting (Advisors), Day off (Birthday)"
What's been tried is
new.words <- strsplit(words, ", ")
sapply(new.words, function(x) rle(x)$values)
which only removes the exact duplicated words and returns
"Hello, At desk (Idle), At desk (Idle)†, On floor (Active), On floor (Active)†, In meeting (Advisors), In meeting (Advisors)†, Day off (Birthday), Day off (Birthday)†"
only removing the duplicated Hello.
Thanks!
CodePudding user response:
Not 100% what you want, since the commas will disappear, but maybe it helps:
library(stringr)
words <- c("Hello, Hello, At desk, At desk (Idle), On floor, On floor (Active), In meeting, In meeting *, Day off, Day off †")
words %>%
  str_split(pattern = " ", simplify = TRUE) %>%
  str_split(pattern = ",", simplify = TRUE) %>%
as.vector() %>%
unique() %>%
str_c(collapse = " ")
[1] "Hello At desk (Idle) On floor (Active) In meeting * Day off † "
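If the commas matter, the same split-then-unique idea can be applied to whole comma-separated items instead of single words. Here is a minimal base-R sketch, assuming the near-duplicates differ only by a trailing † as in the question's string. `dedupe_items` and its `marker` argument are hypothetical names made up for this example, not part of any package.

```r
# Hypothetical helper: de-duplicate comma-separated items, ignoring a trailing marker.
# "\u2020" is the dagger character used in the question.
dedupe_items <- function(x, marker = "\u2020") {
  # split on commas and trim the surrounding spaces
  items <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  # drop the marker when an item ends with it (plain string comparison, no regex)
  has_marker <- endsWith(items, marker)
  items[has_marker] <- substr(items[has_marker], 1,
                              nchar(items[has_marker]) - nchar(marker))
  # keep the first occurrence of each item and glue the rest back together
  paste(unique(trimws(items)), collapse = ", ")
}

words <- "Hello, Hello, At desk (Idle), At desk (Idle)\u2020, On floor (Active), On floor (Active)\u2020, In meeting (Advisors), In meeting (Advisors)\u2020, Day off (Birthday), Day off (Birthday)\u2020"
result <- dedupe_items(words)
print(result)
```

Note that `unique()` also drops repeats that are not next to each other, which `rle()` would keep.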
CodePudding user response:
Based on the updated data and expected output:
# collapse each run of adjacent repeated items to one copy, then strip the leftover †
gsub("†", "", gsub("([^,]+)(?:,\\s*\\1)*", "\\1", words, perl = TRUE))
Output:
[1] "Hello, At desk (Idle), On floor (Active), In meeting (Advisors), Day off (Birthday)"
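One thing to keep in mind with a backreference pattern such as `([^,]+)(?:,\\s*\\1)*`: it only collapses repeats that sit directly next to each other. A small sketch on a made-up string (the value of `x` is purely illustrative) shows the difference:

```r
# Made-up input: "a" repeats both adjacently and later on
x <- "a, a, b, a"

# \\1 must match the text captured by ([^,]+), so only adjacent runs collapse
collapsed <- gsub("([^,]+)(?:,\\s*\\1)*", "\\1", x, perl = TRUE)
print(collapsed)
# [1] "a, b, a"
```

The trailing `a` survives because `b` sits between it and the first run. If non-adjacent repeats should go too, splitting the string and calling `unique()` is probably the safer route.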