I have a column of string values that are separated by whitespace and need to remain strings. How can I remove the duplicate values and the values longer than 4 characters?
company counts
company1 2222 2222 45345234 425352352352 6574745 299
company2 9909 4363465246 543 323 9909 3454534534 768
I would like to end up with something like this:
company counts
company1 2222 299
company2 9909 543 323 768
CodePudding user response:
strsplit the strings, remove the long ones and the duplicates, and paste back together:
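sapply(
  # split each string on runs of whitespace
  strsplit(dat$counts, "\\s+"),
  # keep values of at most 4 characters, first occurrence only
  # (the \(x) lambda shorthand needs R >= 4.1; use function(x) on older versions)
  \(x) paste(x[nchar(x) <= 4 & !duplicated(x)], collapse = " ")
)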
##[1] "2222 299" "9909 543 323 768"
Where dat was:
dat <- read.csv(text="company,counts
company1,2222 2222 45345234 425352352352 6574745 299
company2,9909 4363465246 543 323 9909 3454534534 768")
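If the goal is to get the cleaned values back into the data frame, as in the desired output, one option is to wrap the same logic in a small function and assign the result to the column. This is only a minimal sketch, assuming counts is a character column: clean_counts and max_len are names made up here for illustration, and vapply is swapped in for sapply so the result is always a character vector.

```r
# Same sample data as above. stringsAsFactors = FALSE keeps counts as
# character on R versions before 4.0, where read.csv defaulted to factors.
dat <- read.csv(text = "company,counts
company1,2222 2222 45345234 425352352352 6574745 299
company2,9909 4363465246 543 323 9909 3454534534 768",
                stringsAsFactors = FALSE)

# Hypothetical helper (name and max_len argument are illustrative):
# keep values of at most max_len characters, dropping repeats after
# the first occurrence.
clean_counts <- function(x, max_len = 4) {
  tokens <- strsplit(x, "\\s+")  # split on runs of whitespace
  vapply(
    tokens,
    function(tok) paste(tok[nchar(tok) <= max_len & !duplicated(tok)], collapse = " "),
    character(1)
  )
}

# Overwrite the column with the cleaned strings
dat$counts <- clean_counts(dat$counts)
dat
```

The values stay strings throughout, so nothing is converted to numeric along the way.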