Remove duplicate values in a cell without removing the row

I have a column of string variables whose values are separated by white space, and the column needs to remain a string column. How can I remove the duplicate values and the values longer than 4 characters?

company        counts 
company1       2222 2222 45345234 425352352352 6574745 299
company2       9909 4363465246 543 323 9909 3454534534 768

I would like to end up with something like this:

company        counts 
company1       2222 299
company2       9909 543 323 768

Answer:

Split the strings with strsplit, keep only the tokens of at most 4 characters that are not duplicates, and paste them back together:

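sapply(
    strsplit(dat$counts, "\\s+"),   # split each cell on runs of whitespace
    # keep tokens of at most 4 characters that have not appeared earlier in the cell
    # (\(x) is the R >= 4.1 shorthand for function(x))
    \(x) paste(x[nchar(x) <= 4 & !duplicated(x)], collapse=" ")
)
## [1] "2222 299"         "9909 543 323 768"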

Where dat was:

dat <- read.csv(text="company,counts
company1,2222 2222 45345234 425352352352 6574745 299
company2,9909 4363465246 543 323 9909 3454534534 768", stringsAsFactors=FALSE)
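To keep every row and write the cleaned strings back into the column, the same logic can be wrapped in a small helper and applied per cell. This is a minimal sketch: clean_counts and its max_chars argument are hypothetical names, and unique() is used in place of !duplicated() (both keep the first occurrence of each token, so the result is the same here).

```r
# Sample data from the question; counts must stay character
dat <- read.csv(text = "company,counts
company1,2222 2222 45345234 425352352352 6574745 299
company2,9909 4363465246 543 323 9909 3454534534 768",
                stringsAsFactors = FALSE)

# Hypothetical helper: keep tokens of at most max_chars characters,
# drop repeats (first occurrence wins), and rejoin with single spaces
clean_counts <- function(s, max_chars = 4) {
  tokens <- strsplit(trimws(s), "\\s+")[[1]]
  tokens <- unique(tokens[nchar(tokens) <= max_chars])
  paste(tokens, collapse = " ")
}

# vapply returns exactly one character value per cell, so no rows are dropped;
# a cell whose tokens are all removed simply becomes ""
dat$counts <- vapply(dat$counts, clean_counts, character(1), USE.NAMES = FALSE)
dat
```

vapply is used instead of sapply because it checks that every call returns a single string, which guarantees the result has the same length as the column being replaced.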