Is there an R function to remove repetition **within** observation?


I have a large dataset that contains one column called "TYPE_DESCRIPTION" that describes the type of activity of each observation.

However, in the raw dataset a single value in the "TYPE_DESCRIPTION" column can repeat the same activity several times. For example, one observation may contain "Walking, Walking, Walking, Walking" instead of just "Walking". How do I remove the repetition within that value so that "Walking" appears only once?

I have tried the distinct() function, but it treats "Walking, Walking, Walking, Walking" as one unique value, whereas what I want is just "Walking".

This becomes a problem later, when I add a new column with mutate() that groups the activities into higher-level categories and I match on "Walking" in the code. Because I only write "Walking", the variants with repetitions are not recognized and end up in a different category from the one I need.


CodePudding user response:

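library(dplyr)
library(tidyr)

df <- data.frame(TYPE_DESCRIPTION = c("Walking,Walking, Walking, Walking", 
                                      "Running, Walking"))

df %>%
  mutate(row = row_number()) %>%                      # remember which observation each piece came from
  separate_rows(TYPE_DESCRIPTION, sep = ",\\s*") %>%  # split on commas, dropping any spaces after them
  distinct(row, TYPE_DESCRIPTION)                     # keep each activity once per observation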


# A tibble: 3 × 2
  TYPE_DESCRIPTION   row
  <chr>            <int>
1 Walking              1
2 Running              2
3 Walking              2
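
The result above is in long format, with one row per distinct activity. If one cleaned string per observation is wanted instead, the pieces can be pasted back together. This is only a sketch: it assumes the same example df, splits on a comma plus any spaces that follow it, and the name cleaned is made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(TYPE_DESCRIPTION = c("Walking,Walking, Walking, Walking",
                                      "Running, Walking"))

cleaned <- df %>%
  mutate(row = row_number()) %>%                      # id for each original observation
  separate_rows(TYPE_DESCRIPTION, sep = ",\\s*") %>%  # one row per listed activity
  distinct(row, TYPE_DESCRIPTION) %>%                 # drop repeats within an observation
  group_by(row) %>%
  summarise(TYPE_DESCRIPTION = toString(TYPE_DESCRIPTION),  # rejoin with ", "
            .groups = "drop")

cleaned
```

Observations that list several different activities keep all of them, in their original order, so "Running, Walking" stays as it is.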

CodePudding user response:

In base R (the \(x) lambda shorthand needs R 4.1 or later):

transform(df, uniq=sapply(strsplit(TYPE_DESCRIPTION, ', ?'), \(x)toString(unique(x))))

                   TYPE_DESCRIPTION             uniq
1 Walking,Walking, Walking, Walking          Walking
2                  Running, Walking Running, Walking
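
To connect this back to the grouping step in the question, here is a rough sketch that overwrites the column in place and then matches on the cleaned value. The helper name dedupe_activities, the CATEGORY column, and the labels "On foot" and "Mixed" are all hypothetical, chosen only to illustrate the idea.

```r
df <- data.frame(TYPE_DESCRIPTION = c("Walking,Walking, Walking, Walking",
                                      "Running, Walking"))

# Hypothetical helper: split on commas, trim spaces, drop repeats, rejoin.
dedupe_activities <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) toString(unique(trimws(p))), character(1))
}

df$TYPE_DESCRIPTION <- dedupe_activities(df$TYPE_DESCRIPTION)

# Hypothetical grouping: an exact match on "Walking" now works as intended.
df$CATEGORY <- ifelse(df$TYPE_DESCRIPTION == "Walking", "On foot", "Mixed")

df
```

One caveat with this sketch: a missing value (NA) in the column would come back as the string "NA", so missing values may need to be handled separately.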