Identifying last occurring duplicates in a vector in R

Time:02-16

I would like to identify all unique values and the last occurrence of each repeated value in a vector. For example, I would like to identify the positions

c(2,3,4,6,7)

in the vector:

v <- c("m", "m", "k", "r", "l", "o", "l")

I see that

(duplicated(v) | duplicated(v, fromLast = TRUE))

identifies all duplicated values, yet I would like to only identify the last occurring instances of duplicated elements.

How can I achieve this without a loop?

CodePudding user response:

Do you need:

duplicated(v)

[1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

# and for index

which(duplicated(v))
[1] 2 7

or as akrun suggests:

which(!duplicated(v, fromLast = TRUE))

[1] 2 3 4 6 7
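
To see why the `fromLast = TRUE` version matches the requested positions, it may help to look at the intermediate logical vector. This is only a small sketch on the example vector from the question; the names `dup_from_end` and `last_idx` are made up for illustration.

```r
v <- c("m", "m", "k", "r", "l", "o", "l")

# Scanning from the end, an element is flagged as a duplicate only if the
# same value appears again later in the vector.
dup_from_end <- duplicated(v, fromLast = TRUE)
dup_from_end
# [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

# Negating keeps the unique values plus the last occurrence of each repeat.
last_idx <- which(!dup_from_end)
last_idx
# [1] 2 3 4 6 7
```

Only the first "m" and the first "l" are flagged, so everything else (unique values and last occurrences) survives the negation.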

CodePudding user response:

You could do something like:

library(dplyr)

v %>% 
  as_tibble() %>% 
  mutate(index = row_number()) %>% 
  group_by(value) %>% 
  mutate(id=row_number()) %>%
  filter(id == max(id))

Which gives us:

# A tibble: 5 × 3
# Groups:   value [5]
  value index    id
  <chr> <int> <int>
1 m         2     2
2 k         3     1
3 r         4     1
4 o         6     1
5 l         7     2

Additionally, if you just want the index, you can do:

v %>% 
  as_tibble() %>% 
  mutate(index = row_number()) %>% 
  group_by(value) %>% 
  mutate(id=row_number()) %>%
  filter(id == max(id)) %>%
  pull(index)

...to get:

[1] 2 3 4 6 7

CodePudding user response:

We can try

> sort(tapply(seq_along(v), v, max))
m k r o l 
2 3 4 6 7

or

> unique(ave(seq_along(v), v, FUN = max))
[1] 2 3 4 7 6

or

> rev(length(v) - which(!duplicated(rev(v))) + 1)
[1] 2 3 4 6 7
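
As a quick sanity check, the base R variants can be compared on the example vector. This is a sketch, not a benchmark; `last_occurrence` is a hypothetical helper name that does not come from any of the answers, and the results are sorted where needed because `unique(ave(...))` lists positions in order of each value's first appearance.

```r
v <- c("m", "m", "k", "r", "l", "o", "l")

# Hypothetical helper wrapping the duplicated() approach.
last_occurrence <- function(x) which(!duplicated(x, fromLast = TRUE))

reference <- last_occurrence(v)

# as.vector() drops the names and array attributes that tapply() returns.
via_tapply <- as.vector(sort(tapply(seq_along(v), v, max)))

# Sort because the positions come back in order of first appearance (2 3 4 7 6).
via_ave <- sort(unique(ave(seq_along(v), v, FUN = max)))

via_rev <- rev(length(v) - which(!duplicated(rev(v))) + 1)

# Compare by value; via_rev is double while the others are integer.
all(reference == via_tapply, reference == via_ave, reference == via_rev)
# [1] TRUE
```

If only the positions are needed, the `duplicated()` version is probably the simplest to read, since it needs neither sorting nor attribute cleanup.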