Writing a function that takes a vector as input, throws away unwanted values, de-duplicates, and ret

Time:12-08

I'm trying to write a function that takes in a vector and subsets it according to several steps:

  1. Throws away any unwanted values
  2. Removes duplicates.
  3. Returns the indexes of the original vector after accounting for steps (1) and (2).

For example, provided with the following input vector:

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

and

throw_away_val <- "cat"

I want my function get_indexes(x = vec_animals, y = throw_away_val) to return:

# [1] 1 6   # `1` is the index of the 1st unique ("dog") in `vec_animals`, `6` is the index of the 2nd unique ("dolphin")

Another example

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
throw_away_val <- 2003

Return:

# [1] 4 6 # `4` is the position of 1st unique (`2007`) after throwing away unwanted val; `6` is the position of 2nd unique (`2011`).

My initial attempt

The following function returns indexes but does not account for duplicates:

get_index <- function(x, throw_away) {
  which(x != throw_away)
}

It returns the index of every element of vec_animals that is not the unwanted value:

get_index(vec_animals, "cat")
#> [1] 1 2 3 4 6 7

If we use this output to subset vec_animals we get:

vec_animals[get_index(vec_animals, "cat")]
#> [1] "dog"     "dog"     "dog"     "dog"     "dolphin" "dolphin"

One could suggest post-processing this output, for example:

vec_animals[get_index(vec_animals, "cat")] |> unique()
#> [1] "dog"     "dolphin"

But that is not what I need: get_index() itself should return the correct indexes (in this case 1 and 6).


EDIT


A related approach, which returns the index of the first occurrence of each value, is unipos() from the bit64 package:

library(bit64)

vec_num <- as.integer64(c(4, 2, 2, 3, 3, 3, 3, 100, 100))
unipos(vec_num)
#> [1] 1 2 4 8

Or, more generally, in base R:

which(!duplicated(vec_num))
#> [1] 1 2 4 8

Such solutions would have been great if I had not also needed to throw away unwanted values.

CodePudding user response:

Try:

get_index <- function(x, throw_away) {
  which(!duplicated(x) & x != throw_away)
}

> get_index(vec_animals, "cat")
[1] 1 6
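
If several values need to be thrown away at once, the same idea can be adapted. The following is a minimal sketch, not part of the answer above: the name get_index_multi is made up for illustration, and it assumes that swapping `!=` for `%in%` is acceptable. `%in%` accepts a vector of unwanted values and never returns NA.

```r
# Hypothetical variant of get_index(): `throw_away` may hold several values.
get_index_multi <- function(x, throw_away) {
  # !duplicated(x) marks the first occurrence of each value;
  # !(x %in% throw_away) marks the elements that are not unwanted.
  which(!duplicated(x) & !(x %in% throw_away))
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
get_index_multi(vec_animals, c("cat", "dog"))
#> [1] 6
```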

CodePudding user response:

Here is a simple self-written function that provides the needed information.

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

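get_indexes <- function(x, throw_away) {
  # Unique values of x, minus the unwanted one
  elements <- unique(x)
  elements <- elements[elements != throw_away]
  # For each remaining value, collect every position where it occurs in x
  index <- lapply(seq_along(elements), function(i) which(x %in% elements[i]))
  # Keep only the first (smallest) position of each value
  index2return <- integer(0)
  for (j in seq_along(index)) {
    index2return <- c(index2return, min(index[[j]]))
  }
  return(index2return)
}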

get_indexes(x = vec_animals, throw_away = "cat")
[1] 1 6
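
The same result can probably be reached without an explicit loop, because base R's match() returns the position of the first match of each value. This is a sketch only, and the name get_indexes_match is made up for illustration:

```r
# Hypothetical loop-free variant built on match().
get_indexes_match <- function(x, throw_away) {
  # Unique values of x that are not in throw_away
  keep <- setdiff(unique(x), throw_away)
  # match() gives the position of the first occurrence of each kept value in x
  match(keep, x)
}

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
get_indexes_match(vec_years, 2003)
#> [1] 4 6
```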

CodePudding user response:

My approach:

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
throw_away_val <- "cat"

my_function <- function(x, y) {
  # Pair each value with its position in the original vector
  my_df <- data.frame("Origin" = x,
                      "Position" = seq_along(x),
                      stringsAsFactors = FALSE)
  # Drop the rows holding unwanted values
  my_var <- which(my_df$Origin %in% y)
  if (length(my_var)) {
    my_df <- my_df[-my_var, ]
  }
  # Keep the first row of each remaining value
  my_df <- my_df[!duplicated(my_df$Origin), ]
  return(my_df)
}

my_df <- my_function(vec_animals, throw_away_val)

# The requested indexes are in the Position column
my_df$Position
#> [1] 1 6