Improve performance of list searching function in R

Time:10-09

Morning StackOverflow,

I am writing a function that searches a single column, ColumnOfDatasetToSearch, of a matrix Dataset for a number of search terms, SearchFeatures. It works well for matrices with about 10^4 rows but slows down considerably once the row count exceeds 10^6 or SearchFeatures contains more than 100 terms. I thought that vectorizing ColumnOfDatasetToSearch would improve the speed, but it gave only a modest improvement.

library(dplyr)  # provides pull()

ListSearcher <- function(SearchFeatures, Dataset, ColumnOfDatasetToSearch){
  RowNumber <- NA
  # extract the column to search as a plain vector
  ColumnOfInterest <- pull(Dataset, ColumnOfDatasetToSearch)
  LengthOfSearchTerms <- length(SearchFeatures)
  # one grep() pass over the whole column per search term
  for (j in 1:LengthOfSearchTerms){
    if(length(i <- grep(SearchFeatures[j], ColumnOfInterest)))
      RowNumber <- append(RowNumber, i)
  }
  # the leading NA in RowNumber yields an NA identifier, which na.omit() drops
  IdentifiersWithThoseSearchTerms <- unique(na.omit((Dataset$Identifiers[RowNumber])))
  return(IdentifiersWithThoseSearchTerms)
}

Thanks in advance for your suggestions.

NewToCoding

CodePudding user response:

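Suppose you are using the iris dataset and want to return the column Petal.Length instead of Identifiers.

Does this work? It should be considerably faster, because the search terms are collapsed into a single regular expression, so grepl() scans the column once instead of once per term.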

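ListSearcher <- function(SearchFeatures, Dataset, ColumnOfDatasetToSearch){
  # combine all terms into one alternation pattern, e.g. "ver|se"
  searchstring <- paste0(SearchFeatures, collapse =  "|")
  # logical vector: TRUE for rows where any term matches
  selection <- grepl(searchstring, Dataset[[ColumnOfDatasetToSearch]])
  # Petal.Length is hard-coded for this iris example; use Identifiers for your data
  Dataset[selection, ]$Petal.Length
}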
# try with a subset of iris
iris[c(1,2,51,52), ]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1           5.1         3.5          1.4         0.2     setosa
#> 2           4.9         3.0          1.4         0.2     setosa
#> 51          7.0         3.2          4.7         1.4 versicolor
#> 52          6.4         3.2          4.5         1.5 versicolor
ListSearcher(c("ver", "se") , iris[c(1,2,51,52), ], "Species")
#> [1] 1.4 1.4 4.7 4.5
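
If you want to keep the unique(na.omit(...)) behaviour from the question and choose the returned column yourself, something along these lines might work. This is only a sketch: ListSearcher2, ReturnColumn and literal are names I made up for illustration. The literal = TRUE branch is for the case where your search terms are plain strings that may contain regex metacharacters such as "." or "(". In that case pasting them together with "|" would change their meaning, so each term is matched with fixed = TRUE and the results are OR-ed together.

```r
# Sketch only: ListSearcher2, ReturnColumn and literal are hypothetical names,
# not part of the original question or answer.
ListSearcher2 <- function(SearchFeatures, Dataset, ColumnOfDatasetToSearch,
                          ReturnColumn = "Identifiers", literal = FALSE) {
  # grepl() works on character data; convert factors explicitly
  haystack <- as.character(Dataset[[ColumnOfDatasetToSearch]])
  if (literal) {
    # treat every term as a plain string: one fixed-string scan per term,
    # combined with elementwise OR into a single logical vector
    hits <- lapply(SearchFeatures, grepl, x = haystack, fixed = TRUE)
    selection <- Reduce(`|`, hits, rep(FALSE, length(haystack)))
  } else {
    # treat terms as regular expressions: a single scan with "a|b|c"
    selection <- grepl(paste0(SearchFeatures, collapse = "|"), haystack)
  }
  # same post-processing as in the question: drop NAs and duplicates
  unique(na.omit(Dataset[[ReturnColumn]][selection]))
}

ListSearcher2(c("ver", "se"), iris[c(1, 2, 51, 52), ], "Species",
              ReturnColumn = "Petal.Length")
#> [1] 1.4 4.7 4.5
```

The result is shorter than above because duplicates are removed. With many terms the literal branch still does one pass per term, so it will likely be slower than the single-regex branch; whether that matters on 10^6 rows is something you would have to time on your own data.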