Find the most common digit in a vector of integers

I am working on a practice problem that asks me to write a function that finds the digit(s) occurring the most times in a vector of integers.

For example, the input

x = c(25, 2, 3, 57, 38, 41)

should return 2, 3, 5, since the digits 2, 3 and 5 each occur twice, which is the maximum.

Answer:

One approach could look like this, although I am sure there are more efficient ones:

my_vector <- c(25, 2, 3, 57, 38, 41)

# function to count the number of times each digit occurs
digit_occurrence <- function(vector) {

  # collapse the vector into a single string of digits (no separators)
  x <- paste(vector, collapse = '')

  # start with an empty vector; counts are added by name below
  digit <- c()

  # loop over the digits 0-9 and store how often each one occurs in x
  for(i in as.character(0:9)) {
    digit[i] <- lengths(regmatches(x, gregexpr(i, x)))
  }

  digit

}

> digit_occurrence(my_vector)
0 1 2 3 4 5 6 7 8 9 
0 1 2 2 1 2 0 1 1 0
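digit_occurrence returns the count of every digit rather than the most frequent digits themselves, so one more step is needed to get the answer the question asks for. A minimal sketch: the name `counts` is hypothetical, and the values are hard-coded from the output printed above so the snippet runs on its own.

```r
# Counts as printed by digit_occurrence(my_vector), named by digit 0-9
counts <- setNames(c(0, 1, 2, 2, 1, 2, 0, 1, 1, 0), 0:9)

# Keep only the digits whose count equals the maximum, as integers
most_common <- as.integer(names(counts)[counts == max(counts)])
most_common
# [1] 2 3 5
```

Comparing against max() rather than taking a fixed number of entries means ties are all kept.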

Answer:

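A solution that uses the table() function to get a data frame with the frequency of each digit (instead of counting with a for loop), then sorts that data frame by frequency and extracts the top n digits (three by default). Note that n is fixed, so the result only matches the tied most common digits when there happen to be exactly n of them: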

input_vector <- c(25, 2, 3, 57, 38, 41)

top_digits <- function(my_array, n=3) {
  
  # `as.character` converts the numbers to strings,
  # `strsplit` splits each one into individual characters (e.g. "23" into "2" and "3")
  # and `unlist` flattens the result into a single character vector
  my_array_splitted <- unlist(strsplit(as.character(my_array), ""))

  # `table` counts how often each digit occurs
  # `as.data.frame` converts the table into a data frame with 2 columns: the digits and their frequencies (`Freq`)
  df_digits <- as.data.frame(table(my_array_splitted))

  # Sort the data frame by frequency, highest first
  df_digits <- df_digits[order(df_digits$Freq, decreasing = TRUE),]

  # Extract the first `n` elements of the (now sorted) digits column and convert them back to integer
  # (the intermediate step through character is needed because the column is a factor,
  # and converting a factor directly to integer returns its level codes, not the digits)
  as.integer(as.character(df_digits$my_array_splitted[1:n]))
}
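If the number of tied digits is not known in advance, the same data frame can be filtered on the maximum frequency instead of taking the first n rows. This is a sketch of that variant; `top_digits_ties` is a hypothetical name, not part of the answer above.

```r
# Hypothetical variant of top_digits: returns every digit tied for the
# highest frequency instead of a fixed number n of digits
top_digits_ties <- function(my_array) {
  # split every number into its single digits, as in top_digits
  my_array_splitted <- unlist(strsplit(as.character(my_array), ""))
  df_digits <- as.data.frame(table(my_array_splitted))
  # keep only the rows whose frequency equals the maximum
  top <- df_digits[df_digits$Freq == max(df_digits$Freq), ]
  # factor -> character -> integer, for the same reason as in top_digits
  as.integer(as.character(top$my_array_splitted))
}

top_digits_ties(c(25, 2, 3, 57, 38, 41))
# [1] 2 3 5
top_digits_ties(c(11, 2))
# [1] 1
```

The second call shows the case a fixed n = 3 cannot handle: only one digit reaches the top frequency.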

Answer:

This could be another option for you:

fn <- function(x) {
  # First we split every element into its single digits, turning each element
  # into a character string beforehand. do.call then applies the c function
  # to all elements of the resulting list, flattening it into one vector
  digits <- do.call(c, sapply(x, function(y) strsplit(as.character(y), "")))

  # Finally we count the frequencies and sort them in decreasing order
  most_freq <- sort(table(digits), decreasing = TRUE)
  most_freq
}

fn(x)

digits
2 3 5 1 4 7 8 
2 2 2 1 1 1 1
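fn returns every digit with its frequency. Because the table is sorted in decreasing order, the most common digits are the leading entries that tie with the first one. A minimal sketch, rebuilding the same sorted table inline so it runs on its own (the name `top` is hypothetical):

```r
# Sorted frequency table for the example, built the same way as in fn(x)
x <- c(25, 2, 3, 57, 38, 41)
most_freq <- sort(table(unlist(strsplit(as.character(x), ""))), decreasing = TRUE)

# most_freq[[1]] is the highest count; keep every digit that ties with it
top <- as.integer(names(most_freq)[most_freq == most_freq[[1]]])
top
# [1] 2 3 5
```

The double bracket in most_freq[[1]] extracts a plain number; comparing against the single bracket most_freq[1] would compare two tables of different lengths.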

Answer:

This approach is similar and also uses table():

count = function(x) {
    # make a table of counts of all the digits
    tab = table(strsplit(paste(x, collapse=""), ""))
    # return the names of all digits tied for the highest count
    names(tab[tab == max(tab)])
}
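Indexing the table by position with tab[max(tab)] picks the entry at position max(tab), not the entries that hold the maximum. A small check on the question's example shows the difference between positional and logical indexing:

```r
# Digit counts for the example input c(25, 2, 3, 57, 38, 41)
tab <- table(strsplit(paste(c(25, 2, 3, 57, 38, 41), collapse = ""), ""))

# Positional indexing: max(tab) is 2, so this returns only the 2nd entry
names(tab[max(tab)])
# [1] "2"

# Logical indexing: keeps every digit whose count equals the maximum
names(tab[tab == max(tab)])
# [1] "2" "3" "5"
```

Here the positional version happens to return one of the right digits only because the second entry of the table is the digit "2"; on other inputs it can return a digit that is not among the most common at all.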

And a fun benchmark because it's Christmas:

x = sample(1:1000, 100000, replace=TRUE)

Unit: milliseconds
                expr       min        lq      mean    median        uq      max
               me(x)  46.63262  52.34020  57.33796  53.87266  58.91561 123.5481
             anou(x) 319.14199 351.43877 381.35371 374.78037 408.67354 490.3464
 digit_occurrence(x) 149.83663 151.61908 160.47220 156.88108 161.57646 245.5067
       top_digits(x)  42.40598  49.92426  55.87991  51.90813  56.61563 109.5608
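All of the answers above go through character strings. For comparison, here is a sketch of a string-free alternative that peels off digits with %% and %/% and counts them with tabulate(). It assumes non-negative whole numbers that fit in an R integer, and the function name is hypothetical. It was not part of the benchmark above, so no timing is claimed for it.

```r
# Hypothetical string-free variant: extract digits with integer arithmetic
# (assumes non-negative whole numbers) and count them with tabulate()
most_common_digits_arith <- function(x) {
  x <- as.integer(x)
  # a zero contributes exactly one digit, 0
  digits <- x[x == 0L]
  x <- x[x > 0L]
  while (length(x) > 0) {
    digits <- c(digits, x %% 10L)  # last digit of each remaining number
    x <- x %/% 10L                 # drop that digit
    x <- x[x > 0L]                 # numbers with no digits left are done
  }
  # tabulate() counts the values 1..nbins, so shift digits 0-9 to bins 1-10
  counts <- tabulate(digits + 1L, nbins = 10L)
  # shift the bin positions back to digits
  which(counts == max(counts)) - 1L
}

most_common_digits_arith(c(25, 2, 3, 57, 38, 41))
# [1] 2 3 5
```

The loop runs once per digit position, so for inputs like sample(1:1000, ...) it makes at most four passes over the shrinking vector.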