Calculations on individual digits in a string

Time:08-06

I have a string of concatenated integers:

A = "883745"

I wish to perform calculations on the individual digits. Here I extract each single number:

N1 = as.numeric(substr(A , 1, 1))
N2 = as.numeric(substr(A , 2, 2))
N3 = as.numeric(substr(A , 3, 3))
N4 = as.numeric(substr(A , 4, 4))
N5 = as.numeric(substr(A , 5, 5))
N6 = as.numeric(substr(A , 6, 6))

Then I need to calculate:

N1 *  nchar(A)      
N2 * (nchar(A) - 1)   
N3 * (nchar(A) - 2)   
N4 * (nchar(A) - 3)   
N5 * (nchar(A) - 4)   
N6 * (nchar(A) - 5)

How do I do this in a loop?

CodePudding user response:

substring, not substr, is the right tool for this kind of vectorised substring operation:

nc <- seq(nchar(A))
sum(as.numeric(substring(A, nc, nc)) * rev(nc))
#[1] 134
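A small sketch of why substring is used here instead of substr (the variable names idx, one and all_digits are only for illustration). substr returns a result as long as its first argument, so with a single string only the first start/stop pair is used, whereas substring recycles the string over every position:

```r
A <- "883745"
idx <- seq_len(nchar(A))   # 1 2 3 4 5 6

# substr() returns one element per element of A,
# so only the first start/stop pair is used here
one <- substr(A, idx, idx)

# substring() recycles A over every start/stop pair
all_digits <- substring(A, idx, idx)

one          # "8"
all_digits   # "8" "8" "3" "7" "4" "5"
```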

CodePudding user response:

You are given a string:

A <- "883745"

You can break down your calculation as follows:

ans = 0
ans = ans + as.numeric(substr(A, 1, 1)) * (nchar(A) - 0)
ans = ans + as.numeric(substr(A, 2, 2)) * (nchar(A) - 1)
ans = ans + as.numeric(substr(A, 3, 3)) * (nchar(A) - 2)
ans = ans + as.numeric(substr(A, 4, 4)) * (nchar(A) - 3)
ans = ans + as.numeric(substr(A, 5, 5)) * (nchar(A) - 4)
ans = ans + as.numeric(substr(A, 6, 6)) * (nchar(A) - 5)

This can be summarized as:

ans = 0
for i = 1, 2, ..., 6
  ans = ans + as.numeric(substr(A, i, i)) * (nchar(A) - i + 1)
end for

Here, i runs from 1 to 6, which is nchar(A), so every digit in A is processed.

Freshman

Write the loop exactly as described.

ans <- 0
for (i in 1:nchar(A)) {
  ans <- ans + as.numeric(substr(A, i, i)) * (nchar(A) - i + 1)
}
ans
#[1] 134
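One edge case worth guarding against: for an empty string, 1:nchar(A) is 1:0, which counts down through 1 and 0 instead of skipping the loop. A minimal sketch using seq_len, wrapped in a function whose name (weighted_sum_loop) is made up for this example:

```r
# Hypothetical helper: same loop as above, but safe for empty input
weighted_sum_loop <- function(A) {
  n <- nchar(A)
  ans <- 0
  for (i in seq_len(n)) {   # seq_len(0) is empty, so the body is skipped
    ans <- ans + as.numeric(substr(A, i, i)) * (n - i + 1)
  }
  ans
}

weighted_sum_loop("883745")  # 134
weighted_sum_loop("")        # 0
```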

Sophomore

Pre-compute the result of substring and nchar, then index them in the loop.

nc <- nchar(A)
N <- as.numeric(substring(A, 1:nc, 1:nc))
L <- nc:1
ans <- 0
for (i in 1:nc) {
  ans <- ans + N[i] * L[i]
}
ans
#[1] 134

Junior

Replace the loop by sum.

nc <- nchar(A)
N <- as.numeric(substring(A, 1:nc, 1:nc))
L <- nc:1
ans <- sum(N * L)
ans
#[1] 134

Senior

Replace as.numeric(substring(...)) with utf8ToInt (credit to @ThomasIsCoding), and replace sum with crossprod.

N <- utf8ToInt(A) - 48
L <- nchar(A):1
ans <- c(crossprod(N, L))
ans
#[1] 134

We can also pack it into one line:

c(crossprod(utf8ToInt(A) - 48, nchar(A):1))
#[1] 134
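The constant 48 is the code point of the character "0", so subtracting it maps "0".."9" to 0..9. A sketch that spells this out and checks the input first; the grepl guard and the names zero and digits are additions for illustration, not part of the answer above:

```r
A <- "883745"

zero <- utf8ToInt("0")             # 48, the code point of "0"

# Guard (an added assumption): the trick only works for ASCII digits
stopifnot(grepl("^[0-9]+$", A))

digits <- utf8ToInt(A) - zero
digits                             # 8 8 3 7 4 5

total <- sum(digits * rev(seq_along(digits)))
total                              # 134
```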

CodePudding user response:

Your code can be considerably simpler and no loop is necessary. This may be what you are trying to accomplish:

A <- 883745
len <- nchar(A)
N <- as.numeric(unlist(strsplit(as.character(A), "")))
N
# [1] 8 8 3 7 4 5
N * (len - 0:5)
# [1] 48 40 12 21  8  5
sum(N * (len - 0:5))
# [1] 134
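The 0:5 above is hard-coded for six digits. A sketch of the same idea for any length, with a made-up function name (digit_weighted_sum). Passing the digits as a string is the safer choice, since as.character() may render large numbers in scientific notation (for example, as.character(100000) gives "1e+05"):

```r
# Hypothetical helper generalising the strsplit approach to any length
digit_weighted_sum <- function(x) {
  s <- as.character(x)
  N <- as.numeric(strsplit(s, "")[[1]])   # one element per digit
  sum(N * rev(seq_along(N)))              # weights: length, ..., 2, 1
}

digit_weighted_sum(883745)     # 134
digit_weighted_sum("883745")   # 134
```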

CodePudding user response:

Let's try this using utf8ToInt

sum((utf8ToInt(A) - 48) * nchar(A):1)
#[1] 134

Benchmark

For those who care about speed:

set.seed(1)
A <- paste0(sample(1e4), collapse = "")

thelatemail <- function() {
  nc <- seq(nchar(A))
  sum(as.numeric(substring(A, nc, nc)) * rev(nc))
}

ZheyuanLi <- function() {
  n <- nchar(A)
  N <- as.numeric(substring(A, 1:n, 1:n))
  L <- nchar(A) - (1:n) + 1
  c(crossprod(N, L))
}

dcarlson <- function() {
  len <- nchar(A)
  N <- as.numeric(unlist(strsplit(as.character(A), "")))
  N * (len - 0:(len - 1))
  sum(N * (len - 0:(len - 1)))
}

TIC <- function() sum((utf8ToInt(A) - 48) * nchar(A):1)


library(microbenchmark)

microbenchmark(
  thelatemail(),
  ZheyuanLi(),
  dcarlson(),
  TIC(),
  check = "equivalent"
)

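The utf8ToInt version is by far the fastest, with a median roughly 17 times lower than the substring- and strsplit-based versions: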

Unit: microseconds
          expr     min       lq      mean   median       uq     max neval
 thelatemail() 14721.6 15023.70 16386.527 15253.70 15771.75 36709.7   100
   ZheyuanLi() 14999.9 15371.05 16229.409 15604.50 15983.60 32158.5   100
    dcarlson() 13967.5 14211.30 15360.541 14428.55 14866.40 29307.8   100
         TIC()   722.4   820.95  1123.463   861.05   911.00 12294.9   100