strsplit(rquote, split = "")[[1]] in R

rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]

This question has been asked before on this forum and has one answer, but I couldn't follow that answer, so I am asking again.

In the above code, what is the meaning of [[1]]?

The program that I'm trying to run:

rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]

rcount <- 0

for (char in chars) {
  if (char == "r") {
    rcount <- rcount + 1
  }
  if (char == "u") {
    break
  }
}

print(rcount)

When I don't use [[1]], the for loop raises the following warning and rcount comes out as 1 instead of the expected 5:

Warning message: the condition has length > 1 and only the first element will be used

CodePudding user response：

strsplit is vectorized: it splits each element of a character vector into its own vector of pieces. To hold this vector of vectors it returns a list, in which each slot (indexed by [[) corresponds to one element of the input vector.

If you use the function on a one-element vector (a single string, as you do), you get a one-slot list. Using [[1]] right after strsplit() selects the first slot of that list, which is the character vector you were expecting.
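To make the difference between the list and its first slot concrete, here is a small sketch using your own string. The variable names (parts, sub_list, letters_vec) are only mine, chosen for illustration:

```r
rquote <- "r's internals are irrefutably intriguing"
parts <- strsplit(rquote, split = "")   # a list with one slot

class(parts)         # "list"
length(parts)        # 1: one slot per input string

sub_list    <- parts[1]     # single bracket: still a list of length 1
letters_vec <- parts[[1]]   # double bracket: the character vector inside

class(sub_list)      # "list"
class(letters_vec)   # "character"
length(letters_vec)  # 40
```

So [ keeps the list wrapper, while [[ takes it off and hands you what is stored in the slot.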

Without [[1]], chars is that one-slot list, and unfortunately a for loop accepts it without complaint: you get a single iteration, in which char is the whole vector of letters. The if then compares that entire vector against "r", which triggers the warning. Since the first element of the comparison is TRUE, the condition holds and rcount is raised by 1, which is your result. Because you are iterating over the one phrase rather than its letters, the loop ends there.
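You can check this yourself with a rough sketch like the one below, which only counts iterations and records what char holds each time. The names chars_list, iterations, seen_lengths and first_cmp are made up for the example. As far as I know, from R 4.2.0 onwards a condition of length greater than one in if() is an error rather than a warning, so the sketch avoids putting the comparison inside if():

```r
rquote <- "r's internals are irrefutably intriguing"
chars_list <- strsplit(rquote, split = "")  # no [[1]]: a one-slot list

iterations <- 0
seen_lengths <- integer(0)
for (char in chars_list) {
  iterations <- iterations + 1
  seen_lengths <- c(seen_lengths, length(char))
}

iterations     # 1: the loop body runs once
seen_lengths   # 40: char was the whole vector of letters

# The comparison is itself vectorized, so it yields 40 logical values
first_cmp <- (chars_list[[1]] == "r")
length(first_cmp)  # 40
first_cmp[1]       # TRUE, because the phrase starts with "r"
```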

Running something like strsplit(c("one", "two"), split = "") may make the structure clearer:

> strsplit(c("one", "two"), split="")
[[1]]
[1] "o" "n" "e"

[[2]]
[1] "t" "w" "o"

> strsplit(c("one", "two"), split="")[[1]] 
[1] "o" "n" "e"

> strsplit(c("one"), split="")[[1]][2] 
[1] "n"

CodePudding user response：

We'll start with the data below, without [[1]]:

rquote <- "r's internals are irrefutably intriguing"
chars2 <- strsplit(rquote, split = "")
class(chars2)
[1] "list"

It is always good to have an estimate of the value you expect back, here your '5'. For sizing things up, R has both length and lengths.

length(chars2)
[1] 1     # our list
lengths(chars2)
[1] 40    # elements within our list

We'll use lengths to set the range of the loop counter and, as you did, initialise the count outside the loop:

rcount2 <- 0
for (i in 1:lengths(chars2)) {
  if (chars2[[1]][i] == 'r') {
    rcount2 <- rcount2 + 1
  }
  if (chars2[[1]][i] == 'u') {
    break
  }
}
print(rcount2)
[1] 6    # what I first saw; the step-through at the end shows 5
length(which(chars2[[1]] == 'r')) # as a check: every 'r' in the whole string, ignoring the break
[1] 6

Now suppose that, rather than a list, we have a character vector:

chars1 <- strsplit(rquote, split = '')[[1]]
length(chars1)
[1] 40
rcount1 <- 0
for (i in 1:length(chars1)) {
  if (chars1[i] == 'r') {
    rcount1 <- rcount1 + 1
  }
  if (chars1[i] == 'u') {
    break
  }
}
print(rcount1)
[1] 5
length(which(chars1 == 'r'))
[1] 6

Hey, there's your '5'. What's going on here? Head scratch...

all.equal(chars1, unlist(chars2))
[1] TRUE

That break should give us just the 5 'r's that occur before a 'u' is encountered. So what happens when it's a list (or does that matter at all?), and how does the final r make it into rcount2?

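And this is where the fun begins. Jeez. After a break for coffee and some thinking, I re-ran it and the list version runs okay: it also stops at the 'u' with 5, so the 6 above was my usual morning hallucination, not a list-versus-vector difference. They come and go. As a final note, when you really want to torture yourself, put browser() inside your for loop and step through: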

Browse[1]> i
[1] 24
Browse[1]> n
debug at #7: break
Browse[1]> chars2[[1]][i] == 'u'
[1] TRUE
Browse[1]> n
> rcount2
[1] 5
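If you would rather cross-check the 5 without a loop at all, one possible sketch is to find the first 'u' with match() and count the 'r's in front of it. This is just an alternative check, not what the original program does, and first_u and before_u are names I made up:

```r
rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]

# Position of the first "u"; match() returns NA if there is none
first_u <- match("u", chars)

# Letters the loop would have examined before hitting break
before_u <- if (is.na(first_u)) chars else chars[seq_len(first_u - 1)]

first_u                 # 24, the same i that browser() showed
sum(before_u == "r")    # 5
sum(chars == "r")       # 6: every "r", including the one after the "u"
```

This also shows where the two numbers come from: 5 is the count up to the first 'u', 6 is the count over the whole string.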