Making a for loop for a character vector in R

char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport") # character vector

Suppose I have the character vector above. I would like to write a for loop that prints only the elements that have more than 5 characters and start with a vowel, and that also deletes from the vector the elements that do not start with a vowel.
I wrote this for loop, but it also prints empty results (character(0)):

library(stringr)
for (i in char_vector){
  if (str_length(i) > 5){
    i <- str_subset(i, "^[AEIOUaeiou]")
    print(i)
  }
}

The result of the code above is:

[1] "Africa"
[1] "identical"
[1] "ending"
character(0)
character(0)

My desired result would only be the first 3 characters
I'm really new to R and facing huge difficulty with creating a for loop for this problem. Any help would be greatly appreciated!

CodePudding user response：

Use grepl with the pattern ^[AEIOUaeiuo]\w{5,}$:

char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")
char_vector <- char_vector[grepl("^[AEIOUaeiuo]\\w{5,}$", char_vector)]
char_vector

[1] "Africa"    "identical" "ending"

The regex pattern used here says to match words which:

^             from the start of the word
[AEIOUaeiuo]  starts with a vowel
\w{5,}        followed by 5 or more word characters (total length > 5)
$             end of the word
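One detail of this pattern is worth checking: \w matches only word characters (letters, digits and underscore), so a string that contains a hyphen or a space is rejected even when it is long enough. The sketch below compares the pattern with a separate first-letter and length test. The test words are made up for illustration and are not from the question.

```r
# Pattern from the answer above: a vowel followed by 5 or more word characters
pattern <- "^[AEIOUaeiuo]\\w{5,}$"

# Hypothetical test words, chosen to probe the edge cases
words <- c("Africa", "eagle", "ice-cream", "Friday")

# Single regex: first letter and length are checked together
regex_only <- grepl(pattern, words)

# Alternative: check the first letter and the length separately
vowel_and_length <- grepl("^[AEIOUaeiou]", words) & nchar(words) > 5

print(data.frame(words, regex_only, vowel_and_length))
```

Only "ice-cream" differs between the two columns: it starts with a vowel and has 9 characters, but the hyphen is not a word character. For the vector in the question both approaches give the same result.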

CodePudding user response：

You don't need a for loop, because the string functions in R are vectorized.

A simple solution using grep and substr (refer to Tim Blegeleisen's answer for details on the pattern):

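# a vowel followed by at least 5 more characters, i.e. more than 5 in total
substr(grep("^[aeiou].{5}", char_vector, ignore.case = TRUE, value = TRUE), 1, 3)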
# [1] "Afr" "ide" "end"
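To illustrate what "vectorized" means here, the following base R sketch computes each condition for the whole vector at once and then combines them. The intermediate names (long_enough, starts_with_vowel, keep) are only illustrative.

```r
char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# Each call returns one value per element, with no explicit loop
long_enough <- nchar(char_vector) > 5
starts_with_vowel <- grepl("^[aeiou]", char_vector, ignore.case = TRUE)

# `&` combines the two logical vectors element by element
keep <- long_enough & starts_with_vowel

# Logical indexing keeps the elements where `keep` is TRUE
result <- char_vector[keep]
print(result)
```

Wrapping result in substr(result, 1, 3) would give the first three characters of each match, as in the one-liner above.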

CodePudding user response：

With stringr functions, use str_detect rather than str_subset, and take advantage of the fact that these functions are vectorized:

library(stringr)
char_vector[str_length(char_vector) > 5 & str_detect(char_vector, "^[AEIOUaeiou]")]
# [1] "Africa"    "identical" "ending"

Or, if you want to keep the for loop and collect the results in a single vector:

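vec <- character(0) # start with an empty character vector
for (i in char_vector){
  # i is a single string here, so use the scalar operator &&
  if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")){
    vec <- c(vec, i)
  }
}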
vec
# [1] "Africa"    "identical" "ending"
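As for why the loop in the question prints character(0): a subsetting function returns the matching elements, so when a single string does not match, the result is an empty vector, and print() displays it as character(0). In the question this happens for "Friday" and "transport", which pass the length check but do not start with a vowel. The sketch below shows the difference using the base R counterparts, assuming (as the stringr documentation describes) that str_subset is equivalent to grep(pattern, x, value = TRUE) and str_detect to grepl(pattern, x).

```r
vowel_start <- "^[AEIOUaeiou]"

# Subsetting returns the matching elements, so a miss is an empty vector
miss <- grep(vowel_start, "Friday", value = TRUE)
print(miss)  # character(0)

# Detecting returns a single TRUE or FALSE, which is what if() expects
no_match <- grepl(vowel_start, "Friday")
match <- grepl(vowel_start, "Africa")
print(no_match)  # FALSE
print(match)     # TRUE
```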

CodePudding user response：

Do you want the first 3 characters of each matching element? If so:

library(stringr)
for (i in char_vector){
  # i is a single string here, so use the scalar operator &&
  if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")) {
    word <- str_sub(i, 1, 3) # first three characters of the match
    print(word)
  }
}

The output is:

[1] "Afr"
[1] "ide"
[1] "end"

CodePudding user response：

Using only base R functions. No need for a loop. I wrapped the steps in a function so you can use the function with other character vectors. You could make this code shorter (see @utubun's answer) but I feel it is easier to understand the process with a "one line one step" approach.

char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")
yourfun <- function(char_vector){
  char_vector <- char_vector[nchar(char_vector) > 5] # keep only the strings with more than 5 characters
  char_vector <- char_vector[grep(pattern = "^[AEIOUaeiou]", char_vector)] # keep the strings that start with a vowel
  return(char_vector) # return the filtered strings
  # to get the first three characters of each string instead, replace the return() above with:
  # out <- substring(char_vector, 1, 3) # select only the first 3 characters of each string
  # return(out)
}
yourfun(char_vector = char_vector)
#> [1] "Africa"    "identical" "ending"

Created on 2022-05-09 by the reprex package (v2.0.1)
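The question can also be read as two separate requests: print the elements that are longer than 5 characters and start with a vowel, and separately delete from the vector everything that does not start with a vowel. Under that reading (an assumption, since the desired final vector is not shown in the question), a base R sketch in the same one-step-per-line style might look like this. The names starts_with_vowel and to_print are illustrative.

```r
char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# TRUE for every element whose first letter is a vowel
starts_with_vowel <- grepl("^[AEIOUaeiou]", char_vector)

# Step 1: print the elements that start with a vowel and have more than 5 characters
to_print <- char_vector[starts_with_vowel & nchar(char_vector) > 5]
print(to_print)

# Step 2: drop every element that does not start with a vowel
char_vector <- char_vector[starts_with_vowel]
print(char_vector)
```

With this reading, "aa" stays in the vector after step 2 because it starts with a vowel, even though it is too short to be printed in step 1.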