How to count the characters in the second word of a string in R

Greetings of the day

I have the dataset df below, with roughly 1000 rows:

S.No    Names
1      Hello Arun
2      Hello,Kamal
3      Hello Nazi
4      Hello:Ganesh
5      Hello*vishnu

I need to count the characters of only the second word in the Names column, but in some rows the two words are separated by a special character instead of a space.

I have tried the stringr package, but I don't know exactly how to apply it.

I need my output to look like this:

S.No    Names     count
1   Hello Arun      4
2   Hello,Kamal     5
3   Hello Nazi      4
4   Hello:Ganesh    6
5   Hello*vishnu    6

Thanks in advance

CodePudding user response：

You can drop everything up to and including the last special character (punctuation) or space, then count the number of characters remaining with nchar.

df$count <- nchar(sub('.*([[:punct:]]|\\s)', '', df$Names))
df

#  S.No        Names count
#1    1   Hello Arun     4
#2    2  Hello,Kamal     5
#3    3   Hello Nazi     4
#4    4 Hello:Ganesh     6
#5    5 Hello*vishnu     6

The same thing can also be written in dplyr, if you prefer that.

library(dplyr)

df %>% mutate(count = nchar(sub('.*([[:punct:]]|\\s)', '', Names)))
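
To try the base R approach end to end, here is a minimal sketch that rebuilds only the five sample rows from the question (the real df is assumed to have the same two columns) and shows the intermediate result of sub before nchar is applied. The variable name second_word is introduced here purely for illustration.

```r
# Rebuild the five sample rows shown in the question (assumed layout).
df <- data.frame(
  S.No  = 1:5,
  Names = c("Hello Arun", "Hello,Kamal", "Hello Nazi",
            "Hello:Ganesh", "Hello*vishnu"),
  stringsAsFactors = FALSE
)

# Step 1: the greedy '.*' consumes everything up to and including the
# LAST punctuation or whitespace character, leaving only the final word.
second_word <- sub('.*([[:punct:]]|\\s)', '', df$Names)
second_word
# [1] "Arun"   "Kamal"  "Nazi"   "Ganesh" "vishnu"

# Step 2: count the characters of what is left.
df$count <- nchar(second_word)
df$count
# [1] 4 5 4 6 6
```

Because '.*' is greedy, a string with more than one separator keeps only the text after the last one, and a string with no separator at all is left unchanged, so nchar then returns the length of the whole string.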

CodePudding user response：

Another possible solution, based on stringr:

library(tidyverse)

df %>% 
  mutate(count = str_extract(Names, "(?<=(\\s|[:punct:]))[:alpha:]+$") %>%
         str_count)

#>   S.No        Names count
#> 1    1   Hello Arun     4
#> 2    2  Hello,Kamal     5
#> 3    3   Hello Nazi     4
#> 4    4 Hello:Ganesh     6
#> 5    5 Hello*vishnu     6
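
Both answers above key on the last separator in the string, which matches the second word only when each name has exactly two words. If some of the roughly 1000 rows might contain one word or more than two, a possible alternative is to split on separators and take the second piece explicitly. This is only a sketch in base R: the helper name second_word_length is hypothetical, and it assumes no name starts with a separator.

```r
# Hypothetical helper: length of the second token of each string,
# where tokens are separated by runs of punctuation and/or whitespace.
second_word_length <- function(x) {
  parts <- strsplit(x, "[[:punct:][:space:]]+")
  vapply(
    parts,
    function(p) if (length(p) >= 2) nchar(p[2]) else NA_integer_,
    integer(1)
  )
}

second_word_length(c("Hello Arun", "Hello,Kamal", "Hello:Ganesh"))
# [1] 4 5 6

second_word_length("Hello Arun Kumar")  # second word, not the last one
# [1] 4

second_word_length("Hello")             # no second word
# [1] NA
```

It can be used the same way as the other solutions, for example df$count <- second_word_length(df$Names).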