Greetings of the day.
I have the dataset df below, with roughly 1000 rows:
S.No Names
1 Hello Arun
2 Hello,Kamal
3 Hello Nazi
4 Hello:Ganesh
5 Hello*vishnu
I need to count the characters in the Names column, but only in the second word, and the values contain special characters.
I have tried the stringr package but don't know exactly how to apply it.
I need my output to look like this:
S.No Names count
1 Hello Arun 4
2 Hello,Kamal 5
3 Hello Nazi 4
4 Hello:Ganesh 6
5 Hello*vishnu 6
Thanks in advance
CodePudding user response:
You can drop everything up to the last special character (punctuation) or space, then count the remaining characters with nchar.
df$count <- nchar(sub('.*([[:punct:]]|\\s)', '', df$Names))
df
# S.No Names count
#1 1 Hello Arun 4
#2 2 Hello,Kamal 5
#3 3 Hello Nazi 4
#4 4 Hello:Ganesh 6
#5 5 Hello*vishnu 6
The same thing can be written with dplyr if you prefer that.
library(dplyr)
df %>% mutate(count = nchar(sub('.*([[:punct:]]|\\s)', '', Names)))
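To try this on the question's data, here is a minimal base R sketch. The data frame is rebuilt by hand from the five rows shown in the question (the real df is said to have roughly 1000 rows), and second_word is just an illustrative intermediate name.

```r
# Sample data rebuilt from the question's table (an assumption; the real
# df is described as having roughly 1000 rows).
df <- data.frame(
  S.No  = 1:5,
  Names = c("Hello Arun", "Hello,Kamal", "Hello Nazi",
            "Hello:Ganesh", "Hello*vishnu"),
  stringsAsFactors = FALSE
)

# '.*' is greedy, so sub() removes everything up to and including the LAST
# punctuation mark or whitespace character, leaving only the final word.
second_word <- sub('.*([[:punct:]]|\\s)', '', df$Names)

# Count the characters of what is left.
df$count <- nchar(second_word)
df
```

Because the match runs to the last separator, a name with three or more words would have only its final word counted, and a name ending in punctuation would give a count of 0, so it may be worth checking the full dataset for such rows.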
CodePudding user response:
Another possible solution, based on stringr:
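library(tidyverse)
# Extract the trailing run of letters that follows a space or punctuation
# mark, then count its characters.
df %>%
  mutate(count = str_extract(Names, "(?<=(\\s|[:punct:]))[:alpha:]+$") %>%
           str_count())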
#> S.No Names count
#> 1 1 Hello Arun 4
#> 2 2 Hello,Kamal 5
#> 3 3 Hello Nazi 4
#> 4 4 Hello:Ganesh 6
#> 5 5 Hello*vishnu 6
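The stringr approach can also be checked on a plain character vector, one step at a time. This is a sketch under assumptions: names_vec is a hand-typed copy of the question's five values, the pattern is written with bracketed classes, and str_length is swapped in for str_count (both count characters here).

```r
library(stringr)

# Hand-typed copy of the Names column from the question (an assumption).
names_vec <- c("Hello Arun", "Hello,Kamal", "Hello Nazi",
               "Hello:Ganesh", "Hello*vishnu")

# (?<=...) is a look-behind: the match must be preceded by whitespace or
# punctuation, but that separator is not part of the extracted text.
# [[:alpha:]]+$ then takes the run of letters at the end of the string.
second_word <- str_extract(names_vec, "(?<=[\\s[:punct:]])[[:alpha:]]+$")

# Number of characters in each extracted word.
counts <- str_length(second_word)
counts
```

Unlike the sub() version, str_extract returns NA when a value has no separator followed by trailing letters, so such rows would get a count of NA instead of a number.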