How to extract specific text from a vector whose elements are separated by multiple commas in R

This is my first post and I am relatively new to the R world, so I hope I am posting my question appropriately. I searched for this but could not come up with anything efficient.

I have a column with the following structure:

df$col1 <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport", "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter")

I would like to create a new column built from certain elements of col1.

If a cell has 2 commas, I would like to extract the element between the first and the second comma and write it to the corresponding cell of the new column. If a cell has 3 commas, I would like to extract the element between the second and the third comma and write it to the corresponding cell of the new column, and so on.

As the col1 example shows, the cells are not ordered by their number of commas, so a three-comma cell may occur again further down. I need to account for that too.

Could you please help me in this regard?

Your help is much appreciated in advance!

CodePudding user response：

What about the following?

library(tidyverse)

df %>% 
 mutate(col2 = str_split(col1, "\\s*,\\s*") %>%
   map_chr(~ if (length(.x) %in% 1:2) {.x[length(.x)]} 
      else {.x[length(.x) - 1]}))

#>                              col1   col2
#> 1         book, pencil,eraser,pen eraser
#> 2                        book,pen    pen
#> 3                 music,art,sport    art
#> 4 apple, banana, kiwi, watermelon   kiwi
#> 5            Earth, Mars, Jupiter   Mars

CodePudding user response：

Here's a straightforward regex solution to extract the penultimate word into a new column:

df %>%
  mutate(col2 = str_extract(col1, "\\w+(?=,[^,]+$)"))
                             col1   col2
1         book, pencil,eraser,pen eraser
2                        book,pen   book
3                 music,art,sport    art
4 apple, banana, kiwi, watermelon   kiwi
5            Earth, Mars, Jupiter   Mars

Data:

df <- data.frame(col1 =c("book,pencil,eraser,pen", "book,pen", "music,art,sport"))
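
For readers who want to see how the lookahead idea in this answer works without stringr, here is a minimal base R sketch. It assumes the pattern `\\w+(?=,[^,]+$)` and the five sample strings from the question; the names `x`, `pattern`, `m` and `second_to_last` are only illustrative.

```r
# Sample strings from the question (assumed to be a plain character vector)
x <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport",
       "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter")

# \\w+          one or more word characters: the word we want to keep
# (?=,[^,]+$)   lookahead: the word must be followed by a comma and then
#               only non-comma characters up to the end of the string,
#               so that comma is the last comma in the string
pattern <- "\\w+(?=,[^,]+$)"

# perl = TRUE switches to the PCRE engine, which supports lookahead
m <- regexpr(pattern, x, perl = TRUE)

# regexpr() returns -1 where nothing matched; keep NA for those rows
second_to_last <- rep(NA_character_, length(x))
second_to_last[m > 0] <- regmatches(x, m)

second_to_last
```

Because the lookahead only checks what follows the word, the comma and the last element are not part of the extracted match. A string with no comma at all would come back as NA in this sketch.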

CodePudding user response：

You could use strsplit. In this case n is 3.

df$col1 <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport")
strsplit(df$col1, ',')[[1]][3]

[1] "eraser"
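
The call above only looks at the first string. As a rough sketch, and assuming you want the same fixed position n from every row, the split can be applied to the whole vector at once; `x`, `n` and `nth` are illustrative names, not part of the original answer.

```r
x <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport")
n <- 3  # position to extract, as in the answer above

# strsplit() returns one character vector per input string;
# indexing past the end of a vector gives NA, and trimws() removes
# the spaces left over after splitting on a bare comma
nth <- vapply(strsplit(x, ",", fixed = TRUE),
              function(parts) trimws(parts[n]),
              character(1))

nth
```

Rows with fewer than n elements end up as NA here, which makes it easy to spot cells that are too short.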

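EDIT: If I understand your question correctly, you could do something like this:

df <- data.frame(col1 = c("book,pencil,eraser,pen", "book,pen", "music,art,sport"), stringsAsFactors = FALSE)
# Pick the element whose position equals the number of commas.
# vapply() returns a character vector, so col2 is not a list column.
df$col2 <- vapply(df$col1, FUN = function(x) {strsplit(x, ",")[[1]][stringr::str_count(x, ",")]}, FUN.VALUE = character(1), USE.NAMES = FALSE)
df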
                    col1   col2
1 book,pencil,eraser,pen eraser
2               book,pen   book
3        music,art,sport    art
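
The rule in this answer (take the element whose position equals the number of commas) can also be sketched in base R only, with the stray spaces from the question's data trimmed. This is a sketch under assumptions: `pick_by_comma_count` is a hypothetical helper name, strings without a comma are mapped to NA, and no string ends in a trailing comma.

```r
df <- data.frame(
  col1 = c("book, pencil,eraser,pen", "book,pen", "music,art,sport",
           "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one string on commas, trim whitespace, and
# return the element at position (number of commas)
pick_by_comma_count <- function(x) {
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  n_commas <- length(parts) - 1          # assumes no trailing comma
  if (n_commas < 1) return(NA_character_) # nothing to pick without a comma
  parts[n_commas]
}

# vapply() guarantees exactly one string per row
df$col2 <- vapply(df$col1, pick_by_comma_count, character(1),
                  USE.NAMES = FALSE)
df
```

Since each row is handled on its own, it does not matter in which order cells with two or three commas appear.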