This is my first post and I am relatively new to R, so I hope my question is appropriate for this site. I searched for this but could not come up with anything efficient.
I have a column with the following structure:
df$col1 <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport", "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter")
I would like to create a new column built from certain elements of col1.
If the first cell has 2 commas, I would like to extract the element between the first and second commas and write it to the first cell of the new column. If the next cell has 3 commas, I would like to extract the element between the second and third commas and write it to the second cell of the new column, and so on.
As the example of col1 shows, the cells are not ordered by number of commas, so a cell with three commas might occur again further down. I need to account for that too.
Could you please help me in this regard?
Your help is much appreciated in advance!
CodePudding user response:
What about the following?
library(tidyverse)
df %>%
  mutate(col2 = str_split(col1, "\\s*,\\s*") %>%
           map_chr(~ if (length(.x) %in% 1:2) {.x[length(.x)]}
                   else {.x[length(.x) - 1]}))
#>                              col1   col2
#> 1         book, pencil,eraser,pen eraser
#> 2                        book,pen    pen
#> 3                 music,art,sport    art
#> 4 apple, banana, kiwi, watermelon   kiwi
#> 5            Earth, Mars, Jupiter   Mars
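If you would rather not load the tidyverse, the same split-and-pick logic can be sketched in base R. This is only a sketch: second_to_last is a hypothetical helper name, the rule for cells with one or two items (return the last item) is carried over from the code above, and returning NA for an empty cell is my own assumption.

```r
# Example data copied from the question
df <- data.frame(
  col1 = c("book, pencil,eraser,pen", "book,pen", "music,art,sport",
           "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one cell on commas, trim whitespace, and return
# the second-to-last item (or the last item when there are only 1 or 2 items)
second_to_last <- function(x) {
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  n <- length(parts)
  if (n == 0) return(NA_character_)  # assumption: empty cell becomes NA
  if (n <= 2) parts[n] else parts[n - 1]
}

# vapply() guarantees a plain character vector with one value per row
df$col2 <- vapply(df$col1, second_to_last, character(1), USE.NAMES = FALSE)
df$col2
# [1] "eraser" "pen"    "art"    "kiwi"   "Mars"
```

If a two-item cell such as "book,pen" should give "book" instead, change the condition to n == 1.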
CodePudding user response:
Here's a straightforward regex solution that extracts the penultimate word into a new column:
library(dplyr)
library(stringr)
df %>%
  mutate(col2 = str_extract(col1, "\\w+(?=,[^,]+$)"))
                             col1   col2
1         book, pencil,eraser,pen eraser
2                        book,pen   book
3                 music,art,sport    art
4 apple, banana, kiwi, watermelon   kiwi
5            Earth, Mars, Jupiter   Mars
Data:
df <- data.frame(col1 =c("book,pencil,eraser,pen", "book,pen", "music,art,sport"))
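For reference, a lookahead pattern of this kind also works in base R with perl = TRUE. A minimal sketch, assuming that cells without a comma should become NA; the "solo" value is made up for illustration and is not in the question's data.

```r
x <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport",
       "apple, banana, kiwi, watermelon", "Earth, Mars, Jupiter", "solo")

# \\w+ is a run of word characters; the lookahead (?=,[^,]+$) requires that
# it is followed by a comma and then exactly one more item up to the end
pattern <- "\\w+(?=,[^,]+$)"

m <- regexpr(pattern, x, perl = TRUE)   # -1 where nothing matches
penultimate <- rep(NA_character_, length(x))
penultimate[m > 0] <- regmatches(x, m)  # regmatches() skips non-matches
penultimate
# [1] "eraser" "book"   "art"    "kiwi"   "Mars"   NA
```

Because \\w+ only matches word characters, a multi-word item such as "New York" would be cut down to its last word, so splitting on commas may be the safer route for that kind of data.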
CodePudding user response:
You could use strsplit and take the n-th element of the result. In this case n is 3.
df$col1 <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport")
strsplit(df$col1, ',')[[1]][3]
[1] "eraser"
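EDIT: If I understand your question correctly, you could do something like this:
df <- data.frame(col1 = c("book,pencil,eraser,pen", "book,pen", "music,art,sport"), stringsAsFactors = FALSE)
# A cell with n commas has n + 1 items, so item n is the second-to-last one.
# sapply() returns a character vector; lapply() would make col2 a list column.
df$col2 <- sapply(df$col1, function(x) strsplit(x, ",")[[1]][stringr::str_count(x, ",")], USE.NAMES = FALSE)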
df
                    col1   col2
1 book,pencil,eraser,pen eraser
2               book,pen   book
3        music,art,sport    art
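The indexing works because a cell with n commas has n + 1 items, so item n is the second-to-last one. Here is a base R sketch of the same idea, assuming you also want surrounding whitespace trimmed and NA for cells with no comma. pick_nth is a hypothetical helper name and "solo" is a made-up value.

```r
x <- c("book, pencil,eraser,pen", "book,pen", "music,art,sport", "solo")

# Split every cell on commas (one character vector per cell)
parts <- strsplit(x, ",", fixed = TRUE)

# Count the commas in every cell; cells without a comma give 0
n_commas <- lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))

# Hypothetical helper: take item n of one cell, or NA when there is no comma
pick_nth <- function(p, n) if (n >= 1L) trimws(p[n]) else NA_character_

col2 <- mapply(pick_nth, parts, n_commas, USE.NAMES = FALSE)
col2
# [1] "eraser" "book"   "art"    NA
```

Indexing with 0 returns character(0) in R, which is why the zero-comma case is handled explicitly here.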