I need to split the 'fiction_works' column (see picture) into three separate columns: 'work', 'author', and 'year'.
picture: https://i.stack.imgur.com/nRat5.jpg
I tried the following, but only managed to separate 'work' from 'author'. I do not really understand how I can separate the year in brackets.
separated <- separate(total, col = "fiction_works", into = c("work", "author"), sep = ",")
I'm doing my best to improve my R skills but cannot figure this one out. Any help is much appreciated. Thanks in advance.
CodePudding user response:
This can be done easily using dplyr and stringr's str_extract with regular expressions.
Reproducible Data
library(tidyverse)
df <- data.frame(fiction_works = c("The A.B.C Murders (1936), Agatha Christie",
"A ton image (1998), Louise L. Lambrichs",
"About A Boy (1998), Nick Horriby"))
Solution
df2 <- df %>%
  mutate(Work = str_extract(string = fiction_works, pattern = ".+(?=\\s\\()"),   # text before " ("
         Author = str_extract(string = fiction_works, pattern = "(?<=,\\s).+"),  # text after the first ", "
         Year = str_extract(string = fiction_works, pattern = "[0-9]+")) %>%     # first run of digits
  select(Work:Year)
df2
               Work              Author Year
1 The A.B.C Murders     Agatha Christie 1936
2       A ton image Louise L. Lambrichs 1998
3       About A Boy        Nick Horriby 1998
You might run into issues if any titles contain numbers, because the Year pattern picks up the first run of digits in the string. I couldn't tell from the posted image whether that applies to your data.
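If titles with digits do turn up, one possible workaround is to accept only digits that sit directly inside the parentheses. The sketch below assumes the year is always four digits wrapped in "(" and ")"; the second row is an illustrative title that is not in the original data.

```r
library(stringr)

# Second row is an illustrative example with digits in the title (not from the question)
works <- c("The A.B.C Murders (1936), Agatha Christie",
           "Fahrenheit 451 (1953), Ray Bradbury")

# Lookbehind for "(" and lookahead for ")" so that "451" in the title is skipped
years <- str_extract(works, "(?<=\\()[0-9]{4}(?=\\))")
years
```

The same pattern could replace "[0-9]+" in the Year line of the mutate call above.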
CodePudding user response:
library(tidyverse)
df %>%
  extract(fiction_works, c("work", "year", "author"), "(.*?) [(](\\d+)[), ]+(.*)")
               work year              author
1 The A.B.C Murders 1936     Agatha Christie
2       A ton image 1998 Louise L. Lambrichs
3       About A Boy 1998        Nick Horriby
CodePudding user response:
Using base R
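# Turn " (" into "," and drop ")" so each string becomes a CSV row;
# strip.white = TRUE removes the space left in front of the author
read.csv(text = sub("\\)", "", sub("\\s*\\(", ",", df$fiction_works)),
         header = FALSE, col.names = c("work", "year", "author"),
         strip.white = TRUE)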
-output
               work year              author
1 The A.B.C Murders 1936     Agatha Christie
2       A ton image 1998 Louise L. Lambrichs
3       About A Boy 1998        Nick Horriby
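The read.csv approach splits on every comma, so a title or author name containing a comma would likely produce extra fields. A rough base R alternative is to capture the three parts with a single pattern via regexec and regmatches. This is only a sketch, and it assumes every row follows the "work (year), author" layout; rows that do not match would need separate handling.

```r
works <- c("The A.B.C Murders (1936), Agatha Christie",
           "A ton image (1998), Louise L. Lambrichs",
           "About A Boy (1998), Nick Horriby")

# One capture group each for work, year and author
pattern <- "^(.*) \\(([0-9]{4})\\), (.*)$"
parts <- regmatches(works, regexec(pattern, works))

# Each element of parts is c(full match, work, year, author); drop the full match
result <- as.data.frame(do.call(rbind, lapply(parts, `[`, -1)))
names(result) <- c("work", "year", "author")
result$year <- as.integer(result$year)
result
```

Here works is a plain character vector standing in for df$fiction_works.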