Split date data into 3 separate columns in R

Time:09-22

I need to split the 'fiction_works' column (see picture) into 3 separate columns: 'work', 'author' and 'year'.

picture: https://i.stack.imgur.com/nRat5.jpg

I tried the following, but only managed to separate 'work' from 'author'. I do not really understand how I can separate out the year in brackets.

separated <- separate(total, col = 'fiction_works', into = c('work', 'author'), sep = ",")
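A sketch of how this attempt could be extended, assuming tidyr is installed and using a small example data frame with the same rows as the answers below: `separate()` treats `sep` as a regular expression, so splitting on either " (" or "), " yields all three pieces in one call. This assumes no title itself contains " (" or "), ".

```r
library(tidyr)

# Example rows shaped like the 'fiction_works' column in the question
df <- data.frame(fiction_works = c("The A.B.C Murders (1936), Agatha Christie",
                                   "A ton image (1998), Louise L. Lambrichs",
                                   "About A Boy (1998), Nick Horriby"))

# sep is a regular expression: split at " (" or at "), "
# convert = TRUE turns the year column into integers
separated <- separate(df, col = fiction_works,
                      into = c("work", "year", "author"),
                      sep = " \\(|\\), ", convert = TRUE)
separated
```

Note that the columns come out in the order they appear in the string (work, year, author), so `into` has to list them in that order.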

I'm doing my best to improve my R skills but cannot figure this one out. Any help is much appreciated. Thanks in advance.

Answer:

This can be done with dplyr's mutate() and stringr's str_extract(), using regular expressions with lookarounds to pick out each piece.


Reproducible Data

library(tidyverse)

df <- data.frame(fiction_works = c("The A.B.C Murders (1936), Agatha Christie",
                                   "A ton image (1998), Louise L. Lambrichs",
                                   "About A Boy (1998), Nick Horriby"))

Solution

df2 <- df %>% 
  mutate(Work = str_extract(string = fiction_works, pattern = ".+(?=\\s\\()"),  # text before " ("
         Author = str_extract(string = fiction_works, pattern = "(?<=,\\s).+"), # text after ", "
         Year = str_extract(string = fiction_works, pattern = "[0-9]+")) %>%    # first run of digits
  select(Work:Year)


df2
               Work              Author Year
1 The A.B.C Murders     Agatha Christie 1936
2       A ton image Louise L. Lambrichs 1998
3       About A Boy        Nick Horriby 1998

The Year pattern takes the first run of digits in the string, so a title that contains a number would return that number instead of the year. I couldn't tell from the posted image whether your data has such titles.
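One possible fix, sketched under the assumption that the year is always the only number inside the round brackets: require the digits to sit between "(" and ")" using a lookbehind and a lookahead. The second input string below is a hypothetical example chosen only because its title contains a number.

```r
library(stringr)

# Hypothetical input: the second title contains a number
x <- c("The A.B.C Murders (1936), Agatha Christie",
       "Fahrenheit 451 (1953), Ray Bradbury")

# First run of digits anywhere in the string: picks up "451" for the second row
naive <- str_extract(x, "[0-9]+")

# Only digits directly between "(" and ")": picks up the bracketed year
anchored <- str_extract(x, "(?<=\\()[0-9]+(?=\\))")

naive
anchored
```

The lookarounds check for the brackets without including them in the match, so the result is just the digits.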

Answer:

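With tidyr's extract(), each capture group in the regex becomes one column: the title before " (", the digits in brackets, and everything after "), ".

library(tidyverse)

df %>%
  extract(fiction_works, c("work", "year", "author"), "(.*?) [(](\\d+)[), ]+(.*)")

               work year              author
1 The A.B.C Murders 1936     Agatha Christie
2       A ton image 1998 Louise L. Lambrichs
3       About A Boy 1998        Nick Horriby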
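On tidyr 1.3.0 or later, the same split can also be written with `separate_wider_regex()`, which takes the pattern as a vector of pieces. A sketch, assuming that tidyr version is installed: named pieces become output columns, while unnamed pieces (the brackets and the comma) must match but are dropped.

```r
library(tidyr)  # separate_wider_regex() needs tidyr >= 1.3.0

df <- data.frame(fiction_works = c("The A.B.C Murders (1936), Agatha Christie",
                                   "A ton image (1998), Louise L. Lambrichs",
                                   "About A Boy (1998), Nick Horriby"))

# work: shortest text up to " (";  year: the digits;  author: the rest after "), "
out <- separate_wider_regex(
  df,
  fiction_works,
  patterns = c(work = ".*?", " \\(", year = "\\d+", "\\), ", author = ".*")
)
out
```

Splitting the pattern into named pieces keeps each column's regex next to its name, which can be easier to read than one long pattern with numbered groups.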

Answer:

Using base R

read.csv(text = sub("\\)", "", sub("\\s*\\(", ",", df$fiction_works)),
    header = FALSE, col.names = c("work", "year", "author"), strip.white = TRUE)

Output:

               work year              author
1 The A.B.C Murders 1936     Agatha Christie
2       A ton image 1998 Louise L. Lambrichs
3       About A Boy 1998        Nick Horriby
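A base R alternative that avoids the CSV round trip, sketched with `utils::strcapture()`: one regex is applied to each string, and the `proto` data frame sets the column names and types. Because the split is done by the regex rather than by commas, a comma inside a title or author name would not shift the columns. The pattern assumes every entry has the form "title (digits), author".

```r
# Base R only: strcapture() fills one column per capture group,
# converting each to the type of the matching column in 'proto'
df <- data.frame(fiction_works = c("The A.B.C Murders (1936), Agatha Christie",
                                   "A ton image (1998), Louise L. Lambrichs",
                                   "About A Boy (1998), Nick Horriby"))

parts <- utils::strcapture(
  "^(.*) \\(([0-9]+)\\), (.*)$",
  df$fiction_works,
  proto = data.frame(work = character(), year = integer(), author = character())
)
parts
```

Here the year comes back as an integer column because `proto` declares it as `integer()`.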