I need to create a dataframe from a .csv file containing author references:
refs <- data.frame(reference = "Harris P R, Harris D L (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22.")
Essentially I want to pull out the coauthors, year of publication, and article title.
refs$author[1]
Harris P R, Harris D L
refs$year[1]
1983
refs$title[1]
Training for the Metaindustrial Work Culture
At this stage, I do not need the publication source, as I can get this via rscopus.
I can extract authors and years with this code:
library(tidyverse)
refs <- refs %>%
  mutate(author = sub("\\(.*", "", reference),
         year = str_extract(reference, "\\d{4}"))
However, I need help extracting the title (the substring between the two periods that follow the bracketed date).
Answer:
This regex works for your minimal example:
refs <- data.frame(reference = "Harris P R, Harris D L (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22.")
sub("[^.]+\\.([^.]+)\\..*", "\\1", refs$reference)
#> [1] " Training for the Metaindustrial Work Culture"
Explanation:
"[^.]+\\.([^.]+)\\..*"
- the whole regex
[^.]+\\.
- one or more characters that aren't a period, followed by a period (i.e. everything up to and including the first period)
([^.]+)\\..*
- "(" starts capturing 'group 1', which contains one or more characters that aren't a period ([^.]+); ")" stops capturing group 1 at the next period "\\." (group 1 now = the title); ".*" then matches everything else
Then, in the sub command, the replacement "\\1" replaces the whole match with group 1, leaving only the title.
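Note that the captured title in the output above starts with a space, because the space after the first period falls inside group 1. A minimal sketch of two ways to drop it, assuming the same example string (the variable names here are only illustrative):

```r
# Same example string as in the answer above.
reference <- "Harris P R, Harris D L (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22."

# Option 1: keep the original pattern and strip surrounding whitespace afterwards.
title_trimmed <- trimws(sub("[^.]+\\.([^.]+)\\..*", "\\1", reference))

# Option 2: consume the whitespace after the first period inside the pattern,
# so that it never enters capture group 1.
title_inline <- sub("[^.]+\\.[[:space:]]*([^.]+)\\..*", "\\1", reference)

print(title_trimmed)
print(title_inline)
```

Both variants should give the title without the leading space; trimws() is probably the simpler choice if the pattern is likely to change later.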
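Unfortunately, you may run into problems with your 'real world' data: the pattern assumes the first period in the string is the one after the bracketed date, so author initials written with periods, or a title that itself contains a period, will break it. Using rscopus to extract the title might be a better solution to avoid unforeseen errors.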
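One way to make the pattern a little less fragile is to anchor it on the bracketed year instead of on the first period in the string. The following is only a sketch: the second reference string is an invented variant with periods in the author initials, and a title that itself contains a period would still be cut short.

```r
library(stringr)

# The original example plus a hypothetical variant whose author initials
# contain periods (invented here purely for illustration).
references <- c(
  "Harris P R, Harris D L (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22.",
  "Harris P. R., Harris D. L. (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22."
)

# Anchor on "(YYYY)." and capture everything up to the next period.
# str_match() returns a matrix: column 1 is the full match, column 2 is group 1.
titles <- str_match(references, "\\(\\d{4}\\)\\.\\s*([^.]+)\\.")[, 2]

# References that do not fit the pattern come back as NA, so they are easy to spot.
print(titles)
```

Because str_match() returns NA for non-matching rows rather than the unchanged input (which is what sub() does), malformed references can be filtered out and checked by hand.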
Using tidyverse functions:
library(tidyverse)
refs <- data.frame(reference = "Harris P R, Harris D L (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22.")
refs %>%
  mutate(author = sub("\\(.*", "", reference),
         year = str_extract(reference, "\\d{4}"),
         title = sub("[^.]+\\.([^.]+)\\..*", "\\1", reference))
#> reference
#> 1 Harris P R, Harris D L (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22.
#> author year title
#> 1 Harris P R, Harris D L 1983 Training for the Metaindustrial Work Culture
Created on 2022-12-05 with reprex v2.0.2
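As an alternative to three separate calls inside mutate(), all three fields could be pulled out in a single step with tidyr::extract(), using one capture group per new column. This is a sketch under the same assumptions as above (a bracketed four-digit year followed by a period, and a title without internal periods); the name parsed is only illustrative.

```r
library(tidyr)

refs <- data.frame(reference = "Harris P R, Harris D L (1983). Training for the Metaindustrial Work Culture. Journal of European Industrial Training, 7(7): 22.")

# One capture group per output column: authors, four-digit year, title.
# remove = FALSE keeps the original reference column alongside the new ones.
parsed <- tidyr::extract(
  refs, reference,
  into = c("author", "year", "title"),
  regex = "^(.*?)\\s*\\((\\d{4})\\)\\.\\s*([^.]+)\\.",
  remove = FALSE
)

print(parsed)
```

The year column is returned as character here; passing convert = TRUE would ask extract() to convert it to a number where possible.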