I have a data which is like this:
data <-
"--------------------------------------------------------\nJrId:
1\nJournalTitle: AADE editors' journal\nMedAbbr: AADE Ed J\nISSN
(Print): 0160-6999\nISSN (Online): \nIsoAbbr: AADE Ed J\nNlmId:
7708172\n--------------------------------------------------------\nJrId:
2\nJournalTitle: AANA journal\nMedAbbr: AANA J\nISSN (Print):
0094-6354\nISSN (Online): 2162-5239\nIsoAbbr: AANA J\nNlmId:
0431420\n--------------------------------------------------------\nJrId:
3\nJournalTitle: AARN news letter\nMedAbbr: AARN News Lett\nISSN
(Print): 0001-0197\nISSN (Online): \nIsoAbbr: AARN News Lett\nNlmId:
1251052\n--------------------------------------------------------"
Is there a way to change it to something like this?
# A tibble: 3 × 7
JrID JournalTitle MedAbbr `ISSN (Print)` `ISSN (Online)` IsoAbbr NlmId
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
1 1 AADE editors' journal AADE Ed J 0160-6999 AADE Ed J 7708172
2 2 AANA journal AANA J 0094-6354 2162-5239 AANA J 431420
3 3 AARN news letter AARN News Lett 0001-0197 AARN News Lett 1251052
I made this example manually but I want to do it automatically in r
CodePudding user response:
Using a bit of tidyr
and dplyr
you could do:
library(tidyr)
library(dplyr)
raw_text <- readLines(fil)
data.frame(raw = raw_text) |>
# Get rid of empty rows
filter(raw != "") |>
# Add an id
mutate(id = cumsum(grepl("^- $", raw))) |>
# Get rid of separator lines
filter(!grepl("^- $", raw)) |>
separate(raw, into = c("category", "value"), sep = ":\\s?") |>
pivot_wider(names_from = category, values_from = value) |>
select(-id)
#> # A tibble: 3 × 7
#> JrId JournalTitle MedAbbr ISSN (Print…¹ ISSN …² IsoAbbr NlmId
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 AADE editors' journal AADE Ed J 0160-6999 "" AADE E… 7708…
#> 2 2 AANA journal AANA J 0094-6354 "2162-… AANA J 0431…
#> 3 3 AARN news letter AARN News Lett 0001-0197 "" AARN N… 1251…
#> # … with abbreviated variable names ¹`ISSN (Print)`, ²`ISSN (Online)`
DATA
text <- "\n--------------------------------------------------------\nJrId: 1\nJournalTitle: AADE editors' journal\nMedAbbr: AADE Ed J\nISSN (Print): 0160-6999\nISSN (Online): \nIsoAbbr: AADE Ed J\nNlmId: 7708172\n--------------------------------------------------------\nJrId: 2\nJournalTitle: AANA journal\nMedAbbr: AANA J\nISSN (Print): 0094-6354\nISSN (Online): 2162-5239\nIsoAbbr: AANA J\nNlmId: 0431420\n--------------------------------------------------------\nJrId: 3\nJournalTitle: AARN news letter\nMedAbbr: AARN News Lett\nISSN (Print): 0001-0197\nISSN (Online): \nIsoAbbr: AARN News Lett\nNlmId: 1251052\n--------------------------------------------------------\n"
# Create example file
fil <- tempfile(".txt")
writeLines(text, fil)
CodePudding user response:
Using {unglue} :
items <- strsplit(data, "\n?--------------------------------------------------------\n?")[[1]][-1]
items <- gsub("\n", " ", items)
pattern <- "JrId: {JrId} JournalTitle: {JournalTitle} MedAbbr: {MedAbbr} ISSN (Print): {ISSN_Print} ISSN (Online): {ISSN_Online} IsoAbbr: {IsoAbbr} NlmId: {nNlmId}"
unglue::unglue_data(items, pattern)
#> JrId JournalTitle MedAbbr ISSN_Print ISSN_Online
#> 1 1 AADE editors' journal AADE Ed J 0160-6999
#> 2 2 AANA journal AANA J 0094-6354 2162-5239
#> 3 3 AARN news letter AARN News Lett 0001-0197
#> IsoAbbr nNlmId
#> 1 AADE Ed J 7708172
#> 2 AANA J 0431420
#> 3 AARN News Lett 1251052