I'm using Zotero to create a BibTeX list of references from PDFs, and it uses { } around words whose case must be preserved.
title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},
However, some people in my team use Mendeley, which doesn't seem to know about this rule of BibTeX format, and the { } still appear in their titles after importing from the BibTeX file I've sent.
So I want to write a small script (in R) to remove the { } inside the main { } of the title (and other fields), so that the above line would, in the modified file, become as below.
title = {Novel breeding habitat, oviposition microhabitat, and parental care in Bokermannohyla caramaschii (Anura: Hylidae) in southeastern Brazil},
I've tried a lot, but nothing works. What is the Regex to do that?
CodePudding user response:
You can convert matches of the regular expression
(?<!^title = ){|}(?!,$)
to empty strings.
The regular expression can be broken down as follows. (I've shown spaces as character classes containing a space so that they are visible to the reader.)
(?<! # begin a negative lookbehind
^ # match the start of the string
title[ ]=[ ] # match 'title = '
) # end negative lookbehind
{ # match '{'
| # or
} # match '}'
(?! # begin a negative lookahead
,$ # match a comma at the end of the string
) # end a negative lookahead
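In R, this pattern needs `perl = TRUE`, because lookbehind and lookahead are not supported by the default regex engine. The following is a minimal sketch, assuming each field is a single string that starts exactly with `title = ` (no leading whitespace) and ends with `},`. The variable names `txt` and `cleaned` are illustrative, and the braces are escaped to be safe.

```r
txt <- "title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},"

# Remove every "{" not directly preceded by "title = " at the start of the
# string, and every "}" not directly followed by a comma at the end.
cleaned <- gsub("(?<!^title = )\\{|\\}(?!,$)", "", txt, perl = TRUE)
cleaned
```

If the exported file indents its fields or uses other field names, the lookbehind would have to be adapted accordingly.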
CodePudding user response:
Here's a parser that removes just { and }, and only when inside a complete set of { ... }. It doesn't pretend to be fast or efficient, but with reasonable-length strings, you shouldn't notice any lag.
func <- function(S) {
  spl <- strsplit(S, "")[[1]]
  out <- character(0)
  inbrace <- 0L  # current brace nesting depth
  for (i in seq_along(spl)) {
    ch <- spl[i]
    if (ch == "{") {
      # keep only the outermost opening brace
      if (inbrace < 1L) out <- c(out, ch)
      inbrace <- inbrace + 1L
    } else if (ch == "}") {
      if (inbrace == 0L) {
        stop("unmatched close brace at: ", i)
      } else if (inbrace == 1L) {
        # keep only the brace that closes the outermost group
        out <- c(out, ch)
      }
      inbrace <- max(0L, inbrace - 1L)
    } else out <- c(out, ch)
  }
  if (inbrace != 0L) stop("finished missing ", inbrace, " close-brace(s)")
  paste(out, collapse = "")
}
Demo:
func('title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},')
# [1] "title = {Novel breeding habitat, oviposition microhabitat, and parental care in Bokermannohyla caramaschii (Anura: Hylidae) in southeastern Brazil},"
It tries to be very specific, failing if either an unmatched } occurs or if the input ends while a { remains unmatched.
func('title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil},')
# Error in func("title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil},") :
# finished missing 1 close-brace(s)
func('title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla}} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},')
# Error in func("title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla}} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},") :
# unmatched close brace at: 156
CodePudding user response:
Here's a strategy that works if we can be sure that the "%%%" and "###" strings are not going to be present in the titles. First we change the first "{" to "%%%" and the last "}" to "###". Then remove all "{" and "}", and then put the first "{" and last "}" back in.
txt <- "title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},"
txt2 <- sub("(^[^{]+)(\\{)", "\\1%%%", txt) # placeholder for first "{"
txt3 <- sub("(\\})([^}]*$)", "###\\2", txt2) # placeholder for last "}"
txt4 <- gsub("\\{|\\}", "", txt3) # remove the rest
txt5 <- sub("%%%", "{", txt4) # put the leading and trailing ones back
txt6 <- sub("###", "}", txt5)
txt6
# [1] "title = {Novel breeding habitat, oviposition microhabitat, and parental care in Bokermannohyla caramaschii (Anura: Hylidae) in southeastern Brazil},"
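Since the question also mentions other fields, the same "keep the outer pair, drop the rest" idea can be applied to every field line of a file without placeholders. This is only a sketch under stated assumptions: each field sits on a single line of the form `name = {...}` with an optional trailing comma, and the function name `strip_inner_braces` and the sample entry are hypothetical. Fields that span several lines are left untouched.

```r
strip_inner_braces <- function(lines) {
  # head: "name = {"   body: field content   tail: closing "}" plus optional comma
  pattern <- "^(\\s*[A-Za-z]+\\s*=\\s*\\{)(.*)(\\},?\\s*)$"
  is_field <- grepl(pattern, lines, perl = TRUE)
  head <- sub(pattern, "\\1", lines[is_field], perl = TRUE)
  body <- sub(pattern, "\\2", lines[is_field], perl = TRUE)
  tail <- sub(pattern, "\\3", lines[is_field], perl = TRUE)
  # Drop every brace inside the body; the outer pair lives in head and tail.
  lines[is_field] <- paste0(head, gsub("[{}]", "", body), tail)
  lines
}

# Hypothetical entry, for illustration only
bib <- c(
  "@article{smith2020,",
  "  title = {Novel {Bokermannohyla} in {Brazil}},",
  "  year = {2020}",
  "}"
)
result <- strip_inner_braces(bib)
result
```

For a real file, the lines could be read with `readLines()` and written back with `writeLines()`. Lines such as the `@article{...` header and the closing `}` do not match the pattern and pass through unchanged.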