Assume I have the following code (240 occurrences of the same pattern).
<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />
I want to replace <explanation /> with <explanation>02</explanation>, depending on the number before text01.
What I tried so far:
Extracting the numbers from the description line:
list <- inputtext %>%
xml_find_all("//description") %>%
as_list() %>%
unlist()
list <- list[1:240]
results <- c()
for (i in list) {
results <- c(results, str_extract(i, "\\d\\d(?=te)"))
}
Then I put a placeholder in the <explanation /> line, so it is now:
<explanation>NN</explanation>
Then I ran str_replace_all(inputtext, "NN", results), but it returns:
Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
longer object length is not a multiple of shorter object length
But sum(!!str_count(inputtext, "NN")) returns the same value as the length of results.
Any idea where the problem is?
CodePudding user response:
Similar to your code, I also use str_extract to get the digits for the replacement. You can use sapply to go through all 240 of your blocks and combine the str_extract result with paste0 and sub.
For demonstration purposes, I've created two test strings.
Input
strings <- c("<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />",
"<description>some240text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />")
Code
library(stringr)
sapply(strings, function(x) sub("<explanation />", paste0("<explanation>", str_extract(x, "(?<=some)\\d+(?=text)"), "</explanation>"), x))
Output
<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation>02</explanation> <description>some240text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation>240</explanation>
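On the warning itself: str_replace_all() is vectorised over string, pattern and replacement, so results is recycled against every element of inputtext, not only against the elements that contain NN. Below is a minimal sketch of a positional replacement. It assumes inputtext is a character vector with one line per element; the lines vector and the has_placeholder name are made up for illustration.

```r
library(stringr)

# Hypothetical stand-in for inputtext: one line per element, placeholder already inserted
lines <- c("<description>some02text01</description>",
           "<explanation>NN</explanation>",
           "<description>some17text01</description>",
           "<explanation>NN</explanation>")

# Numbers taken from the description lines, in document order
results <- str_extract(lines, "\\d+(?=text)")
results <- results[!is.na(results)]

# Replace only in the elements that contain the placeholder,
# so string and replacement have the same length
has_placeholder <- str_detect(lines, "NN")
lines[has_placeholder] <- str_replace(lines[has_placeholder], "NN", results)
```

This only lines up if every placeholder has exactly one matching description before it, which is the situation described in the question.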
CodePudding user response:
The answer below assumes that you're reading in the text and that each block is stored in the explanation_list variable. (In your question you use the variable name list, which clashes with the R data type list.) The number is extracted into a variable num, and the original and modified text are collected in a data.frame.
# Collects the original block, the extracted number, and the modified block
explanation_frame <- data.frame(line = character(0),
                                num = numeric(0),
                                new_line = character(0))
explanation_list <- list()
explanation_list[[length(explanation_list) + 1]] <- "<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />"
for (i in seq_along(explanation_list)) {
  block <- explanation_list[[i]]
  # Capture the digits that sit between the leading tag/letters and "te"
  num <- gsub("^[[:alpha:][:cntrl:]<>/]+(\\d+)te.*$", "\\1", block)
  new_line <- gsub("<explanation />", paste0("<explanation>", num, "</explanation>"), block)
  if (is.na(as.numeric(num))) num <- ""
  explanation_frame <- rbind(explanation_frame,
                             data.frame(line = block,
                                        num = as.numeric(num),
                                        new_line = new_line))
}
explanation_frame$new_line
CodePudding user response:
You can try this
library(stringr)
# Locate the digits followed by "text"; match.length includes the 4 characters of "text"
t <- gregexpr("\\d+text", x)
number <- str_sub(x, t[[1]], t[[1]] + attr(t[[1]], "match.length") - 5)
sub("<explanation />" , paste0("<explanation>" , number , "</explanation>") , x)
- output
[1] "<description>some02text01</description>\n
<class>selzoom</class>\n<title>more text</title>\n
<explanation>02</explanation>"
- data
x <- "<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />"
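The extract-then-paste steps can also be folded into a single sub() call with capture groups. This is a sketch in base R, assuming every block follows the some<digits>text layout shown above; the fill_explanation name is hypothetical.

```r
# Hypothetical helper: copy the digits after "some" into the empty explanation tag.
# (?s) lets "." match newlines; group 1 is everything up to the tag, group 2 the digits.
fill_explanation <- function(blocks) {
  sub("(?s)(some(\\d+)text.*?)<explanation />",
      "\\1<explanation>\\2</explanation>",
      blocks, perl = TRUE)
}

x <- "<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />"

filled <- fill_explanation(x)
```

Because sub() is vectorised over its input, the same call should work on a character vector holding all 240 blocks.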