Replacing strings in R based on numbers before string

Time:08-03

Assume I have the following code (240 occurrences of the same pattern).

<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />

I want to replace <explanation /> with <explanation>02</explanation>, where 02 is the number that precedes "text01" in the description line.

What I tried so far:
Extracting the numbers from the description line:

list <- inputtext %>%
  xml_find_all("//description") %>%
  as_list() %>%
  unlist()

list <- list[1:240]

results <- c()
for (i in list) {
  results <- c(results, str_extract(i, "\\d\\d(?=te)"))
}

Put a placeholder in the <explanation /> line, so it's now:
<explanation>NN</explanation>

Then I did str_replace_all(inputtext, "NN", results). But it returns

Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement),  :
  longer object length is not a multiple of shorter object length

But when I do sum(!!str_count(inputtext, "NN")) it's the same value as the length of results.

Any idea where the problem is?
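A note on the warning, as a hedged sketch rather than a diagnosis of the exact data: str_replace_all() recycles its replacement argument across the elements of the input vector, not across successive matches, so a vector of 240 replacements is paired with the input elements by position, and the warning appears when the two lengths do not divide evenly. One way to fill successive placeholders with successive values in base R is the regmatches<- replacement function. The two-record inputtext below is a hypothetical stand-in for the real file.

```r
# Hypothetical miniature of the placeholder approach: one string, two "NN" markers.
inputtext <- paste(
  "<description>some02text01</description>",
  "<explanation>NN</explanation>",
  "<description>some240text01</description>",
  "<explanation>NN</explanation>",
  sep = "\n"
)
results <- c("02", "240")

# Locate every "NN" and assign the replacements match by match.
# gregexpr() returns one element per input string, so the values are
# wrapped in a list of the same length.
filled <- inputtext
hits <- gregexpr("NN", filled, fixed = TRUE)
regmatches(filled, hits) <- list(results)
cat(filled, "\n")
```

This assumes the number of "NN" markers equals length(results) and that "NN" does not occur anywhere else in the text.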

CodePudding user response:

Similar to your code, I also use str_extract to get the digits for replacement. You can use sapply to go through all of your 240 codes and incorporate the str_extract result in paste0 and sub.

For demonstration purposes, I've created two test strings.

Input

strings <- c("<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />", 
"<description>some240text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />")

Code

library(stringr)

# For each record, extract the digits between "some" and "text" and insert them
sapply(strings, function(x) sub("<explanation />", paste0("<explanation>", str_extract(x, "(?<=some)\\d+(?=text)"), "</explanation>"), x))

Output

<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation>02</explanation>

<description>some240text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation>240</explanation>
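If all records sit in one long string instead of one string per record, a single regular expression with back-references may be enough. The sketch below uses base R only. The variable names doc and filled are made up for illustration, and it assumes every empty <explanation /> is preceded by a "some<digits>text" description in the same record.

```r
# Hypothetical input: all records in a single string (two shown here).
doc <- paste(
  "<description>some02text01</description>",
  "<class>selzoom</class>",
  "<title>more text</title>",
  "<explanation />",
  "<description>some240text01</description>",
  "<class>selzoom</class>",
  "<title>more text</title>",
  "<explanation />",
  sep = "\n"
)

# Group 2 captures the digits between "some" and "text"; group 1 carries
# everything up to the next empty <explanation /> so it can be written back.
# [\s\S]*? is a lazy "any character, including newlines".
pattern <- "(some(\\d+)text[\\s\\S]*?)<explanation />"
filled <- gsub(pattern, "\\1<explanation>\\2</explanation>", doc, perl = TRUE)
cat(filled, "\n")
```

Because the match is lazy, each description is paired with the nearest following <explanation />, which is the pairing the question asks for.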

CodePudding user response:

The answer below assumes that each record (the four lines shown in the question) is stored as one element of a list named "explanation_list". (In your question, you use the variable name "list", which is easy to confuse with R's list() function and data type.) For each record, the number is extracted into the variable "num", and the original and modified text are collected in a data.frame.

explanation_frame <- data.frame(line=character(0),
                                num=numeric(0),
                                new_line=character(0))
explanation_list <- list()

explanation_list[[length(explanation_list) + 1]] <- "<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />"

for (i in seq_along(explanation_list)) {
  line <- explanation_list[[i]]
  # Capture the digits that directly precede "te" in the description line
  num <- gsub("^[[:alpha:][:cntrl:]<>/]+(\\d+)te.*$", "\\1", line)
  # If nothing matched, gsub returns the input unchanged, so fall back to ""
  if (is.na(suppressWarnings(as.numeric(num)))) num <- ""

  new_line <- gsub("<explanation />", paste0("<explanation>", num, "</explanation>"), line)

  explanation_frame <- rbind(explanation_frame,
                            data.frame(line=line,
                                       num=as.numeric(num),
                                       new_line=new_line))
}

explanation_frame$new_line
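With 240 records, growing a data.frame with rbind() inside a loop is workable but can be avoided. The following is a minimal base R sketch of the same idea without the loop. The vector name records is hypothetical, and the sketch assumes every record contains digits followed by "text".

```r
# Hypothetical input: one multi-line record per element.
records <- c(
  "<description>some02text01</description>\n<class>selzoom</class>\n<title>more text</title>\n<explanation />",
  "<description>some240text01</description>\n<class>selzoom</class>\n<title>more text</title>\n<explanation />"
)

# Pull the digits that precede "text". regmatches() silently drops records
# without a match, hence the assumption that every record has one.
num <- regmatches(records, regexpr("\\d+(?=text)", records, perl = TRUE))

# mapply() pairs each record with its own number.
fill_one <- function(rec, n) {
  sub("<explanation />", paste0("<explanation>", n, "</explanation>"), rec, fixed = TRUE)
}
new_line <- mapply(fill_one, records, num, USE.NAMES = FALSE)

# as.numeric() drops leading zeros ("02" becomes 2), so new_line is built
# from the original character values instead.
explanation_frame <- data.frame(line = records, num = as.numeric(num), new_line = new_line)
explanation_frame$new_line
```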

CodePudding user response:

You can try this

library(stringr)

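# Find where "<digits>text" starts and how long the match is
t <- gregexpr("\\d+text" , x)
# Drop the trailing "text" (4 characters) to keep only the digits
number <- str_sub(x , t[[1]] , t[[1]] + attr(t[[1]] , "match.length") - 5)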

sub("<explanation />" , paste0("<explanation>" , number , "</explanation>") , x)

  • output

[1] "<description>some02text01</description>\n<class>selzoom</class>\n<title>more text</title>\n<explanation>02</explanation>"
  • data
x <- "<description>some02text01</description>
<class>selzoom</class>
<title>more text</title>
<explanation />"