Compare two strings in R and see additions, deletions

Time:06-01

I want to compare two character values in R and see which characters were added and deleted, so that I can display the result later in a style similar to git diff --color-words=. (see the screenshot below).

For example:

a <- "hello world"
b <- "helo world!"

diff <- FUN(a, b)

where diff would somehow show that an l was dropped and a ! was added.

The ultimate goal is to construct an HTML string like this: hel<span >l</span>o world<span >!</span>.

I am aware of diffobj but so far I cannot get it to return the character differences, only the differences between elements.

The output of git diff --color-words=. looks like this:

[screenshot "regex output" not included]

CodePudding user response:

Base R has a function for this. [The answer's screenshot is not included.]
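The function name did not survive in this answer. One base R candidate, offered here only as an assumption about what was meant, is adist() from the utils package: called with counts = TRUE it returns the edit distance and attaches a "trafos" attribute that spells out the alignment as a string of M (match), I (insertion), D (deletion) and S (substitution). A minimal sketch:

```r
a <- "hello world"
b <- "helo world!"

# adist() ships with base R (package utils); counts = TRUE also records
# the edit operations in the "trafos" attribute
d <- adist(a, b, counts = TRUE)
trafos <- attr(d, "trafos")[1, 1]

# One letter per alignment step: M = match, I = insertion,
# D = deletion, S = substitution
ops <- strsplit(trafos, "")[[1]]
table(factor(ops, levels = c("M", "I", "D", "S")))
```

Walking along ops while stepping through the characters of a (for M, D, S) and b (for M, I, S) would be one way to rebuild a marked-up string from this alignment.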

CodePudding user response:

I found a solution: split both strings into single characters first, then pass the two character vectors to diffobj::ses_dat().

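get_html_diff <- function(a, b) {
  # Split both strings into single characters so the diff is per character
  aa <- strsplit(a, "")[[1]]
  bb <- strsplit(b, "")[[1]]
  # Shortest edit script: one row per character, op is Match/Insert/Delete
  s <- diffobj::ses_dat(aa, bb)
  op <- as.character(s$op)
  
  # Run id that increases by one whenever the operation changes;
  # this form also works when there is only a single row
  m <- cumsum(c(TRUE, op[-1] != op[-length(op)]))
  
  res <- paste(
    sapply(split(seq_along(op), m), function(i) {
      # Glue the characters of one run together, then wrap it if needed
      val <- paste(s$val[i], collapse = "")
      if (op[i[[1]]] == "Insert")
        val <- paste0("<span class=\"add\">", val, "</span>")
      if (op[i[[1]]] == "Delete")
        val <- paste0("<span class=\"del\">", val, "</span>")
      val
    }),
    collapse = "")
  res
}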

get_html_diff("hello world", "helo World!")
#> [1] "hel<span class=\"del\">l</span>o <span class=\"del\">w</span><span class=\"add\">W</span>orld<span class=\"add\">!</span>"

Created on 2022-05-31 by the reprex package (v2.0.1)
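The least obvious step in the function above is the cumsum() line, which numbers consecutive runs of the same operation so that each run ends up in a single span. A minimal base R sketch of that idea follows; the op vector is made up for illustration and is not actual ses_dat() output, and rle() is used only as a cross-check:

```r
# Hypothetical operation labels, one per character, for illustration only
op <- c("Match", "Match", "Delete", "Match", "Insert", "Insert")

# TRUE wherever a new run starts; the first element always starts one
starts <- c(TRUE, op[-1] != op[-length(op)])

# Cumulative sum of the starts gives every element its run number
run_id <- cumsum(starts)

# Indices of the characters that belong to each run
runs <- split(seq_along(op), run_id)

# Cross-check against base R's run-length encoding
r <- rle(op)
all(lengths(runs) == r$lengths)
```

Each element of runs then holds the row indices for one span, and the operation of the first index in the run decides which wrapper, if any, is applied.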
