How do I replace part of a string with a lua filter in Pandoc, to convert from .md to .pdf?

Time:12-07

I am writing markdown files in Obsidian.md and trying to convert them to PDF via Pandoc and LaTeX. Plain text converts fine. However, in Obsidian I use ==equal signs== to highlight something, and this doesn't work in LaTeX.

So I'd like to create a filter that either removes the equal signs entirely, or replaces them with something LaTeX can render, e.g. \hl{something}. I think this would be the same process.

I have a filter that looks like this:

return {
  {
    Str = function (elem)
      if elem.text == "hello" then
        return pandoc.Emph {pandoc.Str "hello"}
      else
        return elem
      end
    end,
  }
}

This works: it replaces any instance of "hello" with an italicized version of the word. HOWEVER, it only works with whole words. If "hello" were part of a longer word, the filter wouldn't touch it. Since the equal signs are read as part of a word, it won't touch those either.

How do I modify this (or, please, suggest another filter) so that it CAN replace and change parts of a word?

Thank you!

CodePudding user response:

A string like Hello, World! becomes a list of inlines in pandoc: [ Str "Hello,", Space, Str "World!" ]. Lua filters don't make matching on that particularly convenient: the best method is currently to write a filter for Inlines and then iterate over the list to find matching items.

For a complete example, see https://gist.github.com/tarleb/a0646da1834318d4f71a780edaf9f870.
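
As a rough illustration of the Inlines approach (this is not the code from the gist, just a minimal sketch), the filter below walks each list of inlines, looks for a Str that starts with == and a later Str that contains the closing ==, and wraps everything in between in a Span with class mark. The name mark_highlights is made up for this example. It assumes a highlight opens at the start of a word and does not cross other formatting such as emphasis; anything else is left unchanged.

```lua
-- Sketch: wrap ==highlighted text== in a Span with class "mark".
-- Assumption: the opening "==" is at the start of a Str element.
local function mark_highlights (inlines)
  local out = pandoc.List()
  local open = nil  -- inlines collected since an opening "==", or nil

  for _, el in ipairs(inlines) do
    -- text is the string content for Str elements, nil for everything else
    local text = el.t == 'Str' and el.text or nil

    -- An opening marker starts a new highlight buffer.
    if text and not open and text:sub(1, 2) == '==' then
      open = pandoc.List()
      text = text:sub(3)
    end

    if open then
      local before, after
      if text then
        before, after = text:match('^(.-)==(.*)$')
      end
      if before then
        -- Closing marker found: emit the Span, keep trailing text outside.
        if before ~= '' then open:insert(pandoc.Str(before)) end
        out:insert(pandoc.Span(open, pandoc.Attr('', {'mark'})))
        if after ~= '' then out:insert(pandoc.Str(after)) end
        open = nil
      elseif text then
        if text ~= '' then open:insert(pandoc.Str(text)) end
      else
        open:insert(el)  -- Space, Emph, etc. inside the highlight
      end
    else
      out:insert(el)
    end
  end

  -- Unmatched opening marker: put the "==" back and keep the content.
  if open then
    out:insert(pandoc.Str('=='))
    out:extend(open)
  end
  return out
end

return {
  { Inlines = mark_highlights },
}
```

Within a single filter table, pandoc runs the functions for individual inline elements before the Inlines function, so a Span function that handles the mark class should go into a second table in the returned list (after the one above) rather than into the same table.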

Assume we have already found the highlighted text and converted it to a Span with class mark. Then we can convert that to LaTeX with

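function Span (span)
  -- Only touch spans that carry the "mark" class.
  if span.classes:includes 'mark' then
    -- Surround the span's content with raw LaTeX: \hl{ ... }
    return {pandoc.RawInline('latex', '\\hl{')} ..
      span.content ..
      {pandoc.RawInline('latex', '}')}
  end
end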

Note that the current development version of pandoc, which will become pandoc 3 at some point, supports highlighted text out of the box when called with

pandoc --from=markdown+mark ...

E.g.,

echo '==Hi Mom!==' | pandoc -f markdown+mark -t latex
⇒ \hl{Hi Mom!}