How to get YAML metadata from a markdown file using Pandoc? [Haskell]

Time:04-21

I want to get the metadata and also the content.

Given this markdown text:

---
title: The Title
---

just some content

And this function to extract the Meta:

extractMeta :: Pandoc -> Meta 
extractMeta (Pandoc meta _) = meta

I always get

Meta {unMeta = fromList []}

as a response.

It seems like the metadata block is included somewhere else:

Pandoc (Meta {unMeta = fromList []}) [HorizontalRule,Para [Str "title:",Space,Str "Something",SoftBreak,Str "id:",Space,Str "123123",SoftBreak,Str "---"],Para [Str "hi",Space,Str "this",Space,Str "is",Space,Str "the",Space,Str "first",Space,Str "blog",Space,Str "post"]]

Is there any way I can make Pandoc parse the YAML metadata using its Haskell API?

Thanks.

Full code:

module Main where
  import Data.Text (pack, Text)
  import Text.Pandoc (def, readMarkdown, runIO, ReaderOptions (readerStandalone))
  import Data.Functor ((<&>))

  readAndPack :: FilePath -> IO Text
  readAndPack = (<$>) pack . readFile . (++) "posts/"

  main :: IO ()
  main = do
    md     <- readAndPack "binary.markdown"
    pandoc <- runIO $ readMarkdown def { readerStandalone = True } md
    print pandoc

Answer:

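The problem is that the YAML metadata extension is not enabled. In the Haskell API, def for ReaderOptions starts with an empty extension set, so the reader treats the leading --- as a horizontal rule and the metadata lines as an ordinary paragraph. Enable Ext_yaml_metadata_block in readerExtensions to have the block parsed into Meta. Here is a complete program that parses the YAML metadata as expected: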

module Main where

import Data.Text (pack, Text)
import Text.Pandoc (def, readMarkdown, runIO, ReaderOptions (readerStandalone))
import Text.Pandoc.Extensions
import Text.Pandoc.Options
import Data.Functor ((<&>))

readAndPack :: FilePath -> IO Text
readAndPack = (<$>) pack . readFile . (++) "posts/"

main :: IO ()
main = do
  md     <- readAndPack "binary.markdown"
  pandoc <- runIO $ 
    readMarkdown 
      (def { 
        readerStandalone = True,
        readerExtensions = 
          enableExtension Ext_yaml_metadata_block (readerExtensions def) } ) 
      md
  print pandoc

Here's the output:

Right (Pandoc (Meta {unMeta = fromList [("title",MetaInlines [Str "The",Space,Str "Title"])]}) [Para [Str "just",Space,Str "some",Space,Str "content"]])
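Since the question asks for both the metadata and the content, here is a minimal sketch of one way to pull them apart once parsing works. It makes a few assumptions that are not in the original code: the input is an in-memory sample rather than a file under posts/, it uses pandocExtensions (pandoc's default Markdown extension set, which includes Ext_yaml_metadata_block) instead of enabling the single extension, and it uses runPure since no IO is needed. The names sample, opts, parseDoc and titleOf are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Text.Pandoc
  ( MetaValue, Pandoc (..), PandocError, ReaderOptions (..)
  , def, lookupMeta, pandocExtensions, readMarkdown, runPure )

-- Hypothetical sample input, mirroring the document from the question.
sample :: Text
sample = "---\ntitle: The Title\n---\n\njust some content\n"

-- Assumption: enable pandoc's full default Markdown extension set
-- (it contains Ext_yaml_metadata_block) rather than a single extension.
opts :: ReaderOptions
opts = def { readerExtensions = pandocExtensions }

-- Parse Markdown without IO; a Left carries the reader's error.
parseDoc :: Text -> Either PandocError Pandoc
parseDoc = runPure . readMarkdown opts

-- Look up the "title" field in the document's metadata, if present.
titleOf :: Pandoc -> Maybe MetaValue
titleOf (Pandoc meta _) = lookupMeta "title" meta

main :: IO ()
main =
  case parseDoc sample of
    Left err -> putStrLn ("parse error: " ++ show err)
    Right doc@(Pandoc _ blocks) -> do
      print (titleOf doc)  -- the metadata field
      print blocks         -- the content, as a list of Block values
```

Pattern matching on the Pandoc constructor gives you the Meta and the block list in one step, so a separate extractMeta is only needed if you want the metadata alone. Note that pandocExtensions also turns on other extensions (smart punctuation, footnotes and so on), so stick with enableExtension if you want only the YAML block.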