Serializing complex ASTs in Haskell

Time:03-10

I'm using a library in Haskell which has this very, very complex recursive data structure that represents an AST. It contains dozens of different constructors, some with simply recursive definitions, some with mutually recursive definitions, and it's all around nasty.

I want to be able to serialize this giant recursive monster into a JSON string, and then be able to deserialize it. It's an ordinary data type, so I feel I should be able to use some sort of generic function that turns it into a giant, human-readable JSON string. I really, really want to avoid writing custom serialization logic for its 80 constructors.

Is this even possible?

To clarify, I'm trying to serialize this data structure, which is part of the official GHC API. I'm aware pretty-printing gives me a string but I'd really like this as a JSON structure.

EDIT: The type is too complex for GHC.Generics-based deriving to produce suitable ToJSON and FromJSON instances, unless I'm missing something.

Answer:

The only reasonable approach is to use standalone deriving clauses to derive Generic instances for (most of) the types involved, and then get as many FromJSON/ToJSON instances as possible from aeson's Generic-based defaults.
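To illustrate the basic mechanism on something small: the sketch below assumes the aeson package and uses a hypothetical pair of mutually recursive types, Expr and Stmt, standing in for library types declared without any deriving clauses. In a real project the data declarations would live in another module (here, the GHC API), and you would only write the deriving and instance lines.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE StandaloneDeriving #-}

import Data.Aeson (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical mutually recursive AST, declared with no deriving clauses,
-- as if it came from a library we can't modify.
data Expr = Lit Int | Block [Stmt]
data Stmt = Bind String Expr | Eval Expr

-- Standalone deriving works for types declared elsewhere,
-- provided their constructors are in scope.
deriving instance Generic Expr
deriving instance Generic Stmt
deriving instance Eq Expr
deriving instance Eq Stmt
deriving instance Show Expr
deriving instance Show Stmt

-- Empty instance bodies pick up aeson's Generic-based defaults.
-- The mutual recursion between Expr and Stmt needs no special handling.
instance ToJSON Expr
instance ToJSON Stmt
instance FromJSON Expr
instance FromJSON Stmt
```

With these instances, `decode (encode prog)` should give back `Just prog` for any `prog :: Expr`, using aeson's default tagged-object encoding for sum types.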

I started fiddling with it, and I saw no insurmountable technical barriers, but the amount of boilerplate required is non-trivial. You'll need a boatload of Generic instances. You may also need to work with a modified copy of the ghc-lib source, because some types (e.g., TyCon) are exported without their constructors, and standalone deriving needs the constructors in scope.

Overall, the Generic instances aren't so bad, because most of them can be derived polymorphically in the phase (pass) parameter p:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE UndecidableInstances #-}

import BasicTypes
import CoAxiom
-- etc. --

import GHC.Generics

deriving instance Generic (AnnDecl p)
deriving instance Generic (AnnProvenance p)
deriving instance Generic (Branches br)
deriving instance Generic (CoAxiom br)
deriving instance Generic (ForeignDecl p)
deriving instance Generic (GenLocated l e)
deriving instance Generic (HsBracket p)
deriving instance Generic (HsExpr p)
-- etc. --
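A toy version of the phase-indexed design shows why the Generic side is easy. In this minimal sketch, Parsed, Renamed, XVar, and Expr are hypothetical stand-ins for GhcPs/GhcRn, an extension family like XWrap, and HsExpr. A family application such as XVar p is just an opaque field type inside Rep (Expr p), so a single instance polymorphic in p covers every phase.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeFamilies #-}

import GHC.Generics (Generic, datatypeName, from)

-- Hypothetical phase indices, standing in for GhcPs, GhcRn, ...
data Parsed
data Renamed

-- Hypothetical extension-field family, in the style of GHC's XWrap.
type family XVar p
type instance XVar Parsed = ()
type instance XVar Renamed = Int

data Expr p
  = Var (XVar p) String
  | App (Expr p) (Expr p)
  | Lit Integer

-- One instance for all phases: Generic needs no instances for the field
-- types, so the unreduced family application XVar p is not a problem here.
deriving instance Generic (Expr p)

-- Reads the datatype name out of the Generic representation,
-- showing that the instance is usable at any phase.
exprTypeName :: Expr p -> String
exprTypeName = datatypeName . from
```

Both `exprTypeName (Lit 1 :: Expr Parsed)` and `exprTypeName (Var 3 "y" :: Expr Renamed)` should return `"Expr"` from the same instance.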

The FromJSON, ToJSON instances are a little more difficult. The phase parameter is used, via type families, to change the types in parts of the tree, so a polymorphic instance:

import Data.Aeson

instance FromJSON (HsExpr p)

will start demanding FromJSON instances for a lot of type family applications, like instance FromJSON (XWrap p) and a few dozen others. You can't supply these polymorphically:

instance FromJSON (XWrap p)  -- Illegal type synonym family application

because XWrap is a type family, and GHC doesn't allow type family applications in instance heads. I think the best approach is to define instances for each phase you need. Since there are some inter-phase dependencies, you'll need instances for multiple phases even if you're only trying to serialize one. So:

instance FromJSON (HsExpr GhcTc)
instance FromJSON (HsExpr GhcRn)
-- etc. --
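The per-phase pattern can be sketched with the same kind of toy AST. This assumes aeson, and Parsed, Renamed, XVar, and Expr are illustrative stand-ins rather than GHC types. At a concrete phase the family reduces to an ordinary type (here () or Int) that already has aeson instances, so the empty Generic-based instance bodies typecheck.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Aeson (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical phases and extension family (stand-ins for GhcPs/GhcRn, XWrap).
data Parsed
data Renamed

type family XVar p
type instance XVar Parsed = ()
type instance XVar Renamed = Int

data Expr p
  = Var (XVar p) String
  | App (Expr p) (Expr p)
  | Lit Integer

deriving instance Generic (Expr p)

-- A polymorphic instance FromJSON (Expr p) would need FromJSON (XVar p),
-- and that can't be written as an instance:
--   instance FromJSON (XVar p)  -- rejected: type family application in head
-- So give one instance per concrete phase; XVar reduces at each of them.
instance FromJSON (Expr Parsed)
instance ToJSON (Expr Parsed)
instance FromJSON (Expr Renamed)
instance ToJSON (Expr Renamed)

-- Eq/Show are likewise derived per phase.
deriving instance Eq (Expr Parsed)
deriving instance Show (Expr Parsed)
deriving instance Eq (Expr Renamed)
deriving instance Show (Expr Renamed)
```

The cost is one instance per (type, phase) pair, which is where the long instance list below comes from.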

From there, it's a matter of following the trail of compiler error messages about missing instances and filling them all in. A few keyboard macros in your editor of choice should ease the pain.

You'll eventually get down to some leaf types that probably shouldn't be serialized generically. For example, FastString is a string stored in a common hash table for fast comparison, and you'll want/need to serialize and deserialize it manually (or deal with reconstructing the hash table on the deserialized end).
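For leaf types like FastString, one option is to serialize only the string contents and rebuild the process-local part on decode. The sketch below uses a hypothetical Sym type (a string plus a cached key, standing in for interning) rather than GHC's FastString itself, and assumes aeson. For GHC 8.10's FastString, the analogous instances would presumably go through unpackFS and mkFastString from the FastString module, re-interning each string as it is parsed.

```haskell
import Data.Aeson (FromJSON (..), ToJSON (..), decode, encode)
import Data.List (foldl')

-- Hypothetical stand-in for FastString: the text plus a key that is only
-- meaningful inside the process that computed it.
data Sym = Sym { symKey :: !Int, symText :: String }
  deriving (Eq, Show)

-- Smart constructor; the toy hash stands in for looking up or inserting
-- the string in an intern table.
mkSym :: String -> Sym
mkSym s = Sym (foldl' (\h c -> h * 31 + fromEnum c) 7 s) s

-- Serialize only the text; the key is never written out.
instance ToJSON Sym where
  toJSON = toJSON . symText

-- Parse a plain JSON string and recompute the key on the decoding side.
instance FromJSON Sym where
  parseJSON v = mkSym <$> parseJSON v
```

Because the key is recomputed rather than stored, the JSON stays a plain string and does not depend on the encoding process's table.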

Anyway, I stopped after around 35 Generic instances and 50 FromJSON instances, and I figure I was only about a quarter done at that point. On the other hand, that took me less than an hour, so I think it's doable with a day or two of tedious work.

Here's what I had before I lost interest. About half of the FromJSON instances typecheck; the rest are still demanding instances. I was using GHC 8.10.7, though, so the module names and types probably won't match yours.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE TemplateHaskell #-}

module MyModule where

import BasicTypes
import CoAxiom
import FastString
import GHC.Hs
import GHC.Hs.Extension
import Name
import SrcLoc
import TyCoRep
import TyCon
import Unique
import UniqSet
import Var
import qualified Data.Array as Array

import GHC.Generics
import Data.Aeson

-- Generic instances: most can be derived polymorphically in the phase.
deriving instance Generic (AnnDecl p)
deriving instance Generic (AnnProvenance p)
deriving instance Generic (Branches br)
deriving instance Generic (CoAxiom br)
deriving instance Generic (ForeignDecl p)
deriving instance Generic (GenLocated l e)
deriving instance Generic (HsBracket p)
deriving instance Generic (HsExpr p)
deriving instance Generic (HsGroup p)
deriving instance Generic (HsImplicitBndrs p (LHsType p))
deriving instance Generic (HsRecField' id arg)
deriving instance Generic (HsSplice p)
deriving instance Generic (HsType p)
deriving instance Generic (HsWildCardBndrs p (LHsType p))
deriving instance Generic (Match p (LHsExpr p))
deriving instance Generic (MatchGroup p (LHsExpr p))
deriving instance Generic (RuleDecls p)
deriving instance Generic (StmtLR p p (LHsExpr p))
deriving instance Generic (VarBndr var argf)
deriving instance Generic (WarnDecl p)
deriving instance Generic (WarnDecls p)
deriving instance Generic AnonArgFlag
deriving instance Generic ArgFlag
deriving instance Generic CoAxBranch
deriving instance Generic Coercion
deriving instance Generic ForeignImport
deriving instance Generic NoExtCon
deriving instance Generic NoExtField
deriving instance Generic Role
deriving instance Generic SourceText
deriving instance Generic SrcSpan
deriving instance Generic StringLiteral
deriving instance Generic TyLit
deriving instance Generic Type
deriving instance Generic WarningTxt

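-- FromJSON instances using aeson's Generic-based default; those that
-- mention the phase are given per concrete phase (GhcRn, GhcTc).
instance (FromJSON l, FromJSON e) => FromJSON (GenLocated l e)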
instance FromJSON (AnnDecl GhcTc)
instance FromJSON (AnnProvenance Var)
instance FromJSON (Branches br)
instance FromJSON (CoAxiom Branched)
instance FromJSON (ConDeclField GhcRn)
instance FromJSON (ConDeclField GhcTc)
instance FromJSON (ForeignDecl GhcTc)
instance FromJSON (GRHS GhcTc (LHsExpr GhcTc))
instance FromJSON (HsBracket GhcRn)
instance FromJSON (HsBracket GhcTc)
instance FromJSON (HsExpr GhcRn)
instance FromJSON (HsExpr GhcTc)
instance FromJSON (HsGroup GhcRn)
instance FromJSON (HsGroup GhcTc)
instance FromJSON (HsImplicitBndrs GhcTc (LHsExpr GhcTc))
instance FromJSON (HsImplicitBndrs GhcTc (LHsType GhcTc))
instance FromJSON (HsLocalBindsLR GhcTc GhcTc)
instance FromJSON (HsRecField' (AmbiguousFieldOcc GhcTc) (LHsExpr GhcTc))
instance FromJSON (HsRecFields GhcTc (LHsExpr GhcTc))
instance FromJSON (HsSplice GhcTc)
instance FromJSON (HsTyVarBndr GhcRn)
instance FromJSON (HsTyVarBndr GhcTc)
instance FromJSON (HsType GhcRn)
instance FromJSON (HsType GhcTc)
instance FromJSON (HsValBindsLR GhcTc GhcTc)
instance FromJSON (HsWildCardBndrs GhcRn (LHsSigType GhcRn))
instance FromJSON (HsWildCardBndrs GhcRn (LHsType GhcRn))
instance FromJSON (Match GhcTc (LHsExpr GhcTc))
instance FromJSON (MatchGroup GhcTc (LHsExpr GhcTc))
instance FromJSON (RuleDecls GhcRn)
instance FromJSON (RuleDecls GhcTc)
instance FromJSON (StmtLR GhcRn GhcRn (LHsExpr GhcRn))
instance FromJSON (StmtLR GhcTc GhcTc (LHsExpr GhcTc))
instance FromJSON (VarBndr TyCoVar ArgFlag)
instance FromJSON (WarnDecl GhcTc)
instance FromJSON (WarnDecls GhcTc)
instance FromJSON AnonArgFlag
instance FromJSON ArgFlag
instance FromJSON CoAxBranch
instance FromJSON Coercion
instance FromJSON ForeignImport
instance FromJSON NoExtField
instance FromJSON Role
instance FromJSON SourceText
instance FromJSON SrcSpan
instance FromJSON StringLiteral
instance FromJSON TyLit
instance FromJSON Type
instance FromJSON WarningTxt

-- Non-generic instances, a mixture of:
-- 1. Those that shouldn't be derived generically (e.g., FastString)
-- 2. Those that will need access to the constructors (e.g., TyCon)
instance FromJSON RealSrcSpan where parseJSON = undefined
instance FromJSON FastString where parseJSON = undefined
instance FromJSON a => FromJSON (UniqSet a) where parseJSON = undefined
instance FromJSON Var where parseJSON = undefined
instance FromJSON NoExtCon where parseJSON = undefined
instance (FromJSON i, FromJSON e) => FromJSON (Array.Array i e) where parseJSON = undefined
instance FromJSON TyCon where parseJSON = undefined
instance FromJSON Unique where parseJSON = undefined
instance FromJSON Name where parseJSON = undefined
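At least one of the placeholders above, NoExtCon, probably doesn't need `undefined`. As far as I know it is an empty data type in GHC 8.10, so ToJSON can use an empty case, and parseJSON can simply fail, because no JSON value describes a value of an uninhabited type. A minimal sketch, assuming aeson and using a hypothetical stand-in NoExt instead of importing the GHC module:

```haskell
{-# LANGUAGE EmptyCase #-}

import Data.Aeson (FromJSON (..), ToJSON (..), decode, encode)

-- Hypothetical stand-in for an uninhabited extension type like NoExtCon.
data NoExt

-- There are no values to encode, so the empty case is total.
instance ToJSON NoExt where
  toJSON x = case x of {}

-- No JSON input can produce a value of an empty type, so always fail.
instance FromJSON NoExt where
  parseJSON _ = fail "NoExt is uninhabited"
```

Containers of such a type still serialize normally; for example, an empty `[NoExt]` encodes as `[]`.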