Home > Enterprise >  Haskell - Transform a list of unions into a tuple of lists
Haskell - Transform a list of unions into a tuple of lists

Time:09-17

I'm looking for a way to transform a list into an n-tuple with one list for each of the n constructors in a disjoint union. The standard library defines a similar function specifically for Eithers:

partitionEithers :: [Either a b] -> ([a], [b])

I'm looking for techniques for solving the generalized problem with the following requirements:

  • convenient to write
  • as little boilerplate as possible
  • processes the list in a single pass
  • datatype-generics, metaprogramming, existing libraries etc are all permitted
Example

Here is an example specification with two proposed solutions:

partitionSum :: [MySum] -> ([A], [B], [C], [D])

data MySum
  = CaseA A
  | CaseB B
  | CaseC C
  | CaseD D

data A = A deriving Show
data B = B deriving Show
data C = C deriving Show
data D = D deriving Show

-- expect "([A,A],[B,B,B],[],[D])"
test :: IO ()
test = print . partitionSum $
  [CaseD D, CaseB B, CaseA A, CaseA A, CaseB B, CaseB B]

First attempt: n list comprehensions that traverse the list n times.

partitionSum1 :: [MySum] -> ([A], [B], [C], [D])
partitionSum1 xs =
  ( [a | CaseA a <- xs]
  , [b | CaseB b <- xs]
  , [c | CaseC c <- xs]
  , [d | CaseD d <- xs]
  )

Second attempt: a single traversal of the input list. I have to manually thread the state through the fold which makes the solution a little repetitive and annoying to write.

partitionSum2 :: [MySum] -> ([A], [B], [C], [D])
partitionSum2 = foldr f ([], [], [], [])
  where
    f x (as, bs, cs, ds) =
      case x of
        CaseA a -> (a : as, bs, cs, ds)
        CaseB b -> (as, b : bs, cs, ds)
        CaseC c -> (as, bs, c : cs, ds)
        CaseD d -> (as, bs, cs, d : ds)

CodePudding user response:

In addition to the Representable answer:


A thing that came to me from seeing foldr f ([], [], [], []) was to define a monoid where the nil case is mempty

{-# DerivingVia #-}
..
import GHC.Generics (Generically(..), ..)

type Classify :: Type
type Classify = C [A] [B] [C] [D]
  deriving
  stock Generic

  deriving (Semigroup, Monoid)
  via Generically Classify

-- mempty = C [] [] [] []
-- C as bs cs ds <> C as1 bs1 cd1 ds1 = C (as    as1) (bs    bs1) (cs    cs1) (ds    ds1)

Generically will be exported from GHC.Generics in the future. It defines Classify as a semigroup and monoid through generic pointwise lifting.

With this all you need is a classifier function, that classifies a MySum into Classify and you can define partition in terms of foldMap

classify :: MySum -> Classify
classify = \case
  SumA a -> C [a] [] [] []
  SumB b -> C [] [b] [] []
  SumC c -> C [] [] [c] []
  SumD d -> C [] [] [] [d]

partition :: Foldable f => f MySum -> Classify
partition = foldMap classify

CodePudding user response:

As your function is a transformation from sums to products, there's a fairly simple implementation using generics-sop. This is a library which enhances GHCs generics with more specialized types that make induction on algebriac type (i.e. sums of products) simpler.

First, a prelude:

{-# LANGUAGE DeriveGeneric, StandaloneDeriving #-}

import Generics.SOP hiding ((:.:))
import qualified GHC.Generics as GHC
import GHC.Generics ((:.:)(..))


partitionSum :: (Generic t) => [t] -> NP ([] :.: NP I) (Code t)

This is the method you want to write. Let's examine its type.

  • the single argument is a list of some generic type. Pretty straightforward. Note here that Generic is the one from generics-sop, not from GHC
  • the returned value is an n-ary product (n-tuple) where each element is a list composed with NP I (itself an n-ary product, because generally, algebraic datatype constructors might have more than one field)
  • Code t is the sum-of-products type representation of t. It's a list of lists of type. e.g. Code (Either a b) ~ '[ '[a], '[b] ]. The generic value representation of t is SOP I (Code t) - a sum of of products over the "code".

To implement this, we can convert each t to its generic representation, then fold over the resulting list:


partitionSum = partitionSumGeneric . map from

partitionSumGeneric :: SListI xss => [SOP I xss] -> NP ([] :.: NP I) xss
partitionSumGeneric = foldr (\(SOP x) -> classifyGeneric x) emptyClassifier

partitionSumGeneric is pretty much the same as partitionSum, but operates on generic representations of values.

Now for the interesting part. Let's begin with the base case of our fold. This should contain empty lists in every position. generics-sop provides a handy mechanism for generating a product type with a uniform value in each position:

emptyClassifier :: SListI xs => NP ([] :.: NP I) xs
emptyClassifier = hpure (Comp1 [])

The recursive case is as follows: if the value has tag at index k, add that value to the list at index k in the accumulator. We can do this with simultaneous recursion on both the sum type (it's generic now, so a value of type NS (NP I) xs - a sum of products) and on the accumulator.

classifyGeneric :: NS (NP I) xss -> NP ([] :.: NP I) xss -> NP ([] :.: NP I) xss
classifyGeneric (Z x)  (Comp1 l :* ls) = (Comp1 $ x : l) :* ls
classifyGeneric (S xs) (      l :* ls) =              l  :* classifyGeneric xs ls

Your example with some added data to make it a bit more interesting:

data MySum
  = CaseA A
  | CaseB B
  | CaseC C
  | CaseD D

-- All that's needed for `partitionSum' to work with your type
deriving instance GHC.Generic MySum
instance Generic MySum

data A = A Int deriving Show
data B = B String Int deriving Show
data C = C deriving Show
data D = D Integer deriving Show

test = partitionSum $
  [CaseD $ D 0, CaseB $ B "x" 1, CaseA $ A 2, CaseA $ A 3, CaseB $ B "y" 4, CaseB $ B "z" 5]

the result is:

Comp1 {unComp1 = [I (A 2) :* Nil,I (A 3) :* Nil]} :* Comp1 {unComp1 = [I (B "x" 1) :* Nil,I (B "y" 4) :* Nil,I (B "z" 5) :* Nil]} :* Comp1 {unComp1 = []} :* Comp1 {unComp1 = [I (D 0) :* Nil]} :*Nil
  • Related