I'm looking for a way to transform a list into an n-tuple with one list for each of the n constructors in a disjoint union. The standard library defines a similar function specifically for Either
s:
partitionEithers :: [Either a b] -> ([a], [b])
I'm looking for techniques for solving the generalized problem with the following requirements:
- convenient to write
- as little boilerplate as possible
- processes the list in a single pass
- datatype-generics, metaprogramming, existing libraries etc are all permitted
Example
Here is an example specification with two proposed solutions:
partitionSum :: [MySum] -> ([A], [B], [C], [D])
data MySum
= CaseA A
| CaseB B
| CaseC C
| CaseD D
data A = A deriving Show
data B = B deriving Show
data C = C deriving Show
data D = D deriving Show
-- expect "([A,A],[B,B,B],[],[D])"
test :: IO ()
test = print . partitionSum $
[CaseD D, CaseB B, CaseA A, CaseA A, CaseB B, CaseB B]
First attempt: n list comprehensions that traverse the list n times.
partitionSum1 :: [MySum] -> ([A], [B], [C], [D])
partitionSum1 xs =
( [a | CaseA a <- xs]
, [b | CaseB b <- xs]
, [c | CaseC c <- xs]
, [d | CaseD d <- xs]
)
Second attempt: a single traversal of the input list. I have to manually thread the state through the fold which makes the solution a little repetitive and annoying to write.
partitionSum2 :: [MySum] -> ([A], [B], [C], [D])
partitionSum2 = foldr f ([], [], [], [])
where
f x (as, bs, cs, ds) =
case x of
CaseA a -> (a : as, bs, cs, ds)
CaseB b -> (as, b : bs, cs, ds)
CaseC c -> (as, bs, c : cs, ds)
CaseD d -> (as, bs, cs, d : ds)
CodePudding user response:
In addition to the Representable
answer:
A thing that came to me from seeing foldr f ([], [], [], [])
was to define a monoid where the nil case is mempty
{-# DerivingVia #-}
..
import GHC.Generics (Generically(..), ..)
type Classify :: Type
type Classify = C [A] [B] [C] [D]
deriving
stock Generic
deriving (Semigroup, Monoid)
via Generically Classify
-- mempty = C [] [] [] []
-- C as bs cs ds <> C as1 bs1 cd1 ds1 = C (as as1) (bs bs1) (cs cs1) (ds ds1)
Generically
will be exported from GHC.Generics
in the future. It defines Classify
as a semigroup and monoid through generic pointwise lifting.
With this all you need is a classifier function, that classifies a MySum
into Classify
and you can define partition
in terms of foldMap
classify :: MySum -> Classify
classify = \case
SumA a -> C [a] [] [] []
SumB b -> C [] [b] [] []
SumC c -> C [] [] [c] []
SumD d -> C [] [] [] [d]
partition :: Foldable f => f MySum -> Classify
partition = foldMap classify
CodePudding user response:
As your function is a transformation from sums to products, there's a fairly simple implementation using generics-sop
. This is a library which enhances GHCs generics with more specialized types that make induction on algebriac type (i.e. sums of products) simpler.
First, a prelude:
{-# LANGUAGE DeriveGeneric, StandaloneDeriving #-}
import Generics.SOP hiding ((:.:))
import qualified GHC.Generics as GHC
import GHC.Generics ((:.:)(..))
partitionSum :: (Generic t) => [t] -> NP ([] :.: NP I) (Code t)
This is the method you want to write. Let's examine its type.
- the single argument is a list of some generic type. Pretty straightforward. Note here that
Generic
is the one fromgenerics-sop
, not from GHC - the returned value is an n-ary product (n-tuple) where each element is a list composed with
NP I
(itself an n-ary product, because generally, algebraic datatype constructors might have more than one field) Code t
is the sum-of-products type representation oft
. It's a list of lists of type. e.g.Code (Either a b) ~ '[ '[a], '[b] ]
. The generic value representation oft
isSOP I (Code t)
- a sum of of products over the "code".
To implement this, we can convert each t
to its generic representation, then fold over the resulting list:
partitionSum = partitionSumGeneric . map from
partitionSumGeneric :: SListI xss => [SOP I xss] -> NP ([] :.: NP I) xss
partitionSumGeneric = foldr (\(SOP x) -> classifyGeneric x) emptyClassifier
partitionSumGeneric
is pretty much the same as partitionSum
, but operates on generic representations of values.
Now for the interesting part. Let's begin with the base case of our fold. This should contain empty lists in every position. generics-sop
provides a handy mechanism for generating a product type with a uniform value in each position:
emptyClassifier :: SListI xs => NP ([] :.: NP I) xs
emptyClassifier = hpure (Comp1 [])
The recursive case is as follows: if the value has tag at index k
, add that value to the list at index k
in the accumulator. We can do this with simultaneous recursion on both the sum type (it's generic now, so a value of type NS (NP I) xs
- a sum of products) and on the accumulator.
classifyGeneric :: NS (NP I) xss -> NP ([] :.: NP I) xss -> NP ([] :.: NP I) xss
classifyGeneric (Z x) (Comp1 l :* ls) = (Comp1 $ x : l) :* ls
classifyGeneric (S xs) ( l :* ls) = l :* classifyGeneric xs ls
Your example with some added data to make it a bit more interesting:
data MySum
= CaseA A
| CaseB B
| CaseC C
| CaseD D
-- All that's needed for `partitionSum' to work with your type
deriving instance GHC.Generic MySum
instance Generic MySum
data A = A Int deriving Show
data B = B String Int deriving Show
data C = C deriving Show
data D = D Integer deriving Show
test = partitionSum $
[CaseD $ D 0, CaseB $ B "x" 1, CaseA $ A 2, CaseA $ A 3, CaseB $ B "y" 4, CaseB $ B "z" 5]
the result is:
Comp1 {unComp1 = [I (A 2) :* Nil,I (A 3) :* Nil]} :* Comp1 {unComp1 = [I (B "x" 1) :* Nil,I (B "y" 4) :* Nil,I (B "z" 5) :* Nil]} :* Comp1 {unComp1 = []} :* Comp1 {unComp1 = [I (D 0) :* Nil]} :*Nil