Accessing a common field in sum type variants

Suppose I have a sum type (or several, in fact), that I know by design all have a common field:

data T1 a
  = C1 String a
  | C2 Int a
  | C3 Bool a

data T2 a
  = C4 Int Int a
  | C5 [String] a

Is there a way to access the a field without having to pattern match on all variants across all types?

(I ask in the context of defining ASTs & having a neat way of accessing node-specific information)

Answer:

A Minor Technical Detail

At the boring super-technical level, no. There is no way to access the fields of a constructor without pattern matching. Pattern matching is the primitive operation that causes the constructor to be evaluated. Before that, the fields don't even necessarily exist (thanks to non-strict evaluation).

Some Options That Might Be Useful

But you probably didn't mean that low-level question. You probably want a way to work with these data types without constantly writing pattern matches. And that can be done. It's just a matter of writing some functions. Which functions, though? ...that can be interesting.

You can write simple accessor functions:

t1ToA :: T1 a -> a
t1ToA (C1 _ x) = x
t1ToA (C2 _ x) = x
t1ToA (C3 _ x) = x

t2ToA :: T2 a -> a
t2ToA (C4 _ _ x) = x
t2ToA (C5 _ x) = x

Don't automatically reject this approach. Sure, it's a bit hungry on namespace because you need a different function name for each type. On the other hand, it's really good for readability and type inference. There's nothing magical anywhere. You might write some matching setter and modifier functions as well.
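
As a rough sketch of what those matching setter and modifier functions could look like for T1 (the names modifyT1A and setT1A are made up for illustration, and the deriving clause is only there so values can be compared and printed):

```haskell
data T1 a
  = C1 String a
  | C2 Int a
  | C3 Bool a
  deriving (Eq, Show)

-- Getter, same as in the answer.
t1ToA :: T1 a -> a
t1ToA (C1 _ x) = x
t1ToA (C2 _ x) = x
t1ToA (C3 _ x) = x

-- Hypothetical modifier: apply a function to the shared field
-- while keeping the constructor and its other field unchanged.
modifyT1A :: (a -> a) -> T1 a -> T1 a
modifyT1A f (C1 s x) = C1 s (f x)
modifyT1A f (C2 n x) = C2 n (f x)
modifyT1A f (C3 b x) = C3 b (f x)

-- Hypothetical setter: overwrite the shared field.
setT1A :: a -> T1 a -> T1 a
setT1A x = modifyT1A (const x)
```

The setter is just the modifier with a constant function, so only the getter and the modifier need to pattern match on every constructor.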

If you find that's getting to be too namespace hungry when you have various set and modify functions added in, you could use the van Laarhoven trick:

t1A :: Functor f => (a -> f a) -> T1 a -> f (T1 a)
t1A g (C1 x y) = C1 x <$> g y
t1A g (C2 x y) = C2 x <$> g y
t1A g (C3 x y) = C3 x <$> g y

t2A :: Functor f => (a -> f a) -> T2 a -> f (T2 a)
t2A g (C4 x y z) = C4 x y <$> g z
t2A g (C5 x y) = C5 x <$> g y

This representation lets you do reading and updating from the same type, though it is awkward without some helper functions. This is the representation used by libraries like lens, which provide you a huge number of those helper functions. But maybe you don't want to worry about learning how to work with this representation. I'm going to assume this isn't really what you're looking for and not even go into the details of how those helper functions work. But at a high level, they make clever use of specific types for f like Identity and Const a.
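
For the curious, here is a minimal sketch of how such helpers can be built from Identity and Const. The names viewA, overA and setA are local to this example and only loosely echo what lens offers; the real library's types are more general.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data T1 a
  = C1 String a
  | C2 Int a
  | C3 Bool a
  deriving (Eq, Show)

-- The van Laarhoven accessor from the answer.
t1A :: Functor f => (a -> f a) -> T1 a -> f (T1 a)
t1A g (C1 x y) = C1 x <$> g y
t1A g (C2 x y) = C2 x <$> g y
t1A g (C3 x y) = C3 x <$> g y

-- Reading: Const stores the field and ignores the rebuilt structure.
viewA :: ((a -> Const a a) -> s -> Const a s) -> s -> a
viewA l = getConst . l Const

-- Updating: Identity just rebuilds the structure around the new field.
overA :: ((a -> Identity a) -> s -> Identity s) -> (a -> a) -> s -> s
overA l f = runIdentity . l (Identity . f)

-- Setting is updating with a constant function.
setA :: ((a -> Identity a) -> s -> Identity s) -> a -> s -> s
setA l x = overA l (const x)
```

With these in scope, viewA t1A reads the field and overA t1A f modifies it, so the single definition t1A covers both directions.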

An option if you are willing to give up some type inference in order to reduce namespace use is to go for some sort of ad-hoc class:

class ToA f where
    toA :: f a -> a

instance ToA T1 where
    toA (C1 _ x) = x
    toA (C2 _ x) = x
    toA (C3 _ x) = x

instance ToA T2 where
    toA (C4 _ _ x) = x
    toA (C5 _ x) = x

You could choose to combine this with the van Laarhoven encoding, for what it's worth. This would minimize the amount of namespace you grab, but it would require some additional helpers to make the result easy to use.
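
One possible shape for that combination, as a sketch: a single class whose only method is the van Laarhoven accessor, plus a couple of helpers written once for every instance. The class name HasA, the method aField, and the helpers getA and mapA are all invented for this example.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data T1 a
  = C1 String a
  | C2 Int a
  | C3 Bool a
  deriving (Eq, Show)

data T2 a
  = C4 Int Int a
  | C5 [String] a
  deriving (Eq, Show)

-- Hypothetical class: one van Laarhoven accessor per type.
class HasA t where
  aField :: Functor f => (a -> f a) -> t a -> f (t a)

instance HasA T1 where
  aField g (C1 x y) = C1 x <$> g y
  aField g (C2 x y) = C2 x <$> g y
  aField g (C3 x y) = C3 x <$> g y

instance HasA T2 where
  aField g (C4 x y z) = C4 x y <$> g z
  aField g (C5 x y) = C5 x <$> g y

-- Helpers defined once, usable with any instance.
getA :: HasA t => t a -> a
getA = getConst . aField Const

mapA :: HasA t => (a -> a) -> t a -> t a
mapA f = runIdentity . aField (Identity . f)
```

Only aField and the helper names occupy the namespace, at the cost of the type inference caveat mentioned above for ad-hoc classes.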

There are a few other options that you might be able to work with, like using less ad-hoc tools GHC provides. Data (from Data.Data) and Generic (from GHC.Generics) are two classes GHC can derive for you, and each comes with a lot of ready-made tooling. But these tend to be very complex to pick up the first time around.
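
To give a flavour of the Generic route, here is a sketch using Generic1 from GHC.Generics. It assumes the shared field is always the last field of every constructor, and relies on that field being the rightmost component of the derived representation. The class GLastField and the function genericToA are names made up for this example.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

data T1 a
  = C1 String a
  | C2 Int a
  | C3 Bool a
  deriving (Show, Generic1)

data T2 a
  = C4 Int Int a
  | C5 [String] a
  deriving (Show, Generic1)

-- Hypothetical class: extract the type parameter from the last
-- field of whichever constructor a generic representation holds.
class GLastField rep where
  gLastField :: rep a -> a

-- The type parameter itself.
instance GLastField Par1 where
  gLastField (Par1 x) = x

-- Metadata wrappers (datatype, constructor, selector) are skipped.
instance GLastField g => GLastField (M1 i c g) where
  gLastField (M1 x) = gLastField x

-- Sums: handle whichever constructor is present.
instance (GLastField l, GLastField r) => GLastField (l :+: r) where
  gLastField (L1 x) = gLastField x
  gLastField (R1 x) = gLastField x

-- Products: always descend into the right-hand side.
instance GLastField r => GLastField (l :*: r) where
  gLastField (_ :*: r) = gLastField r

genericToA :: (Generic1 t, GLastField (Rep1 t)) => t a -> a
genericToA = gLastField . from1
```

A type whose last field is not the parameter has no matching instance, so it is rejected at compile time rather than failing at run time. The price is the extra machinery, which is the complexity mentioned above.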

But Maybe There's a Better Solution

There's one last option that is actually the one I would recommend in most cases. Refactor your data types so the shared values aren't duplicated.

data WithA t a = WithA t a
data T1
    = C1 String
    | C2 Int
    | C3 Bool

And so on. Or however you might choose to refactor it. The important part is that the shared field is lifted out of the sum type, and is just always present. I think that this often ends up working the best. It often communicates what you mean better. When you have 3 constructors which each have a field of the same type it's not immediately obvious that that field should be seen as interchangeable between the constructors, even when the datatype is polymorphic over that field's type. But if it's a single field outside of the multiple constructors it is immediately obvious that it's always the same thing. Don't underestimate the communication value that provides for all future maintainers of the code.
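
A short sketch of how access looks after that refactoring. The T2 definition is my guess at the "and so on", and the helper names getA, getNode and mapA are invented for illustration.

```haskell
data WithA t a = WithA t a
  deriving (Eq, Show)

data T1
  = C1 String
  | C2 Int
  | C3 Bool
  deriving (Eq, Show)

-- Assumed: T2 is refactored along the same lines as T1.
data T2
  = C4 Int Int
  | C5 [String]
  deriving (Eq, Show)

-- One accessor serves every wrapped type; no per-constructor matching.
getA :: WithA t a -> a
getA (WithA _ x) = x

-- The variant-specific part is still available for pattern matching.
getNode :: WithA t a -> t
getNode (WithA t _) = t

-- Modify the shared field, leaving the wrapped value untouched.
mapA :: (a -> b) -> WithA t a -> WithA t b
mapA f (WithA t x) = WithA t (f x)
```

Here getA works on both WithA T1 a and WithA T2 a without a class, because the shared field no longer lives inside the sum types at all.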