I have a type to describe post-calibration radiocarbon date probability distributions. The details and background don't matter for the question: It boils down to one probability value in _calPDFDens
for each year in _calPDFCals
:
data CalPDF = CalPDF {
-- | Sample identifier, e.g. a lab number
_calPDFid :: String
-- | Years calBCAD
, _calPDFCals :: VU.Vector YearBCAD
-- | Probability densities for each year in '_calPDFCals'
, _calPDFDens :: VU.Vector Float
}
(VU
is Data.Vector.Unboxed
)
Now: It is common practice to sum multiple such distributions to derive a sum probability distribution. That means a full outer join on the years in _calPDFCals
and then summing the respective values in _calPDFDens
. I implemented this as follows:
sumPDFs :: CalPDF -> CalPDF -> CalPDF
sumPDFs = combinePDFs ( )
combinePDFs :: (Float -> Float -> Float) -> CalPDF -> CalPDF -> CalPDF
combinePDFs f (CalPDF name1 cals1 dens1) (CalPDF name2 cals2 dens2) =
let startRange = minimum [VU.head cals1, VU.head cals2]
stopRange = maximum [VU.last cals1, VU.last cals2]
emptyBackdrop = zip [startRange..stopRange] (repeat (0.0 :: Float))
pdf1 = VU.toList $ VU.zip cals1 dens1
pdf2 = VU.toList $ VU.zip cals2 dens2
pdfCombined = fullOuter f pdf2 (fullOuter f pdf1 emptyBackdrop)
pdfNew = CalPDF (name1 " " name2) (VU.fromList $ map fst pdfCombined) (VU.fromList $ map snd pdfCombined)
in normalizeCalPDF pdfNew
where
-- https://stackoverflow.com/questions/24424403/join-or-merge-function-in-haskell
fullOuter :: (Float -> Float -> Float) -> [(YearBCAD, Float)] -> [(YearBCAD, Float)] -> [(YearBCAD, Float)]
fullOuter _ xs [] = xs
fullOuter _ [] ys = ys
fullOuter f xss@(x:xs) yss@(y:ys)
| fst x == fst y = (fst x, f (snd x) (snd y)) : fullOuter f xs ys
| fst x < fst y = x : fullOuter f xs yss
| otherwise = y : fullOuter f xss ys
I was wondering if I could rewrite this code, so that CalPDF
becomes an instance of Monoid
and sumPDFs
becomes <>
.
The issue I can not overcome and which lead me to post is question, is how mempty
should look like. I already have something like this in combinePDFs
: emptyBackdrop
. This is required in my implementation, to fill or complete years in between both input PDFs, if they do not overlap.
emptyBackdrop
fulfills some of the requirements for mempty
, but it depends on the input PDFs. Theoretically, the true mempty
would be a CalPDF
, which starts at the beginning of time, ends at the end of time and attributes each of these infinite years a probability of zero. But this can not be implemented with unboxed vectors.
Is there an elegant way to make CalPDF
and instance of Monoid
? Would it be useful already to make it an instance of Semigroup
with what I have already?
Edit: As suggested by @leftaroundabout here is a reproducible, minimal implementation of the setup described above.
main :: IO ()
main = do
let myPDF1 = [(1,1), (2,1), (3,1)]
myPDF2 = [(2,1), (3,1), (4,1)]
putStrLn $ show $ sumPDFs myPDF1 myPDF2
type CalPDF = [(Int, Float)]
sumPDFs :: CalPDF -> CalPDF -> CalPDF
sumPDFs pdf1 pdf2 =
let startRange = minimum [fst $ head pdf1, fst $ head pdf2]
stopRange = maximum [fst $ last pdf1, fst $ last pdf2]
emptyBackdrop = zip [startRange..stopRange] (repeat (0.0 :: Float))
pdfCombined = fullOuter pdf2 (fullOuter pdf1 emptyBackdrop)
in pdfCombined
where
fullOuter :: [(Int, Float)] -> [(Int, Float)] -> [(Int, Float)]
fullOuter xs [] = xs
fullOuter [] ys = ys
fullOuter xss@(x@(year1,dens1):xs) yss@(y@(year2,dens2):ys)
| year1 == year2 = (year1, dens1 dens2) : fullOuter xs ys
| year1 < year2 = x : fullOuter xs yss
| otherwise = y : fullOuter xss ys
CodePudding user response:
Any Semigroup
can be lifted to a Monoid
with Maybe
.
CodePudding user response:
Consider reworking your type a bit.
import Data.Map (Map)
import qualified Data.Map as M
data CalPDF = CalPDF
{ _calPDFid :: [String]
, _calPDFdens :: Map YearBCAD Float
}
The instances can now be quite short indeed:
instance Semigroup CalPDF where
CalPDF id dens <> CalPDF id' dens' = CalPDF
(id <> id')
(M.unionWith ( ) dens dens')
instance Monoid CalPDF where
mempty = CalPDF mempty mempty
You could use HashMap
or IntMap
in place of Map
in essentially the same way if one of those fits your needs better.