Given two list of the form:
l1 = [k1, k2, k3]
l2 = [(k1, x1), (k2, x2), (k1, x3), (k2, x4), (k5, x5), ..)
I would like to group the x'es in a list with their respective key such that:
l = [(k1, [x1, x3]), (k2, [x2, x4]), (k5, [x5]), ..]
I have already googled the question and all the solutions use library code, I was however wondering how you implement this recursively. What I have so far is:
groupHelper [] l2 = []
groupHelper (k, value):tail1 (k1, values):tail2 | k == k1 = groupHelper tail1 (k1,(value:values)):tail2
| k /= k1 = (k1, values):groupHelper (k,value):tail1 tail2
groupHelper (k, value):tail1 [] = (k, value):groupHelper tail1 (k,value)
As you will notice my code makes absolutely no sense. I have been stuck on this problem for some time now, so I was wondering if adding a third accumulator parameter in case I've enumerated through l1 I can go back, would help.
I'm really confused right now, if anyone could point me in the right direction that would be appreciated.
CodePudding user response:
Start with a simpler function: update :: (a, b) -> [(a, [b])] -> [(a, [b])]
, which takes a single pair and updates the in-progress grouping.
update (k, v) [] = ...
update (k, v) ((k1, vs):rest) = ...
Once you do that, it' s simple matter of folding your list of pairs using update
.
groupby :: [(a, b)] -> [(a, [b])]
groupby = foldr update []
If you don't want to use foldr
, it would be a good exercise to implement it from scratch.
foldr_ :: (a -> b -> b) -> b -> [a] -> b
foldr_ f z [] = ...
foldr_ f z (x:xs) = ...
There's little to be gained by ignoring the abstraction provided by foldr
and trying to implement groupby
entirely from first principles.
CodePudding user response:
Not necessarily particularly elegant, but we can pattern match on the list, and then iterate over the tail to find all values with the same key, then cons the result onto the result of running the same function recursively on the remaining elements.
groupByFirst [] = []
groupByFirst ((k, v):tl) = (k, v:map snd s) : groupByFirst d
where
(s, d) = partition ((k ==) . fst) tl
ghci> l2 = [("k1", "x1"), ("k2", "x2"), ("k1", "x3"), ("k2", "x4"), ("k5", "x5")]
ghci> groupByFirst l2
[("k1",["x1","x3"]),("k2",["x2","x4"]),("k5",["x5"])]
The complexity of this is less than ideal, but it should be simple enough to understand. Effectively, evaluating this would look like:
groupByFirst [("k1", "x1"), ("k2", "x2"), ("k1", "x3"), ("k2", "x4"), ("k5", "x5")]
("k1", ["x1", "x3"]) : groupByFirst [("k2", "x2"), ("k2", "x4"), ("k5", "x5")]
("k1", ["x1", "x3"]) : ("k2", ["x2", "x4"]) : groupByFirst [("k5", "x5")]
("k1", ["x1", "x3"]) : ("k2", ["x2", "x4"]) : ("k5", ["x5"]) : groupByFirst []
("k1", ["x1", "x3"]) : ("k2", ["x2", "x4"]) : ("k5", ["x5"]) : []
[("k1", ["x1", "x3"]), ("k2", ["x2", "x4"]), ("k5", ["x5"])]