Home > Enterprise >  Combine list of lists with named indices in Map-like structure
Combine list of lists with named indices in Map-like structure

Time:09-17

I have a program with two data structures I wish to combine. The use of Data.Map here is incidental because I'm using it elsewhere for a related purpose. If a solution never uses Data.Map, that's fine (probably better). I've simplified the problem to the below script that has all the essential elements.

My actual program is in a different domain, but in the analogy various "interviewers" are assigned to interview all the people in given households (named by index position of the "house"). I would like to determine which interviewers will need to conduct multiple interviews.

If an interviewer is assigned multiple households, she automatically must interview multiple people (in the scenario, all households are occupied). However, if she is assigned only one household, she might also need to interview the several people there.

The initial wrong approach I found (misled by my wrong assumption about the domain) produces the result below. However, I'm having trouble formulating the correct solution. For my purpose, the order in which the interviews occur in the result is not important.

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Create Map from list of pairs, making dup vals into a list
fromListWithDuplicates :: Ord k => [(k, v)] -> Map k [v]
fromListWithDuplicates pairs =
    Map.fromListWith (  ) [(k, [v]) | (k, v) <- pairs]

data Person = Person {
    name :: String
    } deriving (Show, Eq)

households = [[Person "Alice", Person "Bob"],
              [Person "Carlos"],
              [Person "Dabir", Person "Eashan"],
              [Person "Fatima"] ]

interviewers = [("Agent1", [0]), ("Agent2", [1,2]), ("Agent3", [3])]

multiInterviewsWRONG households interviewers =
    let assignments = [(agent, name person) |
                       (agent, houseIndices) <- interviewers,
                       index <- houseIndices,
                       person <- (households !! index),
                       length houseIndices > 1 ]
    in Map.assocs $ fromListWithDuplicates assignments

main :: IO ()
main = do
    -- Prints: [("Agent2", ["Eashan","Dabir","Carlos"])]
    putStrLn $ show (multiInterviewsWRONG households interviewers)

    -- Correct: [("Agent2", ["Eashan","Dabir","Carlos"]),
    --           ("Agent1", ["Alice","Bob"]]

Followup: this solution is just Willem Van Onsem's below, but putting it in one place:

import Util (lengthExceeds)
multiInterviews households interviewers =
    let assignments = [(agent, name person) |
                       (agent, houseIndices) <- interviewers,
                       index <- houseIndices,
                       person <- (households !! index) ]
    in filter (flip lengthExceeds 1 . snd)
        (Map.assocs $ fromListWithDuplicates assignments)

CodePudding user response:

Obviously Willem's answer is great, but I think it can't hurt to also offer one without a list comprehension:

atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False

transformSnd :: (b -> c) -> (a, b) -> (a, c)
transformSnd fun (f, s) = (f, fun s)
-- or transformSnd = second (from Control.Arrow; h/t Willem)
-- or transformSnd = fmap (from (,)'s Functor instance; h/t Will Ness)
-- or transformSnd = second (from Data.Bifunctor)

mult :: [(String, [String])]
mult = filter (atLeastTwo . snd) . map (transformSnd toInterviewees) $ interviewers
  where toInterviewees = map name . concatMap (households !!)

-- mult == [("Agent1",["Alice","Bob"]),("Agent2",["Carlos","Dabir","Eashan"])]

I'm reasonably sure the two versions run equally fast; which one is more readable depends on who's doing the reading.

There are a couple of functional differences. First, with Willem's answer, you get a map, while with this one you get a list (but the difference is mainly cosmetic, and you said you didn't care much).

Second, the two versions behave differently if there are two pairs in the interviewers list that have the same first element. Doing it Willem's way will do what you probably want, i. e. treat them as one pair with a longer second element; doing it this way will give you two pairs in the result list which have the same first element.

Also, you probably know this, but: if you find yourself combining lists a lot, you might want sets instead.

CodePudding user response:

You should remove the length houseIndices > 1 constraints, since that means that it will only retain agents, given they have to interview two or more households. You thus should use as list comprehension:

multiInterviews households interviewers =
    let assignments = [
        (agent, name person) |
        (agent, houseIndices) <- interviewers,
        index <- houseIndices,
        person <- households !! index
      ]
    # …

The given list comprehension will produce a list that looks like:

Prelude> :{
Prelude| [
Prelude|         (agent, name person) |
Prelude|         (agent, houseIndices) <- interviewers,
Prelude|         index <- houseIndices,
Prelude|         person <- households !! index
Prelude|       ]
Prelude| :}
[("Agent1","Alice"),("Agent1","Bob"),("Agent2","Carlos"),("Agent2","Dabir"),("Agent2","Eashan"),("Agent3","Fatima")]

We however need to filter, we can look at the assocs with lists that contain at least two items. We can implement an efficient function to determine if the list has at least two items:

atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False

and apply this filter to the assocs of the Map:

multiInterviews households interviewers =
    let assignments = …
    in filter (atLeastTwo . snd) (Map.assocs (fromListWithDuplicates assignments))
  • Related