How to use filter to find neighboring words in a text

Time:11-29

First-time poster, long-time lurker.

As context: I have previously written a function that counts the number of times each pair of words occurs in a text. It tallies every single pair of words throughout the text, like so:

pairCounter = map (\x -> (head x, length x)) . groupTuples . sort

This function returns [((String, String), Int)]. The two Strings are the first and second word of the pair, and the Int is how many times that pair occurs, or the "tally" of the pair if you will.
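For readers who want to run the snippet above, here is a minimal sketch. It rests on two assumptions that are not confirmed by the post: that groupTuples behaves like Data.List.group, and that the input is the list of adjacent word pairs. The helper adjacentPairs is a hypothetical name introduced only for this illustration.

```haskell
import Data.List (group, sort)

-- Assumption: the original groupTuples is not shown in the post;
-- Data.List.group is used here as a stand-in.
groupTuples :: Eq a => [a] -> [[a]]
groupTuples = group

-- Hypothetical helper: pair every word with the word that follows it.
adjacentPairs :: [a] -> [(a, a)]
adjacentPairs ws = zip ws (drop 1 ws)

-- Sort so that equal items become adjacent, group them, then count each group.
-- head is safe here because group never produces an empty group.
pairCounter :: Ord a => [a] -> [(a, Int)]
pairCounter = map (\x -> (head x, length x)) . groupTuples . sort
```

Under these assumptions, pairCounter (adjacentPairs (words someText)) would produce the [((String, String), Int)] list described above, sorted by pair.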

What I now would like to do is create a function that only returns the "neighbors" to any given word. For instance:

neighbours [(("red","car"),2),(("house","red"),1)] "red"

should return [("car",2),("house",1)] or some reordering of this list.

So basically, we have established all pairs for any given word, but now I want to single out only the neighbors of that word, each with its tally.

So far, I have thought about using filters in this way:

filter (\(x, y) -> x /= c || y /= c)
--^(I have no idea if this is syntax correct but it is just to give myself an idea where to start)

However, I find it hard to come up with a way to use filter that also keeps the tally of my neighbors, that is, the Int component.

Please let me know if any of this is against any rules or similar, thank you.

CodePudding user response:

For a given word c, you should retain the items whose first String or second String is equal to c. We use ((s1, s2), v) as the pattern, since the outer 2-tuple has a 2-tuple of Strings as its first item and an Int as its second item.

We can work with concatMap :: Foldable t => (a -> [b]) -> t a -> [b] and a function that returns [(s2, v)] if s1 matches, [(s1, v)] if s2 matches, and the empty list if neither of the two elements matches:

neighbors :: (Foldable f, Eq a) => f ((a, a), b) -> a -> [(a, b)]
neighbors items query = concatMap f items
  where f ((s1, s2), v)
            | query == s1 = [(s2, v)]
            | query == s2 = [(s1, v)]
            | otherwise = []
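Since the question asks specifically about filter, here is a sketch of the same idea written as a filter followed by a map. The name neighborsViaFilter is hypothetical and the function is restricted to lists. The filter keeps whole ((s1, s2), v) items, so the tally travels along untouched, and the map then drops the query word from each pair.

```haskell
-- Hypothetical name; same argument order as the neighbors function above.
neighborsViaFilter :: Eq a => [((a, a), b)] -> a -> [(a, b)]
neighborsViaFilter items query = map other (filter touches items)
  where
    -- Keep an item if either word of its pair is the query.
    touches ((s1, s2), _) = s1 == query || s2 == query
    -- Replace the pair by the word that is not the query; keep the tally.
    other ((s1, s2), v)
      | s1 == query = (s2, v)
      | otherwise   = (s1, v)
```

With the example from the question, neighborsViaFilter [(("red","car"),2),(("house","red"),1)] "red" evaluates to [("car",2),("house",1)].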

CodePudding user response:

One very idiomatic way would be via a list comprehension:

neighbours :: Eq a => [((a, a), b)] -> a -> [(a, b)]
neighbours items query =
    [ (neighbor, count)
    | ((s1, s2), count) <- items
    , (self, neighbor) <- [(s1, s2), (s2, s1)]
    , self == query
    ]

Actually, I'd probably put the arguments in the other order to match conventions used in existing libraries and shorten the names so that it comfortably fits on one line:

neighbours :: Eq a => a -> [((a, a), b)] -> [(a, b)]
neighbours x xs = [(x4, n) | ((x1, x2), n) <- xs, (x3, x4) <- [(x1, x2), (x2, x1)], x3 == x]

I suspect that the part where you don't care about order will come up in other parts of your code, and so additionally I would consider splitting out a part that symmetrizes. This will also be helpful if, later, you decide that pairs that occur in both orders should be normalized and their counts summed or some such thing, because you will only have to change one location to propagate that update to all consumers.

-- (s)ource, (t)arget, (u)ndirected edge, (w)eight, (w)eighted edge(s)
undirected :: [((a, a), b)] -> [((a, a), b)]
undirected ws = [(u, w) | ((s, t), w) <- ws, u <- [(s, t), (t, s)]]

neighbours :: Eq a => a -> [((a, a), b)] -> [(a, b)]
neighbours x xs = [(t, w) | ((s, t), w) <- undirected xs, s == x]
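As an illustration of the "change one location" point, here is a sketch of a symmetrizing step that also merges pairs occurring in both orders and sums their counts. The name undirectedSummed is hypothetical, and the stronger Ord and Num constraints are assumptions needed for the Data.Map-based merge. Unlike undirected above, this sketch emits a pair of identical words only once.

```haskell
import qualified Data.Map as M

-- Hypothetical variant of undirected: orient every pair so that the smaller
-- word comes first, sum the weights of pairs that collapse to the same key,
-- then emit each merged pair in both directions.
undirectedSummed :: (Ord a, Num b) => [((a, a), b)] -> [((a, a), b)]
undirectedSummed ws =
  [ (u, w)
  | ((s, t), w) <- M.toList merged
  , u <- if s == t then [(s, t)] else [(s, t), (t, s)]
  ]
  where
    merged = M.fromListWith (+) [ ((min s t, max s t), w) | ((s, t), w) <- ws ]

-- Same body as before; only the symmetrizing function was swapped.
neighbours :: (Ord a, Num b) => a -> [((a, a), b)] -> [(a, b)]
neighbours x xs = [ (t, w) | ((s, t), w) <- undirectedSummed xs, s == x ]
```

Here neighbours "red" [(("red","car"),2),(("car","red"),3),(("house","red"),1)] evaluates to [("car",5),("house",1)].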

Alternately, you might decide to make the graph undirected from the very beginning.

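{-# LANGUAGE DeriveTraversable #-}  -- for deriving Functor, Foldable, Traversable when GHC2021 is not in effect
import qualified Data.Map as M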

-- export UPair the type but not UPair the constructor
data UPair a = UPair a a
    deriving (Eq, Ord, Read, Show, Functor, Foldable, Traversable)

-- this is strict. can also make a lazy variant
upair :: Ord a => a -> a -> UPair a
upair a a' = if a < a' then UPair a a' else UPair a' a

pairCounter :: Ord a => [a] -> M.Map (UPair a) Int
pairCounter as = M.fromListWith (+) $ zipWith (\a a' -> (upair a a', 1)) as (tail as)
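To connect this back to the original question, here is a sketch of a neighbour lookup over the Map-based representation. The helper neighboursU is a hypothetical name. The definitions above are restated with a reduced deriving list, (+) as the combining function, and an Ord a constraint on pairCounter, so that the block compiles on its own.

```haskell
import qualified Data.Map as M

-- Restated from above so that this block is self-contained.
data UPair a = UPair a a
  deriving (Eq, Ord, Show)

upair :: Ord a => a -> a -> UPair a
upair a a' = if a < a' then UPair a a' else UPair a' a

pairCounter :: Ord a => [a] -> M.Map (UPair a) Int
pairCounter as = M.fromListWith (+) $ zipWith (\a a' -> (upair a a', 1)) as (drop 1 as)

-- Hypothetical helper: every word that shares an unordered pair with x,
-- together with the tally of that pair. This scans the whole map.
neighboursU :: Ord a => a -> M.Map (UPair a) Int -> [(a, Int)]
neighboursU x m =
  [ (t, n)
  | (UPair a b, n) <- M.toList m
  , (s, t) <- if a == b then [(a, b)] else [(a, b), (b, a)]
  , s == x
  ]
```

For example, neighboursU "red" (pairCounter (words "red car red house")) evaluates to [("car",2),("house",1)], because "red car" and "car red" are counted as the same unordered pair.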