What is the idiomatic Haskell way to check if one list contains all the values in another list?-CodePudding

My question is similar to How can i use haskell to check if a list contains the values in a tuple, but in my case, the list with values which must all be in my original list is the alphabet. It feels really messy to have 26 calls to elem, all ANDed together.

Is there any more concise way to check if a given list contains all of the elements in another list, where the other list is 26 items long?

CodePudding user response：

I like the answer by user1984, but it does leave implicit a bit the way you would handle non-alphabetic characters. Here's one that's almost as simple, but doesn't need extra code to handle non-alphabetic characters. The basic idea is to go the other direction; their answer builds up a set, this one tears one down by deleting one character at a time. It is also O(n), like theirs.

import qualified Data.Set as S

isPangram :: String -> Bool
isPangram = S.null . foldr S.delete (S.fromList ['a'..'z'])

If you want case insensitivity, you can import Data.Char and change S.delete to (S.delete . toLower).

CodePudding user response：

My understanding is that you are trying to check whether a string is a pangram, meaning it contains all the [insert your preferred language] alphabet.

The following code uses Data.Set. A set data structure can't contain duplicate elements by definition. We use this property of a set to count the number of elements that are in it after we added all the character from the string to it. If it's equal to the number of the alphabet we are working with, it means that the string contains all the characters of the alphabet, so it's a pangram.

Note that this code does not take care of the difference between lower and upper case letters and also counts white space and punctuation. This would give you wrong answers if there are any of these cases. With some tweaking you can make it work though.

The time complexity of this solution is n * log 26 which is just n in Big O, as Daniel Wagner pointed out. The reason is that while Haskell's default set is a balanced tree, the number of elements in the set never goes above 26, so it becomes linear. Using a hashing implementation of the set may result in some optimization for very large strings. The space complexity is constant since the additional space we are using never exceeds 26 characters.

import qualified Data.Set as S

isPangram :: [Char] -> Bool
isPangram s = length (S.fromList s) == 26

CodePudding user response：

This is based on Daniel Wagner's answer, but it finishes early if all the letters are found, whereas his walks the whole list, then works backwards deleting elements, then gives an answer when it gets back to the beginning.

import qualified Data.Set as S
import Data.Set (Set)

letters :: Set Char
letters = S.fromList ['a'..'z']

isPangram :: String -> Bool
isPangram xs0 = go xs0 letters
  where
    go :: String -> Set Char -> Bool
    go [] letters = S.null letters
    go (c : cs) letters =
      S.null letters || go cs (S.delete c letters)

You may be wondering why I match on the string first and check on letters under the match, instead of the other way around. Well, that's only because it's fun to rewrite this using foldr:

isPangram :: String -> Bool
isPangram xs0 = foldr step S.null xs0 letters
  where
    step :: String -> Set Char -> Bool
    step c r letters =
      S.null letters || r (S.delete c letters)

For something like the alphabet, Data.Set is not actually the most efficient way. It's better to use something like IntSet, which is much more compact and significantly faster.

import qualified Data.IntSet as S
import Data.IntSet (IntSet)

letters :: IntSet
letters = S.fromList (map fromEnum ['a'..'z'])

isPangram :: String -> Bool
isPangram xs0 = foldr step S.null xs0 letters
  where
    step :: String -> IntSet -> Bool
    step c r letters =
      S.null letters || r (S.delete (fromEnum c) letters)