How to optimise Haskell code to pass HackerRanks timed out test cases (Not for any ongoing competiti-CodePudding

I been learning Haskell for around 4 months now and I have to say, the learning curve is definitely hard(scary also :p).

After solving about 15 easy questions, today I moved to my first medium difficulty problem on HackerRank https://www.hackerrank.com/challenges/climbing-the-leaderboard/problem.

It was 10 test cases and I am able to pass 6 of them, but the rest fail with timeout, now the interesting part is, I can already see a few parts that have potential for performance increase, for example, I am using nub to remove duplicated from a [Int], but still I am not able to build a mental model for algorithmic performance, the main cause of that being unsure about Haskell compiler will change my code and how laziness plays a role here.

import Data.List (nub)

getInputs :: [String] -> [String]
getInputs (_:r:_:p:[]) = [r, p]

findRating :: Int -> Int -> [Int] -> Int
findRating step _ [] = step
findRating step point (x:xs) = if point >= x then step else findRating (step   1) point xs

solution :: [[Int]] -> [Int]
solution [rankings, points] = map (\p -> findRating 1 p duplicateRemovedRatings) points
                            where duplicateRemovedRatings = nub rankings


main :: IO ()
main = interact (unlines . map show . solution . map (map read . words) . getInputs . lines)

Test Case in GHCI

:l "solution"
let i = "7\n100 100 50 40 40 20 10\n4\n5 25 50 120"
solution i // "6\n4\n2\n1\n"

Specific questions I have:

Will the duplicateRemovedRankings variable be calculated once, or on each iteration of the map function call.
Like in imperative programming languages, I can verify the above question using some sort of printing mechanism, is there some equivalent way of doing the same with Haskell.
According to my current understanding, the complexity of this algorithm would be, I know nub is O(n^2)
- findRating is O(n)
- getInputs is O(1)
- solution is O(n^2)

How can I reason about this and build a mental model for performance.

If this violates community guidelines, please comment and I will delete this. Thank you for the help :)

CodePudding user response：

First, to answer your questions:

Yes, duplicateRemovedRankings is computed only once. No repeated computation.
To debug-trace, you can use trace and its friends (see the docs for examples and explanation). Yes, it can be used even in pure, non-IO code. But obviously, don't use it for "normal" output.
Yes, your understanding of complexity is correct.

Now, how to pass HackerRank's tricky tests.

First, yes, you're right that nub is O(N^2). However, in this particular case you don't have to settle for that. You can use the fact that the rankings come pre-sorted to get a linear version of nub. All you have to do is skip elements while they're equal to the next one:

betterNub (x:y:rest) 
  | x == y = betterNub (y:rest)
  | otherwise = x : betterNub (y:rest)
betterNub xs = xs

This gives you O(N) for betterNub itself, but it's still not good enough for HackerRank, because the overall solution is still O(N*M) - for each game you are iterating over all rankings. No bueno.

But here you can get another improvement by observing that the rankings are sorted, and searching in a sorted list doesn't have to be linear. You can use a binary search instead!

To do this, you'll have to get yourself constant-time indexing, which can be achieved by using Array instead of list.

Here's my implementation (please don't judge harshly; I realize I probably got edge cases overengineered, but hey, it works!):

import Data.Array (listArray, bounds, (!))

findIndex arr p
  | arr!end' > p = end'   1
  | otherwise = go start' end'
  where
    (start', end') = bounds arr

    go start end =
      let mid = (start   end) `div` 2
          midValue = arr ! mid
      in
        if midValue == p then mid
        else if mid == start then (if midValue < p then start else end)
        else if midValue < p then go start mid
        else go mid end

solution :: [[Int]] -> [Int]
solution [rankings, points] = map (\p -> findIndex duplicateRemovedRatings p   1) points
                            where duplicateRemovedRatings = toArr $ betterNub rankings

toArr l = listArray (0, (length l - 1)) l

With this, you get O(log N) for the search itself, making the overall solution O(M * log N). And this seems to be good enough for HackerRank.

(note that I'm adding 1 to the result of findIndex - this is because the exercise requires 1-based index)