Understanding of reduce high order function in swift-CodePudding

I am referring the document here

https://developer.apple.com/documentation/swift/array/2298686-reduce

So I got code snippet from one article its about reduce which gives unique value set from the array

func countUniques<T: Comparable>(_ array: Array<T>) -> Int {
  let sorted = array.sorted()
  let initial: (T?,Int) = (.none, 0)
    
  let reduced = sorted.reduce(initial) {
    return ($1, $0.0 == $1 ? $0.1 : $0.1   1)// Doubt here
  }
  return reduced.1
}
print(countUniques([1,1,1,1,1,1,2,2,2,2,3,3,3,4]))// prints 4

Reduce is generally used to sum up elements but here it's little tricky & I am not able to understand return ($1, $0.0 == $1 ? $0.1 : $0.1 1) what this is actually doing ?

CodePudding user response：

Recall that reduce works by repeated applying a function (that is the closure you pass in) to an accumulator (that is the initial), for each element of the sequence:

var acc = initial
for element in array {
    acc = f(acc, element) // f is the closure
}

Now knowing that, this reduction will make a lot more sense if you give meaningful names to $0.0, $0.1 and $1. Here I have renamed $0 to (prev, total), and $1 to curr.

func countUniques<T: Comparable>(_ array: Array<T>) -> Int {
    let sorted = array.sorted()
    let initial: (T?,Int) = (.none, 0)
    
    let (_, total) = sorted.reduce(initial) { acc, curr in
        let (prev, total) = acc
        return (curr, prev == curr ? total : total   1)
    }
    return total
}

prev is used to keep track of the previous element that we have seen. Notice how we always return curr as the first element of the pair. This is to say that the current element that we are seeing, becomes the "previous element" in the next iteration. Also note that it starts off .none (nil), as we have not seen anything yet.

If prev is not curr, we add 1 to total, otherwise we use the same total. In this way, we can count how many times we see a different element from the one just saw.

For a sorted array, "how many times we see a different element from the one just saw" is the same number as the number of unique elements, because the duplicates will all be grouped next to each other.

CodePudding user response：

initial not having labels makes that code too impenetrable.

Fixing that and using the into overload should make things more clear. But with either overload, $0 is the accumulating value, and $1 is the current iteration of the sequence being reduced.

extension Array where Element: Comparable {
  var countOfUniques: Int {
    sorted()
      .reduce(into: (previous: Element?.none, count: 0)) {
        if $0.previous != $1 {
          $0.count  = 1
        }
        $0.previous = $1
      }
      .count
  }
}

If you write this with a more reusable case of reduce, you can disregard $1, but only if you name $0.

import Algorithms

[1,1,1,1,1,1,2,2,2,2,3,3,3,4].uniqued().count

public extension Sequence {
  /// - Complexity: O(n)
  var count: Int {
    reduce(0) { count, _ in count   1 }
  }
}

CodePudding user response：

Debugging is a useful skill. If you don't understand, you can add breakpoints and see the values, conditional breakpoints with prints, etc, or directly print().

Since the reduce() is "complex", let's explicit it:

Let's name the variables. It's a reduce(_:_), so we know what mean the first and second parameter. The first one is "initial" at start, and is modified during the course of the iteration, so let's call it partialResult, and it will be the finalResult on last iteration, giving its value to the return function, here let reduced =. The second one is the element of the iteration:

let reduced = sorted.reduce(initial) { partialResult, currentIterationItem in
    return (currentIterationItem, partialResult.0 == currentIterationItem ? partialResult.1 : partialResult.1   1)// Doubt here
}

Let's remove the ternary if now, with an explicit if/else:

let reduced = sorted.reduce(initial) { partialResult, currentIterationItem in
    if partialResult.0 == currentIterationItem {
        return (currentIterationItem, partialResult.1)
    } else {
        return (currentIterationItem, partialResult.1   1)
    }
}

Let's use a temp variable which will holds the value returned:

let reduced = sorted.reduce(initial) { partialResult, currentIterationItem in
    var returnValue: (T?,Int)
    if partialResult.0 == currentIterationItem {
        returnValue = (currentIterationItem, partialResult.1)
    } else {
        returnValue = (currentIterationItem, partialResult.1   1)
    }
    return returnValue
}

Now, let's add logs:

let reduced = sorted.reduce(initial) { partialResult, currentIterationItem in
    print("Current Item to analyze: \(currentIterationItem)")
    print("Partial Result: \(partialResult) before")
    var returnValue: (T?,Int)
    if partialResult.0 == currentIterationItem {
        print("partialResult.0 == currentIterationItem")
        returnValue = (currentIterationItem, partialResult.1)
    } else {
        print("partialResult.0 != currentIterationItem")
        returnValue = (currentIterationItem, partialResult.1   1)
    }
    print("Return value: \(returnValue) which is partialResult at next iteration")
    return returnValue
}

You can try it now, but your example is not the best one. Why? Because it's using an already sorted array, and "4" is also the value of the last item, it could bring confusion...
Instead, let's use an array of String, single letters for better understanding (1 -> A, 2 -> B, 3 -> C, 4 -> D):

print(countUniques5(["A","A","A","A","A","A","B","B","B","B","C","C","C","D"]))// prints 4

Okay, now, what are the logs:

Current Item to analyze: A
Partial Result: (nil, 0) before
partialResult.0 != currentIterationItem
Return value: (Optional("A"), 1) which is partialResult at next iteration
Current Item to analyze: A
Partial Result: (Optional("A"), 1) before
partialResult.0 == currentIterationItem
Return value: (Optional("A"), 1) which is partialResult at next iteration
Current Item to analyze: A
Partial Result: (Optional("A"), 1) before
partialResult.0 == currentIterationItem
Return value: (Optional("A"), 1) which is partialResult at next iteration
Current Item to analyze: A
Partial Result: (Optional("A"), 1) before
partialResult.0 == currentIterationItem
Return value: (Optional("A"), 1) which is partialResult at next iteration
Current Item to analyze: A
Partial Result: (Optional("A"), 1) before
partialResult.0 == currentIterationItem
Return value: (Optional("A"), 1) which is partialResult at next iteration
Current Item to analyze: A
Partial Result: (Optional("A"), 1) before
partialResult.0 == currentIterationItem
Return value: (Optional("A"), 1) which is partialResult at next iteration
Current Item to analyze: B
Partial Result: (Optional("A"), 1) before
partialResult.0 != currentIterationItem
Return value: (Optional("B"), 2) which is partialResult at next iteration
Current Item to analyze: B
Partial Result: (Optional("B"), 2) before
partialResult.0 == currentIterationItem
Return value: (Optional("B"), 2) which is partialResult at next iteration
Current Item to analyze: B
Partial Result: (Optional("B"), 2) before
partialResult.0 == currentIterationItem
Return value: (Optional("B"), 2) which is partialResult at next iteration
Current Item to analyze: B
Partial Result: (Optional("B"), 2) before
partialResult.0 == currentIterationItem
Return value: (Optional("B"), 2) which is partialResult at next iteration
Current Item to analyze: C
Partial Result: (Optional("B"), 2) before
partialResult.0 != currentIterationItem
Return value: (Optional("C"), 3) which is partialResult at next iteration
Current Item to analyze: C
Partial Result: (Optional("C"), 3) before
partialResult.0 == currentIterationItem
Return value: (Optional("C"), 3) which is partialResult at next iteration
Current Item to analyze: C
Partial Result: (Optional("C"), 3) before
partialResult.0 == currentIterationItem
Return value: (Optional("C"), 3) which is partialResult at next iteration
Current Item to analyze: D
Partial Result: (Optional("C"), 3) before
partialResult.0 != currentIterationItem
Return value: (Optional("D"), 4) which is partialResult at next iteration

We know the goal of the method, and we know how reduce(_:_:) works.

So what's happening? We have a start (nil, 0). We then, test the current element against nil, it's different, we'll have now (A, 1). Then, since it's sorted, we do the same for the remaining "A", it's the same, the partialResult is still (A,1). When entering next letter "B", it goes into the else, so we keep the letter "B" into partialResult, and we increment the second tuple value. We know have (B, 2), etc.

So, the tuple meaning is: first value: Current item, second value: counter of different items seen.
It works, because array if sorted. If it wasn't, the output would be totally wrong.

In my opinion, it's not good (as they is no explanation) code. It's a little hard to read, understand what's happening. A Set if possible would have been better, but it needs to be Hashable compliant.

Usually, when you need to create a var before reduce(_:_:), prefer reduce(into:_:).

In a nutshell: See intermediary values & decomplex code by breaking into multiple small steps...

CodePudding user response：

I would suggest a multi-step process to reverse-engineer this code:

Let us replace the closure‘s “shorthand argument names” (the $0 and the $1) with meaningful names as deduced from the reduce documentation. The reduce function’s closure’s parameters are (a) the “result” from the previous iteration; and (b) the current element from the array for the current iteration of the reduce closure:
```
func countUniques<T: Comparable>(_ array: [T]) -> Int {
    let sorted = array.sorted()
    let initial: (T?, Int) = (.none, 0)

    let reduced = sorted.reduce(initial) { previousResult, item in
        return (item, previousResult.0 == element ? previousResult.1 : previousResult.1   1) // Doubt here
    }
    return reduced.1
}
```
Already, it is starting to be a little more clear. But let us continue our analysis.

If the ternary syntax (the a ? b : c) makes the meaning opaque, it is often useful to break it down into an if-else statement:

func countUniques2<T: Comparable>(_ array: [T]) -> Int {
    let sorted = array.sorted()
    let initial: (T?, Int) = (.none, 0)

    let reduced = sorted.reduce(initial) { previousResult, item in
        if previousResult.0 == item {
            return (item, previousResult.1)
        } else {
            return (item, previousResult.1   1)
        }
    }
    return reduced.1
}

OK, so what is (T?, Int) and the subsequent .0 and .1 syntax? That is a tuple. It is a pair of values which is being used in three places, namely:
- the initial value;
- the first parameter of the closure, previousResult; and
- the return value of the closure.
You are apparently familiar with reduce that is passing a single value from one iteration to another (e.g., a simple sum). But when you see a tuple in a reduce function closure, that simply means that reduce is passing two values from one iteration to the next. So what are these two tuple elements?

In the code, we see previousResult.0 (the first element of the tuple) and previousResult.1 (the second element of the tuple). So, what does that first element of the tuple represent?

By looking at what the closure returns, we can infer that previousResult.0 is simply the previous value in the array (notice that the return returns the current item in the array as the first parameter).

How about the second element of that tuple, the previousResult.1? Well, we can see if the previous value is the same and the current value in the array, it simply returns previousResult.1, but if the values are different, it returns previousResult.1 plus 1. Clearly this is just the count of unique values encountered thus far. I.e., if the current array item is different than the previous item in the array, it increments the count.
```
func countUniques3<T: Comparable>(_ array: [T]) -> Int {
    let sorted = array.sorted()
    let initial: (T?, Int) = (.none, 0)                             // first element of tuple is previous value in sorted array; second element is count of unique values encountered thus far

    let reduced = sorted.reduce(initial) { previousResult, item in
        let (previousValue, previousUniquesCount) = previousResult  // decompose the tuple's two elements into meaningful names
        if previousValue == item {
            return (item, previousUniquesCount)
        } else {
            return (item, previousUniquesCount   1)
        }
    }
    return reduced.1                                                // the final result is the second element of the tuple returned by `reduce`
}
```

Now that we can start to see what the routine is doing, we could translate this into imperative routine, replacing the tuple with two separate variables to represent the tuple’s two elements:

func countUniques4<T: Comparable>(_ array: [T]) -> Int {
    let sorted = array.sorted()

    var previousValue: T?
    var count = 0

    for item in sorted {
        if item != previousValue {
            count  = 1
            previousValue = item
        }
    }

    return count
}

Or, even more concisely:

func countUniques5<T: Comparable>(_ array: [T]) -> Int {
    let sorted = array.sorted()

    var previousValue: T?
    var count = 0

    for value in sorted where value != previousValue {
        count  = 1
        previousValue = value
    }

    return count
}

Now, one can debate the merits of reduce with a tuple. It is a clever technique, but arguably makes the code harder to understand at a glance. Admittedly, once you have done it a few times, it becomes easier to interpret. I guess we are lucky that the author did not elect to make it even more “concise”:
```
func countUniques6<T: Comparable>(_ array: [T]) -> Int {
    array.sorted().reduce((T?.none, 0)) { ($1, $0.0 == $1 ? $0.1 : $0.1   1) }.1
}
```
or
```
func countUniques7<T: Comparable>(_ array: [T]) -> Int {
    array.sorted().reduce(into: (T?.none, 0)) { if $0.0 != $1 { $0 = ($1, $0.1   1) } }.1
}
```
As you can see, you can write very compact expressions with functional code, shorthand argument names, tuples, etc. There can even be a perverse pleasure in doing so. But generally we strive for code that is easily understood and maintained (except for those rare cases where we have to sacrifice these features for the sake of performance, which is not the case here).

Speaking of performance, this whole routine is somewhat inefficient. Notably, the algorithm is predicated upon first sorting the array. That is slow. It is also an inefficient use of space.

For example, if the array element values were hashable, using a Set would be more efficient and avoids the complexity of the above:

func countUniques8<T: Hashable>(_ array: [T]) -> Int {
   return Set(array).count
}

There is no need to build a separate sorted array, sorting each and every element of the array, simply to determine how many unique values there are. You only need to know whether the value has been encountered before. A Set accomplishes this efficiently.