What would the Big O notation be for alphabetically sorting each element in a nested list comprehens-CodePudding

# find all possible sorted substrings of s
substr = ["".join(sorted(s[i: j])) 
                for i in range(len(s)) 
                    for j in range(i   1, len(s)   1)]

I know the sorted() method is O(nlog(n)), and the finding of all possible substrings is O(n^2). However, the .join() is what is throwing me off. sorted() returns a list and .join() iterates over each element in the list to append it to a string. So it should be a linear operation.

Would my substring sorter therefore be O(n^2) for the nested loop * O(nlogn) for sorting each result * O(n) for joining? Ergo O(n^4logn)??

If so, would breaking up the operations make this more efficient? I had another implementation where I move the sorting of the substrings to a second list comprehension

substr = [s[i: j] for i in range(len(s)) 
                for j in range(i   1, len(s)   1)]

sortedSubstr = ["".join(sorted(s)) for s in substr]

This would make it the O(n^2) list comprehension first O(n)*O(nlogn) for the second list comprehension

Making the overall program now O(n^2logn)

Please correct me if I am wrong. Thanks

CodePudding user response：

For the sake of analysis, let's replace the comprehensions with their equivalent loops as described in the docs.

Your first approach becomes

substr = []
for i in range(len(s)): # O(n)
    for j in range(i   1, len(s)   1): # O(n)
        substr.append("".join(sorted(s[i: j]))) # O(n*logn)

# Overall O(n^3 * logn)

Note that substr.append("".join(sorted(s[i: j]))) is not multiplicative, rather it is a sequential combination of the following operations (no actual assignments happen)

temp = s[i:j] # O(j-i), which worst case we can take O(n)
sorted_temp = sorted(temp) # O(n*logn)
joined_temp = "".join(sorted_temp) # O(n)
substr.append(joined_temp) # amortized O(1)

# Overall O(n*logn)

Now in the second approach the code becomes

substr = []
for i in range(len(s)): # O(n)
    for j in range(i   1, len(s)   1): # O(n)
        substr.append(s[i:j]) # O(n) for the slice
# first loop: O(n^3)

sortedSubstr = []
for s in substr: # O(n^2), we have appended n^2 times in the first loop
    sortedSubstr.append("".join(sorted(s))) # O(n*logn)
# second loop: O(n^3 * logn)

# Overall O(n^3 * logn)

So as you can see, they should be the same time complexity. I almost missed substr length being n^2 rather than n, so that might be a pitfall.

References for time complexity of various operations:

CodePudding user response：

For the first algorithm time complexity is O(n^3*log(n)), because after two loop you are not making a join for each atomic action in sorted. You separately sort and join. So O(n) O(n*log(n)) = O(n*log(n)), which multiplied by O(n^2) (nested loops) gives us O(n^3*log(n)).

About the second algorithm.

Calculation of substr gives us O(n^3): same O(n^2) for nested loops multiplied by O(n) for slicing s.
Note that len(substr) is O(n^2) — for each (i, j) from nested loops.
Calculation of sortedSubstr gives us O(n^3*log(n)): for each element of substr (whose count is O(n^2)) we call sorted. Each element's len is O(n), so sorted(s) gives us O(n*log(n)). So, samely, O(n^2) * O(n*log(n)) = O(n^3*log(n)).
Calculation of substr (O(n^3)) plus calculation of sortedSubstr (O(n^3*log(n))) yields O(n^3*log(n)).

So the same O(n^3*log(n)) for the second.

CodePudding user response：

In this expression:

"".join(sorted(s[i: j]))

the O(n) join is negligible, because you're doing it once, after you've done the O(nlogn) sort.

O(nlogn n) = O((n 1)logn) = O(nlogn)

Doing that sort n^2 times gets you to O(n^3 logn), regardless of whether/when you do the join.