Home > Enterprise >  How do I check if a list is partially sorted? How to define a threshold for partially sorted lists?
How do I check if a list is partially sorted? How to define a threshold for partially sorted lists?

Time:08-07

I have hundreds of lists that look like that:

list_1 = [10,20,30,40,70,90,230,450] # sorted list example
list_2 = [10,20,40,30,70,230,90,450] # partially sorted list example
list_a = [20,450,90,30,70,230,10,40] # messy/unsorted list example

Some lists are sorted, partially sorted and some are completly unsorted. I want to get the lists that are sorted and partially sorted. I know how to check if a list is sorted:

if sorted(lst) == lst:
    # code here

How do I get the partially sorted lists as well? How to define a threshold for what I mean with partially sorted? For example only 10% of numbers in a given list are not in correct order.

CodePudding user response:

I'd try to define it as

Partially sorted list has at most f(n) total distance between elements and their expected location.

The function f(n) itself should be defined according to your use case, and how "expensive" are out of orders.

Now, using this definition, simply sum the values of distances between each element and its expected location.

def is_partially_sorted(l):
  sorted_l = sorted(l)
  # binary_search() is doing binary search in sorted list to find x, and returns its index.
  diffs = [abs(idx - binary_search(sorted_l, x) for idx, x in enumerate(l)]  
  # f(x) is the threshold function described above.
  return sum(diffs) <= f(len(l))

(Not a native python coder, so hope this is sufficiently readable)

CodePudding user response:

An inversion is a pair of elements that are out of order. I.e. indices i < j, but arr[i] > arr[j].

A reasonable metric of sortedness is the count of inversions, or normalized as the ratio of the count of inversions to the max possible inversions, which is choose(n,2).

You can find it in O(n log n) time with modified merge sort, as explained here.

  • Related