I was working on a problem and wanted to sort things based on a condition. I have an array of words and a hash that has a count of how many times each word appears in the array of words. The problem calls for you to return the elements based on descending order of frequency of each word in the initial array of words (most frequent words appear first and least frequent appear last in the return array).
However if two words appear the same amount of times, then sort them alphabetically (lexographically) in the return array. In JavaScript, I've been able to write it like this:
`let frequentWords = Object.keys(hash).sort((a, b) => {
if (hash[b] === hash[a]) {
return a.localeCompare(b);
} else {
return hash[b] - hash[a];
}
});`
I wanted to know how to write this but equivalently in Python with sorted(list, key=lambda x:(some function here)), but I'm not sure how to do so. I want to be able to sort based on multiple conditions for any problem that needs sorting in the future, but I'm not sure how to write a lambda function for key that can take in multiple conditions.
I've seen this as a solution:
freq_words = sorted(hash, key=lambda x: (-hash[x],x))
And I've tried reading the documentation, but I'm not sure how this works and unsure what to do if I need to sort based on three conditions. Whereas this is easy to do in a JS callback function, I'm unsure of the Python syntax.
I'm coding in Python3 and cmp no longer exists, so I'm trying to figure out how to write this with just the key parameter.
Thanks!
CodePudding user response:
While the JavaScript sort
function provides two items for comparison,
the Python sorted
function provides only one item to the lambda for consideration.
The design of the Python function is for a particular cause. You take the item to be compared as input and produce a value (a key) that can be used to compare it to other items.
See:
https://docs.python.org/3/glossary.html#term-key-function
https://docs.python.org/3/library/functions.html?highlight=sorted#sorted
This is how freq_words = sorted(hash, key=lambda x: (-hash[x],x))
works:
- Pass each key of
hash
to lambda asx
. - Return a tuple containing negative of value corresponding to key
x
and the key itself. - The tuple generated from each
x
is internaly compared by Python.
In tuple comparison, the respective first item of both tuples is compared. If they are equal, the next corresponding items are compared and so on, until an inequality arises, or there are no more items.
See: https://docs.python.org/3/howto/sorting.html#decorate-sort-undecorate
The negative sign causes a descending sort to occur -- a hack for numbers. While the solution works, it uses very Python specific syntax and is thus very abstract and unreadable.
A possibly less performant, but more readable sort would be:
alpha_sorted = sorted(hash.keys()) # Sort alphabetically
freq_words = sorted(alpha_sorted, key=lambda x: hash[x], reverse=True) # Sort with key
This works as Python sorting is stable. See:
https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts
What is stability in sorting algorithms and why is it important?
CodePudding user response:
Essentially what is happening is the the lambda function is constructing tuples from each key, value pair in hash
, and sorting those.
So, e.g., if you have:
hsh = {'a': 10, 'ba': 8, 'bo': 8, 'c': 12, 'do': 12, 'da': 12}
then you can think of
sorted(hsh, key=lambda x: (-hsh[x], x))
as equivalent to sorting:
[(-10, 'a'), (-8, 'ba'), (-8, 'bo'), (-12, 'c'), (-12, 'do'), (-12, 'da')]
Tuples are compared element-wise, so -12 comes first, and it then compares 'c', 'do', and 'da' which get ordered: 'c', da', 'do', so our first elements are [(-12, 'c'), (-12, 'da'), (-12, 'do')]
or elements at index 3, 5 and 4 respectively, so the final output elements are the original keys at index 3, 5, and 4, namely ['c', 'da', 'do', ...]
.
If you wanted to add more conditions, e.g.:
hsh = {'a': (10, 12), 'bo': (10, 14), 'ba': (10, 14)}
(maybe the first number is counts on wikipedia, and the second is counts in some text corpus, and you want to order by: the second digit, then the first, then alphabetically, you can do:
sorted(hsh, key=lambda x: (-hsh[x][1], -hsh[x][0], x))
.
Hopefully that's enough that you can generalize from here!