I was doing an exercise on Hackerrank, and in the discussion I stumbled upon a very nice one line solution to solve the following question:
S is an alphanumeric string, and you have to sort it using the following rules:
- All sorted lowercase letters are ahead of uppercase letters.
- All sorted uppercase letters are ahead of digits.
- All sorted odd digits are ahead of sorted even digits.
the solution offered was:
print(*sorted(input(), key=lambda c: (-ord(c) >> 5, c in '02468', c)), sep='')
Now, I understand how lambda works and how ord works, I want to understand this code properly, so here's what I did to understand it:
b = "Sorting1234"
print(*sorted(input(), key=lambda c: (ord(c))), sep='')
Output:
['1', '2', '3', '4', 'S', 'g', 'i', 'n', 'o', 'r', 't']
So, this code sorts the given string based on the ASCII code. Now, we want to have the lowercase letters first, then the upper case and then numbers, so we now reverse this sequence:
sorted(b,key=lambda x: (-ord(x)))
Output:
['t', 'r', 'o', 'n', 'i', 'g', 'S', '4', '3', '2', '1']
Now, this part I don't get, why are we using binary shift operator, what purpose does it serve?
sorted(b,key=lambda x: (-ord(x)>>3))
Output:
['r', 't', 'o', 'i', 'n', 'g', 'S', '1', '2', '3', '4']
Now with right shift 5:
sorted(b,key=lambda x: (-ord(x)>>5))
Output:
['o', 'r', 't', 'i', 'n', 'g', 'S', '1', '2', '3', '4']
We are not even halfway from what we actually, want, but the full code suddenly yields the correct answer:
sorted(b,key=lambda x: (-ord(x)>>5, x in '02467', x))
Output:
['g', 'i', 'n', 'o', 'r', 't', 'S', '1', '3', '2', '4']
CodePudding user response:
Look at the ASCII table, the 128 characters arranged in four columns of 32 (only showing the printable characters here):
0 32 64 @ 96 `
1 33 ! 65 A 97 a
2 34 " 66 B 98 b
3 35 # 67 C 99 c
4 36 $ 68 D 100 d
5 37 % 69 E 101 e
6 38 & 70 F 102 f
7 39 ' 71 G 103 g
8 40 ( 72 H 104 h
9 41 ) 73 I 105 i
10 42 * 74 J 106 j
11 43 75 K 107 k
12 44 , 76 L 108 l
13 45 - 77 M 109 m
14 46 . 78 N 110 n
15 47 / 79 O 111 o
16 48 0 80 P 112 p
17 49 1 81 Q 113 q
18 50 2 82 R 114 r
19 51 3 83 S 115 s
20 52 4 84 T 116 t
21 53 5 85 U 117 u
22 54 6 86 V 118 v
23 55 7 87 W 119 w
24 56 8 88 X 120 x
25 57 9 89 Y 121 y
26 58 : 90 Z 122 z
27 59 ; 91 [ 123 {
28 60 < 92 \ 124 |
29 61 = 93 ] 125 }
30 62 > 94 ^ 126 ~
31 63 ? 95 _ 127
Numbering the columns 0 to 3, you'll find the digits in column 1, the uppercase letters in column 2, and the lowercase letters in column 3.
Rightshift by 5 is equivalent to dividing by 32 and rounding down. So ord(c) >> 5
would give you the column number 1, 2 or 3, depending on the type digit/upper/lower. Since we want the reversed order of that, we negate.
The actual expression -ord(c) >> 5
negates first (as indicated by the spaces), which shifts things slightly: The top row (the letters with codes 0, 32, 64 and 96) get the wrong column number. But those characters are irrelevant for us.