If I have some strings (example strings: ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]), how do I check whether two of them are similar and differ only by repeated chars, and then decide which of the two should go first (smallest first)? (Example output: ["nice", "niiiice", "niiiiiiiceee", "yummy", "shine", "shiiinee", "hello", "print", "priinter", "priintering", "Howdy", "yup", "yuup", "soup", "soooouuuuuppppp", "yeehaw"])
NOTE:
If possible, the check should leave everything else in the same order. By this I mean that strings without similar counterparts should stay in roughly the same location.
CodePudding user response:
You could squeeze out repeated chars, so that "similar" strings become equal.
import re
a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]
def squeeze(s):
    # Collapse every run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)
a.sort(key=lambda s: (squeeze(s), len(s)))
print(a)
Output:
['nice', 'niiiice', 'niiiiiiiceee', 'shine', 'shiiinee']
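To make the "squeeze" idea concrete, here is a minimal sketch of what the key function returns for a few inputs. One caveat, which is my own observation rather than something stated in the question: legitimate double letters are collapsed too, so "hello" and a hypothetical "helo" would be treated as similar.

```python
import re

def squeeze(s):
    # Replace each run of one repeated character with a single copy of it.
    return re.sub(r'(.)\1+', r'\1', s)

print(squeeze("niiiiiiiceee"))              # nice
print(squeeze("hello"))                     # helo
print(squeeze("helo") == squeeze("hello"))  # True
```

If that matters for your data, you may need a stricter notion of "similar" than equal squeezed forms.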
Alternatively, if you only want to sort consecutive groups of "similar" strings:
from itertools import groupby
import re
a = ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]
def squeeze(s):
    # Collapse every run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)
a = [s for _, g in groupby(a, squeeze) for s in sorted(g, key=len)]
print(a)
Output:
['nice', 'niiiice', 'niiiiiiiceee', 'yummy', 'shine', 'shiiinee', 'hello', 'print', 'priintering', 'priinter', 'Howdy', 'yup', 'yuup', 'soup', 'soooouuuuuppppp', 'yeehaw']
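Note that groupby only merges similar strings that are adjacent in the list. If similar strings can be separated by unrelated ones, one possible variation is to collect groups in a dict, which keeps keys in first-seen order. This is a sketch under that assumption; the name sort_similar and the sample list are made up for illustration, and it moves every member of a group to the position where the group first appears.

```python
import re

def squeeze(s):
    # Replace each run of one repeated character with a single copy of it.
    return re.sub(r'(.)\1+', r'\1', s)

def sort_similar(strings):
    # dicts preserve insertion order, so groups come out in first-seen order.
    groups = {}
    for s in strings:
        groups.setdefault(squeeze(s), []).append(s)
    # Within each group, put the shortest string first.
    return [s for group in groups.values() for s in sorted(group, key=len)]

words = ["yup", "nice", "yuuup", "niiice", "hello", "yuup"]
print(sort_similar(words))
# ['yup', 'yuup', 'yuuup', 'nice', 'niiice', 'hello']
```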
CodePudding user response:
Another solution, using itertools.groupby:
import itertools
a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]
# The key is (list of characters with runs collapsed, length of the string).
print(sorted(a, key=lambda s: ([k for k, v in itertools.groupby(s)], len(s))))
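To see why this works: called without a key function, groupby yields one group per run of consecutive equal characters, so taking only the group keys gives the string with repeats collapsed. A small sketch, where run_key is a hypothetical helper name of mine:

```python
from itertools import groupby

def run_key(s):
    # One entry per run of consecutive equal characters.
    return [k for k, _ in groupby(s)]

print(run_key("niiiice"))  # ['n', 'i', 'c', 'e']

words = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]
# Sort by collapsed form first, then by length so the shortest comes first.
print(sorted(words, key=lambda s: (run_key(s), len(s))))
# ['nice', 'niiiice', 'niiiiiiiceee', 'shine', 'shiiinee']
```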