How to check if strings are the same but one has repeated chars

If I have some strings (Example strings: ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]), how do I check whether two of them are similar, differing only in repeated chars, and then decide which of the two should go first (shortest first)? (Example output: ["nice", "niiiice", "niiiiiiiceee", "yummy", "shine", "shiiinee", "hello", "print", "priinter", "priintering", "Howdy", "yup", "yuup", "soup", "soooouuuuuppppp", "yeehaw"])

NOTE:

If possible, the check should leave everything else in the same order. By this I mean that strings without similar counterparts should stay in roughly the same location.

CodePudding user response：

You could squeeze out repeated chars, so that "similar" strings become equal.

import re

a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

a.sort(key=lambda s: (squeeze(s), len(s)))

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'shine', 'shiiinee']
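
To see why this sort key works, it can help to print what squeeze returns for a few inputs. The following is a minimal sketch, assuming the pattern is meant to be `(.)\1+` (one character followed by one or more repeats of itself); the sample words are taken from the question.

```python
import re

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

for word in ["niiiice", "nice", "yummy", "yeehaw"]:
    print(word, "->", squeeze(word))
# niiiice -> nice
# nice -> nice
# yummy -> yumy
# yeehaw -> yehaw
```

Note that legitimate double letters are squeezed too, so under this definition "yummy" and "yumy" would count as similar. Also, sorting the whole list by this key reorders unrelated strings alphabetically by their squeezed form.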

Alternatively, if you only want to sort consecutive groups of "similar" strings:

from itertools import groupby
import re

a = ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

a = [s for _, g in groupby(a, squeeze) for s in sorted(g, key=len)]

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'yummy', 'shine', 'shiiinee', 'hello', 'print', 'priintering', 'priinter', 'Howdy', 'yup', 'yuup', 'soup', 'soooouuuuuppppp', 'yeehaw']
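
Two things are worth noting about this output. First, "priintering" and "priinter" keep their original order because they squeeze to different strings ("printering" and "printer"), so they are not similar under this definition. Second, groupby only groups neighbours, so similar strings that are not adjacent in the input would not be brought together. If that case matters, one possible variation is sketched below; the helper name `group_similar` is hypothetical, and it assumes each group should appear where its first member appeared.

```python
import re

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

def group_similar(words):
    # Hypothetical helper: collect words by squeezed form. Dicts keep
    # insertion order, so groups stay in order of first appearance.
    groups = {}
    for w in words:
        groups.setdefault(squeeze(w), []).append(w)
    # Within each group, shortest string first (sorted is stable on ties).
    return [w for g in groups.values() for w in sorted(g, key=len)]

words = ["niiiice", "yummy", "nice", "shiiinee", "hello", "shine"]
print(group_similar(words))
# ['nice', 'niiiice', 'yummy', 'shine', 'shiiinee', 'hello']
```

Strings without a similar counterpart form a group of one and keep their relative position.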

CodePudding user response：

Another solution, using itertools.groupby:

import itertools
a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]
# Key: the characters of s with each run collapsed, then len(s) as a tie-breaker.
print(sorted(a, key=lambda s: ([k for k, _ in itertools.groupby(s)], len(s))))
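
The groupby key here plays the same role as a regex that squeezes out repeated characters: itertools.groupby yields one key per run of equal characters. A small sketch comparing the two, assuming a squeeze function built on the pattern `(.)\1+`; the name `run_keys` is made up for illustration.

```python
import itertools
import re

def squeeze(s):
    # Regex version: collapse each run of a repeated character.
    return re.sub(r'(.)\1+', r'\1', s)

def run_keys(s):
    # groupby version: one key per run of consecutive equal characters.
    return [k for k, _ in itertools.groupby(s)]

print(run_keys("niiiice"))  # ['n', 'i', 'c', 'e']

for w in ["niiiice", "shiiinee", "yummy", "soooouuuuuppppp"]:
    # On ordinary words the two approaches agree.
    assert "".join(run_keys(w)) == squeeze(w)
```

The two can differ on strings containing repeated newline characters, because `.` in a regex does not match a newline by default.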