How to check if strings are same but one has repeated chars

Time:05-16

Suppose I have a list of strings (example: ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]). How do I check whether two strings are the same apart from repeated characters, and then work out which of the two should go first (shortest first)? Example output: ["nice", "niiiice", "niiiiiiiceee", "yummy", "shine", "shiiinee", "hello", "print", "priinter", "priintering", "Howdy", "yup", "yuup", "soup", "soooouuuuuppppp", "yeehaw"]

NOTE:

If possible, the check should leave everything else in the same order. That is, strings that have no similar counterpart should stay in roughly the same position.

CodePudding user response:

You could squeeze out repeated chars, so that "similar" strings become equal.

import re

a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

a.sort(key=lambda s: (squeeze(s), len(s)))

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'shine', 'shiiinee']
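
To see what the sort key looks like, here is a small sketch that prints the squeezed form of a few words (the word list is only an illustration, and "helo" is a made-up extra). It also shows a caveat: legitimate double letters are collapsed too, so "hello" and "helo" would count as "similar". Sorting the whole list this way also reorders it alphabetically by squeezed form, which is why the grouped variant may fit the question better.

```python
import re

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

# Illustrative words; "helo" is added only to show the double-letter caveat.
for word in ["niiiice", "nice", "yummy", "hello", "helo"]:
    print(word, "->", squeeze(word))
```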

Alternatively, if you only want to sort consecutive groups of "similar" strings:

from itertools import groupby
import re

a = ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

a = [s for _, g in groupby(a, squeeze) for s in sorted(g, key=len)]

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'yummy', 'shine', 'shiiinee', 'hello', 'print', 'priintering', 'priinter', 'Howdy', 'yup', 'yuup', 'soup', 'soooouuuuuppppp', 'yeehaw']
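
Two things are worth noting here. "print", "priintering" and "priinter" squeeze to three different strings, so they are not treated as similar and keep their original order. Also, groupby only merges neighbouring items, so similar strings that are not adjacent would not be brought together. If that matters, one possible variation (a sketch, not part of the answer above; sort_similar and first_seen are hypothetical names) is to sort by the position where each squeezed form first appears:

```python
import re

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

def sort_similar(strings):
    # Record the index at which each squeezed form first appears.
    first_seen = {}
    for i, s in enumerate(strings):
        first_seen.setdefault(squeeze(s), i)
    # Order by the group's first appearance, then by length within the group.
    return sorted(strings, key=lambda s: (first_seen[squeeze(s)], len(s)))

words = ["yummy", "niiiice", "hello", "nice", "yuup", "yup"]
print(sort_similar(words))
# ['yummy', 'nice', 'niiiice', 'hello', 'yup', 'yuup']
```

Each group lands where its first member was, so unrelated strings stay roughly in place.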

CodePudding user response:

Another solution, using itertools.groupby:

import itertools

a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]

# Key: the string's characters with consecutive repeats dropped, then its length.
print(sorted(a, key=lambda s: ([k for k, v in itertools.groupby(s)], len(s))))
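
The groupby key plays the same role as the regex squeeze from the first answer: groupby yields one key per run of equal characters. A short sketch comparing the two (squeeze_regex and squeeze_groupby are names chosen here for illustration):

```python
from itertools import groupby
import re

def squeeze_regex(s):
    # Regex version: collapse each run of a repeated character.
    return re.sub(r'(.)\1+', r'\1', s)

def squeeze_groupby(s):
    # groupby version: keep one key per run of equal characters.
    return ''.join(k for k, _ in groupby(s))

for word in ["niiiice", "shiiinee", "yummy"]:
    print(word, squeeze_groupby(word), squeeze_regex(word) == squeeze_groupby(word))
```

The two agree on ordinary words. They can differ on strings with repeated newlines, because "." does not match a newline unless re.DOTALL is set.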