Home > Net >  Sorting a list in Python while respecting the relative order position of duplicate sort keys?
Sorting a list in Python while respecting the relative order position of duplicate sort keys?

Time:09-22

I'm trying to sort a list to while respecting the original sorting order and keeping its "logical integrity" – e.g. if an element is on the most right it will move to the most left when sorting. The background here is to make LDAP's sorting compatible with x509 certificates.

An element's distinguishedName in LDAP looks as follows: CN=John Doe,OU=Marketing,OU=Users,DC=contoso,DC=loc, but certificates require their names to be saved in the following format DC=loc,DC=contoso,OU=Users,OU=Marketing,CN=John Doe.

Because I want to offer some error redundancy when users add normal x509 certificates I came up with a simple sort to normalize user inputs e.g. OU=Tech,ST=Berlin,C=DE,CN=test.com becomes C=DE,ST=Berlin,OU=Tech,CN=test.com; however I'm having some issues getting the desired results when parsing LDAP distinguishedNames – it's not keeping the original ordering logic of the DCs and OUs.

My code looks like this:

SUBJECT_FIELDS = [
    "DC",
    "C",
    "ST",
    "L",
    "O",
    "OU",
    "CN",
    "emailAddress",
]

def sort_name(subject):
    return sorted(subject, key=lambda e: SUBJECT_FIELDS.index(e[0]))
>>> subject = [('CN', 'John Doe'), ('OU', 'Users'), ('DC', 'contoso'), ('DC', 'loc')] 
>>> sort_name(subject)
[('DC', 'contoso'), ('DC', 'loc'), ('OU', 'Users'), ('CN', 'John Doe')]
# should be [('DC', 'loc'), ('DC', 'contoso'), ('OU', 'Users'), ('CN', 'John Doe')]

>>> subject = [('CN', 'John Doe'), ('OU', 'Marketing'), ('OU', 'Users'), ('OU', 'OtherUnit'), ('DC', 'contoso'), ('DC', 'loc')]
>>> sort_name(subject)
[('DC', 'contoso'), ('DC', 'loc'), ('OU', 'Marketing'), ('OU', 'Users'), ('OU', 'OtherUnit'), ('CN', 'John Doe')]
# should be [('DC', 'loc'), ('DC', 'contoso'), ('OU', 'OtherUnit'), ('OU', 'Users'), ('OU', 'Marketing'), ('CN', 'John Doe')]

>>> subject = [('DC', 'loc'), ('DC', 'contoso'), ('DC', 'ad'), ('OU', 'Users'), ('CN', 'John Doe')]
>>> sort_name(subject)
[('DC', 'loc'), ('DC', 'contoso'), ('DC', 'ad'), ('OU', 'Users'), ('CN', 'John Doe')]
# works as intended

>>> subject = [('OU', 'Users'), ('CN', 'John Doe'), ('DC', 'contoso'), ('DC', 'loc')] 
>>> sort_name(subject)
[('DC', 'contoso'), ('DC', 'loc'), ('OU', 'Users'), ('CN', 'John Doe')]
# probably never going to happen that somebody screws an OU up like this, but should be:
# [('DC', 'loc'), ('DC', 'contoso'), ('OU', 'Users'), ('CN', 'John Doe')]

So the issue is that sorted does not "reverse" multiple occurrences of the same sorting-key, thus screwing up the original logical integrity of the name.

I have already tried extending the sorted function with a weighted ordering based on how far on start or end of the list the subject field is.

def sort_name(subject):
    half_index = len(subject) // 2
    relative_index = lambda x: len(subject) - subject.index(x) \
        if subject.index(x) 1 > half_index \
        else subject.index(x) - len(subject)
    sorted_fields = sorted(subject, key=lambda e: (SUBJECT_FIELDS.index(e[0]), relative_index(e)))
    return sorted(subject, key=lambda e: (SUBJECT_FIELDS.index(e[0]), relative_index(e)))

But unfortunately that seems to involve a lot more unexpected results.

CodePudding user response:

Try:

def sort_func(s):
    return [
        t
        for _, t in sorted(
            enumerate(s),
            key=lambda k: (SUBJECT_FIELDS.index(k[1][0]), -k[0]),
        )
    ]


subject = [
    ("CN", "John Doe"),
    ("OU", "Users"),
    ("DC", "contoso"),
    ("DC", "loc"),
]

print(sort_func(subject))

Prints:

[('DC', 'loc'), ('DC', 'contoso'), ('OU', 'Users'), ('CN', 'John Doe')]

Other subjects:

subject = [
    ("CN", "John Doe"),
    ("OU", "Marketing"),
    ("OU", "Users"),
    ("OU", "OtherUnit"),
    ("DC", "contoso"),
    ("DC", "loc"),
]
print(sort_func(subject))

subject = [
    ("OU", "Users"),
    ("CN", "John Doe"),
    ("DC", "contoso"),
    ("DC", "loc"),
]
print(sort_func(subject))

subject = [
    ("DC", "ad"),
    ("DC", "contoso"),
    ("DC", "loc"),
    ("OU", "Users"),
    ("CN", "John Doe"),
]
print(sort_func(subject))

Prints:

[('DC', 'loc'), ('DC', 'contoso'), ('OU', 'OtherUnit'), ('OU', 'Users'), ('OU', 'Marketing'), ('CN', 'John Doe')]
[('DC', 'loc'), ('DC', 'contoso'), ('OU', 'Users'), ('CN', 'John Doe')]
[('DC', 'loc'), ('DC', 'contoso'), ('DC', 'ad'), ('OU', 'Users'), ('CN', 'John Doe')]

EDIT 2:

SUBJECT_FIELDS = [
    "DC",
    "C",
    "ST",
    "L",
    "O",
    "OU",
    "CN",
    "emailAddress",
]


def sort_func(s):
    if SUBJECT_FIELDS.index(s[0][0]) > SUBJECT_FIELDS.index(s[-1][0]):
        return sorted(s[::-1], key=lambda k: SUBJECT_FIELDS.index(k[0]))
    return sorted(s, key=lambda k: SUBJECT_FIELDS.index(k[0]))


subject = [
    ("CN", "John Doe"),
    ("OU", "Users"),
    ("DC", "contoso"),
    ("DC", "loc"),
]
print(sort_func(subject))

subject = [
    ("CN", "John Doe"),
    ("OU", "Marketing"),
    ("OU", "Users"),
    ("OU", "OtherUnit"),
    ("DC", "contoso"),
    ("DC", "loc"),
]
print(sort_func(subject))

subject = [
    ("DC", "loc"),
    ("DC", "contoso"),
    ("DC", "ad"),
    ("OU", "Users"),
    ("CN", "John Doe"),
]
print(sort_func(subject))

subject = [
    ("OU", "Users"),
    ("CN", "John Doe"),
    ("DC", "contoso"),
    ("DC", "loc"),
]
print(sort_func(subject))

subject = [
    ("DC", "loc"),
    ("DC", "contoso"),
    ("CN", "John Doe"),
    ("OU", "Users"),
]
print(sort_func(subject))

Prints:

[('DC', 'loc'), ('DC', 'contoso'), ('OU', 'Users'), ('CN', 'John Doe')]
[('DC', 'loc'), ('DC', 'contoso'), ('OU', 'OtherUnit'), ('OU', 'Users'), ('OU', 'Marketing'), ('CN', 'John Doe')]
[('DC', 'loc'), ('DC', 'contoso'), ('DC', 'ad'), ('OU', 'Users'), ('CN', 'John Doe')]
[('DC', 'loc'), ('DC', 'contoso'), ('OU', 'Users'), ('CN', 'John Doe')]
[('DC', 'loc'), ('DC', 'contoso'), ('OU', 'Users'), ('CN', 'John Doe')]
  • Related