Getting distinct values from a list of lists, each containing a comma-delimited string

Time:12-03

Main list:

data = [
["629-2, text1, 12"],
["629-2, text2, 12"],
["407-3, text9, 6"],
["407-3, text4, 6"],
["000-5, text7, 0"],
["000-5, text6, 0"],
]

I want to get a list comprised of unique lists like so:

data_unique = [
["629-2, text1, 12"],
["407-3, text9, 6"],
["000-5, text6, 0"],
]

I've tried using numpy.unique, but I need to pare the result down further: the final list should contain exactly one list per numerical designator at the beginning of the string, i.e. 629-2...

I've also tried using chain from itertools like this:

from itertools import chain

def get_unique(data):
    return list(set(chain(*data)))

But that only got me as far as numpy.unique.

Thanks in advance.

CodePudding user response:

Code

from itertools import groupby

def get_unique(data):
    def designated_version(item):
        return item[0].split(',')[0]

    return [list(v)[0] for _, v in groupby(sorted(data, key=designated_version),
                                           designated_version)]

Test

print(get_unique(data))
# Output (in sorted designator order, because the data is sorted before grouping)
[['000-5, text7, 0'], ['407-3, text9, 6'], ['629-2, text1, 12']]

Explanation

  • Sorts data by the numerical designator, since groupby only groups consecutive items with the same key
  • Uses groupby to group the items by that designator, i.e. the value returned by designated_version(item), which is item[0].split(',')[0]
  • The list comprehension keeps the first item in each group, i.e. list(v)[0]
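
If the original order of the designators matters, a dict keyed on the designator is one possible alternative to sorting and grouping. This is only a minimal sketch, not the method used above; the name get_unique_ordered is hypothetical, and it assumes Python 3.7+, where dicts keep insertion order.

```python
def get_unique_ordered(data):
    """Keep the first row seen for each designator, in input order."""
    first_seen = {}
    for item in data:
        # The designator is the text before the first comma, e.g. "629-2"
        designator = item[0].split(',')[0]
        # setdefault stores the row only if this designator is new
        first_seen.setdefault(designator, item)
    return list(first_seen.values())


data = [
    ["629-2, text1, 12"],
    ["629-2, text2, 12"],
    ["407-3, text9, 6"],
    ["407-3, text4, 6"],
    ["000-5, text7, 0"],
    ["000-5, text6, 0"],
]
print(get_unique_ordered(data))
# prints [['629-2, text1, 12'], ['407-3, text9, 6'], ['000-5, text7, 0']]
```

This makes a single pass over the data and needs no sort, so rows come back in the order their designators first appear.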

CodePudding user response:

Either your data_unique is incomplete or the question is. Do you want the first occurrence for each value of the item before the first delimiter?

CodePudding user response:

I have used recursion to solve the problem!

def get_unique(lst):
    if not lst:
        return []
    # Drop lst[0] if an identical element appears later in the list
    if lst[0] in lst[1:]:
        return get_unique(lst[1:])
    else:
        return [lst[0]] + get_unique(lst[1:])

data = [
["629-2, text1, 12"],
["629-2, text2, 12"],
["407-3, text9, 6"],
["407-3, text4, 6"],
["000-5, text7, 0"],
["000-5, text6, 0"],
]
print(get_unique(data))

This keeps the last occurrence of each element in the list.
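
Note that lst[0] in lst[1:] compares whole entries, so rows that share a designator but differ in their text all count as distinct. A possible variation of the same recursion that compares only the designator could look like the sketch below. The helper names designator and get_unique_by_designator are hypothetical, and it assumes the designator is everything before the first comma.

```python
def designator(item):
    # Assumption: the designator is the text before the first comma
    return item[0].split(',')[0]


def get_unique_by_designator(lst):
    """Recursively keep the last row for each designator."""
    if not lst:
        return []
    head, rest = lst[0], lst[1:]
    # Skip the head if a later row carries the same designator
    if any(designator(other) == designator(head) for other in rest):
        return get_unique_by_designator(rest)
    return [head] + get_unique_by_designator(rest)


data = [
    ["629-2, text1, 12"],
    ["629-2, text2, 12"],
    ["407-3, text9, 6"],
    ["407-3, text4, 6"],
    ["000-5, text7, 0"],
    ["000-5, text6, 0"],
]
print(get_unique_by_designator(data))
# prints [['629-2, text2, 12'], ['407-3, text4, 6'], ['000-5, text6, 0']]
```

Like the original, this recurses once per element and slices the list each time, so it is best suited to short lists.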

CodePudding user response:

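# Convert each inner list to a tuple (lists are not hashable) and collect them in a set.
# This removes only rows that are identical in full, and a set does not preserve order.
data_set = set(tuple(x) for x in data)

# Convert the set back to a list of lists
data_unique = [list(x) for x in data_set]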