Main list:
data = [
["629-2, text1, 12"],
["629-2, text2, 12"],
["407-3, text9, 6"],
["407-3, text4, 6"],
["000-5, text7, 0"],
["000-5, text6, 0"],
]
I want to get a list comprised of unique lists like so:
data_unique = [
["629-2, text1, 12"],
["407-3, text9, 6"],
["000-5, text6, 0"],
]
I've tried using numpy.unique, but I need to pare it down further: the result should contain one list for each unique numerical designator at the beginning of the string, i.e. 629-2...
I've also tried using chain from itertools like this:
from itertools import chain

def get_unique(data):
    return list(set(chain(*data)))
But that only got me as far as numpy.unique.
Thanks in advance.
CodePudding user response:
Code
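from itertools import groupby

def get_unique(data):
    def designated_version(item):
        # The designator is the text before the first comma, e.g. "629-2"
        return item[0].split(',')[0]

    # groupby only groups adjacent items, so sort by the same key first
    return [list(v)[0] for _, v in groupby(sorted(data, key=designated_version),
                                           designated_version)]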
Test
print(get_unique(data))
# Output
[['000-5, text7, 0'], ['407-3, text9, 6'], ['629-2, text1, 12']]
Explanation
- Sorts data by the numerical designator (in case it is not already sorted), so the result comes out in sorted designator order
- Uses groupby to group the items by their numerical designator, i.e. the key item[0].split(',')[0]
- The list comprehension keeps the first item in each group, i.e. list(v)[0]
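If the original input order matters, a dictionary keyed on the designator may be enough. This is only a minimal sketch, not part of the answer above: the name get_unique_first is made up for illustration, and it assumes the designator is always the text before the first comma.

```python
def get_unique_first(data):
    """Keep the first sublist seen for each designator, in input order."""
    seen = {}
    for item in data:
        # Assumption: the designator is everything before the first comma
        key = item[0].split(",")[0].strip()
        # setdefault stores the item only if the key is not present yet
        seen.setdefault(key, item)
    return list(seen.values())


data = [
    ["629-2, text1, 12"],
    ["629-2, text2, 12"],
    ["407-3, text9, 6"],
    ["407-3, text4, 6"],
    ["000-5, text7, 0"],
    ["000-5, text6, 0"],
]

print(get_unique_first(data))
# [['629-2, text1, 12'], ['407-3, text9, 6'], ['000-5, text7, 0']]
```

Since dicts preserve insertion order (Python 3.7+), no sorting is needed and the designators stay in the order they first appear.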
CodePudding user response:
Either your data_unique or the question is incomplete. Do you keep the first occurrence for each designator (the item before the delimiter)?
CodePudding user response:
I have used recursion to solve the problem!
def get_unique(lst):
    if not lst:
        return []
    if lst[0] in lst[1:]:
        # lst[0] appears again later, so skip this earlier copy
        return get_unique(lst[1:])
    else:
        return [lst[0]] + get_unique(lst[1:])
data = [
["629-2, text1, 12"],
["629-2, text2, 12"],
["407-3, text9, 6"],
["407-3, text4, 6"],
["000-5, text7, 0"],
["000-5, text6, 0"],
]
print(get_unique(data))
This keeps the last occurrence of each element in the list.
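Note that the membership test compares whole sublists, and no two sublists in the sample data are identical. To keep the last occurrence per designator instead, something along these lines might work. It is a rough sketch only: get_unique_last is a hypothetical name, and it assumes the designator is the text before the first comma.

```python
def get_unique_last(data):
    """Keep the last sublist seen for each designator."""
    latest = {}
    for item in data:
        # Assumption: the designator is everything before the first comma
        key = item[0].split(",")[0].strip()
        # Later items overwrite earlier ones that share the same designator
        latest[key] = item
    return list(latest.values())


data = [
    ["629-2, text1, 12"],
    ["629-2, text2, 12"],
    ["407-3, text9, 6"],
    ["407-3, text4, 6"],
    ["000-5, text7, 0"],
    ["000-5, text6, 0"],
]

print(get_unique_last(data))
# [['629-2, text2, 12'], ['407-3, text4, 6'], ['000-5, text6, 0']]
```

Overwriting a key does not move it, so the designators still appear in the order they were first seen.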
CodePudding user response:
# Convert the list of lists to a set
data_set = set(tuple(x) for x in data)
# Convert the set back to a list
data_unique = [list(x) for x in data_set]
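The tuple conversion is needed because lists are not hashable. A set removes only exact duplicates and does not guarantee the original order. If order matters, dict.fromkeys could be used in the same way. This is a small sketch with a made-up sample list and a hypothetical function name, dedupe_exact:

```python
def dedupe_exact(data):
    """Drop exact duplicate sublists while keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return [list(t) for t in dict.fromkeys(tuple(x) for x in data)]


# Hypothetical sample containing one exact duplicate
sample = [["a, 1"], ["b, 2"], ["a, 1"]]

print(dedupe_exact(sample))
# [['a, 1'], ['b, 2']]
```

On the data from the question this would still return all six sublists, since they differ in the text field.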