I have a question and different ideas on how to approach it, but I'd like to know what more experienced programmers would do.
Background: I have a nested list whose sublists always have two items, like this: [[category, item];[category, item];...]. Categories and items are strings. Some items end with a number (integer or decimal), while others end with just characters or spaces. Some (but not all) items are duplicates.
Problem: I only care about the items; the categories do not need to be altered. Please do not suggest a dict or another structure: the result needs to be a nested list like the one above.
- For duplicates that do not end with a number, I need to delete all but one, so that just one of the 1-10 duplicates remains in the list.
- For duplicates (same item name) that end with a number, I need to sum all the numbers, leave a single entry with the total, and delete the original entries.
Example:
Input:
[["fruits", "Apple 1"];["fruits", "Apple 2"];["fruits", "Apple 5"];["cooled", "iced tea 1,5"]; ["cooled", "iced tea 2"]; ["fruits"; "onions"]; ["fruits"; "onions"];["fruits"; "onions"];["frozen"; "Pizza"]]
Output:
[["fruits", "Apple 8"];["cooled", "iced tea 3,5"];["fruits"; "onions"];["frozen"; "Pizza"]]
Any ideas?
CodePudding user response:
First of all, list items must be separated with a comma (,) and not with a semicolon (;).
You can read more useful stuff about nested lists here.
I would approach this task with something like this :
myNestedList = [ ["fruits", "Apple 1"],["fruits", "Apple 2"],["fruits", "Apple 5"],["cooled", "iced tea 1,5"], ["cooled", "iced tea 2"], ["fruits", "onions"], ["fruits", "onions"],["fruits", "onions"],["frozen", "Pizza"] ]
myNewNestedList = []
duplicates = []
for i in myNestedList:
    if i not in myNewNestedList:
        myNewNestedList.append(i)
    else:
        duplicates.append(i)
print("Items cleaned : ",myNewNestedList)
print("Duplicates that found and cleaned : ", duplicates)
Output :
Items cleaned : [['fruits', 'Apple 1'], ['fruits', 'Apple 2'], ['fruits', 'Apple 5'], ['cooled', 'iced tea 1,5'], ['cooled', 'iced tea 2'], ['fruits', 'onions'], ['frozen', 'Pizza']]
Duplicates that found and cleaned : [['fruits', 'onions'], ['fruits', 'onions']]
CodePudding user response:
What about this:
import re

entries = [["fruits", "Apple 1"],
           ["fruits", "Apple 2"],
           ["fruits", "Apple 5"],
           ["cooled", "iced tea 1,5"],
           ["cooled", "iced tea 2"],
           ["fruits", "onions"],
           ["fruits", "onions"],
           ["fruits", "onions"],
           ["frozen", "Pizza"]]
counts = {}      # item name -> summed value, or None if the item has no number
categories = {}  # item name -> category
for category, entry in entries:
    try:
        # Pull out the number and accept ',' as the decimal separator.
        value = float(
            ''.join(
                re.findall(r'[\d,.]*\d', entry)).replace(',', '.')
        )
    except ValueError:
        value = None  # no number in this entry
    item = re.findall(r'[a-zA-Z].*[a-zA-Z]', entry)[0]
    categories[item] = category
    if value is None:
        counts.setdefault(item, None)  # keep a single copy
    else:
        counts[item] = (counts.get(item) or 0) + value  # accumulate the sum
result = []
for item, count in counts.items():
    result.append([
        categories[item],
        item if count is None else f'{item} {count}'.replace('.', ',')
    ])
A bit more string/float/int formatting would be needed though to exactly replicate the desired output.
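As a rough idea of what that extra formatting could look like, here is a minimal sketch. The helper name format_amount is made up for illustration, and it assumes that whole numbers should print without decimals and everything else with a decimal comma, as in the desired output from the question.

```python
def format_amount(value):
    """Render a number with a decimal comma, dropping a trailing ',0'."""
    # Whole numbers such as 8.0 are shown as '8'.
    if value == int(value):
        return str(int(value))
    # Everything else keeps its decimals, with ',' instead of '.'.
    return str(value).replace('.', ',')
```

It could then be used when building the result, for example f'{item} {format_amount(count)}' instead of formatting the float directly.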
CodePudding user response:
I would start by transforming your list into better data. Instead of strings that combine two pieces of information, you should have tuples where the first item is the name (Apple, onions, etc.) and the second item is either a float or None. A regular expression does this work easily.
import re

def transform_to_tuple(data):
    # Split "iced tea 1,5" into ("iced tea", 1.5); items without a number get None.
    m = re.match(r"([A-Za-z ]+)( \d+(,\d+)?)?$", data)
    if m:
        return (m.group(1), float(m.group(2).replace(",", ".")) if m.group(2) else None)
    else:
        raise ValueError("Malformed string '{}'".format(data))

data = [["fruits", "Apple 1"],["fruits", "Apple 2"],["fruits", "Apple 5"],["cooled", "iced tea 1,5"], ["cooled", "iced tea 2"],
        ["fruits", "onions"], ["fruits", "onions"],["fruits", "onions"],["frozen", "Pizza"]]
# Each element becomes (category, (name, amount_or_None)).
betterdata = [(x[0], transform_to_tuple(x[1])) for x in data]
Once this is done, iterate over the list of tuples and working with the data will become a lot easier.
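One possible way to finish the job from there is sketched below. The function name merge_entries is hypothetical, and a few things are assumptions on my part: duplicates are grouped by (category, name), a dict is used only as an internal accumulator (the result is still a nested list), and totals are formatted with a decimal comma as in the question. The betterdata literal is written out by hand so the snippet runs on its own.

```python
def merge_entries(betterdata):
    """Collapse duplicates: sum numeric amounts, keep one copy of the rest."""
    totals = {}  # (category, name) -> running sum, or None if there is no number
    for category, (name, amount) in betterdata:
        key = (category, name)
        if amount is None:
            totals.setdefault(key, None)
        else:
            totals[key] = (totals.get(key) or 0.0) + amount

    result = []
    for (category, name), total in totals.items():
        if total is None:
            result.append([category, name])
        else:
            # Whole numbers print without decimals; others use a decimal comma.
            if total == int(total):
                text = str(int(total))
            else:
                text = str(total).replace(".", ",")
            result.append([category, f"{name} {text}"])
    return result


# Hand-written stand-in for the betterdata list built above.
betterdata = [
    ("fruits", ("Apple", 1.0)), ("fruits", ("Apple", 2.0)), ("fruits", ("Apple", 5.0)),
    ("cooled", ("iced tea", 1.5)), ("cooled", ("iced tea", 2.0)),
    ("fruits", ("onions", None)), ("fruits", ("onions", None)), ("fruits", ("onions", None)),
    ("frozen", ("Pizza", None)),
]
print(merge_entries(betterdata))
```

Summing floats can introduce small rounding errors for some inputs (0,1 + 0,2 for example), so if exact decimals matter, decimal.Decimal may be a better fit than float.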