Best way to edit nested list for specific problem?

I have a question and different ideas on how to approach it, but I'd like to know what more experienced programmers would do.

Background: I have a nested list whose sublists always contain two items, like this: [[category, item];[category, item];...]. Categories and items are strings. Some items end with a number (integer or decimal), while others end with just letters or spaces. Some (but not all) items are duplicates.

Problem: Only the items matter; categories do not need to be altered. Please do not suggest using a dict or some other structure; the result needs to be a nested list like the one above.

I need to delete every duplicate that does not end with a number, so that just one of the 1-10 duplicates remains in the list.
If there are duplicates (same item name) that end with numbers, I need to sum all the numbers, leave a single entry with the total, and delete the original ones.

Example:

Input:

[["fruits", "Apple 1"];["fruits", "Apple 2"];["fruits", "Apple 5"];["cooled", "iced tea 1,5"]; ["cooled", "iced tea 2"]; ["fruits"; "onions"]; ["fruits"; "onions"];["fruits"; "onions"];["frozen"; "Pizza"]

Output:

[["fruits", "Apple 8"];["cooled", "iced tea 3,5"];["fruits"; "onions"];["frozen"; "Pizza"]

Any ideas?

CodePudding user response：

First of all, list items must be separated with commas (,), not semicolons (;).

You can read more useful stuff about nested lists here.

I would approach this task with something like the following, which removes exact duplicates only (it does not sum the numbered items):

myNestedList = [ ["fruits", "Apple 1"],["fruits", "Apple 2"],["fruits", "Apple 5"],["cooled", "iced tea 1,5"], ["cooled", "iced tea 2"], ["fruits", "onions"], ["fruits", "onions"],["fruits", "onions"],["frozen", "Pizza"] ]
myNewNestedList = []
duplicates = []
for i in myNestedList:
    if i not in myNewNestedList:
        myNewNestedList.append(i)
    else:
        duplicates.append(i)

print("Items cleaned : ",myNewNestedList)
print("Duplicates that found and cleaned : ", duplicates)

Output :

Items cleaned :  [['fruits', 'Apple 1'], ['fruits', 'Apple 2'], ['fruits', 'Apple 5'], ['cooled', 'iced tea 1,5'], ['cooled', 'iced tea 2'], ['fruits', 'onions'], ['frozen', 'Pizza']]
Duplicates that found and cleaned :  [['fruits', 'onions'], ['fruits', 'onions']]

CodePudding user response：

What about this:

import re

entries = [["fruits", "Apple 1"],
           ["fruits", "Apple 2"],
           ["fruits", "Apple 5"],
           ["cooled", "iced tea 1,5"],
           ["cooled", "iced tea 2"],
           ["fruits", "onions"],
           ["fruits", "onions"],
           ["fruits", "onions"],
           ["frozen", "Pizza"]]

counts = {}
categories = {}
for category, entry in entries:
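    # Extract the digits (with a decimal comma or point) and convert to float.
    # Entries without a number raise ValueError and are counted as 1.
    try:
        value = float(
            ''.join(
                re.findall(r'[\d,.]*\d', entry)).replace(',', '.')
        )
    except ValueError:
        value = 1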

    item = re.findall(r'[a-zA-Z].*[a-zA-Z]', entry)[0]
    categories[item] = category

    # Add to the running total for this item, or start one on first sight.
    try:
        counts[item] += value
    except KeyError:
        counts[item] = value

result = []
for item, count in counts.items():
    result.append([
        categories[item],
        item if count == 1 else f'{item} {count}'.replace('.', ',')
    ])

A bit more string/float/int formatting would be needed though to exactly replicate the desired output.
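
As one possible way to handle that formatting, here is a minimal sketch. The helper name format_amount is hypothetical (it is not part of the answer above), and it assumes that whole-number totals should be printed without a decimal part and that a decimal comma is wanted, as in the question's example.

```python
def format_amount(value: float) -> str:
    """Render 8.0 as '8' and 3.5 as '3,5' (decimal comma, as in the question)."""
    # Rounding hides tiny floating-point errors such as 0.1 + 0.2.
    value = round(value, 6)
    if value.is_integer():
        return str(int(value))
    return str(value).replace(".", ",")


print(format_amount(8.0))   # whole numbers lose the trailing ".0"
print(format_amount(3.5))   # decimals use a comma
```

With this in place, the f-string in the loop above could call format_amount(count) instead of formatting count directly.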

CodePudding user response：

I would start by transforming your list into better data. Instead of strings that each hold two pieces of information, you should have tuples where the first element is the name (Apple, onions, etc.) and the second element is either a float or None. A regular expression does this work easily.

import re

def transform_to_tuple(data):
    m = re.match(r"([A-Za-z ]+)( \d+(,\d+)?)?$", data)
    if m:
        return (m.group(1), float(m.group(2).replace(",",".")) if m.group(2) else None)
    else:
        raise ValueError("Malformed string '{}'".format(data))

data = [["fruits", "Apple 1"],["fruits", "Apple 2"],["fruits", "Apple 5"],["cooled", "iced tea 1,5"], ["cooled", "iced tea 2"],
        ["fruits", "onions"], ["fruits", "onions"],["fruits", "onions"],["frozen", "Pizza"]]

betterdata = [(x[0],transform_to_tuple(x[1])) for x in data]

Once this is done, you can iterate over the list of tuples, and working with the data becomes a lot easier.
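
That iteration step could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name merge_entries is made up, entries are treated as duplicates when both category and name match, and a dict is used purely as a temporary helper (the result is still a nested list, as the question requires). The sample data is written out by hand in the shape the list comprehension above would produce.

```python
def merge_entries(betterdata):
    """Collapse duplicates and sum amounts; returns a nested list of strings."""
    totals = {}  # (category, name) -> summed amount, or None if never numbered
    for category, (name, amount) in betterdata:
        key = (category, name)
        if key not in totals:
            totals[key] = amount          # first sighting keeps its position
        elif amount is not None:
            totals[key] = (totals[key] or 0) + amount

    result = []
    for (category, name), amount in totals.items():
        if amount is None:
            result.append([category, name])
        else:
            # ":g" drops a trailing ".0" but keeps at most six significant
            # digits, which is assumed to be enough for these quantities.
            text = f"{amount:g}".replace(".", ",")
            result.append([category, f"{name} {text}"])
    return result


betterdata = [
    ("fruits", ("Apple", 1.0)), ("fruits", ("Apple", 2.0)),
    ("fruits", ("Apple", 5.0)), ("cooled", ("iced tea", 1.5)),
    ("cooled", ("iced tea", 2.0)), ("fruits", ("onions", None)),
    ("fruits", ("onions", None)), ("fruits", ("onions", None)),
    ("frozen", ("Pizza", None)),
]
print(merge_entries(betterdata))
```

Because dicts preserve insertion order (Python 3.7+), each merged entry stays where its first occurrence was in the input.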