Filtering a list of strings by combining duplicate values, as well as appending a duplicate value

I currently have a list of strings that looks something like:

"a 05/13/22 apple"        
"a 05/13/22 apple"        
"b 05/13/22 apple"        
"b 05/13/22 apple"        
"c 05/13/22 apple"        
"c 05/13/22 apple"        
"a 05/27/22 strawberry"   
"a 05/27/22 strawberry"   
"b 05/27/22 strawberry"   
"b 05/27/22 strawberry"   
"c 05/27/22 strawberry"   
"c 05/27/22 strawberry"   
"a 07/29/22 banana"       
"a 07/29/22 banana"       
"b 07/29/22 banana"       
"b 07/29/22 banana"       
"c 07/29/22 banana"       
"c 07/29/22 banana"

I've split the strings into separate values, and am trying to achieve an output similar to this:

6 occurrences found for apple 
05/13/22 apple [a,b,c]

6 occurrences found for strawberry
05/27/22 strawberry [a,b,c]

6 occurrences found for banana
07/29/22 banana [a,b,c]

I've attempted to loop over the values, like so:

fruit_exists = []
pexists = []
pfruit, pdate, pletters = '', '', []
for instance in fruit_info:
    letter, date, fruit = instance.split()
    if fruit not in fruit_exists:
        fruit_exists.append(fruit)
    if pdate == '':
        pfruit = fruit
        pdate = date
    if pdate + pfruit not in pexists:
        if letter not in pletters:
            pletters.append(letter)
        if pdate != date:
            print(f'{pdate} - {pfruit} for {", ".join(pletters)}')
            pexists.append(pdate + pfruit)
            pfruit = fruit
            pdate = date
            pletters = []
print(f'{pdate} - {pfruit} for {", ".join(pletters)}')

I've also tried other variations of this for loop, but I think I may be approaching the problem incorrectly, since I don't get the correct values this way.

CodePudding user response：

There are about 1000 ways to do this, depending on your skills. Here is one that uses two dictionaries and some basic Python.

data = ['a 05/13/22 apple',        
        'a 05/13/22 apple',        
        'b 05/13/22 apple',        
        'b 05/13/22 apple',        
        'c 05/13/22 apple',        
        'c 05/13/22 apple',        
        'a 05/27/22 strawberry',   
        'a 05/27/22 strawberry',   
        'b 05/27/22 strawberry',   
        'b 05/27/22 strawberry',   
        'c 05/27/22 strawberry',   
        'c 05/27/22 strawberry',   
        'a 07/29/22 banana',       
        'a 07/29/22 banana',       
        'b 07/29/22 banana',       
        'b 07/29/22 banana',       
        'c 07/29/22 banana',       
        'c 07/29/22 banana'] 

count_dict = dict()  # to hold the count
ltr_dict =   dict()  # to hold the set of letters
for item in data:
    # split it up...
    ltr, dt, fruit = item.strip().split()
    # look up the (fruit, dt) tuple in the keys; if it's not there, get 0 back
    count = count_dict.get((fruit, dt), 0)
    # put it in the dictionary, incremented by 1
    count_dict[(fruit, dt)] = count + 1
    # get the set of letters seen so far or, if there is none, an empty set
    ltrs = ltr_dict.get((fruit, dt), set())
    # add the letter to the set, and put it back in...
    ltrs.add(ltr)
    ltr_dict[(fruit, dt)] = ltrs

# produces these results:
print(count_dict)
print(ltr_dict)

# we can then iterate through the keys of the dictionary to get a nicer format:

for f,d in count_dict.keys():
    print(f'have count of {count_dict.get((f,d))} for {f, d} in letters:')
    print(','.join(sorted(ltr_dict.get((f,d)))))

Output:

{('apple', '05/13/22'): 6, ('strawberry', '05/27/22'): 6, ('banana', '07/29/22'): 6}
{('apple', '05/13/22'): {'b', 'c', 'a'}, ('strawberry', '05/27/22'): {'b', 'c', 'a'}, ('banana', '07/29/22'): {'b', 'c', 'a'}}
have count of 6 for ('apple', '05/13/22') in letters:
a,b,c
have count of 6 for ('strawberry', '05/27/22') in letters:
a,b,c
have count of 6 for ('banana', '07/29/22') in letters:
a,b,c
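
For comparison, the same two-dictionary idea can be written with collections.defaultdict, which removes the explicit get-then-store steps. This is only a minimal sketch: the shortened sample list and the summarize helper are assumptions made for illustration, and the output layout follows the one requested in the question.

```python
from collections import defaultdict

# Hypothetical, shortened sample in the question's "letter date fruit" format.
data = [
    "a 05/13/22 apple", "a 05/13/22 apple",
    "b 05/13/22 apple", "c 05/13/22 apple",
    "a 05/27/22 strawberry", "b 05/27/22 strawberry",
]

counts = defaultdict(int)    # (date, fruit) -> number of lines seen
letters = defaultdict(set)   # (date, fruit) -> distinct letters seen

for item in data:
    ltr, dt, fruit = item.split()
    counts[(dt, fruit)] += 1
    letters[(dt, fruit)].add(ltr)


def summarize(counts, letters):
    """Build the report lines in the layout requested in the question."""
    lines = []
    for (dt, fruit), n in counts.items():
        joined = ",".join(sorted(letters[(dt, fruit)]))
        lines.append(f"{n} occurrences found for {fruit}")
        lines.append(f"{dt} {fruit} [{joined}]")
    return lines


report = summarize(counts, letters)
print("\n".join(report))
```

Keying both dictionaries on the (date, fruit) pair keeps each date separate, so a fruit that shows up on two dates would get two report entries.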

CodePudding user response：

This might help:

fruits = dict()
for instance in fruit_info:
    letter, date, fruit = instance.split()
    # start a new [count, date, letters] record for a new fruit or a new date
    if fruit not in fruits or fruits[fruit][1] != date:
        fruits[fruit] = [1, date, [letter]]
    else:
        fruits[fruit][0] += 1
        if letter not in fruits[fruit][2]:
            fruits[fruit][2].append(letter)

for key in fruits:
    my_list = fruits[key]
    print(f"{my_list[0]} occurrences found for {key}")
    print(f"{my_list[1]} {key} {my_list[2]}")

CodePudding user response：

In case you have new entries where the dates are not all the same, this implementation will take care of any number of fruits!

Step 1: We will use a fruit dictionary to store our values after parsing the list, where fruit_list is the list you gave above:

fruit_dict = {}
for item in fruit_list:
    letter, date, fruit = item.split()
    if fruit not in fruit_dict:
        fruit_dict[fruit] = {
            'letters': [letter],
            'dates': [date],
            'occurences': 1
        }
    else:
        if letter not in fruit_dict[fruit]['letters']:
            fruit_dict[fruit]['letters'].append(letter)
        if date not in fruit_dict[fruit]['dates']:
            fruit_dict[fruit]['dates'].append(date)
        fruit_dict[fruit]['occurences'] += 1

Step 2: Then print our values:

for fruit in fruit_dict.keys():
    letters = fruit_dict[fruit]['letters']
    dates = str(fruit_dict[fruit]['dates'])
    occurences = fruit_dict[fruit]['occurences']
    print(f"{occurences} occurrences found for {fruit}\n{dates} {fruit} {letters}\n")
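
To see how this per-fruit grouping behaves when one fruit appears on more than one date, here is a small sketch of the same approach wrapped in a function. The function name group_by_fruit and the sample rows are made up for illustration, and dict.setdefault is used in place of the if/else branch.

```python
def group_by_fruit(fruit_list):
    """Collect distinct letters, distinct dates and a line count per fruit."""
    fruit_dict = {}
    for item in fruit_list:
        letter, date, fruit = item.split()
        # Create the record on first sight of the fruit, then reuse it.
        entry = fruit_dict.setdefault(
            fruit, {"letters": [], "dates": [], "occurrences": 0}
        )
        if letter not in entry["letters"]:
            entry["letters"].append(letter)
        if date not in entry["dates"]:
            entry["dates"].append(date)
        entry["occurrences"] += 1
    return fruit_dict


# Hypothetical rows: "apple" appears on two different dates.
sample = [
    "a 05/13/22 apple",
    "b 05/13/22 apple",
    "a 06/01/22 apple",
    "c 05/27/22 strawberry",
]

result = group_by_fruit(sample)
print(result["apple"])
# {'letters': ['a', 'b'], 'dates': ['05/13/22', '06/01/22'], 'occurrences': 3}
```

Note that grouping by fruit alone merges the letters across dates, so the result no longer shows which letter was seen on which date. If that matters, the key would need to include the date as well.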

CodePudding user response：

Here's another approach, this time using collections.Counter. The code builds a dictionary with this skeleton: {fruit: {date: Counter()}}

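from collections import Counter

d = {}

# your_list is your list of strings
for i in your_list:
    char, date, fruit = i.strip().split()
    if key := d.get(fruit):
        # add to the Counter for this date, starting a new one if the date is new
        key[date] = key.get(date, Counter()) + Counter(char)
    else:
        d[fruit] = {date: Counter(char)}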

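for i in d:
    # only the first date recorded for each fruit is reported here
    c: Counter = list(d[i].values())[0]
    # Counter.total() requires Python 3.10 or newer
    print(i, list(d[i])[0], ", total", c.total(), 'occurrences:',
          *[f"{k}={v}," for k, v in c.most_common()])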

Output:

apple 05/13/22 , total 6 occurrences: a=2, b=2, c=2,
strawberry 05/27/22 , total 6 occurrences: a=2, b=2, c=2,
banana 07/29/22 , total 6 occurrences: a=2, b=2, c=2,
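
The printing loop above reads only the first date stored for each fruit. If a fruit can occur on several dates, one possible variation is to walk every date in the nested structure. The sketch below assumes the same {fruit: {date: Counter()}} layout, built here with nested defaultdicts; the count_letters helper and the sample rows are hypothetical, and sum(c.values()) stands in for Counter.total() so that it also runs on Python versions before 3.10.

```python
from collections import Counter, defaultdict


def count_letters(rows):
    """Return {fruit: {date: Counter of letters}} for 'letter date fruit' rows."""
    nested = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        letter, date, fruit = row.split()
        nested[fruit][date][letter] += 1
    return nested


# Hypothetical rows: "apple" appears on two different dates.
sample = [
    "a 05/13/22 apple",
    "a 05/13/22 apple",
    "b 05/13/22 apple",
    "c 06/01/22 apple",
    "a 05/27/22 strawberry",
]

nested = count_letters(sample)
for fruit, by_date in nested.items():
    for date, c in by_date.items():
        total = sum(c.values())
        parts = ", ".join(f"{k}={v}" for k, v in sorted(c.items()))
        print(f"{fruit} {date}, total {total} occurrences: {parts}")
```

Incrementing nested[fruit][date][letter] directly also counts multi-character labels as a single item, whereas Counter(char) would count each character of the label separately.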