Filtering a list of strings by combining duplicate values, as well as appending a duplicate value

Time:08-10

I currently have a list of strings that looks something like:

"a 05/13/22 apple"        
"a 05/13/22 apple"        
"b 05/13/22 apple"        
"b 05/13/22 apple"        
"c 05/13/22 apple"        
"c 05/13/22 apple"        
"a 05/27/22 strawberry"   
"a 05/27/22 strawberry"   
"b 05/27/22 strawberry"   
"b 05/27/22 strawberry"   
"c 05/27/22 strawberry"   
"c 05/27/22 strawberry"   
"a 07/29/22 banana"       
"a 07/29/22 banana"       
"b 07/29/22 banana"       
"b 07/29/22 banana"       
"c 07/29/22 banana"       
"c 07/29/22 banana"       

I've split the strings into separate values, and am trying to achieve an output similar to this:

6 occurrences found for apple 
05/13/22 apple [a,b,c]

6 occurrences found for strawberry
05/27/22 strawberry [a,b,c]

6 occurrences found for banana
07/29/22 banana [a,b,c]  

I've attempted to loop over the values like so:

fruit_exists = []
pexists = []
pfruit, pdate, pletters = '', '', []
for instance in fruit_info:
    letter, date, fruit = instance.split()
    if fruit not in fruit_exists:
        fruit_exists.append(fruit)
    if pdate == '':
        pfruit = fruit
        pdate = date
    if pdate + pfruit not in pexists:
        if letter not in pletters:
            pletters.append(letter)
        if pdate != date:
            print(f'{pdate} - {pfruit} for {", ".join(pletters)}')
            pexists.append(pdate + pfruit)
            pfruit = fruit
            pdate = date
            pletters = []
print(f'{pdate} - {pfruit} for {", ".join(pletters)}')

I have also tried other variations of this for loop, but I think I may be approaching the problem incorrectly, since I do not get the correct values this way.

CodePudding user response:

There are about 1000 ways to do this, depending on your skills. Here is one that uses two dictionaries and some basic Python.

data = ['a 05/13/22 apple',        
        'a 05/13/22 apple',        
        'b 05/13/22 apple',        
        'b 05/13/22 apple',        
        'c 05/13/22 apple',        
        'c 05/13/22 apple',        
        'a 05/27/22 strawberry',   
        'a 05/27/22 strawberry',   
        'b 05/27/22 strawberry',   
        'b 05/27/22 strawberry',   
        'c 05/27/22 strawberry',   
        'c 05/27/22 strawberry',   
        'a 07/29/22 banana',       
        'a 07/29/22 banana',       
        'b 07/29/22 banana',       
        'b 07/29/22 banana',       
        'c 07/29/22 banana',       
        'c 07/29/22 banana'] 

count_dict = dict()  # to hold the count
ltr_dict =   dict()  # to hold the set of letters
for item in data:
    # split it up...
    ltr, dt, fruit = item.strip().split()
    # get the count for the (fruit, dt) tuple; default to 0 if not seen yet
    count = count_dict.get((fruit, dt), 0)
    # put it in the dictionary, incremented by 1
    count_dict[(fruit, dt)] = count + 1
    # get the set of letters seen so far, or an empty set if none
    ltrs = ltr_dict.get((fruit, dt), set())
    # add the letter to the set, and put it back in...
    ltrs.add(ltr)
    ltr_dict[(fruit, dt)] = ltrs

# produces these results:
print(count_dict)
print(ltr_dict)

# we can then iterate through the keys of the dictionary to get a nicer format:

for f,d in count_dict.keys():
    print(f'have count of {count_dict.get((f,d))} for {f, d} in letters:')
    print(','.join(sorted(ltr_dict.get((f,d)))))

Output:

{('apple', '05/13/22'): 6, ('strawberry', '05/27/22'): 6, ('banana', '07/29/22'): 6}
{('apple', '05/13/22'): {'b', 'c', 'a'}, ('strawberry', '05/27/22'): {'b', 'c', 'a'}, ('banana', '07/29/22'): {'b', 'c', 'a'}}
have count of 6 for ('apple', '05/13/22') in letters:
a,b,c
have count of 6 for ('strawberry', '05/27/22') in letters:
a,b,c
have count of 6 for ('banana', '07/29/22') in letters:
a,b,c
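
The two dictionaries above can also be folded into one. The following is a minimal sketch, not part of the original answer, assuming every string has exactly three whitespace-separated fields. The helper name summarize and the shortened sample list are made up for illustration. It uses collections.defaultdict and prints in the format requested in the question.

```python
from collections import defaultdict


def summarize(data):
    """Group 'letter date fruit' strings by (date, fruit).

    Returns a list of (count, date, fruit, sorted_letters) tuples in
    first-seen order. Hypothetical helper, for illustration only.
    """
    # each key maps to [occurrence count, set of letters seen]
    groups = defaultdict(lambda: [0, set()])
    for item in data:
        letter, date, fruit = item.split()
        entry = groups[(date, fruit)]
        entry[0] += 1
        entry[1].add(letter)
    return [(count, date, fruit, sorted(letters))
            for (date, fruit), (count, letters) in groups.items()]


# shortened sample, not the full list from the question
sample = ['a 05/13/22 apple', 'a 05/13/22 apple', 'b 05/13/22 apple',
          'c 05/27/22 strawberry']
for count, date, fruit, letters in summarize(sample):
    print(f'{count} occurrences found for {fruit}')
    print(f'{date} {fruit} [{",".join(letters)}]')
    print()
```

Keying on the (date, fruit) tuple keeps the same fruit on two different dates as two separate groups, which matches the behavior of the two-dictionary version.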

CodePudding user response:

This might help:

fruits = dict()
for instance in fruit_info:
    letter, date, fruit = instance.split(' ')
    # start a new record for a new fruit, or when the date changes
    if fruit not in fruits or fruits[fruit][1] != date:
        fruits[fruit] = [1, date, [letter]]
    else:
        fruits[fruit][0] += 1
        if letter not in fruits[fruit][2]:
            fruits[fruit][2].append(letter)

for key in fruits:
    count, date, letters = fruits[key]
    print(f"{count} occurrences found for {key}")
    print(f"{date} {key} [{','.join(letters)}]")

CodePudding user response:

In case new entries have different dates for the same fruit, this implementation handles any number of fruits and dates.

Step 1: Use a dictionary to store the values after parsing the list, where fruit_list is the list you gave above:

fruit_dict = {}
for item in fruit_list:
    letter, date, fruit = item.split()
    if fruit not in fruit_dict:
        fruit_dict[fruit] = {
            'letters': [letter],
            'dates': [date],
            'occurences': 1
        }
    else:
        if letter not in fruit_dict[fruit]['letters']:
            fruit_dict[fruit]['letters'].append(letter)
        if date not in fruit_dict[fruit]['dates']:
            fruit_dict[fruit]['dates'].append(date)
        fruit_dict[fruit]['occurences'] += 1

Step 2: Then print the values:

for fruit in fruit_dict.keys():
    letters = fruit_dict[fruit]['letters']
    dates = str(fruit_dict[fruit]['dates'])
    occurrences = fruit_dict[fruit]['occurences']  # key spelling matches Step 1
    print(f"{occurrences} occurrences found for {fruit}\n{dates} {fruit} {letters}\n")
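
The approach above merges all dates of a fruit into one list. If one report per fruit and date is wanted instead, a possible alternative is to sort the rows and group them with itertools.groupby. This is only a sketch under the same three-field assumption. The generator name by_fruit_and_date and the sample data are hypothetical. Dates are compared as plain strings here, so the order is not chronological across years.

```python
from itertools import groupby


def by_fruit_and_date(fruit_list):
    """Yield (fruit, date, count, letters) for each distinct fruit/date pair."""

    def key(row):
        # row is [letter, date, fruit]; group on (fruit, date)
        return (row[2], row[1])

    # groupby only merges adjacent items, so sort by the same key first
    rows = sorted((item.split() for item in fruit_list), key=key)
    for (fruit, date), group in groupby(rows, key=key):
        letters = [row[0] for row in group]
        yield fruit, date, len(letters), sorted(set(letters))


# made-up sample in which apple appears on two different dates
sample = ['a 05/13/22 apple', 'b 05/13/22 apple', 'a 06/01/22 apple',
          'c 05/27/22 strawberry']
for fruit, date, count, letters in by_fruit_and_date(sample):
    print(f"{count} occurrences found for {fruit}\n{date} {fruit} {letters}\n")
```

Sorting costs O(n log n), so for large inputs the single-pass dictionary version is likely the better choice; groupby mainly helps when the data is already sorted.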

CodePudding user response:

Here's another approach, this time using collections.Counter. The code builds a dictionary with this skeleton: {fruit: {date: Counter()}}

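from collections import Counter

d = {}

# your_list is your list of strings
for i in your_list:
    char, date, fruit = i.strip().split()
    if key := d.get(fruit):
        # fruit already seen: add this letter to the Counter for its date
        key[date] = key.get(date, Counter()) + Counter([char])
    else:
        d[fruit] = {date: Counter([char])}

# Counter.total() requires Python 3.10 or later
for i in d:
    # only the first date recorded for each fruit is reported
    c: Counter = list(d[i].values())[0]
    print(i, list(d[i])[0], ", total", c.total(), 'occurrences:',
          *[f"{k}={v}," for k, v in c.most_common()])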

Output:

apple 05/13/22 , total 6 occurrences: a=2, b=2, c=2,
strawberry 05/27/22 , total 6 occurrences: a=2, b=2, c=2,
banana 07/29/22 , total 6 occurrences: a=2, b=2, c=2,