Need to remove (and partially merge) nearly duplicate items from list of dictionaries

Time:12-06

I have a list of dictionaries in this form (example): [{name: aa, year: 2022}, {name: aa, year: 2021}, {name: bb, year: 2016}, {name: cc, year: 2015}]. I need to remove the items that share a name, but collect their years into a single list (every year can be in a list, even when there is only one; for my purposes this doesn't matter). The example list of dictionaries would then look like this: [{name: aa, year: [2022, 2021]}, {name: bb, year: [2016]}, {name: cc, year: [2015]}]. My current code looks like this:

def read_csv_file(self, path):
    book_list = []
    with open(path) as f:
        read_dict = csv.DictReader(f)
        for i in read_dict:
            book_list.append(i)
           

    bestsellers = []
    for i in book_list:
        seen_books = []
        years_list = []
        if i["Name"] not in seen_books:
            years_list.append(i["Year"])
            seen_books.append(i)
        else:
            years_list.append(i["Year"])

        if i['Genre'] == 'Non Fiction':
            bestsellers.append(NonFictionBook(i["Name"], i["Author"], float(i["User Rating"]), int(i["Reviews"]), float(i["Price"]), years_list, i["Genre"]))
        else:
            bestsellers.append(FictionBook(i["Name"], i["Author"], float(i["User Rating"]), int(i["Reviews"]), float(i["Price"]), years_list, i["Genre"]))
    for i in bestsellers:
        print(i.title)

Ultimately, my code needs to extract data from a CSV file and then create instances of the class FictionBook or NonFictionBook, depending on the genre. I think I have the CSV reading and the book creation finished; I just need to filter out the near-duplicate dictionaries and merge their years into lists, if that makes sense. If anything is unclear, please let me know so I can explain further.

CodePudding user response:

Use dict.setdefault() to create a list if the key has not yet been seen:

lod=[{'name': 'aa', 'year': 2022}, {'name': 'aa', 'year': 2021}, {'name': 'bb', 'year': 2016}, {'name': 'cc', 'year': 2015}]

result={}
for d in lod:
    result.setdefault(d['name'], []).append(d['year'])

>>> result
{'aa': [2022, 2021], 'bb': [2016], 'cc': [2015]}

Then put the list back together:

>>> [{'name': n, 'year': v} for n,v in result.items()]
[{'name': 'aa', 'year': [2022, 2021]}, {'name': 'bb', 'year': [2016]}, {'name': 'cc', 'year': [2015]}]
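
The same idea should carry over to the CSV rows from the question, where each row has more columns than just the name and the year. Here is a minimal sketch, assuming the column names "Name" and "Year" from the question's code; the function name merge_years and the sample rows are made up for illustration:

```python
import csv
import io

def merge_years(rows):
    """Collapse rows sharing a "Name" into one row whose "Year" is a list."""
    merged = {}
    for row in rows:
        # setdefault stores the copied row (with an empty year list) only the
        # first time a name is seen; afterwards it returns the stored entry.
        entry = merged.setdefault(row["Name"], {**row, "Year": []})
        entry["Year"].append(int(row["Year"]))
    return list(merged.values())

# Made-up sample data standing in for the real CSV file.
sample = io.StringIO(
    "Name,Author,Year,Genre\n"
    "aa,Ann,2022,Fiction\n"
    "aa,Ann,2021,Fiction\n"
    "bb,Bob,2016,Non Fiction\n"
)
books = merge_years(csv.DictReader(sample))
for book in books:
    print(book["Name"], book["Year"])
```

Every other column keeps the value from the first row seen for that name, so each merged row can then be passed to FictionBook or NonFictionBook with book["Year"] as the list of years.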

CodePudding user response:

This also works, using a set to track which names have already been handled:

dict_list = [{'name': 'aa', 'year': 2022}, {'name': 'aa', 'year': 2021}, {'name': 'bb', 'year': 2016}, {'name': 'cc', 'year': 2015}]

new_dict_list = []
names_seen = set()
for name in [d['name'] for d in dict_list]:
    if name not in names_seen:
        new_dict_list.append({'name':name, 'year':[d['year'] for d in dict_list if d['name']==name]})
    names_seen.add(name)

new_dict_list
# Out[68]: 
# [{'name': 'aa', 'year': [2022, 2021]},
#  {'name': 'bb', 'year': [2016]},
#  {'name': 'cc', 'year': [2015]}]
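
Note that the inner list comprehension above rescans dict_list for every new name. If the list is long, a single pass may be preferable. A possible sketch, relying on dictionaries preserving insertion order (Python 3.7+); the name by_name is only illustrative:

```python
dict_list = [
    {'name': 'aa', 'year': 2022},
    {'name': 'aa', 'year': 2021},
    {'name': 'bb', 'year': 2016},
    {'name': 'cc', 'year': 2015},
]

by_name = {}  # name -> merged dictionary, in first-seen order
for d in dict_list:
    if d['name'] not in by_name:
        # First occurrence of this name: start a new merged entry.
        by_name[d['name']] = {'name': d['name'], 'year': []}
    by_name[d['name']]['year'].append(d['year'])

new_dict_list = list(by_name.values())
print(new_dict_list)
# [{'name': 'aa', 'year': [2022, 2021]}, {'name': 'bb', 'year': [2016]}, {'name': 'cc', 'year': [2015]}]
```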