How to prevent the elements in the dictionary from constantly being updated?

Here is the full code:

import re

f = open('movies.item','r') 
# First three items of movies.item below:
#1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy Story (1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0 
#2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye (1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
#3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four Rooms (1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
empty_list = []
for items in f:
    new_item = re.sub(r'\n', '', items)
    empty_list.append(new_item)
movie_names = []
splitted_list = None
for i in range(len(empty_list)):

    splitted_list = empty_list[i].split("|")
    movie_names.append(splitted_list[1])

genres = ["Unknown", "Action", "Adventure", "Animation", "Children's","Comedy", "Crime", "Documentary", "Drama",
"Fantasy", "Film-Noir", "Horror", "Musical", "Mystery","Romance", "Sci-Fi", "Thriller", "War", "Western"]
genres.reverse()
genredict = {}
last_dict = {}
reversegenresum = []
for i in range(len(empty_list)):
    x = list(empty_list[i])
    claer_list = []
    for k in range(len(x)):
        if x[k] != "|":
            claer_list.append(x[k])

    claer_list.reverse()
    reverse_genre_data = claer_list[0:19]
    reversegenresum.append(reverse_genre_data)
    

for i in range(3): #trying for 3 movie
    for j in range(len(genres)): 
        if reversegenresum[i][j] == '1':      
            genredict[genres[j]] = '1'
        last_dict[movie_names[i]] = genredict    


print(last_dict)

What am I trying to do? I am trying to match data from a file named 'movies.item'. Each movie has a row of flags such as '0|0|1|0'. Where a flag equals 1, I need to match it with the corresponding genre. This only works for one movie. With more than one movie I get no error, but every movie ends up with the same genres, including those of the other movies. If that is unclear, please copy the code and run it yourself.

Current output:

{'Toy Story (1995)': {'Comedy': '1', "Children's": '1', 'Animation': '1', 'Thriller': '1', 'Adventure': '1', 'Action': '1'}, 
 'GoldenEye (1995)': {'Comedy': '1', "Children's": '1', 'Animation': '1', 'Thriller': '1', 'Adventure': '1', 'Action': '1'}, 
 'Four Rooms (1995)': {'Comedy': '1', "Children's": '1', 'Animation': '1', 'Thriller': '1', 'Adventure': '1', 'Action': '1'}}

What I want:

{'Toy Story (1995)': {'Animation': 1, "Children's": 1, 'Comedy': 1}, 
 'GoldenEye (1995)': {'Action': 1, 'Adventure': 1, 'Thriller': 1}, 
 'Four Rooms (1995)': {'Thriller': 1}}

CodePudding user response:

Problem

You store the very same dict object, genredict, under every film. Because it is never reset, each film ends up showing all the genres accumulated so far:

last_dict[movie_names[i]] = genredict
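
A minimal sketch of why this happens, using made-up names (shared, catalog, "Film A", "Film B") that are not in the original code. Assigning a dict to two keys stores two references to one object, so a later change shows up under both keys.

```python
# Hypothetical names for illustration; they do not appear in the question's code.
shared = {}
catalog = {}

shared["Comedy"] = "1"
catalog["Film A"] = shared      # stores a reference to the dict, not a copy

shared["Action"] = "1"
catalog["Film B"] = shared      # the same object again

# Both keys point at one dict, so both show every genre added so far.
same_object = catalog["Film A"] is catalog["Film B"]
print(same_object)
print(catalog["Film A"])
```

Creating a fresh dict before handling each film breaks this sharing.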

Simple fix

Create a new dict for each film and assign it once, after the j loop. That is enough:

for i in range(3):  
    genredict = {}
    for j in range(len(genres)):
        if reversegenresum[i][j] == '1':
            genredict[genres[j]] = '1'
    last_dict[movie_names[i]] = genredict

Improve

You have essentially four loops that all iterate over the same thing, the films. Instead of applying each step to all the films in turn, apply all the steps to each film in a single pass:

genres = ["Unknown", "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime", "Documentary", "Drama",
          "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]
result = {}
with open('movies.item', 'r') as f:
    for items in f:
        # In the sample lines the 4th field is empty and the 5th is the URL
        index, name, date, _, url, *values = items.rstrip("\n").split("|")
        item_genre = dict(zip(genres, values))
        result[name] = {genre: value for genre, value in item_genre.items() if value == '1'}

Split the line once and retrieve all the elements you need: the name and the values.
Use dict(zip( , )) to pair the genres with the values.
Use {genre: value for genre, value in item_genre.items() if value == '1'} to keep only the genres flagged with 1.
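
The three steps above can be tried without the file. The sketch below assumes the three sample lines from the question held in a list; the names sample_lines and parse_line are illustrative and not part of the original answer.

```python
genres = ["Unknown", "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime", "Documentary", "Drama",
          "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]

# The three sample lines quoted in the question (an in-memory stand-in for movies.item).
sample_lines = [
    "1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy Story (1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0\n",
    "2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye (1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0\n",
    "3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four Rooms (1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0\n",
]


def parse_line(line):
    """Return (name, {genre: '1'}) for one pipe-separated record."""
    # Five leading fields (id, title, date, an empty field, URL), then the genre flags.
    _, name, _, _, _, *values = line.rstrip("\n").split("|")
    item_genre = dict(zip(genres, values))  # pair each genre with its flag
    return name, {genre: value for genre, value in item_genre.items() if value == "1"}


# A new dict is built per line, so no film shares state with another.
result = dict(parse_line(line) for line in sample_lines)
print(result)
```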

Note that the line assigning result[name] would be better written as follows. You don't need a dict in which every value is the same (1); a list of genres is enough:

result[name] = [genre for genre, value in item_genre.items() if value == '1']

# {'Toy Story (1995)': ['Animation', "Children's", 'Comedy'], 'GoldenEye (1995)': ['Action', 'Adventure', 'Thriller'], 'Four Rooms (1995)': ['Thriller']}