Using a loop to insert values from a data frame column into a dictionary key

Time:01-25

There are around 60,000 dictionaries stored in a list. There is also a data frame with the same number of rows, and I want to take one of its columns and insert its values into the dictionaries as a key-value pair. I wrote a for loop that is supposed to update the dictionaries, but it seems to take forever. I am looking for a faster way to do this, given the number of rows.

new_dicties = []
for i in list_of_dicts:
    for x in resultsDf0['created_at']:
        i['created_at']=x
        new_dicties.append(i)

Answer:

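Because of the nested loop, and based on the information you've given, you are doing 60,000 x 60,000 = 3,600,000,000 dictionary updates (and just as many appends). Almost all of them are wasted: each dictionary's "created_at" value is overwritten 59,999 times, so every dictionary ends up holding the last value of the column, and each dictionary is appended to new_dicties 60,000 times.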
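A scaled-down sketch makes the cost visible. It assumes three dictionaries and a three-value column, with a plain list standing in for resultsDf0['created_at'] (an assumption for illustration only): the nested loop performs n * n updates, and every dictionary ends up with the last value.

```python
# Scaled-down reproduction of the nested loop from the question (n = 3 instead of 60,000).
# A plain list stands in for resultsDf0['created_at'] (assumption for illustration).
created_at = ["2023-01-01", "2023-01-02", "2023-01-03"]
list_of_dicts = [{"key": i} for i in range(1, 4)]

updates = 0
new_dicties = []
for i in list_of_dicts:
    for x in created_at:
        i["created_at"] = x      # overwritten on every inner iteration
        new_dicties.append(i)    # the same dict is appended once per inner iteration
        updates += 1

print(updates)           # 9, i.e. n * n
print(len(new_dicties))  # 9
print(list_of_dicts)     # every dict holds the last value, '2023-01-03'
```

With n = 60,000 the same pattern gives the 3.6 billion iterations mentioned above, which is why the loop seems to never finish.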

So I suspect your situation is the following: a data frame df and a list of dictionaries list_of_dicts of the same length (one dictionary per row of df), for instance:

import pandas as pd

df = pd.DataFrame({"created_at": ["2023-01-01", "2023-01-02", "2023-01-03"]})
list_of_dicts = [{"key": i} for i in range(1, 4)]

What you most likely want is to pair each dictionary with the value from the same row, which zip does in a single pass:

new_dicties = []
for d, v in zip(list_of_dicts, df["created_at"]):
    d["created_at"] = v
    new_dicties.append(d)

This gives you the following new_dicties:

[{'key': 1, 'created_at': '2023-01-01'},
 {'key': 2, 'created_at': '2023-01-02'},
 {'key': 3, 'created_at': '2023-01-03'}]

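However, list_of_dicts now looks exactly the same, because d refers to the very same dictionary objects (references, or pointers if you will), so the loop modifies the original dictionaries in place.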
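A small check of the aliasing, assuming a plain list stands in for df["created_at"] (an assumption so the sketch runs without pandas): both lists end up holding the same dictionary objects, which the `is` operator confirms.

```python
# zip pairs each dictionary with the value from the same row; no copies are made.
created_at = ["2023-01-01", "2023-01-02", "2023-01-03"]  # stands in for df["created_at"]
list_of_dicts = [{"key": i} for i in range(1, 4)]

new_dicties = []
for d, v in zip(list_of_dicts, created_at):
    d["created_at"] = v
    new_dicties.append(d)

# Both lists hold references to the very same dictionary objects
print(all(a is b for a, b in zip(new_dicties, list_of_dicts)))  # True
print(list_of_dicts[0])  # {'key': 1, 'created_at': '2023-01-01'}
```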

If that's fine, you could also just keep the original list_of_dicts and do:

for d, v in zip(list_of_dicts, df["created_at"]):
    d["created_at"] = v
new_dicties = list_of_dicts  # Maybe not needed

If you don't want to modify the original dictionaries, create copies instead, either with

new_dicties = [d | {"created_at": v} for d, v in zip(list_of_dicts, df["created_at"])]

if you have Python 3.9 or higher (the dictionary union operator | was introduced in 3.9), or otherwise with

new_dicties = []
for d, v in zip(list_of_dicts, df["created_at"]):
    d = dict(d)
    d["created_at"] = v
    new_dicties.append(d)
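As another option for Python versions before 3.9, dictionary unpacking (available since Python 3.5) also builds a new dictionary per row. A minimal sketch, again assuming a plain list in place of df["created_at"]:

```python
created_at = ["2023-01-01", "2023-01-02", "2023-01-03"]  # stands in for df["created_at"]
list_of_dicts = [{"key": i} for i in range(1, 4)]

# {**d, ...} creates a new dict per row, so the originals are left untouched
new_dicties = [{**d, "created_at": v} for d, v in zip(list_of_dicts, created_at)]

print(new_dicties[0])                      # {'key': 1, 'created_at': '2023-01-01'}
print(list_of_dicts[0])                    # {'key': 1}
print(new_dicties[0] is list_of_dicts[0])  # False
```

Like dict(d) and d | {...}, this is a shallow copy: nested mutable values inside the dictionaries are still shared with the originals.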

Answer:

new_dicties = []
for i in list_of_dicts:
    # Collect the keys of every dictionary into one flat list
    for j in i.keys():
        new_dicties.append(j)
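For illustration, with two small hypothetical dictionaries, the loop above produces a flat list of keys; it does not insert any column values into the dictionaries.

```python
# Hypothetical input for illustration
list_of_dicts = [{"key": 1, "created_at": "2023-01-01"},
                 {"key": 2, "created_at": "2023-01-02"}]

new_dicties = []
for i in list_of_dicts:
    for j in i.keys():
        new_dicties.append(j)

print(new_dicties)  # ['key', 'created_at', 'key', 'created_at']
```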