Home > Net >  Recreating list of objects efficiently
Recreating list of objects efficiently

Time:02-26

I am trying to convert a list of objects which has been queried using SQLAlchemy.

The issue that I am having is that the process is taking around 18-20 seconds to loop, process and send the data to the frontdnd. The data is of around 5 million rows which is way too slow to put into production.

Here is an example of what I using.

test = [
   {"id": 5, "date":"2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
]

test_dict = {}

for i in test:
    if i["id"] not in test_dict:
        test_dict[i["id"]] = []
    test_dict[i["id"]].append(i["date"].isoformat)

Expected output should be e.g

[
   {5: [date, date, date, date, date]},
   {6: [date]}
]

I totally understand this is not working code and I am not looking to fix it. I just wrote this on the fly but my main focus is what to do with the for loop to speed the process up.

Thank you everyone for your help.

Thank you everyone for your answers so far.

Providing more info, the data needs to be sent to the frontend which is then rendered on a graph. This data is updated around every minute or so and can also be requested between 2 time ranges. These time ranges are always a minimum of 35 days so the rows returned are always a minimum of 5 million or so. 20 seconds for a graph to load for the end user I would say is too slow. The for loop is the cause of this bottleneck but would be nice to get the for loop down to say 5 seconds at least.

Thank you

CodePudding user response:

If I understood currently you can use pandas dataframes

test = [
   {"id": 5, "date":"2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-02-01 00:00:00"},
   {"id": 5, "date": "2022-03-01 00:00:00"},
   {"id": 5, "date": "2022-04-01 00:00:00"},
   {"id": 6, "date": "2022-05-01 00:00:00"},
]
import pandas as pd

df = pd.DataFrame.from_dict(test)
res = df.groupby("id").agg(list)
print(res)

Output :

                                                                                   date
id                                                                                      
5   [2022-01-01 00:00:00, 2022-02-01 00:00:00, 2022-03-01 00:00:00, 2022-04-01 00:00:00]
6                                                                  [2022-05-01 00:00:00]

And if you want it to be as dict you can use res.to_dict()

CodePudding user response:

You should probably not send 5 millions objects to the frontend.

Usually we use pagination, filters, and sort elements.

Then if you are really willing to do so, the fastest way would probably be to cache your data, for instance by creating and maintaining a json file on your server that the clients would download.

  • Related