Why do Python lists take up so much memory?


I have a pandas DataFrame containing a large volume of text in each row, and it takes up 1.6 GB of space when saved as a .pkl file. Now I want to build a list of words from this DataFrame, and I thought that something as simple as [word for text in df.text for word in text.split()] should suffice. However, this expression eats up all 16 GB of RAM in 10 seconds, and that's it.

It really interests me how this works: why is the result not just a bit above 1.6 GB? I know that lists allocate a little extra memory so they can grow, so I tried tuples, with the same result. I even tried writing everything into a file as tuples like ('one', 'two', 'three'), then opening the file and calling eval on it; still the same result. Why does this happen? Does pandas compress the data, or is Python that inefficient? What is a better way to do this?

CodePudding user response:

You can use a generator so the words are produced one at a time instead of all at once, for example map(func, iterable) or a generator expression. The blow-up is not pandas compressing anything: the list comprehension turns every word into a separate Python str object, and on 64-bit CPython even an empty string carries roughly 49 bytes of object overhead, plus an 8-byte pointer per list element, so millions of short words easily multiply 1.6 GB of raw text several times over. In the DataFrame the text lives in one large string per row, so that per-object overhead is paid only once per row.
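Here is a minimal sketch of both points; the df and its text column below are toy stand-ins for the questioner's data, not the original DataFrame:

```python
import sys
from collections import Counter

import pandas as pd

# Toy stand-in for the questioner's DataFrame with a `text` column.
df = pd.DataFrame({"text": ["one two three", "two three three"]})

# Splitting turns every word into its own str object. On 64-bit CPython
# an empty str already costs ~49 bytes, so short words dwarf their payload.
print(sys.getsizeof("three"))   # 54 bytes for 5 characters of text

# A generator expression yields one word at a time instead of
# materializing all of them, so memory stays flat while you iterate.
words = (word for text in df.text for word in text.split())

# Consume it lazily, e.g. count word frequencies without a giant list:
print(Counter(words))           # Counter({'three': 3, 'two': 2, 'one': 1})
```

If a single pass is all you need (counting, filtering, feeding a tokenizer), the generator never holds more than one word in memory at a time.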
