Does filtering a DataFrame result in increased use of RAM?


Say I have 16 GB of RAM, and I have an in-memory DataFrame (df) that is about 8 GB. I want to keep this df in memory, but I also have to filter it for another purpose.

So I do something like

filtered = df.iloc[x, y]

and say filtered is about 2 GB.

Does that mean my in-memory usage is now 10 GB, or is it still 8?

I'm trying to figure out how to architect something while keeping RAM usage efficient.

Maybe if I use it directly instead of creating a new variable it stays at 8, otherwise it increases?

trainOn(df.iloc[x, y])

What do you think?

CodePudding user response:

It depends on how you do the filtering. The pandas documentation says:

Whether a copy or a reference is returned for a setting operation, may depend on the context.

Here is a little example to demonstrate both possible behaviors. First, when you use df.iloc with slicing, it returns a view of the original dataframe, so memory load should hardly be affected:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(range(9)).reshape((3, 3)))

filtered = df.iloc[:2, :2]  # slicing: filtered is a view of df's data
df.iloc[0, 0] = 777         # the write through df shows up in filtered

print(filtered)
     0  1
0  777  1
1    3  4

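If you want to verify this directly rather than inferring it from the 777 write-through, one option is numpy's shares_memory check. This is just a quick sketch; whether it prints True can depend on your pandas version and copy-on-write settings, but it tells you whether the sliced frame reuses the original buffer or allocated its own:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))
sliced = df.iloc[:2, :2]

# True means the slice reuses df's underlying buffer (no extra RAM for the values),
# False means it got its own allocation
print(np.shares_memory(df.to_numpy(), sliced.to_numpy()))
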
Second, when you use Boolean filters instead, a copy is apparently returned, so this does affect the memory load:

df = pd.DataFrame(np.array(range(9)).reshape((3, 3)))

x = y = [True, False, True]
filtered = df.iloc[x, y]  # Boolean indexing: filtered is a copy
df.iloc[0, 0] = 777       # the write through df does not show up in filtered

print(filtered)
   0  2
0  0  2
2  6  8
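
For your 8 GB case that means: a Boolean or positional-list filter like df.iloc[x, y] generally materializes a new ~2 GB result, and it does so whether you assign it to filtered or pass it straight into trainOn(df.iloc[x, y]); the inline version just lets the intermediate be garbage-collected once the call returns. A rough way to estimate that extra footprint up front is memory_usage. This is a sketch with a small made-up frame standing in for your real one; deep=True matters if you have object/string columns:

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real 8 GB frame
df = pd.DataFrame(np.random.rand(1_000_000, 4))

mask = df[0] > 0.5
filtered = df[mask]  # Boolean filtering materializes a copy

mb = 1024 ** 2
print(f"original: {df.memory_usage(deep=True).sum() / mb:.1f} MB")
print(f"filtered: {filtered.memory_usage(deep=True).sum() / mb:.1f} MB")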