Iterate through different dataframes and apply a function to each one

I have 4 different dataframes containing time series data that all have the same structure.

My goal is to take each individual dataframe and pass it through a function I have defined that will group them by datestamp, sum the columns and return a new dataframe with the columns I want. So in total I want 4 new dataframes that have only the data I want.

I just looked through this post: "Loop through different dataframes and perform actions using a function", but applying it did not change my results.

Here is my code:

I am putting the dataframes in a list so I can iterate through them

dfs = [vds, vds2, vds3, vds4]

This is my function I want to pass each dataframe through:

def VDS_pre(df):
    df = df.groupby(['datestamp','timestamp']).sum().reset_index()
    df = df.rename(columns={'datestamp': 'Date','timestamp':'Time','det_vol': 'VolumeVDS'})
    df = df[['Date','Time','VolumeVDS']]
    
    return df

This is the loop I made to iterate through my dataframe list and pass each one through my function:

for df in dfs:
    df = VDS_pre(df)

However once I go through my loop and go to print out the dataframes, they have not been modified and look like they initially did. Thanks for the help!

CodePudding user response：

"However once I go through my loop and go to print out the dataframes, they have not been modified and look like they initially did."

That is exactly what happens: the dataframes are not modified (the last section of this answer summarizes why). The following code demonstrates the behavior, using plain lists as stand-ins for dataframes:

df_1 = [1]
df_2 = [2]
dfs  = [df_1, df_2]
def f(df):
    df = [df[0] + 10]
    print('>', df)
    return df
for df in dfs: 
    df = f(df)
print(dfs, df_1, df_2)

printing:

> [11]
> [12]
[[1], [2]] [1] [2]

To get the effect you expected, you could instead use the following code, which rebinds each variable by name via eval() and exec():

df_1 = [1]
df_2 = [2]
dfs  = [ df_1,   df_2]
sdfs = ['df_1', 'df_2']
def f(df):
    df = [df[0] + 10]
    print('>', df)
    return df
for sdf in sdfs:
    tdf = eval(sdf)
    exec(str(f'{sdf} = f(tdf)'))
print(dfs, df_1, df_2)

printing:

> [11]
> [12]
[[1], [2]] [11] [12]

Note that while df_1 and df_2 now have new values, dfs remains unchanged.

In other words translating this to your case you can use:

sdfs = ['vds', 'vds2', 'vds3', 'vds4']
# ...
for sdf in sdfs:
    tdf = eval(sdf)
    exec(str(f'{sdf} = VDS_pre(tdf)'))

to meet your expectations.
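One caveat worth checking before relying on this: rebinding a name through exec() works as shown only when the loop runs at module (top-level) scope. Inside a function, exec() does not rebind the function's local variables. A minimal sketch, with a plain list standing in for a dataframe; the function name rebind_inside_function and the dictionary ns are made up for illustration:

```python
def rebind_inside_function():
    # Hypothetical helper: tries the exec() approach on a local variable.
    df_1 = [1]
    exec('df_1 = [df_1[0] + 10]')
    # The local name df_1 still refers to the original list.
    return df_1

result = rebind_inside_function()
print(result)  # [1]

# Passing an explicit namespace dict makes the effect of exec() visible,
# but the new value then lives in that dict, not in a local variable.
ns = {'df_1': [1]}
exec('df_1 = [df_1[0] + 10]', ns)
print(ns['df_1'])  # [11]
```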

The surprise comes from assuming that putting the dataframes in a list (dfs = [vds, vds2, vds3, vds4]) and iterating over it will make the names vds, vds2, ... refer to the changed content afterwards. It does not: the list only holds references to the same objects, and rebinding the loop variable affects neither the list nor those names.

So what can be done if you want to avoid eval() and exec() in your code? You have to change your point of view and use the list with dataframes to store the changed values. The following code demonstrates this approach:

df_1 = [1]
df_2 = [2]
dfs  = [ df_1,   df_2]
def f(df):
    df = [df[0] + 10]
    return df
for indx, df in enumerate(dfs):
    dfs[indx] = f(df)
print(dfs, df_1, df_2) # gives [[11], [12]] [1] [2]

In other words translating this to your case you can use:

for indx, df in enumerate(dfs):
    dfs[indx] = VDS_pre(df)

(or use the list comprehension dfs = [VDS_pre(df) for df in dfs], as suggested by onyambu in a comment on your question)

Either way, the changed dataframes are stored in the dfs list. Note, however, that you can access them only through dfs and not through the original variables, which still refer to the old values until you assign the new dataframes to them:

vds = dfs[0]; vds2=dfs[1]; vds3=dfs[2]; vds4=dfs[3]

If you want to do that reassignment in a loop, you again need the sdfs list and exec().
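If the goal is only to get the new values back under the original names, sequence unpacking does it in one statement without exec(). Another option, assuming you are free to restructure the code, is to keep the frames in a dictionary keyed by name so that separate variables are not needed at all. A sketch with plain lists standing in for dataframes (the dictionary name frames and its keys are hypothetical):

```python
def f(df):
    return [df[0] + 10]

df_1 = [1]
df_2 = [2]

# Option 1: process the frames, then unpack the results back into the names.
df_1, df_2 = [f(df) for df in [df_1, df_2]]
print(df_1, df_2)  # [11] [12]

# Option 2: keep the data in a dict and look each frame up by key.
frames = {'a': [1], 'b': [2]}
frames = {name: f(df) for name, df in frames.items()}
print(frames)  # {'a': [11], 'b': [12]}
```

With the dictionary, frames['a'] plays the role that a separate variable such as vds would otherwise play.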

What is the actual lesson from all the text and code above?

Assigning to the loop variable item in a "for item in lst:" loop has no effect on either the list lst or the variables from which the list items got their values:

lst = [1,2,3]
for item in lst:
    item = 0
print(lst) # gives: [1, 2, 3]
# ---
v1=1;v2=2;v3=3
lst = [v1,v2,v3]
for item in lst:
    item = 0
print(lst, v1, v2, v3) # gives: [1, 2, 3] 1 2 3
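The reason is that an assignment such as item = 0 rebinds the loop name to a new object; it does not touch the object the name referred to before. Changing an object in place, by contrast, is visible through every name that refers to it. A small sketch with plain lists (the names a, b and snapshot are made up for illustration):

```python
a = [1]
b = [2]
lst = [a, b]

for item in lst:
    item = [0]  # rebinding: only the loop name changes

snapshot = [x[:] for x in lst]  # copy of the contents after rebinding
print(snapshot, a, b)  # [[1], [2]] [1] [2]

for item in lst:
    item[0] = 0  # in-place mutation: same object, seen through all names

print(lst, a, b)  # [[0], [0]] [0] [0]
```

Since VDS_pre returns a new dataframe rather than changing its argument in place, only the approaches that store the returned value apply to the question's case.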

CodePudding user response：

The operations in your function (groupby, rename, column selection) return new objects rather than modifying the dataframe in place. Your loop assigns that result to the variable df but never writes it back into your list, so nothing is kept once the loop finishes.

If you don't need to keep original frames, you may simply overwrite them:

for i, df in enumerate(dfs):
    dfs[i] = VDS_pre(df)

Otherwise, use a second list and append each result to it:

l = []
for df in dfs:
    df2 = VDS_pre(df)
    l.append(df2)

Now you are able to store the result of your processing.
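To see this end to end, here is a small sketch that runs the question's function over a list of frames and collects the results in a second list. The sample data, including the extra lane column, is invented for illustration; only the column names datestamp, timestamp and det_vol come from the question.

```python
import pandas as pd

def VDS_pre(df):
    # Same steps as in the question: group, sum, rename, select columns.
    df = df.groupby(['datestamp', 'timestamp']).sum().reset_index()
    df = df.rename(columns={'datestamp': 'Date', 'timestamp': 'Time',
                            'det_vol': 'VolumeVDS'})
    return df[['Date', 'Time', 'VolumeVDS']]

# Invented sample frames standing in for vds and vds2.
vds = pd.DataFrame({
    'datestamp': ['2022-01-01', '2022-01-01', '2022-01-02'],
    'timestamp': ['00:00', '00:00', '00:00'],
    'det_vol': [5, 7, 3],
    'lane': [1, 2, 1],
})
vds2 = pd.DataFrame({
    'datestamp': ['2022-01-01', '2022-01-01'],
    'timestamp': ['00:00', '00:15'],
    'det_vol': [2, 4],
    'lane': [1, 1],
})

dfs = [vds, vds2]
results = [VDS_pre(df) for df in dfs]  # new frames; originals untouched

print(results[0])
print(list(vds.columns))  # the original frame keeps its columns
```

The originals stay available in dfs, and the processed frames are read from results by position.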