Loop through different dataframes and perform actions using a function

Time:12-11

I have 10 dataframes with the same structure (same number of rows and columns), and I am trying to find an efficient way of performing several actions on all of them, such as renaming columns, with a for loop. I have tried putting them in a list, like this:

dfs = [df1, df2, df3]
for i in dfs:
    i.rename(columns={'A': 'a1'}, inplace=True)

but it doesn't work. Another issue occurs when I put the operation in a function and then loop over the list:

def groupdfs(anydf):
    anydf = anydf.groupby("A").sum()

for i in dfs:
    groupdfs(i)

No changes are happening to the dataframes. I have searched similar old questions, but nothing has worked. What is the best way to loop through many dataframes when you want to perform the same changes to each of them?

CodePudding user response:

The first piece of code you wrote should work fine.

import numpy as np
import pandas as pd

data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
dff = pd.DataFrame(data, columns=['c', 'a', 'b'])

dff

    c   a   b
0   1   2   3
1   4   5   6
2   7   8   9

dato = np.array([(11, 12, 13), (41, 15, 16), (17, 18, 9)])
dfg = pd.DataFrame(dato, columns=['c', 'a', 'b'])

dfg

    c   a   b
0  11  12  13
1  41  15  16
2  17  18   9

dffs = [dff, dfg]
for i in dffs:
    i.rename(columns={'a': 'a1'}, inplace=True)

dff

    c   a1  b
0   1   2   3
1   4   5   6
2   7   8   9

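The only thing I can think of is that, if the dataframes were loaded from files, the renamed columns only exist in memory. You would need to add a line at the end that saves each dataframe back to its file (for example with to_csv).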
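
As a quick check that looping over a list with inplace=True really changes the original dataframes, here is a minimal sketch. The frames df1 and df2 and their contents are made up for illustration; they are not the asker's data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's dataframes.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

dfs = [df1, df2]
for df in dfs:
    # inplace=True mutates the object itself, so df1 and df2 change too.
    df.rename(columns={"A": "a1"}, inplace=True)

# The list stores references, not copies.
print(dfs[0] is df1)       # True
print(list(df1.columns))   # ['a1', 'B']
print(list(df2.columns))   # ['a1', 'B']
```

If this prints the renamed columns on your machine but your own loop does not, the difference is probably somewhere else, such as the column name not matching exactly.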

CodePudding user response:

For the first part

Since all the dataframes have the same structure, you can create a list with the new column names and assign it to each of them like this:

column_names = ['a1', 'a2', 'a3']
for df in [df1, df2, df3]:
    df.columns = column_names

Or, if you want to use a dictionary to rename only some of the columns, you can do this:

for df in [df1, df2, df3]:
    df.rename({'A': 'a1'}, axis=1, inplace=True)

Note that axis=1 tells rename to apply the mapping to the column labels rather than to the index.
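
To make the role of the axis argument concrete, here is a small sketch on a made-up frame (the data is an assumption for illustration). It compares passing the mapping with axis=1, passing it through the columns keyword, and passing it with no axis at all.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})  # made-up example frame

by_axis = df.rename({"A": "a1"}, axis=1)       # mapping applied to columns
by_keyword = df.rename(columns={"A": "a1"})    # same thing, spelled differently
no_axis = df.rename({"A": "a1"})               # default axis is the index

print(list(by_axis.columns))        # ['a1', 'B']
print(by_axis.equals(by_keyword))   # True
print(list(no_axis.columns))        # ['A', 'B'], columns untouched
```

The last line shows a quiet failure mode: without axis=1 or columns=, the mapping is matched against the row index, nothing matches, and no error is raised.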

For the second part

There are two issues:

  1. The groupby creates a new DataFrame that has to be assigned to a variable if you want to use it again
  2. Since you are in a function, you have to return that new DataFrame to be assigned to a variable outside the function as below:
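def groupdfs(anydf):
    return anydf.groupby("A").sum()

grouped_dfs = [groupdfs(df) for df in dfs]

This builds a new list, grouped_dfs, that holds the grouped DataFrames. Note that writing i = groupdfs(i) inside a for loop would only rebind the loop variable i and leave both the list and the original dataframes unchanged. Keeping the grouped results under a new name is also better because the original dataframes stay available.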
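
One detail worth verifying yourself is what happens to the list when you assign to the loop variable. The sketch below uses made-up data and a hypothetical dictionary named grouped to keep each result under a label; neither comes from the original question.

```python
import pandas as pd

def groupdfs(anydf):
    # Returns a new, grouped DataFrame; the input is left untouched.
    return anydf.groupby("A").sum()

# Made-up example frames sharing the same structure.
df1 = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["x", "y", "y"], "B": [4, 5, 6]})
dfs = [df1, df2]

for i in dfs:
    i = groupdfs(i)   # only rebinds the local name i

print(dfs[0].shape)   # (3, 2): the list still holds the ungrouped frame

# Collect the results explicitly, here in a dict keyed by a label.
grouped = {name: groupdfs(df) for name, df in zip(["df1", "df2"], dfs)}
print(grouped["df1"])
```

A dictionary is just one option; a list comprehension works equally well when you do not need to look results up by name.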
