PySpark: Iterate over list of dataframes

I have a couple of dataframes and I want all of their columns to be in uppercase. I did this as follows:

for col in df1.columns:
    df1 = df1.withColumnRenamed(col, col.upper())

for col in df2.columns:
    df2 = df2.withColumnRenamed(col, col.upper())

Now I want to do this by iterating over a collection, like this:

list = (df1, df2, df3)
for x in list:
   for col in x.columns:
      x = x.withColumnRenamed(col, col.upper())

But somehow this does not work (no error is displayed): the columns stay in lowercase. I also tried adding a "return x" at the end, but that doesn't work either. Can someone help me?

CodePudding user response：

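The changes to your dataframes are not reflected in the original variables, viz. df1, df2, and df3. withColumnRenamed returns a new dataframe, and the loop only rebinds the loop variable x to it; df1, df2, and df3 still refer to the unchanged dataframes.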
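To see the rebinding behaviour in isolation, here is a minimal sketch in plain Python. The strings are only stand-ins for dataframes (an assumption made so the snippet runs without Spark); like dataframes they are immutable, so "changing" one really means creating a new object. The name frames is also just illustrative.

```python
# Strings stand in for DataFrames here (an illustrative assumption):
# both are immutable, so a "modification" always produces a new object.
df1, df2 = "id", "name"
frames = (df1, df2)

for x in frames:
    # This assignment rebinds the local name x to a new object.
    # It does not touch df1, df2, or the contents of frames.
    x = x.upper()

# After the loop, only x points at a new (upper-case) object;
# df1, df2, and frames are exactly what they were before.
```

The same thing happens in the question's loop: each renamed dataframe is assigned to x and then dropped on the next iteration.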

You could use the globals() function to achieve this. Code below:

a = ['df1', 'df2', 'df3']
for x in a:
    for col in globals()[x].columns:
        globals()[x] = globals()[x].withColumnRenamed(col, col.upper())

You might have to use either globals() or locals() depending on your use case.

globals() and locals() both help in accessing a variable by a string, and both of them return a dictionary of variables. You can read more about them online.

EDIT: Also, list is the name of a built-in type in Python (not a reserved keyword), so using it as a variable name shadows the built-in. You should rename the variable to something else.
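If you would rather not go through globals(), another option is to build the renamed dataframes and assign them back explicitly. The following is only a sketch: uppercase_columns and uppercase_all are hypothetical helper names, not part of PySpark, and the code relies on DataFrame.toDF(*cols), which returns a new dataframe with the columns renamed by position.

```python
def uppercase_columns(df):
    """Return a new DataFrame whose column names are all upper case.

    Hypothetical helper; it only uses df.columns and df.toDF(*names).
    """
    # toDF renames columns positionally and returns a new DataFrame,
    # so all columns are renamed in one call instead of one at a time.
    return df.toDF(*[name.upper() for name in df.columns])


def uppercase_all(dfs):
    """Apply uppercase_columns to every DataFrame and return the results as a list."""
    return [uppercase_columns(df) for df in dfs]
```

With the variables from the question, this would be used as df1, df2, df3 = uppercase_all([df1, df2, df3]). The unpacking on the left-hand side is what rebinds the original names, which is the step the loop in the question was missing.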

CodePudding user response：

Okay, pri's answer worked for me once I added a global statement to it.

global df1, df2, df3
a = ['df1', 'df2', 'df3']
for x in a:
    for col in globals()[x].columns:
        globals()[x] = globals()[x].withColumnRenamed(col, col.upper())