PySpark: Iterate over list of dataframes


I have a couple of dataframes and I want all of their columns to be in uppercase. I did this as follows:

for col in df1.columns:
    df1 = df1.withColumnRenamed(col, col.upper())

for col in df2.columns:
    df2 = df2.withColumnRenamed(col, col.upper())

Now I want to do this by iterating over a list of the dataframes, like this:

list = (df1, df2, df3)
for x in list:
   for col in x.columns:
      x = x.withColumnRenamed(col, col.upper())

But somehow this does not work (no error is displayed, but the columns stay in lowercase). I also tried attaching a "return x" at the end, but that doesn't work either. Can someone help me?

CodePudding user response:

The changes are not reflected in the original variables, viz. df1, df2, and df3: reassigning the loop variable x only rebinds x to a new DataFrame, it does not update the names df1, df2, and df3.
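To see why, here is a minimal plain-Python illustration of the same rebinding behaviour (using strings instead of DataFrames):

# Rebinding the loop variable only changes what x points to;
# the names df1 and df2 still reference the original objects.
df1 = "lower"
df2 = "lower"
for x in (df1, df2):
    x = x.upper()   # creates a new string bound to x only

print(df1, df2)     # still: lower lower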

You could use the globals() function to achieve this. Code below:

a = ['df1', 'df2', 'df3']
for x in a:
    for col in globals()[x].columns:
        globals()[x] = globals()[x].withColumnRenamed(col, col.upper())

You might have to use either globals() or locals() depending on your use case.

globals() and locals() both let you access a variable by its name given as a string; each returns a dictionary mapping variable names to their values. You can read more about them online.
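For example, a small sketch of how a module-level variable can be read and rebound through globals() by its string name:

# globals() returns the module's namespace as a dict, so a variable can be
# looked up and rebound via its name given as a string.
counter = 0
print(globals()["counter"])   # 0

globals()["counter"] = 42
print(counter)                # 42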

EDIT: Also, list is the name of a built-in type in Python; you should rename that variable to something else so you don't shadow it.

CodePudding user response:

Okay, pri's answer worked for me once I added a global statement to it.

global df1, df2, df3
a = ['df1', 'df2', 'df3']
for x in a:
    for col in globals()[x].columns:
        globals()[x] = globals()[x].withColumnRenamed(col, col.upper())
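For completeness, here is a loop-free sketch that avoids globals() entirely; it is not from the answers above, and it assumes df1, df2, and df3 are existing PySpark DataFrames. Each dataframe is rebuilt with toDF and the results are reassigned explicitly.

# Sketch: uppercase every column name and reassign the dataframes directly,
# so no lookup through globals() is needed.
df1, df2, df3 = [
    df.toDF(*[c.upper() for c in df.columns])
    for df in (df1, df2, df3)
]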