Code to create the dataframe:

source_df = spark.createDataFrame(
    [
        ("Jose", "BLUE"),
        ("lI", "BrOwN")
    ],
    ["name", "eye_color"]
)
I have written the following code to convert the 'eye_color' column to lowercase:
actual_df = source_df
for col_name in actual_df.columns if column == 'eye_color' else column for column in actual_df.columns:
    actual_df = actual_df.withColumn(col_name, lower(col(col_name)))
I am getting the following error:
Cell In [26], line 2
for col_name in actual_df.columns if column == 'eye_color' else column for column in actual_df.columns:
^
SyntaxError: invalid syntax
CodePudding user response:
This is more of a Python problem than a Spark problem: your Python syntax is not correct.
If you want to keep the same structure, that is, apply a transformation to each column that matches some criterion, there are multiple ways to do it:
from pyspark.sql.functions import col, lower

# using an if
for col_name in actual_df.columns:
    if col_name == 'eye_color':
        actual_df = actual_df.withColumn(col_name, lower(col(col_name)))

# using filter
for col_name in filter(lambda column: column == 'eye_color', actual_df.columns):
    actual_df = actual_df.withColumn(col_name, lower(col(col_name)))

# using a list comprehension
for col_name in [column for column in actual_df.columns if column == 'eye_color']:
    actual_df = actual_df.withColumn(col_name, lower(col(col_name)))
But in your situation, as mentioned in one of the comments, since you only transform a single column I would not use a loop at all. A single withColumn does the trick:
source_df.withColumn('eye_color', lower(col('eye_color')))
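For reference, calling show() on that result should print the eye colors in lowercase; with the sample data above the output would look roughly like this:

source_df.withColumn('eye_color', lower(col('eye_color'))).show()
# +----+---------+
# |name|eye_color|
# +----+---------+
# |Jose|     blue|
# |  lI|    brown|
# +----+---------+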