I am trying to accomplish with an RDD or Spark DataFrame in PySpark what I have done below with a regular pandas DataFrame. I tried to solve it with the foreach() function, but all my attempts failed. Does anyone have a neat solution?
```python
for i in range(len(all_songs)):
    if all_songs['loudness'][i] > 0:
        loudness = all_songs.loc[i, 'loudness']
        # use .loc for the assignment to avoid chained-indexing issues
        all_songs.loc[i, 'loudness'] = loudness * -1
```
Thank you very much!
CodePudding user response:
I am not sure whether a pure DataFrame API solution is valid for your case, but I would achieve what you described with the following code:
```python
from pyspark.sql.functions import when, col

# Assume that `df` is your DataFrame
replaced_df = df.withColumn(
    "loudness",
    when(col("loudness") > 0, col("loudness") * -1).otherwise(col("loudness")),
)
```