Home > Back-end >  Copy one dataframe column value to another using condition in scala
Copy one dataframe column value to another using condition in scala

Time:11-02

DF1:

accountID cost
act1 100
act2 150

DF2:

accountID serviceType region date cost output
act1 s1 us-east-1 2022-07-01 20
act1 s1 us-east-1 2022-07-02 50
act1 s1 us-east-1 2022-07-30 100
act2 s1 us-east-1 2022-07-01 100
act2 s1 us-east-1 2022-07-30 150

excepted output: when accountID is matched between dataframes, output column should be value filled from DF1

accountID serviceType region date cost output
act1 s1 us-east-1 2022-07-01 20 100
act1 s1 us-east-1 2022-07-02 50 100
act1 s1 us-east-1 2022-07-30 100 100
act2 s1 us-east-1 2022-07-01 100 150
act2 s1 us-east-1 2022-07-30 150 150

I tried with withColumn and joins but it is not working.

val outputDF = df2.withColumn("output",when(col("accountID") === partitionedDf("accountID"),partitionedDf("cost")))

related post in python however I want it in scala. assign values of one dataframe column to another dataframe column based on condition

CodePudding user response:

A join is what you are looking for. Its purpose is exactly to merge related data from different tables/datasets.

df1.join(df2.withColumnRenamed("cost", "output"), "accountID")

Since in both df1 and df2 you have a column cost and you want the df2.cost to be renamed into output, it is convenient (but not required) to do it before the join (df2.withColumnRenamed("cost", "output")).

In this case a left join is used (the default, since not specified) and the join condition is the equality of the accountID column.

  • Related