Copy one dataframe column value to another using condition in scala-CodePudding

DF1:

accountID	cost
act1	100
act2	150

DF2:

accountID	serviceType	region	date	cost
act1	s1	us-east-1	2022-07-01	20
act1	s1	us-east-1	2022-07-02	50
act1	s1	us-east-1	2022-07-30	100
act2	s1	us-east-1	2022-07-01	100
act2	s1	us-east-1	2022-07-30	150

excepted output: when accountID is matched between dataframes, output column should be value filled from DF1

accountID	serviceType	region	date	cost	output
act1	s1	us-east-1	2022-07-01	20	100
act1	s1	us-east-1	2022-07-02	50	100
act1	s1	us-east-1	2022-07-30	100	100
act2	s1	us-east-1	2022-07-01	100	150
act2	s1	us-east-1	2022-07-30	150	150

I tried with withColumn and joins but it is not working.

val outputDF = df2.withColumn("output",when(col("accountID") === partitionedDf("accountID"),partitionedDf("cost")))

related post in python however I want it in scala. assign values of one dataframe column to another dataframe column based on condition

CodePudding user response：

A join is what you are looking for. Its purpose is exactly to merge related data from different tables/datasets.

df1.join(df2.withColumnRenamed("cost", "output"), "accountID")

Since in both df1 and df2 you have a column cost and you want the df2.cost to be renamed into output, it is convenient (but not required) to do it before the join (df2.withColumnRenamed("cost", "output")).

In this case a left join is used (the default, since not specified) and the join condition is the equality of the accountID column.