I have two tables in my database and I need to perform a left outer join on them with the condition table1.id = table2.id; the source columns should also match.
Below are my two source tables.
Table 1:
source  id        type
eu2     10000162  N4
sus     10000162  M1
pda     10000162  XM
Table 2:
source  id        code1     code2
eu2     10000162  CDNG_GRP  PROB_CD
sus     10000162  AANV      NW
pda     10000162  PM2       VLPD
Expected output:
source  id        type  concat
eu2     10000162  N4    CDNG_GRP-PROB_CD
sus     10000162  M1    AANV-NW
pda     10000162  XM    PM2-VLPD
I want this result in a DataFrame.
Thanks in advance!
CodePudding user response:
A Spark join always returns a DataFrame (unless you explicitly collect the result).
Try this, assuming your tables are already Spark DataFrames. Note that the join condition must compare table1 against table2 (not table1 against itself) and, per your requirement, should also match on source:
left_join = table1.join(
    table2,
    (table1.id == table2.id) & (table1.source == table2.source),
    "leftouter",
)
left_join.show()
CodePudding user response:
To get the desired result, you need to perform the join on both the source and id columns.
import pyspark.sql.functions as F
...
# Left join on both key columns, then build the hyphen-separated column.
df = df1.join(df2, on=['id', 'source'], how='left') \
    .withColumn('concat', F.concat('code1', F.lit('-'), 'code2')) \
    .select(['source', 'id', 'type', 'concat'])
df.show(truncate=False)
df.show(truncate=False)
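For anyone without a Spark session handy, the same left-join-and-concat logic can be sketched with pandas; the data below is reconstructed from the tables in the question, and the pandas `merge`/string-concatenation calls are a stand-in for the PySpark `join`/`F.concat` above, not the asker's actual code:

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "source": ["eu2", "sus", "pda"],
    "id": [10000162, 10000162, 10000162],
    "type": ["N4", "M1", "XM"],
})
df2 = pd.DataFrame({
    "source": ["eu2", "sus", "pda"],
    "id": [10000162, 10000162, 10000162],
    "code1": ["CDNG_GRP", "AANV", "PM2"],
    "code2": ["PROB_CD", "NW", "VLPD"],
})

# Left join on both key columns, mirroring on=['id', 'source'] in Spark.
df = df1.merge(df2, on=["id", "source"], how="left")

# Build the hyphen-separated column and keep only the requested columns.
df["concat"] = df["code1"] + "-" + df["code2"]
df = df[["source", "id", "type", "concat"]]
print(df)
```

Joining on a list of column names (rather than an expression like table1.id == table2.id) also avoids duplicate id/source columns in the output, in both pandas and Spark.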