Home > Enterprise >  Pyspark Databricks - Need to join two dataframes
Pyspark Databricks - Need to join two dataframes

Time:10-04

I need to join two dataframes as follows:

Dataframe1:

date name
2021-01-10 David

Dataframe2:

column1 column2 column3
1 xxx Hello
2 dd Test
3 eee Block
4 dd Support

Dataframe3 = Dataframe1 Dataframe2

date name column1 column2 column3
2021-01-10 David 1 xxx Hello
2021-01-10 David 2 dd Test
2021-01-10 David 3 eee Block
2021-01-10 David 4 dd Support

I tried to do some merges (left & outer) and also the concat but without success. Could you help me to be able to implement this solution?

CodePudding user response:

On Databricks:

d1 = spark.createDataFrame(dataframe1)
d2 = spark.createDataFrame(dataframe2)

d1.createOrReplaceTempView("vw_d1")
d2.createOrReplaceTempView("vw_d2")


d3 = spark.sql('''select * from vw_d1 CROSS JOIN vw_d2''')

CodePudding user response:

A cross join should do it: pyspark.sql.DataFrame.crossJoin

Dataframe3 = Dataframe1.crossJoin(Dataframe2)

Result:

Dataframe3.collect()
[Row(date=datetime.date(2021, 1, 10), name='David', column1=1, column2='xxx', column3='Hello'),
 Row(date=datetime.date(2021, 1, 10), name='David', column1=2, column2='dd', column3='Test'),
 Row(date=datetime.date(2021, 1, 10), name='David', column1=3, column2='eee', column3='Block'),
 Row(date=datetime.date(2021, 1, 10), name='David', column1=4, column2='dd', column3='Support')]
  • Related