I need to join two dataframes as follows:
Dataframe1:
date | name |
---|---|
2021-01-10 | David |
Dataframe2:
column1 | column2 | column3 |
---|---|---|
1 | xxx | Hello |
2 | dd | Test |
3 | eee | Block |
4 | dd | Support |
Dataframe3 = Dataframe1 × Dataframe2 (every row of Dataframe1 paired with every row of Dataframe2):
date | name | column1 | column2 | column3 |
---|---|---|---|---|
2021-01-10 | David | 1 | xxx | Hello |
2021-01-10 | David | 2 | dd | Test |
2021-01-10 | David | 3 | eee | Block |
2021-01-10 | David | 4 | dd | Support |
I tried some merges (left & outer) and also concat, but without success. Could you help me implement this?
CodePudding user response:
On Databricks:

```python
# Assuming dataframe1 and dataframe2 are pandas DataFrames
d1 = spark.createDataFrame(dataframe1)
d2 = spark.createDataFrame(dataframe2)

# Register temp views so they can be queried with SQL
d1.createOrReplaceTempView("vw_d1")
d2.createOrReplaceTempView("vw_d2")

# CROSS JOIN produces the Cartesian product of the two views
d3 = spark.sql('''SELECT * FROM vw_d1 CROSS JOIN vw_d2''')
```
CodePudding user response:
A cross join should do it: pyspark.sql.DataFrame.crossJoin

```python
Dataframe3 = Dataframe1.crossJoin(Dataframe2)
```
Result:

```python
Dataframe3.collect()
# [Row(date=datetime.date(2021, 1, 10), name='David', column1=1, column2='xxx', column3='Hello'),
#  Row(date=datetime.date(2021, 1, 10), name='David', column1=2, column2='dd', column3='Test'),
#  Row(date=datetime.date(2021, 1, 10), name='David', column1=3, column2='eee', column3='Block'),
#  Row(date=datetime.date(2021, 1, 10), name='David', column1=4, column2='dd', column3='Support')]
```
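Since the question mentions merge and concat, the dataframes may well be pandas objects. In that case no Spark is needed: pandas 1.2+ supports a cross merge directly. A minimal sketch, rebuilding the example data from the question:

```python
import pandas as pd

df1 = pd.DataFrame({"date": ["2021-01-10"], "name": ["David"]})
df2 = pd.DataFrame({
    "column1": [1, 2, 3, 4],
    "column2": ["xxx", "dd", "eee", "dd"],
    "column3": ["Hello", "Test", "Block", "Support"],
})

# how="cross" builds the Cartesian product: every row of df1
# is paired with every row of df2, no join key required
df3 = df1.merge(df2, how="cross")
print(df3)
```

On pandas older than 1.2, the same effect can be had by adding a constant dummy column to both frames and merging on it.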